
AI Week W01/2026: The Global Community Pauses to Look Back — and a 173,000-Word Book Opens the Year

January 5, 2026

The first week of 2026 brought no major releases — but something more valuable: a moment of synthesis. A Japanese CTO published a free 173,000-word technical book on LLM research; Simon Willison's year-in-review earned 940 upvotes on Hacker News; MIT Technology Review mapped five trajectories shaping 2026. Meanwhile, Vietnam's developer community is doing something practical: building the infrastructure to stop depending on expensive hosted APIs.

The first week of 2026 brought no new model launches and no benchmark that rewrote the rankings. Instead, the global AI community did something that rarely happens: it stopped and took stock. A CTO in Tokyo published a free 39-chapter technical book synthesizing two years of LLM research. Simon Willison's year-end retrospective landed on Hacker News with 940 points and 599 comments. In Vietnam, engineers published practical guides on local inference, open-source agent tooling, and model leaderboards, laying the groundwork for independence from expensive hosted APIs. This was a week of synthesis, not announcements.

The thread running through W01: the competitive axis is shifting away from raw model quality toward the architectural layer above — memory, tool-use, agent orchestration, and inference cost. Communities in Japan, Vietnam, and the anglophone world are arriving at the same conclusion from different starting points.

From Japan

Is LLM dead? The "stack above the model" is now the real battleground

A Zenn analysis published in early January opens with a deliberately provocative question: "Is the LLM paradigm already obsolete?" The answer isn't yes or no — it's a shift in competitive axis. The author argues that the convergence of multimodal systems and multi-agent architectures is making raw model performance a commodity. Google Genie 3 (interactive world models) and cross-provider agent SDKs are presented as evidence: value is migrating to memory, tool-use, and inter-agent coordination. The piece names 2026 as the year the "stack above the model" becomes the actual differentiator.

For anyone building on top of LLMs, the practical implication is clear: invest in orchestration layer design, not just better model selection.
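That orchestration-layer point can be made concrete with a toy loop: a scripted stand-in for the model decides whether to call a tool or give a final answer, and the loop dispatches tool calls to local functions. Everything here (`fake_model`, the tool names, the message format) is illustrative, not any vendor's API:

```python
# Minimal "stack above the model" sketch: an orchestration loop that routes
# model tool-calls to local functions. `fake_model` is a scripted stand-in
# for a real LLM; the tool names and message format are made up.

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(messages):
    """Scripted stand-in: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {messages[-1]['content']}"}

def run_agent(user_input, model=fake_model, max_turns=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = model(messages)
        if "answer" in reply:                          # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"]) # dispatch the tool call
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not terminate")

print(run_agent("What is 2 + 3?"))   # prints: The sum is 5
```

Real agent SDKs add retries, schema validation, and memory on top of exactly this loop; the argument in the Zenn piece is that this layer, not the model behind `fake_model`, is where the differentiating design decisions now live.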

Every 2025 model release in one timeline — from o3 to Veo 3

A Zenn year-end retrospective, published December 31 and widely read going into the new year, catalogs all major generative AI model releases of 2025 by provider. OpenAI shipped 10+ distinct models including o3, o4-mini, GPT-4.5, and GPT-5/5.1. Google released Gemini iterations from 2.0 to 3.0, Gemma3, and Veo 3 for video generation. Anthropic shipped the Claude 4.x and 5.x families alongside its agent SDK. The author's read: performance gains were incremental, but the practical shift in how professionals conduct R&D using specialized AI tooling was real and significant.

This is the most reliable single-source timeline for anyone who needs to reconstruct the 2025 landscape before making toolchain decisions for the year ahead.

39 chapters, 173,000 words — a free technical encyclopedia of LLM research

On January 5, the CTO of Globis published a free 39-chapter technical book on Zenn synthesizing all notable LLM research from 2024 through 2025. The scope spans architecture alternatives to Transformers (Mamba, SSMs, MoE), the alignment training shift from RLHF to DPO, inference-time compute scaling, multimodal unification theory, multi-agent coordination design, mechanistic interpretability, hallucination mitigation methods, and the mathematical formalization of alignment problems.

This is a rare artifact — not a blog post or opinion piece but a structured synthesis that would take months of paper-reading to assemble independently. It lands at the start of 2026 as both a capstone on two years of research and a map for the year ahead.

The 2026 AI/LLM practitioner bookmark list — curated by use case

A well-known Qiita AI curator published a reference link collection on January 1 aggregating essential resources for AI and LLM practitioners in 2026, organized by three use cases: research, engineering, and product development. The list covers foundational papers, production deployment guides, evaluation benchmarks, and community-maintained tooling lists.

Rather than spending hours on discovery, practitioners can treat this as a reliable, up-to-date bookmark list for 2026, particularly valuable for anyone reorienting after the holiday break.

From Vietnam

Building production-grade local AI agents without paying for API calls

A Viblo survey of 10 open-source tools covers production-grade local AI agent development — framed explicitly as an alternative to expensive hosted APIs. Every tool listed has more than 10,000 GitHub stars. The article highlights a lightweight agent framework that gained significant traction in early 2026, with minimal-code agent logic and a focus on local and private cloud deployment rather than cloud-dependent pipelines.

For Vietnamese developers — and any team operating under API cost constraints — this is the most practical guide currently available for building capable agent systems without recurring inference costs.

The 2026 update to essential GitHub repos for AI agent development

Viblo's updated January 2026 roundup ranks the top 10 GitHub repositories for AI agent development with an updated order compared to prior-year lists. Coverage spans orchestration frameworks, memory systems, tool-calling infrastructure, and evaluation harnesses. The key observation in the 2026 update: new entrants reflect the community's shift toward smaller, composable agent components over heavyweight monolithic frameworks.

The move toward composable repos is a real architectural signal, not just a popularity contest. Knowing which repos are gaining momentum helps practitioners make sharper build-vs-integrate calls.

vLLM — 10 to 20x throughput gains: a Vietnamese-language guide to production LLM deployment

A technical tutorial on Viblo walks through LLM deployment with vLLM, covering PagedAttention mechanics, continuous batching, and quantization options for production inference. The article benchmarks vLLM against naive HuggingFace Transformers inference, demonstrating 10–20x throughput improvements on GPU hardware. Writing this in Vietnamese makes critical infrastructure knowledge accessible to the local developer community without requiring English documentation.
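The continuous-batching idea the guide centers on can be illustrated with a toy scheduler (pure Python, no vLLM dependency; the function names and one-step-per-token model are simplifications, since real vLLM schedules at the token level over PagedAttention KV blocks):

```python
# Toy comparison of static vs continuous batching. Illustrative only:
# real vLLM interleaves prefill and decode over paged KV-cache blocks.

def static_batching_steps(lengths, batch_size):
    """Each fixed batch runs until its LONGEST request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Finished requests are replaced immediately, so every step does
    useful decode work for up to batch_size requests."""
    pending = list(lengths)
    running = []
    steps = 0
    while pending or running:
        while pending and len(running) < batch_size:
            running.append(pending.pop(0))       # backfill freed slots
        steps += 1                               # one decode step for the batch
        running = [r - 1 for r in running if r > 1]
    return steps

lengths = [3, 17, 4, 16, 2, 15, 5, 18]           # output tokens per request
print(static_batching_steps(lengths, 4))         # prints 35
print(continuous_batching_steps(lengths, 4))     # prints 28
```

Static batching stalls the whole batch on its longest request; continuous batching backfills freed slots every step. That gap, compounded with paged KV memory and quantization, is where the large real-world throughput gains come from.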

vLLM has become the de-facto standard for self-hosted LLM inference. Quality Vietnamese-language material like this meaningfully lowers the adoption barrier for the local community.

End-of-2025 LLM leaderboard: DeepSeek V3 and Qwen hold their own against closed-source

Viblo's year-end LLM ranking, published January 1, evaluates the top 10 models on reasoning, coding, instruction-following, and multilingual capability — with specific attention to models available for local deployment. DeepSeek V3 and Qwen 2.5 feature prominently alongside closed-source entries from OpenAI and Anthropic.

Independent benchmark roundups that include open-source alternatives help practitioners make model selection decisions based on evidence rather than provider marketing alone.

Global

MIT Technology Review: five trajectories that will define AI in 2026

MIT Technology Review's January 5 forecast identifies five major AI trajectories for 2026. First: Chinese open-source LLMs (DeepSeek R1, Qwen, GLM) gaining significant US startup adoption due to lower cost and customizability. Second: LLM-assisted scientific discovery via AlphaEvolve-style systems combining LLMs with evolutionary algorithms — with OpenEvolve and AlphaResearch already emerging as variants. Third: agentic commerce projected by McKinsey at $3–5 trillion annually by 2030. Fourth: accelerating US federal vs. state AI regulatory battles triggered by the Trump executive order on state-level AI laws. Fifth: expanding legal liability frontiers including chatbot harm cases and defamation suits.

The trajectory most directly relevant to DS/AI researchers: LLMs paired with verifiable solvers to attack previously intractable problems. That combination is the clearest path to next-generation research tooling.
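A minimal sketch of the propose-and-verify pattern behind AlphaEvolve-style systems, with a seeded random mutator standing in for the LLM proposer and a distance function standing in for a verifiable solver (the function names and toy objective are all illustrative):

```python
# Propose-and-verify loop: a proposer suggests candidates, a verifier
# scores them, and only verified improvements are kept. A seeded random
# mutator stands in for the LLM proposer.
import random

def verify(candidate, target):
    """Verifiable score: negative distance to a known optimum."""
    return -sum(abs(c - t) for c, t in zip(candidate, target))

def propose(candidate, rng):
    """Stand-in for an LLM proposal: perturb one coordinate by ±1."""
    mutant = list(candidate)
    i = rng.randrange(len(mutant))
    mutant[i] += rng.choice([-1, 1])
    return mutant

def evolve(start, target, steps=500, seed=0):
    rng = random.Random(seed)
    best, best_score = start, verify(start, target)
    for _ in range(steps):
        cand = propose(best, rng)
        score = verify(cand, target)
        if score > best_score:        # the verifier gates every proposal
            best, best_score = cand, score
    return best, best_score

best, score = evolve([0, 0, 0], [5, -3, 7])
print(best, score)
```

The key property is that the proposer can be unreliable: because every candidate passes through a verifier before being accepted, hallucinated proposals cost only compute, which is what makes the LLM-plus-solver pairing credible for research tooling.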

Simon Willison's 2025 LLM retrospective: 940 upvotes and the questions that actually matter

Simon Willison's year-end LLM retrospective, posted to Hacker News on January 1, received 940 points and 599 comments. The piece traces the arc from reasoning model releases (o3, DeepSeek R1) through the open-source vs. closed debate to the emergence of agentic systems. Community discussion surfaced the key tensions: LLM productivity value vs. hype, the $1 trillion+ Capex investment creating a 5–6 year infrastructure runway, and whether 2025 represented genuine capability progress or incremental polish.

Willison's annual retrospective is one of the most reliable practitioner-oriented surveys of the year. The Hacker News discussion layer adds critical signal about which advances builders actually value — as opposed to which advances labs are marketing.

Benchmarks are saturating — and the field is being forced to invent new evaluation

arXiv community discussion in early January 2026 surfaced an ongoing structural problem: benchmark saturation, where frontier models are approaching ceiling performance on MMLU, HumanEval, and GSM8K, forcing the field toward harder and more realistic evaluation protocols. New directions include long-context recall, multi-step agentic task completion, and adversarial robustness under distribution shift. Model collapse risk — studied via linguistic similarity trajectories across large-scale corpora from 2013 to 2025 — is also flagged as a structural concern for the next generation of training runs.

Benchmark saturation is a direct signal that the community needs new evaluation infrastructure. Practitioners who understand this shift will design more robust internal evals rather than anchoring on stale public leaderboards.
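An internal eval in the spirit of that shift can start very small: tasks paired with programmatic graders, scored as a pass rate. `toy_model` and the tasks below are illustrative stand-ins, not a real benchmark:

```python
# Minimal internal-eval harness sketch: each task carries its own grader,
# and the harness reports a pass rate. `toy_model` stands in for the
# system under test.

def toy_model(prompt):
    """Stand-in model: answers two of the three tasks correctly."""
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")

EVAL_SET = [
    {"prompt": "2+2", "grade": lambda out: out.strip() == "4"},
    {"prompt": "capital of France", "grade": lambda out: "paris" in out.lower()},
    {"prompt": "17*23", "grade": lambda out: out.strip() == "391"},
]

def run_eval(model, tasks):
    results = [task["grade"](model(task["prompt"])) for task in tasks]
    return sum(results) / len(results)

print(run_eval(toy_model, EVAL_SET))   # 2 of 3 tasks pass
```

Private task sets with per-task graders resist the contamination and ceiling effects that saturated public leaderboards suffer from, which is exactly why the harder agentic and long-context protocols discussed above are heading in this direction.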

Editor's Pick: 39 Chapters of LLM Research Synthesis — the Most Significant Piece of W01

Of all 11 stories this week, the free 173,000-word technical book on Zenn by sue738, Globis's CTO, is the one with the most lasting value.

This is not a news article or opinion piece. It is a structured synthesis of two years of LLM research, written by a practitioner at CTO level, covering everything from Mamba and SSM as Transformer architecture alternatives, to DPO alignment, inference-time compute scaling, multimodal unification, multi-agent design, and mechanistic interpretability. To assemble equivalent coverage independently, an engineer would need to read hundreds of papers over several months.

What makes it especially well-timed for early 2026: the book functions both as a capstone on 2024–2025 research and as a map for what to understand before making architectural decisions in the coming year. Any DS/AI engineer currently planning a 2026 toolchain or research agenda should start here. And it costs nothing.


Watch next week for whether MIT Technology Review's forecasts — particularly the growth of Chinese open-source LLMs in US startups and the federal vs. state regulation clash — begin to materialize as specific news. It's also the first full working week of 2026 for most engineering teams, which is typically when actual toolchain decisions get made rather than just planned.

weekly-digest · 2026 · llm · multi-agents · open-source · benchmark

Sources

  1. 2025年衝撃のLLMの最新研究 — Zenn (sue738, Globis CTO)
  2. LLMはもう古い?2026年最新AI革命を3分でわかりやすく解説 — Zenn
  3. 2025年生成AI界隈の振り返り — Zenn
  4. 生成AI・LLM必須リンク集2026 — Qiita
  5. 10 Công cụ Mã nguồn mở để Xây dựng AI Agent Local cấp Production năm 2026 — Viblo
  6. Bản cập nhật 2026! Top 10 repo GitHub mã nguồn mở bắt buộc cho AI Agent — Viblo
  7. vLLM – Giải pháp nhanh, gọn để triển khai mô hình ngôn ngữ lớn (LLM) — Viblo
  8. 2025 Bảng Xếp Hạng LLM Không Giới Hạn Top10 — Viblo
  9. What's next for AI in 2026 — MIT Technology Review
  10. 2025: The Year in LLMs — Hacker News (Simon Willison)
  11. LLM Evaluation in 2025: Benchmark Saturation and What Comes Next — arXiv