Discussion 2026-05-28

AI Agents: The Paradigm Shift from Conversation to Action

AI Agent · Large Language Models · Vibe Coding · Medical AI · AI for AI

In 2026, the AI industry is experiencing a paradigm shift from “conversational AI” to “action-oriented AI.” As agents begin to watch screens, click mice, write code, and conduct research, the competitive focus of AI has shifted from “who is smarter” to “who can get things done.”

Policy Foundation: National Guidelines for AI Agents

In May, China’s Cyberspace Administration, National Development and Reform Commission, and Ministry of Industry and Information Technology jointly issued the “Implementation Opinions on the Standardized Application and Innovative Development of AI Agents.” This marks the first national-level policy to formally define AI agents as “intelligent systems with autonomous perception, memory, decision-making, interaction, and execution capabilities.” The document outlines 19 typical application scenarios across scientific research, industrial development, consumer stimulation, public welfare, and social governance.

AI Pulse View: This is the world’s first national-level regulatory framework specifically targeting AI agents, signaling their transition from “technical experiments” to “production tools.” Government-level clarity provides certainty for industrial innovation while establishing safety boundaries for agent deployment.

Medical AI: A 7B Model Surpassing GPT-5

The LeapQuest team from Shanghai Academy of Innovation, together with Zhejiang University, Shanghai Jiao Tong University, and Fudan University, published two papers at ICML 2026, introducing the “Think with Images/Think with Videos” paradigm to medical AI for the first time.

Ophiuchus (for medical imaging) and MedScope (for clinical long-form videos) share a breakthrough: instead of passively receiving visual input, the model actively calls visual tools during reasoning—using SAM2 for fine segmentation, zooming into critical regions, and clipping video segments—integrating visual evidence directly into its chain of thought.
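The idea of calling visual tools during reasoning can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the Ophiuchus or MedScope implementation: the tool names `zoom` and `segment`, the `scripted_policy` stub, and the `run_agent` driver are all hypothetical, and a simple threshold mask stands in for a real segmentation model such as SAM2.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins only; none of these names come from the papers.
Image = List[List[int]]          # a tiny grayscale image as nested lists
Action = Tuple[str, object]      # (tool name, tool argument)
Step = Tuple[str, Image]         # (tool name, resulting observation)


def zoom(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Crop the region (top, left, bottom, right); bottom and right are exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def segment(image: Image, threshold: int) -> Image:
    """Placeholder for a segmentation model: a plain threshold mask."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


TOOLS: Dict[str, Callable] = {"zoom": zoom, "segment": segment}


def scripted_policy(trace: List[Step]) -> Optional[Action]:
    """A real model would pick the next tool from what it has observed so far.
    This stub follows a fixed script so the example stays runnable."""
    script: List[Action] = [("zoom", (1, 1, 3, 3)), ("segment", 100)]
    return script[len(trace)] if len(trace) < len(script) else None


def run_agent(image: Image, policy: Callable, max_steps: int = 5) -> List[Step]:
    """Alternate between choosing a tool and recording its output as evidence."""
    current, trace = image, []
    for _ in range(max_steps):
        action = policy(trace)
        if action is None:       # the policy decides it has enough evidence
            break
        name, arg = action
        current = TOOLS[name](current, arg)
        trace.append((name, current))
    return trace


scan = [
    [0, 0, 0, 0],
    [0, 200, 50, 0],
    [0, 120, 90, 0],
    [0, 0, 0, 0],
]
for tool_name, observation in run_agent(scan, scripted_policy):
    print(tool_name, observation)
```

The point of the sketch is the control flow: each tool result is appended to the trace that the policy reads before choosing its next step, so visual evidence accumulates inside the reasoning process rather than being supplied once up front.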

Under identical tool configurations, Ophiuchus-7B achieved an average score of 68.0 across 8 VQA benchmarks, outperforming OpenAI-o3 (62.2), Gemini 2.5 Pro (61.8), and GPT-5 (59.9). Tool invocation accuracy reached 97.9%.
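To make the size of the lead concrete, the short snippet below computes the margin between Ophiuchus-7B and each of the other models from the averages quoted above. The scores are the reported ones; the `margins` helper is just an illustrative name.

```python
# Average scores across 8 VQA benchmarks, as reported in the text.
scores = {
    "Ophiuchus-7B": 68.0,
    "OpenAI-o3": 62.2,
    "Gemini 2.5 Pro": 61.8,
    "GPT-5": 59.9,
}


def margins(scores: dict, reference: str) -> dict:
    """Return the reference model's lead over every other model, in points."""
    ref = scores[reference]
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


gaps = margins(scores, "Ophiuchus-7B")
for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

The leads work out to 5.8, 6.2, and 8.1 points over OpenAI-o3, Gemini 2.5 Pro, and GPT-5 respectively.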

AI Pulse View: This demonstrates that in vertical domains, agent architecture (active tool calling, dynamic evidence gathering) can compensate for parameter gaps. A 7B model with the right architecture can outperform closed-source models with tens or hundreds of billions of parameters on specific tasks. There is a qualitative difference between “knowing how to call tools” and “knowing how to think with tools.”

AI Building AI: ForgeTrain Outperforms NVIDIA Megatron

OpenBMB (面壁智能) released ForgeTrain, the world’s first production-grade LLM pre-training framework written entirely by AI. On identical hardware, ForgeTrain trains 10% faster than NVIDIA’s Megatron, and it also delivers a 10% speedup on Huawei’s Ascend chips.

Meanwhile, MiniCPM5-1B, a model trained with ForgeTrain, has raised the intelligence-density ceiling for small models: it surpassed all models under 2B parameters on the AA-Index benchmark and, with half the parameters, outperformed Qwen3.5-2B, which was released three months earlier.

OpenBMB categorizes “AI building AI” into five levels, from L1 (AI only gives suggestions) to L5 (AI autonomously sets research agendas). ForgeTrain sits at the L3-L4 stage, where AI produces next-generation model infrastructure end to end.

AI Pulse View: When AI begins writing the frameworks that train AI, the “human bottleneck” in LLM R&D is being broken. This is not just an efficiency gain—it is a fundamental shift in the research paradigm: humans move from “writing code in the loop” to “designing objectives outside the loop.” For domestic chip ecosystems, AI-generated adaptation frameworks could be the key path to closing the gap with NVIDIA’s ecosystem.

Vibe Coding: The “Fourth Programming Revolution”

At the 2026 China AIGC Industry Summit, Baidu’s Miaoda (秒哒) product director Zhu Guangxiang shared how Vibe Coding is reshaping the software industry: 87% of Miaoda users have zero coding knowledge, and 16% are one-person companies (OPCs). One Shanghai enterprise replaced a 12-person R&D team with just 4 project managers, cutting delivery cycles from “years” to “months” and securing orders worth over 10 million RMB.

An 8-year-old built a custom children’s operating system with Miaoda. The app “SiLeMe” reached a valuation of 10 million RMB on a development cost of only 1,000 RMB, and apps generated with Lovable account for 10% of all new apps worldwide. Wall Street SaaS stocks lost $280 billion in market value within 48 hours, with Salesforce dropping 40%, a sign that the old model is being disrupted.

AI Pulse View: The essence of Vibe Coding is not “lowering the programming barrier”—it is “turning demanders into suppliers.” When creativity is no longer constrained by technical capability, the supply-demand dynamics of the software market will be redefined. Traditional SaaS vendors’ moats are eroding, and the rise of OPCs heralds the arrival of the “super individual” era.

The Agent Ecosystem: From Model Competition to Execution Competition

Baidu founder Robin Li stated in a Xinhua interview: “AI agents have broken into the mainstream. For the first time, the protagonist of AI is not the model, but the application.” He proposed that Daily Active Agents (DAA) should become the core metric for measuring AI ecosystem prosperity, analogous to Daily Active Users (DAU) in the mobile internet era.
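By analogy with DAU, a DAA figure could be computed as the number of distinct agents that did useful work on a given day. The sketch below is one possible reading, not a definition from the interview: the event format, the `daily_active_agents` function, and the choice to count an agent as "active" only when it completes at least one task are all assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event record: (day, agent_id, task_completed)
Event = Tuple[date, str, bool]


def daily_active_agents(events: Iterable[Event]) -> Dict[date, int]:
    """Count distinct agents with at least one completed task per day."""
    active: Dict[date, Set[str]] = defaultdict(set)
    for day, agent_id, completed in events:
        # Assumption: "active" means finishing a task, not merely being invoked.
        if completed:
            active[day].add(agent_id)
    return {day: len(agents) for day, agents in active.items()}


events = [
    (date(2026, 5, 27), "booking-agent", True),
    (date(2026, 5, 27), "booking-agent", True),   # same agent, counted once
    (date(2026, 5, 27), "research-agent", False), # invoked, but no task finished
    (date(2026, 5, 28), "booking-agent", True),
    (date(2026, 5, 28), "research-agent", True),
]
daa = daily_active_agents(events)
for day, count in sorted(daa.items()):
    print(day.isoformat(), count)
```

Counting completed tasks rather than invocations keeps the metric aligned with the article's emphasis on execution: an agent that is called but delivers nothing would not raise the number.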

Baidu, Alibaba, Tencent, ByteDance, Zhipu AI, and Moonshot AI have all moved aggressively into the agent space. Kimi has built a full technical stack, from a trillion-parameter MoE architecture to an agent orchestration platform. Taobao Flash Purchase has been fully integrated with the Qwen agent, covering over 300 prefecture-level cities.

According to the “AI Agents Empowering Industry Decision-Making: Trends and Practices White Paper (2026),” agent penetration rates in manufacturing, finance, and government sectors have already exceeded 50%.

AI Pulse View: The competitive axis of the AI industry is fundamentally shifting: from “whose model is stronger” to “whose agent can do more things.” This means the core competitiveness of future AI products will no longer be parameter scale or benchmark scores, but tool-calling capability, cross-system execution, and the closed-loop efficiency of transforming user intent into actual delivery.

Conclusion: The Gears Are Already Accelerating

From Ophiuchus’s “seeing while thinking,” to ForgeTrain’s “AI building AI,” to Miaoda’s “everyone is a developer”—the AI industry in May 2026 reveals a clear, unifying theme: AI is moving from “generating answers” to “delivering results.”

When models learn to proactively gather evidence, when AI writes the frameworks that train AI, when 8-year-olds can build operating systems—we are not seeing isolated technological breakthroughs, but projections of the same paradigm shift across different dimensions: AI is becoming an actor, not merely a responder.

AI Pulse View: The significance of this transformation lies not in “what AI can do,” but in “who AI has enabled to do what they couldn’t before.” When the dividends of technology spread from the engineer class to everyone with an idea, the boundaries of innovation will be redefined. The gears are already accelerating.