
NVIDIA Launches Its Computing Platform for the Age of Agentic AI
“Finally, AI is able to do productive work. And therefore, the inflection point of inference has arrived.”
Yesterday, Jensen Huang made a forceful argument that the evolution of generative AI has reached a new stage. After years focused on model training and advancement, the time has come for AI to contribute through real, measurable work.
“AI now has to think. In order to think, it has to inference. AI now has to do. In order to do, it has to inference. AI has to read. In order to do so, it has to inference. It has to reason. It has to inference.”
To support the age of inference, NVIDIA used its keynote to present its usual dizzying array of announcements and demonstrations (including, this year, a robot Olaf from Disney’s Frozen). For us, two new launches stood out among the set: the Vera CPU, the world’s first processor purpose-built for agentic AI workloads, and the BlueField-4 STX storage architecture, a modular reference design built to keep context memory fast, accessible, and efficient at rack scale. Together, they form the foundational compute and storage layers of the Vera Rubin platform.
The CPU Is No Longer Just Supporting the Model
The Vera CPU features 88 custom NVIDIA-designed Olympus cores and a second-generation low-power memory subsystem built on LPDDR5X, delivering up to 1.2 TB/s of bandwidth — twice the bandwidth at half the power of general-purpose CPUs. The headline performance claim: twice the efficiency of, and 50% faster than, traditional rack-scale CPUs.
But the more strategic claim is about what the CPU is now being asked to do. As agentic AI advances, it is coordinating tool calls, orchestrating multi-step reasoning workflows, and running the inference scaffolding that keeps the whole system coherent. As Huang put it: “The CPU is no longer simply supporting the model; it’s driving it.”
NVIDIA also announced a new Vera CPU rack integrating 256 liquid-cooled Vera CPUs capable of sustaining more than 22,500 concurrent CPU environments, each running independently at full performance. For AI factories running tens of thousands of simultaneous agent instances, that density is a significant operational advantage. Vera is in full production and will be available from partners in the second half of 2026.
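As a back-of-the-envelope check (our arithmetic, not anything NVIDIA stated), the ">22,500 concurrent CPU environments" figure lines up with one environment per Olympus core across the rack:

```python
# Our own inference: 256 CPUs x 88 cores each, one environment per core.
cpus_per_rack = 256   # liquid-cooled Vera CPUs in the announced rack
cores_per_cpu = 88    # custom Olympus cores per Vera CPU
environments = cpus_per_rack * cores_per_cpu
print(environments)   # 22528 — "more than 22,500"
```

If that reading is right, "full performance" per environment amounts to a dedicated core per agent instance rather than oversubscription.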
Why the Storage Stack Had to Change Too
The Vera CPU story cannot be told without the BlueField-4 STX announcement, because the two are designed to address the same challenge: keeping the infrastructure ahead of agentic AI as it moves from reasoning into action.
General-purpose storage infrastructure was never designed with agentic AI in mind, and the performance gap becomes apparent the moment an agent has to retrieve context across long, multi-step reasoning chains. As context grows, traditional storage and data paths can slow AI inference and reduce GPU utilization. In production agentic systems, every retrieval from a slow storage tier adds latency to a reasoning chain that may already span hundreds of steps.
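To make that compounding effect concrete, here is a toy sketch using our own illustrative numbers (not figures from NVIDIA) of how per-retrieval storage latency stacks up over a long reasoning chain:

```python
def chain_storage_overhead(steps, retrievals_per_step, latency_ms):
    """Total time (ms) an agent spends waiting on context retrievals,
    assuming a fixed per-retrieval storage latency. Illustrative only."""
    return steps * retrievals_per_step * latency_ms

# A 300-step agentic chain, one context fetch per step:
slow_tier = chain_storage_overhead(300, 1, 10.0)  # 10 ms per fetch
fast_tier = chain_storage_overhead(300, 1, 0.5)   # 0.5 ms per fetch
print(slow_tier, fast_tier)  # 3000.0 vs 150.0 ms of waiting
```

Even modest per-fetch latency turns into whole seconds of GPU idle time per chain at scale — the gap a dedicated context-memory tier like CMX is pitched to close.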
STX addresses this directly, with NVIDIA saying it can achieve up to five times more tokens per second compared with traditional storage, four times higher energy efficiency, and two times faster data ingestion. The architecture centers on the new NVIDIA CMX context memory storage platform, a high-performance layer that expands GPU memory for scalable inference. Under the hood, BlueField-4 STX leverages a new BlueField-4 processor that combines the NVIDIA Vera CPU with the ConnectX-9 SuperNIC, along with Spectrum-X Ethernet networking, NVIDIA DOCA, and NVIDIA AI Enterprise software.
The TechArena Take
NVIDIA arrived at GTC 2026 with something more consequential than a chip announcement. It arrived with an argument: that the infrastructure built for large-scale model training is not the infrastructure needed to run AI that reasons, plans, and acts continuously.
The Vera CPU is designed to close the gap between CPU architecture and what agentic orchestration actually demands. STX addresses the storage latency problem that has been quietly degrading inference performance in production deployments. The announced partner list is wide, the timelines are near-term, and the roadmap extends further still, with the next Feynman architecture already scoped to advance every pillar of the AI factory: compute, memory, storage, networking, and security.
When Huang says the inflection point of inference has arrived, Vera and STX are the infrastructure answer to that claim. AI that thinks, reads, reasons, and acts continuously demands an infrastructure stack built specifically for that workload. The announcements this week show NVIDIA is already building for that demand.