NVIDIA used its GTC conference to signal a decisive shift in how artificial intelligence is built and deployed, moving beyond individual chips and models toward fully integrated systems designed to generate and manage intelligence at scale. The company’s latest announcements point to a future in which AI is no longer defined by standalone hardware or software, but by coordinated “factories” that produce, refine and serve intelligence continuously.
At the centre of this shift is the introduction of the NVIDIA Vera Rubin platform, a tightly integrated system combining multiple chips and networking technologies into a single architecture designed to support every stage of AI development, from training through to real-time inference. The platform brings together GPUs, CPUs, networking and storage into a unified system, reflecting a broader industry transition away from discrete components toward fully orchestrated infrastructure.
This is reinforced by the launch of the Vera CPU, which has been designed specifically for agentic AI workloads, in which systems must plan, execute and refine tasks autonomously. Unlike traditional processors, which play a supporting role to the accelerators running the models, the Vera CPU is positioned as an active participant in these workflows, enabling faster response times and more efficient scaling of complex workloads.
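To illustrate what "agentic" means in practice, the loop below sketches the plan, execute and refine cycle such systems run continuously. The model_call and run_tool functions are placeholders standing in for model inference and tool execution; they are not part of any NVIDIA interface.

```python
# Sketch of the plan-execute-refine loop that characterises agentic workloads.
# model_call and run_tool are hypothetical placeholders, not NVIDIA APIs.
def model_call(prompt: str) -> str:
    return f"step for: {prompt}"          # stands in for an inference request

def run_tool(step: str) -> str:
    return f"result of ({step})"          # stands in for tool or code execution

def agent(goal: str, max_iters: int = 3) -> list[str]:
    history = []
    for _ in range(max_iters):
        # Plan the next step, taking the latest result into account.
        plan = model_call(goal if not history else f"{goal}; so far: {history[-1]}")
        result = run_tool(plan)           # execute the planned step
        history.append(result)            # feed the outcome into the next iteration
    return history

print(agent("summarise last quarter's GPU utilisation"))
```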
Infrastructure becomes the system
Alongside new hardware, NVIDIA introduced a series of software and architectural layers that further blur the distinction between infrastructure and application. Dynamo, an open source inference platform, is designed to act as an operating system for AI factories, orchestrating how workloads are distributed across GPUs and memory in real time.
This reflects a growing challenge in the deployment of large-scale AI systems, where managing inference workloads has become as demanding as training the models themselves. By coordinating resources dynamically, Dynamo aims to reduce inefficiencies and improve performance, particularly for agentic systems that generate unpredictable and bursty workloads.
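At its simplest, the orchestration problem looks something like the sketch below, in which each incoming request is routed to whichever GPU worker has spare capacity and enough memory for the request's context. The class names and figures are illustrative only and do not reflect Dynamo's actual API.

```python
# Conceptual sketch of dynamic inference scheduling across GPU workers.
# GpuWorker and Scheduler are hypothetical names, not Dynamo interfaces.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class GpuWorker:
    load: float                              # current utilisation, 0.0-1.0 (ordering key)
    gpu_id: int = field(compare=False)
    free_kv_mb: int = field(compare=False)   # memory available for the request's context

class Scheduler:
    """Routes each request to the least-loaded worker with enough memory headroom."""
    def __init__(self, workers):
        self.workers = workers
        heapq.heapify(self.workers)

    def dispatch(self, est_load: float, kv_mb_needed: int) -> int:
        skipped = []
        while self.workers:
            w = heapq.heappop(self.workers)          # least-loaded worker first
            if w.free_kv_mb >= kv_mb_needed:
                w.load += est_load
                w.free_kv_mb -= kv_mb_needed
                heapq.heappush(self.workers, w)
                for s in skipped:
                    heapq.heappush(self.workers, s)
                return w.gpu_id
            skipped.append(w)                        # not enough memory; try the next one
        for s in skipped:
            heapq.heappush(self.workers, s)
        raise RuntimeError("no worker can hold this request's context")

workers = [GpuWorker(load=0.1, gpu_id=i, free_kv_mb=4096) for i in range(4)]
sched = Scheduler(workers)
print(sched.dispatch(est_load=0.2, kv_mb_needed=512))  # prints the id of the chosen GPU
```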
Storage is also being redefined. The BlueField-4 STX architecture introduces a new approach to data handling, designed to support long-context reasoning by keeping data accessible and responsive at scale. Traditional storage systems, built for capacity rather than responsiveness, are increasingly seen as a bottleneck for AI systems that must maintain continuity across extended interactions.
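One way to picture the requirement is a tiered store that keeps recently used context in fast memory and spills the rest to bulk storage, promoting it back on access. The sketch below shows a generic least-recently-used tiering policy of that kind; it is not a description of BlueField-4's actual data path.

```python
# Illustrative sketch of tiered context storage: hot entries stay in fast
# memory, cold entries spill to a slower bulk tier. Generic LRU policy only.
from collections import OrderedDict

class TieredContextStore:
    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity      # number of entries kept in the fast tier
        self.hot = OrderedDict()              # session_id -> context blob (fast tier)
        self.cold = {}                        # session_id -> context blob (bulk tier)

    def put(self, session_id: str, blob: bytes) -> None:
        self.hot[session_id] = blob
        self.hot.move_to_end(session_id)
        while len(self.hot) > self.hot_capacity:
            evicted_id, evicted_blob = self.hot.popitem(last=False)  # evict least recently used
            self.cold[evicted_id] = evicted_blob

    def get(self, session_id: str) -> bytes:
        if session_id in self.hot:
            self.hot.move_to_end(session_id)
            return self.hot[session_id]
        blob = self.cold.pop(session_id)      # promote back to the fast tier on access
        self.put(session_id, blob)
        return blob

store = TieredContextStore(hot_capacity=2)
for sid in ("a", "b", "c"):
    store.put(sid, f"context-{sid}".encode())
print(store.get("a"))  # "a" had spilled to the cold tier; this access promotes it
```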
Designing intelligence at scale
Perhaps the most significant shift lies in how these systems are designed and deployed. NVIDIA’s Vera Rubin DSX reference architecture and Omniverse-based digital twin blueprint provide a framework for building AI factories as complete, simulated environments before they are physically constructed.
This approach treats AI infrastructure not as a collection of servers, but as an integrated system that must balance compute, power, cooling and networking in real time. By modelling these interactions in advance, organisations can optimise performance and efficiency before committing to large-scale deployments.
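A digital twin can answer some of those questions with simple checks long before hardware is ordered. The sketch below illustrates the idea at its most basic, testing a hypothetical rack layout against power and cooling budgets; the figures are invented for illustration and are not drawn from NVIDIA's DSX or Omniverse blueprints.

```python
# Minimal sketch of a pre-deployment feasibility check of the kind a digital
# twin enables. All figures and rack specifications are illustrative only.
from dataclasses import dataclass

@dataclass
class Rack:
    name: str
    compute_kw: float      # electrical draw of GPUs, CPUs and networking
    cooling_kw: float      # power consumed by cooling for this rack

def validate_layout(racks: list[Rack], site_power_kw: float, site_cooling_kw: float) -> bool:
    total_compute = sum(r.compute_kw for r in racks)
    total_cooling = sum(r.cooling_kw for r in racks)
    ok = (total_compute + total_cooling <= site_power_kw) and (total_cooling <= site_cooling_kw)
    print(f"compute {total_compute:.0f} kW, cooling {total_cooling:.0f} kW, within budget: {ok}")
    return ok

# Hypothetical layout: eight GPU racks plus one networking rack.
layout = [Rack(f"gpu-{i}", compute_kw=120, cooling_kw=35) for i in range(8)]
layout.append(Rack("network", compute_kw=20, cooling_kw=5))
validate_layout(layout, site_power_kw=1500, site_cooling_kw=400)
```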
The implications extend beyond technical architecture. As AI systems become more autonomous and more widely deployed, the infrastructure supporting them is emerging as a defining factor in performance, cost and scalability. The emphasis on token generation, energy efficiency and system-level coordination suggests that competitive advantage will increasingly depend on how effectively organisations can design and operate these AI factories.
Taken together, the announcements at GTC indicate that the centre of gravity in AI is shifting. The focus is no longer solely on building better models, but on constructing the environments in which those models operate. In that context, the factory, rather than the computer, is becoming the fundamental unit of the AI era.