Nvidia pushes AI boundaries with Rubin CPX for massive context models


Nvidia has unveiled a new class of processor designed to handle the growing computational demands of artificial intelligence models that work with enormous amounts of data and context. The Rubin CPX GPU, announced at the AI Infra Summit, is purpose-built for so-called massive-context inference, where AI systems process millions of tokens simultaneously to generate code or high-resolution video.

The company says Rubin CPX will enable applications such as million-token software coding and generative video, areas that are stretching the capabilities of current hardware. By integrating long-context processing and dedicated video encoders into a single chip, the GPU is intended to support tasks like hour-long video generation or complex software development in real time.

Designed for long-context AI

Rubin CPX will sit at the heart of the new Vera Rubin NVL144 CPX platform, which combines Nvidia’s Vera CPUs and Rubin GPUs in a rack that delivers eight exaflops of AI compute and 100 terabytes of fast memory. According to Nvidia, the platform offers 7.5 times the AI performance of its GB300 NVL72 systems, with attention mechanisms that run three times faster, a key factor in letting large models work across extended context windows without losing speed.

The processor delivers up to 30 petaflops of compute using the company’s NVFP4 precision and features 128GB of GDDR7 memory. It can be paired with Nvidia’s Quantum X800 InfiniBand fabric or Spectrum-X Ethernet networking for scale-out deployments. Nvidia describes Rubin CPX as a cost-efficient monolithic design focused on high performance and energy efficiency for AI inference.

Long-context capability is particularly relevant for generative AI. Models that analyse or create lengthy video can require up to one million tokens for an hour of content, well beyond the limits of most GPUs. In software engineering, being able to consider an entire codebase and years of interaction history at once could transform AI coding assistants from basic generators to sophisticated optimisation tools.
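The million-token figure for an hour of video can be sanity-checked with simple arithmetic. The sketch below assumes illustrative values for frame rate and visual tokens per frame; neither is a published Nvidia figure, only plausible inputs showing how the scale arises.

```python
# Back-of-envelope estimate of the context-token budget for long video,
# illustrating why hour-long content reaches the million-token scale.
# FPS and TOKENS_PER_FRAME are assumed, illustrative values.

FPS = 24                # assumed frame rate
TOKENS_PER_FRAME = 12   # assumed visual tokens per frame after compression

def video_token_budget(duration_seconds: int) -> int:
    """Estimate total context tokens for a clip of the given length."""
    frames = duration_seconds * FPS
    return frames * TOKENS_PER_FRAME

one_hour = video_token_budget(60 * 60)
print(f"{one_hour:,} tokens for one hour of video")  # ≈ 1,036,800
```

Even with conservative assumptions, a single hour of footage lands around a million tokens, which is exactly the regime Rubin CPX is pitched at.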

Early interest from AI developers

Several AI companies are already exploring how Rubin CPX could accelerate their work. Cursor, an AI-powered code editing platform, expects the new GPU to improve the speed and depth of intelligent code generation. Generative video firm Runway plans to use the technology to enable more complex, long-form creative workflows, while Magic, a developer of foundation models for AI agents, sees the potential to handle codebases and documentation running to hundreds of millions of tokens.

Nvidia chief executive Jensen Huang said Rubin CPX introduces a new category of processor dedicated to massive-context AI, comparing its potential impact to the company’s earlier RTX breakthrough in graphics and physical AI.

Preparing for large-scale deployment

Rubin CPX will be supported by Nvidia’s full AI software stack, including the Dynamo platform for scaling inference workloads and the enterprise-ready Nemotron family of multimodal models. The company expects the processor to be available by the end of 2026.

The launch reflects a broader shift in the AI industry toward models that need far larger context windows and more powerful hardware to match. As generative AI applications expand into long-form video and complex code reasoning, advances like Rubin CPX illustrate the scale of infrastructure required to keep pace with the next generation of intelligent systems.
