AI-defined vehicles are not just coming; they are already here


AI-defined vehicles are reshaping autonomy by fusing physical AI and generative technologies. This new model demands scalable platforms, foundational models and end-to-end simulation that bridges the gap between training and real-world deployment.

AI’s role in vehicle autonomy has evolved from enabling perception to defining a new category of physical AI. This shift is more than semantic. It represents the expansion of AI from a purely digital presence into the physical world, with autonomous vehicles leading the charge. The past year has seen foundational models and generative AI move from research labs into production vehicles, delivering more human-like driving experiences in increasingly complex scenarios.

Xinzhou Wu, Vice President of Automotive at NVIDIA, makes no attempt to understate the change. “In 2024, we started seeing real-world applications of generative AI that are beginning to reshape both the experience of autonomous driving and how we develop these systems,” he says. “Just one year ago, we discussed how generative AI would raise the ceiling for AV capabilities. Now we see multiple products that reflect those advancements entering the market.”

At the heart of this evolution lies a broader trend Wu describes as the era of physical AI. “Everything we are doing now, hardware, software, and safety systems, is contributing to the first chapter of the physical AI era,” he continues. “We sometimes describe this as a move toward ‘autonomous everything’ and robotics. It is an incredibly exciting shift, and our mission at NVIDIA is to accelerate that arrival.”

The significance of this shift cannot be overstated. For industries dependent on mobility, logistics, and just-in-time operations, the arrival of AI-defined vehicles brings a new level of predictability, efficiency and safety. Unlike earlier iterations of automation, which often relied on rigid rules and segmented tasks, AI-defined systems are adaptive. They learn from data and improve over time. That adaptability is critical when transferring intelligence from a lab-controlled training environment to a dynamic, real-world setting.

Three computers, one ecosystem

NVIDIA’s framework for this AI-defined future rests on a three-computer model, integrating cloud, simulation and in-vehicle computing. The first component is DGX, the cloud infrastructure responsible for data processing and model training. This is followed by OVX, the simulation platform that generates and validates scenarios, and finally, the in-car computer, moving from NVIDIA’s current Orin platform to the more powerful Thor.

Wu calls this the foundation for closing the loop between perception, simulation and validation. “While in-car testing will always remain part of the process, we expect most validation to move to the cloud,” he continues. “OVX is also critical to closed-loop training, where simulation and training are tightly integrated. This is not fully realised yet, but it is coming.”
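To make the three-computer loop easier to picture, here is a minimal sketch of one iteration expressed as plain Python. The function and class names (train_on_dgx, validate_on_ovx, deploy_to_vehicle, collect_drive_logs, DriveLog) are illustrative placeholders for the roles described above, not NVIDIA APIs.

```python
# Illustrative sketch of the cloud -> simulation -> vehicle loop described above.
# All names are hypothetical placeholders, not NVIDIA APIs.
from dataclasses import dataclass, field


@dataclass
class DriveLog:
    """Sensor recordings and events collected from the fleet."""
    frames: list = field(default_factory=list)
    events: list = field(default_factory=list)


def train_on_dgx(logs: list) -> dict:
    """Cloud training step: refine the driving model on aggregated fleet data."""
    return {"weights": f"model trained on {sum(len(l.frames) for l in logs)} frames"}


def validate_on_ovx(model: dict) -> bool:
    """Simulation step: replay and generate scenarios, return pass/fail for release."""
    return True  # placeholder for a scenario-coverage and safety-metric check


def deploy_to_vehicle(model: dict) -> None:
    """In-vehicle step: push the validated model to the car computer (Orin/Thor class)."""
    print("deployed:", model["weights"])


def collect_drive_logs() -> list:
    """Fleet feedback step: gather new real-world data that seeds the next iteration."""
    return [DriveLog(frames=[0] * 1000)]


# One turn of the loop: collect -> train -> simulate/validate -> deploy -> repeat.
logs = collect_drive_logs()
model = train_on_dgx(logs)
if validate_on_ovx(model):
    deploy_to_vehicle(model)
```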

The stack that enables this loop is NVIDIA DRIVE. Certified for functional safety and built for mass deployment, the DRIVE platform integrates hardware, operating systems and safety frameworks with NVIDIA HALOS, the company’s latest safety system. The software supports everything from Level 2 driver assistance to Level 3 autonomy.

Generative AI redefines simulation

Simulation has always played a role in AV development, but generative AI has shifted the baseline. End-to-end models no longer interact with object-level abstractions. They operate directly on pixels. This demands simulation that is not only accurate but photorealistic.

“The technology developed by our research team is already in use within our validation pipeline,” Wu explains. “We can now take recorded scenes, such as a vehicle performing lane keeping, and simulate alternative behaviours like a lane change. We generate a realistic video using neural rendering that reflects what the car would have seen and experienced during that manoeuvre.”

This capability is underpinned by two components: LogSim and WorldSim. LogSim replays real-world data, while WorldSim generates entirely synthetic scenarios. Together, they provide the dual lens of reality and creativity to train robust AI. These are now supported by generative tools such as COSMOS, a foundation model for video generation that allows for photorealistic, synthetic data creation using natural language prompts.

“We can even create synthetic scenes using natural language descriptions, then use tools like Sensor RTX and COSMOS to generate hundreds or thousands of scenario variants for our simulation libraries,” Wu explains. “This applies not just to cameras but also to radar and lidar.”
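One way to picture how log replay, fully synthetic generation, and prompt-driven variants can feed a single validation loop is the sketch below. The ScenarioSource, LogSim, and WorldSim classes and their methods are hypothetical illustrations of the split described above, not the actual LogSim, WorldSim, COSMOS or Sensor RTX interfaces.

```python
# Hypothetical sketch of the LogSim/WorldSim split described above.
# Class and method names are illustrative, not NVIDIA's actual interfaces.
from abc import ABC, abstractmethod


class ScenarioSource(ABC):
    """Common interface: every source yields frames the AV stack can be run against."""

    @abstractmethod
    def frames(self):
        ...


class LogSim(ScenarioSource):
    """Replays recorded real-world drives, optionally with perturbed behaviours."""

    def __init__(self, recorded_drive):
        self.recorded_drive = recorded_drive

    def frames(self):
        yield from self.recorded_drive  # replay sensor data exactly as logged


class WorldSim(ScenarioSource):
    """Generates fully synthetic scenarios, e.g. from a natural-language prompt."""

    def __init__(self, prompt: str, variants: int = 1):
        self.prompt, self.variants = prompt, variants

    def frames(self):
        for i in range(self.variants):
            yield f"synthetic frame for '{self.prompt}' (variant {i})"


# Both sources plug into the same validation loop.
sources = [LogSim(["frame_0", "frame_1"]), WorldSim("double-parked van at night", 3)]
for source in sources:
    for frame in source.frames():
        pass  # run the driving stack against each frame and score the result
```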

The data flywheel, NVIDIA’s term for the interplay between cloud training, simulation and vehicle feedback, benefits directly. Real-world scenarios can be recreated, altered and replayed in simulation with pixel-level accuracy. The result is a continuous feedback loop where every mile driven, simulated or imagined contributes to the refinement of the model.

As simulation becomes increasingly central to training and validation, its scope expands. Use cases now include the simulation of edge cases and rare scenarios which are difficult to capture in real-world testing. Double-parked vehicles, unusual pedestrian behaviour, or non-standard signage can now be modelled and inserted into training environments. This helps ensure a more comprehensive and safety-oriented learning process, accelerating time to deployment without compromising reliability.

Aomayo and the architecture of autonomy

At the centre of this evolving architecture is Aomayo, an end-to-end AV model named to evoke the complexity of the task ahead. “The model uses an LLM-like backbone with an autoregressive structure,” Wu says. “Input tokens come from vision tokenisers and optionally from a vectorised world representation. The output includes both image predictions and control trajectories.”
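As a rough illustration of that description, the toy model below pairs an autoregressive transformer backbone with two output heads, one for image-token prediction and one for a control trajectory. The layer sizes, token vocabulary, and class names are assumptions for illustration only and do not reflect the actual architecture.

```python
# Toy sketch of an end-to-end AV model with an LLM-like autoregressive backbone.
# Shapes, sizes, and names are illustrative assumptions, not the real architecture.
import torch
import torch.nn as nn


class EndToEndDrivingModel(nn.Module):
    def __init__(self, vocab=4096, dim=256, layers=4, heads=8, horizon=10):
        super().__init__()
        self.horizon = horizon
        # Assume an upstream vision tokeniser has already produced discrete token ids.
        self.token_embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.image_head = nn.Linear(dim, vocab)              # next image-token logits
        self.trajectory_head = nn.Linear(dim, horizon * 2)   # (x, y) waypoints

    def forward(self, vision_tokens):
        # vision_tokens: (batch, seq_len) integer ids from the vision tokeniser
        seq_len = vision_tokens.size(1)
        # Additive causal mask so each position only attends to earlier tokens.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        h = self.backbone(self.token_embed(vision_tokens), mask=causal_mask)
        image_logits = self.image_head(h)            # per-position token predictions
        trajectory = self.trajectory_head(h[:, -1])  # plan from the final state
        return image_logits, trajectory.view(-1, self.horizon, 2)


# Example: a batch of two token sequences of length 64.
tokens = torch.randint(0, 4096, (2, 64))
logits, waypoints = EndToEndDrivingModel()(tokens)
print(logits.shape, waypoints.shape)  # (2, 64, 4096) and (2, 10, 2)
```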

Training follows a data pyramid: internet-scale data is distilled into smaller components, followed by driving data from partners, which is then refined with high-quality sensor and driver behaviour inputs. This is concluded with supervised fine-tuning, reinforcement learning, and quantisation. The final model is robust and scalable, enabling high levels of autonomy with fewer modular components.
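Written out as a pipeline, the pyramid might look like the outline below. The stage ordering follows the article; the data descriptions and the run_pipeline helper are placeholders rather than an actual training recipe.

```python
# Illustrative outline of the data-pyramid training recipe described above.
# Stage names follow the article; data descriptions are placeholders.
TRAINING_STAGES = [
    {"stage": "pretraining", "data": "internet-scale data, distilled into smaller components"},
    {"stage": "domain training", "data": "driving data from partners"},
    {"stage": "refinement", "data": "high-quality sensor and driver-behaviour inputs"},
    {"stage": "supervised fine-tuning", "data": "curated demonstrations"},
    {"stage": "reinforcement learning", "data": "closed-loop feedback"},
    {"stage": "quantisation", "data": "calibration set for in-vehicle deployment"},
]


def run_pipeline(model, stages=TRAINING_STAGES):
    """Apply each stage in order; each step narrows the data while raising its quality."""
    for s in stages:
        print(f"{s['stage']}: {s['data']}")  # placeholder for the actual training step
    return model


run_pipeline(model="base model")
```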

This fused architecture is designed to operate flexibly across platforms. A streamlined version will soon launch on Orin, enabling L2++ capabilities. A more powerful model, optimised for Thor, will follow. “With dual Thor ECUs, we will be able to take urban autonomy to the next level,” Wu adds.

Importantly, the dual-stack approach allows for redundancy. The classical stack remains on a satellite ECU, acting as a guardrail for the fused model. This configuration ensures both safety and performance as systems transition toward full autonomy.
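A minimal sketch of that guardrail arrangement, assuming a simple clearance-based arbitration rule, is shown below. The trajectory fields, thresholds, and function names are invented for illustration and are not how the production system arbitrates between stacks.

```python
# Hypothetical sketch of a dual-stack guardrail: a classical fallback planner
# overrides the learned planner when its output violates a safety envelope.
from dataclasses import dataclass


@dataclass
class Trajectory:
    waypoints: list          # planned (x, y) points
    min_clearance_m: float   # closest predicted distance to any obstacle


def fused_model_plan(scene) -> Trajectory:
    """End-to-end (learned) stack: proposes the primary trajectory."""
    return Trajectory(waypoints=[(0.0, 0.0), (1.0, 0.2)], min_clearance_m=1.8)


def classical_stack_plan(scene) -> Trajectory:
    """Rule-based stack on the satellite ECU: conservative fallback trajectory."""
    return Trajectory(waypoints=[(0.0, 0.0), (0.5, 0.0)], min_clearance_m=3.0)


def arbitrate(scene, safety_margin_m: float = 1.0) -> Trajectory:
    """Use the learned plan unless it violates the guardrail's safety envelope."""
    primary = fused_model_plan(scene)
    if primary.min_clearance_m >= safety_margin_m:
        return primary
    return classical_stack_plan(scene)  # guardrail takes over


print(arbitrate(scene=None))
```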

A global vision becomes a global market

Much of this technology is being rolled out at pace, with NVIDIA DRIVE software now shipping in production vehicles across Europe and the United States. But Wu points to China as the most dynamic AV market. “We have seen the penetration rate of L2+ features double in 2024 across several major OEMs,” he says. “These are not luxury models; these are mid- to low-priced vehicles, some costing as little as $10,000 to $15,000.”

The outlook is clear. With production cycles already committed, penetration of L2+ could exceed 50 per cent by the decade’s end. In practical terms, this means AI-defined vehicles will become the default experience for consumers, not a high-end exception.

This shift demands an infrastructure that can keep pace. Thor, the next-generation in-vehicle SoC based on the Blackwell architecture, has been purpose-built for generative AI. It supports the native deployment of large language models and complex end-to-end AV stacks. Several OEMs will begin production with Thor in late 2025.

For Wu, this marks a turning point. “This year is pivotal,” he concludes. “After ten years of development, we are finally selling our software. The L2 Orin SOP based on the classical stack will launch in a matter of months. Later this year, we will deploy the first version of the slim Aomayo model, supporting L2++ across highway and urban scenarios. By late 2026 or early 2027, we aim to launch L3 capabilities on highways, with urban L3 following soon after.”

The road ahead is defined not only by capability but also by architecture. A seamless, scalable fusion of cloud and car is required. NVIDIA’s focus on full-stack development, foundational model training, and closed-loop simulation provides one route to that future. It is not a speculative vision. It is being shipped, integrated, and driven today.
