Artificial intelligence has crossed a threshold where usage, not training, defines its trajectory. The shift to reasoning and agentic systems is driving a surge in computation that is fundamentally changing how AI is built, deployed, and monetised.
For more than a decade, the trajectory of artificial intelligence has been defined by training. The industry focused on building larger models, ingesting more data, and pushing the limits of compute to create systems that could understand, generate, and predict with increasing accuracy. That phase created the foundation for everything that followed, but it is no longer where the real pressure sits. What Jensen Huang, NVIDIA CEO, made clear in his keynote at GTC in San Jose is that the centre of gravity has shifted, and with it, the entire economic structure of AI.
“The amount of computation in the last two years has increased by roughly 10,000 times,” Huang says. “The amount of usage has probably gone up by 100 times. When I combine these two, I believe that computing demand has increased by one million times in the last two years. It is the feeling that every startup has, every AI lab has, that if they could just get more capacity, they could generate more tokens, their revenues would go up, more people could use it, and the AI could become smarter.”
This is not a continuation of the previous cycle. It is a structural break. The scale of demand is no longer tied to how often models are trained, but to how often they are used, and that distinction changes everything. Training is episodic, bounded, and predictable. Inference is continuous, compounding, and increasingly unpredictable as systems become more capable and more deeply embedded in workflows.
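Huang's "one million times" figure follows from simple multiplication of the two growth factors he cites: per-task computation and usage compound, rather than add. A minimal sanity check, using only the rounded numbers from the quote:

```python
# Rough growth factors over the last two years, as cited in the keynote
compute_per_task_growth = 10_000   # computation per task, ~10,000x
usage_growth = 100                 # usage, ~100x

# Demand compounds multiplicatively: each task costs more AND there are more tasks
total_demand_growth = compute_per_task_growth * usage_growth
print(total_demand_growth)  # 1000000 -- the "one million times" figure
```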
From response to action
The driver behind this shift is not simply adoption, but a fundamental change in what AI can do. The progression Huang describes is not incremental improvement, but a series of inflection points that collectively redefine the role of AI within organisations.
“An AI that was able to perceive became an AI that could generate,” he continues. “An AI that could generate became an AI that could reason. An AI that could reason now became an AI that can do work, very productive work. The amount of computation in the last two years has gone off the charts. This is the inflection point of inference.”
The move from generating outputs to performing tasks is where the economics begin to change. Once AI starts to act, rather than simply respond, it becomes part of the operational fabric of a business. It is no longer invoked occasionally; it is engaged continuously, embedded in processes that require constant interaction, iteration, and decision making.
“AI now has to think. To think, it has to inference. AI now has to do,” Huang explains. “In order to do, it has to inference. AI has to read. In order to read, it has to inference. Every part of AI, every time it has to think, it has to reason, it has to do, it has to generate tokens, it has to inference.”
The repetition here is not rhetorical; it is structural. Every meaningful action performed by AI requires inference, and as those actions become more complex and more frequent, the volume of inference expands exponentially. This is not simply more usage of the same systems, but a multiplication of demand driven by the nature of the tasks being performed.
Reasoning breaks the model
The second force amplifying this explosion is the evolution of the models themselves. The transition from generative systems to reasoning systems introduces a new level of computational intensity, one that extends far beyond the simple generation of outputs.
“Reasoning allowed it to reflect, allows it to think to itself, allows it to plan, break down problems, decompose a problem it could not understand into parts that it could understand,” Huang adds. “It could ground itself on research. The amount of input tokens and the amount of output tokens increased the amount of computation tremendously.”
Reasoning is not a single step. It is a process, iterative, layered, and often recursive, where the system generates intermediate steps, evaluates them, and refines its output. Each of those stages consumes additional compute, and when multiplied across millions of interactions, the effect is profound. The system is no longer producing a single answer; it is executing a chain of thought, and that chain requires orders of magnitude more computation.
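The cost difference between single-shot generation and a reasoning chain can be sketched as the sum of intermediate steps rather than one final output. The step counts and token figures below are invented purely to illustrate scale; they do not describe any specific model:

```python
def generation_cost(answer_tokens: int) -> int:
    """Single-shot generation: compute is paid only for the final answer."""
    return answer_tokens

def reasoning_cost(answer_tokens: int, steps: int, tokens_per_step: int) -> int:
    """Chain of thought: every intermediate step (plan, decompose,
    evaluate, refine) generates tokens before the final answer."""
    return steps * tokens_per_step + answer_tokens

# Illustrative numbers only: a 200-token answer, with and without
# a 20-step reasoning chain of ~500 tokens per step.
single = generation_cost(200)                                  # 200 tokens
chained = reasoning_cost(200, steps=20, tokens_per_step=500)   # 10,200 tokens
print(chained / single)  # 51.0 -- the same answer, ~51x the computation
```

The multiplier is arbitrary, but the shape of the equation is the point: reasoning adds a term that scales with the depth of the chain, not with the length of the answer.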
The emergence of agentic systems compounds this further by extending the scope of what AI is responsible for. “For the first time, you do not ask AI what, where, when, how. You ask it to create, to do, to build,” Huang explains. “It is able to use tools, take your context, read files, break down a problem, reason about it, reflect on it, and actually perform tasks. It can solve problems and actually do work.”
This is the moment where AI transitions from a tool to an actor. Agents do not simply generate responses; they execute workflows, interact with systems, and iterate towards outcomes. Each step in that process triggers further inference, creating a cascade of computation that extends far beyond a single interaction. The result is not just more demand, but a fundamentally different type of demand, one that is dynamic, expanding, and difficult to predict.
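The cascade can be made concrete with a hypothetical agent loop: each workflow step (reading a file, calling a tool, acting on a result) is itself an inference call, and each reflection pass adds more. The function and all counts below are invented for illustration:

```python
def agent_inference_calls(workflow_steps: int, reflections_per_step: int) -> int:
    """Count inference calls for a hypothetical agent workflow where
    every action is an inference call, plus reasoning about its result."""
    calls = 0
    for _ in range(workflow_steps):
        calls += 1                      # the action itself
        calls += reflections_per_step   # reflecting on the result
    return calls

# A single user request that becomes a 12-step workflow with
# 3 reflection passes per step triggers dozens of inference calls,
# where a simple chat interaction would have triggered one.
print(agent_inference_calls(12, 3))  # 48
```

This is why agentic demand is hard to forecast: the number of inference calls is decided by the task's structure at run time, not by the number of users.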
The collapse of predictable demand
The industry is still adjusting to this shift, and many organisations remain anchored in the previous phase of AI, where the primary challenge was building and deploying models. That mindset assumes that intelligence is something created once and then applied, a capability that can be rolled out across use cases and gradually scaled over time. What Huang is describing makes that approach increasingly inadequate, because the real challenge is no longer the creation of intelligence, but the continuous, large-scale production of it.
In this new environment, inference is not a supporting function within the lifecycle of AI; it is the dominant force shaping its economics and its infrastructure. The systems that matter are not those that achieve marginal gains in model performance, but those that can sustain vast volumes of computation, repeatedly, efficiently, and under real-world constraints. Power, cost, and performance are no longer independent considerations; they are tightly coupled variables that determine how much intelligence can be generated and how economically it can be delivered.
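The coupling of power, cost, and performance can be sketched with a back-of-the-envelope formula: the energy cost of a generated token depends jointly on throughput and power draw. All figures below are invented for illustration and are not real hardware or pricing data:

```python
def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   price_per_kwh: float) -> float:
    """Energy cost to generate one million tokens, given sustained
    throughput, power draw, and electricity price (illustrative only)."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000   # watt-seconds -> kilowatt-hours
    return kwh * price_per_kwh

# Doubling throughput at the same power halves the energy cost per token,
# which is why the three variables cannot be optimised in isolation.
base = energy_cost_per_million_tokens(1_000, 10_000, 0.10)
faster = energy_cost_per_million_tokens(2_000, 10_000, 0.10)
print(round(base / faster, 2))  # 2.0
```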
The consequence is a shift in what defines competitive advantage. Organisations are no longer differentiated solely by the sophistication of their models, but by their ability to operate those models at scale, to manage the explosion in demand, and to convert computation into usable output in a way that is both reliable and cost effective. This requires a level of system design and operational discipline that extends far beyond traditional software deployment, aligning infrastructure, architecture, and economics into a single, coherent strategy.
What emerges from Huang’s argument is not simply a faster or more capable version of AI, but a fundamentally different system, one in which intelligence is continuously generated, consumed, and reinvested into further computation. The transition from training to inference marks the point where artificial intelligence moves from being a technological capability to becoming an operational necessity, embedded within the core of how organisations function.
In that context, the defining question is no longer how intelligent a system can become in isolation, but how much intelligence it can sustain over time, how efficiently that intelligence can be produced, and how effectively it can be translated into real-world outcomes. The answer to that question will determine not just the performance of individual systems, but the structure of the industry itself, as artificial intelligence moves fully into its next phase as a continuous, industrialised process.