The AI boom has been framed as a triumph of acceleration, yet the system is beginning to fracture under the weight of its own orchestration. What looks like a compute revolution is, in operational terms, a coordination crisis, and the CPU is back at the centre of it.
There is a point at which a technology stops being defined by what it can do and starts being defined by what it cannot sustain. Artificial intelligence is approaching that point. The industry continues to measure progress in model capability and accelerator performance, yet the system that must carry those gains into real-world operation is beginning to show strain in ways that are harder to quantify and increasingly difficult to ignore.
“One hundred and seventeen billion, what is that number,” Rene Haas, CEO of Arm, asks. “That is the total humans ever to live on Earth. Three hundred and fifty billion. That is the number of Arm-based chips that have been shipped, that is three times the total number of humans who have ever existed on the planet, and around 160 Arm chips for every global household. That just gives you a sense of the scale of what we have done, and it feeds into everything that makes us what we are today.”
Haas frames the argument through scale, not as a retrospective achievement, but as a reminder that the infrastructure underpinning artificial intelligence is not being built from first principles. It already exists, distributed across devices, data centres and networks, and what AI is doing is forcing that system into a mode of operation it was never designed to handle at this intensity. The implication is that the constraints now emerging are not failures of innovation, but consequences of success.
Scale hides the constraint
The origins of Arm reinforce this idea in a way that feels almost counterintuitive given its current positioning at the centre of AI infrastructure. The company was not built to dominate large-scale compute environments. It was built to solve a problem defined by limitation rather than abundance, and that design philosophy continues to shape how it approaches the present.
“The company was born to run off batteries, and the original requirement was that the chip had to be low power and operate within strict thermal limits,” Haas says. “We nailed that objective so completely that when the first development board was powered up and then unplugged, the chip kept running based on leakage current from the other components.”
That anecdote carries more weight in the current context than it did when it was simply a story about engineering ingenuity. Artificial intelligence is pushing infrastructure towards power ceilings that are becoming increasingly difficult to extend, and efficiency is no longer a secondary concern that can be optimised later. It is becoming the condition under which the entire system either scales or stalls. “If anyone thinks that this is something that is going to go away, it is a little bit of an ostrich syndrome, this is here with us, and it has really changed how people think about computing,” Haas adds.
The permanence of AI is not in question. What is in question is whether the architecture that has emerged around it is aligned with how it is being used. The industry narrative has focused heavily on acceleration, creating the impression that performance gains at the model level are the primary determinant of progress, yet that narrative begins to break down when examined through the lens of operational reality. “Somewhere along the way people thought CPUs were dead,” Haas continues. “They thought that the only way you handle AI is through accelerated computing, that the CPU’s role in the AI world is no longer relevant.”
That assumption was convenient because it simplified a complex system into a single axis of improvement. It allowed the conversation to centre on GPUs and specialised hardware while relegating everything else to the background. The problem is that the system itself does not operate in that simplified way, and as AI moves from experimentation into continuous operation, the parts of the architecture that were overlooked are beginning to reassert their importance.
The accelerator narrative breaks down
“The conventional use of the cloud was you type in a query, you do a search, you get a prompt back, and CPUs were doing literally all the work,” Haas says. “When you add AI the cloud is still servicing that request, the accelerator generates the token, and the CPU in that data centre orchestrates and sends the token back.”
This distinction between generation and orchestration is central to understanding the shift that is now underway. Accelerators produce outputs, but they do not manage the system that produces those outputs, and it is that management layer that is becoming increasingly complex as AI workloads evolve. The cloud has not been replaced by AI; it has been reconfigured around it, and that reconfiguration is exposing dependencies that were previously taken for granted.
“What has changed in the last number of months has been this explosion of agents,” Haas explains. “Agents are essentially tools that act on a request and come back with a full flow of answers. It is not just a query, it is actually work, it is running a payroll task, doing scheduling, executing workflows.”
The shift from queries to workflows marks a fundamental change in how compute is consumed. Traditional cloud interactions were episodic, defined by discrete requests and responses that could be processed and completed within a bounded context. Agentic AI introduces continuity, a stream of activity that persists beyond a single interaction and requires ongoing coordination across multiple systems.
“As we move to agents, the number of tokens per human goes up by 15 times if not greater,” Haas says. “Agents can generate requests far faster than humans and they do not sleep, they are active 24/7, so they are constantly pushing these requests into the cloud.”
This is not simply an increase in volume. It is a change in the nature of demand, one that places sustained pressure on the parts of the system responsible for coordination rather than computation alone. The infrastructure must now handle not just more requests, but more complex and interdependent sequences of work, each requiring scheduling, resource allocation and integration with other processes. “Agents are workflows, payroll tasks, scheduler tasks, and asynchronous operations,” Haas says. “That is what CPUs do, that is not work that can be done by an accelerator, the accelerator generates the tokens but the CPUs move all the work around.”
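The division of labour Haas describes can be sketched in miniature. The following is a hypothetical illustration, not Arm's or any vendor's actual code: `generate` stands in for a token-producing call to an accelerator, while everything around it — decomposing the request, multiplexing concurrent agents, routing results onward — is the kind of asynchronous coordination work that lands on CPUs.

```python
import asyncio

async def generate(prompt: str) -> str:
    """Stand-in for token generation on an accelerator (hypothetical)."""
    await asyncio.sleep(0)  # the heavy lifting happens elsewhere
    return f"tokens for {prompt!r}"

async def run_agent(task: str) -> str:
    """One agent workflow: plan, call the accelerator, post-process.

    Everything here except generate() is orchestration work on the CPU:
    scheduling, data movement, integration with other processes.
    """
    plan = f"plan:{task}"                 # CPU: decompose the request
    tokens = await generate(plan)         # accelerator: produce tokens
    return f"result[{task}] <- {tokens}"  # CPU: route the answer onward

async def main() -> list[str]:
    # Agents run continuously and concurrently; the CPU multiplexes them.
    tasks = ["payroll", "scheduling", "reporting"]
    return await asyncio.gather(*(run_agent(t) for t in tasks))

results = asyncio.run(main())
for r in results:
    print(r)
```

Even in this toy form, the asymmetry is visible: the accelerator call is a single step, while the surrounding event loop, scheduling, and result routing multiply with every concurrent agent.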
When agents become the workload
The consequence is a bottleneck that does not appear in benchmark scores or model comparisons but becomes immediately visible in production environments. The system begins to slow not because it cannot generate answers, but because it cannot move those answers through the pipeline efficiently enough to keep pace with demand. What was once considered a supporting layer is becoming the limiting factor. “You see a huge bottleneck in terms of flow,” Haas adds. “The data centre is choking because all of this work has to move through the system, and the accelerators, which are very expensive, need those tokens to be moved back and forth.”
The physical reality of data centres amplifies this problem in ways that are difficult to abstract away. Infrastructure is constrained by power, space and capital, and each of those constraints becomes more acute as workloads increase. The introduction of agentic AI does not simply increase utilisation; it changes the shape of demand in a way that existing architectures struggle to accommodate.
“We estimate there are about 30 million CPU cores per gigawatt in a data centre today,” Haas explains. “As we move into this agentic AI world that could go up four times to around 120 million CPU cores for the same power envelope.” The scale of that increase is not easily absorbed. Expanding capacity is neither immediate nor straightforward, and the assumption that infrastructure can scale in line with demand is becoming increasingly fragile.
The system is already operating close to its limits and adding more compute without addressing efficiency risks compounding the problem rather than solving it. “You are trying to put four times the amount of CPU cores into the same power envelope,” Haas says. “Power is precious, the capital required is precious, and the data centre is already full.”
This is where the original design philosophy of Arm begins to reassert itself, not as a historical footnote, but as a practical necessity. Efficiency is no longer merely an advantage; it is a requirement for sustaining growth in AI workloads. The ability to deliver performance within strict power constraints becomes the defining factor in whether the system can continue to scale. “You need a CPU that has the DNA of being born to run off a battery,” Haas adds.
That statement reframes the future of AI infrastructure in unexpectedly simple terms. The challenge is not only to build more powerful systems, but to build systems that can operate within the limits imposed by energy, cost and physical space. It is a constraint-driven model that feels closer to the origins of computing than the narratives that have dominated recent years.
For most of its history, Arm has operated as a provider of intellectual property, enabling others to build the systems that define the market. That model has proven resilient, but it is now being tested by the speed and complexity of AI-driven demand, where time to market and system integration are becoming as important as performance. “We have traditionally provided IP in a standalone form, CPU, GPU, system IP, and that has served us well for more than 30 years,” Haas continues. “But the complexity of chips is increasing, the cycle times are getting longer, and there is a need to do more and do it faster.”
The introduction of compute subsystems reflects an attempt to address that complexity by reducing the time required to move from design to production. It is a shift towards integration, not as a strategic repositioning, but as a response to the realities of modern chip development. “We introduced compute subsystems that take all the blocks of IP and put them together in a finished, verified way,” Haas explains. “In some cases it can shave a year or even 18 months off the time from starting design to getting into production.”
From IP provider to system supplier
That evolution, however, is only part of the story. The demands created by AI are pushing the company beyond its traditional role, forcing a reconsideration of where value is created within the ecosystem and how that value is delivered to partners. “We are now announcing our first silicon chip that we are selling to customers for revenue,” Haas explains. “This is a big, big deal, because we are now in a new business for Arm, supplying CPUs as chips.”
The move into silicon is framed not as a departure from the ecosystem model, but as an extension of it, driven by the needs of partners who are themselves navigating the constraints of AI infrastructure. It reflects a recognition that the existing model may not be sufficient to address the speed and scale at which the market is evolving.
“The biggest reason we are doing this is that our partners have asked for it,” Haas says. “But we are also doing this to solve the problem of what happens as agentic AI becomes mainstream, because all of the work required to make that happen is CPU bound.”
What emerges from Haas’s argument is not a rejection of acceleration, but a rebalancing of the system that supports it. Artificial intelligence is not a single-layer problem, and treating it as such has created blind spots that are now becoming operational risks. The CPU is not returning because it was ever truly absent, but because the system has reached a point where its role can no longer be obscured.
The narrative of AI as a story of exponential capability remains intact, but beneath that narrative the mechanics of how that capability is delivered are being rewritten in real time. The future of AI will not be determined solely by how fast models can generate answers, but by how effectively the system can sustain the flow of work those answers create.