The end of centralised cloud is closer than enterprises think

Mark Venables

The AI era is exposing the cracks in centralised cloud infrastructure. As GPU demand spirals and inference workloads call for new performance profiles, decentralised models are starting to look less like an alternative and more like an inevitability.

For more than two decades, enterprises have defaulted to centralised cloud for compute. It was the logical evolution of a client-server model that simplified operations, abstracted infrastructure, and created a standardised way to scale applications globally. That logic is faltering under the weight of artificial intelligence.

AI workloads are qualitatively different from the transactional and web-based applications that the hyperscalers were built to support. Training and inference at scale require specialised hardware, predictable throughput, and minimal latency. The supply chains that underpin traditional cloud models cannot keep up, while their economic structures often work against AI adopters.

Kyle Okamoto, Chief Technology Officer at Aethir, believes this is more than an optimisation challenge. “Centralised cloud has worked well for twenty years, but as we move into a new paradigm of AI compute, supply chain lines are too long, there are not enough choices, and not enough locations,” he says. “The concept is literally centralised, and for inference and other use cases that need distributed infrastructure, centralised is the antithesis. The traditional public cloud model is plagued by resource limitations, unpredictable costs, and network bottlenecks. That is creating a barrier to innovation where high performance, dedicated compute is essential.”

GPU as a service explained

The conversation around GPU as a service is clouded by ambiguity. Some providers offer little more than virtual slices of hardware, often bundled with noisy neighbour issues familiar to any enterprise that has scaled within a hyperscaler. Others deliver dedicated bare-metal resources without the capital outlay.

“There are two main branches of the definition,” Okamoto explains. “The less preferred is where you are sharing compute, network fabric, CPU, storage, and you end up with the noisy neighbour problem. The other option is to have dedicated machines. The analogy I like to use is that it is as if you built it yourself. It is as if you bought the machine, racked and stacked it, cabled and labelled it, and operated it. The difference is that you do not incur capital costs and have shorter timelines. You can get going in a few hours instead of many months.”

The operational difference is significant. Dedicated GPU as a service enables enterprises to avoid the 15 to 20 per cent hypervisor overhead that eats into performance on shared clouds, while providing more flexibility to scale up or out as models evolve.
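To put the overhead figure in concrete terms, the sketch below estimates how many GPUs a shared platform would need to match a dedicated cluster. It is a rough illustration, not a benchmark: the function name is hypothetical, the 16-GPU cluster size is an arbitrary example, and the model assumes throughput falls linearly with overhead, which real workloads may not follow.

```python
def shared_gpus_needed(dedicated_gpus: int, overhead_pct: int) -> int:
    """GPUs required on a shared platform to match dedicated throughput.

    Assumes (simplification) that an overhead of N per cent removes
    N per cent of each GPU's usable throughput.
    """
    if not 0 <= overhead_pct < 100:
        raise ValueError("overhead_pct must be between 0 and 99")
    usable_pct = 100 - overhead_pct
    # Integer ceiling division: round up to whole GPUs.
    return -(-dedicated_gpus * 100 // usable_pct)


# The 15 and 20 per cent figures come from the article; 16 GPUs is illustrative.
for pct in (15, 20):
    needed = shared_gpus_needed(16, pct)
    print(f"{pct}% overhead: {needed} shared GPUs to match 16 dedicated")
```

Under these assumptions, the overhead translates directly into extra hardware that has to be rented to do the same work.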

Beyond hyperscale economics

The economics are becoming impossible to ignore. Traditional hyperscalers operate on business models designed for predictable, steady-state cloud consumption. AI is volatile and cyclical, with periods of intense training followed by sustained inference. Locking enterprises into long-term contracts on fixed architectures makes little sense when GPU generations are turning over in less than a year.

“The economics are staggering,” Okamoto adds. “We are forty to sixty per cent cheaper than the neo clouds and maybe eighty to ninety per cent cheaper than the public hyperscale clouds, while still backed by SLAs and without vendor lock-in. We do not charge hidden fees for ingress, egress, data transfer, storage, or API calls. There is a single price every month, and that is what you pay. We built the model around what is wrong with today’s cloud environment and tried to solve it going forward.”

Enterprises may still hesitate because of convenience. Hyperscalers made their platforms sticky for a reason, offering developers frictionless onboarding while finance teams dealt with unpredictable bills. Moving away from that convenience is a heavy lift, but Okamoto argues that for serious deployments, the trade-off is worth it. “For larger enterprises that understand how to run their own networks, it is a no-brainer,” he explains. “The current model is broken, and that is why you are seeing entire segments of new providers emerge. The difference is whether they are built for the next decade of AI or still trying to optimise yesterday’s cloud.”

Latency, inference and the edge

The centre of gravity in AI is shifting from training to inference. Training can be done in a handful of large facilities, but inference must happen close to the point of interaction. That raises a fundamental design question: should compute be centralised in a few massive data centres, or distributed across hundreds of locations?

“At a trend level, the AI landscape is shifting,” Okamoto says. “There is a growing focus on inference over training, and that requires hardware optimised for high-throughput, low-latency workloads. It is not just about training anymore; it is about actually using the AI you have trained, and that needs to be lightning quick. GPUs need to be where people are, where agentic AI is being used, and where models are being turned into inference. To do that, you need a lot more than ten or twenty massive locations in the world. You need to be distributed.”

For industries building customer-facing AI, the difference is critical. Waiting an extra second for a generative model to respond can mean losing a customer, a trade, or a transaction. Enterprises cannot rely on a transatlantic round trip to keep inference responsive.
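The physics behind the round-trip point can be sketched with a simple lower bound on network latency. The distances here are assumptions chosen for illustration (roughly 5,600 km for a transatlantic path, 100 km for a nearby data centre), as is the fibre velocity factor of about 0.67, a commonly cited approximation for light in optical fibre. Real paths are longer and add routing, queuing, and processing time, so actual latency will be higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_VELOCITY_FACTOR = 0.67       # assumption: approximate factor for optical fibre


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000


# Illustrative distances, not measurements of any provider's network.
for label, km in (("transatlantic", 5_600), ("local", 100)):
    print(f"{label}: at least {min_round_trip_ms(km):.1f} ms per round trip")
```

With these assumed figures, the transatlantic path costs roughly 56 ms per round trip before any computation happens, against about 1 ms locally. An interaction that needs several sequential round trips multiplies that gap.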

Adopting decentralised GPU as a service is not just a procurement shift. It touches architectures, workflows, and DevOps practices. The assumption that decentralisation means stitching together fragments of compute across continents is misleading.

“There is a misnomer that you are using a little bit here, a little bit there, stitched together,” Okamoto explains. “That is the case with some providers. We do not do that. If you want sixteen H200 boxes, you will get sixteen H200 boxes in a local data centre, interconnected with InfiniBand or RoCE, storage, and dedicated egress. It is as if you built it yourself today. From a DevOps and workflow perspective, there is no major difference between a decentralised GPU as a service and buying your own cluster. The difference is flexibility and economics.”

That distinction matters for enterprises facing unpredictable model lifecycles. A GPU cluster optimised for training today may be redundant tomorrow when inference becomes the priority. Architectures must adapt without forcing long-term capital commitments.

Sustainability through utilisation

Sustainability is another driver pushing enterprises to rethink their AI infrastructure. AI’s environmental impact is under growing scrutiny, with energy consumption and water usage at the centre of the debate. Improving utilisation is one of the most immediate ways to mitigate the footprint.

“There are tens of thousands of GPUs sitting idle today,” Okamoto says. “On one hand, there is scarcity because Nvidia cannot produce fast enough, and on the other, there is massive underutilisation. Those machines are sucking up power all day long and burning electricity. Enterprises that build their own clusters generally run at between 15 and 25 per cent utilisation. That means they are wasting seventy-five to eighty-five per cent of the capital they have deployed. Driving up efficiency not only lowers cost but increases sustainability by reducing wasted power and resources.”
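The utilisation arithmetic in that quote can be worked through as follows. This is a minimal sketch: the function names are hypothetical, and the hourly cost is normalised to 1.0 rather than taken from any real price list. It assumes a cluster costs the same per hour whether busy or idle.

```python
def idle_share(utilisation: float) -> float:
    """Fraction of deployed capacity that does no useful work."""
    return 1 - utilisation


def cost_per_useful_hour(hourly_cost: float, utilisation: float) -> float:
    """Effective cost of one productive GPU-hour at a given utilisation."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_cost / utilisation


# The 15 and 25 per cent utilisation figures are those quoted in the article.
for u in (0.15, 0.25):
    print(
        f"utilisation {u:.0%}: idle {idle_share(u):.0%}, "
        f"effective cost {cost_per_useful_hour(1.0, u):.2f}x the nominal rate"
    )
```

At the quoted utilisation range, each productive GPU-hour effectively costs between four and nearly seven times the nominal hourly rate, which is why raising utilisation improves both cost and energy efficiency.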

Looking further ahead, Okamoto expects federated models to become the default for AI deployment. The ability to disaggregate workflows and run them in local facilities enables enterprises to maintain data sovereignty while improving performance.

“As regulations continue to increase, federated AI will be a reality,” he argues. “A decentralised platform with hundreds of locations allows enterprises to expose GPUs locally and disaggregate workflows. This enables low-latency inference, real-time applications, and data residency compliance. There is a foundational infrastructure change needed to empower enterprises to deploy scalable, privacy-preserving AI at the edge.”

The implication is that AI infrastructure design can no longer be divorced from regulatory and data governance considerations. As agentic AI systems proliferate, enterprises will need architectures that can scale performance while keeping sensitive data within national borders.

A new infrastructure playbook

The trajectory is clear. AI is pushing centralised cloud towards its limits. Enterprises are under pressure to find infrastructure models that strike a balance between performance, cost, sustainability, and sovereignty. The answer will not be one-size-fits-all.

“Agentic AI is not just about bigger GPUs,” Okamoto concludes. “It is about understanding the entire lifecycle. You need training resources, fine-tuning, multimodal capabilities, and a presence where customers are. Enterprises need flexibility to scale up, scale down, and scale out. GPU as a service provides that flexibility without the need to constantly plan twelve months in advance or commit to architectures that will be obsolete before they are fully deployed. That is the real shift. AI will not wait for centralised cloud to catch up.”

The pressure points are now visible: economics, latency, utilisation, and regulation. For executives responsible for AI deployment, the question is no longer whether decentralised GPU infrastructure makes sense, but how quickly they can adapt their organisations to a model designed for the realities of artificial intelligence.