AI factories need power, not promises, as cooling becomes the silent bottleneck

As AI workloads scale into gigawatt territory, power is no longer a back-end constraint but a central business challenge. Two-phase liquid cooling offers a physics-driven alternative to air and pump systems, unlocking higher performance without increasing infrastructure load.

The AI boom has revolutionised data center design. Where cooling was once a peripheral concern, today it defines the limits of what can be built. As power-hungry GPUs push beyond 1,200 watts per chip and workloads swing between idle and peak in milliseconds, traditional air and single-phase liquid cooling cannot keep up.

This is not merely an infrastructure issue; it is a performance ceiling. Every watt spent on fans, chillers, and thermal inefficiencies is a watt not used for AI compute. In a world chasing ever-larger models, that waste directly translates into slower time to insight and higher operating costs. According to My Truong, Chief Technology Officer at ZutaCore, the industry must stop trying to squeeze more efficiency out of outdated systems and instead embrace the laws of physics themselves.

“We have spent decades throwing away up to half the energy in a data center on cooling,” Truong explains. “Even inside the server, as much as 20 per cent of energy is lost just moving air with fans. That might be acceptable at a small scale, but when you are talking about AI factories operating at hundreds of megawatts, it becomes completely untenable.”

Air has long dominated data center cooling, primarily because it has been simple to implement. But as compute density increases and the industry pushes into double-digit kilowatt-per-rack territory, air simply does not scale. The inefficiencies of airflow, the energy demands of chillers, and the constraints on temperature variance are now running into a wall defined not just by cost but by physics itself. It is not that existing cooling is flawed; it was simply never designed for this level of performance.

Physics over pumps

ZutaCore’s approach is based on a two-phase direct-to-chip liquid cooling system. Unlike traditional systems that use pumps or fans to move heat, this solution relies on phase change and gravity. Heat from the CPU or GPU vaporises a dielectric fluid, which then rises, condenses, and returns by gravity. It is not a new concept, but its application at this scale and precision is.
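To make the physics concrete, consider how much coolant each approach has to move to carry away the heat of a 1,200-watt chip. The sketch below uses textbook values for air and an assumed latent heat for a generic fluorinated dielectric; the figures are illustrative, not ZutaCore specifications.

```python
# Rough comparison of heat transport per kilogram of coolant
# (illustrative property values; not vendor specifications).

CHIP_HEAT_W = 1_200          # per-chip heat load cited in the article

# Air: sensible heat only, assuming a ~10 K allowable temperature rise
CP_AIR = 1_005               # J/(kg*K), specific heat of air
DELTA_T_AIR = 10             # K
air_kg_per_s = CHIP_HEAT_W / (CP_AIR * DELTA_T_AIR)

# Two-phase dielectric: latent heat of vaporisation does the work,
# assuming ~100 kJ/kg for a representative fluorinated fluid
H_FG_DIELECTRIC = 100_000    # J/kg (assumed)
fluid_kg_per_s = CHIP_HEAT_W / H_FG_DIELECTRIC

print(f"air: {air_kg_per_s:.3f} kg/s vs dielectric vapour: {fluid_kg_per_s:.3f} kg/s")
# ~0.119 kg/s of fan-driven air vs ~0.012 kg/s of fluid that moves itself
# by boiling, rising, and condensing back down under gravity.
```

Under these assumptions, an order of magnitude less mass has to move, and the vapour transports itself by buoyancy rather than by fans or pumps, which is where the energy otherwise spent on moving heat is recovered.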

“In partnership with Munters, we have built an end-to-end solution that essentially removes the need to waste power on moving heat,” Truong explains. “Instead of air or pumped liquid, we use the natural process of evaporation and condensation. This reduces the total facility PUE to around 1.03 and dramatically increases the proportion of energy available for compute.”

The implications are significant. In a typical 100-megawatt data center with a PUE of 1.8, an operator might support around 46,000 GPUs. Drop that PUE to 1.03 and the same energy envelope can accommodate 81,000 GPUs, a 75 per cent increase in available compute without touching the grid connection.
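The arithmetic behind those figures is easy to reproduce. Here is a minimal sketch, assuming a fixed 100-megawatt grid connection and roughly 1.2 kilowatts per GPU (the per-chip figure cited earlier); the per-GPU power and the simplification that all IT power goes to GPUs are assumptions for illustration.

```python
# Sketch of the PUE arithmetic above: a fixed facility feed,
# divided by PUE to get IT power, divided by assumed power per GPU.

def gpus_supported(facility_mw: float, pue: float, kw_per_gpu: float = 1.2) -> int:
    """Estimate how many GPUs fit inside a fixed facility power envelope."""
    it_power_kw = facility_mw * 1_000 / pue   # power left for compute after cooling overhead
    return int(it_power_kw / kw_per_gpu)

baseline = gpus_supported(100, pue=1.8)      # ~46,000 GPUs
two_phase = gpus_supported(100, pue=1.03)    # ~81,000 GPUs
print(baseline, two_phase, f"{two_phase / baseline - 1:.0%} more compute in the same envelope")
```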

The performance gain here is not about adding capacity; it is about unlocking latent value in existing infrastructure. Many AI-focused facilities are already operating at the limit of what their local grid can supply. In that environment, the ability to extract more useful compute from the same energy footprint becomes not just a technical enhancement but a commercial imperative.

Thermals, throttling and time to insight

The performance impact goes beyond the number of GPUs. AI workloads are highly sensitive to temperature fluctuations. Uneven cooling across a die leads to thermal throttling, where cores reduce speed to avoid overheating. This effect can undermine silicon performance, reduce model training speed, and shorten hardware lifespan.

“Two-phase cooling enables temperature stability across the entire surface of the chip,” Truong continues. “Instead of hotspots and gradients, we are keeping the thermal profile flat. That means less throttling, more consistent processing, and ultimately faster completion of AI workloads.”

He offers a simple example. A traditional CPU cooled with air might take 12.45 seconds to compute a known benchmark, while the same system under two-phase cooling completes it in 10.2 seconds. Scale that to an AI training task that previously took five days, and you can now finish in four, leaving headroom to rerun or refine the model before the week is out.
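Expressed as a back-of-the-envelope calculation (the benchmark timings are Truong's; extrapolating them linearly to a multi-day training run is an illustrative assumption):

```python
# Illustrative extrapolation of the benchmark figures quoted above.
air_cooled_s = 12.45        # benchmark runtime with air cooling
two_phase_s = 10.2          # same benchmark under two-phase cooling
speedup = air_cooled_s / two_phase_s          # ~1.22x

training_days_air = 5.0
training_days_two_phase = training_days_air / speedup
print(f"{speedup:.2f}x speedup -> {training_days_two_phase:.1f} days instead of 5")
# ~1.22x speedup -> ~4.1 days, leaving roughly a day of headroom in the same week
```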

It is a feedback loop that matters. Shorter compute times mean faster iterations. Faster iterations mean better models. In an industry where competitive advantage increasingly depends on AI agility, thermal performance is not just an operational detail; it is a strategic lever.

As workloads shift toward multi-modal, high-parameter models, training times can stretch into weeks or even months. Any opportunity to shave time without compromising accuracy is a competitive differentiator. This is not about overclocking. It is about consistency, ensuring every chip performs predictably under sustained load.

Scaling without expansion

Two-phase liquid cooling also redefines what is physically possible in a given space. The University of Oregon’s College of Earth, Ocean, and Atmospheric Sciences implemented ZutaCore’s solution without requiring any expansion to its cooling infrastructure. Performance gains ranged from 20 to 50 per cent, with CPU temperatures reduced by up to 30°C and power use lowered due to the elimination of high-speed fans.

“By moving the heat load into liquid cooling, they avoided the need to install additional CRACs or expand the data hall footprint,” Truong says. “In the AI era, where time and real estate are equally scarce, that kind of efficiency gain has real-world value.”

The gains are not just performance-related. The shift away from traditional cooling also mitigates the risk of thermal shock and the rapid temperature changes that strain silicon. As workloads spike from zero to full power and back in milliseconds, especially in AI inference or training tasks, these temperature swings create physical stress on the chips themselves.

“Multiple dies within a single package expand at different rates,” Truong explains. “Without uniform cooling, that introduces reliability issues. In eight-way GPU systems, the failure of one component can take the entire workload offline. We are seeing attrition rates of four to five per cent in some AI server environments. Flattening the thermal curve reduces that.”
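A hedged illustration of why that compounds: if the quoted attrition is read as an annualised per-accelerator failure rate (an assumption, since the basis is not specified), the odds of an eight-way node losing at least one GPU add up quickly.

```python
# Illustrative only: how per-component attrition compounds in an
# eight-way GPU node, where one failure stalls the whole workload.

per_gpu_failure = 0.05        # assumed annualised failure probability per GPU
gpus_per_node = 8

node_survives = (1 - per_gpu_failure) ** gpus_per_node
print(f"chance an 8-GPU node sees at least one failure: {1 - node_survives:.0%}")
# ~34% at 5% per GPU, which is why flattening the thermal curve
# (and the attrition rate with it) pays off across a large cluster.
```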

The implications for AI infrastructure operators are profound. It is no longer sufficient to measure system reliability solely by component quality or software stability. Thermal conditions now play a direct role in uptime, lifespan, and performance across entire compute clusters.

The silent limit on AI scalability

What emerges is a more urgent question. For all the industry’s talk about faster GPUs, denser racks, and advanced interconnects, the actual bottleneck may be heat. Not heat as a technical issue, but heat as a gating factor for business growth.

SoftBank’s deployment of the world’s first two-phase solution for NVIDIA H100s is a bellwether moment. By reducing PUE, extending hardware life, and increasing performance, they are demonstrating that AI infrastructure does not need to be power-hungry or land-intensive to scale. With plug-and-play compatibility and no need for chilled water, the solution offers a viable path forward for operators seeking to overcome the limitations of traditional builds.

“We are talking about delivering more compute in the same power envelope, more speed in the same silicon footprint, and more reliability for workloads that are already pushing hardware to its edge,” Truong concludes. “This is not about incremental efficiency; it is about reframing what we believe is possible.”

That belief, in many ways, is the more profound shift now underway. In the race to build the next AI superlab or edge node, the conversation is expanding beyond chips and models. The real differentiator is becoming the infrastructure that quietly powers them. And if energy is the currency of compute, then cooling (subtle, silent, and often overlooked) may be its ultimate arbiter.
