The cooling dilemma reshaping AI infrastructure


The rapid acceleration of artificial intelligence is straining the physical limits of data centres. A new white paper from LiquidStack, produced in collaboration with Syska Hennessy Group and Chemours, examines how different liquid cooling approaches could shape the future of high-density computing at scale.

Artificial intelligence is no longer a discrete workload in data centres. It has become the dominant force shaping their design. Training and running large models demand rack densities that were almost unthinkable only a few years ago. Figures from industry surveys show a sharp climb from single-digit kilowatt racks to deployments regularly exceeding 50 kW, with 100 kW now projected as the norm.

The challenge is simple to describe but hard to solve: air cooling cannot keep up. The volume of airflow required to extract heat from today’s chips, let alone tomorrow’s, is beyond the practical limits of fans and ducts. Heat flux off GPUs and CPUs has outpaced the ability of air to carry it away. This is not a question of optimisation but of physics.
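To see the scale of the problem, consider the basic heat-transfer relation Q = ṁ·c·ΔT: the airflow required grows linearly with rack power. The sketch below is a rough back-of-the-envelope calculation, assuming a 15 °C air temperature rise and standard air properties; the rack powers are illustrative, not figures from the white paper.

```python
# Back-of-the-envelope airflow estimate for air-cooling a high-density rack.
# All figures below are illustrative assumptions, not values from the white paper.

AIR_DENSITY = 1.2   # kg/m^3, roughly sea level at room temperature
AIR_CP = 1005.0     # J/(kg*K), specific heat capacity of air

def airflow_m3_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry away heat_load_w watts
    with a delta_t_k kelvin rise in air temperature (Q = m_dot * cp * dT)."""
    mass_flow = heat_load_w / (AIR_CP * delta_t_k)   # kg/s
    return mass_flow / AIR_DENSITY                   # m^3/s

for rack_kw in (10, 50, 100):
    flow = airflow_m3_per_s(rack_kw * 1000, delta_t_k=15)
    # Convert to cubic feet per minute, the unit fan vendors usually quote.
    print(f"{rack_kw:>3} kW rack -> {flow:5.2f} m^3/s (~{flow * 2119:.0f} CFM)")
```

At 100 kW the answer lands well above ten thousand CFM for a single rack, which is the practical ceiling the article describes.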

Liquid cooling is therefore shifting from a niche experiment to an operational necessity. For years, it was considered an exotic approach reserved for high-performance computing labs, research universities or specialist financial trading environments. Now it is central to the debate on how the next generation of AI-ready data centres will be built.

Executives must accept that cooling technology has moved out of the background. It is no longer a mere operational detail, but a board-level issue. The economics of AI adoption, the pace of innovation, and the perception of sustainability now all converge at the rack level.

Comparing pathways to liquid cooling

The LiquidStack white paper evaluates three main approaches. Each offers a different balance of performance, efficiency, and operational complexity.

Direct-to-chip cooling uses cold plates to channel liquid directly across processors. This delivers effective heat transfer at the chip level, but other components in the server still need to be cooled with air. That means additional fans, ducting, and chillers remain part of the infrastructure. Direct-to-chip is sometimes seen as a halfway house: it eases thermal stress on CPUs and GPUs but leaves the wider system reliant on traditional methods.

Single-phase immersion goes a step further by submerging entire servers in a dielectric liquid. The coolant circulates through the tank, absorbing heat, and is pumped through a heat exchanger to be cooled before returning. The white paper shows that, while efficient at moderate densities, single-phase immersion becomes more demanding as racks grow hotter. At higher chip power, the fluid temperature must be kept lower to absorb heat, forcing chillers to work harder. This drives up energy use and erodes efficiency gains.
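The coupling between chip power and fluid temperature follows from a simple thermal-resistance model: with a fixed resistance between the chip junction and the coolant, the warmest acceptable coolant temperature falls linearly as power rises. The sketch below illustrates this with an assumed junction limit and thermal resistance; neither value comes from the white paper.

```python
# Why single-phase immersion demands colder fluid as chips get hotter:
# with a fixed thermal resistance between junction and coolant, the maximum
# allowable coolant temperature falls linearly with chip power.
# Both constants are illustrative assumptions, not figures from the study.

T_JUNCTION_MAX = 90.0   # deg C, assumed maximum safe chip temperature
R_TH = 0.04             # K/W, assumed junction-to-coolant thermal resistance

def max_coolant_temp_c(chip_power_w: float) -> float:
    """Highest coolant temperature that keeps the junction within limits:
    T_coolant = T_junction_max - P * R_th."""
    return T_JUNCTION_MAX - chip_power_w * R_TH

for power_w in (300, 700, 1000, 1500):
    print(f"{power_w:>4} W chip -> coolant must stay below "
          f"{max_coolant_temp_c(power_w):.0f} deg C")
```

The colder the required fluid, the more chiller work is needed to produce it, which is where the efficiency gains erode.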

Two-phase immersion takes a different approach. Servers are placed in tanks filled with a dielectric fluid engineered to boil at relatively low temperatures. As chips generate heat, the fluid boils and rises as vapour. That vapour condenses on coils within the tank, and the latent heat is carried away. Because the boiling point is fixed, the system does not require progressively lower fluid temperatures as chip power increases. In effect, it adapts naturally to rising thermal loads without extra energy penalties.
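Because the tank temperature is pinned at the fluid's boiling point, rising load shows up as a higher rate of vapour generation rather than a need for colder fluid. A minimal sketch, assuming a latent heat in the range typical of low-boiling-point dielectric fluids (the figures are illustrative, not taken from the white paper):

```python
# In two-phase immersion the tank temperature is pinned at the fluid's
# boiling point; rising chip power appears as a higher vapour generation
# rate (m_dot = Q / h_fg), not as a need for colder fluid.
# Both constants are illustrative assumptions, not figures from the study.

BOILING_POINT_C = 50.0   # deg C, assumed boiling point of the dielectric
H_FG = 100_000.0         # J/kg, assumed latent heat of vaporisation

def vapour_rate_kg_per_s(rack_power_w: float) -> float:
    """Mass of fluid boiled off per second to absorb rack_power_w watts."""
    return rack_power_w / H_FG

for rack_kw in (50, 100, 250):
    rate = vapour_rate_kg_per_s(rack_kw * 1000)
    print(f"{rack_kw:>3} kW rack -> tank stays at {BOILING_POINT_C:.0f} deg C, "
          f"boiling {rate:.2f} kg of fluid per second")
```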

For the operators of data centres designed around AI and high-performance computing, the implications are stark. Two-phase immersion not only handles current densities but also provides a more straightforward pathway to the 100–250 kW racks that the industry expects to see within the next decade.

Regional performance and total cost of ownership

To test these methods, the study modelled a 36 MW data centre footprint across four locations: Copenhagen, Ashburn, Singapore, and Abu Dhabi. The regional spread was deliberate. Northern Europe represents cooler climates with access to free cooling. Ashburn, Virginia, reflects a temperate but humid environment. Singapore illustrates tropical, high-humidity conditions. Abu Dhabi represents the extremes of desert heat.
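A ten-year TCO comparison of this kind reduces to upfront capital cost plus discounted annual energy and water costs. The sketch below shows the shape of such a model; every input is a hypothetical placeholder, and none of the numbers or ratios are drawn from the study.

```python
# Shape of a ten-year total-cost-of-ownership comparison like the one in the
# study: capital cost up front, plus energy and water costs each year.
# Every input below is a placeholder, not a number from the white paper.

def tco_10yr(capex: float, annual_energy: float, annual_water: float,
             discount_rate: float = 0.05) -> float:
    """Net-present-value TCO over ten years."""
    opex = annual_energy + annual_water
    return capex + sum(opex / (1 + discount_rate) ** yr for yr in range(1, 11))

# Hypothetical cooling options for a fixed IT load (all figures in $M).
options = {
    "direct-to-chip":         tco_10yr(capex=40, annual_energy=12, annual_water=1.0),
    "single-phase immersion": tco_10yr(capex=45, annual_energy=11, annual_water=0.2),
    "two-phase immersion":    tco_10yr(capex=48, annual_energy=9,  annual_water=0.0),
}
for name, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name:<22} ~${cost:.0f}M over ten years")
```

The structure makes the article's point visible: a higher upfront cost can still win over ten years if it cuts the recurring energy and water bills.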

In Copenhagen, two-phase immersion achieved a 12 per cent lower total cost of ownership over ten years compared with direct-to-chip and single-phase immersion. The ability to utilise ambient conditions efficiently, without relying heavily on chillers, gave it a decisive edge. Northern Europe’s data centre market is already under political scrutiny over energy use, and operators in Denmark face expectations of renewable integration. A technology that reduces both energy and water consumption aligns directly with regulatory requirements.

Ashburn produced similar findings. Often referred to as “Data Center Alley”, the region already carries one of the world’s highest densities of facilities. The study found that while all cooling systems could benefit from seasonal free cooling, two-phase immersion required less supporting infrastructure, resulting in reduced floor space and water consumption. For hyperscale operators facing land constraints in Virginia, the ability to pack more compute into smaller footprints is strategically essential.

Singapore demonstrated the clearest differential. The city-state's persistent humidity and high temperatures place traditional air-cooling systems under continual strain. Operators there are also under government pressure to minimise water use and energy demand.

The white paper showed that two-phase immersion cuts operating costs substantially while reducing water consumption to negligible levels. At a time when new data centre construction in Singapore is subject to strict sustainability criteria, this capability may be the deciding factor in whether projects receive approval.

Abu Dhabi presented the most significant challenge. Ambient temperatures are so high that chillers remain unavoidable. Even so, two-phase immersion maintained cost advantages, with improved PUE and water efficiency. In the Middle East, where water scarcity is both an environmental and geopolitical issue, the ability to run dry is more than an operational detail; it is a political and reputational safeguard. For global cloud providers establishing regional hubs, the scrutiny of environmental impact is as intense as the scrutiny of cost.

From cost to capability

For executives, the financial implications are only one part of the story. Energy efficiency translates directly into compute capacity. Every watt saved in cooling is a watt freed for processing. With AI training runs lasting weeks or months, the opportunity cost of wasted energy becomes acute. Delayed iteration cycles can slow down product development and erode a competitive edge.
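The "watt saved is a watt freed" point is straightforward PUE arithmetic: for a fixed utility feed, the power available to IT equipment is the facility budget divided by PUE. A minimal illustration, reusing the study's 36 MW figure as the facility budget (an assumption for this sketch) and assuming the PUE values shown:

```python
# "Every watt saved in cooling is a watt freed for processing": for a fixed
# facility power budget, the IT power available is budget / PUE.
# The budget reuses the study's 36 MW footprint as an assumption; the PUE
# values are illustrative, not figures from the white paper.

FACILITY_BUDGET_MW = 36.0

for label, pue in (("air-cooled", 1.5), ("liquid-cooled", 1.1), ("near-ideal", 1.03)):
    it_mw = FACILITY_BUDGET_MW / pue
    print(f"PUE {pue:.2f} ({label}): {it_mw:.1f} MW available for compute")
```

On those assumed values, moving from a PUE of 1.5 to 1.1 frees roughly 8.7 MW for compute from the same utility feed.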

Two-phase immersion also reduces physical footprint. With higher rack densities supported in fewer tanks, operators can deploy more compute per square metre. At a time when land acquisition for hyperscale data centres is increasingly complex, this factor carries significant weight.

The white paper underscores that while direct-to-chip and single-phase immersion offer transitional relief, their long-term economics are less favourable. As chip power climbs, their operational costs escalate. Two-phase immersion scales more gracefully, offering a clearer roadmap for AI infrastructure investment.

Sustainability under scrutiny

The debate about AI is not just technical or financial. Environmental scrutiny is intensifying. Data centres already account for up to 4 per cent of global electricity use, and AI adoption is projected to add hundreds of billions of dollars in operating costs by 2028. Water use has become equally contentious, with evaporative cooling towers drawing attention in drought-prone regions.

The white paper highlights the sustainability advantage of the two-phase immersion process. In every climate modelled, water consumption was negligible. For regions such as Singapore and Abu Dhabi, where water stress is a real concern, this positions the technology as a sustainable option at a time when regulatory and community pressures are mounting.

Investors, too, are taking notice. ESG reporting frameworks increasingly require disclosure of energy and water use. Cooling technology is therefore not just an operational matter but an investor-relations issue. Choosing an inefficient system risks both regulatory scrutiny and reputational damage in capital markets.

Cyber resilience is also tied to infrastructure choices. As cooling systems become more connected, the attack surface widens. Some monitored immersion systems incorporate secure boot features and hardware-based trust anchors to protect firmware integrity. In an era where AI infrastructure is treated as critical national infrastructure, these safeguards cannot be overlooked.

The strategic choice ahead

The findings place cooling technology firmly in the strategic domain. No longer just the concern of engineers, it now sits at the intersection of boardroom priorities: cost, sustainability, resilience, and regulatory optics.

LiquidStack, one of the early pioneers of two-phase immersion, argues that this approach is uniquely suited to meet the rising demands of AI hardware. The evidence presented in the white paper supports that case. Yet the broader message for executives is sharper: without a decisive move to advanced liquid cooling, AI infrastructure faces a physical ceiling that no amount of algorithmic innovation can overcome.

The choice of cooling is no longer about keeping servers operational. It is about enabling the next phase of AI progress. Those who invest in resilient, efficient systems will not simply manage costs but will shape the pace of innovation. In a landscape where the limits of physics and the demands of markets collide, solving the cooling dilemma may be the defining factor in determining who leads in the age of artificial intelligence.

