The exponential growth of AI workloads is pushing thermal limits beyond what air cooling can manage. Liquid cooling enables higher compute density, improved energy efficiency, and long-term operational resilience in modern AI data centers.
The shift from air to liquid cooling in AI data centers is no longer a theoretical discussion about future trends. It is a practical response to an engineering and energy reality that has already arrived. With server racks now consuming upwards of 130 kilowatts and individual GPUs exceeding 1,000 watts in thermal design power (TDP), traditional cooling strategies have reached their limits. The exponential rise in heat density has not only changed how data centers are built but also what it means to operate them profitably, sustainably, and at scale.
From hot air to hot water
Five years ago, air cooling could comfortably manage most high-performance computing environments, even those hosting advanced CPUs and GPUs. But the pace of progress has been relentless. “Back in 2020, we were dealing with chips drawing between 400 and 700 watts,” CW Chen, General Manager of Advanced Thermal Solutions at Supermicro, explains.
“Today, some GPUs are already over 1,000 watts, and we expect this trend to continue. If you walk into a server hall today, it is not unlike standing in front of sixty industrial hair dryers. Air is simply not a feasible medium anymore.”
Direct-to-chip liquid cooling (DLC) is not only more effective, but it is also structurally necessary. The change is evident not just in thermal design but in data centre architecture itself. As Chen puts it, “With the heat loads we are seeing now, the cooling approach needs to be as close to the source as possible. DLC allows us to extract the heat before it ever becomes a facility-wide problem.”
Scaling density without scaling cost
One common misconception about liquid cooling is that it is expensive. That can be true for retrofits, but not for greenfield builds designed from the ground up with thermal efficiency in mind. Supermicro’s total cost of ownership (TCO) models show that a DLC-based 10-megawatt AI facility can reduce power consumption by 38 per cent and lower capital expenditure compared to air-cooled equivalents. A single liquid-cooled rack can replace five air-cooled racks delivering the same compute, dramatically shrinking the footprint and the HVAC requirements that go with it.
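As a rough illustration of that arithmetic, the sketch below combines the figures quoted above (a 10-megawatt IT load, a 38 per cent facility power reduction, one liquid-cooled rack replacing five air-cooled ones) with assumed values for baseline PUE and per-rack power. It is a back-of-the-envelope comparison, not Supermicro’s actual TCO model.

```python
# Back-of-the-envelope comparison of an air-cooled vs DLC build.
# Values marked "assumed" are illustrative, not figures from the article.

IT_LOAD_MW = 10.0            # facility IT load from the article's example
AIR_PUE = 1.5                # assumed PUE for the air-cooled baseline
DLC_POWER_SAVING = 0.38      # 38% facility power reduction cited in the article
RACK_CONSOLIDATION = 5       # one DLC rack replaces five air-cooled racks
AIR_RACK_KW = 30.0           # assumed per-rack power for the air-cooled baseline

air_facility_mw = IT_LOAD_MW * AIR_PUE
dlc_facility_mw = air_facility_mw * (1 - DLC_POWER_SAVING)

air_racks = IT_LOAD_MW * 1000 / AIR_RACK_KW
dlc_racks = air_racks / RACK_CONSOLIDATION

print(f"Air-cooled facility draw: {air_facility_mw:.1f} MW across {air_racks:.0f} racks")
print(f"DLC facility draw:        {dlc_facility_mw:.1f} MW across {dlc_racks:.0f} racks")
print(f"Annual energy saved:      {(air_facility_mw - dlc_facility_mw) * 8760:.0f} MWh")
```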
The implications of this shift go beyond simple cost savings. Data centre design is becoming a question of how much performance can be delivered per square metre, per kilowatt, and per dollar. Liquid cooling enables operators to densify their infrastructure without encountering thermal limitations. In regions where power availability or floor space is constrained, that changes the calculus completely.
“Our most recent deployments show PUEs below 1.1 in full production environments,” Chen says. “That is not a simulated figure; it is measured in some of the most demanding and humid regions in Asia.” The savings come not just from reduced fan usage or HVAC downgrades but from rethinking the entire chain of heat management. Integrated coolant distribution units (CDUs) with redundant, hot-swappable pumps, combined with warm-water operation at up to 45°C, allow chillers to be eliminated in many regions. Waste heat can even be reused to warm nearby buildings or water systems, creating new opportunities for operational efficiency and circular design.
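PUE itself is a simple ratio: total facility power divided by IT power. The sketch below shows how removing chiller load moves that ratio towards the sub-1.1 figure Chen cites; the overhead numbers are assumed for illustration, not measured values from any deployment.

```python
def pue(it_power_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return (it_power_kw + cooling_kw + other_overhead_kw) / it_power_kw

# Assumed loads for a single 1 MW IT block; only the structure of the
# calculation, not the values, comes from the article.
it_kw = 1000.0
air_cooled = pue(it_kw, cooling_kw=400.0, other_overhead_kw=100.0)  # chillers + room air handlers
warm_water = pue(it_kw, cooling_kw=60.0, other_overhead_kw=40.0)    # 45 °C loop, dry coolers, no chillers

print(f"Air-cooled PUE:  {air_cooled:.2f}")   # ~1.50
print(f"Warm-water PUE:  {warm_water:.2f}")   # ~1.10, in line with the sub-1.1 figure cited
```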
The opportunity extends into data centre siting itself. With reduced cooling infrastructure needs, operators can build facilities in more diverse locations, closer to power sources or customers, rather than being tied to temperate climates or access to municipal chillers. This flexibility also opens doors for innovative heat reuse projects, from district heating to industrial co-location.
Designing for thermal intelligence
While cooling has historically been treated as an afterthought in infrastructure, AI-scale data centers demand smarter systems. Integrated control logic, advanced telemetry, and API-level manageability are essential for balancing performance, reliability, and energy use in real time. “Every element, from coolant velocity and droplet heat absorption to pressure drop and condensation control, must be designed with precision,” Chen adds. “This is thermal engineering at the level of fluid dynamics, not just airflow.”
Precision thermal control does more than prevent overheating. It enables predictive maintenance, optimises fan and pump speeds, and ensures consistent performance across hundreds of nodes under dynamic AI workloads. The ability to capture real-time data on every component, from CDU manifolds to cooling towers, and to act on that data autonomously is what transforms liquid cooling from a plumbing challenge into a strategic advantage.
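What that control logic can look like in practice is sketched below: a deliberately simplified proportional loop that nudges pump speed to hold a target coolant delta-T. The telemetry structure and tuning values are hypothetical, standing in for any vendor’s real management API.

```python
from dataclasses import dataclass

@dataclass
class CduTelemetry:
    supply_temp_c: float      # coolant supply temperature
    return_temp_c: float      # coolant return temperature
    flow_lpm: float           # measured flow, litres per minute
    pump_speed_pct: float     # current pump speed setpoint

def next_pump_speed(t: CduTelemetry, target_delta_t: float = 10.0,
                    gain: float = 2.0) -> float:
    """Proportional control: hold the coolant delta-T near the target.
    A higher delta-T than target means the loop is running hot, so raise flow;
    a lower one means flow (and pump energy) can drop."""
    delta_t = t.return_temp_c - t.supply_temp_c
    error = delta_t - target_delta_t
    return max(20.0, min(100.0, t.pump_speed_pct + gain * error))

# Example: a rack working harder than expected pushes delta-T to 13 °C.
sample = CduTelemetry(supply_temp_c=45.0, return_temp_c=58.0,
                      flow_lpm=120.0, pump_speed_pct=60.0)
print(f"New pump speed: {next_pump_speed(sample):.0f}%")  # speeds up to restore the target delta-T
```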
The design language is increasingly analogous to biological systems. “Just like the human body, you need a heart to pump, a vascular system to distribute, and sensors to monitor health,” Chen explains. “In our racks, the CDU is the heart, the coolant lines are the veins, and our software monitors and manages it all.” It is an apt metaphor for an industry now tasked with creating living, breathing systems capable of powering the next generation of intelligent machines.
Operational agility in deployment
The challenges of AI do not end at design. Speed of deployment is increasingly critical for organisations racing to build training clusters or inference infrastructure. The Supermicro model is one example of how integration can support operational agility. “We build racks fully populated with servers, piping, CDUs, and management software already installed,” Chen notes. “Deployment on-site can be reduced to connecting three things: power, internet, and water.”
This model of factory-integrated, plug-and-play infrastructure is rapidly becoming the new normal in AI. In one recent deployment, over 6,000 liquid-cooled GPU servers were shipped as fully assembled racks, accelerating time-to-commissioning and reducing on-site complexity. “If you are still treating liquid cooling as a bolt-on, you are missing the opportunity to simplify, not complicate,” Chen continues.
These efficiencies are not limited to the installation process. Ongoing operations also benefit from the modular, integrated approach. Redundant pumps and power supplies can be swapped without powering down racks. Software integration enables AI-assisted optimisation of coolant flow, energy use, and workload distribution. The result is a system that is not only easier to deploy but also easier to manage, monitor, and maintain. It becomes a platform for rapid scalability, not a maintenance headache.
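A minimal sketch of that kind of health monitoring is shown below: flagging a pump that is drifting from its setpoint or vibrating excessively so it can be hot-swapped while its redundant partner carries the load. The thresholds and telemetry fields are assumptions chosen for illustration, not figures from Supermicro’s management software.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PumpStatus:
    pump_id: str
    rpm: float
    expected_rpm: float
    vibration_mm_s: float    # RMS vibration, a common early-wear indicator

def flag_for_swap(pumps: List[PumpStatus],
                  rpm_tolerance: float = 0.10,
                  vibration_limit: float = 7.1) -> List[str]:
    """Return IDs of pumps drifting from their setpoint or vibrating beyond a
    limit, so they can be hot-swapped without powering down the rack."""
    flagged = []
    for p in pumps:
        rpm_drift = abs(p.rpm - p.expected_rpm) / p.expected_rpm
        if rpm_drift > rpm_tolerance or p.vibration_mm_s > vibration_limit:
            flagged.append(p.pump_id)
    return flagged

pumps = [PumpStatus("cdu1-pump-a", rpm=2850, expected_rpm=2900, vibration_mm_s=2.3),
         PumpStatus("cdu1-pump-b", rpm=2400, expected_rpm=2900, vibration_mm_s=9.8)]
print(flag_for_swap(pumps))   # ['cdu1-pump-b'] is a candidate for a hot swap
```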
Efficiency is the new capacity
Liquid cooling is often framed as a way to support ever-higher compute performance. However, its role is increasingly defined by efficiency rather than raw power. A lower PUE is not just a badge of honour; it is a competitive advantage. Lower energy bills, less real estate, and fewer constraints on expansion mean that the most efficient data centers are also the most profitable and sustainable.
This is especially critical in a world where power availability is becoming a gating factor for AI growth. Grid constraints, regional caps, and rising electricity prices mean that the ability to extract more compute per kilowatt is becoming more valuable than the ability to buy more servers. “Liquid cooling lets you shift power from air handling to compute,” Chen says. “That is the core of AI infrastructure economics now.”
The pressure to decarbonise is also becoming a key driver. Liquid cooling, when paired with warm water loops and reuse strategies, supports broader sustainability goals while simultaneously improving compute economics. That dual benefit is increasingly attractive to both CIOs and sustainability officers.
As organisations look to build or expand their AI infrastructure, the decision to go liquid is no longer just about thermal performance. It is about strategic viability. From extending equipment lifespan and automating maintenance to reducing energy costs and improving space utilisation, DLC is emerging as a foundational pillar of AI infrastructure strategy.
“Air cooling has had a good run, but it cannot keep up with AI,” Chen concludes. “Those who adopt liquid early will not just run cooler; they will run faster, cheaper, and smarter. And in AI, that makes all the difference.”




