Meta is building a data center the size of Manhattan. How cool is that?


Kevin Roof, Director of Offer & Capture Management at LiquidStack, explores how Meta’s next-generation AI data centers will push cooling technology to its limits. As the company builds facilities on a scale that rivals Manhattan itself, the challenge is not just generating intelligence; it is keeping it from overheating.

Meta is spending hundreds of billions of dollars on a string of multi-gigawatt AI data centers that will be less like traditional business facilities and more like cities in their own right.

That’s not hyperbole. In July, Mark Zuckerberg said that its Hyperion data center would “be able to scale up to 5GW over several years,” and that “multiple more” of these titan clusters would be built.

“Just one of these covers a significant part of the footprint of Manhattan,” the Facebook founder said. To prove his point, he posted a gif that illustrated precisely how one of these massive facilities would fit across the sliver of land that is home to 1.66 million people.

But if running a city like Manhattan is a challenge, keeping it cool is even harder.

Manhattan is a classic example of the urban heat island effect – the phenomenon whereby cities run several degrees hotter than their surroundings, thanks to the built environment.

And that’s without running millions of GPUs and associated equipment.

We know that data center operators already face a massive challenge when it comes to managing the heat their facilities produce – and calming the public’s concern over their potential environmental impact.

So, how should Meta approach the cooling challenge for these truly titanic data centers? Here are some thoughts based on our experience of cooling at scale.

Calculating the options

Current cutting-edge rack designs already stretch into the hundreds of kilowatts, and we can safely assume that Meta is laying its plans with NVIDIA’s roadmap in mind – a roadmap that envisions 1MW racks.

Traditional evaporative cooling is considered a non-starter for data centers of this scale, except for peripheral loads. The amount of water needed alone would be astronomical – on the order of tens of thousands of Olympic-sized swimming pools every year.
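To see why, here is a minimal back-of-envelope sketch. Every parameter – the water usage effectiveness, the assumption that the site runs flat out – is an illustrative guess rather than a Meta figure:

```python
# Back-of-envelope water use for evaporative cooling at 5GW scale.
# Every parameter is an illustrative assumption, not a Meta figure.

SITE_POWER_KW = 5_000_000    # 5GW campus, assumed to run flat out
HOURS_PER_YEAR = 8_760
WUE_L_PER_KWH = 1.8          # assumed water usage effectiveness (litres/kWh)
OLYMPIC_POOL_L = 2_500_000   # roughly 2.5 million litres per pool

annual_water_l = SITE_POWER_KW * HOURS_PER_YEAR * WUE_L_PER_KWH
pools_per_year = annual_water_l / OLYMPIC_POOL_L

print(f"~{annual_water_l / 1e9:.0f} billion litres evaporated per year")
print(f"~{pools_per_year:,.0f} Olympic-sized pools per year")
```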

So liquid cooling is the most practical option for the majority of cooling in these next-generation data centers.

But what type of liquid cooling? Immersion has an edge in pure cooling potential, but it is complex and relatively inflexible. Direct-to-chip is the most likely option here. Indeed, NVIDIA’s own reference designs lean towards direct-to-chip liquid cooling for its cutting-edge platforms.

Opting for direct-to-chip liquid cooling is the easy part, though. Meta’s engineers then face the challenge of implementing it at titanic scale. Taking the right approach from the outset could make their lives easier for years into the future.

Just consider the amount of kit involved. Whether you do some sums on the back of a napkin or drop a few prompts into ChatGPT, Meta’s planned data centers are likely to house between one and two million GPUs spread across ten to twenty thousand racks, along with associated networking and storage.
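For the curious, that napkin maths looks something like the Python sketch below. The all-in power per GPU and the rack densities are illustrative guesses, not published Meta or NVIDIA figures:

```python
# Rough sizing of a 5GW AI campus under two illustrative scenarios.
# None of these parameters are published Meta or NVIDIA figures.

SITE_POWER_W = 5e9  # one 5GW campus

# (all-in watts per GPU, incl. host, network and cooling overhead;
#  GPUs per rack)
scenarios = [(2_500, 144), (4_000, 72)]

for w_per_gpu, gpus_per_rack in scenarios:
    gpus = SITE_POWER_W / w_per_gpu
    racks = gpus / gpus_per_rack
    print(f"{w_per_gpu}W per GPU -> {gpus / 1e6:.2f}M GPUs "
          f"in {racks:,.0f} racks of {gpus_per_rack}")
```

Both scenarios land in the one-to-two-million-GPU, ten-to-twenty-thousand-rack range.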

That is an incredible amount of equipment to deploy and manage. And a correspondingly massive amount of cooling infrastructure to deploy and maintain too. And there will likely be failures or upgrades that require components to be replaced over time.

So, when designing this cooling infrastructure, Meta will need to consider not just the amount of heat it must shift today or next year, but what generations of silicon into the future will demand.

This means Meta will need a cooling solution, or solutions, that doesn’t just offer raw cooling capability. It needs a liquid cooling approach that can evolve alongside its plans.

Even someone with the vast resources of Meta won’t want to tie up capital or delay deploying racks while it waits for cooling components to be delivered. But neither will it want cooling equipment sitting in storage, waiting to be deployed – or in place but underutilized.

It needs partners who can supply at scale, continuously, and who have an engineering ecosystem capable of deploying and installing cooling at the same pace it rolls off the production line.

Even better, Meta should choose a liquid cooling architecture that can be deployed incrementally. Why wait until the whole of Manhattan is covered in racks before flicking the on switch? Why not start serving customers or refining models when the equivalent of Central Park is ready to go online?

A liquid vision?

That’s just the start of the data center lifecycle, though. From that point on, manageability is key. The racks of compute and storage in a data center are designed with redundancy in mind: components are standardized and hot-swappable as far as possible. So, Zuckerberg’s engineers should ensure the same applies across the infrastructure keeping it all cool.

A decoupled system – where the control console is in a separate unit, for example – will give Meta more options when it comes to scaling up over time. If there is a fault in one component, it won’t need to replace the entire unit. So cooling units and individual components, such as pumps or sensors, should be specced in the same way.

And of course, this will all be for nothing if those components are not easily accessible – and spares readily available. Front access will be critical, and will also allow more flexibility when it comes to positioning units and designing the aisles.

Presumably, Meta will have the brains and compute power to develop cutting-edge predictive analytics to manage maintenance. But it will help immensely if the cooling architecture it chooses can serve up the right, real-time information to inform those systems.
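As a hedged illustration of the kind of real-time signal that matters, here is a minimal sketch that watches the supply-to-return temperature delta on one coolant loop and flags drift from a smoothed baseline. The readings and threshold are invented for the example; a real cooling distribution unit would expose its own telemetry:

```python
# Minimal drift monitor for coolant-loop telemetry. The readings and
# threshold are invented for illustration, not real CDU data.

def ewma_drift_monitor(readings, alpha=0.1, tolerance_c=2.0):
    """Yield (delta_t, alert) pairs; alert fires when the supply-to-return
    temperature delta drifts from its smoothed baseline."""
    baseline = None
    for delta_t in readings:
        if baseline is None:
            baseline = delta_t
        alert = abs(delta_t - baseline) > tolerance_c
        yield delta_t, alert
        # Only fold non-anomalous readings into the baseline, so a
        # genuine fault is not absorbed into "normal".
        if not alert:
            baseline = alpha * delta_t + (1 - alpha) * baseline

# Example: a loop slowly losing pump flow, so delta-T creeps upward.
loop_delta_t = [10.0, 10.2, 9.9, 10.1, 10.3, 11.0, 12.4, 13.1]
for reading, alert in ewma_drift_monitor(loop_delta_t):
    print(f"delta-T {reading:4.1f}C  {'ALERT' if alert else 'ok'}")
```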

At a broader level, the heat the cooling system extracts must still go somewhere. Municipal authorities may be grateful for the jobs a data center brings, but they may not be happy if it creates microclimates in the immediate area. Environmental groups and regulators will be on the watch for any adverse environmental impact.

Meta should be thinking from the outset about how this energy can be repurposed, whether for district heating systems or other uses such as industrial processes or heating greenhouses. A cooling partner with a broader vision of how heat can be reused, and the engineering skills to make it happen, will be a great help here too.
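A quick order-of-magnitude sketch shows why this is worth the effort. The recovery fraction and per-home heat demand are loudly labelled assumptions:

```python
# Order-of-magnitude heat reuse sketch. The recovery fraction and
# per-home demand are illustrative assumptions, not measured figures.

SITE_POWER_KW = 5_000_000        # 5GW; nearly all of it ends up as heat
HOURS_PER_YEAR = 8_760
RECOVERABLE = 0.5                # assumed fraction usable at useful temperature
HOME_HEAT_KWH_PER_YEAR = 10_000  # rough annual household heat demand

heat_kwh = SITE_POWER_KW * HOURS_PER_YEAR * RECOVERABLE
homes = heat_kwh / HOME_HEAT_KWH_PER_YEAR

print(f"~{heat_kwh / 1e9:.0f} TWh/year of recoverable heat, "
      f"enough for ~{homes / 1e6:.1f} million homes")
```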

Meta’s data center ambitions are more than just a massive test of its resolve to dominate AI. They represent an engineering bet of truly titanic proportions. And that means they also present the biggest test yet for the liquid cooling sector. It’s in all our interests that it passes.
