The network is no longer infrastructure, it is the constraint on AI


AI is not failing at the model layer, it is failing in motion, in the movement of data between systems that were never designed to operate at this scale. The more compute we deploy, the more brutally the network is exposed as the system that ultimately decides what AI can and cannot become.

The industry continues to talk about AI as if the problem is compute, but the reality emerging inside production environments is far less convenient. Compute scales aggressively, accelerators become denser, and models grow larger, yet performance gains stall because the network cannot keep pace with the volume, velocity, and synchronisation demands being placed upon it. What appears to be a question of processing power is increasingly revealed as a problem of coordination, timing, and the ability to move data without friction across distributed systems that must behave as one.

Rami Rahim, EVP, President and GM of HPE Networking at Hewlett Packard Enterprise, does not frame this as a marginal inefficiency or a tuning exercise. His perspective is shaped by nearly three decades in networking, spanning the internet boom, the shift to mobile, and the rise of the data centre, yet he positions the current moment as more consequential than any of those transitions. “Networking has never been more important,” he says. “One GPU is perfect for a great gaming experience, but when you connect 100,000 GPUs together through a high-performance, low-latency network, you are no longer talking about entertainment, you are talking about accelerating scientific discovery and solving problems that were previously out of reach.”

That scale introduces a form of fragility that traditional architectures were never designed to handle, where latency, packet loss, or jitter are not minor inefficiencies but system-level risks that cascade across environments. Rahim is explicit about how quickly expectations have shifted, noting that “the capabilities we used to prototype in labs are now must-haves, virtual assistants, liquid cooling systems, co-packaged optics, these are no longer science projects, they are requirements for operating at the level modern workloads demand.” The implication is clear: networking is no longer a supporting layer that enables performance, it is the layer that determines whether performance is even possible.

From management to autonomy

The operational consequences of this shift are already visible inside enterprise environments, where networks are being asked to absorb increasing complexity without a corresponding expansion in human capacity. IT teams are not just managing infrastructure, they are attempting to maintain stability in systems that are becoming more dynamic, more distributed, and less predictable with each iteration. This creates a growing disconnect between what networks are required to do and what humans can realistically oversee in real time.

Rahim frames the response in terms that move beyond optimisation into necessity. “Networks are being asked to do more, and the people managing them are being asked to do more with less,” he continues. “By delivering self-configuring, self-optimising, self-healing networks, we give those teams the ability to focus on strategic objectives rather than constantly reacting to issues. This is not about reducing workload, it is about redefining it, shifting the role of IT from reactive management to strategic oversight.”

The idea of a self-driving network is often positioned as a future state, but here it emerges as a structural requirement driven by the speed and complexity of modern workloads. Rahim reinforces this with operational outcomes, explaining that customers adopting HPE’s AI-driven platforms are seeing up to a 90 percent reduction in trouble tickets and on-site service calls, along with a significant reduction in the time spent on hands-on network operations. These are not incremental improvements; they represent a fundamental shift in how networks are operated and maintained.

At the centre of this transition are platforms such as Aruba Central and Juniper Mist, which Rahim positions as complementary rather than competing sources of intelligence. “For more than a decade, these platforms have been analysing data from billions of connected devices, and by combining them we accelerate the pace of learning and move closer to a truly self-driving network,” he explains. “The network, in this model, becomes a continuously evolving system, learning from its own behaviour and adapting in ways that reduce the need for human intervention.”

Two problems, not one

One of the more persistent misunderstandings in the current discourse is the tendency to treat AI networking as a single challenge, when in reality it consists of two distinct and equally complex problems. Rahim separates these explicitly, arguing that conflating them leads to incomplete solutions that fail under real-world conditions. “Powering these experiences requires a new generation of networking, one that includes AI for networks, the intelligence that keeps them running smoothly, and networks for AI, which are built to handle the scale and speed of modern workloads,” he says.

AI for networks operates in the domain of intelligence, where the focus is on visibility, prediction, and automated decision making. It transforms the network from a passive system into one that actively interprets conditions and acts based on those interpretations. Rahim highlights how systems such as Marvis enable this shift, allowing organisations to query future performance scenarios and receive responses that combine real time analysis with predictive modelling, effectively turning the network into a decision engine rather than a reporting tool.

“For more than a decade, our platforms have been analysing data from billions of devices, and that accumulation of intelligence is what allows us to deliver faster insights, smarter operations, and better experiences,” he explains. “The significance of this is not just in the data itself, but in how it is used to inform actions, reducing the gap between observation and response.”

Networks for AI, however, exist within the constraints of physical infrastructure, where the challenge is not interpretation but execution under extreme conditions. Rahim is clear about how different these workloads are from traditional enterprise applications. “Training models require massive east-west traffic flows, inference requires ultra-low latency, and both demand scale and reliability that traditional networks were not designed for,” he continues. “This creates pressure across every layer of the architecture, forcing a reconsideration of how systems are built and interconnected.”
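The scale of those east-west flows is easy to underestimate. A rough back-of-envelope calculation, not drawn from the article and using illustrative numbers, shows why gradient synchronisation during data-parallel training puts the network rather than the accelerators under pressure: in a standard ring all-reduce, every GPU must move roughly twice the gradient payload across the fabric on every training step.

```python
# Illustrative sketch: per-step gradient-synchronisation traffic for
# data-parallel training using a ring all-reduce. All figures are
# hypothetical examples, not measurements from any vendor.

def allreduce_bytes_per_gpu(num_params: float, bytes_per_param: int, num_gpus: int) -> float:
    """Bytes each GPU sends (and receives) per ring all-reduce.

    A ring all-reduce moves 2 * (N - 1) / N times the payload per GPU:
    one reduce-scatter pass plus one all-gather pass.
    """
    payload = num_params * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * payload

# Example: a 70B-parameter model with fp16 (2-byte) gradients on 1,024 GPUs.
per_gpu = allreduce_bytes_per_gpu(70e9, 2, 1024)
print(f"{per_gpu / 1e9:.0f} GB moved per GPU per training step")
```

At these volumes, repeated on every step, even a modest shortfall in fabric bandwidth or a transient burst of packet loss stalls the entire synchronised cluster, which is the mechanical reason the network becomes the limiting factor rather than compute.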

The movement of inference closer to the edge illustrates how these pressures are reshaping infrastructure. As data becomes more distributed, the network must support processing that occurs across multiple locations while maintaining consistency and performance. “As inference moves closer to where the data is, smaller, more flexible edge systems become critical, and that is driving demand for compact, high performance routing solutions designed for these environments,” Rahim says. “This is not an extension of existing models, it is a redistribution of capability that introduces new forms of complexity.”

Where performance meets physics

As networks scale to meet the demands of AI workloads, they encounter constraints that cannot be resolved through software alone. Thermal management, power efficiency, and physical density become defining factors, shaping what is possible within data centre environments. These are not secondary considerations; they are fundamental limitations that influence how infrastructure is designed and deployed.

Rahim is candid about the difficulty of operating at this level, particularly when it comes to cooling. “Getting liquid cooling right for production environments is extremely hard, and it requires a level of expertise that cannot be improvised,” he explains. “This is not simply a technical challenge, it is an operational one, requiring coordination across multiple domains to ensure that performance can be sustained without compromising stability.”

The integration of high-performance networking with advanced cooling and AI-driven operations reflects a broader convergence within infrastructure, where previously separate concerns are becoming tightly interconnected. Rahim describes how combining these elements allows systems to operate at the scale required by next-generation workloads while maintaining manageability, but the underlying message is that these components can no longer be considered in isolation.

Security follows a similar trajectory, moving from a discrete layer to an embedded property of the network itself. Rahim rejects the idea that security can be added on after the fact, arguing that in an environment where all activity flows through the network, it must be built into the system from the start. “Every attack leverages the network, which means the network must also be used to sense, detect, and stop those attacks, and that only works if security is built in rather than bolted on,” he says.

This reframing positions the network as both the point of vulnerability and the mechanism of defence, requiring a level of integration that extends across all layers of the system. It also reinforces the broader theme that networking, security, and operations are no longer distinct disciplines, but interconnected aspects of a single, evolving architecture.

The implications of these shifts extend beyond technical design into the fundamental understanding of what infrastructure represents in the AI era. The network is no longer a passive layer that enables applications to function, it is the active system that determines whether they can scale, whether they can operate reliably, and whether they can deliver on their intended outcomes. Rahim returns to this point in his closing remarks, grounding the discussion in real world impact. “Every day, our networks connect the moments that matter, from healthcare providers analysing real time patient data to students in connected classrooms to retailers fulfilling orders in seconds, and all of these experiences depend on a network that can operate at scale, adapt in real time, and remain secure,” he concludes.

The narrative around AI continues to prioritise models and compute, yet the operational reality points elsewhere, to the infrastructure that must support these systems under sustained pressure. Without networks that can match the scale, speed, and intelligence of modern workloads, progress does not just slow, it stalls entirely. The constraint is no longer hidden in the background, it is visible in the limits of performance, in the challenges of scaling, and in the growing recognition that the network is not simply part of the equation, it is the factor that ultimately defines what AI can achieve.
