The network is no longer the background

Artificial intelligence is now colliding with physical reality in ways most enterprises are unprepared for. As AI systems become distributed, latency-sensitive, and operationally autonomous, the true bottleneck is no longer compute but the networks that connect everything together.

For more than a decade, AI infrastructure has been framed almost entirely around compute. The dominant narrative has been one of larger models, denser silicon, and hyperscale data centres absorbing ever greater volumes of power and capital. What is far less discussed is that AI is now hitting limits that cannot be solved by simply buying more GPUs. Power availability, real estate constraints, and energy costs are forcing AI systems to fragment across regions, campuses, industrial sites, and sovereign environments. Once that fragmentation begins, the system stops being a data centre problem and becomes a network problem.

Hemant Malik, Head of Technology Strategy and Portfolio at Nokia, argues that this shift fundamentally changes how AI infrastructure should be understood. The most common misconception, he suggests, is that AI can still be treated as a centralised capability, even as its physical footprint becomes increasingly distributed.

“By physical limits you mean power, real estate, and the physical infrastructure for networks,” Malik says. “There is no single centralised capability anymore because of those constraints. When you distribute, and you must distribute, the most critical thing becomes the network. Compute and GPUs are obviously important, but if you do not have a matching network that can move data across distributed compute and storage facilities, the whole system breaks. You do not get the experience, the performance, or the cost benefits that AI deployments need today.”

The consequence of this shift is subtle but profound. In traditional IT systems, the network existed to serve applications. In modern AI systems, applications exist only to the extent that the network can sustain continuous, high-bandwidth, low-latency coordination between compute, storage, edge devices, and users. The network stops being plumbing and becomes the system itself.

From scale to coordination

The physical constraints shaping AI are forcing a break with the architectural assumptions of the cloud era. Hyperscale facilities are still growing, but they are no longer sufficient on their own. Training, inference, data ingestion, and real-time control are increasingly spread across many locations, each optimised for different cost, latency, and sovereignty requirements. This means that AI performance is now determined less by how powerful any single site is, and more by how well the system behaves as a whole.

Malik argues that this is where most infrastructure strategies are still dangerously misaligned with reality. The industry continues to focus on scale, when the real challenge has become coordination. “There is a lot of focus on compute, but without equally robust, low-latency connectivity, everything downstream collapses,” he continues. “You will not get the requisite performance or the economic efficiency that AI needs. The network is what makes distributed AI usable at all.”

In a distributed system, the value of any GPU cluster is conditional on its ability to exchange data with other clusters in real time. Training workloads depend on synchronisation across thousands of processors. Inference pipelines rely on immediate access to contextual data that may be physically located elsewhere. Edge systems generate continuous telemetry streams that must be aggregated and acted upon within tight time windows. None of this works if the network becomes a bottleneck.

“If the network stalls, GPUs sit idle and everything stops,” Malik says. “Training and inference both require petabytes of continuous data movement. The network determines whether that compute is actually productive.” This is why the old idea of the cloud as a single destination is becoming obsolete. AI traffic does not move along a straight line from user to hyperscaler and back again. It flows dynamically between edge, regional, and central systems depending on what the model is doing at any given moment.
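The economics of idle GPUs can be made concrete with some back-of-envelope arithmetic. The sketch below is illustrative only; the function name and all numbers are hypothetical examples, not figures from the article.

```python
# Illustrative only: how exposed network time erodes effective GPU
# utilisation in a distributed training step. Numbers are hypothetical.

def effective_utilisation(compute_s: float, comm_s: float, overlap: float = 0.0) -> float:
    """Fraction of wall-clock time GPUs spend computing per training step.

    compute_s: seconds of pure GPU compute per step
    comm_s:    seconds of network communication (gradient sync, data fetch)
    overlap:   fraction of communication hidden behind compute (0..1)
    """
    exposed_comm = comm_s * (1.0 - overlap)
    return compute_s / (compute_s + exposed_comm)

# A step with 80 ms of compute and 80 ms of un-overlapped synchronisation
# stalls half the cluster; a fabric that hides 75% of the sync recovers
# most of that lost capacity.
print(effective_utilisation(0.08, 0.08))        # half the cluster is waiting
print(effective_utilisation(0.08, 0.08, 0.75))  # most of the sync is hidden
```

The point of the toy model is that buying faster GPUs only shrinks `compute_s`, which makes the exposed communication time proportionally worse, not better.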

“Heavy training still goes to the core. Inference stays closer to the user. Most real workloads mix both continuously,” Malik adds. “The network must support all of that at the same time.”

From programmable to autonomous

The next stage of this evolution is what Malik describes as AI native networks. These are not simply faster or more programmable networks. They are systems that observe their own behaviour, predict future demand, and reconfigure themselves continuously without human intervention.

“An AI native network senses and acts on its own,” Malik says. “It exports real-time telemetry from the switches and the equipment, uses embedded models to predict hotspots, and then autonomously tunes itself using closed-loop automation. Those models are trained on real traffic patterns, service requirements, and user behaviour.”

This represents a shift from human-managed infrastructure to machine-managed infrastructure. Instead of engineers defining policies and reacting to incidents, the network itself becomes the decision-making layer. It monitors performance, anticipates congestion, reallocates capacity, and corrects faults before users experience them. “If you look at maturity on a scale of one to five, most organisations today are at around two or two and a half,” Malik explains. “By the end of the decade, most will be around four or four and a half.”
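The sense-predict-act pattern described above can be sketched in a few lines. This is a deliberately minimal illustration of the closed-loop idea, not Nokia's implementation; the class, thresholds, and link names are all hypothetical, and the "embedded model" is stubbed out as a naive linear extrapolation.

```python
# Minimal sketch of a closed control loop:
# sense (telemetry) -> predict (model) -> act (reconfigure).
# All names and numbers are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class LinkTelemetry:
    link_id: str
    utilisation: float   # current load, 0..1
    trend: float         # change in utilisation per polling interval

def predict_utilisation(sample: LinkTelemetry, horizon: int = 5) -> float:
    """Stand-in for an embedded model: naive linear extrapolation."""
    return min(1.0, sample.utilisation + sample.trend * horizon)

def control_loop(samples: list[LinkTelemetry], hot_threshold: float = 0.9) -> list[str]:
    """Return reroute actions for links predicted to become hotspots."""
    actions = []
    for s in samples:
        if predict_utilisation(s) >= hot_threshold:
            actions.append(f"reroute-away-from:{s.link_id}")
    return actions

telemetry = [
    LinkTelemetry("edge-metro-1", 0.60, 0.02),   # stable
    LinkTelemetry("metro-core-4", 0.70, 0.05),   # heading for congestion
]
print(control_loop(telemetry))  # ['reroute-away-from:metro-core-4']
```

The key property is that the action fires on the *predicted* state, not the observed one, which is what separates closed-loop automation from conventional threshold alarms.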

In this world, reliability also changes meaning. It is no longer just about uptime. It becomes about data integrity, predictability, and correctness under continuous load. “It is not only about five nines availability,” Malik says. “It is predictive assurance, packet-level rollback, inline integrity checks, forward error correction, and AI-driven anomaly detection to catch silent data corruption. That is what mission critical means in AI systems.”

Silent corruption is particularly dangerous in AI contexts. A corrupted packet can poison a training run, distort a model, or generate incorrect decisions at scale. The network therefore becomes responsible not just for delivery, but for the semantic integrity of the data itself.
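The principle behind an inline integrity check is simple to demonstrate. Real fabrics do this with forward error correction and hardware CRCs at line rate; the sketch below only illustrates the detection idea using Python's standard `zlib.crc32`, and the framing format and payload are invented for the example.

```python
# Illustrative inline integrity check: append a CRC32 to each payload so
# the receiver can detect a silent bit flip. Framing format is hypothetical.

import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC32 checksum so corruption in transit is detectable."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(framed: bytes) -> bytes:
    """Return the payload, or raise if a bit flipped in transit."""
    payload, crc = framed[:-4], int.from_bytes(framed[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("silent data corruption detected")
    return payload

sent = frame(b"gradient shard 17")
assert verify(sent) == b"gradient shard 17"

# Flip a single bit en route: the check catches what a training job,
# left to its own devices, would silently absorb into the model.
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]
try:
    verify(corrupted)
except ValueError as err:
    print(err)  # silent data corruption detected
```

Without the check, the corrupted bytes would be delivered and consumed as if they were valid, which is exactly the failure mode that poisons training runs.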

Latency becomes the business logic

As AI systems interact more directly with physical processes, latency stops being a technical metric and becomes a business variable. Where intelligence is executed determines whether applications are viable at all, particularly in robotics, healthcare, industrial automation, and critical infrastructure.

“The goal is to move service delivery points closer to the user,” Malik says. “Not everything should go to the core, and not everything should be served from the edge. It depends on latency requirements. The right hierarchy in the network must deliver the service.”

This leads to architectures built around elastic regional backbones, capable of dynamically shifting traffic between edge, metro, and central data centres. Rather than a single highway to the cloud, Malik describes a mesh of express lanes, continuously steered by software. “You need capacity at the edge, high-speed fibre into neighbourhoods, campuses, and industrial sites, and then software and AI-enabled traffic steering between distributed locations,” he says. “The network must deliver performance where it is needed, when it is needed.”
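The hierarchy Malik describes amounts to a placement rule: serve each workload from the deepest tier whose round-trip latency still fits its budget. The sketch below illustrates that rule with invented tier latencies; the tier names and numbers are assumptions for the example, not measured values.

```python
# Toy placement rule: pick the most centralised tier that still meets
# the workload's latency budget. Tier round-trip times are hypothetical.

TIER_RTT_MS = {"edge": 2, "metro": 10, "regional": 25, "core": 60}

def place(latency_budget_ms: float) -> str:
    """Return the deepest tier whose RTT fits the budget; fall back to
    the edge when the budget is tighter than any tier can offer."""
    for tier in ("core", "regional", "metro", "edge"):
        if TIER_RTT_MS[tier] <= latency_budget_ms:
            return tier
    return "edge"  # serve as close to the user as physically possible

print(place(100))  # core   - batch training tolerates distance
print(place(15))   # metro  - interactive inference
print(place(1))    # edge   - closed-loop control cannot leave the site
```

Deeper tiers are preferred because centralised capacity is cheaper per unit of compute; the latency budget is what forces workloads outward toward the edge.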

In some applications, even ten milliseconds is already too slow. Closed-loop control systems in robotics and autonomous machinery require sub-millisecond decision cycles, and in these contexts the network becomes the limiting factor on what AI can do at all. “Closed-loop decisions depend entirely on the network,” Malik says.

This reframes infrastructure planning. It is no longer about building the biggest data centre, but about designing the right spatial distribution of intelligence.

Sustainability moves into the network

The sustainability debate around AI is usually framed around compute efficiency and data centre energy consumption. Malik argues that the network is where some of the most meaningful gains can be achieved. “The network is the easiest place to add performance per watt without sacrificing compute cycles,” he says. “With the right photonic switching and silicon design, you reduce amplification, cut long-haul transport energy, and push much higher throughput at much lower power per bit.”

These gains are not limited to hardware. Architectural decisions also determine how much data needs to move, how far it travels, and how often it is duplicated across systems. “Do not overload central data centres,” Malik adds. “Move workloads optimally, use edge sites, integrate intent-based traffic steering. That reduces both power footprint and carbon footprint.”

In this view, sustainability is not an overlay applied after infrastructure is built. It emerges from how networks are designed, how intelligence is placed, and how traffic flows are orchestrated. “The right architecture resolves the contradiction between performance and sustainability,” Malik says. “It is not just how components integrate; it is how the network equipment itself is built.”

Networks as strategic differentiators

Perhaps Malik’s most radical argument is that networks must be reclassified from cost centres to strategic assets. In an AI economy, the network determines whether expensive investments in compute generate value or remain stranded. “Enterprises need to stop thinking of the network as a cost centre and start thinking of it as a business enabler,” Malik continues. “Compute grabs the headlines, but it is the network that moves the data, stitches together cloud and edge, and delivers the split-second responses that AI systems require.”

This changes how infrastructure should be governed. Connectivity becomes part of application architecture, not a procurement decision handled by a separate team. “Without the right network fabric, AI simply cannot happen,” Malik says.

In that sense, networks are no longer invisible. They become visible, strategic, and economically decisive. AI does not fail because models are not powerful enough. It fails because the systems that move data cannot keep up with what intelligence now demands. “For years, networks were the silent glue,” Malik concludes. “Now they are becoming the system itself.”
