Enterprises have spent two decades optimising cloud delivery, yet the real constraint is no longer provisioning; it is permission, governance, and trust. If “billions of agents” is more than marketing theatre, the next fight is about boundaries, evaluation, and whether anyone dares to let automation touch the real business.
Las Vegas keynotes are designed to dazzle, but speaking on the main stage at AWS re:Invent 2025, Matt Garman, CEO of Amazon Web Services, kept returning to something far less theatrical and far more operational. He framed the current moment not as another leap in model capability, but as a structural shift in how software itself behaves inside organisations. The centre of gravity is moving from systems that respond to systems that act, and that change brings with it a completely different set of risks and organisational tensions.
“AI assistants are starting to give way to AI agents that can perform tasks and automate on your behalf,” Garman said. “This is where we are starting to see material business returns from AI investment. I believe that the advent of AI agents has brought us to an inflection point in AI strategy. Industry-wide, it is turning from a technical wonder into something that delivers real value.”
The claim sounds bold, but the subtext is cautious rather than triumphalist. Garman acknowledged that the promised returns of generative AI have not yet fully materialised across most enterprises, and that the technology has moved faster than organisational confidence. The missing piece is not intelligence or compute; it is reliability, predictability, and the willingness of leadership teams to let machines touch real business processes without constant human supervision.
Agents as operating model
The idea of billions of agents risks sounding like conference hyperbole until you consider what it actually implies for how organisations scale and structure work. Once the marginal cost of execution collapses, companies stop automating isolated tasks and start redesigning entire workflows around machine-scale labour. Software stops behaving like a support layer and starts behaving like a parallel workforce that can act continuously, in the background, and at volumes no human organisation could ever sustain.
“In the future, there is going to be billions of agents inside of every company and across every imaginable field,” Garman said. “Already we see agents accelerating healthcare discoveries, improving customer service, making payroll processing more efficient. Agents are starting to scale people’s impact up by ten times in some cases, so they have more time to invent more.”
This is not a story about better interfaces or more natural conversations. It is a story about labour economics inside digital organisations, and about the redistribution of decision-making away from people and into systems. The strategic consequence is that management shifts from coordinating workers to directing outcomes, while execution becomes something that happens automatically once intent is expressed in machine-readable form.
The real disruption is not that agents can do more, but that they no longer need to wait. They do not pause for meetings, approvals, or working hours, and they do not forget context when projects stretch over weeks or months. The organisational challenge becomes how to remain in control of systems that are permanently active and permanently capable of acting.
Autonomy breaks traditional software
Garman was explicit that autonomy is not simply an upgrade to existing software paradigms, but a break from them. The foundations of software engineering, built around determinism and predictable execution, do not translate cleanly into systems that reason dynamically and choose their own paths through complex tasks.
“These agents work in non-deterministic ways, which is part of what makes them so powerful,” he said. “But it also means that the foundation and tools that got us to where we are building software are not necessarily the ones that we need for agents.”
Traditional systems fail in ways that are technically legible. They crash, return errors, or produce outputs that can be traced back to specific bugs. Agentic systems fail in more ambiguous ways, by making plausible decisions that turn out to be wrong, inappropriate, or harmful only in hindsight. The system may be behaving exactly as designed yet still generate outcomes that the organisation cannot accept.
“What makes agents powerful is this ability to reason and act autonomously,” Garman said. “But that also makes it hard for you to have complete confidence that your agents are not going to stray out of bounds.” This introduces a fundamentally new class of enterprise risk. The problem is no longer whether software executes correctly, but whether it behaves acceptably, and whether anyone can prove why it behaved the way it did when something goes wrong.
Trust becomes infrastructure
One of Garman’s strongest arguments was that trust can no longer be treated as a governance problem layered on top of technology. It must be engineered directly into the architecture of agent systems, or autonomy simply cannot scale.
“One big challenge that we have seen when you are building agents is how do you get them to behave predictably and in line with your intents,” he said. “Customers struggle with this. You can embed policies directly in your agent’s code, but because agents generate and execute their own code on the fly, these safeguards are really best effort and can only provide weak guarantees.”
In other words, prompt-level controls and internal logic checks are not governance, they are aspirations. The only way to make autonomy survivable at enterprise scale is to separate decision-making from enforcement, so that agent actions are intercepted and evaluated before they reach real systems. “This policy enforcement is outside of your agent’s application code,” Garman said. “It sits in between your agent and all of your data and your APIs and tools, so you can predictably control their behaviour.”
This shift is subtle but profound. Control becomes external, auditable, and independent of the model’s internal reasoning. Enterprises stop trying to make agents safe purely through training and alignment and instead build hard boundaries that cannot be bypassed by clever reasoning or unexpected behaviour.
Observability is not judgement
Even with policy enforcement, Garman argued that most organisations lack the tools to understand whether agents are behaving well over time. Operational metrics such as latency, error rates, and throughput say nothing about the quality of decisions being made.
“You want to know things like: are your agents making the right decisions? Are they using the best tool for the job? Are their answers correct and appropriate? Are they even on brand?” he said. “These are things that are super hard to measure today.”
This creates a new operational discipline where enterprises must evaluate not just system performance, but system judgement. Correctness, helpfulness, and harm become live metrics that need to be monitored continuously, because behaviour can drift even when the underlying infrastructure remains stable. “You only know how your agents are going to react and respond when you have them out there in the real world,” Garman said. “That means you have to continuously monitor and evaluate behaviour in real time and then quickly react if you see them doing something you do not like.”
The management analogy he used is revealing, because it treats agents less like software and more like employees. “At Amazon we say trust but verify. I trust our teams to go and invent, but I also have mechanisms that allow me to dive deep and inspect when things are on track,” he said. “The same thing applies to agents.” Autonomy is granted, but under constant review, with the assumption that systems will sometimes behave in ways that require intervention.
Model choice is not the real battle
Despite extensive discussion of model capabilities, Garman dismissed the idea that any single model will dominate the enterprise landscape. Instead, he described a future built around portfolios of models, each optimised for different tasks, risks, and cost profiles. “We have never believed there was going to be one model to rule them all,” he said. “There will be a tonne of great models out there.”
This reflects how agent systems operate in practice. Different workflows require different trade-offs between reasoning depth, latency, cost, and modality, and no single model can optimise all of those simultaneously. The enterprise problem becomes orchestration rather than selection, and the strategic asset becomes the ability to switch models without rebuilding systems.
Model choice matters only when it is decoupled from infrastructure. The real value lies in being able to experiment, replace components, and adapt to rapid changes in the model landscape without destabilising core business processes.
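Decoupling workflows from specific models usually means routing through a thin indirection layer. The sketch below is a hypothetical illustration of that orchestration idea (the router and profile names are invented): callers depend on a task profile, and the model backing each profile can be swapped without touching any calling code.

```python
from typing import Callable

class ModelRouter:
    """Maps task profiles to whichever model currently fits their
    cost, latency, and reasoning-depth trade-offs.

    Swapping a model is a rebind at the router, not a rebuild of the
    workflows that use it.
    """
    def __init__(self) -> None:
        self._routes: dict[str, Callable[[str], str]] = {}

    def bind(self, profile: str, generate: Callable[[str], str]) -> None:
        """Attach (or replace) the model behind a task profile."""
        self._routes[profile] = generate

    def run(self, profile: str, prompt: str) -> str:
        return self._routes[profile](prompt)

router = ModelRouter()
# Stand-in callables; in practice these would wrap real model clients.
router.bind("cheap-fast", lambda p: f"[small-model] {p}")
router.bind("deep-reasoning", lambda p: f"[frontier-model] {p}")

# Later, adapt to a changed model landscape without touching callers:
router.bind("cheap-fast", lambda p: f"[newer-small-model] {p}")
```

This is the orchestration-over-selection point in miniature: the strategic asset is the stable profile interface, not any one model behind it.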
Infrastructure becomes the real constraint
As agents move into production, infrastructure stops being background and becomes a limiting factor on what organisations can safely deploy. Agents require continuous retrieval, fast inference, persistent memory, experiment tracking, and predictable cost structures that support constant iteration. “Getting to a future of billions of agents is going to require us to push the limits of what is possible with the infrastructure,” Garman said. “We are going to have to invent new building blocks for agentic systems.”
In this world, inference becomes part of every application, not a specialised service. Data storage, vector retrieval, and experiment management become core operating capabilities rather than supporting tools. “In the future, inference is such an integral part of every single application that everyone builds,” Garman said. “You need a secure, scalable, feature rich inference platform.”
The deeper implication is that AI stops being a layer and becomes the substrate of enterprise systems. The question is no longer whether organisations adopt AI, but whether their infrastructure can support autonomous behaviour without collapsing under operational and governance risk.
Autonomy is the new enterprise risk
Garman’s keynote ultimately converged on a simple but unsettling reality. Agentic systems change the nature of business risk in ways that most organisations are not yet prepared to manage. Traditional software risk is technical and largely predictable.
Agentic risk is behavioural, probabilistic, and often only visible after damage has occurred. Agents can make decisions, act on systems, and persist over long periods without human intervention, which means mistakes scale as easily as successes. “You want to give agents autonomy,” Garman said. “But you also want to put some ground rules in place to avoid major issues.”
The future of enterprise AI will not be decided by who builds the smartest agents. It will be decided by who can deploy them without losing control, without breaking trust, and without creating systems that nobody can fully explain when they go wrong. In that sense, the agent era is not primarily a technology story. It is a governance story disguised as innovation, where the real competitive advantage lies not in intelligence, but in the ability to survive autonomy.