Robotics has yet to experience its ‘ChatGPT moment’, but as Mark Venables discovers, the key to unlocking intelligent physical systems lies not in algorithmic genius but in real-world data. With a 100,000-fold data gap separating robot models from their language-based cousins, a new generation of industrial automation is emerging one warehouse pick at a time.
The robots we were promised would fold laundry, make the tea, and carry our shopping. We mainly got arms in warehouses, picking boxes and stacking them onto pallets. But behind that modest reality lies a seismic shift in robotics, driven by an insight familiar to any executive trying to scale AI across their organisation: it is not the model that matters, it is the data.
“AI has progressed tremendously in helping us think, but when it comes to helping us act in the physical world, the progress has been much slower,” says Ken Goldberg, Co-Founder and Chief Scientist at Ambi Robotics and William S. Floyd Distinguished Chair at UC Berkeley. “We have been waiting for robots to arrive that can help in meaningful ways in homes, hospitals, warehouses and beyond.”
The challenge is not just technical; it is dimensional. Whereas vision and language operate in one or two dimensions, robotics operates in at least six. Add to that the need for real-world data and the fragility of simulation, and it becomes clear why humanoid robots have not yet had their ‘ChatGPT moment.’
Simulation can bridge part of the gap. It works well for locomotion and drone control where the physics are relatively tractable. However, once contact, friction and soft materials are involved, as in manipulation tasks, the limits of simulation become painfully apparent. “The contact dynamics involved in manipulation are incredibly nuanced,” Goldberg adds. “Friction, micro-collisions, and small deformations are difficult to simulate accurately. We do not yet have physics engines that can model those behaviours with the level of precision required.”
From internet images to industrial dexterity
For vision and language, the internet was the source of a revolution. ImageNet and Common Crawl offered oceans of labelled data. Robotics, however, suffers a drought. Even video data is of limited use, offering only partial visibility into physical interaction. “These videos are still limited to two dimensions,” Goldberg explains. “They lack the depth, control, and action data required for robotics. We need full 3D scene understanding and action sequencing, which current video data does not provide.”
One early breakthrough came with Dex-Net, a project inspired by Fei-Fei Li’s work on ImageNet. Ambi Robotics collected 15,000 3D object models, generating over 6.7 million synthetic grasp attempts using Monte Carlo simulations. The resulting models could predict grasp success rates from noisy 3D data and performed well in physical trials. One robot, shown a shoe it had never seen before, picked it up successfully in front of Jeff Bezos. That moment helped launch Ambi Robotics and a pivot from research to industrial deployment.
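The Monte Carlo idea behind those synthetic grasp attempts can be sketched in a few lines: sample many noisy perturbations of a nominal grasp and count how often it still succeeds. Everything below, the toy success test included, is an illustrative assumption rather than Dex-Net’s actual pipeline.

```python
import math
import random

def grasp_success_probability(grasp, evaluate_grasp, n_samples=1000,
                              pose_noise_std=0.005):
    """Estimate the probability that a grasp succeeds under sensing and
    actuation noise by sampling perturbed grasp positions (hypothetical sketch)."""
    successes = 0
    for _ in range(n_samples):
        # Perturb the nominal grasp position with Gaussian noise to mimic
        # noisy 3D sensing and actuation error.
        noisy = (grasp[0] + random.gauss(0, pose_noise_std),
                 grasp[1] + random.gauss(0, pose_noise_std))
        if evaluate_grasp(noisy):
            successes += 1
    return successes / n_samples

# Toy success test: the grasp works if it lands within 1 cm of the object centre.
random.seed(0)  # fixed seed so the estimate is reproducible
within_reach = lambda g: math.hypot(g[0], g[1]) < 0.01
p = grasp_success_probability((0.0, 0.0), within_reach, n_samples=2000)
```

Averaging over thousands of such perturbed trials, per object and per candidate grasp, is what turns a static library of 3D models into millions of labelled grasp attempts.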
The result was AmbiSort, a robotic system capable of sorting packages by zip code at commercial throughput levels. These systems are now deployed across the US and have handled over 100 million parcels. More important than the output is what these robots collect: over 200,000 hours of real-world manipulation data. “Each pick, each success or failure, gets recorded,” Goldberg says. “Across all our systems, we have collected 200,000 hours of real-world robotic manipulation data, about a petabyte of data or 22 years’ worth of continuous robot operation.”
This data trove is more than a milestone. It is a blueprint for scaling robotics through the same principles that propelled language and vision AI breakthroughs. For organisations wrestling with the complexity of physical workflows, the lesson is simple and profound: build for repeatability, instrument for feedback, and treat every interaction as a data point.
The data flywheel for physical AI
This accumulation of real-world data is not just archival. It is the foundation for a generative AI system called Prime, trained on just one per cent of the collected data yet outperforming previous models by 16 per cent. Goldberg describes this as the beginning of a “data flywheel” for robotics: “You start with a deployable system using existing methods like simulation and heuristics,” he explains. “Once deployed, the system generates real-world data that feeds into training. Each new model improves performance and unlocks new capabilities.”
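The flywheel Goldberg describes is, at heart, a simple loop: deploy, collect, retrain, redeploy. A minimal sketch, with every function name assumed for illustration:

```python
def data_flywheel(deploy, collect, train, model, cycles=3):
    """Sketch of a data flywheel: each deployment generates real-world data
    that trains a better model, which is then redeployed."""
    dataset = []
    for _ in range(cycles):
        deploy(model)                  # run the current model in production
        dataset.extend(collect())      # log every pick, success or failure
        model = train(model, dataset)  # retrain on the accumulated data
    return model, dataset

# Toy stand-ins: the "model" is just a score that grows with the data seen.
log = []
deploy = lambda m: log.append(m)
collect = lambda: [1, 2, 3]        # three new interaction records per cycle
train = lambda m, d: m + len(d)    # retraining adds the dataset size
model, data = data_flywheel(deploy, collect, train, model=0, cycles=3)
```

The point of the loop is that the dataset compounds: each cycle retrains on everything gathered so far, not just the latest batch.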
That same principle powered the shift from AmbiSort to AmbiStack, a robot trained to pick and place, stacking boxes on pallets with the elegance of a logistics-themed Tetris champion. The challenge here is computational as much as mechanical. Packing boxes optimally in three dimensions is an NP-hard problem. Reinforcement learning offered a solution, with the model learning to achieve a 90 per cent packing density, more than double that of naïve approaches.
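A sense of why packing is NP-hard, and how heuristics still reach useful densities, comes from the classic one-dimensional first-fit-decreasing algorithm, a much-simplified stand-in for the 3D reinforcement-learning approach described above:

```python
def first_fit_decreasing(volumes, bin_capacity):
    """First-fit-decreasing heuristic for bin packing: place each item,
    largest first, into the first bin with room, opening a new bin if none fits."""
    bins = []
    for v in sorted(volumes, reverse=True):
        for b in bins:
            if sum(b) + v <= bin_capacity:
                b.append(v)
                break
        else:
            bins.append([v])
    return bins

# Illustrative box volumes as fractions of a unit pallet layer.
boxes = [0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5]
bins = first_fit_decreasing(boxes, bin_capacity=1.0)
density = sum(boxes) / (len(bins) * 1.0)  # fraction of used capacity filled
```

Heuristics like this run in polynomial time but leave gaps; the gains reported above come from learning to close those gaps on real, irregular parcels rather than idealised volumes.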
These capabilities matter because they address actual pain points in industrial operations. Manual picking and packing are hard to automate, not due to a lack of motivation but because the tasks are deceptively complex. “Humans have evolved extraordinary dexterity and tactile sensitivity over millions of years, but robots have not caught up,” Goldberg continues. “This is Moravec’s Paradox – the idea that tasks that are simple for humans, like picking up objects, are extremely difficult for robots.”
And yet, by collecting and feeding back real-world data, systems like AmbiStack are learning the intricacies of weight distribution, packaging variability, and edge-case scenarios that would derail more brittle solutions. This iterative improvement, built on live deployment rather than theoretical perfection, offers a clear path for physical AI to develop competencies in the real world.
Specialisation before generalisation
There is no shortage of ambition in robotics, but the difference between ambition and progress is often one of focus. Ambi Robotics has chosen a narrow warehouse automation domain precisely because it is manageable, measurable, and operationally valuable. “Rather than focusing on solving general-purpose robotics from the outset, the ultimate robot butler, we should focus on narrower, specific tasks,” Goldberg says. “Starting with a focused application gives us a better shot at success. From there, we can expand and generalise.”
This approach resonates far beyond robotics. The lesson is the same in healthcare, finance, and manufacturing: start with data-rich domains where success can be demonstrated, then scale. Generality, if it comes at all, comes from iteration, not intention.
The Prime model’s scaling curve, echoing the logarithmic improvements seen in large language models, suggests that data accumulation, not architectural novelty, is the key to performance. But it is the nature of the data that matters. Real-world, task-specific data recorded by robots in action has proven far more effective than synthetic approximations or video proxies.
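A logarithmic scaling curve of the kind referenced here can be recovered with ordinary least squares on the log of the dataset size. The numbers below are invented for illustration, not Prime’s actual results:

```python
import math

def fit_log_scaling(data_sizes, scores):
    """Least-squares fit of score = a + b * ln(data_size), the shape of
    scaling curve seen in large language models."""
    xs = [math.log(n) for n in data_sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(scores) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, scores))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical points: each tenfold increase in data adds a constant bump.
sizes = [1_000, 10_000, 100_000, 1_000_000]
scores = [0.60, 0.70, 0.80, 0.90]
a, b = fit_log_scaling(sizes, scores)
```

On a curve like this, each order of magnitude of extra data buys roughly the same performance increment, which is why accumulating real-world interactions matters more than any single architectural tweak.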
What emerges is a roadmap for robotics companies and any enterprise looking to extend AI into the physical world. Systems that learn from interaction, improve over time, and are built for domain-specific excellence are more than tactical investments; they are strategic differentiators.
From robotic arms to ageing societies
This might seem like an esoteric corner of logistics, but its implications are broader. The problem Ambi is addressing is not unique to e-commerce. Healthcare, elder care, and agriculture are all domains where physical interaction is essential, and labour is increasingly scarce. “Most of our engagement is online, but we still have physical bodies that require interaction with the physical environment,” Goldberg says. “We need to move, make, and maintain things, both our environment and ourselves.”
As populations age and workforces shrink, physical AI must step in. But it will not arrive as a general-purpose humanoid robot with catch-all capabilities. It will arrive, as software did, through narrow verticals, honed by real-world data and refined through repetition.
The challenge ahead is not just to build better robots but to build better systems for generating, storing, and learning from physical interaction data. As Goldberg notes, “The robot data gap is enormous; robots have on the order of 100,000 times less data than language models. But with approaches like the data flywheel, we can start closing that gap one step at a time.”
The future of robotics will not be built by mimicking human form or behaviour. It will be built by solving human problems. It will not be sparked by a single breakthrough but by thousands of systems solving real-world tasks more effectively over time. That means shifting focus from idealised visions of humanoid assistants to the systems already at work in warehouses, labs, and factories, systems that learn, adapt, and scale through data.
Those robots may not fold laundry or make tea, but they will do something more valuable: they will extend the reach of human capability, not by replacing people but by enabling them to focus on what matters most. The rest, as always, will follow.