The push to embed artificial intelligence into mobile devices is accelerating, and at its centre is a set of low-level CPU instructions few consumers will ever hear about, but which could soon define how smartphones handle AI. Arm, the British chip design company behind the architecture powering many of the world’s phones, has introduced Scalable Matrix Extension 2 (SME2), a new capability designed to supercharge AI performance directly on mobile CPUs.
As mobile developers race to integrate more advanced AI features into apps, from real-time image enhancements to offline voice assistants, the pressure is mounting to deliver speed and efficiency while keeping power consumption in check across diverse hardware environments. SME2, an extension to Arm’s Armv9 architecture, targets precisely these needs by accelerating the matrix-heavy workloads used in computer vision and generative AI tasks, all without adding extra complexity for developers.
Crucially, these gains are delivered not through specialised chips, but through the standard CPUs already in use across smartphones. This is part of a broader shift toward heterogeneous computing, in which AI workloads are distributed across CPUs, GPUs, NPUs, and other processing units depending on context and efficiency.
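To see why matrix instructions matter here: Arm’s SME family computes a matrix product as a sequence of outer products accumulated into a tile of registers. As a purely conceptual sketch, not Arm code, the same decomposition can be written in plain Python: C = A × B equals the sum, over k, of the outer product of A’s k-th column with B’s k-th row.

```python
def matmul_outer_product(a, b):
    """Multiply matrices a (m x k) and b (k x n) one outer product at a time,
    mimicking the accumulate-into-a-tile pattern that SME/SME2 hardware uses."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]          # the "tile" of accumulators
    for step in range(k):                      # one outer product per step
        col = [a[i][step] for i in range(m)]   # step-th column of A
        row = b[step]                          # step-th row of B
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]     # accumulate into the tile
    return c
```

The point of the hardware version is that each outer-product step updates an entire tile of results in one instruction, rather than element by element as above.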
A silent update with big implications
Perhaps the most significant aspect of Arm’s SME2 rollout is that developers do not need to rewrite or adapt their code to benefit. SME2 is automatically activated via Arm’s KleidiAI, a software acceleration layer that sits within widely used Android AI frameworks and runtime libraries. Through integration with Google’s XNNPACK, Microsoft’s ONNX Runtime, Alibaba’s MNN, and others, SME2 routes matrix operations directly to the enhanced hardware instructions, streamlining performance without developer intervention.
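The routing itself can be sketched in miniature. The following is a conceptual illustration with hypothetical names, not KleidiAI’s actual API: an acceleration layer probes the CPU once, then binds callers to the fastest available kernel, so application code never changes. The feature probe is stubbed out here so the sketch runs anywhere; a real library would query hardware capability bits.

```python
def cpu_supports_sme2():
    # Stubbed for portability; a real implementation reads CPU feature flags
    # (for example, via the kernel's auxiliary vector on Android/Linux).
    return False

def matmul_portable(a, b):
    # Baseline kernel: plain matrix multiply that works on any CPU.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matmul_sme2(a, b):
    # Placeholder for a kernel built on SME2 instructions; delegates to the
    # baseline here so the sketch remains runnable without SME2 hardware.
    return matmul_portable(a, b)

# Chosen once at startup; application code simply calls `matmul`.
matmul = matmul_sme2 if cpu_supports_sme2() else matmul_portable
```

The application calls `matmul` without knowing which kernel runs underneath, which is, in spirit, what KleidiAI’s integration into XNNPACK and ONNX Runtime does on the developer’s behalf.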
This silent optimisation means that once SME2-compatible hardware reaches users’ hands, apps already using supported frameworks will see automatic boosts in speed and efficiency. For example, Google’s Gemma 3 model reportedly delivers six times faster response times on SME2-enhanced hardware. Tasks like summarising 800-word documents can now begin producing output in under one second, all running locally on a single CPU core.
These improvements come at a time when the AI arms race is moving rapidly toward the edge. On-device AI reduces reliance on cloud processing, unlocking lower latency and increased privacy while cutting dependency on network connectivity. One leading software vendor has already committed to shifting much of its application’s token generation from the cloud to mobile, citing advances such as SME2 as a tipping point.
Building AI ecosystems across platforms
The broader implications extend beyond Android. SME2 benefits are already available on the latest Arm-based iOS devices. In total, nine million apps are currently running on Arm, supported by a developer ecosystem more than 22 million strong. By embedding SME2 into the software stack itself, Arm is aiming to unify the developer experience and enable consistent AI performance across ecosystems.
This cross-platform strategy underlines a key point: AI is no longer a value-add for mobile apps; it is becoming foundational. Developers building on top of KleidiAI-enhanced frameworks will be positioned to deliver faster, richer AI features without navigating a labyrinth of hardware fragmentation or writing device-specific optimisations.
Arm’s message to developers is direct: align now with KleidiAI-supported frameworks, and your applications will be ready to exploit SME2-enhanced devices as they arrive. With mobile AI capabilities expected to expand rapidly over the coming year, early alignment may determine not just performance, but market competitiveness.
As the AI layer of the smartphone becomes more intelligent, more responsive, and more local, it is the underlying architecture, and how seamlessly it adapts, that will shape the user experience. Arm’s SME2 may sit deep in the stack, but its impact will be felt in the way everyday apps learn, listen, enhance and predict, all in real time, from the palm of the hand.