A new AI infrastructure model is gaining ground, promising to shift the centre of gravity in artificial intelligence away from hyperscale data centres and towards more flexible, affordable deployment options. With the launch of its Serverless Inference platform, the European AI hyperscaler Nscale is positioning itself at the heart of this transition, offering a public, on-demand solution designed to make model inference more accessible to organisations of all sizes.
The move reflects a broader industry trend: the rapid growth of AI model deployment is running up against traditional infrastructure constraints. Inference, the process of running trained AI models to generate outputs, often requires highly specialised, expensive hardware that sits idle when not in use. For smaller companies or development teams without dedicated compute resources, this makes experimentation and scaling prohibitively expensive.
Nscale’s Serverless Inference platform aims to change that with a pay-as-you-go model, in which users are billed only for the compute they actually consume. By eliminating the need to manage or invest in underlying infrastructure, the platform dramatically lowers the barrier to entry for generative AI adoption. It is designed to complement Nscale’s existing private cloud offerings, which serve enterprises with large-scale, dedicated workloads.
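A rough back-of-the-envelope comparison illustrates the economics. The figures below are hypothetical placeholders, not Nscale's published rates, but they show how per-token billing compares with a dedicated accelerator that is billed around the clock:

```python
# Hypothetical cost comparison: pay-per-token serverless inference
# versus a dedicated GPU instance. All figures are illustrative
# assumptions, not quoted prices from any provider.

tokens_per_request = 1_000
requests_per_day = 2_000
price_per_million_tokens = 0.50   # USD, assumed serverless rate
dedicated_gpu_hourly = 2.00       # USD/hour, assumed on-demand GPU rate

daily_tokens = tokens_per_request * requests_per_day
serverless_daily = daily_tokens / 1_000_000 * price_per_million_tokens
dedicated_daily = dedicated_gpu_hourly * 24  # billed even while idle

print(f"Serverless: ${serverless_daily:.2f}/day")  # $1.00/day
print(f"Dedicated:  ${dedicated_daily:.2f}/day")   # $48.00/day
```

At low or bursty volumes the dedicated machine spends most of the day idle but still accrues charges, which is precisely the gap that per-use billing is meant to close; at sustained high utilisation, the calculation can tip the other way.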
According to Daniel Bathurst, Chief Product Officer at Nscale, the new platform is intended to “make AI model deployment simple and cost-effective” while supporting Europe’s ambitions for digital sovereignty. “With upcoming features set to include dedicated endpoints, fine-tuning capabilities and the ability to support custom model hosting, we’re proud to offer sovereign, European AI infrastructure to meet rapidly growing inference demand,” said Bathurst.
The service offers immediate access to a range of widely used generative models, including Meta’s Llama, Alibaba’s Qwen, and models from DeepSeek, via OpenAI-compatible APIs and a user-friendly web console. These options enable users to test and scale inference workloads without vendor lock-in or complex integration challenges. For organisations under pressure to demonstrate ROI from AI initiatives, serverless models could provide a practical bridge between prototyping and production.
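Because the APIs are OpenAI-compatible, existing client code typically needs little more than a different base URL and API key. The sketch below uses the official openai Python SDK; the endpoint URL and model identifier are illustrative placeholders rather than Nscale's documented values:

```python
# Minimal sketch: calling an OpenAI-compatible serverless inference
# endpoint. The base_url and model name below are placeholders --
# consult the provider's documentation for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-inference-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                                # key issued by the provider
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model identifier
    messages=[
        {"role": "user", "content": "Summarise serverless inference in one sentence."}
    ],
    max_tokens=100,
)

print(response.choices[0].message.content)
```

This compatibility is what keeps switching costs low in practice: the same client code can be pointed at a different provider's endpoint by changing a single configuration value.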
Beyond accessibility, the platform incorporates enterprise-grade features such as observability, Slurm and Kubernetes orchestration, and multi-tenant security. These are essential for organisations that require performance, reliability, and compliance without compromising data protection or operational transparency.
As AI becomes a strategic priority for businesses across sectors, the focus is shifting from algorithmic breakthroughs to operational enablement — how to make AI usable, adaptable, and economically viable. Serverless inference sits at the heart of this challenge, offering a scalable alternative that removes the need for capital-intensive investment in hardware or in-house AI ops teams.
The rise of platforms like Nscale’s also raises broader questions about the future shape of the AI economy. Will the dominance of US-based hyperscalers be challenged by regional alternatives that prioritise sovereignty and flexibility? Can serverless AI become the standard delivery model in the same way that serverless computing revolutionised application deployment?
These questions are likely to become more urgent as generative AI capabilities continue to evolve and regulatory scrutiny intensifies, especially in Europe. Infrastructure providers that can offer both compliance and cost control may find themselves in a strong position to shape the next phase of the AI boom.
By offering a public AI inference platform that is both developer-friendly and enterprise-ready, Nscale is betting on a future where access, flexibility, and sovereignty matter as much as raw performance. If that bet pays off, it could signal a fundamental shift in how AI is consumed and scaled.