Instant clusters promise a faster path to AI breakthroughs


Together AI has launched a new service designed to remove one of the most stubborn obstacles to artificial intelligence development: the slow and complex process of assembling large GPU clusters for training and inference. The company’s Instant Clusters platform, now generally available, allows engineers to deploy tightly networked NVIDIA GPUs in minutes rather than days, offering a self-service approach that reflects the on-demand ethos of modern cloud computing.

AI researchers often spend valuable time procuring hardware, configuring drivers and wiring together networking components before a single training run can begin. By automating this infrastructure layer, Together AI aims to let teams focus on model design and data quality rather than the mechanics of provisioning. The service supports both NVIDIA Hopper and the next-generation Blackwell GPUs, with orchestration through Kubernetes or Slurm and built-in low-latency networking.

Removing the hardware bottleneck

Large language models and other demanding workloads rely on high-performance, multi-node GPU clusters. Traditional setup processes involve procurement cycles and manual configuration that can delay experiments or slow the scale-up required for inference surges. Instant Clusters replaces those steps with an API-driven interface and preconfigured components such as NVIDIA’s GPU Operator, network operators for InfiniBand and Ethernet, and secure ingress controllers.
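To illustrate what an API-driven provisioning workflow looks like, the sketch below builds a request body for a hypothetical cluster-creation endpoint. The field names, values and endpoint are illustrative assumptions for this article, not Together AI's documented API.

```python
import json

def build_cluster_request(name, gpu_type, num_nodes, orchestrator="kubernetes"):
    """Build the JSON body for a hypothetical cluster-provisioning call.

    All field names here are assumptions made for illustration; consult
    the provider's actual API reference before use.
    """
    if orchestrator not in ("kubernetes", "slurm"):
        raise ValueError("unsupported orchestrator")
    return {
        "name": name,
        "gpu_type": gpu_type,        # e.g. a Hopper- or Blackwell-class GPU
        "num_nodes": num_nodes,      # scale from one node to many
        "orchestrator": orchestrator,
    }

# The body would then be POSTed to a provisioning endpoint, for example:
# requests.post("https://api.example.com/v1/clusters", json=body, headers=auth)
body = build_cluster_request("research-run", "hopper", 4, orchestrator="slurm")
print(json.dumps(body))
```

The point of the pattern is that everything a traditional setup handled manually, including node count, GPU generation and scheduler choice, becomes a few fields in a request.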

This approach allows organisations to scale from a single node to hundreds of GPUs while maintaining consistent performance. Shared storage, pinned driver versions and checkpointing are included to support reproducible environments for distributed training or reinforcement learning tasks. The service is designed for what the company calls “AI native” firms, where sudden spikes in compute demand are common.
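Checkpointing is what makes long distributed jobs restartable after preemption or node failure. The minimal sketch below shows the generic save-and-resume pattern using plain JSON files; real training stacks checkpoint model and optimizer state the same way, and the file path and state layout here are assumptions for illustration only.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; real jobs would use shared cluster storage.
CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt.json")

def save_checkpoint(step, state):
    # Write to a temp file and rename, so a preempted job
    # never observes a half-written checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    # Resume from the last checkpoint if one exists, else start fresh.
    if not os.path.exists(CKPT):
        return 0, {}
    with open(CKPT) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

# Training loop that resumes wherever the previous run stopped.
step, state = load_checkpoint()
while step < 10:
    state["loss"] = 1.0 / (step + 1)   # stand-in for a real training step
    step += 1
    save_checkpoint(step, state)
print(step)
```

The atomic-rename step is the important design choice: if a job is killed mid-write, the previous checkpoint remains intact and the run resumes cleanly.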

Meeting the needs of modern AI

The timing reflects wider pressures in the AI sector. Global demand for GPU capacity is soaring as enterprises experiment with larger and more complex models, while competition for high-end chips is intense. Reducing the time required to access compute resources can accelerate research cycles and help organisations respond quickly to market or product developments.

Together AI’s internal researchers were early testers, using the system to run short-lived but computationally intensive jobs that would otherwise require lengthy setup. Early users in healthcare and data science report similar benefits, using the platform to train and refine models without delays or extensive hardware expertise.

By combining automation with rigorous reliability checks, including burn-in testing of nodes, validation of interconnect performance and continuous monitoring, the service aims to provide the stability required for production workloads as well as experimental runs.

The release of Instant Clusters highlights a broader trend in AI infrastructure: the shift toward cloud-style, on-demand services that can keep pace with rapidly evolving models and unpredictable demand. For developers and researchers, the ability to start training on a large GPU cluster within minutes may prove as significant as the raw processing power of the chips themselves.
