DeepSeek R1 is the first truly capable reasoning model that can be self-hosted – and with that comes a host of changes to the way we deploy AI models.
Read on below to see why Anyscale's platform is ideal for hosting these sorts of models, or check out the interactive demonstration showcasing how to deploy DeepSeek-R1-Distill-Llama-8B on Anyscale:
For the first time, organizations can deploy an advanced reasoning model while maintaining full control over their infrastructure, data privacy, and customization. Unlike proprietary cloud-based AI solutions, self-hosting DeepSeek R1 unlocks unparalleled transparency (you can view and inspect reasoning tokens, customize model behavior via test-time scaling, and choose your own hardware) and opens the door for organizations to create their own DeepSeek R1, customized and differentiated for their unique applications.
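For example, because the R1 family emits its chain of thought between `<think>` tags, a self-hosted deployment lets you capture and inspect those reasoning tokens directly. A minimal sketch (assuming the standard R1 output format, with a hypothetical completion string) might look like this:

```python
import re


def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate an R1-style <think>...</think> reasoning trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    return match.group(1).strip(), completion[match.end():].strip()


# Hypothetical completion, for illustration only.
reasoning, answer = split_reasoning(
    "<think>The user asked for 2 + 2, which is 4.</think>The answer is 4."
)
print(reasoning)  # -> The user asked for 2 + 2, which is 4.
print(answer)     # -> The answer is 4.
```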
The potential of reasoning-powered agentic applications is enormous – leveraging multiple models working in concert offers a path to a truly AI-driven enterprise. By seamlessly integrating models with a broad ecosystem of tools, next generation AI applications have the potential to transform operations at every level. Yet, as with all transformative technology, great power demands great responsibility.
The buzz around DeepSeek is exciting, but it prompts deep questions on where and how to run such a model. Anyone can access an open-source model – but your choice of AI infrastructure determines how effectively you can deploy, optimize, secure, and scale it.
How well does it handle GPU workloads? Was the platform built for AI, or retrofitted from CPU-based platforms?
Can it scale beyond a simple demo? Will it still work at hundreds, thousands, or tens of thousands of GPUs in production?
Will it extend and unify all AI workloads – including data processing and traditional model serving? Or is it just good for a subset of AI types or tasks, meaning you’ll need more vendors?
Where does your data go? Does it leave your account, exposing you to security risk and egress costs, or can you run anywhere you want?
Is the team experienced in AI model training and serving? Or are they just following AI trends?
Are they AI visionaries or reactionaries? Did they invest in the technologies behind R1 (reinforcement learning and GenAI) back when they were seen as purely academic?
How future-proof is it? Can it support the next R1-scale breakthrough seamlessly?
Does the platform lock you in? Does it take control of your data, force you to run on its own compute, or prevent you from using the tooling of your choice?
At Anyscale, we provide a unified, future-proof AI platform that allows you to run DeepSeek R1 – and any other OSS model – without limitations. Whether you're optimizing for throughput, latency, or cost, and whether you're deploying a single model or many models as part of a complex application, Anyscale ensures that you have complete control over your infrastructure.
Deploy DeepSeek R1 in our cloud (serverless), your cloud (inside your VPC), or on-prem.
Beyond the major cloud providers, we support any GPU provider (OCI, Lambda, CoreWeave, etc.).
Best of all, Anyscale’s Global Resource Scheduler and Smart Instance Manager optimize instance selection with reservation-aware, cloud-aware, and cost-aware scheduling.
AI is evolving beyond simple LLMs – future applications will comprise many models including reasoning models, “traditional” LLMs, deep learning models, and external tools.
We are pioneers in RL: our library RLlib offers a powerful foundation for large-scale reinforcement learning, and Ray is the engine powering veRL. Projects like TinyZero are reproducing DeepSeek R1 Zero and creating new reasoning models for under $30. Check it out!
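For a feel of RLlib's API, here is a minimal sketch that trains PPO on a toy Gymnasium environment. It is illustrative only, not DeepSeek R1's actual training recipe, but the same RL foundation scales up to reasoning-model workloads:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.logger import pretty_print

# Toy-scale PPO run: the same RLlib primitives underpin much larger RL jobs.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(lr=3e-4, train_batch_size=4000)
)
algo = config.build()

for i in range(3):
    result = algo.train()  # one training iteration
    print(f"=== iteration {i} ===")
    print(pretty_print(result))
```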
Ray Serve was built to support multi-model composition through deployment graphs for ensembling, chaining, and dynamic routing – the types of patterns common in agentic applications.
Read more about model composition with Ray Serve in our blog.
Check out the Anyscale Model Router, which can be trained to dynamically direct queries to high-cost reasoning models, closed LLMs, or cost-effective open-source LLMs based on query complexity, optimizing both response quality and cost.
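To make these patterns concrete, here is a minimal Ray Serve sketch of a router deployment choosing between two hypothetical models; the length-based rule is a stand-in for the learned, complexity-aware routing the Model Router provides:

```python
from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class ReasoningModel:
    async def __call__(self, prompt: str) -> str:
        return f"[reasoning model] {prompt}"  # stand-in for an expensive model like R1


@serve.deployment
class FastModel:
    async def __call__(self, prompt: str) -> str:
        return f"[fast model] {prompt}"  # stand-in for a cheap, low-latency model


@serve.deployment
class Router:
    def __init__(self, reasoning: DeploymentHandle, fast: DeploymentHandle):
        self._reasoning = reasoning
        self._fast = fast

    async def __call__(self, prompt: str) -> str:
        # Toy heuristic: long prompts go to the reasoning model.
        handle = self._reasoning if len(prompt) > 200 else self._fast
        return await handle.remote(prompt)


app = Router.bind(ReasoningModel.bind(), FastModel.bind())
# serve.run(app)  # start the composed application locally
```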
Ray was originally built for complex distributed AI tasks, which makes it both a natural fit and the industry standard for GenAI workloads.
Anyscale is powered by Ray – the open source AI Compute Engine at the forefront of the AI revolution. Unlike proprietary serving solutions, we provide full transparency and control over model execution.
This means you control where you run and on exactly what hardware, with access to every optimization and feature we’ve built to enhance your AI application.
One of the foundational design philosophies of Ray has always been to embrace a broad ecosystem of integrations while never locking you into a single stack. So as the market changes and new tools and libraries emerge, you can be confident that migrating to Anyscale keeps you ready for the AI tech stack of tomorrow.
As you deploy reasoning models like DeepSeek R1 across your business, Anyscale dynamically autoscales to meet your demands. Anyscale supports deployments of 8,000+ nodes comprising 50,000+ GPUs, powering the largest AI workloads in production.
Clusters spin up to 1,000+ nodes in under a minute, even from cold starts: no warm pools, no idle compute wasted. Pay for what you need only when you are using it.
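At the application level, the same elasticity is expressed directly on a Ray Serve deployment. A minimal sketch (the replica counts and deployment name are illustrative) looks like this:

```python
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,             # keep one replica warm when idle
        "max_replicas": 64,            # cap replicas under peak load
        "target_ongoing_requests": 8,  # target in-flight requests per replica
    },
    ray_actor_options={"num_gpus": 1},
)
class R1Deployment:
    async def __call__(self, prompt: str) -> str:
        ...  # call the model here
```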
Anyscale is one of the major contributors to vLLM, with private features that enhance fast model loading, node startup, and overall performance. Our enterprise support and professional services include access to some of the builders of vLLM at the cutting edge of model inference research.
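As a point of reference, the same engine can be sketched locally with open-source vLLM on a single GPU (the prompt and sampling settings here are illustrative); Anyscale layers performance tuning, scaling, and operations on top of that core:

```python
from vllm import LLM, SamplingParams

# Load the distilled 8B model from Hugging Face and run a single prompt.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
params = SamplingParams(temperature=0.6, max_tokens=1024)

outputs = llm.generate(
    ["How many prime numbers are there between 1 and 50?"], params
)
print(outputs[0].outputs[0].text)  # completion, including the model's reasoning trace
```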
Optimize at every level, from infrastructure tuning to high-level performance improvements.
Model serving goes beyond just running vLLM inference: real production applications need managed networking, MLOps, and enterprise-ready governance and security.
Anyscale's inference engine is Enterprise Grade: vLLM-powered, performance-tuned, and infinitely scalable
Running DeepSeek-R1-Distill-Llama-8B on Anyscale took just 5 minutes from login to a production-ready service. And it will be the same for any OSS model in the future – no complex setup, no rigid restrictions.
Check out the interactive demo below: