
Serve Models At Scale

There are four common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. Implementing these patterns typically involves a tradeoff between ease of development and production readiness. Web frameworks are simple and work out of the box, but they serve one prediction at a time and cannot deliver the performance or scale production demands. Custom tooling glues tools together but is hard to develop, deploy, and manage. Specialized systems are great at serving ML models, but they are not as flexible or easy to use and can be costly.

Anyscale helps you go beyond existing model serving limitations with Ray and Ray Serve, which offer scalable, efficient, composable, and flexible serving. Ray Serve provides:

- A better developer experience and abstraction
- The ability to flexibly compose multiple models and scale them independently
- Built-in request batching to help you meet your performance objectives
- Resource management (CPUs, GPUs) with support for fractional resource requirements (see the sketch after this list)
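
As a minimal sketch of what that looks like in code, the deployment below declares a fractional GPU requirement, batches incoming requests, and composes two models. The class names, batch sizes, and handler logic are illustrative assumptions rather than a prescribed Anyscale setup, and the handle API shown is Ray Serve 2.x:

```python
# A minimal Ray Serve sketch (Ray Serve 2.x handle API). Class names, batch
# sizes, and handler logic are illustrative placeholders, not a prescribed setup.
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 0.5})  # fractional GPU per replica
class Embedder:
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Serve collects individual calls into one batch; run the model once per batch.
        return [[float(len(t))] for t in texts]  # placeholder "embedding"

    async def __call__(self, text: str) -> list[float]:
        return await self.embed(text)


@serve.deployment
class Classifier:
    def __init__(self, embedder):
        self.embedder = embedder  # handle to the Embedder deployment

    async def __call__(self, request: Request) -> dict:
        text = (await request.json())["text"]
        embedding = await self.embedder.remote(text)  # call the composed model
        return {"label": "positive" if embedding[0] > 3 else "negative"}  # placeholder logic


# Compose the two deployments into one app and start serving over HTTP.
app = Classifier.bind(Embedder.bind())
serve.run(app)
```

Each deployment scales on its own: here the Embedder runs two replicas at half a GPU each, while the Classifier keeps the default single replica.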

Learn more about Anyscale

Why everyone is turning to Ray

Develop on your laptop and then scale the same Python code elastically across hundreds of nodes or GPUs on any cloud — with no changes.
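
As a hedged sketch of that workflow (the shard sizes and cluster address below are placeholders), the same script runs unchanged whether `ray.init()` connects to your laptop or to a remote cluster:

```python
# Minimal sketch: the same code runs locally or on a cluster; only ray.init() changes.
import ray

ray.init()  # on a cluster, connect with e.g. ray.init(address="ray://<head-node>:10001")


@ray.remote
def preprocess(shard):
    # Placeholder for real per-shard work.
    return sum(shard)


# Fan out tasks; Ray schedules them across whatever cores or nodes are available.
futures = [preprocess.remote(list(range(i, i + 100))) for i in range(0, 1000, 100)]
print(sum(ray.get(futures)))
```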


What You Need for the Right Scalable ML Platform

Train, test, deploy, serve, and monitor machine learning models quickly and efficiently with Ray and Anyscale.


Scale with a Click

Rely on a robust infrastructure that can scale up machine learning workflows as needed. Scale everything from XGBoost to TensorFlow to Scikit-learn, all in Python, on top of Ray.
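
As one hedged example, an unchanged Scikit-learn grid search can be spread across a Ray cluster through Ray's joblib backend; the dataset and parameter grid below are illustrative only:

```python
# Sketch: run an unchanged Scikit-learn grid search on Ray via its joblib backend.
# The dataset and parameter grid are illustrative only.
import joblib
from ray.util.joblib import register_ray
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

register_ray()  # make "ray" available as a joblib parallel backend

X, y = load_digits(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-4]}
search = GridSearchCV(SVC(), param_grid, n_jobs=-1)

with joblib.parallel_backend("ray"):
    search.fit(X, y)  # candidate fits are scheduled as Ray tasks

print(search.best_params_)
```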


An Open, Broad Ecosystem

Gain access to the most up-to-date technologies and their communities, and don't limit which libraries or packages you can use for your models. Load data from Snowflake, Databricks, or S3. Track your experiments with Weights & Biases or MLflow. Or monitor your production services with Grafana. Don't limit yourself.
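
A brief sketch of two of those integrations, assuming a hypothetical S3 bucket path and placeholder metric values: Ray Data reads the Parquet files directly from S3, and MLflow records the run.

```python
# Sketch: read training data from S3 with Ray Data and log a run to MLflow.
# The bucket path and metric value are hypothetical placeholders.
import mlflow
import ray

ds = ray.data.read_parquet("s3://my-bucket/training-data/")  # hypothetical bucket

with mlflow.start_run():
    mlflow.log_param("num_rows", ds.count())
    mlflow.log_metric("validation_accuracy", 0.93)  # placeholder value
```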


Iterate Quickly

Reduce friction and increase productivity by eliminating the gap between prototyping and production. Use the same tech stack regardless of environment.

What Users are Saying About Ray and Anyscale

Explore how thousands of engineers from companies of all sizes and across all verticals are tackling real-world workloads with Ray and Anyscale.


Ray has profoundly simplified the way we write scalable distributed programs for Cohere’s LLM pipelines. Its intuitive design allows us to manage complex workloads and train our models across thousands of TPUs with little to no overhead.

Siddhartha Kamalakara
Machine Learning Engineer, Cohere