reduction in total ML inferencing costs for Samsara
cores for model serving deployed with Ray Serve at Ant Group
higher QPS serving with an optimized version of Ray Serve (vs. open-source Ray Serve)
fewer nodes with features like Replica Compaction (compared to open-source Ray)
Ray Serve is a scalable model serving library for building online inference applications, offering features like model composition, model multiplexing, and built-in autoscaling.
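Built-in autoscaling adds or removes replicas of a deployment based on how many requests each replica is currently handling. The sketch below is a simplified, hypothetical version of such a policy. It assumes a target number of ongoing requests per replica, clamped to a minimum and maximum replica count. It is not Ray Serve's implementation, which adds refinements such as smoothing and upscale/downscale delays. `AutoscalingConfig` and `desired_replicas` are illustrative names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    # Hypothetical fields for this sketch: a replica range plus a
    # per-replica target for concurrent (ongoing) requests.
    min_replicas: int = 1
    max_replicas: int = 10
    target_ongoing_requests: float = 2.0


def desired_replicas(ongoing_per_replica: list[int], cfg: AutoscalingConfig) -> int:
    """Return how many replicas are needed so that each one handles roughly
    `cfg.target_ongoing_requests` concurrent requests."""
    total = sum(ongoing_per_replica)
    # Scale in proportion to total load, rounding up so no replica is overloaded.
    needed = math.ceil(total / cfg.target_ongoing_requests)
    # Never go below min_replicas or above max_replicas.
    return max(cfg.min_replicas, min(cfg.max_replicas, needed))


if __name__ == "__main__":
    cfg = AutoscalingConfig(min_replicas=1, max_replicas=10, target_ongoing_requests=2)
    print(desired_replicas([4, 5, 3], cfg))     # 12 ongoing requests / 2 per replica
    print(desired_replicas([0, 0], cfg))        # idle: clamped up to min_replicas
    print(desired_replicas([30, 30, 30], cfg))  # heavy load: clamped to max_replicas
```

Rounding up errs on the side of spare capacity. The clamp keeps a warm minimum for latency and a ceiling for cost.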
Because Ray Serve is framework-agnostic, you can use a single toolkit to serve models built with any ML framework, including PyTorch, TensorFlow, and other popular frameworks.
Plus, Ray Serve has several features and performance optimizations for serving LLMs such as response streaming, dynamic request batching, multi-node/multi-GPU serving, and more.
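Dynamic request batching groups requests that arrive close together so a model can process them in a single call, for example one GPU forward pass. Ray Serve exposes this through its `@serve.batch` decorator, with parameters such as `max_batch_size` and `batch_wait_timeout_s`. The standalone asyncio sketch below illustrates the same idea without Ray and is not Ray Serve's implementation. `DynamicBatcher`, `double_batch`, and `demo` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Tuple


class DynamicBatcher:
    """Hypothetical helper that groups individual requests into batches.

    A batch is flushed when it holds `max_batch_size` requests or when
    `batch_wait_timeout_s` seconds have passed since its first request.
    """

    def __init__(
        self,
        batch_fn: Callable[[List[Any]], Awaitable[List[Any]]],
        max_batch_size: int = 4,
        batch_wait_timeout_s: float = 0.01,
    ) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so batching can be inspected
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, item: Any) -> Any:
        # Start the background batching loop on first use.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        # Each caller waits only for its own result.
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until the first request of a new batch arrives.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.batch_wait_timeout_s
            # Keep collecting until the batch is full or the wait window closes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            items = [item for item, _ in batch]
            try:
                # One call handles the whole batch.
                results = await self.batch_fn(items)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                if not fut.done():
                    fut.set_result(result)


async def double_batch(xs: List[float]) -> List[float]:
    # Stand-in for a vectorized model call.
    return [2 * x for x in xs]


async def demo() -> Tuple[List[float], List[int]]:
    batcher = DynamicBatcher(double_batch, max_batch_size=4, batch_wait_timeout_s=0.05)
    # Ten concurrent requests arrive at (almost) the same time.
    results = await asyncio.gather(*(batcher.submit(float(i)) for i in range(10)))
    return list(results), batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(results)
    print(sizes)
```

The timeout bounds the extra latency any single request pays while waiting for batch-mates. `max_batch_size` caps how much work one model call receives.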
Integrate multiple ML models with separate resource requirements and autoscaling needs within a single application. Orchestrate processing workflows at scale with Ray Serve.
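In Ray Serve, this kind of composition is typically expressed by binding deployments into an ingress deployment (for example `Ingress.bind(Preprocess.bind(), ...)`) and calling them through deployment handles, with each deployment keeping its own replica count and resource settings. The plain-asyncio sketch below imitates that pattern so it runs without a Ray cluster. `Deployment`, `Ingress`, `build_app`, and the toy models are hypothetical. The `remote` method only mimics the handle call style and is not a Ray Serve API.

```python
import asyncio
import itertools
from typing import Any, Callable, Dict


class Deployment:
    """Hypothetical stand-in for one deployment: a pool of replicas of a single
    model, sized independently of every other deployment."""

    def __init__(self, name: str, factory: Callable[[], Callable[[Any], Any]], num_replicas: int) -> None:
        self.name = name
        self.replicas = [factory() for _ in range(num_replicas)]
        self._order = itertools.cycle(range(num_replicas))  # round-robin routing
        self.calls = [0] * num_replicas  # requests handled per replica

    async def remote(self, request: Any) -> Any:
        index = next(self._order)
        self.calls[index] += 1
        await asyncio.sleep(0)  # yield control, standing in for a network hop
        return self.replicas[index](request)


class Ingress:
    """Composes several deployments into a single application."""

    def __init__(self, preprocess: Deployment, classifier: Deployment, summarizer: Deployment) -> None:
        self.preprocess = preprocess
        self.classifier = classifier
        self.summarizer = summarizer

    async def __call__(self, text: str) -> Dict[str, str]:
        # Step 1: sequential preprocessing.
        clean = await self.preprocess.remote(text)
        # Step 2: two independent models run concurrently on the same input.
        label, summary = await asyncio.gather(
            self.classifier.remote(clean),
            self.summarizer.remote(clean),
        )
        return {"label": label, "summary": summary}


def build_app() -> Ingress:
    # Toy "models"; each deployment gets its own replica count.
    return Ingress(
        preprocess=Deployment("preprocess", lambda: (lambda s: s.strip().lower()), num_replicas=1),
        classifier=Deployment("classifier", lambda: (lambda s: "positive" if "good" in s else "negative"), num_replicas=2),
        summarizer=Deployment("summarizer", lambda: (lambda s: " ".join(s.split()[:3])), num_replicas=3),
    )


if __name__ == "__main__":
    app = build_app()
    print(asyncio.run(app("  Ray Serve is GOOD for composing models ")))
```

Keeping each model in its own pool is what lets a heavy model scale out without dragging lightweight preprocessing along with it.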
Runtime: Performance and Cost (scale from your laptop to 1,000s of nodes easily) | N/A | – | |
Production Readiness (production services support for model training and deployment) | Limited | ||
Cloud and GPU Support (launch your cluster on any cloud with any accelerator) | N/A | Limited | |
Many Model Patterns | Limited | ||
Support (support led by the creators and maintainers of Ray) | – | Limited |
Jumpstart your development process with custom-made templates, only available on Anyscale.
Base models, LoRA adapters, and embedding models. Deploy with optimized RayLLM.
Text-to-image generation model by Stability AI. Deploy with Ray Serve.
Optimize performance for Stable Diffusion with Triton on Ray Serve.
Anyscale is the smartest place to host Ray, providing a managed experience that increases performance, optimizes utilization, and reduces costs. Anyscale improves Ray Serve runtime and scalability with advanced features like: