Anyscale's RayLLM

An Anyscale-Exclusive ML Library for LLM Inference

What is RayLLM?

RayLLM is an ML library for LLM inference, available only on Anyscale.

RayLLM combines the performance gains of Anyscale’s optimized vLLM engine with the production readiness and scalability of Anyscale’s Ray Serve, making it the best way to deploy and manage open-source LLMs.
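
To give a sense of what deployment looks like, here is a minimal sketch using the open-source Ray Serve LLM APIs, which follow the same pattern; RayLLM’s exact configuration surface on Anyscale may differ, and the model ID, model source, and autoscaling values below are illustrative placeholders.

```python
# Minimal deployment sketch using open-source Ray Serve LLM APIs.
# Model names and autoscaling values are placeholders, not Anyscale defaults.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llama",                              # name clients will request
        model_source="meta-llama/Llama-3.1-8B-Instruct",  # Hugging Face model (placeholder)
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=4),
    ),
    engine_kwargs=dict(tensor_parallel_size=1),  # vLLM engine options pass through here
)

# Build an OpenAI-compatible app and deploy it with Ray Serve.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```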


Benefits

Enhanced vLLM Engine

We’ve optimized the vLLM engine for throughput, latency, and model parallelism. Tune engine performance to your specific workload and reduce both batch and online inference costs.
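
As a hedged illustration of the tuning knobs involved, the sketch below uses the open-source vLLM API; the options Anyscale’s optimized engine exposes may differ, and the model and values are placeholders.

```python
# Engine-tuning sketch with open-source vLLM. Values are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
    tensor_parallel_size=2,       # shard the model across 2 GPUs
    gpu_memory_utilization=0.90,  # fraction of GPU memory for weights + KV cache
    max_num_seqs=64,              # cap on concurrently batched sequences
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of continuous batching."], params)
print(outputs[0].outputs[0].text)
```

Raising `max_num_seqs` generally trades per-request latency for throughput, which is the kind of batch-versus-online tradeoff this tuning targets.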

LLM Inference Capabilities

RayLLM supports any model. It’s compatible with OpenAI’s API and offers JSON mode and function calling support. Plus, easily deploy multi-LoRA and vision-language models.
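
Because the API is OpenAI-compatible, the standard OpenAI Python client works against a RayLLM deployment. In this sketch, the base URL, API key, model name, and `get_weather` tool are all placeholders for your own deployment.

```python
# Function calling through the OpenAI-compatible API.
# Endpoint, key, model, and the get_weather tool are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-rayllm-endpoint>/v1",  # placeholder endpoint
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model you deployed
    messages=[{"role": "user", "content": "What's the weather in Sydney?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response.choices[0].message)
```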

Any Model, Any Accelerator

Fine-tune and deploy any open-source model supported on Hugging Face, including popular models such as Llama and Mistral. Run any inference engine, including vLLM, TRT-LLM, and TGI.

Best-in-Class Reliability

RayLLM is built on top of Ray Serve, Anyscale’s highly scalable and efficient ML serving system. Deploy LLMs with confidence: head node recovery, multi-AZ deployments, and zero-downtime upgrades come built in.

Feature Highlights

JSON Mode

Easily enable JSON mode in RayLLM. Plus, get a variety of performance and reliability optimizations exclusive to Anyscale users.
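
Through the OpenAI-compatible API, JSON mode is enabled with the standard `response_format` parameter, as in the sketch below; the endpoint, key, and model name are placeholders for your own deployment.

```python
# Enabling JSON mode via the OpenAI-compatible API.
# With response_format set, the server constrains output to valid JSON.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-rayllm-endpoint>/v1",  # placeholder endpoint
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'city' and 'population'."},
        {"role": "user", "content": "Tell me about Tokyo."},
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)  # parseable JSON
```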


“We have no ceiling on scale, and an incredible opportunity to bring AI features and value to our 170 million users.”

Greg Roodt
ML Lead, Canva

Try Anyscale Today

Build, deploy, and manage scalable AI and Python applications on the leading AI platform. Unlock your AI potential with Anyscale.