Best-in-Class LLM Batch Inference

Run LLM batch inference jobs at lower cost than the competition. See why Anyscale is the platform of choice for LLM batch inference.

LLM Batch Inference

Optimized Performance

Throughput-Optimized Workloads

We know how important throughput is for your large scale, offline batch inference jobs—and we’ve optimized Anyscale accordingly.

Advanced vLLM Optimizations

The Anyscale inference team consists of many of the leading committers to the vLLM project. Our experts can help tune engine performance to reduce costs.

Long-Context Use Cases

Our custom optimizations for prefix caching enable significant performance improvements on long-context use cases compared to vLLM.
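For reference, automatic prefix caching can be switched on in open-source vLLM with a single flag; Anyscale's long-context optimizations build on top of that baseline. The sketch below is illustrative only: the model name, file path, and prompts are assumptions.

```python
from vllm import LLM, SamplingParams

# Minimal sketch: automatic prefix caching in open-source vLLM.
# Model name, document path, and prompts are illustrative assumptions.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    enable_prefix_caching=True,  # reuse KV-cache entries for shared prompt prefixes
)

shared_context = open("long_document.txt").read()  # hypothetical long shared prefix
questions = ["Summarize the document.", "List the key risks mentioned."]

outputs = llm.generate(
    [shared_context + "\n\n" + q for q in questions],
    SamplingParams(temperature=0.0, max_tokens=256),
)
for out in outputs:
    print(out.outputs[0].text)
```

When many requests share a long prefix, the cached KV blocks for that prefix are computed once and reused, which is where the long-context speedups come from.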

Reduce Costs By Using GPUs and CPUs

Anyscale makes it easy to leverage heterogeneous compute. Use CPUs and GPUs in the same pipeline to increase utilization, fully saturate GPUs, and decrease costs.
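As a rough sketch of what such a heterogeneous pipeline can look like with Ray Data and vLLM (the bucket paths, model name, replica count, and batch size below are illustrative assumptions, not a prescribed configuration):

```python
import ray
from vllm import LLM, SamplingParams

def clean_prompts(batch):
    # CPU stage: lightweight text cleanup runs as Ray tasks on CPU workers,
    # so the GPU replicas stay saturated with generation work.
    batch["text"] = [t.strip() for t in batch["text"]]
    return batch

class Generate:
    def __init__(self):
        # One vLLM engine per GPU actor; the model name is an assumption.
        self.llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
        self.params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch):
        outputs = self.llm.generate(list(batch["text"]), self.params)
        batch["response"] = [o.outputs[0].text for o in outputs]
        return batch

ds = (
    ray.data.read_text("s3://my-bucket/prompts/")  # hypothetical input location
    .map_batches(clean_prompts)                    # CPU preprocessing stage
    .map_batches(Generate, concurrency=4, num_gpus=1, batch_size=64)  # GPU stage
)
ds.write_parquet("s3://my-bucket/generations/")    # hypothetical output location
```

Because the CPU and GPU stages are scheduled independently, preprocessing never blocks generation, which is what keeps the GPUs saturated.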

Looking for LLM Online Inference?

Check out our dedicated LLM online inference page to see how Anyscale supports LLM serving at scale.

Optimized LLM Batch Inference at Any Scale

| Capability | Amazon Bedrock | Apache Spark | Ray + vLLM | Anyscale |
| --- | --- | --- | --- | --- |
| Automated Throughput Tuner | N/A | - | - | ✓ |
| Support for Different GPUs/Accelerators | N/A | - | ✓ | ✓ |
| Support for Large Model Parallelism | ✓ | - | ✓ | ✓ |
| Spot Instance Support | N/A | ✓ | ✓ | ✓ |
| Accelerated Long-Context Inference | - | - | - | ✓ |
| Custom Optimized Kernels | N/A | - | - | ✓ |
| Multi-Modal Support | - | ✓ | ✓ | ✓ |


Best-Price Performance

We’ve optimized our inference engine so you don’t have to.

  • 6.1x cost savings compared to Amazon Bedrock
  • 90% cost savings on select instance types using spot instances and fault-tolerant continuous batching

Scale Your Datasets and Models

Anyscale supports tensor parallelism, data parallelism, and pipeline parallelism, so you can use any GPU and any model for your workload, including widely available, cost-efficient accelerators like the A10 and L4 and models from Llama-3.1-8B up to Llama-3.1-405B.
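As a minimal sketch of how model parallelism looks in open-source vLLM, assuming an 8-GPU node and an illustrative Llama 70B checkpoint (the parallelism degrees are assumptions you would tune to your hardware):

```python
from vllm import LLM

# Minimal sketch: sharding a large model across GPUs with vLLM.
# Model name and parallelism degrees are assumptions.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,              # split each layer across 4 GPUs
    pipeline_parallel_size=2,            # split the layer stack into 2 stages (8 GPUs total)
    distributed_executor_backend="ray",  # pipeline/multi-node setups typically use the Ray backend
)
```

Data parallelism then comes from running several such replicas at once, for example as separate map_batches actors in a Ray Data pipeline.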

Canva

“We have no ceiling on scale, and an incredible opportunity to bring AI features and value to our 170 million users.”

Greg Roodt
ML Lead, Canva

Out-of-the-Box Templates & App Accelerators

Jumpstart your development process with custom-made templates, only available on Anyscale.

Batch Inference with LLMs

Run LLM offline inference on large scale input data with Ray Data

End-to-End LLM Workflows

Execute end-to-end LLM workflows to develop and productionize LLMs at scale

FAQs

What advanced batch inference capabilities does Anyscale offer?

At Anyscale, we know how important it is to stay competitive in the AI space. That’s why we’re constantly updating and iterating on our product to make sure it’s the fastest, cheapest, and most performant option for AI/ML workloads. For offline batch inference, we’ve invested in a number of advanced capabilities to enhance your inference process, including the following (a brief illustrative sketch follows the list):

  • Cascade inference
  • FP8 support
  • Batch size tuning
  • Pipeline parallelism
  • Continuous batching
  • And more!
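As an illustration of two of these knobs in open-source vLLM (the model name and values below are assumptions, and continuous batching is vLLM's default scheduling behavior rather than something you switch on):

```python
from vllm import LLM, SamplingParams

# Illustrative sketch: FP8 weight quantization plus memory/batch tuning.
# Model name and values are assumptions; FP8 needs supporting hardware
# (typically Hopper- or Ada-class GPUs).
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization="fp8",           # FP8 quantized weights
    gpu_memory_utilization=0.90,  # maximize KV-cache space while leaving headroom
    max_num_seqs=256,             # upper bound on concurrently batched sequences
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```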

Offline Batch Inference at Scale

See why Anyscale is the best option for offline batch inference.