The best way to process unstructured data. Any data format, any scale, only with Ray Data on Anyscale.
Built for LLM online inference, batch inference, embedding generation, and synthetic data generation.
Maximize compute utilization and leverage GPUs and CPUs to process videos of any size.
Scale image processing workloads by independently scaling CPU and GPU resources, delivering higher throughput, lower costs, and improved utilization.
Process audio data without breaking the bank. Anyscale makes it easy to run a variety of use cases, including speech-to-text.
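As a taste, here is a minimal sketch of batch speech-to-text with Ray Data and a Hugging Face Whisper model. The bucket paths, the Whisper checkpoint, and the actor pool sizing are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
import ray
from transformers import pipeline  # the Whisper checkpoint below is an illustrative choice


class Transcriber:
    # Stateful actor: the ASR model loads once per replica and is reused across batches.
    def __init__(self):
        self.asr = pipeline("automatic-speech-recognition",
                            model="openai/whisper-small", device=0)

    def __call__(self, batch):
        # Each row holds the raw bytes of one audio file; the pipeline decodes and transcribes them.
        outputs = self.asr(list(batch["bytes"]))
        batch["text"] = np.array([out["text"] for out in outputs])
        return batch


# Illustrative bucket paths; read_binary_files streams raw audio files into the dataset.
ds = ray.data.read_binary_files("s3://my-bucket/audio/")
ds = ds.map_batches(Transcriber, concurrency=4, num_gpus=1, batch_size=8)
ds.write_parquet("s3://my-bucket/transcripts/")
```

The same pattern extends to video workloads: swap the reader and the per-batch model for frame extraction and a vision model.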
| Capability | | | Ray Data (open source) | Ray Data on Anyscale |
|---|---|---|---|---|
| Text Support | ✓ | ✓ | ✓ | ✓ |
| Image Support | - | ✓ | ✓ | ✓ |
| Audio Support | Manual | - | Manual | ✓ |
| Video Support | Manual | - | Manual | ✓ |
| 3D Data Support | - | - | Binary | Binary |
| Task-Specific CPU & GPU Allocation | - | - | ✓ | ✓ |
| Stateful Tasks | - | - | ✓ | ✓ |
| Native NumPy Support | - | - | ✓ | ✓ |
| Native Pandas Support | - | ✓ | ✓ | ✓ |
| Model Parallelism Support | - | - | ✓ | ✓ |
| Nested Task Parallelism | - | - | ✓ | ✓ |
| Fast Node Launching and Autoscaling | - | - | - | 60 sec |
| Fractional GPU Support | - | Limited | ✓ | ✓ |
| Load Datasets Larger Than Cluster Memory | - | - | ✓ | ✓ |
| Improved Observability | - | - | - | ✓ |
| Autoscale Workers to Zero | - | Limited | ✓ | ✓ |
| Job Queues | - | - | - | ✓ |
| Priority Scheduling | - | - | - | ✓ |
| Accelerated Execution | - | - | - | ✓ |
Benchmarks: data loading, data ingest, and last-mile preprocessing.
With Ray, Amazon was able to compact datasets 12X larger than with Apache Spark, improve cost efficiency by 91%, and process 13X more data per hour.
Leverage and parallelize CPUs and GPUs in the same pipeline to increase utilization and decrease costs.
Anyscale’s Ray Data consistently outperforms competitors across these benchmarks.
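To make the CPU-plus-GPU pipeline idea concrete, here is a minimal sketch in which a CPU-only preprocessing stage feeds a GPU inference stage and each stage requests its own resources. The bucket paths, the ResNet checkpoint, the batch sizes, and the pool sizes are illustrative assumptions, not a prescribed configuration.

```python
import numpy as np
import ray
import torch
from torchvision.models import resnet50, ResNet50_Weights


def normalize(batch):
    # CPU-only stage: scale uint8 HWC images to float32 and reorder to NCHW.
    images = batch["image"].astype(np.float32) / 255.0
    batch["image"] = images.transpose(0, 3, 1, 2)
    return batch


class Classifier:
    # GPU stage: stateful actor so the model is loaded once and reused.
    def __init__(self):
        self.model = resnet50(weights=ResNet50_Weights.DEFAULT).cuda().eval()

    def __call__(self, batch):
        with torch.no_grad():
            logits = self.model(torch.as_tensor(batch["image"], device="cuda"))
        batch["label"] = logits.argmax(dim=1).cpu().numpy()
        return batch


ds = ray.data.read_images("s3://my-bucket/images/", size=(224, 224))  # illustrative path

# Each stage declares its own resources, so CPU preprocessing and GPU inference
# run concurrently as blocks stream through the pipeline. A fractional num_gpus
# lets two Classifier replicas share each physical GPU.
ds = ds.map_batches(normalize, num_cpus=1, batch_size=256)
ds = ds.map_batches(Classifier, concurrency=4, num_gpus=0.5, batch_size=64)
ds.write_parquet("s3://my-bucket/predictions/")
```

Because each stage declares its own resources, the CPU pool and the GPU pool can be scaled independently, which is where the utilization and cost gains come from.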
Don’t just process data: use it. Anyscale’s Ray Data slots in seamlessly alongside other Ray libraries like Ray Train and Ray Serve, so you can deliver batch inference and training use cases on one platform.
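For instance, a Ray Dataset can be handed straight to Ray Train. The sketch below shows the general shape of that handoff; the Parquet path, the toy PyTorch model, and the assumed column layout ("features" and "label") are illustrative assumptions.

```python
import ray
import torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each Ray Train worker receives a streaming shard of the Ray Dataset.
    device = ray.train.torch.get_device()
    model = ray.train.torch.prepare_model(torch.nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    shard = ray.train.get_dataset_shard("train")
    for batch in shard.iter_torch_batches(batch_size=128):
        features = batch["features"].float().to(device)
        labels = batch["label"].float().to(device)
        loss = torch.nn.functional.mse_loss(model(features).squeeze(-1), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# The same dataset used for batch inference can feed distributed training directly.
ds = ray.data.read_parquet("s3://my-bucket/training-data/")  # illustrative path

trainer = TorchTrainer(
    train_loop_per_worker,
    datasets={"train": ds},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
trainer.fit()
```

Because the dataset is sharded and streamed to the training workers, the same preprocessing code can serve both batch inference and training.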
Jumpstart your development process with custom-made templates, only available on Anyscale.
Run LLM offline inference on large-scale input data with Ray Data.
Compute text embeddings with Ray Data and HuggingFace models (see the sketch after this list).
Pre-train a Stable Diffusion V2 model with Ray Train and Ray Data.
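To give a flavor of the embeddings template, here is a minimal sketch that pairs Ray Data with a Hugging Face model served through the sentence-transformers library. The checkpoint, bucket paths, input column name, and pool sizing are illustrative assumptions.

```python
import ray
from sentence_transformers import SentenceTransformer  # checkpoint below is an illustrative choice


class Embedder:
    # Stateful actor: the model loads once and serves every batch it receives.
    def __init__(self):
        self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    def __call__(self, batch):
        # Assumes a "text" column in the source files.
        batch["embedding"] = self.model.encode(list(batch["text"]), batch_size=64)
        return batch


# Illustrative bucket paths.
ds = ray.data.read_parquet("s3://my-bucket/documents/")
ds = ds.map_batches(Embedder, concurrency=4, num_gpus=1, batch_size=256)
ds.write_parquet("s3://my-bucket/embeddings/")
```

The LLM offline inference template follows the same shape, with the embedding model swapped for a generative model.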
Ray Data is an open source machine learning library built on top of Ray, a best-in-class Pythonic distributed computing platform. Anyscale was founded by the creators of Ray to continue optimizing proprietary technology, built on top of open source Ray, that meets the challenges of the fast-paced AI world. With Anyscale’s proprietary Ray Data, you get access to additional and advanced capabilities, including faster node launching and autoscaling, improved observability, job queues, priority scheduling, and accelerated execution.
Get up to 90% cost reduction on unstructured data processing with Anyscale, the smartest place to run Ray.