Ray Train is an ML library for distributed model training. Anyscale supports and further optimizes Ray Train for improved performance, reliability, and scale.
- Faster iteration for companies like Canva
- 60 sec node startup and autoscaling
- Lower costs on many workloads (vs. open source Ray) through spot instance and elastic training support
- Reduced cloud costs for companies like Canva
Ray Train is an open source machine learning library built on top of Ray, a best-in-class distributed compute platform for AI/ML workloads.
Ray Train integrates with your preferred training frameworks, including PyTorch, Hugging Face, TensorFlow, XGBoost, and more, so you can develop with your preferred tech stack, then scale to the cloud with just one line of code.
Increase training iteration speed without increasing cost by implementing distributed training on Anyscale. Easily scale from your laptop to any number of GPUs with just one line of code.
Ray Train includes built-in checkpointing, so failures don't waste compute: easily recover from system failures and resume training from a recent checkpoint.
Train with parallelized compute to complete training jobs faster. Increase iteration speed with the ability to scale across nodes during development.
Leverage CPUs and GPUs in the same pipeline to increase GPU utilization and decrease costs.
Integrate with training frameworks like PyTorch, Hugging Face, TensorFlow, and more. Develop with your preferred tech stack, then scale to the cloud with just one line of code.
| Feature | Open Source Ray | Anyscale |
| --- | --- | --- |
| Elastic Training & Spot Instance Support | - | ✓ |
| Job Retries & Fault Tolerance Support | - | ✓ |
| Fast Node Launching and Autoscaling | - | 60 sec |
| Fractional Heterogeneous Resource Allocation | - | ✓ |
| Detailed Training Dashboard | - | ✓ |
| Last-Mile Data Preprocessing | - | ✓ |
| Autoscaling Development Environment | - | ✓ |
| Distributed Debugger | - | ✓ |
| Data Integrations (Databricks, Snowflake, S3, GCS, etc.) | ✓ | ✓ |
| Framework Support (PyTorch, Hugging Face, TensorFlow, XGBoost, etc.) | ✓ | ✓ |
| Experiment Tracking Integrations (Weights & Biases, MLflow, etc.) | ✓ | ✓ |
| Orchestration Integrations (Prefect, Apache Airflow, etc.) | ✓ | ✓ |
| Alerting | - | ✓ |
| Resumable Jobs | ✓ | ✓ |
| Priority Scheduling | - | ✓ |
| Job Queues | - | ✓ |
| EFA Support | - | Custom |
Jumpstart your development process with custom-made templates, only available on Anyscale.
- Execute end-to-end LLM workflows to develop and productionize LLMs at scale
- Pre-train a Stable Diffusion V2 model with Ray Train and Ray Data
- Fine-tune a personalized Stable Diffusion XL model with Ray Train
Anyscale, built by the creators of Ray, offers additional proprietary enhancements on top of open source Ray Train, like elastic training with spot instance support, job retries and fault tolerance, fast node launching and autoscaling, and a detailed training dashboard.
Enable simple, fast, and affordable distributed model training with Anyscale. Learn more, or get started today.