Update June 2024: Anyscale Endpoints (Anyscale's LLM API Offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale platform.
This blog post is part of the Ray Summit 2023 highlights series where we provide a summary of the most exciting talk from our recent LLM developer conference.
Disclaimer: Summary was AI generated with human edits from video transcript.
Optimizing machine learning (ML) platforms for training with next-generation ML platforms is crucial to keep pace with the rapidly evolving landscape of AI and ML technologies. In this blog post, we'll explore the insights gained from Airbnb’s talk discussing their journey in integrating advanced ML technologies into their infrastructure. We'll delve into the importance of optimizing ML platforms, the role of Kubernetes, Ray, and other tools, and the challenges and solutions in achieving cost efficiency, scalability, and performance in ML training.
Airbnb had an existing ML infrastructure built on Kubernetes, but found gaps in Kubernetes' ability to support ML workloads. Issues included lack of support for fractional GPUs, exposing runtime metrics, usability for developers, and scaling for large models.
They adopted Ray to address these gaps. Benefits of Ray included easy prototypes locally, remote execution with dynamic runtime control, and support for latest ML frameworks like PyTorch and optimization libraries like DeepSpeed.
For cost efficiency, they built a fully elastic Ray cluster on AWS using auto-scaling groups, Kubernetes cluster auto-scaler, and KubeRay. This minimized resource fragmentation.
For training efficiency, they enabled high throughput networking across workers using AWS EFA and RDMA. This is important for distributed training.
With Ray they were able to train models up to 12B parameters on 8x A100 GPUs. They achieved 150 TFLOPS per A100 GPU on benchmarks.
Looking ahead, they plan to investigate model parallelism for 30B+ parameter models, integrate Aviary for serving cost savings, and consolidate training engines.
Before diving into optimization strategies, it's essential to understand why it's critical to continually enhance ML platforms. The field of ML is marked by constant innovation. New algorithms, frameworks, and hardware architectures emerge regularly. Keeping up with these developments is essential to ensure that ML models remain competitive and effective.
Airbnb's journey serves as an excellent example of this phenomenon. They introduced the use of advanced ML platforms like Ray to adapt to the changing landscape effectively. This adaptability is crucial because staying relevant in the ML field requires harnessing the latest tools and techniques.
Kubernetes is a fundamental component of modern ML infrastructure. It provides a scalable and flexible environment for deploying and managing ML applications. Airbnb's integration of Kubernetes into its ML infrastructure reflects its importance.
Kubernetes allows ML practitioners to encapsulate their applications as native Kubernetes resources. This approach simplifies application management, improves resource utilization, and supports scaling and service discovery. The result is a streamlined and efficient ML development process
Ray, an open-source distributed computing framework, plays a vital role in Airbnb's ML infrastructure. Ray facilitates distributed ML training, allowing ML practitioners to harness the power of multiple GPUs efficiently.
One of the most significant advantages of Ray is its ability to scale ML workloads seamlessly. Whether you're training models with billions of parameters or performing complex computations, Ray offers the necessary elasticity. This scalability ensures that ML models can be trained quickly and effectively, even as model sizes and complexity increase.
Cost efficiency is a paramount concern in ML training. ML models, especially large deep learning models, can be resource-intensive and expensive to train. Optimizing costs while maintaining or improving model performance is a critical challenge.
Airbnb's approach to cost efficiency involves several key strategies:
1. Ephemeral Clusters: By provisioning ephemeral clusters on-demand, Airbnb minimizes idle resources. These clusters can be tailored to the specific requirements of ML workloads, ensuring cost-efficient resource utilization.
2. Resource Management: Implementing resource management systems helps control resource allocation and utilization, preventing over-provisioning and unnecessary costs.
3. Auto Scaling: Leveraging auto-scaling features ensures that resources are dynamically adjusted based on workload demands. This approach minimizes both underutilization and overutilization of resources.
As ML models become more complex, the need for scalability becomes paramount. Airbnb's exploration of 3D model parallelism is a prime example of this. This approach targets models beyond 30 billion parameters, involving pipeline parallelism for internal nodes and tensor parallelism for intra-node workers.
This level of scaling allows ML practitioners to tackle the most challenging problems, such as training massive language models, with the confidence that they can effectively leverage distributed computing resources.
The field of ML is continuously evolving, with new technologies and techniques emerging regularly. Integrating these advancements into existing infrastructure can be challenging but is essential for staying competitive.
Airbnb's exploration of Aviary for online serving is a testament to their commitment to adopting new technologies. This integration aims to reduce serving costs through features like continuous batching. Embracing innovations like Aviary can have a significant impact on the cost-effectiveness of ML deployments.
The rapid pace of innovation in the ML community is both a blessing and a challenge. While it provides access to cutting-edge tools and techniques, it also requires constant adaptation and learning. Airbnb's experience highlights the need to engage actively with the open-source community to stay informed and leverage the latest advancements effectively.
Optimizing ML platforms for training with next-gen ML platforms is not just a matter of staying up-to-date with the latest technologies. It's about ensuring that ML infrastructure can efficiently harness the power of these technologies. Kubernetes, Ray, and other tools are pivotal in achieving this efficiency.
Cost efficiency, scalability, and adaptability are critical considerations in ML platform optimization. Airbnb's journey serves as a valuable case study, showcasing how organizations can navigate the ever-evolving landscape of ML technologies.
In conclusion, businesses must view ML platform optimization as an ongoing process. By embracing new tools, strategies, and technologies, they can not only keep pace with the ML field's rapid evolution but also stay ahead of the competition and deliver more powerful and cost-effective ML solutions.
As the ML community continues to innovate, organizations must remain agile, adaptable, and committed to optimization to unlock the full potential of machine learning.
Sign up now for Anyscale endpoints and get started fast or contact sales if you are looking for a comprehensive overview of the Anyscale platform.