Anyscale aims to put the power of modern compute into the hands of every developer. To do so sustainably, enterprises need reasonable controls that facilitate innovation while preventing AI sprawl and overspending. To support this mission, we are excited to introduce new enterprise governance and observability tools that help organizations control their AI infrastructure, understand how that infrastructure is utilized, and improve the efficiency of their ML workloads.
Anyscale customers get comprehensive insights into their infrastructure, helping them optimize compute resources and application performance. By gathering data from AI workloads, Anyscale increases visibility into how compute is used across clouds, making it easier for teams to identify inefficiencies and improve utilization.
Anyscale offers a holistic view of compute utilization, showing aggregate resource usage across all jobs, services, and workspaces. This transparency helps teams easily identify the most and least utilized clusters, spot underutilized machines, and make decisions about reallocating resources or adjusting provisioning.
Key features include:
Cluster-wide visibility: See metrics for jobs, workspaces, and services, including how resources like CPU, GPU, and memory are being utilized.
Spot utilization insights: Understand how efficiently spot instances are used, and track spot preemptions to identify patterns.
Ray telemetry: Understand Ray metrics across multiple clusters even after the clusters are terminated.
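To make the idea of aggregate utilization and spot preemption tracking concrete, here is a minimal Python sketch. It does not use any Anyscale API: the Sample record, its field names, and the cluster names are hypothetical and stand in for whatever metrics an organization actually exports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical per-node metrics record (not an Anyscale data type)."""
    cluster: str
    cpu_used: float    # CPU cores in use
    cpu_total: float   # CPU cores provisioned
    is_spot: bool      # whether the node is a spot instance
    preempted: bool    # whether the node was preempted


def cpu_utilization_by_cluster(samples):
    """Aggregate used and provisioned cores per cluster and return the ratio."""
    used = defaultdict(float)
    total = defaultdict(float)
    for s in samples:
        used[s.cluster] += s.cpu_used
        total[s.cluster] += s.cpu_total
    return {c: used[c] / total[c] for c in total if total[c] > 0}


def spot_preemption_rate(samples):
    """Fraction of spot nodes that were preempted (0.0 if there are none)."""
    spot = [s for s in samples if s.is_spot]
    if not spot:
        return 0.0
    return sum(s.preempted for s in spot) / len(spot)


# Made-up sample data for illustration only.
samples = [
    Sample("train-a", 28.0, 32.0, True, False),
    Sample("train-a", 30.0, 32.0, True, True),
    Sample("dev-b", 2.0, 16.0, False, False),
]

util = cpu_utilization_by_cluster(samples)
least_used = min(util, key=util.get)  # candidate for downsizing or reallocation
print(util, least_used, spot_preemption_rate(samples))
```

The same aggregation pattern would apply to GPU and memory metrics; only the fields being summed change.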
Anyscale’s tools include Resource Quotas, which give platform administrators granular control over the allocation and usage of cloud resources across their organization.
With resource quotas, teams can set hard limits on resources such as the number of instances, CPU cores, and GPUs, including limits on specific GPU types. These controls help keep projects within budget and avoid unexpected overages. Because quotas take all active resources into account, they offer a flexible way to create guardrails for different projects and users.
Traditional approaches restrict who can define resource types or how many experiments can run at a time. With Ray and Anyscale, however, users can run many small experiments or a few larger ones within the same guardrails, without inhibiting innovation. In addition, resource quotas let teams balance allocation across users, ensuring fair usage and preventing resource hogging.
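The quota behavior described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Anyscale's implementation or API: the Quota and Request classes, the fits_quota function, and the rule that a GPU type missing from the quota is treated as a limit of zero are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Hypothetical hard limits for a project or user."""
    max_instances: int
    max_cpus: int
    max_gpus_by_type: dict = field(default_factory=dict)  # e.g. {"A100": 8}


@dataclass
class Request:
    """Hypothetical resource footprint of one workload."""
    instances: int
    cpus: int
    gpus_by_type: dict = field(default_factory=dict)


def fits_quota(quota, active, new):
    """Return True if all active workloads plus the new one stay within quota."""
    if sum(r.instances for r in active) + new.instances > quota.max_instances:
        return False
    if sum(r.cpus for r in active) + new.cpus > quota.max_cpus:
        return False
    for gpu_type, count in new.gpus_by_type.items():
        in_use = sum(r.gpus_by_type.get(gpu_type, 0) for r in active)
        # Assumption: a GPU type not listed in the quota has a limit of 0.
        if in_use + count > quota.max_gpus_by_type.get(gpu_type, 0):
            return False
    return True


quota = Quota(max_instances=10, max_cpus=128, max_gpus_by_type={"A100": 8})
active = [Request(2, 32, {"A100": 4}), Request(1, 8)]

small = Request(1, 16, {"A100": 2})   # fits: 6 of 8 A100s in use afterwards
large = Request(4, 64, {"A100": 8})   # rejected: would need 12 A100s in total

print(fits_quota(quota, active, small))
print(fits_quota(quota, active, large))
```

Because the check sums over everything currently active rather than counting experiments, the same budget can be spent on many small workloads or on one large workload.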
Key Benefits:
Cost Control: Set hard limits to prevent over-consumption of cloud resources.
Fair Allocation: Provide teams and projects the resources they need without waste.
Transparency: Gain visibility into resource consumption across your organization.
Customizable: Tailor quotas to specific teams, users, or workloads to meet your organization's unique needs.
At Anyscale, we’re committed to empowering customers with the tools they need to build and deploy AI workloads at scale.
Reach out to our team to take control over your AI infrastructure and costs. Book a demo here: https://www.anyscale.com/book/demo