New service gives application developers the fastest way to fine-tune and deploy powerful open-source LLMs at scale.
Update June 2024: Anyscale Endpoints (Anyscale's LLM API Offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale platform.
SAN FRANCISCO - September 18, 2023 - Anyscale, the AI infrastructure company built by the creators of Ray, the world’s fastest growing open-source unified framework for scalable computing, today launched Anyscale Endpoints, a new service enabling developers to integrate fast, cost-efficient, and scalable large language models (LLMs) into their applications using popular LLM APIs.
Unveiled at Ray Summit 2023, the leading conference on LLMs and generative AI for developers, Endpoints is less than half the cost of comparable proprietary solutions for general workloads and up to 10X less expensive for specific tasks.
Previously, developers had to assemble machine learning pipelines, train their own models from scratch, then secure, deploy and scale them. This resulted in high costs and slower time-to-market. Anyscale Endpoints lets developers use familiar API calls to seamlessly add "LLM superpowers" to their operational applications without the painstaking process of developing a custom AI platform.
“Obstacles like infrastructure complexity, compute resources and cost have historically limited AI application developers when it comes to open-source LLMs,” said Robert Nishihara, Co-founder and CEO of Anyscale. “With seamless access via a simple API to powerful GPUs at a market-leading price, Endpoints lets developers take advantage of open-source LLMs without the complexity of traditional ML infrastructure. As AI innovation continues to accelerate, Endpoints enables developers to harvest the latest developments of the open-source community and stay focused on what matters—building the next generation of AI applications.”
The Power of Open Source for LLMs
Demand for generative AI and high-quality LLM applications is growing rapidly. According to a new report from Bloomberg Intelligence, the generative AI market is poised to grow from $40 billion in 2022 to $1.3 trillion over the next decade.
Unmatched Price-Performance
As a testament to the unmatched scale and efficiency of the Anyscale Platform, Endpoints is offered at $1 per million tokens for state-of-the-art open-source LLMs like Llama-2 70B, and costs even less for other models. This dramatically expands access to LLM services for application developers. Anyscale is also typically able to add new models in hours, not weeks, so Anyscale Endpoints users have rapid access to the continuous innovation of the open-source community.
A Path to an AI Application Platform
LLMs provide significant value to companies as a result of their ability to be tailored to the specific use cases and fine-tuned with additional content and context to serve end users’ specific needs. Fine-tuning helps users get the best combination of price and performance for their use case.
In addition to fine-tuning, Anyscale provides the ability to run and use the Endpoints service within the customer’s existing cloud account on AWS (Amazon Web Services) or GCP (Google Cloud Platform). Not only does that improve security for activities like fine-tuning, it enables customers to reuse existing security controls and policies and use computing resources in their own cloud to process their proprietary data.
Anyscale Endpoints customers also have the option to upgrade to the full Anyscale AI Application Platform, giving them the ability to fully customize an LLM, and have fine-grained control over their data and models and end-to-end app architecture as well as deploy multiple AI applications on the same infrastructure.
The new service seamlessly integrates with many popular Python and machine learning libraries and frameworks, including Weight & Biases, Arize and Hugging Face, enabling developers to address multiple different types of use cases across any cloud as their AI applications evolve.
Driving User Success
“Realchar.ai is about delivering immersive, realistic experiences for our users, not fighting infrastructure or upgrading open-source models” said Shaun Wei, CEO and Cofounder at Realchar.ai, an Endpoints beta user. “Endpoints made it possible for us to introduce new services in hours, instead of weeks, and for a fraction of the cost of proprietary services. It also enables us to seamlessly personalize user experiences at scale.”
“We use Anyscale Endpoints to power consumer-facing services that have reach to millions of Google Chrome and Microsoft Edge users,” said Siddartha Saxena, Co-Founder and CTO at Merlin. “Anyscale Endpoints gives us 5x-8x cost advantages over alternatives, making it easy for us to make Merlin even more powerful while staying affordable for millions of users.”
Anyscale Endpoints is available today and will continue to evolve rapidly, powered by open-source innovation at both the AI infrastructure and LLM model layers. To try or learn more about Anyscale Endpoints, visit: https://endpoints.anyscale.com .
About Anyscale
Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly. Built by the creators of Ray, the world’s fastest growing open-source unified framework for scalable computing, thousands of companies rely on technology from Anyscale to accelerate the delivery of AI products to market at significantly reduced cost. Backed by Andreessen Horowitz, NEA, Addition, Intel Capital and Foundation Capital, Anyscale is headquartered in San Francisco, CA. www.anyscale.com
Press Contact: anyscale@launchsquad.com