Case Study

How Handshake saves 50% on LLM GPU Costs with Anyscale

Handshake uses AI to help students discover the right opportunities, stay informed about the job market, make career decisions, and uplevel their skills as they prepare for their next role. Using Anyscale, Handshake was able to modernize their AI capabilities and now trains models four times faster at half the cost.

5x faster iteration for AI workloads
50% savings on cloud costs
10x scalability for LLM GPUs
>50% cost savings for LLM GPUs

Overview

From automated career guidance to AI-led mock interviews to improved job recommendations, generative AI has started to transform the job search process. Handshake, which serves tens of millions of students, is leading the charge.

Handshake’s business serves schools, employers and students seeking jobs. The scale of this opportunity is immense, with a million job opportunities and ~15 million candidates trying to find the right match.

The Challenge

Handshake needs to quickly match millions of individuals with jobs and career opportunities as those opportunities become available. Previously, the company relied on tooling that offered limited control and responsiveness to user behavior. Handshake saw an untapped opportunity to use AI to improve that process in ways that benefit applicants and employers alike. Examples include:

Identifying and ranking best-fit matches between millions of jobs and applicants
Providing virtual career guidance and inspiration to help candidates make better career decisions
Scaling semantic understanding and discovery experiences across job descriptions, career events, student resumes and user-generated content

Earlier AI efforts faced a number of hurdles, including missing critical AI platform capabilities and limited custom AI infrastructure, making it hard to deliver prototypes and iterate on AI feature development.

The missing capabilities included the ability to:

Train, fine-tune and host LLMs
Deploy graph neural networks, two-tower recommenders and other recommender models
Scale training across multiple GPUs or machines
Leverage a variety of GPU types when appropriate (e.g., A100s and A10Gs)

Just as importantly, Handshake wanted to be sure they picked a platform that would enable a smaller team to deliver state-of-the-art AI models. The team had prior experience building custom infra and solutions with Google Cloud Platform, but quickly found during evaluation that Anyscale’s Ray would better meet their needs. They chose the Anyscale Platform to get a managed version of Ray along with additional capabilities, improved performance and reduced cost. "Using Anyscale as a push-button solution for managing Ray clusters enabled data scientists to quickly set up and scale production environments with deep learning dependencies, even for real-time applications that typically would require significant ML Ops experience," said Scot Fang, TLM for Machine Learning at Handshake.

Our Solution

The experts at Handshake decided to adopt Anyscale to modernize their AI platform capabilities. Key enhancements include abilities to:

Train, fine-tune and deploy LLMs like Mixtral 8x22B
Deploy graph neural networks, two-tower recommenders and other deep learning models
Scale training across multiple GPUs and multiple machines
Leverage heterogeneous GPU types when appropriate
Enable LLM workstreams at the scale of millions of documents

The Impact

The upgraded platform is now twice as fast, and Handshake is overachieving on one of their key metrics, engagement on jobs, which reflects the quality of the job recommendation model. Engagement on jobs increased 90% year over year after deploying graph neural networks and two-tower recommenders on Anyscale.

“Iterating and scaling foundational embedding models (graph neural networks, two tower recommenders) trained over 100M-200M+ interactions was difficult and time-consuming until we migrated our feature producers and training jobs to Anyscale + Ray. Our experiment velocity with deep models and dependencies has 5x’ed while training on more data for cheaper,” said Scot. “Anyscale has also enabled net-new real-time inference services and real-time user experiences that data scientists can roll out without waiting on ML Ops experts.” This includes taking advantage of frameworks like vLLM and a wide range of open-source models, all from a simple user interface.

The team’s velocity has increased significantly with Anyscale, which is essential for lean teams that have to move fast. Anyscale provides a platform that has transformed Handshake’s data scientists into full-stack machine learning engineers, with only 1-2 ML infrastructure engineers supporting and maintaining it. “Anyscale enables our small ML infra team to support state-of-the-art ML solutions at scale on accelerated hardware,” said Kyle Gallatin, senior ML infra engineer. “Whether it’s deploying the latest generative AI models or more general ML solutions, Anyscale relieves the burden on ML infra, allowing our team to focus on other impactful work.”

They’ve been able to move workloads to more available, less expensive hardware as well. “We can scale with cheaper GPUs, operating at a fraction of the cost by not having to reserve A100-GPU capacity and taking advantage of Ray features like fractional GPUs. Anyscale also handles dependency management automatically as we move things around,” said Scot.

The Handshake team found it easy to host both large (Mixtral 8x22B) and small (Llama-3 8B) LLM endpoints on Anyscale. The team has already scaled and fine-tuned Llama-2 7B on Anyscale to the point that they’re able to process 35 million tokens per hour for $2.00 on commodity GPUs. By not relying on A100 GPUs, the Handshake team can easily 10x their baseline throughput of 9,600 tokens/second by adding extra commodity GPUs, while saving 30% on costs versus comparable A100 workloads.
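The throughput and cost figures above can be cross-checked with some simple arithmetic. The sketch below is illustrative only. It assumes that the 9,600 tokens/second baseline and the "35 million tokens per hour for $2.00" figure describe the same workload, that $2.00 is the hourly cost at that baseline, and that throughput scales linearly when GPUs are added. None of these assumptions is confirmed by Handshake, and the variable and function names are hypothetical.

```python
# Back-of-the-envelope check of the quoted throughput and cost figures.
# Assumptions (not confirmed by the source): $2.00 is the hourly cost at the
# 9,600 tokens/second baseline, and throughput scales linearly with GPUs.

SECONDS_PER_HOUR = 3_600

BASELINE_TOKENS_PER_SECOND = 9_600   # quoted baseline throughput
COST_PER_HOUR_USD = 2.00             # quoted cost for roughly 35M tokens/hour
SCALE_FACTOR = 10                    # the "10x" scale-out mentioned in the text


def tokens_per_hour(tokens_per_second: int) -> int:
    """Convert a per-second token rate into a per-hour token count."""
    return tokens_per_second * SECONDS_PER_HOUR


def cost_per_million_tokens(cost_per_hour_usd: float, tokens_per_second: int) -> float:
    """Hourly cost divided by the millions of tokens processed in that hour."""
    millions_per_hour = tokens_per_hour(tokens_per_second) / 1_000_000
    return cost_per_hour_usd / millions_per_hour


baseline_hourly = tokens_per_hour(BASELINE_TOKENS_PER_SECOND)
cost_per_million = cost_per_million_tokens(COST_PER_HOUR_USD, BASELINE_TOKENS_PER_SECOND)
scaled_tokens_per_second = BASELINE_TOKENS_PER_SECOND * SCALE_FACTOR
scaled_hourly = tokens_per_hour(scaled_tokens_per_second)

print(f"Baseline: {baseline_hourly:,} tokens/hour")
print(f"Cost: ${cost_per_million:.4f} per million tokens")
print(f"At {SCALE_FACTOR}x: {scaled_tokens_per_second:,} tokens/second, "
      f"{scaled_hourly:,} tokens/hour")
```

Under these assumptions, 9,600 tokens/second works out to about 34.6 million tokens per hour, which is consistent with the rounded 35 million figure, and to a cost of roughly six cents per million tokens.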

The upgraded platform is now far more cost efficient, and enables LLM workstreams at the scale of millions of documents. That contributes to better matching across millions of students’ skills, work experiences and job descriptions as well as LLM-generated explanations of job recommendations, campaign messages, feed content and more.

The team was also very happy with the support provided by the Anyscale team. “Support was excellent during the trial period and continues to be excellent. We deployed and fine-tuned Endpoints on Llama. We call any time something breaks. We get tips about specific optimizations regarding batch-size, compute configurations, tensor transformations. Support is always there and they respond quickly.”

Summarizing the impact of Anyscale at Handshake:

90% higher engagement on jobs, a key business metric
5x faster iteration for AI workloads, accelerating time to market and innovation velocity
50% savings on cloud costs
10x scalability and >50% cost savings for LLM GPUs
Future-proofed for ongoing evolution in the AI space with a modern AI architecture

The Future of AI for Job Search is Bright

Moving forward, the Handshake team is building on their success and momentum, leveraging real-time embedding generation and retrieval, using AI to build recommender experiences with more transparency/controls, and scaling semantic understanding for new discovery experiences across diverse content types. The Anyscale Platform has provided a future-proof way for the experts at Handshake to continue to expand their use of AI to disrupt the employment market and transform the way job seekers start their careers.

"What we’re doing would be prohibitively expensive with other AI platforms. Anyscale enables us to quickly take advantage of open models like Llama-3, Mixtral, or whatever comes next, and in AI, there’s always a ‘next’. We’re ready for it with Anyscale."

Deepak Kumar

VP of AI and Data, Handshake