Key Stats/Results:
75% reduction in cloud operating costs
3x faster iteration for machine learning application development
1 FTE saved
Industry served: Municipal infrastructure maintenance
Use Case: Training object detection models on video, batch inference on thousands of videos
Overview:
Aging municipal infrastructure is a problem across the United States. Sewers are a critical component of municipal infrastructure, and in extreme cases, poor sewer maintenance can lead to dangerous situations including the collapse of city streets.
Historically, sewer maintenance has been performed by expert human inspectors, usually working aboard camera trucks in the field, who laboriously enter observed defects into a database while driving the robotic camera. Hardware quality, lighting, and other environmental conditions can make the process error-prone. False positives, where maintenance is recommended but not needed, add to municipal expenses, while false negatives create failure risk.
Beyond that, those inspectors are busy and backlogged. It’s not unusual for some cities to wait for weeks or months for an inspector to review pipe inspection footage to make maintenance recommendations.
To address these challenges, SewerAI provides tools for municipalities to map, monitor, and maintain critical wastewater infrastructure.
The team at SewerAI developed a computer vision model to accelerate the inspection process, improve its accuracy, reduce the inspection backlog, and ultimately reduce operating costs while improving safety for municipalities.
Using deep learning to process thousands of hours of video footage was critical to the business, but came with significant infrastructure challenges.
Cost-efficiency at scale: Foremost among the challenges was scaling batch inference in a cost-effective way across hundreds of GPUs. Using AWS Batch required spinning up and spinning down a new machine for every video, which added significant overhead and made it impossible to share compute resources across tasks.
Faster training by scaling data ingest: The training process for SewerAI’s object detection models requires random access to many frames across many videos. The original approach created a bottleneck in the ingestion process, with expensive and scarce GPUs often only achieving 25% utilization.
Faster iterations for developers: Onboarding and ramping up new engineers proved challenging as well, as each engineer had to learn to configure clusters, build Docker images, and manage dependencies. Beyond that, it typically took about 10 minutes to test each code change, slowing down development.
SewerAI turned to Anyscale, creators of the Ray open source project, for help. The SewerAI team was immediately impressed with the level of deployment flexibility that Ray offered. SewerAI’s workloads are large and spiky, and their team loved the fact that Anyscale offered a fast path to deployment without having to develop and maintain infrastructure for distributed AI.
The team started prototyping and had fully proven the solution in just over 2 months.
Using Anyscale, the SewerAI team was able to:
Speed up batch inference jobs by 3x, from one hour to twenty minutes.
Reduce the number of machines required by 50%.
Reduce the time to test each code change from 10 minutes to nearly instantaneous, making it much faster for SewerAI’s developers to iterate.
Raise GPU utilization from 25% to over 95%.
Scale data ingest and preprocessing across many videos to saturate GPUs during training.
Together, the shorter processing time and the smaller number of machines amount to a reduction of more than 75% in total cost of ownership (TCO) compared to AWS Batch.
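As a rough check on how the two figures above combine, here is a back-of-the-envelope calculation. It assumes that cost scales linearly with machine-hours (machines multiplied by runtime) and that per-machine pricing is unchanged. Both are simplifying assumptions for illustration, not SewerAI's actual cost model.

```python
from fractions import Fraction

# Figures quoted in the case study.
speedup = 3                         # batch inference is 3x faster (1 hour -> 20 minutes)
machine_fraction = Fraction(1, 2)   # 50% fewer machines

# Assumption: cost is proportional to machine-hours = machines * runtime.
runtime_fraction = Fraction(1, speedup)
cost_fraction = runtime_fraction * machine_fraction   # share of original machine-hours
reduction = 1 - cost_fraction

print(f"Remaining machine-hours: {float(cost_fraction):.1%}")
print(f"Estimated reduction:     {float(reduction):.1%}")
```

Under these assumptions the remaining machine-hours are one sixth of the original, a reduction of roughly 83%, which is consistent with the stated figure of more than 75%. The real number also depends on pricing, storage, and other costs that this sketch does not model.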
The team at SewerAI takes advantage of Anyscale’s Workspaces feature, which enables ML practitioners to build distributed Ray applications and advance from research to development to production easily, all within a single environment. Workspaces make it easier for SewerAI to ramp up new teammates on the application while giving them a centralized environment that integrates with all of their favorite AI/ML tools.
Beyond that, the Anyscale-powered application is delivering value for SewerAI’s customers by improving issue diagnosis accuracy, exceeding even expert human performance on all defect types and ensuring all safety-critical defects are located. This allows municipalities and contractors to deliver reliable information to their stakeholders despite the shortage of experienced inspectors, and to draw actionable insights from their large backlog of unreviewed inspections.
The SewerAI team has been very happy with the support and expertise provided by the team at Anyscale. Moving forward, the SewerAI team plans to move more workloads to the Anyscale platform and take more advantage of Google Cloud via Anyscale’s multi-cloud support. Anyscale has become their centralized platform for AI, giving them one place to develop and deploy their AI workloads.
“Anyscale was exactly what we needed to scale our ML workloads without investing time and money in building our own infrastructure. Now we have a long-term platform that boosts our developers’ productivity, shortens our iteration cycles, and slashes our long-term cloud costs by over 75%. It is a platform that we can mold to fit our specific needs, rather than trying to adapt to more opinionated solutions that don’t fit our exact case. As an AI-driven business, that’s a game-changer for us.”
-Noah Rubinstein, AI Architect, SewerAI