
Revolutionizing marine mammal research with AI-powered photo identification

DataCrunch Content Team

Happywhale, a pioneering research collaboration and citizen science platform, has been using DataCrunch infrastructure since 2022 to identify individual whales and dolphins globally, processing millions of photos and transforming marine conservation research. Starting with classic pattern recognition, Happywhale evolved to implement advanced machine learning algorithms from a Kaggle competition, achieving breakthrough accuracy in whale identification. The platform now hosts the majority of the world’s humpback whale photo ID data and is expanding to other marine species including dolphins and seals.

In the North Pacific alone, Happywhale has recorded nearly every living humpback whale (approx. 90%) and supports continuous tracking of long-term trends, including detectable population declines tied to climate-driven food shortages (approx. 7,000 whales lost from 2012 to 2021 and a 20% population decline from 2014 to 2016).

With the DataCrunch Cloud Platform, Happywhale identifies individual whales within seconds of upload, serving researchers and citizen scientists worldwide.

About Happywhale

Happywhale is a research collaboration and citizen science web platform that uses AI-powered image recognition to identify individual marine mammals through photo-ID - particularly humpback whale flukes (tail images). Founded in 2015 by marine biologist Ted Cheeseman and developer Ken Southerland, the platform has revolutionized how researchers track and study whale populations globally.

Since then, Happywhale has built a strong track record.

Happywhale serves two key audiences: citizen scientists who submit photos to track individual whales, and researchers conducting population studies and conservation work. The platform has become an essential service for marine research laboratories worldwide.

How It Works

Happywhale's AI-powered system identifies individual whales by analyzing the unique characteristics of their flukes (tail fins, used for whale matching) and dorsal fins (used for dolphin matching) - essentially nature's fingerprints. The algorithm analyzes the shape, patterns, and features of an individual humpback whale's tail, and the platform builds a profile of each individual and how it travels around the world.

Humpback Whale Oscar

Each whale's fluke carries distinct identifying features in its shape and markings.
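A minimal sketch of how this kind of feature-based matching could work, assuming each fluke photo has already been reduced to a numeric feature vector by an upstream model. The vectors, whale IDs, and similarity threshold below are illustrative placeholders, not Happywhale's actual pipeline:

```python
from math import sqrt
from typing import Dict, List, Optional

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def match_fluke(query: List[float],
                catalog: Dict[str, List[float]],
                threshold: float = 0.8) -> Optional[str]:
    """Return the ID of the most similar catalogued fluke, or None
    if no catalogue entry clears the similarity threshold."""
    best_id = max(catalog, key=lambda wid: cosine(query, catalog[wid]))
    return best_id if cosine(query, catalog[best_id]) >= threshold else None

# Hypothetical catalogue of two known whales.
catalog = {"whale-A": [1.0, 0.0, 0.0], "whale-B": [0.0, 1.0, 0.0]}
```

In practice the catalogue holds millions of entries, so a production system would use an approximate nearest-neighbor index rather than this linear scan; the sketch only conveys the matching idea.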

“We basically broke the bottleneck of pattern recognition. What took an hour per photo now happens virtually instantaneously. This allowed our collaboration to expand to a global scale. We now have most of the world’s humpback whale photo ID data in one place.” – Ted Cheeseman, Co-founder & Director at Happywhale


The Challenge

Before Happywhale’s AI solution, researchers had to spend over 50 minutes manually matching each whale photo with existing data — a tedious process that limited their work to their own data sets. A crowdsourced identification platform, however, could allow researchers worldwide to collaborate and share information, greatly expanding the scale and impact of whale research. With whales migrating thousands of miles across ocean basins, tracking individuals was nearly impossible without a faster, more accurate system. The team needed infrastructure that could handle continuous global usage while maintaining sub-10-second response times for field researchers.

Starting with SIFT-based pattern recognition in 2015, Happywhale transitioned to machine learning in 2018–2019. The breakthrough came from implementing a Kaggle competition winner’s algorithm, which Ken Southerland refactored to run an order of magnitude faster while maintaining accuracy.

Photo Identification on DataCrunch

The application runs Node.js and Python servers on DataCrunch GPU instances, processing identification requests every few minutes from users worldwide. Feature sets and trained models are stored in memory using an LRU cache for optimal performance, with S3 buckets handling image storage.
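The in-memory caching pattern described above can be sketched with Python's standard library; the loader name and its return value are hypothetical stand-ins for Happywhale's actual model and feature-set loading:

```python
from functools import lru_cache

@lru_cache(maxsize=8)  # keep only the most recently used species feature sets resident
def load_feature_set(species: str) -> dict:
    # Hypothetical loader: in production this would fetch the trained
    # model's feature vectors from object storage (e.g. an S3 bucket).
    return {"species": species, "vectors": []}

# The first call pays the storage cost; repeats are served from memory.
first = load_feature_set("humpback")
again = load_feature_set("humpback")
assert first is again  # same cached object, no reload
```

An LRU (least-recently-used) policy fits this workload well: popular species stay hot in memory while rarely requested ones are evicted, bounding memory use without manual tuning.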

“We initially used a traditional hyperscaler, which made things ridiculously complex and left me trying to figure everything out on my own. With DataCrunch, it’s literally ‘set it and forget it’. I got our application running on a GPU instance quickly and easily, and any question I had was instantly answered by an infrastructure expert.” – Ken Southerland, Co-founder & Lead Developer at Happywhale


The DataCrunch Cloud Platform equips Happywhale with dedicated GPU instances, straightforward deployment, and direct access to infrastructure experts.

Researchers can now photograph a whale, upload via mobile, and receive identification within seconds. This enables critical decisions in the field, such as whether it’s necessary to collect genetic samples, reducing disturbance to animals and optimizing research permits. Citizen scientists and whale watching hobbyists can see instantly if the whale they are photographing has been identified and track its sightings, and even adopt and name the whale if previously unnamed.

Vision for the Future

Ted Cheeseman envisions a future where every image uploaded to Happywhale is automatically analyzed without human intervention. “My main goal would be to have every image that comes into Happywhale get a ‘Hey, what species do I see in this?’” he explains. The system would automatically detect whether there is anything identifiable in the photo - be it a whale fluke, dorsal fin, or seal - and route it through the appropriate species-specific algorithm (inference endpoint). This automation would eliminate the current manual step where users must specify which algorithm to use, enabling further scaling. It would transform Happywhale into a truly intelligent platform capable of processing images from virtually anywhere, instantly identifying not just whales but multiple marine species from a single unified interface.
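The detect-then-dispatch flow Cheeseman describes could be sketched as follows; the species labels, handler names, and detector are hypothetical placeholders, not Happywhale's actual API:

```python
from typing import Callable, Dict

# Hypothetical per-species inference endpoints (illustrative names only).
def identify_humpback_fluke(image_id: str) -> str:
    return f"humpback match for {image_id}"

def identify_dolphin_fin(image_id: str) -> str:
    return f"dolphin match for {image_id}"

ROUTES: Dict[str, Callable[[str], str]] = {
    "humpback_fluke": identify_humpback_fluke,
    "dolphin_dorsal": identify_dolphin_fin,
}

def route_image(image_id: str, detect: Callable[[str], str]) -> str:
    """Ask a detector what the photo contains, then dispatch to the
    matching species-specific algorithm instead of asking the user."""
    label = detect(image_id)
    handler = ROUTES.get(label)
    if handler is None:
        return "nothing identifiable detected"
    return handler(image_id)
```

Adding a new species then reduces to registering one more entry in the routing table, which is what makes this design scale across marine mammals.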

The implications for Ken and the DataCrunch team are that GPU, CPU, and storage resources must scale automatically to accommodate an exponential increase in requests to process species-specific feature sets. Model training and re-training workloads must also be seamlessly integrated into the infrastructure. The team is currently exploring containerization to increase scalability and application portability while minimizing GPU costs.

Next Steps