How Simli Achieved Cost-Efficient, Real-Time Inference for Interactive AI Avatars

Simli is on a mission to make interactive, lifelike AI avatars a foundational part of future digital experiences across e-commerce, customer support, corporate training, EdTech, and beyond. Achieving this requires the production-grade GPU compute, ultra-low latency, and cost-efficiency that DataCrunch delivers.

Simli benefits from a fast-moving partnership with DataCrunch, collaborating on the design and configuration of bare-metal GPU clusters that meet the requirements for Simli’s real-time inference workloads.

In addition, Simli uses on-demand GPU resources from DataCrunch to scale with user requests and to run research & development workloads. Thanks to how DataCrunch handles disk objects, Simli experiences 30–50% faster startup times and resulting cost savings.

About Simli

Simli develops compute-efficient, interactive AI avatars targeting businesses seeking to enhance AI-powered applications, spanning various sectors like e-commerce, customer service, and EdTech. Their primary focus is on providing an excellent and user-friendly developer experience.

Simli’s interactive AI offering utilizes a novel 3D neural architecture based on Gaussian splatting, a new graphics primitive offering high visual fidelity and compute efficiency. Unlike alternatives that often rely on video-based lip-syncing, Simli's neural network provides full control over 3D animation, allowing the entire character's face – not just the lips – to be animated in response to audio. This results in more dynamic avatars that are also more cost-efficient.

Simli, an early-stage startup, was founded by an experienced team of AI builders and serial founders who understood the importance of time-to-market. Their story of starting in stealth mode and moving fast through rapid experimentation to production-grade workloads is similar to what many other AI-native startups experience.

Although Simli initially relied on credits from traditional hyperscalers, the team knew that the hyperscalers could not provide the customization, flexibility, and responsiveness that more agile AI neocloud providers could.

“DataCrunch is the perfect mix of being nimble and having a stable, production-grade product. With DataCrunch, we can promise our customers high uptimes and competitive SLAs.” – Lars Vågnes, Founder & CEO, Simli

The Challenge: Delivering Lifelike, Interactive Avatars Cost-Efficiently

Creating avatars that respond in real time to human speech with full facial expressions – not just lip-syncing – is no small feat. It requires ultra-low latency and production-grade stability, all while remaining cost-efficient.

“We needed production-grade reliability with pricing that made sense for a startup. DataCrunch hit that sweet spot.” – Lars Vågnes

To deliver a compelling user experience, Simli built custom workflows, including a proprietary load balancer and WebRTC-based peer-to-peer connections with end users. These network-sensitive processes, which involve running multiple interacting Docker instances, required a customized bare-metal GPU cluster from DataCrunch.
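Simli's load balancer is proprietary and its design is not described here. As a purely illustrative sketch of the general idea of routing a new avatar session to a GPU worker, the snippet below uses a simple "most free slots" rule. The `GpuWorker` type, its fields, and the `assign_session` function are hypothetical names invented for this example, not part of Simli's or DataCrunch's software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuWorker:
    """Hypothetical record for one GPU worker (illustrative fields only)."""
    name: str
    capacity: int             # assumed max concurrent avatar sessions
    active_sessions: int = 0  # sessions currently being served

    @property
    def free_slots(self) -> int:
        return self.capacity - self.active_sessions


def assign_session(workers: List[GpuWorker]) -> Optional[GpuWorker]:
    """Pick the worker with the most free slots, or None if all are full."""
    candidates = [w for w in workers if w.free_slots > 0]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda w: w.free_slots)
    chosen.active_sessions += 1  # reserve a slot for the new session
    return chosen


# Made-up cluster state for demonstration.
workers = [
    GpuWorker("gpu-a", capacity=4, active_sessions=3),
    GpuWorker("gpu-b", capacity=4, active_sessions=1),
]
picked = assign_session(workers)
print(picked.name if picked else "no capacity")
```

A real-time system would likely also weigh network proximity and GPU model, but those factors are outside what this sketch assumes.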

Business benefits:

2–3× more avatar sessions per dollar compared to hyperscalers
30–50% faster GPU startup times, from 5 minutes to under 2 minutes
Significant reduction in costs through flexible disk management and access to a broad range of GPU models
Cost-effective inference scaling for real-time interactive services

The DataCrunch Solution

After evaluating multiple AI neocloud providers, Simli chose DataCrunch for its:

Production-grade reliability
Developer-first experience
Self-service GPU availability
Ultimate cost-efficiency

“We found that other providers often offered cheaper GPUs but lacked the reliability required for a production-level, low-latency, API-driven service like ours,” noted Lars, explaining why DataCrunch stood out as the optimal choice.

“Startup times and compute costs both dropped significantly. We’re now streaming more avatars, faster, and at lower costs than ever.” – Lars Vågnes

Solution requirements:

<300ms latency
Fast GPU startup times (under 2 minutes)
Access to a broad spectrum of GPU models
API-driven automation and scaling

Simli purpose-built its stack for cost-efficient real-time inference. The broad availability of GPU models from DataCrunch allowed Simli to explore options and find the best fit for each of its critical workloads in terms of latency and cost. A reliable network infrastructure was also critical for Simli's WebRTC-based real-time connections.
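One way to compare GPU options against the <300ms latency requirement is to check a high percentile of measured response times against the budget. The sketch below is an assumption about how such a check could look, not Simli's actual tooling. The sample values are made up, and the choice of the 95th percentile is illustrative.

```python
import math

LATENCY_BUDGET_MS = 300.0  # taken from the "<300ms latency" requirement


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_budget(samples_ms, pct=95, budget_ms=LATENCY_BUDGET_MS):
    """True if the chosen percentile of latencies is under the budget."""
    return percentile(samples_ms, pct) < budget_ms


# Made-up latency measurements in milliseconds for one GPU option.
samples = [180, 210, 195, 240, 260, 220, 205, 290, 230, 215]
print(percentile(samples, 95), meets_budget(samples))
```

Running the same check per GPU model, alongside its hourly price, gives a simple basis for the latency-versus-cost trade-off described above.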

With DataCrunch, Simli experienced 30–50% faster GPU startup times (reduced from 5 minutes to under 2 minutes). They attributed this improvement largely to how DataCrunch handles disk objects, which allows Simli to pre-load large amounts of data onto disks and attach them efficiently to GPUs when they are commissioned. In addition, DataCrunch makes it easy to detach and reattach disks and to avoid preemption issues with on-demand resources.

Team: The Developer Experience

With only a few engineers split between building the product and managing infrastructure, every minute saved matters. Simli’s agile team – especially their lead engineer, Antony Kiroles – benefited from fewer headaches, better uptime, and a developer-first offering from DataCrunch.

“Quality of life went up. We don’t have to deal with the quirks and preemptions we had on other platforms. Ultimately, a great developer experience is being able to run workloads when and where you need to, without sales delays, needing to contact support, or getting stuck in strange Docker environments.” – Lars Vågnes

“Our overall experience with DataCrunch's cloud platform has been great. Things work reliably, and the customer support and responsiveness allow for rapid problem resolution like quick quota adjustments,” Lars emphasized.

Key Takeaways

The fast-moving partnership between Simli and DataCrunch demonstrates several critical success factors for AI-native startups:

Cost Optimization: Achieving 2–3× better compute efficiency enables sustainable scaling of interactive applications.
Performance Requirements: Meeting the <300ms latency target for real-time applications requires specialized infrastructure.
Reliability: Production-grade stability is essential for API services, especially for interactive applications.
Support Quality: Direct access to technical support teams enables rapid problem resolution and knowledge sharing.
Flexibility: Fast access to a wide range of GPU models allows optimizing workloads for latency and cost.

Simli's API offering now delivers interactive AI avatars at significantly lower costs while maintaining the <300ms latency requirements necessary for real-time interactions.

Looking Ahead: Unlocking Interactive AI

Simli is on a mission to reshape the economics of interactive AI. After a year in development, Simli unveils Trinity-1, the first real-time, interactive Gaussian avatar API.

Trinity-1 brings interactive AI to millions of users at less than 1 cent per minute, while market rates run 5–20 cents per minute or higher.
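To put the per-minute figures in perspective, the short calculation below compares the cost of the same usage at each quoted rate. The 10,000-minute volume is an arbitrary assumption chosen for illustration, and 1 cent per minute is used as an upper bound for the "less than 1 cent" figure.

```python
def session_cost_usd(minutes, cents_per_minute):
    """Total cost in US dollars for a number of avatar minutes."""
    return minutes * cents_per_minute / 100


minutes = 10_000  # assumed monthly usage, for illustration only
rates = [
    ("Trinity-1 (upper bound, 1 cent/min)", 1),
    ("Market rate, low end (5 cents/min)", 5),
    ("Market rate, high end (20 cents/min)", 20),
]
for label, cents in rates:
    print(f"{label}: ${session_cost_usd(minutes, cents):,.2f}")
```

At this volume the totals are at most $100 versus $500 to $2,000, which is the 5× to 20× gap implied by the quoted rates.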

To learn more about how Simli unlocks interactive AI with Trinity-1, visit www.simli.com or watch the demo.
To power your AI product with base and on-demand GPU resources, get started with the DataCrunch Cloud Platform.