Serverless Containers
Create fast and scalable endpoints
for containerized models
DataCrunch Serverless Containers
Create inference endpoints to flexibly serve your containerized models, fetched from any container registry.
Deploy
Package your models in containers and deploy them from any registry (Docker Hub, GitHub, etc.) using the API, CLI, or UI, as sketched after this list.
Scale
Auto-scale based on the number of incoming requests: up to hundreds of GPUs, or down to zero when idle.
Monitor
Get logs and metrics on resource utilization and application behavior in the UI, or as endpoints for Prometheus or Loki.
Pay per usage
Pay only for compute that is in active use, with no charges for idle time. Start, stop, or hibernate instantly via the UI or API.
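For a concrete picture of the deploy step, here is a minimal sketch of creating a deployment through the public API with Python. The endpoint path, payload fields, and compute-type identifier are assumptions for illustration; the actual schema is in the DataCrunch API docs.

```python
import os
import requests

# Assumed endpoint path and payload shape, for illustration only;
# see the DataCrunch API docs for the real deployment schema.
API_BASE = "https://api.datacrunch.io/v1"
TOKEN = os.environ["DATACRUNCH_API_TOKEN"]

deployment = {
    "name": "my-model",
    "image": "docker.io/acme/my-model:latest",  # any registry works (Docker Hub, GHCR, ...)
    "compute": "H100_SXM5_80GB",                # assumed compute-type identifier
    "min_replicas": 0,                          # scale to zero when idle
    "max_replicas": 8,
}

resp = requests.post(
    f"{API_BASE}/container-deployments",        # hypothetical route
    json=deployment,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```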
Deployment types
Continuous
Automatically scales with usage for continuous inference workloads
Jobs
Long-running tasks with safe downscaling to prevent interruption
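In practice, the choice between the two types could be a single field on the deployment spec. A rough sketch, with the `type` field name assumed rather than documented:

```python
# Hypothetical specs; the "type" field name is an assumption, not the documented schema.
continuous = {
    "name": "chat-api",
    "type": "continuous",   # scales with request volume, can idle at zero
    "min_replicas": 0,
}
job = {
    "name": "nightly-embeddings",
    "type": "job",          # drains in-flight work before any replica is removed
}
```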
Pricing
Pay only for the compute that is in active use
Interruptible spot pricing is available at a 50% discount
Multi-GPU options available in 1x, 2x, and 4x configurations
| Compute type | Specs | On-demand price | Spot price |
|---|---|---|---|
| B200 SXM6 180GB | 30 CPU, 240 GB RAM, 180 GB VRAM | $5.39/h | $2.69/h |
| H200 SXM5 141GB | 21 CPU, 175 GB RAM, 141 GB VRAM | $4.13/h | $2.06/h |
| H100 SXM5 80GB | 21 CPU, 175 GB RAM, 80 GB VRAM | $3.98/h | $1.99/h |
| A100 SXM4 80GB | 21 CPU, 110 GB RAM, 80 GB VRAM | $1.75/h | $0.88/h |
| A100 SXM4 40GB | 21 CPU, 110 GB RAM, 40 GB VRAM | $1.29/h | $0.65/h |
| L40S 48GB | 20 CPU, 58 GB RAM, 48 GB VRAM | $1.29/h | $0.65/h |
| RTX6000 Ada 48GB | 10 CPU, 58 GB RAM, 48 GB VRAM | $1.29/h | $0.65/h |
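A quick back-of-the-envelope check against the rates above, since billing covers only active compute and spot runs at half the on-demand price:

```python
# Worked example using the H100 SXM5 row from the table above.
H100_ON_DEMAND = 3.98            # $/h on-demand
spot = H100_ON_DEMAND / 2        # 50% spot discount -> $1.99/h, matching the table

active_hours = 3.5               # idle time is free, so only active hours are billed
gpus = 4                         # e.g. a 4x multi-GPU configuration
print(f"${spot * active_hours * gpus:.2f}")  # -> $27.86
```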
Auto-scaling support
Control scaling sensitivity based on the queue length per replica, with additional scaling metrics and attributes.
See the docs.
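As a mental model of queue-length-based scaling (an illustrative formula, not necessarily the exact algorithm DataCrunch uses): the platform targets roughly one replica per N queued requests, clamped to your configured bounds.

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Illustrative queue-length-per-replica scaling: one replica per
    `target_per_replica` queued requests, clamped to the configured bounds."""
    raw = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(0, 4))   # 0  -> scales to zero when idle
print(desired_replicas(37, 4))  # 10 -> a lower target scales more aggressively
```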
Sync and async requests
Access your container directly in synchronous mode, or use our cloud platform to run your workloads asynchronously.
See the docs.
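Sketching both modes with Python's requests; the base URL and the async submit/status paths are hypothetical placeholders, not the documented API:

```python
import time
import requests

BASE = "https://containers.datacrunch.io/my-model"   # hypothetical endpoint URL
HEADERS = {"Authorization": "Bearer <token>"}

# Synchronous: call the container directly and block until it responds.
out = requests.post(f"{BASE}/predict", json={"prompt": "hello"}, headers=HEADERS)
print(out.json())

# Asynchronous (assumed paths): submit the request, then poll for the result.
job = requests.post(f"{BASE}/async/predict", json={"prompt": "hello"}, headers=HEADERS).json()
while True:
    status = requests.get(f"{BASE}/async/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(1)
```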
Metrics
Get detailed, time-series metrics on replica count, GPU and CPU utilization, request rates, inference duration, and queue size with complementary endpoints for Prometheus or Loki.
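Assuming the metrics endpoint speaks the standard Prometheus text exposition format (the URL and metric names below are placeholders), it can be scraped by Prometheus directly or read ad hoc:

```python
import requests

METRICS_URL = "https://containers.datacrunch.io/my-model/metrics"  # placeholder URL

text = requests.get(METRICS_URL, headers={"Authorization": "Bearer <token>"}).text
for line in text.splitlines():
    # Metric names are assumptions for illustration.
    if line.startswith(("replica_count", "gpu_utilization", "queue_size")):
        print(line)
```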

Logs
Access container-level logs in real time to debug errors, trace requests, and monitor applications directly from the UI.
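Since logs are also exposed for Loki, they can be pulled programmatically through Loki's standard query_range API; the endpoint host and label selector here are assumptions.

```python
import time
import requests

LOKI_URL = "https://logs.datacrunch.io/loki/api/v1/query_range"  # placeholder host; the path is Loki's standard API
now_ns = int(time.time() * 1e9)  # Loki expects nanosecond timestamps

resp = requests.get(
    LOKI_URL,
    params={
        "query": '{deployment="my-model"}',  # assumed label name
        "start": now_ns - 3600 * 10**9,      # last hour
        "end": now_ns,
        "limit": 100,
    },
    headers={"Authorization": "Bearer <token>"},
)
for stream in resp.json()["data"]["result"]:
    for _ts, line in stream["values"]:
        print(line)
```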

API and SDK
Interact with the Serverless Containers through the DataCrunch Public API or the official Python SDK.
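The client setup below follows the Python SDK's OAuth client-credentials flow; the container call is commented out because the exact method names should be taken from the SDK reference, not from this sketch.

```python
from datacrunch import DataCrunchClient  # pip install datacrunch

# Authenticate with OAuth client credentials from the DataCrunch dashboard.
client = DataCrunchClient("<CLIENT_ID>", "<CLIENT_SECRET>")

# Hypothetical serverless-container call; check the SDK reference for real names:
# for deployment in client.containers.get():
#     print(deployment.name)
```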



Get real-time observability your way
Integrate DataCrunch Serverless Containers with your Prometheus or Grafana stack
Looking for something different?
Other inference services
Managed endpoints
Our engineering team can create and maintain a custom endpoint for any model, and you pay only per use.
Co-development
Our engineering team can be your strategic partner for building and maintaining custom inference solutions, tailored to your use cases.
Customer feedback
What they say about us...
Having direct contact between our engineering teams enables us to move incredibly fast. Being able to deploy any model at scale is exactly what we need in this fast-moving industry. DataCrunch enables us to deploy custom models quickly and effortlessly.
Iván de Prado, Head of AI at Freepik
From deployment to training, our entire language model journey was powered by DataCrunch's clusters. Their high-performance servers and storage solutions allowed us to run smooth operations with maximum uptime, and to focus on achieving exceptional results without worrying about hardware issues.
José Pombal, AI Research Scientist at Unbabel
DataCrunch powers our entire monitoring and security infrastructure with exceptional reliability. We also enforce firewall restrictions to protect against unauthorized access. Thanks to DataCrunch, our training clusters run smoothly and securely.
Nicola Sosio, ML Engineer at Prem AI