Serverless Containers
Create fast and scalable endpoints
for containerized models
DataCrunch Serverless Containers
Create inference endpoints to flexibly serve your containerized models, fetched from any container registry.
Deploy
Package your models in containers and deploy them from any registry (Docker Hub, GitHub, etc.) using the API, CLI, or UI, as sketched after this list.
Scale
Auto-scale based on the number of incoming requests: up to hundreds of GPUs, or down to zero when idle.
Monitor
Get logs and metrics on resource utilization and application behavior in the UI, or as endpoints for Prometheus or Loki.
Pay per usage
Pay only for compute that is in active use, with no charges for idle time. Start, stop, or hibernate instantly via the UI or API.
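For a concrete picture of the deploy step, here is a minimal sketch of creating a deployment through the public API with Python. The endpoint path, payload fields, and compute-type identifier are assumptions for illustration; the actual schema is in the DataCrunch API docs.

```python
import os
import requests

# Assumed endpoint path and payload shape, for illustration only;
# see the DataCrunch API docs for the real deployment schema.
API_BASE = "https://api.datacrunch.io/v1"
TOKEN = os.environ["DATACRUNCH_API_TOKEN"]

deployment = {
    "name": "my-model",
    "image": "docker.io/acme/my-model:latest",  # any registry works (Docker Hub, GHCR, ...)
    "compute": "H100_SXM5_80GB",                # assumed compute-type identifier
    "min_replicas": 0,                          # scale to zero when idle
    "max_replicas": 8,
}

resp = requests.post(
    f"{API_BASE}/container-deployments",        # hypothetical route
    json=deployment,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```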
Deployment types
Continuous
Automatically scales with usage for continuous inference workloads
Jobs
Long-running tasks with safe downscaling to prevent interruption
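In practice, the choice between the two types could be a single field on the deployment spec. A rough sketch, with the `type` field name assumed rather than documented:

```python
# Hypothetical specs; the "type" field name is an assumption, not the documented schema.
continuous = {
    "name": "chat-api",
    "type": "continuous",   # scales with request volume, can idle at zero
    "min_replicas": 0,
}
job = {
    "name": "nightly-embeddings",
    "type": "job",          # drains in-flight work before any replica is removed
}
```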
Pricing
Pay only for the compute that is in active use
Interruptible spot pricing is available at a 50% discount
Multi-GPU options available in 1x, 2x, and 4x configurations
| Compute type | Specs | On-demand price | Spot price |
|---|---|---|---|
| B200 SXM6 180GB | 30 CPU, 240 GB RAM, 180 GB VRAM | $5.39/h | $2.69/h |
| H200 SXM5 141GB | 21 CPU, 175 GB RAM, 141 GB VRAM | $4.13/h | $2.06/h |
| H100 SXM5 80GB | 21 CPU, 175 GB RAM, 80 GB VRAM | $3.98/h | $1.99/h |
| A100 SXM4 80GB | 21 CPU, 110 GB RAM, 80 GB VRAM | $1.75/h | $0.88/h |
| A100 SXM4 40GB | 21 CPU, 110 GB RAM, 40 GB VRAM | $1.29/h | $0.65/h |
| L40S 48GB | 20 CPU, 58 GB RAM, 48 GB VRAM | $1.29/h | $0.65/h |
| RTX6000 Ada 48GB | 10 CPU, 58 GB RAM, 48 GB VRAM | $1.29/h | $0.65/h |
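A quick back-of-the-envelope check against the rates above, since billing covers only active compute and spot runs at half the on-demand price:

```python
# Worked example using the H100 SXM5 row from the table above.
H100_ON_DEMAND = 3.98            # $/h on-demand
spot = H100_ON_DEMAND / 2        # 50% spot discount -> $1.99/h, matching the table

active_hours = 3.5               # idle time is free, so only active hours are billed
gpus = 4                         # e.g. a 4x multi-GPU configuration
print(f"${spot * active_hours * gpus:.2f}")  # -> $27.86
```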
Auto-scaling support
Control scaling sensitivity based on the queue length per replica, with additional scaling metrics and attributes.
See the docs.
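As a mental model of queue-length-based scaling (an illustrative formula, not necessarily the exact algorithm DataCrunch uses): the platform targets roughly one replica per N queued requests, clamped to your configured bounds.

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Illustrative queue-length-per-replica scaling: one replica per
    `target_per_replica` queued requests, clamped to the configured bounds."""
    raw = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(0, 4))   # 0  -> scales to zero when idle
print(desired_replicas(37, 4))  # 10 -> a lower target scales more aggressively
```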
Sync and async requests
Access your container directly in synchronous mode, or use our cloud platform to run your workloads asynchronously.
See the docs.
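Sketching both modes with Python's requests; the base URL and the async submit/status paths are hypothetical placeholders, not the documented API:

```python
import time
import requests

BASE = "https://containers.datacrunch.io/my-model"   # hypothetical endpoint URL
HEADERS = {"Authorization": "Bearer <token>"}

# Synchronous: call the container directly and block until it responds.
out = requests.post(f"{BASE}/predict", json={"prompt": "hello"}, headers=HEADERS)
print(out.json())

# Asynchronous (assumed paths): submit the request, then poll for the result.
job = requests.post(f"{BASE}/async/predict", json={"prompt": "hello"}, headers=HEADERS).json()
while True:
    status = requests.get(f"{BASE}/async/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(1)
```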
Metrics
Get detailed, time-series metrics on replica count, GPU and CPU utilization, request rates, inference duration, and queue size with complementary endpoints for Prometheus or Loki.
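Assuming the metrics endpoint speaks the standard Prometheus text exposition format (the URL and metric names below are placeholders), it can be scraped by Prometheus directly or read ad hoc:

```python
import requests

METRICS_URL = "https://containers.datacrunch.io/my-model/metrics"  # placeholder URL

text = requests.get(METRICS_URL, headers={"Authorization": "Bearer <token>"}).text
for line in text.splitlines():
    # Metric names are assumptions for illustration.
    if line.startswith(("replica_count", "gpu_utilization", "queue_size")):
        print(line)
```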

Logs
Access container-level logs in real time to debug errors, trace requests, and monitor applications directly from the UI.
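Since logs are also exposed for Loki, they can be pulled programmatically through Loki's standard query_range API; the endpoint host and label selector here are assumptions.

```python
import time
import requests

LOKI_URL = "https://logs.datacrunch.io/loki/api/v1/query_range"  # placeholder host; the path is Loki's standard API
now_ns = int(time.time() * 1e9)  # Loki expects nanosecond timestamps

resp = requests.get(
    LOKI_URL,
    params={
        "query": '{deployment="my-model"}',  # assumed label name
        "start": now_ns - 3600 * 10**9,      # last hour
        "end": now_ns,
        "limit": 100,
    },
    headers={"Authorization": "Bearer <token>"},
)
for stream in resp.json()["data"]["result"]:
    for _ts, line in stream["values"]:
        print(line)
```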

API and SDK
Interact with the Serverless Containers through the DataCrunch Public API or the official Python SDK.
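The client setup below follows the Python SDK's OAuth client-credentials flow; the container call is commented out because the exact method names should be taken from the SDK reference, not from this sketch.

```python
from datacrunch import DataCrunchClient  # pip install datacrunch

# Authenticate with OAuth client credentials from the DataCrunch dashboard.
client = DataCrunchClient("<CLIENT_ID>", "<CLIENT_SECRET>")

# Hypothetical serverless-container call; check the SDK reference for real names:
# for deployment in client.containers.get():
#     print(deployment.name)
```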



Get real-time observability your way
Integrate DataCrunch Serverless Containers with your Prometheus or Grafana stack
Looking for something different?
Other inference services
Managed endpoints
Our engineering team can create and maintain a custom endpoint for any model, and you pay only per use.
Co-development
Our engineering team can be your strategic partner for building and maintaining custom inference solutions, tailored to your use cases.
Customer feedback
What they say about us...
Having direct contact between our engineering teams enables us to move incredibly fast. Being able to deploy any model at scale is exactly what we need in this fast-moving industry. DataCrunch enables us to deploy custom models quickly and effortlessly.
Iván de Prado, Head of AI at Freepik
From deployment to training, our entire language model journey was powered by DataCrunch's clusters. Their high-performance servers and storage solutions allowed us to run smooth operations with maximum uptime, and to focus on achieving exceptional results without worrying about hardware issues.
José Pombal, AI Research Scientist at Unbabel
DataCrunch powers our entire monitoring and security infrastructure with exceptional reliability. We also enforce firewall restrictions to protect against unauthorized access. Thanks to DataCrunch, our training clusters run smoothly and securely.
Nicola Sosio, ML Engineer at Prem AI