How Freepik scaled FLUX media generation to millions of requests per day with DataCrunch and WaveSpeed

Summary

Freepik has been working with DataCrunch since early 2024 to integrate state-of-the-art media generation into its AI Suite and scale beyond millions of inference requests per day.

DataCrunch has been providing Freepik with its cutting-edge GPU infrastructure and managed inference services, delivering the following capabilities:

Managed GPU orchestration
Elastic scaling and near-zero cold starts
High-velocity model serving with WaveSpeed
Direct contact for expert support and strategic collaboration

Freepik’s customers generate over 60 million images per month, with a significant portion of these requests made possible by the DataCrunch infrastructure and services.

1. Customer Profile

Freepik is a leading AI-powered creative suite that combines advanced generative AI tools with 250M+ curated stock assets to streamline high-quality content creation.

In early 2024, Freepik redefined its business model to leverage generative AI for creating high-quality media content. Freepik started its journey into image generation with models like Stable Diffusion XL and experimental endpoints. Heading into 2025, Freepik has refined its approach to adopt models like FLUX and scale to production-grade infrastructure while simultaneously accommodating a rapidly growing user base.

Freepik's AI Suite features an AI Image Generator, which uses models such as FLUX for producing photorealistic images from text prompts (T2I) or images (I2I), and an AI Video Generator, powered by models such as Google DeepMind Veo 2, for creating videos from text or images. These features were designed to be intuitive while prioritizing style customization and guided workflows.

Freepik’s AI Suite enjoyed over:

80 million visitors per month
+600 thousand subscribed users
+60 million images generated per month

A significant portion of these inference requests involves the FLUX model suite with the inference endpoints managed by DataCrunch and utilizing the inference engine by WaveSpeed:

FLUX Dev
FLUX Tools (e.g., inpainting)

2. Cost-Efficient Media Generation: FLUX Dev

2.1. Initial Challenges

Scaling infrastructure is rarely an easy task. Scaling infrastructure while accommodating an exponentially growing user base, catering to daily usage spikes, and optimizing for inference cost and speed is just short of impossible.

As Freepik has set off to build a world-class product and customer experience, their image generation infrastructure had to be at the frontier of performance while meeting the following requirements:

Sustaining latency below the 2-6 second range (p50)
Optimizing for throughput/$ (image per hour per compute)
Avoiding any perceivable quality regression over the official unoptimized baseline

In order to meet these requirements, Freepik must operate on the cutting edge of efficiency, as even fractions of cents and more optimal GPU utilization per generation would translate to significant cost savings at such scale.

2.2. Technical Approach

By working with DataCrunch on building and scaling its inference infrastructure, Freepik can focus all its efforts on improving the quality of the product and offering a world-class user experience. DataCrunch also enabled Freepik to adapt to rapid advancements in image generation.

DataCrunch has continuously provided direct communication between its engineers and Freepik, enabling rapid collaboration and adjustments. This direct communication was required to support the scale and growth that Freepik experienced, and to remove any delay or possibility of miscommunication.

“Having direct contact between our engineering teams enables us to move incredibly fast. Being able to deploy any model at scale is exactly what we need in this fast-moving industry. DataCrunch enables us to deploy custom models quickly and effortlessly.” – Iván de Prado, Head of AI at Freepik

As a result, Freepik has benefited from a cost-efficient model serving in its AI Suite while achieving target throughput/$ and latency. Our efforts at DataCrunch were directed at avoiding the “inference speed at all costs” pitfall, which usually comes with tradeoffs such as output degradation and unstable scaling.

In order to accomplish this, we conducted in-depth research into lossless optimizations that preserve model capabilities, followed by rigorous evaluation:

Multi-metric Evaluation: Using the combination of DreamSim, FID, CLIP, ImageReward, and Aesthetic
Human Evaluation: Freepik conducted extensive evaluation via A/B arenas, comparing against the baseline and other inference providers.
Prompt Diversity: Evaluated diverse prompt categories focusing on objects, scenes, and abstract concepts.

Starting from 2025, the DataCrunch-WaveSpeed HW-SW co-design research has enabled us to further push the maximum practical inference efficiency offered to Freepik while utilizing each organization’s core strengths:

The DataCrunch GPU Infrastructure and Serverless Containers with auto-scaling on internal Kubernetes cluster, high-throughput networking, and low-latency object storage.
The WaveSpeed Inference Engine with an in-house ML compiler, custom-tuned and fused CUDA kernels, advanced lossless quantization, DiT activation caching (e.g., AdaCache), and lightweight inference servers with negligible overhead.

3. Inference Benchmarking: Methodology & Results

Key metrics: Average inference time and latency (p99), cost per generation, and throughput per hour.

All of the following results were achieved with a world size of 1 device. Optimizing each endpoint for efficiency has enabled us to dynamically allocate available resources based on Freepik’s traffic at a given point in time and requirements (e.g., user tiers).

Endpoint	Input Parameters	GPU inference time (sec)	throughput per GPU/h
flux-dev	Size = 1024x1024, steps = 28, optional cache = 0.1	4.4	818.2
flux-dev-fast	Size = 1024x1024, steps = 28, optional cache = None	3.3	1091
flux-dev-fast	Size = 1024x1024, steps = 28, optional cache = 0.1	2.2	1636
flux-dev-fast	Size = 1024x1024, steps = 28, optional cache = None	1.64	2184
flux-dev-ultra	Size = 1024x1024, steps = 28, optional cache = None	1.648	2184
flux-dev-ultra	Size = 1024x1024, steps = 28, optional cache = 0.1	1.045	3445
flux-dev-ultra	Size = 1024x1024, steps = 28, optional cache = 0.16	0.768	4688

4. Technical Solution: Production-Grade GPU Infrastructure

The DataCrunch GPU infrastructure delivers a production-grade foundation for large-scale generative AI systems, with the WaveSpeed optimization engine further pushing the envelope of efficiency.

Managed GPU Orchestration: GPU fine-grained resource management is delivered via an internal Kubernetes cluster, allowing WaveSpeed to control resource allocation for inference without worrying about infrastructure management or diving into Infrastructure-as-Code.
Elastic Scaling: The Serverless Containers were customized to auto-scale from zero to over 500 GPU instances based on the number of incoming requests. This capability is crucial to absorbing the daily spikes in traffic without queuing or dropping requests as well as scaling to zero when idle to avoid unnecessary costs.
Near-Zero Cold Starts: Cold-start latency was significantly reduced through pre-warming compilations, faster image pulling times, and container caching, approximating the true lambda performance for GPU workloads.
High-Velocity Model Serving: Optimized storage and network fabrics significantly reduced the time spent loading the model weights and pulling the Docker images. This is especially crucial for the FLUX LoRA deployments since the LoRA weights need to be cached and moved to each GPU instance. In addition, the DataCrunch Shared Filesystem further increases the model servicing velocity and reduces the data transfer overheads by letting multiple instances read and write to the same centralized file repository.

5. Business Outcomes: Cost Savings & Strategic Advantages

DataCrunch, jointly working with WaveSpeed, has enabled Freepik to scale image generation with the FLUX Dev models while minimizing inference costs.

In addition to direct cost savings, the strategic partnership has enabled a large fraction of Freepik’s users to access the FLUX models with higher generation quotas.

The DataCrunch team has rigorously evaluated and ensured output quality while leveraging heterogeneous compute infrastructure by applying HW-aware optimizations.

One of the key takeaways was that sustainable and scalable inference requires an understanding of ML systems from different perspectives:

GPU-aware optimizations (e.g., B200 custom CUDA kernels)
Rigorous testing and evaluation
Infrastructure integration (i.e., auto-scaling, zero cold starts, network tuning)

As Freepik aims for higher traffic and inference volumes with its new enterprise plan, DataCrunch will continue to apply this successful approach to achieve continued scaling performance.

6. Future Predictions: What’s Next for Media Generation

Image generation from text prompts or image conditioning has become insufficient to meet the growing demand for control and editing capabilities in professional environments, as used by digital artists and advertising agencies.

With the release of the FLUX, Black Forest Labs has made a giant leap in image generation quality. We expect that FLUX.1 Kontext [dev] will create a similar inflection point for image editing, with its adoption rates exceeding those of the FLUX model.

The stories, such as OpenAI 4o image generation, confirm that there is high demand for models with a high degree of steerability, easy conditioning with input images, character consistency, and strong prompt adherence.

With the release of FLUX.1 Kontext [max] and [pro], Black Forest Labs have demonstrated that they are capable of reproducing such capabilities while being more cost-efficient.

Looking ahead, DataCrunch plans to go beyond image generation and toward end-to-end workflows. Ongoing research and development projects have been focusing on:

Scalable and cost-effective integration of SOTA video generation with models like Alibaba WAN 2.1 and VACE
Image and video upscaling models
Reducing the complexity of building agentic media products
Releasing FLUX Dev (base model, LoRA, tools) inference endpoints in the DataCrunch Cloud Platform – stay tuned

Get started

Deploy Serverless Containers with the DataCrunch Cloud Platform
Reach out to DataCrunch about Managed Services
Explore the WaveSpeed FLUX image generation