
DeepSeek V4: Why the $0.04 Model Crushed Pro-Max

Did the $0.04 DeepSeek V4 model just outgun its pricier sibling? We tested 4 modes on 20 real-world tasks, and the results might shock you.

[Image: comparison of AI model performance and cost, with the smaller, cheaper model leading.]

⚡ Key Takeaways

  • DeepSeek V4's $0.04-per-million-token "Flash" mode outperformed its pricier siblings on 7 of 20 real-world tasks.
  • A "10% KV cache trick" is cited as the key architectural innovation behind Flash's efficiency (a minimal sketch follows this list).
  • Pro-Max mode burned 4.3x more tokens for a marginal 2-point gain, pointing to over-engineering and poor cost efficiency (a back-of-the-envelope cost check also follows below).
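
The article credits Flash's efficiency to shrinking the KV cache to roughly 10% of its usual size. DeepSeek has not published V4 internals, so the sketch below is only one plausible reading of that claim, modeled on the low-rank latent projection idea behind DeepSeek's earlier multi-head latent attention (MLA). Every dimension, weight, and variable name here is a hypothetical placeholder, not V4's actual design.

```python
import numpy as np

# Sketch of a ~10% KV cache via low-rank latent compression (MLA-style):
# instead of caching full per-head keys for each token, cache one small
# latent vector and reconstruct the keys at attention time.
n_heads, d_head = 32, 128
d_full = n_heads * d_head          # 4096 floats of keys per token, uncompressed
d_latent = 410                     # ~10% of d_full (hypothetical choice)

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_full, d_latent)) / np.sqrt(d_full)   # compress
W_up = rng.standard_normal((d_latent, d_full)) / np.sqrt(d_latent)   # reconstruct

hidden = rng.standard_normal(d_full)              # one token's hidden state
latent = hidden @ W_down                          # only this vector is cached
keys = (latent @ W_up).reshape(n_heads, d_head)   # rebuilt per head when attending

print(f"cached: {d_latent} floats vs {d_full} uncompressed "
      f"({d_latent / d_full:.0%} of the full cache)")
```

The memory saving is what matters at inference time: a smaller cache per token means longer contexts and larger batches fit on the same hardware, which is one way a cheap serving tier becomes viable.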
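The token-economics takeaway is also easy to sanity-check. The $0.04/M Flash price, the 4.3x token usage, and the 2-point gain are the article's figures; the Pro-Max price below is a hypothetical placeholder, so only the shape of the ratio, not its exact value, is meaningful.

```python
# Rough cost check of the reported Pro-Max vs Flash trade-off.
FLASH_PRICE = 0.04    # $ per million tokens (reported)
PROMAX_PRICE = 2.00   # $ per million tokens (hypothetical placeholder)
TOKEN_RATIO = 4.3     # Pro-Max used 4.3x more tokens per task (reported)

cost_ratio = (PROMAX_PRICE * TOKEN_RATIO) / FLASH_PRICE
print(f"Pro-Max spends {cost_ratio:.0f}x more per task for a 2-point gain")
```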
Written by Marcus Rivera

Enterprise AI correspondent. Covers how businesses adopt, fund, and operationalize AI.


Originally reported by Towards AI
