theAIcatchup

LMSYS leaderboard graph with MiMo-V2-Pro at the top spot

Hunter Alpha's Stealth Coup: MiMo-V2-Pro Tops Leaderboards Without Fanfare

An unnamed API endpoint on OpenRouter just hijacked the top spot on AI leaderboards. No hype, no demos—just raw performance from MiMo-V2-Pro, aka Hunter Alpha.

3 min read 4 days, 12 hours ago

Diagram of four LLM evaluation pillars: multiple-choice, verifiers, leaderboards, and LLM judges with code snippets

AI Hardware

LLM Evaluations: Four Flawed Pillars Propping Up AI Hype

LLM benchmarks promise objectivity. They're mostly marketing mirrors reflecting what sells models, not what works.

4 min read 2 weeks ago

Arena leaderboard screenshot with Claude topping AI model rankings

AI Hardware

Arena's Bulletproof Leaderboard: How It Tamed the AI Wild West

Picture this: AI giants pour billions into models, but one leaderboard decides the winners. Arena's not just ranking them—it's rewriting the rules of the game.

3 min read 2 weeks ago

#AI leaderboards

Hunter Alpha's Stealth Coup: MiMo-V2-Pro Tops Leaderboards Without Fanfare

LLM Evaluations: Four Flawed Pillars Propping Up AI Hype

Arena's Bulletproof Leaderboard: How It Tamed the AI Wild West

Stay in the loop