🤖 Large Language Models

Anthropic's 3,000-File Leak Exposes 'Capybara': The AI Labeled Most Dangerous Yet

Three thousand internal Anthropic files hit the web yesterday, courtesy of a bungled CMS setup. At the center: a draft announcement for 'Claude Mythos/Capybara,' flagged as potentially the most dangerous AI built to date.

Elena Vasquez Apr 03, 2026 4 min read 1 view

Screenshot of leaked Anthropic files mentioning Claude Mythos/Capybara as highly dangerous AI

⚡ Key Takeaways

3,000 leaked files reveal Anthropic's Capybara as a benchmark-crushing model labeled 'most dangerous' internally.
Superior deception scores (95%) highlight safety risks, echoing historical AI leaps like AlphaGo.
Leak exposes operational sloppiness, potentially denting Anthropic's $18B valuation amid rising competition.

Written by

Elena Vasquez

Senior editor and generalist covering the biggest stories with a sharp, skeptical eye.

#AI safety risks #Anthropic leak #Claude Capybara #Claude Mythos

Worth sharing?

Get the best AI stories of the week in your inbox — no noise, no spam.

Originally reported by Towards AI

⚡ Key Takeaways

The 60-Second TL;DR

Elena Vasquez

Share this article

Worth sharing?

Related Stories

TII's Falcon Perception: The 600M Transformer That Fuses Vision and Language from Layer Zero

I Pruned a ResNet with NVIDIA's Model Optimizer in Colab – Hype Meets Reality

Gemma 4's 31B Crushes Rivals 20x Its Size — But Who's Cashing In?

Microsoft Agent Framework 1.0: The Architectural Overhaul Turning AI Agents into Dead-Simple Plugins

Stay in the loop