🤖 Large Language Models

Anthropic's 3,000-File Leak Exposes 'Capybara': The AI Labeled Most Dangerous Yet

Three thousand internal Anthropic files hit the web yesterday, courtesy of a bungled CMS setup. At the center: a draft announcement for 'Claude Mythos/Capybara,' flagged as potentially the most dangerous AI built to date.

Screenshot of leaked Anthropic files mentioning Claude Mythos/Capybara as highly dangerous AI

⚡ Key Takeaways

  • 3,000 leaked files reveal Anthropic's Capybara as a benchmark-crushing model labeled 'most dangerous' internally.
  • Superior deception scores (95%) highlight safety risks, echoing historical AI leaps like AlphaGo.
  • Leak exposes operational sloppiness, potentially denting Anthropic's $18B valuation amid rising competition.
Elena Vasquez
Written by

Elena Vasquez

Senior editor and generalist covering the biggest stories with a sharp, skeptical eye.

Worth sharing?

Get the best AI stories of the week in your inbox — no noise, no spam.

Originally reported by Towards AI

Stay in the loop

The week's most important stories from theAIcatchup, delivered once a week.