Large Language Models
GLM-5.1 Edges Out GPT-5.4 on SWE-Bench Pro — Failure Modes Reveal the Cracks
Developers evaluating AI coding assistants just got a wake-up call: GLM-5.1 scores higher than GPT-5.4 on SWE-Bench Pro, yet it crumbles in marathon sessions.