Baidu's 0.9B PaddleOCR-VL 1.5 Just Beat GPT-4o at Reading Documents—But Who's Cashing In?

The common assumption was that oversized models like GPT-4o owned document parsing. Baidu's scrappy 0.9B-parameter model just flipped the script: 94.5% on OmniDocBench, at lower cost and higher speed. But is it hype, or a real shift toward cheaper hardware?

Architecture diagram of PaddleOCR-VL 1.5 showing layout segmentation and VLM core

⚡ Key Takeaways

  • PaddleOCR-VL 1.5, a 0.9B-parameter model, scores 94.5% on OmniDocBench, topping GPT-4o, using polygon-based layout segmentation and native-resolution encoding.
  • Its hybrid architecture addresses two weak spots of traditional OCR, irregular shapes and reading order, and runs cheaply on consumer hardware.
  • Baidu's efficiency play signals a shift toward sub-2B document models, leapfrogging US-built incumbents like Tesseract 2.0.
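
To make the "irregular shapes" and "reading order" points concrete, here is a minimal sketch in plain Python. It is not PaddleOCR-VL's code or API: the function names, the sample coordinates, and the row-bucketing heuristic are all assumptions chosen for illustration. It shows how much extra area an axis-aligned box sweeps in around a skewed text block compared with a polygon, and a naive way to order regions top to bottom, then left to right.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def polygon_area(poly: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bbox_area(poly: List[Point]) -> float:
    """Area of the axis-aligned bounding box around the polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def centroid(poly: List[Point]) -> Point:
    """Vertex average; a rough center that is good enough for sorting."""
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)

def reading_order(regions: Dict[str, List[Point]], row_height: float = 20.0) -> List[str]:
    """Hypothetical heuristic: bucket regions into rows by centroid y, then sort by x."""
    def key(name: str) -> Tuple[int, float]:
        cx, cy = centroid(regions[name])
        return (int(cy // row_height), cx)
    return sorted(regions, key=key)

# A skewed text block (illustrative coordinates, not real model output).
skewed = [(0.0, 0.0), (100.0, 20.0), (100.0, 60.0), (0.0, 40.0)]

# Three page regions listed out of order (illustrative coordinates).
regions = {
    "right_col": [(50.0, 95.0), (100.0, 95.0), (100.0, 115.0), (50.0, 115.0)],
    "title": [(0.0, 0.0), (100.0, 0.0), (100.0, 20.0), (0.0, 20.0)],
    "left_col": [(0.0, 90.0), (50.0, 90.0), (50.0, 110.0), (0.0, 110.0)],
}

if __name__ == "__main__":
    print(polygon_area(skewed), bbox_area(skewed))
    print(reading_order(regions))
```

In this toy case the polygon covers 4000 square units while its bounding box covers 6000, so a box-based crop would pull in a third more pixels that belong to neighboring content. The row-bucketing sort is deliberately simple; real multi-column pages need something sturdier.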

Published by theAIcatchup

Originally reported by Towards AI
