What does PaddleOCR-VL 1.5 do?

It's a 0.9B-parameter model for parsing documents (text, tables, layouts, and formulas) with polygon-based layout segmentation and native-resolution vision encoding. It scores 94.5% on OmniDocBench, ahead of GPT-4o.

Can I run PaddleOCR-VL 1.5 on my laptop?

Yes, provided you have a GPU build of PaddlePaddle. An RTX 30- or 40-series card is ideal, the model needs under 8GB of VRAM, and inference is fast.
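To see why a 0.9B model fits comfortably under 8GB, here is a back-of-envelope sketch of the memory the weights alone would take at common precisions. The precision the model actually ships in is an assumption here, and the numbers exclude activations, caches, and framework overhead, so treat them as a lower bound rather than a measurement.

```python
# Rough weight-memory estimate for a 0.9B-parameter model.
# Assumption: only the weights are counted; activations, caches and
# framework overhead are ignored, so real usage will be higher.
PARAMS = 0.9e9  # parameter count quoted in the article

# Bytes per parameter at common precisions (which one the model uses is assumed).
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name:>9}: {gb:.1f} GB of weights")
```

Even at full fp32 precision the weights come to roughly 3.6 GB, which leaves headroom on an 8GB card for everything else the runtime needs.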

Does PaddleOCR-VL 1.5 work for non-English documents?

Yes. It excels at multilingual documents, especially Asian languages. Its training is broad, but it shines most where GPT-4o stumbles.

Baidu's 0.9B PaddleOCR-VL 1.5 Just Beat GPT-4o at Reading Documents—But Who's Cashing In?

Everyone figured bloated giants like GPT-4o owned document parsing. Baidu's scrappy 0.9B model just flipped the script: 94.5% on OmniDocBench, cheaper, faster. But is it hype or a genuine hardware shift?

theAIcatchup Apr 10, 2026 4 min read

Architecture diagram of PaddleOCR-VL 1.5 showing layout segmentation and VLM core

⚡ Key Takeaways

PaddleOCR-VL 1.5's 0.9B model hits 94.5% on OmniDocBench, topping GPT-4o with polygon layout segmentation and native-resolution encoding.
Its hybrid architecture fixes traditional OCR flaws such as irregular shapes and reading order, and it runs cheaply on consumer hardware.
Baidu's efficiency play signals a shift to sub-2B document models, leapfrogging established OCR engines such as Tesseract.
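The point about irregular shapes can be made concrete with a little geometry. The sketch below is not PaddleOCR-VL code; it simply compares the area of a hypothetical skewed text block, described as a polygon, with the area of the axis-aligned rectangle a box-based layout detector would have to draw around it. The coordinates are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the smallest axis-aligned rectangle containing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical skewed text block (a parallelogram), e.g. from a tilted scan.
skewed_block: List[Point] = [(0, 0), (100, 20), (100, 60), (0, 40)]

poly = polygon_area(skewed_block)      # 4000.0
box = bounding_box_area(skewed_block)  # 6000
print(f"polygon: {poly}, box: {box}, wasted share of box: {1 - poly / box:.0%}")
```

In this toy case a third of the rectangle covers pixels that are not part of the block, which is the kind of spillover into neighbouring content that polygon outlines avoid.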

#Baidu PaddlePaddle #Document Parsing #OCR #OCR Models #PaddleOCR-VL 1.5 #vision-language models

Originally reported by Towards AI
