Finetuning Multimodal Embeddings with Sentence Transformers: Real Gains or Just Another Benchmark Win?
I've seen a thousand 'breakthrough' model tweaks in 20 years, but this finetune of Qwen's multimodal embedder actually delivers: 0.947 NDCG@10 on VDR, beating rivals four times its size. Still, who's cashing in?
theAIcatchup · Apr 24, 2026 · 4 min read
⚡ Key Takeaways
Finetuning Qwen3-VL-Embedding-2B on VDR data boosts NDCG@10 to 0.947, topping larger rivals.
Sentence Transformers pipeline is dev-friendly for multimodal embeddings and rerankers.
Real wins demand domain data; generic models fall short on specialized tasks like document layouts.
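For readers unfamiliar with the headline metric: NDCG@10 measures how well the top-10 retrieved documents are ordered by relevance, with 1.0 meaning a perfect ranking. A minimal sketch in plain Python (the relevance grades below are hypothetical, not from the benchmark):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the ranked relevances divided by DCG of the ideal ordering."""
    def dcg(rels):
        # Each relevance grade is discounted by the log of its rank position.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance of the top retrieved documents (3 = perfect match).
ranked = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked, k=10), 3))
```

A score of 0.947 therefore means the finetuned model's top-10 ranking is very close to the ideal ordering, averaged over the benchmark's queries.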
Written by
Aisha Patel
Former ML engineer turned writer. Covers computer vision and robotics with a practitioner perspective.