Training VLMs 'From Scratch'? It's a $100M Lie Nobody Buys Anymore

Labs ditched true from-scratch training after it devoured $100M in compute for mediocre results. Now it's all Frankenstein mods on pre-trained giants.

Figure: Schematic of the VLM stack (ViT backbone, Q-Former adapter, and language model layers)

⚡ Key Takeaways

  • VLMs aren't trained from scratch; they assemble a pre-trained ViT backbone and a pre-trained LM, bridged by a Q-Former adapter, to slash costs.
  • Frozen backbones prevent overfitting; adapters like Q-Former bridge the vision-text gap (see the sketch below this list).
  • The modular approach commoditizes VLMs, pointing toward open-source dominance by 2028.
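
To make the freeze-and-bridge recipe concrete, here's a minimal PyTorch sketch of the pattern. Everything in it is illustrative: the dimensions, the `QFormerAdapter` class, and the randomly initialized stand-ins for the ViT and LM are assumptions for the sketch, not any lab's actual code; a real pipeline would load pre-trained checkpoints and a full Q-Former.

```python
# Minimal sketch of the frozen-backbone + trainable-adapter pattern.
# Assumptions: toy dimensions, randomly initialized stand-ins for the
# pre-trained ViT and LM; in practice you'd load real checkpoints.
import torch
import torch.nn as nn

class QFormerAdapter(nn.Module):
    """Tiny Q-Former-style adapter: learned queries cross-attend to frozen
    vision features and emit tokens in the language model's embedding space."""
    def __init__(self, vis_dim=768, lm_dim=1024, n_queries=32, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, lm_dim) * 0.02)
        self.vis_proj = nn.Linear(vis_dim, lm_dim)  # map ViT dim -> LM dim
        self.cross_attn = nn.MultiheadAttention(lm_dim, n_heads, batch_first=True)

    def forward(self, vis_feats):                   # (B, patches, vis_dim)
        kv = self.vis_proj(vis_feats)
        q = self.queries.expand(vis_feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, kv, kv)         # (B, n_queries, lm_dim)
        return out

# Stand-ins for the pre-trained components (hypothetical; load real weights).
vision_backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=2)
language_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=8, batch_first=True),
    num_layers=2)

# Freeze both backbones: only the adapter's parameters receive gradients.
for module in (vision_backbone, language_model):
    module.requires_grad_(False)

adapter = QFormerAdapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

patches = torch.randn(2, 196, 768)                  # fake ViT patch features
with torch.no_grad():
    vis_feats = vision_backbone(patches)            # frozen forward pass
soft_prompts = adapter(vis_feats)                   # trainable bridge
lm_out = language_model(soft_prompts)               # frozen LM consumes them
lm_out.sum().backward()                             # toy loss, just for the sketch
optimizer.step()                                    # updates the adapter only
```

This is the whole trick: with both backbones frozen, gradients flow only into the adapter's few million parameters, which is why the modular recipe costs a rounding error next to a $100M scratch run.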

Written by Elena Vasquez

Senior editor at theAIcatchup. Generalist covering the biggest AI stories with a sharp, skeptical eye.

Originally reported by Towards Data Science
