Why Robot Brains Still Can't Pick Up After Themselves
Vision-language models promise robot smarts, but they trip over 'where' in complex tasks. GroundedPlanBench calls their bluff, but are the gains real, or just lab tricks?
Key Takeaways
- VLMs falter on spatial grounding for long robot tasks; GroundedPlanBench proves it.
- V2GP turns demo videos into 43K training plans, boosting success rates.
- Joint planning outperforms decoupled, but real-world scale needs more than code.
Originally reported by Microsoft Research AI