Research · 3 min read
NVIDIA, Berkeley & Stanford: Best AI Models Still Fail at Robot Control — Until You Add Agentic Scaffolding
A new benchmark framework from NVIDIA, UC Berkeley, Stanford, and Carnegie Mellon systematically tests twelve frontier models on robot manipulation tasks. The verdict: even GPT-5.2, Gemini-3-Pro, and Claude Opus 4.5 fail at most tasks without human-designed abstractions. Agentic scaffolding (parallel generation, self-correction, and reusable functions) dramatically narrows the gap.