Know3D Solves 3D Generation's Blind Spot — Letting You Steer What AI Can't See
Researchers have built Know3D, a system that uses multimodal language models to control the hidden back side of AI-generated 3D objects via text prompts — filling a fundamental gap in single-image 3D generation that has limited the technology's practical utility.

D.O.T.S AI Newsroom
AI News Desk
A new research system called Know3D addresses one of the most persistent limitations of AI-driven 3D generation: the model's fundamental inability to determine what should appear on the side of an object that isn't visible in the input image. The work, covered by The Decoder, represents a practical approach to a problem that has blocked single-image 3D generation from wider deployment in design, gaming, and e-commerce workflows.
The Blind Spot Problem
When an AI model generates a 3D object from a single photograph, it is working with incomplete information by definition. The image shows one perspective. Everything else — the back, the underside, the interior structure — must be inferred. Current 3D generation models handle this through training data priors: they guess what the hidden side probably looks like based on similar objects they've seen. For symmetric objects like cups or spheres, this works acceptably. For asymmetric objects — a backpack with unique rear pockets, a vehicle with a distinctive exhaust system — the guesses are often wrong and practically useless.
How Know3D Works
Know3D's approach inserts a language model into the generation pipeline as a knowledge bridge. Users can write a text description of what should appear on the hidden side of the object — "leather patch on the back" or "exhaust pipe on the left side" — and the system translates that description into geometry constraints that guide the 3D generator.
The pipeline combines three components: Qwen2.5-VL as the language model, Qwen-Image-Edit for image generation, and Microsoft's Trellis.2 as the 3D generator. The key architectural insight lies in how they are connected. Feeding language model output directly to the 3D network doesn't work, because those representations are too abstract for spatial generation. Know3D instead routes through an image generation step that produces spatial-structural information compatible with the 3D model's input format.
The intermediate image representation, extracted from internal states just before the image generator's final output, proves more robust than using the generator's finished image. These internal states carry spatial and semantic structure even when the generator produces an imperfect result — meaning errors in the image generation step don't propagate fully into the 3D geometry.
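To make the data flow concrete, here is a minimal toy sketch in Python of the three-stage routing described above. It is not Know3D's code and calls none of the real models: every function name, the Conditioning type, and the arithmetic inside each stage are hypothetical placeholders chosen only to show the order of the stages and the point at which the intermediate representation is tapped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conditioning:
    """Toy stand-in for the features handed to the 3D generator (hypothetical type)."""
    features: List[float]
    source: str  # which stage produced the features


def encode_request(front_view: List[float], prompt: str) -> List[float]:
    """Placeholder for the multimodal language model stage.

    Combines the visible view with the text describing the hidden side.
    The arithmetic is arbitrary; a real model would produce learned embeddings.
    """
    text_code = [float(ord(ch) % 7) for ch in prompt[:len(front_view)]]
    text_code += [0.0] * (len(front_view) - len(text_code))
    return [pixel + code for pixel, code in zip(front_view, text_code)]


def image_generator_hidden_state(request: List[float]) -> List[float]:
    """Placeholder for the image generator's internal state,
    read just before its final output rather than from the finished image."""
    return [0.5 * value for value in request]


def condition_3d_generator(hidden: List[float]) -> Conditioning:
    """Placeholder for passing the intermediate features on to the 3D generator."""
    return Conditioning(features=hidden, source="image_generator_hidden_state")


def know3d_style_pipeline(front_view: List[float], prompt: str) -> Conditioning:
    """Chain the three stages: language model, image generator internals, 3D generator."""
    request = encode_request(front_view, prompt)
    hidden = image_generator_hidden_state(request)  # tapped before the final image
    return condition_3d_generator(hidden)


front = [0.1, 0.2, 0.3, 0.4]  # toy "image" of the visible side
a = know3d_style_pipeline(front, "leather patch on the back")
b = know3d_style_pipeline(front, "exhaust pipe on the left side")
print(len(a.features), a.features != b.features)
```

In this toy version, the same front view paired with two different prompts yields two different conditioning vectors, which mirrors the article's point that the text description, not just the photo, shapes what the 3D generator receives.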
Practical Implications
The immediate applications are in product visualization, game asset creation, and e-commerce photography — all contexts where 3D models need to be accurate and controllable, not just plausible. Know3D doesn't eliminate the need for 3D artists in professional pipelines, but it meaningfully reduces the iteration cost for use cases where "good enough" 3D from a single reference photo is sufficient.