Research

Naver's Seoul World Model Grounds AI Video Generation in Real City Geometry to Stop Hallucination

South Korean internet giant Naver has built a video world model tied to actual physical geography, trained on 1.2 million Street View panoramas to generate spatially coherent urban environments. According to the researchers, it generalises to cities it has never seen, without fine-tuning.

D.O.T.S AI Newsroom

AI News Desk

Every major video world model released in the past two years shares the same structural flaw: beyond the starting frame, it hallucinates. The output includes streets that don't exist, buildings with impossible geometries, and spatial layouts that contradict themselves from one generated frame to the next. The models produce visually convincing footage, but the environments they create are entirely fictional and unstable.

Researchers from Naver and Naver Cloud have published a paper introducing a fundamentally different approach. Their Seoul World Model (SWM) is grounded in real physical geography — specifically, 1.2 million panoramic Street View images from Naver Map, South Korea's dominant mapping service. The result, according to the paper, is the first video world model tied to an actual physical location.

How It Works

The interface is geographic rather than textual. A user enters GPS coordinates, specifies a desired camera movement — panning, zooming, traversing a street — and adds a text prompt for atmosphere or time-of-day. The model queries the Street View database, retrieves the nearest matching panoramas, and uses those real images as geometric anchors for step-by-step video generation.
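The retrieval step described above can be illustrated with a minimal sketch. It is not Naver's implementation: the `Panorama` record, the `nearest_panoramas` helper and the choice of great-circle (haversine) distance are all assumptions made for illustration, and a production system would presumably use a spatial index rather than a full sort.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Panorama:
    """Hypothetical record for one Street View capture (not from the paper)."""
    pano_id: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_panoramas(panos: list[Panorama], lat: float, lon: float, k: int = 3) -> list[Panorama]:
    """Return the k panoramas closest to the query coordinates (brute force)."""
    return sorted(panos, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))[:k]


if __name__ == "__main__":
    # Illustrative coordinates only: two points in central Seoul, one in Busan.
    database = [
        Panorama("a", 37.5665, 126.9780),
        Panorama("b", 37.5700, 126.9800),
        Panorama("c", 35.1796, 129.0756),
    ]
    for pano in nearest_panoramas(database, 37.5666, 126.9781, k=2):
        print(pano.pano_id)
```

In this sketch the retrieved panoramas would then be handed to the generator as geometric anchors; how SWM conditions on them is not specified in the article.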

The critical innovation is in how the model handles the tension between real reference images and dynamic video generation. Street View captures are static — they freeze cars, pedestrians, and ambient conditions at a single moment in time. A model naively trained on these would either reproduce those transient objects or struggle to generate plausible motion around them.

The Naver team solves this with what they call cross-temporal pairing: during training, reference images and target video sequences are deliberately drawn from different recording sessions. This teaches the model to distinguish between permanent structures — building facades, road geometry, infrastructure — and transient elements like parked vehicles or pedestrians. The model learns geometry from the environment, not from the snapshot moment.
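One way to picture cross-temporal pairing is as a sampling rule over the training data. The sketch below is an assumption-laden illustration, not the paper's pipeline: the `Capture` record, its `location_id` and `session_id` fields, and the `cross_temporal_pairs` helper are hypothetical names. It pairs each target with a reference from the same place but a different recording session, so transient objects in the reference cannot be copied into the target.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Hypothetical training item: one capture of a place in one session."""
    location_id: str  # which stretch of street this shows
    session_id: str   # which recording pass it came from
    frame: str        # stand-in for the actual image or video data


def cross_temporal_pairs(captures: list[Capture], rng: random.Random) -> list[tuple[Capture, Capture]]:
    """Build (reference, target) pairs from the same location but different sessions."""
    by_location: dict[str, list[Capture]] = {}
    for cap in captures:
        by_location.setdefault(cap.location_id, []).append(cap)

    pairs = []
    for caps in by_location.values():
        for target in caps:
            # Only captures from another session are eligible as references.
            candidates = [c for c in caps if c.session_id != target.session_id]
            if candidates:  # locations seen in a single session yield no pairs
                pairs.append((rng.choice(candidates), target))
    return pairs


if __name__ == "__main__":
    data = [
        Capture("loc1", "s1", "f1"),
        Capture("loc1", "s2", "f2"),
        Capture("loc1", "s2", "f3"),
        Capture("loc2", "s1", "f4"),  # only one session, so it is skipped
    ]
    for ref, tgt in cross_temporal_pairs(data, random.Random(0)):
        print(ref.frame, "->", tgt.frame)
```

Because the reference and target never share a session, only what persists between sessions (facades, road layout) is useful for predicting the target, which matches the intuition the researchers describe.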

Generalisation Without Fine-Tuning

The most commercially significant claim in the paper is generalisation. SWM was trained entirely on Seoul street data, but the researchers report that it generates spatially coherent video for other cities, none of which it saw during training, without any city-specific fine-tuning. On their account, the geometric and structural patterns learned from Seoul's dense urban grid transfer to other urban environments.

If the generalisation claim holds under scrutiny, it suggests a path toward grounded video world models that don't require per-city training datasets — a meaningful reduction in the data acquisition bottleneck that has constrained geospatially-aware AI development.

Why This Matters Beyond Mapping

Naver's framing is geographic, but the underlying problem of maintaining spatial coherence over generated sequences has applications well beyond navigation or urban simulation. Robotics, autonomous vehicle simulation, augmented reality, and urban planning tools all share the same requirement: generated environments that respect physical reality. If its results hold up, SWM is an early proof point that grounding generative models in real-world spatial data, rather than relying on purely synthetic training, produces more spatially consistent outputs.
