Research

Alibaba's Qwen Team Built HopChain to Fix How AI Vision Models Collapse During Multi-Step Reasoning

Vision language models perform well on single-step tasks but fall apart on questions requiring sequential reasoning about images. Alibaba's Qwen team and Tsinghua University developed HopChain — a framework that improved performance on 20 of 24 standard benchmarks by targeting the compounding error problem directly.

D.O.T.S AI Newsroom

AI News Desk

2 min read

Vision language models have been getting better at answering questions about images, but there is a consistent failure mode that benchmark scores obscure: tasks requiring multiple consecutive reasoning steps cause performance to drop sharply. A model that accurately identifies an object in an image may fail badly when asked to reason about the object's relationship to other elements, draw an inference from that relationship, and then apply that inference to answer a follow-up question. Each step introduces a small error probability, and those errors compound.
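The compounding described above is simple to quantify. Assuming, purely for illustration, that a model resolves each individual hop with 95% accuracy and that errors at each step are independent (the article does not report per-step figures), the whole-chain success rate falls off geometrically:

```python
# Illustrative only: the 95% per-step accuracy and independence assumption
# are not figures from the HopChain work, just a sketch of the compounding.
def chain_accuracy(per_step: float, hops: int) -> float:
    """Probability that every hop in a chain succeeds, assuming each
    step fails independently with the same probability."""
    return per_step ** hops

single = chain_accuracy(0.95, 1)    # 0.95 on an isolated question
four_hop = chain_accuracy(0.95, 4)  # ~0.81 once four steps are chained
```

A model that looks strong on single-step benchmarks can lose nearly a fifth of its accuracy by the fourth dependent hop under these assumptions, which is the gap single-step benchmark scores obscure.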

The Problem HopChain Targets

Researchers from Alibaba's Qwen team and Tsinghua University identified the root cause as a gap in training data. Standard vision-language benchmarks test individual perceptual or reasoning steps in isolation. Models trained on these benchmarks never learn to handle the error propagation that occurs when steps are chained. HopChain addresses this directly by generating training data that explicitly contains multi-stage image questions — chains of questions where each hop depends on correctly processing the previous one.

The framework generates these "reasoning chains" programmatically across diverse image types and reasoning patterns, creating a dataset specifically designed to surface and correct the compounding error problem. The researchers describe the process as identifying the failure mode, building a diagnostic framework around it, and then using that diagnostic framework as training signal.
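As a rough sketch of what such a training sample might look like, the following packages a sequence of dependent questions where each hop's question builds on the previous answer. All names here (`Hop`, `build_chain`, the example questions) are hypothetical illustrations, not the paper's actual data format:

```python
# Hypothetical sketch of a chained-question training sample, assuming a
# HopChain-style setup where each hop depends on the answer before it.
from dataclasses import dataclass


@dataclass
class Hop:
    question: str
    answer: str


def build_chain(image_id: str, hops: list[Hop]) -> dict:
    """Package dependent hops into one training sample. Getting the
    final answer right requires every earlier hop to be correct."""
    return {
        "image": image_id,
        "chain": [{"q": h.question, "a": h.answer} for h in hops],
        "final_answer": hops[-1].answer,
    }


sample = build_chain("img_001", [
    Hop("What object is on the table?", "a red mug"),
    Hop("What is printed on the red mug?", "a logo"),
    Hop("Which side of the mug faces the window?", "the logo side"),
])
```

The design point is that the supervision signal rewards the whole chain, not isolated steps, which is how a generated dataset can surface the error-propagation failure mode during training.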

Results Across Benchmarks

Testing on 24 standard vision-language benchmarks, HopChain improved performance on 20 of them. The improvements were most pronounced on tasks that explicitly required sequential reasoning — spatial relationship inference, multi-object comparison, and visual question answering chains. Performance on simpler single-step tasks was mostly unaffected, suggesting HopChain addresses the multi-step problem without degrading basic capabilities.

For practitioners deploying vision models in real-world applications — document processing, medical imaging analysis, manufacturing quality control — multi-step reasoning is often the core requirement. A model that performs well on single-step tasks but collapses on chained inference is less useful than its benchmark scores suggest. HopChain represents a targeted attempt to close that gap.


Related Stories

Research

Google's AI Overviews Are Right Nine Times Out of Ten — but the 10% Failure Rate Has a Specific Shape

A new independent study is the first to systematically measure the factual accuracy of Google's AI Overviews at scale. The headline finding — 90% accuracy — is better than critics expected and worse than Google implies. The more important finding is where that 10% comes from: complex multi-step queries, niche topics, and questions where the web itself is the source of conflicting claims.

D.O.T.S AI Newsroom
Research

Databricks Co-Founder Wins Top Computing Prize — and Says AGI Is 'Already Here'

Matei Zaharia, co-founder of Databricks and creator of Apache Spark, has won the ACM Prize in Computing — one of the most prestigious awards in computer science. In interviews accompanying the announcement, Zaharia made a pointed argument: AGI is not a future event but a present condition, and the industry's endless debate about its arrival is obscuring more useful questions about what to do with the AI we already have.

D.O.T.S AI Newsroom
Research

Researchers Fingerprinted 178 AI Models' Writing Styles — and Found Alarming Clone Clusters

A new study from Rival analyzed 3,095 standardized responses across 178 AI models, extracting 32-dimension stylometric fingerprints to map which models write like which others. The findings reveal tightly grouped clone clusters across providers — and raise serious questions about whether the AI ecosystem is converging on a single voice.

D.O.T.S AI Newsroom