Research

IBM's Granite 4.0 Vision Leads Benchmarks on Enterprise Document Understanding at 3B Parameters

IBM has released Granite 4.0 3B Vision, a compact vision-language model purpose-built for enterprise document intelligence. Despite its 3 billion parameter count, it achieves leading scores on chart understanding and table extraction benchmarks — outperforming models two to three times larger — through a LoRA adapter architecture and a custom-built 1.7M-chart dataset called ChartNet that IBM is releasing alongside the model.

D.O.T.S AI Newsroom

AI News Desk

3 min read

IBM has released Granite 4.0 3B Vision, and the benchmark results are disorienting if you expect performance to track with parameter count. A 3 billion parameter model achieving the highest scores on document understanding evaluations — ahead of models with 9B parameters and more — is a result that demands explanation. IBM's explanation involves a combination of architectural choices, training data composition, and a purpose-built dataset called ChartNet that is new to this space.

What Granite 4.0 Vision Does

The model is designed for four specific enterprise document tasks: table extraction, chart understanding, semantic key-value pair extraction, and image captioning. These are not the tasks that general-purpose vision-language models optimize for — standard benchmarks emphasize visual question answering, object recognition, and scene understanding. Enterprise document work requires a different set of capabilities: parsing multi-row tables without losing structure, converting a bar chart to a CSV, identifying field-value pairs across inconsistently formatted forms.
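
In practice, these tasks reduce to instruction-style prompts over a page or chart image. The snippet below is a minimal sketch of the chart-to-CSV case, assuming a standard Hugging Face transformers chat-template interface; the model identifier, file name, and prompt wording are illustrative assumptions rather than IBM's published usage.

```python
# Minimal sketch: asking a vision-language model to convert a chart image to CSV.
# The model ID and input file are placeholder assumptions, not confirmed names.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "ibm-granite/granite-4.0-vision-3b"  # hypothetical identifier

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto")

image = Image.open("revenue_by_region.png")  # hypothetical chart image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this bar chart into CSV with one header row."},
    ],
}]

# Build the prompt from the chat template, run one generation pass, decode.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```

Swapping the text instruction is enough to cover the other three task families (table extraction, key-value extraction, captioning); the surrounding call pattern stays the same.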

Granite 4.0 Vision is implemented as a LoRA adapter on top of the Granite 4.0 Micro base model. This means enterprises can deploy a single base model and load the vision adapter only when needed, with automatic fallback to text-only processing when vision is not required. For organizations deploying at scale, the operational efficiency of this architecture is significant — one model deployment serves both modalities.
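
In deployment terms, that description suggests a pattern like the PEFT sketch below: keep one base model resident and toggle the LoRA adapter per request. The model and adapter identifiers are placeholder assumptions, and the image-encoding path is omitted entirely; this only illustrates the adapter-switching idea, not IBM's actual packaging.

```python
# Sketch of the "one base model, optional vision adapter" pattern with PEFT.
# Identifiers are placeholder assumptions, not confirmed release names, and
# image preprocessing is omitted; only the adapter toggle is illustrated.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "ibm-granite/granite-4.0-micro"                  # hypothetical
VISION_ADAPTER_ID = "ibm-granite/granite-4.0-vision-lora"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")

# Load the LoRA weights once; they add only a small memory overhead
# on top of the resident base model.
model = PeftModel.from_pretrained(base, VISION_ADAPTER_ID)

def generate(prompt: str, needs_vision: bool) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if needs_vision:
        # Document/image request: run with the vision adapter active.
        output = model.generate(**inputs, max_new_tokens=256)
    else:
        # Plain text request: temporarily bypass the adapter and fall back
        # to the unmodified base model weights.
        with model.disable_adapter():
            output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```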

The Benchmark Results

On chart summarization (Chart2Summary), Granite 4.0 Vision scores 86.4% — the highest of any model evaluated. On table extraction tasks (PubTablesV2), it scores 92.1 TEDS on cropped tables and 79.3 TEDS on full-page documents, both leading results. On OmniDocBench, a comprehensive document understanding evaluation, it scores 64.0 TEDS, again the top result among evaluated models.

The one category where a larger model beats it: Chart2CSV (converting chart images to spreadsheet data), where Qwen3.5-9B edges ahead at 63.4% versus Granite 4.0 Vision's 62.1%. At 9B parameters — three times the size — that represents a narrow advantage on a single task rather than general dominance. The overall comparison consistently favors the smaller model.

ChartNet: The Training Data Advantage

IBM's performance explanation centers on ChartNet, a dataset of 1.7 million chart samples across 24 chart types, generated using six different plotting libraries. Each sample includes the plotting code that generated it, the rendered image, the underlying data table, a natural language summary, and question-answer pairs. This density of annotation per sample — ground truth at multiple levels of abstraction — gives training a quality signal that typical web-scraped chart collections cannot provide.
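
Taken at face value, each ChartNet record bundles several aligned views of the same chart. The sketch below shows one plausible shape for such a sample; the field names are illustrative assumptions, not the released schema.

```python
# Sketch of what one ChartNet sample might contain, per the description above.
# Field names are illustrative assumptions, not the published schema.
from dataclasses import dataclass, field

@dataclass
class ChartNetSample:
    chart_type: str            # one of the 24 chart types, e.g. "grouped_bar"
    plotting_library: str      # one of the six libraries used for rendering
    plotting_code: str         # source code that generated the figure
    image_path: str            # path to the rendered chart image
    data_table: list[dict]     # underlying data, one dict per row
    summary: str               # natural-language description of the chart
    qa_pairs: list[tuple[str, str]] = field(default_factory=list)  # (question, answer)
```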

ChartNet is being presented at CVPR 2026, and IBM is releasing it publicly alongside the model. The dataset release matters independently of the model: any team working on chart understanding can use ChartNet as a training resource, which may accelerate progress across the field rather than just benefiting IBM's own deployments.

The Enterprise AI Compute Argument

Granite 4.0 Vision is a concrete counterexample to the assumption that enterprise AI deployment requires large models. The benchmark results make the compute argument straightforward: for document intelligence tasks specifically, a 3B model with the right training data and architecture beats 9B models with general-purpose training. Organizations running document processing at scale — financial institutions, insurers, legal firms, logistics companies — can run more instances at lower cost without sacrificing accuracy on the tasks that matter.

IBM has integrated Granite 4.0 Vision with Docling, its open-source document processing framework, creating an end-to-end pipeline from PDF input to structured data output. The combination — model plus processing framework — is positioned as a deployable enterprise document intelligence stack rather than a research artifact.
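
On the Docling side, the basic PDF-to-structured-data pass looks roughly like the sketch below, built on Docling's DocumentConverter. How the Granite 4.0 Vision adapter is wired into that pipeline is configured separately and is not shown here; treat this as a sketch of the surrounding pipeline shape rather than the full integration.

```python
# Sketch of a PDF-to-structured-data pass with Docling's DocumentConverter.
# The input file name is a hypothetical example; the Granite 4.0 Vision
# integration itself is not shown.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("quarterly_report.pdf")

# Full document as Markdown, with tables preserved as Markdown tables.
markdown = result.document.export_to_markdown()

# Individual tables as pandas DataFrames for downstream processing.
for table in result.document.tables:
    df = table.export_to_dataframe()
    print(df.head())
```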


Related Stories

Google's AI Overviews Are Right Nine Times Out of Ten — but the 10% Failure Rate Has a Specific Shape
Research

A new independent study is the first to systematically measure the factual accuracy of Google's AI Overviews at scale. The headline finding — 90% accuracy — is better than critics expected and worse than Google implies. The more important finding is where that 10% comes from: complex multi-step queries, niche topics, and questions where the web itself is the source of conflicting claims.

D.O.T.S AI Newsroom
Databricks Co-Founder Wins Top Computing Prize — and Says AGI Is 'Already Here'
Research

Matei Zaharia, co-founder of Databricks and creator of Apache Spark, has won the ACM Prize in Computing — one of the most prestigious awards in computer science. In interviews accompanying the announcement, Zaharia made a pointed argument: AGI is not a future event but a present condition, and the industry's endless debate about its arrival is obscuring more useful questions about what to do with the AI we already have.

D.O.T.S AI Newsroom
Researchers Fingerprinted 178 AI Models' Writing Styles — and Found Alarming Clone Clusters
Research

A new study from Rival analyzed 3,095 standardized responses across 178 AI models, extracting 32-dimension stylometric fingerprints to map which models write like which others. The findings reveal tightly grouped clone clusters across providers — and raise serious questions about whether the AI ecosystem is converging on a single voice.

D.O.T.S AI Newsroom