Research

H Company's Holo3 Hits 78.85% on OSWorld — Setting a New State-of-the-Art for Computer Use AI

H Company has released Holo3, an open-weight 35B model (10B active) that achieves 78.85% on OSWorld-Verified — the hardest standard benchmark for desktop computer use. Released under Apache 2.0, the model outperforms larger proprietary alternatives on enterprise automation tasks and represents the most capable open-weight computer-use agent available.

D.O.T.S AI Newsroom

AI News Desk

H Company has released Holo3, an open-weight AI model built specifically for autonomous computer use, and it has set a new state of the art on the field's primary benchmark. On OSWorld-Verified, the hardest standardized evaluation for AI agents that must navigate real desktop environments, Holo3 achieves 78.85%, outperforming larger proprietary models while activating only 10B of its 35B parameters per inference step.

The Architecture: Efficiency as a Design Principle

Holo3's full parameter count is 35B, but only 10B are active at any inference step — a mixture-of-experts architecture that delivers the reasoning depth of a much larger model at the compute cost of a smaller one. This matters practically: running a 35B-parameter computer-use agent at the scale required for enterprise automation is only feasible if the per-inference cost is manageable. Holo3's active-parameter design addresses this directly.
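To make the efficiency argument concrete, here is a back-of-envelope sketch in Python. It rests on assumptions that are not from H Company: the common rule of thumb of roughly 2 forward-pass FLOPs per active parameter per token (which ignores attention and routing overhead), and 2 bytes per parameter for weights, as with bf16. The function names are illustrative only.

```python
# Back-of-envelope comparison of a 35B-total / 10B-active MoE model
# against a hypothetical dense model with the same total parameter count.
# Only the two parameter counts come from the article; the rest is assumed.

TOTAL_PARAMS = 35e9   # all experts must be held in memory
ACTIVE_PARAMS = 10e9  # parameters actually used per inference step


def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone (assumes 2 bytes/param, e.g. bf16)."""
    return total_params * bytes_per_param / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction:        {active_fraction:.1%}")
print(f"compute vs. dense 35B:  {dense_flops / moe_flops:.1f}x cheaper per token")
print(f"weight memory (2 B/p):  {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions the saving is in compute, not memory: per-token cost tracks the 10B active parameters, while the weights for all 35B parameters still have to be loaded.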

The model is released under Apache 2.0, which permits running, modifying, and deploying it commercially, provided the license's attribution and notice requirements are met. Weights are available on Hugging Face alongside a free-tier inference API for teams evaluating the model before committing to local deployment infrastructure.

What 78.85% on OSWorld Actually Means

OSWorld-Verified is a benchmark designed specifically to resist gaming. It tests AI agents on real computer tasks (navigating GUIs, filling forms, moving files, extracting information across applications) in actual desktop environments rather than simulated ones. Previous state-of-the-art scores on the benchmark have been in the 60-70% range for frontier proprietary models, so Holo3's 78.85% is a jump of roughly 9 to 19 percentage points.

H Company has also published a proprietary benchmark suite of 486 multi-step tasks across four enterprise categories: e-commerce, business software, collaboration, and multi-application workflows. The suite is designed to test the failure modes that OSWorld doesn't catch — long-horizon tasks requiring coordination across multiple applications, error recovery when intermediate steps fail, and consistency over extended sessions. Holo3 performs well across all four categories, with the largest performance advantages on the multi-application tasks where coordination is hardest.

The Training Methodology: Agentic Learning Flywheel

Much of Holo3's performance advantage comes from training methodology. The company describes an "agentic learning flywheel" built on three components: synthetic navigation data generated for specific scenarios, programmatic augmentation to handle out-of-domain situations, and reinforcement learning with aggressive filtering to suppress failure modes.

The synthetic environment factory is particularly notable. Rather than collecting human demonstrations or scraping existing software interactions, H Company built automated systems that generate enterprise environments from scratch using coding agents, then verify that the generated tasks are solvable and calibrate difficulty levels. This produces training data at a scale and diversity that human demonstration collection cannot match.
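The verify-and-calibrate step described above can be illustrated with a minimal Python sketch. This is not H Company's pipeline, whose details are not public in this article: the `Task` and `CalibratedTask` types, the `attempt` callback, the number of attempts, and the difficulty thresholds are all hypothetical. The idea is simply to attempt each generated task several times, discard tasks that are never solved, and bucket the rest by observed success rate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Task:
    """A generated task (hypothetical structure)."""
    task_id: str
    description: str


@dataclass(frozen=True)
class CalibratedTask:
    task: Task
    success_rate: float
    difficulty: str


def difficulty_bucket(success_rate: float) -> str:
    """Map an observed success rate to a label (thresholds are assumptions)."""
    if success_rate >= 0.8:
        return "easy"
    if success_rate >= 0.3:
        return "medium"
    return "hard"


def calibrate(
    tasks: Iterable[Task],
    attempt: Callable[[Task, int], bool],
    n_attempts: int = 5,
) -> List[CalibratedTask]:
    """Keep tasks solved at least once and label them by success rate."""
    kept = []
    for task in tasks:
        successes = sum(1 for i in range(n_attempts) if attempt(task, i))
        if successes == 0:
            continue  # never solved: treat as unverified and drop it
        rate = successes / n_attempts
        kept.append(CalibratedTask(task, rate, difficulty_bucket(rate)))
    return kept


# Toy stand-in for "run an agent in the environment and check the result".
TOY_OUTCOMES = {
    "export-report": [True, True, True, True, True],
    "merge-invoices": [True, False, False, True, False],
    "broken-task": [False, False, False, False, False],
}


def toy_attempt(task: Task, attempt_index: int) -> bool:
    return TOY_OUTCOMES[task.task_id][attempt_index]


tasks = [Task(tid, f"toy task {tid}") for tid in TOY_OUTCOMES]
calibrated = calibrate(tasks, toy_attempt)
for item in calibrated:
    print(item.task.task_id, item.success_rate, item.difficulty)
```

In a real system the `attempt` callback would launch an agent rollout and run a programmatic success check; here it reads from a fixed table so the example is deterministic.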

Implications for Enterprise Automation

Computer-use AI, meaning models that can operate software the way humans do without API integrations, could open up enterprise workflows that have resisted automation for decades. Most enterprise software was not built with API-first design; significant operational work happens through GUIs that assume a human is present. A model that completes 78.85% of the tasks on a standardized benchmark is approaching the reliability threshold where deployment on real workflows becomes viable.

H Company's open-weight release also shifts the competitive dynamics in the computer use space. Previously, the most capable models were proprietary and cloud-accessed, creating data governance concerns for enterprises processing sensitive workflows. Holo3's Apache 2.0 license enables fully on-premise deployment — a critical requirement for industries where data cannot leave the building.
