H Company's Holo3 Hits 78.85% on OSWorld — Setting a New State-of-the-Art for Computer Use AI

H Company has released Holo3, an open-weight 35B model (10B active) that achieves 78.85% on OSWorld-Verified — the hardest standard benchmark for desktop computer use. Released under Apache 2.0, the model outperforms larger proprietary alternatives on enterprise automation tasks and represents the most capable open-weight computer-use agent available.

H Company has released Holo3, an open-weight AI model built specifically for autonomous computer use, and it has immediately set the new state of the art on the field's primary benchmark. On OSWorld-Verified — the hardest standardized evaluation for AI agents that must navigate real desktop environments — Holo3 achieves 78.85%, outperforming larger proprietary models while using a fraction of the parameters.

The Architecture: Efficiency as a Design Principle

Holo3 has 35B total parameters, but only 10B are active at any inference step. This mixture-of-experts architecture is meant to deliver the reasoning depth of a much larger model at a per-step compute cost closer to that of a 10B model. This matters practically: running a 35B-parameter computer-use agent at the scale required for enterprise automation is only feasible if the per-inference cost is manageable. Holo3's active-parameter design addresses this directly.
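To make the trade-off concrete, here is a back-of-the-envelope sketch. It is not H Company's published cost model. It uses the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, and it assumes 16-bit weights. Both are assumptions made for illustration, and the function names are hypothetical.

```python
# Rough comparison of a dense 35B model against a 35B model with 10B active parameters.
TOTAL_PARAMS = 35e9    # total parameter count, from the article
ACTIVE_PARAMS = 10e9   # parameters active per inference step, from the article
BYTES_PER_PARAM = 2    # assumption: weights stored in a 16-bit format


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold every weight, active or not, in gigabytes."""
    return total_params * bytes_per_param / 1e9


dense_flops = forward_flops_per_token(TOTAL_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)

# Fraction of a dense 35B model's per-token compute that the sparse model needs.
compute_ratio = moe_flops / dense_flops

# All experts must still be resident, so memory tracks the total count.
memory_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)

print(f"per-token compute vs. dense 35B: {compute_ratio:.0%}")
print(f"weight memory at 16-bit: {memory_gb:.0f} GB")
```

Under these assumptions, the saving is in compute per step rather than in memory: the sparse design cuts the arithmetic per token, but a deployment still has to hold all 35B weights.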

The model is released under Apache 2.0, which means it can be run, modified, and deployed commercially without restriction. Weights are available on Hugging Face alongside a free-tier inference API for teams evaluating the model before committing to local deployment infrastructure.

What 78.85% on OSWorld Actually Means

OSWorld-Verified is a benchmark designed specifically to resist gaming. It tests AI agents on real computer tasks (navigating GUIs, filling in forms, moving files, extracting information across applications) in actual desktop environments rather than simulated ones. Previous state-of-the-art scores from frontier proprietary models have been in the 60-70% range, so Holo3's 78.85% is a meaningful step change.

H Company has also published a proprietary benchmark suite of 486 multi-step tasks across four enterprise categories: e-commerce, business software, collaboration, and multi-application workflows. The suite is designed to test the failure modes that OSWorld doesn't catch — long-horizon tasks requiring coordination across multiple applications, error recovery when intermediate steps fail, and consistency over extended sessions. Holo3 performs well across all four categories, with the largest performance advantages on the multi-application tasks where coordination is hardest.

The Training Methodology: Agentic Learning Flywheel

Holo3's performance advantage comes largely from its training methodology. The company describes an "agentic learning flywheel" built on three components: synthetic navigation data generated for specific scenarios, programmatic augmentation to handle out-of-domain situations, and reinforcement learning with aggressive filtering to suppress failure modes.

The synthetic environment factory is particularly notable. Rather than collecting human demonstrations or scraping existing software interactions, H Company built automated systems that generate enterprise environments from scratch using coding agents, then verify that the generated tasks are solvable and calibrate difficulty levels. This produces training data at a scale and diversity that human demonstration collection cannot match.
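The verify-and-calibrate step can be pictured with a small sketch. H Company has not published its pipeline, so everything here is a hypothetical stand-in: the `Task` class, the `attempt` function, the "solved at least once" rule, and the difficulty thresholds are all assumptions. A simulated solve probability takes the place of actually running an agent in a generated environment.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # Hypothetical stand-in for a real environment: the chance that a
    # single agent attempt at this task succeeds.
    solve_prob: float


def attempt(task: Task, rng: random.Random) -> bool:
    """Simulate one agent attempt; a real system would run a rollout here."""
    return rng.random() < task.solve_prob


def verify_and_calibrate(tasks, n_attempts=20, seed=0):
    """Keep tasks solved at least once and label them by observed success rate."""
    rng = random.Random(seed)
    kept = []
    for task in tasks:
        successes = sum(attempt(task, rng) for _ in range(n_attempts))
        if successes == 0:
            # Never solved: solvability is unverified, so drop the task.
            continue
        rate = successes / n_attempts
        # Assumed thresholds, chosen only for illustration.
        if rate >= 0.7:
            difficulty = "easy"
        elif rate >= 0.3:
            difficulty = "medium"
        else:
            difficulty = "hard"
        kept.append((task.name, rate, difficulty))
    return kept


generated = [
    Task("export-invoice", solve_prob=1.0),
    Task("broken-checkout", solve_prob=0.0),
]
for name, rate, difficulty in verify_and_calibrate(generated):
    print(name, rate, difficulty)
```

The point of the sketch is the shape of the loop: generated tasks that no attempt can solve are filtered out, and the rest carry a difficulty label that a training curriculum could use.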

Implications for Enterprise Automation

Computer-use AI, meaning models that can operate software the way humans do without API integrations, is the key to automating enterprise workflows that have resisted automation for decades. Most enterprise software was not built with API-first design; significant operational work happens through GUIs that assume a human is present. A model that can reliably navigate those GUIs, with a 78.85% success rate on standardized tasks, is approaching the reliability threshold where deployment on real workflows becomes viable.

H Company's open-weight release also shifts the competitive dynamics in the computer use space. Previously, the most capable models were proprietary and cloud-accessed, creating data governance concerns for enterprises processing sensitive workflows. Holo3's Apache 2.0 license enables fully on-premise deployment — a critical requirement for industries where data cannot leave the building.
