AI Offensive Cyber Capabilities Are Doubling Every Six Months, Safety Researchers Find
A study from Lyptus Research found that the doubling time of AI's ability to exploit security vulnerabilities has compressed from 9.8 months to 5.7 months since 2024, with frontier models now able to complete tasks requiring three hours of expert human effort at a 50% success rate.

D.O.T.S AI Newsroom
AI models are improving at offensive cybersecurity tasks nearly twice as fast as they were a year ago, according to a new study from Lyptus Research, an AI safety organization. The findings, published April 5, 2026, represent one of the most rigorous quantitative assessments of AI's expanding capability to find and exploit security vulnerabilities.
The research used the METR time-horizon methodology, a framework for measuring how long AI systems can operate autonomously on complex tasks, and was validated by ten professional security experts across a set of 291 tasks.
The Acceleration Is Accelerating
The headline number is concerning. Since 2019, AI offensive cyber capability has been doubling roughly every 9.8 months. Since 2024, that doubling time has compressed to approximately 5.7 months. The pace of improvement is itself accelerating.
To understand what that means practically: GPT-2 in 2019 had a time horizon of roughly 30 seconds on offensive security tasks — meaning it could autonomously pursue a goal for about half a minute before losing coherence or failing. Contemporary frontier models — Opus 4.6 and GPT-5.3 Codex — can now complete tasks at a 50% success rate that would take human experts roughly three hours.
That is approximately a 360-fold expansion in autonomous capability over seven years. And the researchers believe they are still underestimating the actual rate of progress.
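The arithmetic behind that figure can be sketched directly from the numbers reported above, assuming smooth exponential growth in time horizon (an illustrative assumption, not the study's actual fitting procedure):

```python
import math

# Back-of-the-envelope check of the reported figures, assuming smooth
# exponential growth in autonomous time horizon.

start_horizon_s = 30            # GPT-2's reported horizon in 2019 (seconds)
current_horizon_s = 3 * 3600    # ~3 hours for current frontier models

fold_expansion = current_horizon_s / start_horizon_s
print(fold_expansion)           # 360.0

# Number of doublings implied over the period:
doublings = math.log2(fold_expansion)
print(round(doublings, 1))      # 8.5

# At the pre-2024 rate of one doubling per 9.8 months, ~8.5 doublings
# take roughly 83 months, i.e. about seven years (2019 to 2026).
months = doublings * 9.8
print(round(months))            # 83
```

The 360-fold figure is consistent with the stated doubling times: roughly eight and a half doublings fit in the seven-year window at the pre-2024 rate, and the post-2024 compression only shortens that timeline.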
Context Budgets Matter More Than Realized
One of the study's more technically significant findings concerns the relationship between context window size and capability. When given a 10 million token context budget instead of 2 million tokens, GPT-5.3 Codex extended its time horizon from 3.1 hours to 10.5 hours on cybersecurity tasks.
This is not merely a quantitative improvement. It suggests that compute allocation — how many tokens a model is given to reason through a problem — is a significant lever on offensive capability that is underappreciated in current safety evaluations. Published benchmarks using standard context budgets may be substantially underestimating what deployed systems can do when given more resources.
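One hypothetical way to characterize the context-budget effect is a power-law relationship between token budget and time horizon. The exponent below is fitted from the study's two reported data points purely as an illustration; it is not an analysis the study itself performs:

```python
import math

# Hypothetical power-law reading of the reported context-budget result:
# assume time horizon H scales as H ~ (token budget)^alpha, and solve for
# alpha from the two reported points. Illustrative only.

budget_small, horizon_small = 2e6, 3.1    # 2M-token budget  -> 3.1 hours
budget_large, horizon_large = 10e6, 10.5  # 10M-token budget -> 10.5 hours

alpha = math.log(horizon_large / horizon_small) / math.log(budget_large / budget_small)
print(round(alpha, 2))  # 0.76
```

An exponent near 0.76 would mean capability grows sublinearly but steeply with token budget, which is why standard-budget benchmarks could meaningfully understate what a generously resourced deployment can do.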
The Open-Source Lag Is Shrinking
Open-source models trail closed-source counterparts in offensive capability by approximately 5.7 months. That means capabilities restricted to leading labs six months ago are now publicly available. The democratization of frontier-adjacent capabilities is happening on the same compressed timeline as capability growth itself.
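Under the study's own numbers, the lag has a neat interpretation: 5.7 months equals the current doubling time, so open models sit roughly one doubling, or about half the autonomous time horizon, behind the closed frontier. A rough sketch of that reading (an interpretation of the reported figures, not a claim from the study):

```python
# The reported open-source lag (5.7 months) matches the reported doubling
# time (5.7 months), so the lag corresponds to about one doubling of
# capability. Illustrative arithmetic only.

doubling_time_months = 5.7
lag_months = 5.7

doublings_behind = lag_months / doubling_time_months
relative_horizon = 0.5 ** doublings_behind   # fraction of the frontier's time horizon

print(doublings_behind)   # 1.0
print(relative_horizon)   # 0.5
```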
Implications for Defensive Security
The asymmetry between offensive and defensive capabilities is a long-standing concern in cybersecurity. AI is accelerating the offensive side of that equation faster than most defensive tooling has adapted. Security teams that built threat models around human adversary capabilities — or even around AI capabilities from 12 months ago — are increasingly operating with outdated assumptions.
All underlying data is publicly available on GitHub and Hugging Face, enabling independent verification and follow-on research.