Live
OpenAI announces GPT-5 with unprecedented reasoning capabilitiesGoogle DeepMind achieves breakthrough in protein folding for rare diseasesEU passes landmark AI Safety Act with global implicationsAnthropic raises $7B as enterprise demand for Claude surgesMeta open-sources Llama 4 with 1T parameter modelNVIDIA unveils next-gen Blackwell Ultra chips for AI data centersApple integrates on-device AI across entire product lineupSam Altman testifies before Congress on AI regulation frameworkMistral AI reaches $10B valuation after Series C funding roundStability AI launches video generation model rivaling SoraOpenAI announces GPT-5 with unprecedented reasoning capabilitiesGoogle DeepMind achieves breakthrough in protein folding for rare diseasesEU passes landmark AI Safety Act with global implicationsAnthropic raises $7B as enterprise demand for Claude surgesMeta open-sources Llama 4 with 1T parameter modelNVIDIA unveils next-gen Blackwell Ultra chips for AI data centersApple integrates on-device AI across entire product lineupSam Altman testifies before Congress on AI regulation frameworkMistral AI reaches $10B valuation after Series C funding roundStability AI launches video generation model rivaling Sora
Policy

OpenAI Launches Safety Bug Bounty Program Targeting AI's Emerging Attack Surface

OpenAI's new Safety Bug Bounty program extends its existing security research rewards to cover a category of vulnerabilities specific to agentic AI — prompt injection, data exfiltration, and manipulation of autonomous AI behaviour. The program formalises what the security community has been doing informally for two years.

D.O.T.S AI Newsroom

D.O.T.S AI Newsroom

AI News Desk

3 min read
OpenAI Launches Safety Bug Bounty Program Targeting AI's Emerging Attack Surface

OpenAI has launched a Safety Bug Bounty program that offers financial rewards for researchers who identify security vulnerabilities specific to its AI systems' safety properties — a distinct category from traditional software security bounties that the company has run since 2023.

The new program targets three classes of AI-specific risk that have emerged as agentic AI systems — models that can take real-world actions, browse the web, write and execute code, and interact with external services — have become widely deployed: prompt injection, data exfiltration, and autonomous behaviour manipulation.

Why These Vulnerabilities Are Different

Traditional security vulnerabilities involve exploiting flaws in software logic, memory management, or authentication. AI-specific vulnerabilities work differently. A prompt injection attack, for instance, doesn't exploit a code flaw — it exploits the model's tendency to follow instructions embedded in content it processes. If an AI agent browsing the web encounters a webpage containing hidden instructions telling it to exfiltrate session tokens or modify its behaviour in ways the user did not intend, and the model follows those instructions, the vulnerability is real and potentially severe — but it has no CVE equivalent in classical security frameworks.

OpenAI's framing of the program acknowledges this explicitly. The company is treating AI safety properties — robustness to manipulation, resistance to prompt injection, boundary enforcement in agentic contexts — as security properties that can be systematically identified and reported by external researchers.

The Industry Context

The security research community has been documenting prompt injection attacks on large language models since at least 2022, with researchers at leading universities and independent labs publishing increasingly sophisticated attack demonstrations. Until now, the industry's response has been largely informal: some labs acknowledge reported issues privately, some publish blog posts describing mitigations, but formal programs that reward researchers for AI-specific safety findings have been rare.

Microsoft launched a limited AI-focused bug bounty extension in 2024. Google has incorporated AI systems into its existing Vulnerability Rewards Program. OpenAI's new program represents a more categorical commitment: treating the safety properties of AI systems as a distinct, reportable, and compensable attack surface alongside traditional software security.

For the growing community of AI security researchers — red teamers, jailbreak researchers, and agentic AI penetration testers — the formalisation of this category is meaningful. It creates economic incentives for responsible disclosure in a domain where the norms around what to report, to whom, and how have been deeply unclear.

Back to Home

Related Stories

Musk Updates His OpenAI Lawsuit to Route Any $150 Billion Damages Award to the Nonprofit Foundation
Policy

Musk Updates His OpenAI Lawsuit to Route Any $150 Billion Damages Award to the Nonprofit Foundation

Elon Musk has amended his lawsuit against OpenAI with a strategic addition: any damages recovered — potentially up to $150 billion — should be redirected to OpenAI's nonprofit foundation rather than awarded to Musk personally. The update reframes the litigation from a personal grievance into a structural argument about OpenAI's obligations to its original charitable mission.

D.O.T.S AI Newsroom
OpenAI's Child Safety Blueprint Confronts AI's Role in the Surge of Child Sexual Exploitation
Policy

OpenAI's Child Safety Blueprint Confronts AI's Role in the Surge of Child Sexual Exploitation

OpenAI has released a Child Safety Blueprint outlining its approach to detecting, preventing, and reporting AI-generated child sexual abuse material. The document arrives as law enforcement agencies globally report a sharp increase in CSAM volume, with AI tools enabling the production of synthetic material at scale. It is the company's most detailed public statement on the problem it helped create.

D.O.T.S AI Newsroom
Anthropic's Claude Mythos Found Thousands of Zero-Days — So They're Not Releasing It
Policy

Anthropic's Claude Mythos Found Thousands of Zero-Days — So They're Not Releasing It

Anthropic has quietly restricted its most capable new model, Claude Mythos, after the system autonomously discovered thousands of critical vulnerabilities in major operating systems and browsers — including a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw. The model is being deployed exclusively through Project Glasswing with 11 vetted security partners. It is the most concrete case yet of an AI lab withholding a model because of genuinely demonstrated risk.

D.O.T.S AI Newsroom