AI-Enabled Vulnerability Discovery Is Reshaping National Cyber Defence

8 May 2026 10:55 AM

The risk that AI-enabled vulnerability discovery tools will be misused is real. How can the UK minimise its strategic dependence and stay capable in cyber defence?

Achilles and Memnon fighting, Trojan War, Attic, red-figure krater.

Anthropic’s new large language model (LLM), Claude Mythos, drew intense attention after demonstrating strong vulnerability-discovery capabilities. For example, in the widely used browser Firefox alone, the preview version of Claude Mythos reportedly helped identify 271 vulnerabilities. The severity levels of these 271 Firefox vulnerabilities were not disclosed, but earlier Claude Opus testing on Firefox found 22 security flaws, 14 of which were high-severity.

To put this into perspective, exploits targeting zero-day browser vulnerabilities – that is, code designed to abuse a previously unknown vulnerability in the browser to gain access to the device running it – can cost between a few hundred thousand and several million dollars on the exploit market. Furthermore, according to prior research by RAND, zero-day exploits have historically remained useful for an average of 6.9 years after private discovery. Finding, validating and weaponising such vulnerabilities has typically required specialist human expertise and significant time, often taking months or years.

Now, new LLM-based systems are beginning to automate several parts of that process, which increases speed and reduces costs at the same time. They can help find vulnerabilities at scale, reason about exploitability and write exploit code. Beyond this, new agentic AI capabilities make it increasingly possible for LLM-based systems to move from identifying vulnerabilities to attempting exploitation with limited human guidance. In UK AI Security Institute tests, Mythos was able to carry out multi-stage attacks on vulnerable networks. Similarly, a recent Stanford study found that an agentic system could outperform most human professionals in certain types of penetration testing in live enterprise environments. As the Financial Times put it, the concern is that such tools could ‘turbocharge hacking’.
