AI & Security · Cybersecurity

1. AI as defender and attacker

AI cuts both ways, and the central question — in Anthropic’s own words — is which side extracts more value first:

“The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models.”
— Anthropic, Claude Mythos Preview^[18]

For defenders	For attackers
Automated vulnerability discovery & patching at scale	Autonomous exploit development; non-experts gain expert capability
Alert triage and anomaly detection in the SOC	Flawless, localised phishing and deepfake social engineering
Code review and secure-by-default generation	Faster N-day weaponisation as the patch gap compresses
Faster incident response and threat hunting	Polymorphic malware and adaptive evasion

The structural worry is the transition period: capability tends to favour offence first (a single new exploit has immediate impact) while defence requires slow, organisation-wide change (patching, re-architecture). The strategic goal of responsible release is to give defenders a head start through that window.

2. Attacks on AI systems

AI is not just a weapon and a shield — it is also a new attack surface. Key classes, several of which trace to the AI & IT ecosystem research:

Adversarial examples^[19] — inputs perturbed imperceptibly to cause misclassification.
Prompt injection — malicious instructions smuggled into the data an LLM reads, hijacking its behaviour; the defining vulnerability class of the agentic era and now its own OWASP Top 10 for LLM Applications^[20].
Data poisoning — corrupting training data to implant backdoors or degrade behaviour.
Model & data exfiltration — stealing model weights or recovering training data through extraction attacks.

As AI agents gain the ability to act — browse, run code, move money — prompt injection becomes a system-security problem, not a content-moderation one. The mitigations are classic security engineering: least privilege for agents, input/output isolation, and human-in-the-loop for consequential actions.

3. Case study: Claude Mythos

Claude Mythos Preview is a frontier model from Anthropic, announced 7 April 2026, whose headline result is a step-change in autonomous cyber capability. According to Anthropic’s disclosure and contemporaneous reporting^[18]^[21], Mythos has:

Found zero-day vulnerabilities in every major operating system and web browser tested;
Surfaced flaws that survived decades of human review (reportedly including a 27-year-old bug in OpenBSD’s SACK implementation);
Developed working exploits — including multi-stage sandbox escapes — that skilled penetration testers estimated would take weeks of manual effort;
Reconstructed plausible source code from stripped binaries, eroding security-through-obscurity for closed-source software.

The significance is not that AI can find bugs — fuzzers and static analysers have done so for years — but that a general-purpose model can chain discovery, reasoning, and exploit construction end-to-end, at a level Anthropic says surpasses all but the most skilled humans.

4. Project Glasswing

Project Glasswing^[23] is the deployment vehicle Anthropic built specifically so it could put Mythos to defensive use without a broad release. It is a limited-access programme for organisations that operate or maintain critical software infrastructure.

Launch (7 April 2026): twelve partners — Amazon (AWS), Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and Anthropic itself.
Results to date: partners have identified more than 10,000 high- or critical-severity vulnerabilities across systemically important software since launch.
Disclosure: findings follow coordinated vulnerability disclosure, with Anthropic publishing cryptographic commitments to release details once patches ship.
Funding: roughly $100M in model credits for defensive research and $4M directly to open-source security organisations, plus a Cyber Verification Program for legitimate security professionals.

5. Timeline (to this morning)

Date (2026)	Event
7 Apr	Anthropic announces Claude Mythos Preview and launches Project Glasswing with 12 partners^[23].
Late May	Glasswing partners pass 10,000+ high/critical vulnerabilities found; access has grown to ~50 organisations^[21].
1 Jun	Reporting that Anthropic will extend Mythos access to EU institutions, including the EU cybersecurity agency ENISA^[24].
2 Jun	Glasswing expansion announced: ~150 additional organisations across 15+ countries — adding power, water, healthcare, telecom, and hardware sectors. Anthropic notes that for most partners “a major attack could affect more than 100 million people”^[22].
Early Jun	Reports that Mythos-class capability may be coming to Claude Code, and that Anthropic intends to bring “Mythos-class models” to all customers once safeguards mature^[25].

As of this morning (5 June 2026), the live state of play: Mythos remains a restricted preview; Glasswing is mid-expansion toward ~200 organisations in 15+ countries; no firm public-release date has been set; and Anthropic has flagged that within 6–12 months it expects other labs to field Mythos-class models, potentially without comparable safeguards — the core argument for acting now.

6. Implications & open questions

Mythos does not break the logic of the defence-in-depth model — it compresses the timeline on which that model operates. Several consequences follow:

The patch gap becomes the battleground. If exploits can be generated in minutes, defenders need automated discovery, shortened patch cycles, and automated response just to keep pace.
Security-through-obscurity is dead. Binary reverse-engineering at scale means closed source is no protection; correctness and memory safety matter more than ever (the case for Rust and formal methods).
Release policy is itself a security control. Whether frontier capability is gated, and to whom, materially changes who holds the advantage during the transition.
Proliferation is the wild card. Anthropic’s restraint only helps if peers exercise similar caution; an uncontrolled Mythos-class release elsewhere would reset the calculus.

The optimistic thesis — that defenders ultimately benefit more, because the same capability that finds a bug for an attacker also finds it for the patcher — is plausible but unproven. The next year, as these models broaden access, is the experiment.