AI Now Crafts Working Zero-Day Exploits

Google Confirmed the First AI-Crafted Zero-Day. What Changes Now.

On May 11, 2026, Google’s Threat Intelligence Group (GTIG) published evidence that a cybercrime group used an AI model to identify a zero-day vulnerability and write a Python exploit for it — a 2FA bypass rooted in a hardcoded trust assumption buried in the authentication enforcement logic. Google worked with the affected vendor to disclose and patch before the threat actor could execute what GTIG describes as a planned mass exploitation campaign. GTIG identified AI authorship through the code’s own signals: a hallucinated CVSS score, educational docstrings, textbook-format Python consistent with LLM training data.

What the AI was good at is the part that should change defender thinking. Traditional fuzzers and static analysis tools are optimized to detect sinks, crashes, and improper input sanitization. They are structurally bad at high-level logic flaws — the kind where a developer writes a trust assumption that contradicts the application’s own authentication enforcement. Reasoning-capable LLMs are not bad at these. The 2FA bypass demonstrated it at production scale.
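To make the flaw class concrete, here is a minimal hypothetical sketch in Python. It is not the vulnerable code from the GTIG case, whose details are not reproduced here. The function names, the `X-Internal-Client` header, and the fixed TOTP code are all invented for illustration. Nothing in the flawed version crashes or mishandles input, so a fuzzer has nothing to flag. The bug only appears when the hardcoded exception is read against the 2FA policy the function claims to enforce.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    name: str
    password: str        # plaintext only to keep the sketch short
    totp_enabled: bool


def verify_totp(user: User, code: Optional[str]) -> bool:
    # Stand-in for a real TOTP check; the fixed code is a demo assumption.
    return code == "123456"


def login_flawed(user: User, password: str, code: Optional[str],
                 headers: Dict[str, str]) -> bool:
    """Enforces 2FA, except for a hardcoded 'trusted client' exception."""
    if password != user.password:
        return False
    # Hardcoded trust assumption: the header is supplied by the caller, so
    # anyone can claim to be the internal tool and skip the second factor.
    if headers.get("X-Internal-Client") == "admin-cli":
        return True
    if user.totp_enabled:
        return verify_totp(user, code)
    return True


def login_fixed(user: User, password: str, code: Optional[str],
                headers: Dict[str, str]) -> bool:
    """Same policy, with no caller-asserted exception to 2FA."""
    if password != user.password:
        return False
    if user.totp_enabled:
        return verify_totp(user, code)
    return True
```

With a valid password and no TOTP code, `login_flawed` accepts any request that sets the header, while `login_fixed` rejects it. Every individual line of the flawed version is valid, which is why spotting the contradiction takes reading the code for intent rather than scanning it for unsafe patterns.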

5 Key Takeaways

1. AI helped build a working zero-day for the first time.

Google’s Threat Intelligence Group confirmed a cybercrime group used an AI model to discover and weaponize a 2FA bypass in a popular open-source admin tool, then planned a mass exploitation campaign. This is the first publicly confirmed AI-crafted zero-day in the wild. It is not the last one that will exist — it is the first one that has been caught. The next exploit will not leave the same forensic tells. AI governance programs built on patch velocity alone are already operating on the wrong assumption.

2. The AI found a flaw that scanners are bad at finding.

The vulnerability was a high-level semantic logic flaw — a hardcoded trust assumption — the kind of contextual error fuzzers and static analysis tools routinely miss but reasoning-capable LLMs can reliably surface. GTIG researchers confirmed frontier LLMs have “an increasing ability to perform contextual reasoning, effectively reading the developer’s intent to correlate 2FA enforcement logic with the contradictions of its hardcoded exceptions.” That is a new attacker capability. It produced a working exploit.

3. The exploit’s “tells” were textbook LLM output.

A hallucinated CVSS score, educational docstrings, a clean ANSI color class, and detailed help menus gave the AI authorship away. The next exploit will not be so easy to spot. Threat actors learn, and polishing those tells out is a one-week task for any operator who read GTIG’s report. Organizations that update their incident response plans now — while the case is fresh — will be ahead of the disclosure that forces the conversation for everyone else.

4. Frontier-lab guardrails are not the whole defense.

GTIG reported neither Gemini nor Anthropic’s Mythos model was used. Threat actors are routing around frontier-lab safety controls through gray-market proxy services and automated account-pooling pipelines. Open-weight models and gray-market proxies are sufficient for the contextual-reasoning task that produced the 2FA bypass. Safety at one lab is not safety across the ecosystem — and the ecosystem is where shadow AI risk and supply chain attack paths converge.

5. The defensive control that matters is at the data layer.

Patching is necessary but reactive. Organizations that govern data access through identity verification, ABAC policy enforcement, and tamper-evident audit logs limit the blast radius when — not if — the next AI-crafted exploit lands. The attacker can find the flaw faster and weaponize it faster. They still have to reach the data to do damage. The data layer is where the durable control lives.

The Speed Compression Is the Real Story

GTIG chief analyst John Hultquist told Infosecurity Magazine: “There’s a misconception that the AI vulnerability race is imminent. The reality is that it’s already begun. For every zero-day we can trace back to AI, there are probably many more out there.” The CrowdStrike 2026 Global Threat Report documented an 89% increase in AI-enabled adversary activity year-over-year, a 42% increase in zero-day exploits, and a 29-minute average eCrime breakout time.

Compress vulnerability discovery, weaponization, and exploitation onto the same timeline and the defender’s window collapses. The traditional incident response cycle — detect, triage, contain, eradicate — assumes daylight between an exploit being available and being used at scale. AI-assisted threat actors are eliminating that daylight. And the next batch of AI-crafted exploits will not include the hallucinated CVSS score that gave this one away.

State Actors Are Already at Scale

The GTIG report’s broader findings make the cybercrime case look modest. North Korean APT45 has been observed sending thousands of repetitive prompts to AI models to recursively analyze vulnerabilities and validate proof-of-concept exploits — an arsenal impractical to manage manually. UNC2814, a China-linked actor, used expert-persona jailbreaking to push Gemini into researching pre-authentication RCE flaws in router firmware. A China-nexus actor was observed using Hexstrike and Strix agentic frameworks alongside the Graphiti memory system to autonomously probe a Japanese technology firm and an East Asian cybersecurity platform, pivoting between reconnaissance tools without sustained operator involvement.

Agentic AI is no longer a conference talking point. It is an operational threat tool with documented use cases against named victim categories. The attacker capability stack is not one breakthrough exploit; it is dozens of incremental capability lifts that, taken together, compress every step of the kill chain.

The Supply Chain Is the Second Front

The GTIG AI Threat Tracker also documented Russia-nexus actors deploying malware families — CANFAIL and LONGSTREAM — that use AI-generated decoy code to obscure malicious functionality, with LLM-authored comments explicitly describing blocks of code as unused filler. A March 2026 supply chain incident reinforced the practical impact: criminal group TeamPCP compromised GitHub repositories including those tied to the LiteLLM AI gateway library and the Trivy vulnerability scanner, embedding a credential stealer called SANDCLOCK in affected build environments to extract AWS keys and GitHub tokens later used in ransomware partnerships.

The LiteLLM compromise is a canary. LiteLLM is widely used to connect applications to multiple AI providers. Exposure of API secrets from that package gives attackers access to an organization’s AI environment — enabling reconnaissance and data collection at scale from inside enterprise networks. The supply chain risk is no longer abstract; it has documented victims and a documented exfiltration mechanism.

The Data Layer Is the Last Line of Defense

Here is the operational implication: faster vulnerability discovery and exploitation collapse the value of perimeter-based defense. Patching is still essential, but the time between disclosure and weaponization is shrinking toward zero. The controls that matter are the ones that limit damage when an exploit lands — not the ones that prevent initial breach.

The Kiteworks 2026 Forecast Report documented a 15-to-20-point gap between AI governance (monitoring, human-in-the-loop) and AI containment (purpose binding, kill switch, network isolation). 54% of boards do not have AI governance in their top five topics — and those organizations are roughly half as likely to conduct AI impact assessments as organizations with engaged boards. These are control-plane deficiencies, not awareness deficiencies. The gap is now being measured against attackers operationally exploiting AI to compress their kill chain.
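Two of the containment controls named above, purpose binding and a kill switch, can be sketched in a few lines of Python. This is an illustrative assumption about how such a gate might look, not a description of any product. `AgentGrant`, `Gate`, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class AccessDenied(Exception):
    """Raised when an agent request falls outside its grant."""


@dataclass
class AgentGrant:
    agent_id: str
    purpose: str                                   # declared when the grant is issued
    allowed_datasets: Set[str] = field(default_factory=set)
    revoked: bool = False                          # kill switch flag


class Gate:
    def __init__(self) -> None:
        self._grants: Dict[str, AgentGrant] = {}

    def issue(self, grant: AgentGrant) -> None:
        self._grants[grant.agent_id] = grant

    def kill(self, agent_id: str) -> None:
        # Kill switch: one call disables every future request by this agent.
        grant = self._grants.get(agent_id)
        if grant is not None:
            grant.revoked = True

    def authorize(self, agent_id: str, purpose: str, dataset: str) -> None:
        grant = self._grants.get(agent_id)
        if grant is None or grant.revoked:
            raise AccessDenied("no active grant")
        # Purpose binding: the request must state the purpose the grant was issued for.
        if purpose != grant.purpose:
            raise AccessDenied("purpose mismatch")
        if dataset not in grant.allowed_datasets:
            raise AccessDenied("dataset out of scope")
```

An agent granted `invoices` for the purpose `invoice-summaries` is refused when it asks for `payroll`, when it states a different purpose, and on every request after `kill` is called. Monitoring alone would only record those attempts; the gate stops them.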

How Kiteworks Approaches the New Math

The architectural answer to AI-compressed vulnerability timelines is data-layer governance. The attacker can find the flaw faster and weaponize it faster — but they still have to reach the data to do damage. If every data interaction is authenticated, authorized against ABAC policies, encrypted with FIPS 140-3 validated cryptography, and recorded in a tamper-evident audit trail, the compromised perimeter is not the breach. The data is still under control.
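As a rough illustration of the pattern described above, the Python sketch below pairs an attribute-based access check with a hash-chained audit log. It is a simplified assumption, not how Kiteworks implements either control. The attribute names, the three sensitivity levels, and the policy rule are hypothetical, and encryption is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


def abac_allows(subject: Dict[str, str], resource: Dict[str, str], action: str) -> bool:
    """Hypothetical policy: read-only, same department, clearance >= sensitivity."""
    return (
        action == "read"
        and subject["department"] == resource["department"]
        and LEVELS[subject["clearance"]] >= LEVELS[resource["sensitivity"]]
    )


@dataclass
class AuditLog:
    entries: List[Dict[str, str]] = field(default_factory=list)

    def append(self, event: Dict[str, str]) -> None:
        # Each entry's hash covers the previous hash, chaining the log together.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})

    def verify(self) -> bool:
        # Recompute the chain; any edited or removed entry breaks it.
        prev = GENESIS
        for entry in self.entries:
            expected = hashlib.sha256((prev + entry["payload"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def access(log: AuditLog, subject: Dict[str, str],
           resource: Dict[str, str], action: str) -> bool:
    allowed = abac_allows(subject, resource, action)
    # Denials are logged too: the refused attempt is itself evidence.
    log.append({"user": subject["id"], "resource": resource["id"],
                "action": action, "decision": "allow" if allowed else "deny"})
    return allowed
```

Changing a stored entry after the fact makes `verify()` return `False`. A chain like this is only tamper-evident if the latest hash is also anchored somewhere the attacker cannot rewrite, such as a separate system or write-once storage.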

The Kiteworks Secure MCP Server and AI Data Gateway extend zero-trust data access to LLM applications and RAG pipelines — so when AI agents reach for enterprise data, every operation is governed at the data layer rather than trusted at the model layer. The Kiteworks Private Data Network extends this architecture across email, file sharing, MFT, SFTP, web forms, and APIs under one policy engine and one consolidated audit log. When a compromised agent or attacker reaches for data it is not authorized to see, the policy engine refuses — and the log records the attempt.

What Security Leaders Should Do This Quarter

First, treat AI-assisted exploitation as the operating assumption. The GTIG case gives boards a documented incident, not a hypothetical. 54% of boards still do not have AI governance in their top five topics. The board conversation should change this quarter, anchored to GTIG’s disclosure rather than abstract risk.

Second, prioritize blast-radius controls over initial-access controls. Purpose binding, kill switches, and network isolation are the controls that determine how bad an AI-assisted breach gets. They are also the controls most organizations do not have. Funding those controls now is cheaper than funding them after the next disclosure.

Third, demand tamper-evident audit trails for every data interaction. GTIG caught this case because it had visibility into the planned operation. Most organizations will not have GTIG-level threat intelligence. 33% of organizations still lack evidence-quality audit trails capable of supporting regulatory or litigation inquiry. That is the gap that makes a patched breach indefensible.

Fourth, treat third-party AI tooling as a regulated data processor. The LiteLLM compromise shows what happens when an AI gateway library is treated as plumbing rather than a privileged data path. Inventory every AI integration, scope every API token to least privilege, and rotate credentials on a schedule that matches the new threat tempo.

Fifth, prepare for the next disclosure to be worse. Build the assumption of AI-polished exploits into your threat modeling now, while the GTIG case is fresh, rather than after a more sophisticated incident hits your sector.
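The inventory step in the fourth recommendation lends itself to a simple automated check. The Python sketch below is one possible shape for it, under stated assumptions: the `Integration` record, the scope strings, and the 30-day rotation window are all hypothetical, and a real inventory would be pulled from your secrets manager or identity provider rather than typed in by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Integration:
    name: str
    scopes: FrozenSet[str]            # what the token can currently do
    required_scopes: FrozenSet[str]   # what the integration actually needs
    issued_at: datetime


def audit_integrations(integrations: List[Integration], now: datetime,
                       max_age: timedelta = timedelta(days=30)) -> List[str]:
    """Return one finding per over-scoped or overdue credential."""
    findings: List[str] = []
    for item in integrations:
        # Least privilege: anything granted beyond the required set is excess.
        excess = item.scopes - item.required_scopes
        if excess:
            findings.append(f"{item.name}: excess scopes {sorted(excess)}")
        # Rotation: flag credentials older than the assumed window.
        if now - item.issued_at > max_age:
            findings.append(f"{item.name}: credential older than {max_age.days} days")
    return findings
```

Run on a schedule, a check like this turns "inventory, scope, rotate" from a policy statement into a list of specific tokens to fix. The right rotation window depends on your environment; 30 days is only a placeholder.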

Frequently Asked Questions

Lead with the GTIG disclosure as a documented incident, not a hypothetical. Frame the budget around blast-radius controls — ABAC enforcement, tamper-evident audit logs, kill switches — rather than patching velocity. The Kiteworks 2026 Forecast found 54% of boards lack AI governance in their top five topics, and boards that engage show dramatically higher AI control maturity across every measured dimension.

Yes. AI-assisted threat actors compress the time between disclosure and weaponization, eroding the value of even fast patching. The defensible position is data-layer governance: every PHI interaction authenticated, ABAC-enforced, and logged. HIPAA Security Rule audits probe audit trail gaps directly — 33% of organizations lack evidence-quality trails, and that gap is the HIPAA finding.

CMMC Level 2 AC, AU, and IA families require enforced authorization and immutable audit trails — the same controls that limit blast radius when AI-assisted exploits land. Only 46% of DIB organizations consider themselves prepared per the Kiteworks CMMC Preparedness Report. Data-layer governance with ABAC enforcement satisfies all three control families simultaneously and produces the evidence assessors require.

Inventory every AI integration with persistent credentials or tokens, scope each to least-privilege access, and apply tamper-evident logging to every AI-to-data interaction. The Kiteworks 2026 Forecast documented a 15-to-20-point gap between AI governance maturity and AI containment maturity — the containment gap is where supply chain compromises do their damage once inside the environment.

EU AI Act high-risk provisions enforceable August 2026 require granular, immutable logs for high-risk AI decisions and data flows. A documented AI-assisted exploitation case raises the bar on what “appropriate technical and organizational measures” means under Article 32-equivalent standards. Organizations outside EU AI Act scope are 22 to 33 points behind on every major AI control per the Kiteworks 2026 Forecast — the gap regulators will measure first.
