Defending AI Agents from Memory Poisoning Attacks

OWASP Agent Memory Guard: Stop AI Agents from Being Weaponized Through Their Own Memory

AI agent memory poisoning exploits the fact that agents are stateful. Unlike a traditional API call that processes input and returns output without remembering anything, an agent maintains persistent state across interactions. That state lives in several places: conversation history buffers, vector stores queried for context via semantic search, scratchpads where the agent writes intermediate reasoning, and RAG indexes connecting the agent to enterprise document repositories.

Any attacker with write access to one of these storage layers gains a command channel into future agent behavior. The attack does not require compromising the agent’s code, the model weights, or the API credentials used to invoke the model. It requires only the ability to place a carefully crafted string into storage the agent will later read.

OWASP’s ASI06 classification distinguishes memory poisoning from standard prompt injection by focusing on persistence. A direct prompt injection happens in a single session and ends when that session closes. Memory poisoning survives session boundaries, agent restarts, and even redeployments if the underlying store is not wiped. Conventional security tooling has no visibility into what an agent stores between calls.

5 Key Takeaways

1. AI agent memory is an unguarded attack surface in most enterprises.

Conversation histories, vector stores, scratchpads, and RAG indexes all accept writes with no authentication or integrity verification by default. An attacker with write access to any of these storage layers gains a persistent command channel into future agent behavior — one that survives session boundaries, agent restarts, and redeployments if the underlying store is not wiped. OWASP formalized this as ASI06 in its Top 10 for Agentic Applications. Data governance frameworks have not caught up to this threat model.

2. OWASP Agent Memory Guard achieves 92.5% recall, 100% precision, and 59-microsecond median latency.

Five detectors — prompt injection, PII/PHI and secret leakage, key tampering, SHA-256 integrity verification, and size-anomaly detection — run inline on every memory read and write. Zero false positives means legitimate memory operations are never blocked. 59-microsecond median latency means it can run in production without throughput impact. These numbers make it viable for inline deployment, not only out-of-band analysis.

3. Memory poisoning enables exfiltration through AI workflows that appear entirely normal.

A manipulated agent can exfiltrate files, call external APIs, and forward content to attacker-controlled endpoints — all while appearing to function normally from the user interface. This is particularly relevant where AI agents interact with secure file sharing systems and MFT pipelines holding contracts, regulated data, and intellectual property. Security tools watching for anomalous human behavior cannot see this attack class.

4. ABAC enforcement limits blast radius even when memory poisoning succeeds.

A memory-poisoned agent that attempts to read from a data store it was never authorized to access gets denied at the policy enforcement point — the poisoned instruction simply cannot execute. Zero-trust principles require that every resource access be explicitly authorized, regardless of whether the requesting entity is a human user or an AI agent. Memory defense stops the poisoning; access control limits the damage if it succeeds anyway.

5. Memory poisoning that causes regulated data exfiltration is a reportable breach.

A memory store containing patient records, CUI, or personal data is subject to HIPAA, CMMC, and GDPR regardless of whether it is a traditional database or a vector index. Memory poisoning that causes an agent to exfiltrate regulated data is a reportable incident under any of these frameworks. Every AI deployment touching regulated data needs a memory security layer before that layer is tested under attacker pressure.

You Trust Your Organization is Secure. But Can You Verify It?

Read Now

How Agent Memory Guard Works

OWASP Agent Memory Guard sits as a runtime interception layer between the AI agent and its memory backends. Every read and every write passes through a policy evaluation pipeline before it completes. That pipeline runs five distinct detectors in sequence:

The prompt injection detector scans content for patterns attempting to override system instructions or inject commands into the agent’s context. The PII/PHI and secret leakage detector flags content containing personal data, credentials, or tokens. The key tampering detector identifies modifications to cryptographic material. The SHA-256 integrity detector verifies that memory content has not been altered since first written. The size anomaly detector catches unusually large writes that may indicate bulk injection attempts.

Policy is defined in YAML and supports four dispositions: allow, redact, quarantine, and block. When a poisoning attempt is detected and blocked, Agent Memory Guard supports rollback to a known-good memory state — something conventional audit log approaches do not provide. Logging records what happened; rollback actually restores the agent to a clean operating state.

The Enterprise Risk: Memory as an Exfiltration Vector

Security practitioners tend to think of AI agent risks in terms of what the agent might say — hallucinations, privacy violations, biased outputs. Memory poisoning introduces a different risk category: what the agent might do. A manipulated agent can exfiltrate files, call external APIs, forward content to attacker-controlled endpoints, and escalate access — all while appearing to function normally from the perspective of the user interface.

This exfiltration path is particularly relevant where AI agents interact with secure file sharing systems, managed file transfer pipelines, and structured document repositories. An agent that reads from a confidential contracts repository and writes summaries to a collaboration tool is doing its job. The same agent, after memory poisoning, might read from that repository and write the raw content — not just summaries — to an endpoint the attacker controls.

The Kiteworks Private Data Network addresses this at the access control layer. ABAC enforcement means that even a fully compromised agent cannot access data outside its policy-defined permissions — the agent’s identity, assigned role, and operating context must match the resource’s access policy before any read or write proceeds.

Implementing a Layered Defense for AI Agent Memory

Organizations deploying AI agents in regulated environments face a specific challenge: the regulatory frameworks that govern data handling — GDPR, HIPAA, CMMC 2.0 — were not written with agentic AI in mind. A memory store containing patient records, CUI, or personal data is subject to those regulations regardless of whether it is a traditional database or a vector index.

OWASP Agent Memory Guard provides runtime protection. The Kiteworks AI Data Gateway provides a governed channel for AI interactions with enterprise data, ensuring sensitive content does not flow to AI systems or AI memories through uncontrolled paths. The Secure MCP Server controls which AI tools can interact with enterprise data at all. AI data governance in practice means treating AI agent memory the same way you treat any other enterprise data store: classify it, apply access controls, monitor access patterns, and verify integrity.

To learn more about protecting your sensitive data in AI agent workflows, schedule a custom demo today.

Frequently Asked Questions

ASI06 is OWASP’s classification for memory and state manipulation attacks against AI agents. It covers scenarios where an attacker modifies persistent state an agent reads — conversation history, vector store contents, scratchpad data, and RAG index entries — to alter behavior in subsequent interactions. It distinguishes memory poisoning from transient prompt injection by focusing on persistence across session boundaries. OWASP Agent Memory Guard is the reference implementation. Organizations building agentic AI should treat ASI06 as a first-class threat alongside prompt injection and data exfiltration.

ABAC evaluates every resource access against a policy considering the attributes of the requesting entity, the resource, and the operating environment. A memory-poisoned agent attempting to read from a data store it was never authorized to access gets denied at the policy enforcement point — the poisoned instruction cannot execute. Kiteworks’ ABAC enforcement applies at the protocol layer, meaning the restriction holds regardless of which AI model or orchestration framework the agent uses. This creates a meaningful blast-radius limitation that complements runtime memory defense.

Most conventional security tooling has no visibility into AI agent memory stores. SIEM systems detect anomalous API call patterns but cannot inspect the semantic content of a vector database write to determine whether it contains an injected command. DLP solutions detect known sensitive data patterns but were not designed to parse prompt injection syntax embedded in document chunks. Agent Memory Guard fills this gap with purpose-built detectors. SHA-256 integrity verification is particularly valuable — it catches tampered content that matches no known malicious pattern, because it simply changed after being written.

Any regulated data an AI agent can access is a target. In healthcare, agents with PHI access expose HIPAA covered entities to breach liability. In defense contracting, agents processing CUI face CMMC requirements that presuppose data integrity — CUI exfiltration is a reportable incident under DFARS. In financial services, agents handling PCI DSS data face the same exposure. Memory poisoning converts a legitimate, authorized AI workflow into an unauthorized data access event.

OWASP Agent Memory Guard inspects and enforces policies on what enters and exits agent memory stores. The Kiteworks AI Data Gateway and Secure MCP Server control which enterprise data sources AI agents can reach, which tools they can invoke, and which outputs they can produce. A well-configured deployment uses Agent Memory Guard to prevent memory from being poisoned, and Kiteworks’ zero-trust architecture to limit what a poisoned agent can accomplish if memory defense is bypassed.

Additional Resources

Get started.

It’s easy to start ensuring regulatory compliance and effectively managing risk with Kiteworks. Join the thousands of organizations who are confident in how they exchange private data between people, machines, and systems. Get started today.

Table of Content
Share
Tweet
Share
Explore Kiteworks