Why RAG Implementations Fail Security Review — and How to Build One That Doesn’t

The demo worked. The pilot was compelling. The business case was strong. And then the security review started.

For many enterprise AI teams, this is where Retrieval Augmented Generation (RAG) projects stall — not because the technology failed, but because the architecture that produced a great demo was not the architecture that could satisfy a security team’s requirements for a production data access system.

The failure pattern is consistent enough to be predictable: the AI team built a system optimized for retrieval quality and developer velocity; the security team evaluated it as a system that accesses sensitive enterprise data at scale; the two perspectives produced different conclusions about readiness.

This post is for VPs of AI/ML Engineering and CISOs who want to understand the pattern, close the gap, and get RAG to production without another security review restart.

Executive Summary

Main Idea: RAG implementations fail security review for six predictable reasons, all of which trace to the same root cause: the retrieval layer was treated as infrastructure rather than as a governed data access system. Security teams are not blocking RAG because they oppose AI — they are blocking it because the retrieval component has the data access profile of a privileged system and the governance posture of a development tool. Resolving the six failure modes requires adding a governance layer to the retrieval architecture before security review, not during it.

Why You Should Care: Every cycle a RAG project spends in security review remediation is a cycle it is not delivering business value. The cost is not just delay — it is the credibility of the AI program with business stakeholders who approved the initiative, and the relationship between the AI team and the security function that will govern every subsequent AI project. Building RAG right the first time is not a compliance exercise; it is how AI teams establish the trust that makes future projects move faster.

5 Key Takeaways

  1. Security teams evaluate RAG pipelines as data access systems, not AI tools. The evaluation criteria are the same as for any system that accesses sensitive enterprise data at scale: authentication, access controls, audit logging, monitoring, and incident response. AI teams that present RAG as a productivity application are answering questions the security team is not asking.
  2. The six most common security review failure modes are all architectural, not configurational. They cannot be resolved by adding documentation or adjusting policies. They require changes to authentication architecture, retrieval layer authorization, logging infrastructure, and monitoring integration — changes that are significantly easier to make before the retrieval layer is built than after it is in pilot.
  3. The gap between what security asks for and what AI teams build is not a communication problem — it is a prioritization problem. Retrieval quality, developer experience, and time-to-demo are optimized in the build phase; access controls, audit logging, and monitoring are not. The result is a system that performs well on the dimensions the AI team measured and fails on the dimensions the security team measures.
  4. Pre-retrieval authorization scoping is the single architectural decision that resolves the most security review questions simultaneously. When the retrieval layer enforces per-request RBAC and ABAC and scopes retrieval to what the authenticated user is authorized to access, the over-permissioned retrieval failure mode is closed, the data classification enforcement question is answered, and the authorization equivalence test is satisfied.
  5. A governed retrieval layer is not a constraint on RAG capability — it is the difference between a RAG project that reaches production and one that cycles through security review. The AI Data Gateway pattern provides the governance layer as a deployable component rather than a custom build, allowing AI teams to satisfy security requirements without rebuilding the retrieval architecture from scratch.

Why Security Reviews Stop RAG Projects: The Structural Disconnect

The disconnect between AI teams and security teams on RAG is structural, not interpersonal. AI teams build RAG pipelines using frameworks — LangChain, LlamaIndex, Haystack — that are optimized for retrieval quality and development velocity. These frameworks handle vector indexing, embedding, semantic search, and context assembly well. They do not handle per-user authorization, per-document logging, sensitivity label enforcement, or SIEM integration — because those are not retrieval quality problems. They are governance problems, and the frameworks were built by people solving retrieval quality problems.

When a security team reviews the resulting system, they apply the same evaluation framework they apply to any new data access system. They ask: who can access what data, how is that access controlled, how is it logged, how is it monitored, and what happens when something goes wrong?

The RAG pipeline answers these questions poorly not because the AI team was careless, but because the framework they used does not generate good answers to these questions by default. The service account exists because it was the easiest way to get retrieval working. The session-level logging exists because it was what the framework provided. The absence of sensitivity label enforcement exists because the framework does not know about MIP labels.

The result is a cycle that repeats across organizations: AI team presents demo, security team reviews, security team finds six problems, AI team spends two months remediating while business stakeholders ask when the project is going live, AI team presents again, security team finds three remaining problems, cycle repeats. The projects that break this cycle are the ones where the retrieval architecture was designed with the security evaluation criteria in mind from the beginning — where data governance was a design input, not a gate at the end.


The Six Failure Modes: What Security Finds and Why It Blocks Approval

The following six failure modes appear in the majority of enterprise RAG security reviews in regulated industries. Each represents a question the security team will ask, an answer the AI team cannot give with default RAG architecture, and the architectural change required to resolve it.

Failure Mode 1: Over-permissioned retrieval
What the AI team built: The RAG pipeline uses a service account with broad repository access; retrieval is relevance-based with no per-user authorization scoping.
Why security blocks it: The security team asks: what prevents a user from retrieving documents outside their authorization level? The AI team has no answer that does not require architectural rework.
What resolves it: Per-request RBAC/ABAC enforcement at the retrieval layer; retrieval scoped to what the authenticated user is authorized to access, not what is semantically relevant across the full corpus.

Failure Mode 2: Service account authentication
What the AI team built: The AI system authenticates to data sources via a shared service account or static API key; no per-user identity is preserved.
Why security blocks it: The security team asks: which individual is responsible for each data access event? The AI team cannot provide individual attribution — the only identity in the log is the service account.
What resolves it: OAuth 2.0 with PKCE and user-delegated authorization; individual user identity is preserved through the authentication flow to the retrieval layer and logged with every access event.

Failure Mode 3: No audit trail at the retrieval layer
What the AI team built: Logging is implemented at the AI application layer (session logs, query logs) but not at the data layer; individual document retrievals are not recorded.
Why security blocks it: The security team asks: can you produce a record of every document retrieved, by which user, on which date? The AI team can produce only session logs, which do not satisfy per-document, per-user recording requirements.
What resolves it: Per-document retrieval logging at the data layer, capturing document identifier, user identity, authorization decision, sensitivity classification, and timestamp for every retrieval event.

Failure Mode 4: No sensitivity label enforcement
What the AI team built: The RAG pipeline ignores existing MIP or data classification labels on indexed documents; retrieval is based on semantic relevance regardless of document classification.
Why security blocks it: The security team asks: what prevents the AI from retrieving a document marked Restricted or Confidential for a user without the requisite clearance? The AI team has no technical control to demonstrate.
What resolves it: MIP label evaluation at the retrieval layer; documents above the requesting user's authorization level are denied before entering the AI context, and the denial is logged with its policy basis.

Failure Mode 5: No data residency or sovereignty controls
What the AI team built: The RAG pipeline indexes documents from repositories across jurisdictions; retrieval and processing may occur in infrastructure outside the data's legally required residency boundaries.
Why security blocks it: The security team or legal asks: where is this data being processed, and does that satisfy our GDPR/sovereign cloud obligations for EU/UK data? The AI team cannot answer without reviewing infrastructure documentation.
What resolves it: Data residency controls that enforce where retrieval, processing, and storage occur; tenant isolation that ensures cross-jurisdiction data does not commingle in retrieval operations.

Failure Mode 6: No incident response integration
What the AI team built: The RAG pipeline has no documented procedure for detecting, containing, or remediating a data access incident; no SIEM integration; no anomaly detection for retrieval volume or patterns.
Why security blocks it: The security team asks: if the AI pipeline is used for bulk data extraction, how would you detect it? What is the response procedure? The AI team has no documented answer.
What resolves it: Real-time SIEM integration with retrieval activity; per-user retrieval volume baselines and anomaly alerting; documented AI-specific incident response procedures integrated with the broader IR program.

What Security Teams Are Actually Asking For

Translating security review requirements into architectural specifications is a recurring friction point between AI teams and security functions. The questions security asks are not arbitrary. They map directly to the security controls that enterprise security programs apply to all privileged data access systems, derived from the same framework requirements that govern EHR access, financial reporting systems, and regulated file transfer infrastructure. Understanding what the questions are actually asking for makes the required architecture legible.

When security asks “what prevents a user from accessing data outside their authorization”…

…they are asking for evidence that the retrieval system enforces the organization’s access control policies per-request, not per-session. The answer they are looking for is: “retrieval is scoped by the authenticated user’s RBAC and ABAC profile at the time of each query, and documents outside that scope are excluded from retrieval before they enter the AI context.” The answer they are not looking for is: “the AI model is instructed not to reference documents the user should not see.”
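As a concrete sketch, per-request scoping can be implemented as a metadata filter derived from the user's authorization profile and applied to the vector search itself, so out-of-scope documents are never candidates for retrieval. The store interface, filter operators, and field names below are illustrative assumptions, not any specific framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class UserAuthzProfile:
    """Authorization attributes resolved from the IdP / policy store per request."""
    user_id: str
    roles: set[str] = field(default_factory=set)  # RBAC roles
    clearance: int = 0                            # ABAC attribute: 0=Public .. 3=Restricted

def build_authz_filter(profile: UserAuthzProfile) -> dict:
    """Translate the user's profile into a retrieval-time metadata filter.
    Documents outside this filter are never searched, so they can never enter
    the AI context: pre-retrieval scoping, not post-retrieval filtering."""
    return {
        "sensitivity_level": {"$lte": profile.clearance},
        "allowed_roles": {"$in": sorted(profile.roles)},
    }

def governed_retrieve(vector_store, query: str, profile: UserAuthzProfile, k: int = 5):
    """Run semantic search only over the user's authorized slice of the corpus."""
    return vector_store.similarity_search(query, k=k, filter=build_authz_filter(profile))
```

The design point is that the filter is evaluated at the time of each query, so a revoked role or lowered clearance takes effect on the very next retrieval.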

When security asks “who is responsible for each data access event”…

…they are asking for individual user attribution in the audit log, not a service account identity. The answer they are looking for is: “OAuth 2.0 with user-delegated authorization preserves the authenticated user’s identity through to the retrieval layer, and every log entry contains both the AI system identity and the individual user identity.” The answer they are not looking for is: “the AI platform logs the session, and we can correlate that to user activity.”

When security asks “how would you detect bulk data extraction”…

…they are asking for a monitoring architecture, not a theoretical description of what anomalous behavior would look like. The answer they are looking for is: “retrieval activity feeds SIEM in real time; per-user retrieval volume baselines are established; deviations above threshold generate automated alerts.” The answer they are not looking for is: “if someone was doing that, we would probably see it in the logs.”
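A minimal sketch of what "baselines plus thresholds" means in practice, assuming daily per-user retrieval counts. The three-sigma rule and the in-process alert flag are illustrative choices; a real deployment would stream every event to SIEM and alert there.

```python
import statistics
from collections import defaultdict

class RetrievalVolumeMonitor:
    """Illustrative per-user retrieval-volume baseline with threshold alerting."""

    def __init__(self, sigma_threshold: float = 3.0, min_history: int = 7):
        self.history = defaultdict(list)  # user_id -> list of daily retrieval counts
        self.sigma_threshold = sigma_threshold
        self.min_history = min_history    # days of baseline needed before alerting

    def record_day(self, user_id: str, doc_count: int) -> bool:
        """Record one day's retrieval count; return True if it should alert."""
        past = self.history[user_id]
        alert = False
        if len(past) >= self.min_history:
            mean = statistics.mean(past)
            stdev = statistics.pstdev(past) or 1.0  # avoid zero stdev on flat history
            if doc_count > mean + self.sigma_threshold * stdev:
                alert = True  # in production: emit an alert event to the SIEM
        past.append(doc_count)
        return alert
```

A user who normally retrieves around ten documents a day and suddenly retrieves hundreds trips the threshold, which is exactly the bulk-extraction question the security team is asking about.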

When security asks “what happens if this system is compromised”…

…they are asking for documented incident response procedures specific to the AI pipeline, not a reference to the general IR policy. The answer they are looking for is: “the IR plan has an AI-specific addendum covering detection indicators, containment procedures for the retrieval component, forensic preservation steps, and the notification workflow if PHI or personal data is involved.”

Security Review Readiness: Ten Requirements and What Each Needs

The following checklist maps the ten most consistent security review requirements for enterprise RAG implementations to the specific architectural capabilities required to satisfy each one. AI teams should use this list before submitting to security review — any requirement where the answer is “not yet” is a finding that will delay approval.

1. Can you demonstrate that retrieval is scoped to what the requesting user is authorized to access, not the full corpus? (Authentication / Access)
Requires per-request RBAC/ABAC enforcement at the retrieval layer with logged authorization decisions; relevance-only retrieval with no user scoping does not satisfy this requirement.

2. Can you show that no shared service accounts or static API keys are used for AI data access? (Authentication)
Requires OAuth 2.0 with user-delegated authorization; individual user identity must be preserved through to the retrieval layer and present in every audit log entry.

3. Can you produce a sample log entry showing the individual user, specific document retrieved, authorization decision, and timestamp for a RAG retrieval event? (Audit / Logging)
Requires per-document retrieval logging at the data layer; session logs or application logs do not satisfy this requirement. The sample log entry is the most common evidence request in security review.

4. Can you demonstrate that MIP sensitivity labels on indexed documents are evaluated at retrieval time? (Data Classification)
Requires MIP label integration at the retrieval layer; documents above the user's authorization level must be denied before entering the AI context, and the denial must be logged with its policy basis.

5. Can you demonstrate that AI data access events are fed to SIEM in real time? (Monitoring / Detection)
Requires real-time SIEM integration at the retrieval layer; periodic log exports do not satisfy continuous monitoring requirements for FedRAMP, SOC 2, or enterprise security policy.

6. Can you describe the anomaly detection rules active for AI retrieval activity and show an example alert? (Monitoring / Detection)
Requires documented baseline rules for retrieval volume and query patterns, and at least one example alert demonstrating the monitoring is operational rather than configured-and-inactive.

7. Can you demonstrate that retrieval and processing of data subject to residency requirements occurs within the required jurisdiction? (Data Residency)
Requires infrastructure documentation showing retrieval, processing, and storage locations; for GDPR-scoped data, must demonstrate EU/UK residency or a lawful transfer mechanism.

8. Do you have a documented incident response procedure specific to AI data access incidents? (Incident Response)
Requires an AI-specific addendum to the existing IR plan, covering detection indicators, containment procedures, and forensic preservation steps specific to RAG pipeline incidents.

9. Can you demonstrate that the access controls applied to AI data access are equivalent to those applied to human access to the same data? (Access Control Equivalence)
Requires that the RBAC/ABAC policies governing human access to the repository also govern AI retrieval from the same repository; separate, weaker AI access controls fail this test.

10. Can you show that the AI system cannot retrieve or process data that the directing user is not authorized to access through any other channel? (Authorization Scope)
Requires pre-retrieval authorization scoping, not post-retrieval filtering; the test is whether unauthorized data ever enters the AI context, not whether it is removed from the response.

Architecture That Passes: Building the Governance Layer Before Security Review

The most effective way to pass a RAG security review is to conduct a pre-build version of it. Before the retrieval architecture is finalized, the AI team should work through the ten requirements above and identify which ones require architectural decisions rather than configuration choices.

These are the decisions that are expensive to reverse once the pipeline is built. The ones that are cheap to add later — documentation, policy updates, IR plan addenda — can wait. The ones that require redesigning the authentication model, rebuilding the retrieval authorization layer, or replacing service account credentials are not cheap to add later.

The authentication architecture decision is the most consequential and the most irreversible. Choosing OAuth 2.0 with user-delegated authorization over a service account as the retrieval authentication mechanism determines whether individual user attribution is available in every subsequent log entry, every audit trail, and every breach notification scope determination.

It is architecturally straightforward to make this choice before building; it is architecturally expensive to retrofit it into a deployed pipeline where session management was designed around service account identity.
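The PKCE half of that authentication choice is small and standardized. The sketch below generates an RFC 7636 code_verifier and S256 code_challenge, the piece that makes an intercepted authorization code useless without the verifier; the surrounding identity-provider integration is omitted.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and S256 code_challenge (RFC 7636).
    The challenge is sent in the authorization request; the verifier is sent
    only in the token exchange, so an attacker who intercepts the
    authorization code cannot redeem it."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

Because the flow is user-delegated, the token minted at the end of it carries the individual employee's identity, which is what makes per-user attribution in the retrieval logs possible at all.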

The retrieval authorization architecture decision is the second most consequential. Pre-retrieval authorization scoping — enforcing RBAC and ABAC constraints before the retrieval operation, not as a post-retrieval filter — requires the retrieval system to be aware of the requesting user’s authorization profile.

This is not available in standard RAG framework configurations; it requires a governed retrieval layer that sits between the user’s query and the vector search operation. Building this layer as part of the initial architecture is straightforward; retrofitting it into a pipeline where the vector search operates directly against the full corpus requires rebuilding the retrieval component from scratch.

The logging architecture decision follows from the retrieval authorization architecture. Per-document retrieval logging requires that the retrieval layer generates a log event for each document it returns, with the document identifier, the user identity, the authorization decision, and the sensitivity classification.

This log event must be generated at the retrieval layer, not reconstructed from application logs. If the retrieval layer does not generate this event by design, adding it later requires instrumenting the retrieval component — which is simpler when the component is purpose-built for governance than when it is a framework-default vector search operation.
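What such a retrieval-layer event might look like, sketched as a JSON line; the field names are illustrative, not a required schema, but the fields themselves map to what reviewers ask to see in a sample log entry.

```python
import json
from datetime import datetime, timezone

def retrieval_log_event(user_id: str, system_id: str, doc_id: str,
                        decision: str, sensitivity: str, policy_basis: str) -> str:
    """Emit one audit event per document at the retrieval layer -- the
    per-document, per-user record that session logs cannot reconstruct."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_identity": user_id,            # the individual, not a service account
        "ai_system_identity": system_id,     # the pipeline acting on the user's behalf
        "document_id": doc_id,
        "authorization_decision": decision,  # "permit" or "deny"
        "sensitivity_label": sensitivity,
        "policy_basis": policy_basis,        # why a denial was denied
    }
    return json.dumps(event)  # forward this line to the SIEM pipeline
```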

The Governance Layer: Why the AI Data Gateway Pattern Resolves the Security Review Problem

The architectural insight that simplifies RAG security review is that all six failure modes can be resolved by a single governance layer that sits between the RAG pipeline’s retrieval request and the data repositories it indexes.

This governance layer handles authentication (OAuth 2.0 with PKCE), per-request authorization (RBAC and ABAC evaluated against the user’s profile), sensitivity label enforcement (MIP label evaluation at retrieval time), per-document logging (every retrieval event logged with full attribution), and SIEM integration (real-time forwarding).

The RAG pipeline itself — the vector indexing, the embedding, the context assembly, the model — is unchanged. The governance layer is not a constraint on retrieval quality; it is a wrapper around the retrieval operation that produces compliant access rather than unconstrained access.

This is the AI Data Gateway pattern. Rather than building governance capabilities into the RAG framework directly — which requires custom development against frameworks that were not designed for it — the governance layer is a separate component that the retrieval operation passes through.

The RAG pipeline requests a retrieval; the governance layer evaluates the request against the authenticated user’s authorization profile, enforces sensitivity policies, executes the retrieval against the authorized subset of the corpus, logs the result, and returns the retrieved documents to the pipeline.

From the RAG pipeline’s perspective, it received the documents it requested. From the security team’s perspective, every security review requirement was satisfied before the documents were returned.
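The request flow described above can be sketched as a thin control layer. Here `policy_engine`, `vector_store`, and `audit_log` stand in for interfaces a given deployment would supply, and all method names are assumptions for illustration, not a vendor API.

```python
class AIDataGateway:
    """Illustrative control flow for the gateway pattern: authorize, enforce
    sensitivity policy, retrieve from the authorized subset, log, return."""

    def __init__(self, policy_engine, vector_store, audit_log):
        self.policy_engine = policy_engine
        self.vector_store = vector_store
        self.audit_log = audit_log

    def retrieve(self, user, query: str, k: int = 5) -> list:
        # 1. Resolve the user's RBAC/ABAC profile for this request.
        profile = self.policy_engine.resolve(user)
        # 2. Search only the authorized subset (pre-retrieval scoping).
        docs = self.vector_store.search(query, k=k, filter=profile.as_filter())
        # 3. Evaluate sensitivity labels; log permits and denials alike.
        permitted = []
        for doc in docs:
            allowed = self.policy_engine.label_permits(profile, doc)
            self.audit_log.record(user=user, doc=doc.id,
                                  decision="permit" if allowed else "deny")
            if allowed:
                permitted.append(doc)
        # 4. Return only compliant documents to the RAG pipeline.
        return permitted
```

Note that the pipeline calling `retrieve` needs no changes: it asks for documents and gets documents, while authorization, label enforcement, and logging happen inside the layer it passed through.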

The build-versus-buy calculus for this pattern is straightforward for most organizations. Building a governed retrieval layer from scratch requires: implementing OAuth 2.0 with PKCE and integrating it with the enterprise identity and access management system; building a per-request authorization engine that evaluates RBAC and ABAC policies from the organization’s policy store; integrating MIP label evaluation into the retrieval path; building a per-document logging infrastructure with SIEM forwarding; and maintaining all of this as a production system.

The alternative is deploying a purpose-built AI Data Gateway that provides all of these capabilities as a managed component, allowing the AI team to focus on building the AI application while the governance layer handles what the security team needs.

How Kiteworks Gets RAG to Production

The premise behind the Kiteworks AI Data Gateway is that RAG security review failure is an architectural problem with an architectural solution — and that the solution does not require rebuilding the RAG pipeline, only adding the governance layer it was missing. The AI Data Gateway provides that layer as a deployable component integrated with the Kiteworks Private Data Network, closing each of the six security review failure modes without requiring custom development.

Authentication is handled through OAuth 2.0 with PKCE, preserving the authenticated employee’s identity from the AI assistant through to the retrieval layer. No service account mediates the access chain; every retrieval is authorized under the individual user’s identity and every audit log entry carries that identity alongside the AI system identity.

Per-request RBAC and ABAC authorization is enforced at the retrieval layer by Kiteworks’ Data Policy Engine, which evaluates the user’s authorization profile against the requested document before the document enters the AI context. MIP sensitivity labels are evaluated at retrieval time; documents above the user’s clearance level are denied before retrieval, and the denial is logged with the policy basis.

Per-document retrieval logging generates a complete audit log entry for every retrieval event: user identity, AI system identity, document identifier, sensitivity classification, authorization decision, and timestamp. Every entry feeds the Kiteworks SIEM integration in real time, establishing the continuous monitoring record that security teams and compliance frameworks require.

Per-user retrieval volume baselines are active by default, and anomaly alerts are generated when retrieval patterns deviate — providing the detection capability that security teams ask about and AI teams rarely have a good answer for.

The zero trust data exchange architecture that governs secure file sharing, managed file transfer, and secure email across the organization extends to every RAG retrieval — so the data governance posture demonstrated to security for traditional data channels is the same posture demonstrated for the AI pipeline.

There is no separate security review for a new data access architecture; there is an extension of an already-approved governance framework to a new consumption pattern. For AI teams trying to move RAG from pilot to production, that is the difference between a six-month security review cycle and a tractable one.

For VPs of AI/ML Engineering and CISOs who want to stop losing RAG projects at the security gate, Kiteworks provides the governance layer that closes the gap. To see how the AI Data Gateway satisfies each of the ten security review requirements, schedule a custom demo today.

Frequently Asked Questions

Why do RAG implementations fail security review so predictably?

RAG frameworks are built to solve retrieval quality problems: semantic search, embedding quality, context assembly, and response accuracy. They are not built to solve governance problems: per-user authorization, per-document audit logging, sensitivity label enforcement, and SIEM integration. When an AI team builds a RAG pipeline using standard frameworks optimized for retrieval quality, they produce a system that performs well on retrieval dimensions and poorly on governance dimensions. Security teams evaluate the governance dimensions, find them deficient, and block approval. The failure is predictable because the frameworks do not provide governance capabilities by default, and most AI teams do not add them until security asks. Building the governance layer into the retrieval architecture before the security review is the only reliable way to avoid this cycle.

What is the difference between pre-retrieval authorization scoping and post-retrieval filtering?

Pre-retrieval authorization scoping enforces the requesting user's RBAC and ABAC authorization profile as a constraint on the retrieval operation itself, so the vector search only returns documents the user is authorized to access. Post-retrieval filtering retrieves all semantically relevant documents first, then removes the ones the user is not authorized to see. The security difference is fundamental: post-retrieval filtering means unauthorized documents were retrieved and placed in the AI's context before being removed — the unauthorized access already occurred. Pre-retrieval authorization scoping means unauthorized documents are never retrieved at all. Security teams require the latter because the former does not prevent the data access; it only filters the response.

Why does security review require OAuth 2.0 with PKCE rather than service accounts or API keys?

Enterprise security review for RAG authentication requires three things: individual user identity preserved through to the retrieval layer (not a shared service account), short-lived tokens rather than static API keys (to limit the credential exposure window), and authentication that satisfies the enterprise identity and access management framework already in place. OAuth 2.0 with PKCE satisfies all three: user-delegated authorization preserves the individual user's identity through the authentication flow to the retrieval layer; tokens are short-lived and PKCE prevents authorization code interception; and OAuth 2.0 integrates with enterprise identity providers. Service accounts and static API keys fail the first requirement and appear as security review findings in virtually every regulated-industry RAG evaluation.

Can we build the governance layer ourselves on top of a standard RAG framework?

It is technically possible but practically difficult. Standard RAG frameworks do not provide per-request ABAC authorization, per-document retrieval logging, MIP label integration, or SIEM forwarding as built-in capabilities. Implementing these capabilities on top of standard frameworks requires custom development of an OAuth 2.0 integration layer, a per-request authorization engine, a retrieval instrumentation layer for per-document logging, a MIP label evaluation integration, and a log forwarding integration — each of which must then be maintained as a production system. Organizations that build this custom are effectively building an AI Data Gateway from scratch. Deploying a purpose-built component is faster, more reliable, and produces a security review evidence package that maps directly to evaluation criteria, rather than requiring the AI team to document how each custom implementation satisfies each requirement.

Does a governance layer degrade retrieval quality or response accuracy?

A correctly implemented governance layer does not degrade retrieval quality or response accuracy; it changes the scope of the retrieval operation. Pre-retrieval authorization scoping means the vector search is executed against the subset of the corpus the user is authorized to access, rather than the full corpus. For most users, this is the corpus they would expect the AI to search — their authorized data governance domain. The response accuracy within their authorized scope is unchanged. The only quality change is that the AI will not reference documents outside the user's authorization level — which is the intended behavior, not a degradation. The zero trust data exchange principle of verifying every request rather than trusting broad access applies here: a governed retrieval that returns accurate results within authorized scope is a better production system than an ungoverned retrieval that occasionally returns results from outside authorized scope.


Get started.

It’s easy to start ensuring regulatory compliance and effectively managing risk with Kiteworks. Join the thousands of organizations who are confident in how they exchange private data between people, machines, and systems. Get started today.
