Claude and Copilot Are in Your File System. Who Decides What They Can See?

At some point in the last twelve months, AI assistants arrived in your organization’s file systems. Maybe IT approved it. Maybe a business unit deployed Microsoft Copilot as part of an M365 rollout. Maybe employees connected Claude or another AI tool to their work drives on their own. 

However it happened, the result is the same: AI systems are now retrieving enterprise files on behalf of employees, and in most organizations, nobody has explicitly decided what those AI systems are allowed to see. The access controls that govern human file access were not designed for AI actors. The audit logs that track human file access were not configured to capture AI retrievals with the attribution detail compliance requires. The policies that define who can access what were written for employees, not for AI systems acting on employees’ behalf. 

This post is for CISOs and compliance officers who need to answer a question that is now operationally urgent: who actually decides what AI assistants can see in your file system?

Executive Summary

Main Idea: AI assistants are accessing enterprise file systems under authorization models that were designed for human users — with access rights, session boundaries, and audit trails that are structurally inadequate for AI actors. The gap between what organizations believe their AI access controls cover and what they actually cover is significant, largely invisible, and carries direct compliance exposure.

Why You Should Care: When a regulator or auditor asks what files your AI accessed, who authorized each retrieval, and how you can demonstrate that sensitivity classifications were enforced — those questions have answers only if your AI integration was built to generate them. Most are not. The time to discover this gap is before the inquiry, not during it.

5 Key Takeaways

  1. AI assistants accessing enterprise file systems typically run under service accounts with permissions that exceed any individual user’s authorization — meaning the AI can retrieve documents that the employee asking the question is not permitted to see through any other channel.
  2. Session-level authorization is not equivalent to per-request RBAC and ABAC enforcement. It is a single checkpoint followed by unmonitored access at machine speed.
  3. Most AI file access audit logs record the service account identity, not the human user whose query triggered the retrieval. This is not merely a logging gap — it is a HIPAA compliance gap, a GDPR compliance gap, and a forensic investigation gap.
  4. Data classification and sensitivity labels applied to files have no effect on AI retrieval unless the AI integration explicitly evaluates them at the retrieval layer. An AI assistant can surface a document marked Confidential or Restricted as readily as it surfaces an unclassified one.
  5. The governance question is not whether to allow AI file access — employees are more productive with it, and blocking it drives shadow AI. The question is whether AI file access is governed by the same policies, with the same enforcement and the same audit trail, that govern every other form of data access in the organization.

The Access Model Nobody Approved

When an organization deploys an AI assistant with file system access, it typically configures a service account to authenticate the AI to the file repository. That service account is given permissions broad enough to serve the full user population — because the AI needs to be able to retrieve any file any user might legitimately need. The result is a single credential with access to everything in scope, shared across every user who interacts with the AI.

No organization would deliberately create a human user account with this permission profile. A shared account with access to every sensitive file in the repository, usable by any employee, with no per-user access restrictions — that would fail every identity and access management review, every risk assessment, and every compliance audit.

But when the same permission structure is implemented for an AI service account, it frequently passes review because it looks like infrastructure, not access policy. The AI is treated as a system, not as an actor making data access decisions on behalf of users. That distinction does not hold up under scrutiny — and it certainly does not hold up under a regulatory inquiry.

The practical consequence is that an AI assistant connected to an enterprise file system via a service account can retrieve documents that the employee asking the question was never authorized to see. A junior analyst asking Claude to summarize the competitive landscape may receive a response grounded in documents classified above their clearance. A customer service representative asking Copilot to pull account history may inadvertently surface files from other accounts. The AI is not malfunctioning — it is doing exactly what it was configured to do. The access model was simply never designed to prevent this.
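The structural flaw is easy to see in miniature. Below is a deliberately simplified sketch (all account names, files, and functions are hypothetical, not any real product's API) of the service-account pattern: the authorization check consults only what the shared account can reach, never what the requesting employee is permitted to see.

```python
# Hypothetical sketch of the over-privileged service-account pattern:
# the AI authenticates once as a shared service account, and every user's
# query is answered with that account's permissions, not the user's.

SERVICE_ACCOUNT_SCOPE = {"q3-forecast.xlsx", "ma-pipeline.docx", "handbook.pdf"}

USER_PERMISSIONS = {
    "junior_analyst": {"handbook.pdf"},  # no access to deal documents
    "cfo": {"q3-forecast.xlsx", "ma-pipeline.docx", "handbook.pdf"},
}

def retrieve_as_service_account(filename: str, requesting_user: str) -> bool:
    # The requesting user's identity is accepted but never consulted:
    # the only check is whether the shared service account can reach the file.
    return filename in SERVICE_ACCOUNT_SCOPE

# A junior analyst's query surfaces a board-level deal document.
print(retrieve_as_service_account("ma-pipeline.docx", "junior_analyst"))  # True
```

The bug is not in any single line; it is that `requesting_user` exists in the signature and appears nowhere in the decision.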


Session-Level Authorization Is Not Zero-Trust. It Is a Perimeter With One Gate.

The most common governance response to AI file access concerns is to point to authentication: the AI is authenticated before it connects, access is verified, the session is established within the organization’s zero-trust architecture. This is accurate as far as it goes — and it does not go far enough.

Session-level authentication verifies that the AI system is permitted to connect to the file repository at the moment the session is established. It does not verify that the specific user directing the AI for a specific query is authorized to access the specific file the AI is about to retrieve. Once the session is open, every subsequent operation inherits the authorization established at connection time — the AI can retrieve anything the service account can reach, for any user, for any purpose, without a single additional authorization check.

This is equivalent to verifying a visitor’s identity at the building entrance and then giving them unrestricted access to every office, server room, and executive suite for the duration of their visit. The initial verification happened. Everything after it is implicit trust — which is precisely what zero trust data exchange principles exist to eliminate.

For human users, session-level controls are a reasonable approximation of continuous verification because human session behavior is bounded by human operational tempo. For an AI system that can execute thousands of file operations within a single session, session-level authorization is a single checkpoint followed by a period of unmonitored access at machine speed. That is a perimeter model. It is not zero-trust.

Per-request RBAC and ABAC enforcement means that every individual file operation — every retrieval, every search, every download — is evaluated against the requesting user’s actual access rights at the moment it occurs. The AI does not inherit session-level authorization; it inherits the specific permissions of the specific user whose query it is executing, for that query only. If that user is not authorized to see a document, the AI cannot retrieve it — regardless of what the service account can reach, regardless of session state, regardless of how the query is phrased.

Your Sensitivity Labels Are Not Protecting You From AI — Unless the AI Checks Them

Most organizations with mature data governance programs have invested in data classification frameworks — sensitivity labels applied to files that define how they should be handled, who can access them, and what can be done with them. Microsoft Information Protection, native file classification systems, manual classification workflows — these represent real governance investment, and they work well for governing human access to files.

They have no effect on AI retrieval unless the AI integration explicitly evaluates them. A document labeled Confidential is not more difficult for an AI to retrieve than an unclassified one. The sensitivity label is metadata attached to the file. Whether that metadata is evaluated before the file is returned to an AI — or ignored entirely — depends entirely on how the AI integration was architected. In most AI file system integrations, sensitivity labels are not evaluated at the retrieval layer. The AI retrieves the most relevant documents based on the query, and relevance scoring has no concept of sensitivity classification.

The implication for compliance officers is direct: the data governance controls you have invested in do not extend to AI access unless your AI integration was explicitly designed to enforce them. Every classification decision, every sensitivity label, every access restriction applied to a file in your repository — none of it constrains AI retrieval unless the AI integration evaluates it at the retrieval layer. An organization that believes its classification framework protects sensitive files from unauthorized AI exposure has a false assurance that will not survive an audit.
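What retrieval-layer label evaluation looks like, in a deliberately minimal sketch: the label is read and compared to the user's clearance before any content is returned. Labels, clearance sets, and the default for unlabeled files are all hypothetical choices for illustration.

```python
# Hypothetical sketch of sensitivity-label evaluation at the retrieval
# layer: the label is metadata on the file, and it only matters if this
# check runs before content is returned to the AI.

FILE_LABELS = {
    "term-sheet.docx": "Confidential",
    "press-release.docx": "Public",
}

USER_CLEARANCE = {
    "employee": {"Public", "Internal"},
    "deal_team": {"Public", "Internal", "Confidential"},
}

def label_gate(filename: str, user: str) -> bool:
    # Unlabeled files are treated as Public here; a stricter policy
    # would default-deny anything without a label.
    label = FILE_LABELS.get(filename, "Public")
    return label in USER_CLEARANCE.get(user, set())

# Relevance scoring would rank the term sheet highly for a "competitor
# analysis" query; the gate, not the ranking, decides whether it surfaces.
print(label_gate("term-sheet.docx", "employee"))   # False: not returned
```

Without this gate in the retrieval path, the `FILE_LABELS` metadata exists but changes nothing, which is exactly the situation described above.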

Five Access Control Failures — and What They Look Like in Practice

| Access Control Failure | What Is Missing | What Actually Happens |
|---|---|---|
| Over-Privileged Service Account | AI assistant runs under a service account with access to all file shares; any user can query any file the account can reach | A junior analyst asks Claude to summarize the M&A pipeline. Claude retrieves board-level deal documents the analyst is not authorized to view. |
| Session-Level Authorization Only | AI authorization verified at connection time; all subsequent operations inherit that authorization regardless of what is requested | A contractor authenticated during business hours; their AI session persists. After hours, the AI continues retrieving documents without re-verification. |
| No Sensitivity Label Enforcement | AI retrieves documents based on content relevance without evaluating classification labels | An employee asks Copilot for competitor analysis. It retrieves documents marked Confidential, including a draft acquisition term sheet. |
| Missing Per-User Attribution | All AI file access logged under the service account; no record of which user’s query triggered each retrieval | A data breach investigation reveals thousands of file accesses by “AI-service-account.” No audit trail can identify who initiated the queries. |
| No Rate Limiting on Retrieval | AI can execute unlimited file retrievals within a session; no volume controls at the data layer | A compromised AI session retrieves 40,000 documents over 90 minutes before the SIEM alert is acknowledged. |

The Questions You Cannot Answer — Until You Can

For compliance officers, the AI file access governance problem crystallizes into a specific and uncomfortable question: if a regulator or auditor asked what your AI assistants accessed, who authorized each retrieval, and how you can demonstrate that your data protection obligations were met — could you answer?

The answer depends entirely on what your AI integration was built to log and enforce. Most AI file access audit trails record that a service account retrieved a file. They do not record which employee’s query triggered the retrieval, whether the retrieval was consistent with that employee’s access rights, whether the file’s sensitivity classification was evaluated, or what was done with the content.

This is not a logging configuration problem — it is an architectural problem. The attribution detail required to answer compliance questions must be captured at the moment of retrieval; it cannot be reconstructed afterward.
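What capture-at-retrieval means concretely: a record written at the moment of the operation, carrying both the AI system identity and the authenticated human user. The field names below are illustrative assumptions, not a documented log schema.

```python
# Hypothetical sketch of a dual-attribution audit record, written at
# retrieval time. Both the AI system identity and the human user whose
# query triggered the operation are recorded per event.

import json
from datetime import datetime, timezone

def audit_record(ai_identity, human_user, query, filename, decision):
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_ai": ai_identity,      # which AI system performed the call
        "actor_human": human_user,    # whose query triggered it
        "query": query,               # the request that caused the retrieval
        "file": filename,
        "decision": decision,         # "allowed" or "denied" per policy
    })

print(audit_record("claude-mcp", "jdoe", "summarize Q3 pipeline",
                   "ma-pipeline.docx", "denied"))
```

A log built from records like this can answer "which employee's query retrieved which file" directly; a service-account-only log cannot, no matter how it is queried after the fact.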

The HIPAA Minimum Necessary Rule requires that access to protected health information be limited to the minimum necessary to accomplish the purpose. When an AI retrieves PHI to answer a query, demonstrating minimum necessary compliance requires knowing exactly what was retrieved, in response to which query, by which user.

GDPR requires that personal data processing have a documented lawful basis — which, for AI retrieval, requires knowing which user directed the retrieval and what purpose it served. SOX requires complete records of access to financial data.

FedRAMP compliance requires audit logging for all operations within authorized information systems, including AI operations, at an attribution level that identifies the responsible human actor.

What Auditors Will Ask — and What Answering Requires

| Question an Auditor or Regulator Will Ask | Applicable Framework | What Answering It Requires |
|---|---|---|
| Which employees have used an AI assistant to access files containing PHI or PII in the last 90 days? | HIPAA, GDPR | Per-user attribution in AI audit logs; service account logging alone cannot answer this question |
| What specific documents did the AI retrieve in response to a given user query? | HIPAA Minimum Necessary, GDPR data minimization | Query-level logging that maps each retrieval to the specific request that triggered it |
| Was the AI prevented from accessing documents the requesting user was not authorized to see? | All regulated frameworks | Per-request RBAC/ABAC enforcement with logged policy decisions, not just session authentication |
| Can we demonstrate that AI data access was consistent with applicable sensitivity classifications? | GDPR, SOX, FedRAMP | Sensitivity label evaluation at the retrieval layer and documentation that labels were enforced |
| What is the complete access history for a specific file that may have been retrieved by AI? | HIPAA, GDPR right of access, eDiscovery | File-level audit trail that includes AI retrievals with the same attribution detail as human access |

Blocking AI Is Not the Answer. Governing It Is.

The reflex response to AI file access concerns — block the AI tools, revoke access, wait for the governance framework to catch up — creates a different problem. Employees who find AI genuinely useful for their work will access it through other means. Personal accounts, browser-based AI tools, consumer applications that have no connection to enterprise governance and no access to authoritative enterprise data.

This is not a hypothetical: shadow AI is already present in most organizations, and restricting sanctioned AI tools accelerates rather than reduces ungoverned AI usage.

The governance question is not whether to allow AI file access. It is whether AI file access is governed by the same policies, with the same enforcement and the same audit trail, that govern every other form of data access in the organization. An employee who accesses a file through a desktop application is subject to access controls, session monitoring, and audit logging.

That same employee accessing the same file through an AI assistant should be subject to identical controls — per-request authorization, sensitivity label enforcement, dual-attribution logging. If the AI path bypasses any of those controls, it represents a governance gap that will eventually be exploited, discovered, or both.

How Kiteworks Ensures That AI Sees Only What It Should

The organizations that will manage AI adoption without creating compliance liability are not the ones that block AI the longest — they are the ones that extend governance to AI the fastest. That extension requires an architecture where the answer to “who decides what AI can see” is the same as the answer for every other actor: the policy engine, evaluated at the moment of each individual request, against the specific user’s actual access rights.

Kiteworks enforces this through per-request RBAC and ABAC at the AI operation level — not at session establishment. Every file retrieval, every search, every folder operation executed by Claude, Copilot, or any MCP-compatible AI assistant through the Kiteworks Private Data Network is evaluated against the authenticated user’s current access rights before data is returned.

The AI inherits the user’s permissions — not the service account’s permissions — for every individual operation. If the user cannot see a document, the AI cannot retrieve it, regardless of session state, regardless of query construction, regardless of what the service account can reach.

Sensitivity label enforcement happens at the retrieval layer: Microsoft Information Protection classifications and Kiteworks data classification policies are evaluated before data is returned to the AI. Confidential documents are not surfaced in response to queries from users who are not authorized to see them — not because the AI is instructed not to mention them, but because the governance layer does not return them.

Every AI file operation is logged with dual attribution — AI system identity and authenticated human user — feeding the Kiteworks audit log and integrating with SIEM in real time. The compliance questions in the table above have answers — because the architecture was built to generate them.

For CISOs who need to demonstrate that AI file access is governed with the same rigor as human access, and for compliance officers who need audit-ready documentation of what AI accessed and who authorized it, Kiteworks provides the governed data layer that makes both possible. To see how it works, schedule a custom demo today.

Frequently Asked Questions

Why is AI assistant access to enterprise file systems a security risk?

Most enterprise AI assistants access file systems through service accounts configured with permissions broad enough to serve the full user population. This means the AI can retrieve any file the service account can reach — regardless of whether the individual employee directing the AI is authorized to see that file. Combined with session-level authorization that grants implicit access for the session’s duration, this creates an access model where AI effectively bypasses the access controls that govern human file access. The risk is not hypothetical: employees can inadvertently receive AI-generated responses grounded in documents they were never authorized to view.

Do data classification and sensitivity labels protect files from AI retrieval?

Only if the AI integration was explicitly designed to evaluate them. Data classification labels and sensitivity classifications applied to files are metadata — they have no effect on AI retrieval unless the integration evaluates them at the retrieval layer before returning data. In most AI file system integrations, relevance to the query determines what is retrieved; sensitivity classification does not factor in. Organizations that believe their classification framework protects sensitive files from AI exposure should verify whether their AI integration actually enforces it.

How is per-request RBAC and ABAC enforcement different from session-level authorization?

Session-level authorization verifies the AI system at connection time and grants implicit access for the session’s duration. Per-request RBAC and ABAC enforcement evaluates the specific user’s actual access rights for every individual AI operation — every file retrieval, every search, every folder navigation — at the moment it is requested. The difference in practice: with session-level authorization, any file accessible to the service account can be retrieved for any user; with per-request enforcement, only files the requesting user is specifically authorized to access can be retrieved, for that request only.

What do HIPAA and GDPR require for logging AI file access?

Both frameworks require attribution-level documentation that identifies the human responsible for each data access event. For AI file access, this means dual-attribution logging: every retrieval must record both the AI system identity and the authenticated human user whose query triggered it, along with the specific file retrieved, the timestamp, and the action taken. HIPAA compliance additionally requires that access to PHI satisfy the minimum necessary standard, which requires knowing exactly what was retrieved in response to which query. Service account-only logging — which records that “the AI accessed a file” without identifying the human requestor — does not satisfy either framework’s documentation requirements.

How should organizations govern AI file access without blocking it?

The goal is to extend existing data governance policies to AI actors — not to create separate AI-specific policies, and not to block AI tools that employees find genuinely useful. This means deploying AI integration architecture that enforces per-request authorization using the same RBAC and ABAC policies that govern human access, evaluates sensitivity labels at the retrieval layer, and generates dual-attribution audit logs for every AI operation. Organizations that accomplish this give employees governed AI access that is both more capable and more secure than the ungoverned alternatives they would otherwise use.

Get started.

It’s easy to start ensuring regulatory compliance and effectively managing risk with Kiteworks. Join the thousands of organizations who are confident in how they exchange private data between people, machines, and systems. Get started today.
