
Internal Data Is the #1 Stolen Asset. Your Employees Are Exporting It Voluntarily.

By Patrick Spencer | Updated May 29, 2026 | Cybersecurity Risk Management


The 2026 Verizon Data Breach Investigations Report (DBIR) includes a sentence worth quoting in every board-level risk briefing: “Internal mostly means emails, plans and reports — the kind of material you’d expect to be lying around once an attacker strolls in via stolen credentials or an unpatched vulnerability.” Internal data appeared in 67% of breaches, credentials in 28%, and personal data in 23%.

Most enterprise data security programs are built around regulated categories: PII for privacy regimes, PHI for HIPAA, payment card data for PCI DSS, CUI for CMMC. Those categories deserve the protection they get. But compliance pressure does not reflect what attackers actually prefer. Internal data, at 67%, appears in nearly three times as many breaches as personal data, yet most enterprise programs concentrate the bulk of their classification, encryption, monitoring, and DLP investment on personal data. Strategic plans, M&A documentation, pricing models, engineering specifications, executive correspondence, board materials, and financial forecasts are typically governed by ad hoc folder permissions and sharing decisions, not by content-aware policy enforcement.
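To make "content-aware" concrete for internal categories, here is a minimal Python sketch of rule-based labeling. The category names and keyword patterns are hypothetical examples, not any product's rule set; a real program would tune them and typically combine them with document metadata, owner attestation, or trained classifiers.

```python
import re

# Hypothetical rules for internal (non-regulated) categories; illustrative only.
# Each entry pairs a classification label with a case-insensitive pattern.
INTERNAL_RULES = [
    ("m-and-a", re.compile(r"\b(letter of intent|due diligence|term sheet|acquisition target)\b", re.I)),
    ("board-materials", re.compile(r"\b(board of directors|board deck|board minutes)\b", re.I)),
    ("financial-forecast", re.compile(r"\b(forecast|projected revenue|fy\d{2} plan)\b", re.I)),
    ("strategic-plan", re.compile(r"\b(strategic plan|roadmap|go-to-market)\b", re.I)),
    ("engineering-spec", re.compile(r"\b(design spec|architecture review|api specification)\b", re.I)),
]


def classify(text: str) -> list:
    """Return every internal-category label whose pattern matches, in rule order."""
    return [label for label, pattern in INTERNAL_RULES if pattern.search(text)]


if __name__ == "__main__":
    memo = "Draft term sheet for the acquisition target, with projected revenue through FY28."
    print(classify(memo))  # ['m-and-a', 'financial-forecast']
```

Returning every matching label, rather than only the first, lets one document carry several internal categories at once, which a downstream policy engine can then combine.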


5 Key Takeaways

1. Internal data is the most-stolen asset by a wide margin.

The 2026 Verizon DBIR found internal data — emails, plans, and reports — was compromised in 67% of breaches. Credentials appeared in 28%, personal data in 23%. The most-stolen asset is not customer records or payment card data. It is the working content of the organization: strategy documents, contracts in negotiation, competitive intelligence, engineering specifications, board materials. The category most enterprises have under-classified is the category most frequently stolen.

2. The same content is flowing into AI tools pre-breach.

Across 858,440 DLP events involving uploads to AI services, source code was the most common data type uploaded, followed by structured data and research and technical documentation. The data attackers want post-breach is the data employees are voluntarily exporting pre-breach into ungoverned AI services.

3. Convenience is now the dominant insider motive.

60% of malicious-insider breaches in the 2026 DBIR Privilege Misuse pattern were driven by Convenience — employees trying to get their work done outside policy. Not malice; friction. Shadow AI is exactly this motivational profile. Policies built on prohibition will continue to fail. Policies built on sanctioned alternatives that meet employees where they are will succeed where prohibition has not.

4. The two flows describe the same governance gap.

The data attackers extract post-breach is the data employees voluntarily send to AI services pre-breach. Both flows escape the same governance layer that should be controlling them. According to the 2026 Thales Data Threat Report, only 33% of organizations have complete knowledge of where their sensitive data resides, and you cannot govern flows you cannot locate.

5. The architecture that closes both flows is the same one.

Content-aware policy enforcement at the data layer, governed AI access through sanctioned enterprise channels, and tamper-evident audit logs of every interaction. When the same policy engine governs every data access request, whether from a human, an attacker using stolen credentials, or an AI agent, one architecture addresses both vectors.

The 858,440 DLP Events: The Pre-Breach Picture

The same content categories dominating post-breach exfiltration are the categories employees are voluntarily moving into AI services the enterprise does not control. The 2026 DBIR analyzed 858,440 DLP events involving uploads to generative AI tools. Source code led by a large margin. Structured data followed. In 3.2% of policy violations, research and technical documentation was uploaded to unauthorized AI systems. Verizon’s commentary: “as if the source code part was not enough, you now have potential intellectual property walking out the door.”

Regular AI use on corporate devices reached 45% of employees, up from 15% the prior year. 67% of users access AI from non-corporate accounts on their corporate devices. Shadow AI is now the third most common non-malicious insider action in DLP data — a fourfold year-over-year increase. The 2026 DTEX Insider Threat Report reinforces the gap: 92% of organizations say generative AI has changed how employees share information, yet only 13% have integrated AI into their formal insider threat strategy. The behavior has changed at scale. The governance has not.

Convenience Is the Most Underestimated Insider Motive

Of malicious-insider breaches in the 2026 DBIR Privilege Misuse pattern, 60% were driven by Convenience. Verizon’s example: “an employee wants to work from home and emails company data to a personal account.” Financial motives accounted for 33%. Espionage and other motives split the remaining 7%.

The Shadow AI pattern fits this motivational profile exactly. An employee facing a deadline pastes a contract into a public LLM to summarize it before a meeting, not to sell it to a competitor but to get the work done. Prohibition fails in the AI case for the same structural reason it fails in the personal-email case. The practical lesson for security leaders: policies that assume employees will comply with restrictive controls that slow their work will keep failing. Policies built around sanctioned, governed alternatives succeed where prohibition has not.

The Asymmetry Between Threat Defense and AI Defense

Most organizations are investing heavily in defenses on the attacker side: AI-enabled adversary activity (up 89% according to the CrowdStrike 2026 Global Threat Report), zero-day exploitation, and supply chain attacks. Those investments are necessary, but they address only half of the problem.

The voluntary outbound flow of internal data into AI services on personal accounts is largely undefended. The DLP events are the visible part. The invisible part is the data leaving through browser extensions, AI-enabled SaaS plugins, and agentic workflows that do not trip DLP at all. The 2026 Thales Data Threat Report’s finding that only 33% of organizations have complete knowledge of where their sensitive data resides means neither side of the asymmetry can be fully addressed. The same underlying problem — inadequate content-aware data classification and inventory — limits both defenses.

The Architectural Response: One Governance Layer, Two Defended Flows

The architectural response to the threat-side problem and the AI-side problem is the same. Both flows reach for the same data through the same channels. The defense that closes one closes the other if built at the right layer — content-aware policy enforcement at the data layer rather than the network or application layer.

When a request reaches sensitive content, whether from a legitimate user, an attacker with stolen credentials, an AI agent, or an AI service an employee granted access to, the same policy engine evaluates it. The request is authenticated. Authorization is checked against attribute-based (ABAC) and role-based (RBAC) access controls. The action is logged. Content is delivered, redacted, or denied based on policy, not on which channel requested it.
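The decision path described above can be sketched in a few lines of Python. Everything here is hypothetical and illustrative (the `AccessRequest` fields, the role table, the `managed_endpoint` attribute); it is not the Kiteworks policy engine. What it demonstrates is that `channel` is recorded in the log but never consulted in the decision, so email, SFTP, and an AI agent over MCP get the same answer for the same identity, attributes, and content.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DELIVER = "deliver"
    REDACT = "redact"
    DENY = "deny"


@dataclass
class AccessRequest:
    principal: str       # human user, service account, or AI agent identity
    authenticated: bool  # outcome of the upstream authentication step (e.g. OAuth 2.0)
    roles: frozenset     # RBAC input
    attributes: dict     # ABAC input; "managed_endpoint" is a hypothetical attribute
    channel: str         # "email", "sftp", "mcp", ...; logged but never used to decide
    resource_label: str  # classification label on the requested content


# Hypothetical policy tables, for illustration only.
ROLE_GRANTS = {
    "finance": {"internal", "financial-forecast"},
    "exec": {"internal", "financial-forecast", "board-materials", "m-and-a"},
    "ai-assistant": {"internal", "financial-forecast"},
}
REDACTING_ROLES = {"ai-assistant"}  # these roles only ever receive redacted content

audit_log: list = []


def evaluate(req: AccessRequest) -> Decision:
    """One decision path for every channel: authenticate, RBAC, ABAC, then log."""
    if not req.authenticated:
        decision = Decision.DENY
    elif not any(req.resource_label in ROLE_GRANTS.get(r, set()) for r in req.roles):
        decision = Decision.DENY  # RBAC: no role grants access to this label
    elif req.attributes.get("managed_endpoint") is not True:
        decision = Decision.DENY  # ABAC: request context fails the attribute check
    elif req.roles & REDACTING_ROLES:
        decision = Decision.REDACT
    else:
        decision = Decision.DELIVER
    audit_log.append({"principal": req.principal, "channel": req.channel,
                      "label": req.resource_label, "decision": decision.value})
    return decision


if __name__ == "__main__":
    analyst = AccessRequest("alice", True, frozenset({"finance"}),
                            {"managed_endpoint": True}, "email", "financial-forecast")
    copilot = AccessRequest("copilot-svc", True, frozenset({"ai-assistant"}),
                            {"managed_endpoint": True}, "mcp", "financial-forecast")
    intruder = AccessRequest("alice", True, frozenset({"finance"}),
                             {"managed_endpoint": False}, "sftp", "financial-forecast")
    for r in (analyst, copilot, intruder):
        print(r.principal, r.channel, evaluate(r).value)
```

In this sketch, valid but stolen credentials used from an unmanaged endpoint are still denied by the attribute check, and the AI assistant receives redacted content rather than a blanket allow or deny.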

The Kiteworks Secure MCP Server provides a governed bridge for AI assistants like Claude and Microsoft Copilot to interact with enterprise data through the Model Context Protocol — OAuth 2.0 authentication, policy evaluation on every operation, tamper-evident audit log of every interaction. The AI Data Gateway extends governed access to RAG pipelines and automated document processing. Content-aware redaction and classification at the data layer permits a contract summary to be generated by AI without permitting the full contract terms to leave the governance perimeter.
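Tamper evidence in audit logs is commonly achieved by hash chaining: each entry stores a SHA-256 digest covering both its own content and the previous entry's digest, so changing any past record invalidates every digest after it. The following is a generic Python sketch of that technique, not a description of how Kiteworks implements its log. Hash chaining only detects tampering if the latest digest is also kept somewhere an attacker cannot rewrite, such as a SIEM or write-once storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical events hash identically.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class HashChainedLog:
    """Append-only log where each entry commits to the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: list = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev, event)
        # Store a copy so later changes to the caller's dict do not alter the log.
        self.entries.append({"event": dict(event), "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain. Editing, reordering, or removing any entry except the
        last breaks it; truncating the tail is only caught by comparing the final
        hash with a copy stored elsewhere."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = HashChainedLog()
    log.append({"actor": "copilot-svc", "action": "read", "resource": "q3-forecast.xlsx", "channel": "mcp"})
    log.append({"actor": "alice", "action": "share", "resource": "board-deck.pptx", "channel": "email"})
    print(log.verify())  # True
    log.entries[0]["event"]["resource"] = "lunch-menu.pdf"  # simulate tampering
    print(log.verify())  # False
```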

The Kiteworks Private Data Network extends this architecture across email, file sharing, MFT, SFTP, web forms, APIs, and AI integrations under one policy engine and one consolidated audit log. The forensic record covers every data access regardless of channel (human, AI, internal, or third-party), so investigators can reconstruct what happened from a single source when an incident occurs.
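A consolidated log pays off at investigation time, when the question is what moved, by whom, when, and through which channel. This Python sketch assumes two hypothetical raw log formats (an SFTP server log and an MCP gateway log, with field names invented for illustration), normalizes both into one `AccessEvent` schema, and answers that question with a single query instead of a cross-system correlation exercise.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One normalized record, whatever channel produced it."""
    timestamp: datetime  # timezone-aware UTC
    actor: str
    channel: str
    action: str
    resource: str
    label: str


def from_sftp(rec: dict) -> AccessEvent:
    # Hypothetical SFTP log shape: {"user", "ts" (ISO 8601 with offset), "op", "path", "label"}
    return AccessEvent(datetime.fromisoformat(rec["ts"]), rec["user"], "sftp",
                       rec["op"], rec["path"], rec["label"])


def from_mcp(rec: dict) -> AccessEvent:
    # Hypothetical MCP gateway log shape: {"client_id", "epoch", "tool", "uri", "label"}
    return AccessEvent(datetime.fromtimestamp(rec["epoch"], tz=timezone.utc), rec["client_id"],
                       "mcp", rec["tool"], rec["uri"], rec["label"])


def what_moved(events, label: str, start: datetime, end: datetime) -> list:
    """Every access to content carrying `label` within [start, end), oldest first."""
    hits = [e for e in events if e.label == label and start <= e.timestamp < end]
    return sorted(hits, key=lambda e: e.timestamp)


if __name__ == "__main__":
    events = [
        from_mcp({"client_id": "copilot-svc", "tool": "read_file", "uri": "/deals/project-x/loi.pdf",
                  "label": "m-and-a",
                  "epoch": datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc).timestamp()}),
        from_sftp({"user": "alice", "ts": "2026-05-01T09:00:00+00:00", "op": "download",
                   "path": "/deals/project-x/term-sheet.docx", "label": "m-and-a"}),
    ]
    day = datetime(2026, 5, 1, tzinfo=timezone.utc)
    for e in what_moved(events, "m-and-a", day, day.replace(day=2)):
        print(e.timestamp.isoformat(), e.actor, e.channel, e.action, e.resource)
```

Normalizing at ingestion keeps the query itself channel-agnostic; adding a new channel means writing one more adapter, not changing how investigators ask questions.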

What Security and Risk Leaders Should Do Now

First, expand data classification beyond regulated categories. Most classification programs are oriented around PII, PHI, payment card data, and CUI. The DBIR findings argue for adding internal categories (strategic plans, M&A documentation, engineering specifications, executive correspondence, board materials, financial forecasts) to the classification regime. The category attackers prioritize is the category most programs have under-classified.

Second, treat Shadow AI as a content-control problem, not a user-behavior problem. Policies asking employees not to use unsanctioned AI services will fail. Content-aware controls at the data layer succeed where prohibition does not — governing the data regardless of which channel reaches for it.

Third, provide sanctioned AI access pathways. 45% of employees are regular AI users on corporate devices — that will not reverse. Enterprises providing governed AI access through vetted platforms with policy enforcement and audit trail at the data layer displace Shadow AI into governed channels.

Fourth, consolidate the audit record across all data access channels. Both attacker exfiltration and employee AI uploads require the same forensic answer: what data moved, by whom, when, through which channel. The answer either exists in one place or must be assembled across many.

Fifth, treat content-aware policy enforcement as foundational, not as a premium DLP feature. Both flows, breach exfiltration and Shadow AI export, are addressed by the same architectural pattern. Investments made now will protect against both vectors as they continue to evolve.

To learn more about protecting intellectual property and other sensitive data from AI ingestion, schedule a custom demo today.

Frequently Asked Questions

Internal data (emails, plans, and reports) was compromised in 67% of breaches, nearly three times as often as personal data (23%). Most security programs concentrate protection on regulated categories because compliance requires it. The DBIR data shows that attackers have different priorities: the category most enterprises have under-classified is the category most frequently stolen.

The 2026 DBIR analyzed 858,440 DLP events involving uploads to AI tools, with source code, structured data, and research documentation as the top categories. 45% of employees are regular AI users on corporate devices (up from 15% the prior year), and 67% use non-corporate accounts. Shadow AI is now the third most common non-malicious insider action in DLP data. Internal data exits through these channels at substantial scale, in many cases without triggering a DLP event at all.

Traditional insider threat programs are only partially effective and increasingly miscalibrated. 60% of malicious-insider breaches in the 2026 DBIR Privilege Misuse pattern were driven by Convenience, not malice, and the Shadow AI pattern maps to the same motivational profile. Programs built primarily around malicious-actor profiles miss the dominant insider risk, and prohibition-based policies designed for malicious actors fail under Convenience pressure.

The DBIR’s 67% internal-data finding, versus 23% for personal data, argues for giving internal data the same classification rigor as regulated data. Attackers steal internal data nearly three times as often. Expanding classification to strategic plans, M&A documentation, engineering specs, and executive correspondence, with the same governance applied to PII and PHI, aligns protection with actual attacker preference rather than regulatory mandates alone.

The architectural answer is the same for both flows: content-aware policy enforcement at the data layer, ABAC/RBAC applied to every access request regardless of channel, OAuth 2.0 authentication for AI integrations, and tamper-evident audit logs streamed to SIEM. The Kiteworks Secure MCP Server and AI Data Gateway govern AI data access through one layer that addresses both attacker exfiltration and Shadow AI export simultaneously.
