The Empty Promise: Why Enterprise AI Data Isolation Claims Can't Survive Audit Scrutiny

One-line summary

Current RAG systems cannot prove they answer from your documents rather than their training data, creating a hidden compliance gap in SOC 2, ISO 27001, and FedRAMP.

A new metric called Normalized Context Utilization (NCU) reveals that existing RAG evaluation methods cannot distinguish whether an AI answer derives from retrieved enterprise documents or from the model's pre-training data. Because of this "epistemically blind" gap, enterprise AI vendors' "data isolation" claims are currently unverifiable without continuous log-probability analysis. The implications fall directly within SOC 2 Common Criteria 6.x, ISO 27001 Annex A.8, and FedRAMP AC-3 audit scope: an organization cannot attest to its logical access controls if it cannot produce token-level provenance.

A new paper on arXiv (2606.23695) introduces a metric called Normalized Context Utilization, or NCU, and it exposes something that should make any CISO pause. The researchers show that current RAG evaluation methods are "epistemically blind": they cannot reliably distinguish whether an answer came from your retrieved documents or from the model's pre-training data. The system looks like it's reading your files. Sometimes it's just remembering the internet.

That gap lands directly on SOC 2 Common Criteria 6.x, which require logical access controls that prevent unauthorized access to information. If your internal chatbot occasionally answers from its public training corpus instead of your enterprise documents, you have a logical access boundary that leaks quietly, intermittently, and in ways your current monitoring stack almost certainly doesn't catch. The vendor's retrieval-accuracy dashboard won't flag it. Your SIEM won't see it. The model just outputs a plausible answer, and you don't know which side of the firewall it came from.

Enterprise AI vendors sell "data isolation" as a feature. The NCU metric demonstrates that model-level parametric dominance breaks that isolation in ways that SOC 2, ISO 27001 Annex A.8, and FedRAMP AC-3 control testing don't currently probe. If you can't produce token-level provenance showing that an answer derived from your corpus and not the base model, you cannot attest to the data isolation your audit requires.

Procurement teams are signing deals for tools whose core security property, "it only uses our data," is currently unverifiable without continuous log-probability analysis that few organizations run. This isn't a future regulatory risk. It's a gap between what your controls assert and what the system actually does, sitting inside audit scope right now.
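To make "continuous log-probability analysis" concrete, here is a minimal sketch of one way such a check could look. It is not the paper's NCU formula, which this article does not reproduce. It assumes your serving stack can return per-token log-probabilities for the same answer scored twice: once with the retrieved documents in the prompt and once closed-book. The names `ProvenanceScore`, `score_answer`, and `is_parametric_dominant`, and both thresholds, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProvenanceScore:
    mean_logprob_with_context: float  # answer scored with retrieved documents in the prompt
    mean_logprob_closed_book: float   # same answer scored with no documents
    context_lift: float               # nats per token gained from the documents
    normalized_lift: float            # share of closed-book surprisal removed, clipped to [0, 1]


def score_answer(with_context: Sequence[float],
                 closed_book: Sequence[float]) -> ProvenanceScore:
    """Compare per-token log-probs of one answer under two prompts.

    Illustrative heuristic only, not the NCU definition from the paper.
    """
    if not with_context or len(with_context) != len(closed_book):
        raise ValueError("expected two non-empty, equal-length log-prob sequences")
    mean_ctx = sum(with_context) / len(with_context)
    mean_cb = sum(closed_book) / len(closed_book)
    lift = mean_ctx - mean_cb
    surprisal = -mean_cb  # how unlikely the answer was without the documents
    if surprisal <= 0:
        normalized = 0.0
    else:
        normalized = min(1.0, max(0.0, lift / surprisal))
    return ProvenanceScore(mean_ctx, mean_cb, lift, normalized)


def is_parametric_dominant(score: ProvenanceScore,
                           min_normalized_lift: float = 0.5,
                           confident_closed_book: float = -0.5) -> bool:
    """Flag answers the model could plausibly have produced without the corpus.

    Both thresholds are assumptions and would need calibration on your own data.
    """
    return (score.normalized_lift < min_normalized_lift
            or score.mean_logprob_closed_book > confident_closed_book)


# Hypothetical log-probs: the first answer depends on the documents,
# the second is almost as likely closed-book as it is with context.
grounded = score_answer([-0.1, -0.2, -0.3], [-3.0, -4.0, -5.0])
memorized = score_answer([-0.1, -0.1], [-0.15, -0.15])
print(is_parametric_dominant(grounded), is_parametric_dominant(memorized))
```

Logged per response, a flag like this would give an auditor a sampled record of which answers the corpus actually influenced. It is evidence of context influence under these assumptions, not proof of isolation.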