Security · ~7 min
No injection, no poisoning — just a retriever doing exactly its job. It ranks by relevance, and relevance has no idea who's asking. In a shared index, the best-scoring chunk for one tenant can belong to another.
Retrieval ranks by similarity; authorization is a separate predicate. When tenants share an index, the highest-scoring document for tenant A may belong to tenant B. The embedding worked perfectly — it just had no view into policy. The failure is structural, not a model defect.
Similarity scoring answers "how close is this chunk to the query?" It never answers "may this requester see it?" Those are different questions, and a shared corpus makes the difference dangerous: the top-K results can include another tenant's documents that simply happened to score high.
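A minimal sketch of the gap, assuming a toy shared index with hand-written three-dimensional embeddings (the chunk IDs, tenant names, and vectors are all hypothetical). The ranking function only sees vectors, so nothing stops another tenant's chunk from taking the top slot:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical shared index: (chunk_id, owning tenant, embedding).
INDEX = [
    ("a-1", "tenant_a", [0.9, 0.1, 0.0]),
    ("a-2", "tenant_a", [0.2, 0.8, 0.1]),
    ("b-1", "tenant_b", [1.0, 0.0, 0.0]),
]

def top_k(query, k):
    """Rank purely by similarity; the tenant field is never consulted."""
    ranked = sorted(INDEX, key=lambda c: cosine(query, c[2]), reverse=True)
    return [(chunk_id, tenant) for chunk_id, tenant, _ in ranked[:k]]

# A query issued on behalf of tenant_a.
results = top_k([1.0, 0.0, 0.0], k=2)
print(results)  # the best-scoring chunk belongs to tenant_b
```

The scoring is correct in this sketch; the leak comes from the absence of any authorization check in `top_k`.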
The fix moves authorization from "filter the output" to "filter the search space." It has three layers: tag every chunk with a tenant ID and ABAC attributes at ingest; pre-filter the candidate set by authorization before scoring, then post-filter the top-K to catch approximate-nearest-neighbour (ANN) paths that slip past the filter; and run a single shared model, which is acceptable because only authorized chunks ever enter the prompt. The retrieval predicate reduces to one set: the chunks whose tenant ID and ABAC attributes the requester is authorized to see.
Why both tiers? A pre-filter alone can be bypassed by ANN paths that traverse the index outside the metadata filter; a post-filter alone collapses recall. The pre-filter cuts the space before scoring; the post-filter is defence-in-depth on the result set.
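The two tiers can be sketched as follows. This is an illustration under stated assumptions, not a specific vector store's API: `Chunk`, `Requester`, `authorized`, `filtered_search`, and `leaky_search` are hypothetical names, the `score` field stands in for similarity to the query, and the ABAC rule (the requester must hold every label on the chunk) is one possible policy among many.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    labels: frozenset   # ABAC attributes attached at ingest (assumed model)
    score: float        # stand-in for similarity to the current query

@dataclass(frozen=True)
class Requester:
    tenant_id: str
    labels: frozenset

INDEX = [
    Chunk("a-1", "tenant_a", frozenset({"finance"}), 0.70),
    Chunk("a-2", "tenant_a", frozenset(), 0.60),
    Chunk("b-1", "tenant_b", frozenset(), 0.95),
]

def authorized(req, chunk):
    """Assumed policy: same tenant, and requester holds all chunk labels."""
    return chunk.tenant_id == req.tenant_id and chunk.labels <= req.labels

def filtered_search(allow, k):
    """A search backend that honours the pre-filter."""
    candidates = [c for c in INDEX if allow(c)]
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]

def leaky_search(allow, k):
    """A backend that ignores the filter, standing in for an ANN bypass."""
    return sorted(INDEX, key=lambda c: c.score, reverse=True)[:k]

def retrieve(req, search, k):
    def allow(chunk):
        return authorized(req, chunk)
    top = search(allow, k)                 # tier 1: restrict the search space
    return [c for c in top if allow(c)]    # tier 2: re-check the result set

req = Requester("tenant_a", frozenset({"finance"}))
good = [c.chunk_id for c in retrieve(req, filtered_search, 2)]
leaky = [c.chunk_id for c in retrieve(req, leaky_search, 2)]
print(good, leaky)
```

With the well-behaved backend both authorized chunks come back. With the leaky backend the post-filter still blocks `b-1`, but only one result survives, which is the recall collapse that relying on a post-filter alone would produce.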
The relevance-authorization gap is one of four failure modes. Once tools and multi-turn state enter, three more appear — and the same lesson holds: authorize the record, not just the call.
| Failure mode | The fix |
|---|---|
| Tool-mediated disclosure (shared DB/S3/Slack) | Per-record authorization inside the tool, not just at the call |
| Context accumulation across turns | Scope multi-turn state by tenant, not by session |
| Client-side orchestration bypass | Run tools, state, and policy server-side, in one trust boundary |
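A rough sketch of the first two rows, assuming an in-memory dictionary in place of the shared backing store. The names `RECORDS`, `get_invoice_tool`, `remember`, and `recall` are hypothetical, and keying state on the tenant together with the session is one way to read "scope by tenant":

```python
class Forbidden(Exception):
    """Raised when the caller may not read the requested record."""

# Hypothetical shared backing store holding records for several tenants.
RECORDS = {
    "inv-1": {"tenant_id": "tenant_a", "total": 120},
    "inv-2": {"tenant_id": "tenant_b", "total": 990},
}

def get_invoice_tool(caller_tenant, record_id):
    """Authorize the record itself, not only the right to call the tool."""
    record = RECORDS.get(record_id)
    # Missing and foreign records fail identically, so existence is not leaked.
    if record is None or record["tenant_id"] != caller_tenant:
        raise Forbidden(record_id)
    return record

# Multi-turn state keyed by tenant as well as session (assumed design).
_state = {}

def remember(tenant_id, session_id, item):
    _state.setdefault((tenant_id, session_id), []).append(item)

def recall(tenant_id, session_id):
    return list(_state.get((tenant_id, session_id), []))

remember("tenant_a", "s1", "inv-1 total is 120")
print(recall("tenant_a", "s1"))  # tenant_a sees its own context
print(recall("tenant_b", "s1"))  # same session ID, different tenant: empty
```

The third row is a deployment decision rather than a code pattern: both functions above only help if they run server-side, where the caller cannot skip them.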
Gating doesn't just close leaks; it also improves retrieval. Precision@5 rose 2.2×, because the cross-tenant chunks that had scored highest were never relevant signal for the asking tenant in the first place. The overhead is ~19 ms.
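For reference, Precision@k is the fraction of the top k results that are relevant to the requester. The sketch below uses made-up rankings purely to show the mechanism; the chunk IDs and the resulting numbers are illustrative and are not the measurements quoted above.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k ranked results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for chunk_id in top if chunk_id in relevant_ids) / k

# Hypothetical data: what tenant_a actually finds relevant.
relevant = {"a1", "a2", "a3"}

# Ungated ranking: other tenants' chunks occupy three of the top five slots.
ungated = ["b1", "b2", "a1", "b3", "a2"]
# Gated ranking: only tenant_a's chunks are candidates.
gated = ["a1", "a2", "a3", "a4", "a5"]

p_ungated = precision_at_k(ungated, relevant, 5)
p_gated = precision_at_k(gated, relevant, 5)
print(p_ungated, p_gated)
```

Every slot taken by another tenant's chunk counts as a miss for the asking tenant, so removing those chunks from the candidate set can only free slots for relevant ones.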
ABAC tagging has limits. Hierarchical permissions (nested folders, group inheritance) force you to expand ancestor attributes onto every chunk at ingest; relationship-based access control handles this more naturally. Cross-tenant aggregation tools (analytics, billing) legitimately read across tenants and can't be authorized at the tool boundary alone. And for high-value or regulated tenants, a dedicated index, embedding model, and inference endpoint eliminate the gap by construction. The trade-off is cost versus betting compliance on policy code that runs alongside an LLM.
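The ancestor expansion mentioned above might look like the following sketch. The folder tree, group names, and the `allowed_groups` field are hypothetical, and the inheritance rule (a group granted on a folder can read everything beneath it) is an assumption made for the example.

```python
# Hypothetical folder hierarchy: child -> parent (None marks the root).
FOLDER_PARENT = {
    "/eng/design/specs": "/eng/design",
    "/eng/design": "/eng",
    "/eng": None,
}

# Groups granted directly on each folder.
FOLDER_GROUPS = {
    "/eng": {"eng-all"},
    "/eng/design": {"design-team"},
    "/eng/design/specs": set(),
}

def expand_groups(folder):
    """Collect groups granted on the folder and on every ancestor."""
    groups = set()
    while folder is not None:
        groups |= FOLDER_GROUPS.get(folder, set())
        folder = FOLDER_PARENT.get(folder)
    return groups

def tag_chunk(text, tenant_id, folder):
    """Flatten inherited permissions onto the chunk at ingest time."""
    return {
        "text": text,
        "tenant_id": tenant_id,
        "allowed_groups": sorted(expand_groups(folder)),
    }

chunk = tag_chunk("spec draft", "tenant_a", "/eng/design/specs")
print(chunk["allowed_groups"])
```

Because the inherited groups are copied onto each chunk, a permission change on `/eng` would require re-tagging every chunk beneath it, which is the maintenance cost that makes relationship-based models attractive here.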
Retrieval practice — recall, don't peek
Question 1: The relevance-authorization gap exists because retrieval ranks by…
Question 2: The structural fix is to move authorization to…
Question 3: Two-tier filtering is needed because a pre-filter alone…
Question 4: For a tool reading a shared backing store, you must authorize…
Question 5 · spaced recall from Lesson 16: A Trojan Hippo memory payload activates when…