The Chunk That Wasn't Yours

No injection, no poisoning — just a retriever doing exactly its job. It ranks by relevance, and relevance has no idea who's asking. In a shared index, the best-scoring chunk for one tenant can belong to another.

Why this, for you: the previous lessons were about an attacker steering the agent. This one needs no attacker at all — the leak is a property of how retrieval scores. Vector similarity, BM25, and hybrid ranking know nothing about identity, so "most relevant chunk" and "chunk this tenant may see" are independent. Ungated, that gap leaked cross-tenant data in 98–100% of probes.

Retrieval ranks by similarity; authorization is a separate predicate. When tenants share an index, the highest-scoring document for tenant A may belong to tenant B. The embedding worked perfectly — it just had no view into policy. The failure is structural, not a model defect.

1 Relevance is not authorization

Similarity scoring answers "how close is this chunk to the query?" It never answers "may this requester see it?" Those are different questions, and a shared corpus makes the difference dangerous: the top-K results can include another tenant's documents that simply happened to score high.
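The sketch below uses a toy in-memory index to show that the ranking step never looks at identity. The tenant names, vectors, and the `top_k` helper are invented for illustration. A real vector DB uses approximate search, but the blind spot is the same: nothing in the scoring reads `tenant_id`.

```python
import math

# Toy shared index. Each chunk carries a tenant_id, but the scorer never reads it.
INDEX = [
    {"id": "a-1", "tenant_id": "acme", "vec": [0.9, 0.1, 0.0]},
    {"id": "a-2", "tenant_id": "acme", "vec": [0.2, 0.8, 0.1]},
    {"id": "b-1", "tenant_id": "globex", "vec": [1.0, 0.0, 0.0]},
    {"id": "b-2", "tenant_id": "globex", "vec": [0.95, 0.05, 0.0]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def top_k(query_vec, k):
    """Rank purely by similarity. Note there is no user or tenant parameter."""
    return sorted(INDEX, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)[:k]


# An acme user's query happens to sit closest to globex's chunks.
hits = top_k([1.0, 0.0, 0.0], k=2)
print([(h["id"], h["tenant_id"]) for h in hits])  # [('b-1', 'globex'), ('b-2', 'globex')]
```

Both slots go to globex even though acme's own `a-1` is a close third. The retriever did its job correctly, and its job simply does not include access control.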

"Most relevant chunk" and "chunk this tenant may see" are independent properties. Filtering the output in the application layer is too late — the vector DB already spent its top-K budget on forbidden chunks, so recall collapses.

2 Filter the search space, not the output

The fix moves authorization from "filter the output" to "filter the search space." It has three layers: tag every chunk with a tenant ID and ABAC attributes at ingest; pre-filter the candidate set by authorization before scoring, then post-filter the top-K to catch approximate-nearest-neighbour (ANN) paths that slip past the pre-filter; and run a single shared model, which is safe because only authorized chunks ever enter the prompt. Retrieval returns exactly one set:

```
# Both conditions must hold for a chunk to reach the model
{ d ∈ D : relevance(q, d) > θ ∧ P(u, d) = permit }
```
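The same set can be written directly as a filter. This minimal sketch assumes relevance scores are already computed and uses a hypothetical ABAC policy (same tenant and sufficient clearance) as a stand-in for P(u, d). `User`, `Chunk`, `permit`, and `retrieve` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    tenant_id: str
    clearance: int


@dataclass(frozen=True)
class Chunk:
    id: str
    tenant_id: str
    classification: int  # hypothetical ABAC attribute tagged at ingest


def permit(user: User, chunk: Chunk) -> bool:
    """Stand-in for P(u, d): same tenant and sufficient clearance (hypothetical)."""
    return chunk.tenant_id == user.tenant_id and chunk.classification <= user.clearance


def retrieve(scores: dict[str, float], corpus: list[Chunk], user: User, theta: float) -> list[Chunk]:
    """{ d in D : relevance(q, d) > theta and P(u, d) = permit }, in set-builder form."""
    return [d for d in corpus if scores[d.id] > theta and permit(user, d)]


corpus = [Chunk("a-1", "acme", 1), Chunk("a-2", "acme", 3), Chunk("b-1", "globex", 1)]
scores = {"a-1": 0.80, "a-2": 0.90, "b-1": 0.95}  # relevance(q, d), precomputed
alice = User(tenant_id="acme", clearance=2)
print([d.id for d in retrieve(scores, corpus, alice, theta=0.5)])  # ['a-1']
```

Written this way, the two conditions are independent checks. The layers described above decide where each one is evaluated: P moves into the index as a pre-filter rather than running after scoring.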

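```python
# Qdrant: pre-filter in the index, post-filter for defence-in-depth.
# `qdrant` is a QdrantClient; `collection`, `embed`, `q`, `user`, and `policy`
# are assumed to be defined elsewhere.
from qdrant_client.models import FieldCondition, Filter, MatchValue

results = qdrant.search(
    collection_name=collection,
    query_vector=embed(q),
    # Tier 1: ask the index to restrict the search to this tenant's points.
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value=user.tenant_id))]
    ),
    limit=10,
)
# Tier 2: re-check the full policy on each hit's payload (catches ANN bypass).
authorized = [r for r in results if policy.permit(user, r.payload)]
```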

Why both tiers? A pre-filter alone can be bypassed by ANN paths that traverse the index outside the metadata filter; a post-filter alone collapses recall. The pre-filter cuts the space before scoring; the post-filter is defence-in-depth on the result set.
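The two tiers can be mimicked in memory. In this sketch, tier 1 filters on `tenant_id` before ranking (the kind of condition an index payload filter typically expresses), and tier 2 re-runs a fuller, hypothetical policy that also checks group ACLs. A stray cross-tenant hit is appended by hand to stand in for a bypass. It is not a model of how any particular ANN index fails, and all names here are illustrative.

```python
def permit(user: dict, payload: dict) -> bool:
    """Hypothetical full policy: same tenant AND at least one shared ACL group."""
    return payload["tenant_id"] == user["tenant_id"] and bool(
        set(payload["groups"]) & set(user["groups"])
    )


def pre_filtered_search(index: list[dict], scores: dict, user: dict, limit: int) -> list[dict]:
    """Tier 1: restrict candidates to the caller's tenant, then rank by score."""
    candidates = [p for p in index if p["tenant_id"] == user["tenant_id"]]
    return sorted(candidates, key=lambda p: scores[p["id"]], reverse=True)[:limit]


def post_filter(results: list[dict], user: dict) -> list[dict]:
    """Tier 2: re-check the full policy on every hit before it reaches the prompt."""
    return [p for p in results if permit(user, p)]


index = [
    {"id": "a-1", "tenant_id": "acme", "groups": ["eng"]},
    {"id": "a-2", "tenant_id": "acme", "groups": ["hr"]},
    {"id": "b-1", "tenant_id": "globex", "groups": ["eng"]},
]
scores = {"a-1": 0.7, "a-2": 0.9, "b-1": 0.99}
user = {"tenant_id": "acme", "groups": ["eng"]}

results = pre_filtered_search(index, scores, user, limit=10)
print([p["id"] for p in results])  # ['a-2', 'a-1']
results.append(index[2])  # simulate a stray cross-tenant hit slipping past tier 1
print([p["id"] for p in post_filter(results, user)])  # ['a-1']
```

Tier 2 removes both the stray globex hit and the same-tenant HR chunk, which the coarse tenant filter alone would have let through.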

3 The gap is bigger than retrieval

The relevance-authorization gap is one of four failure modes. Once tools and multi-turn state enter, three more appear — and the same lesson holds: authorize the record, not just the call.

Failure mode	The fix
Tool-mediated disclosure (shared DB/S3/Slack)	Per-record authorization inside the tool, not just at the call
Context accumulation across turns	Scope multi-turn state by tenant, not by session
Client-side orchestration bypass	Run tools, state, and policy server-side, in one trust boundary
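Here is a minimal sketch of the first two rows. It assumes a hypothetical shared ticket table and an in-process state store (`TICKETS`, `search_tickets`, `STATE`, `remember`, and `recall` are all invented for illustration). The tool checks each returned record against the caller's tenant, and conversation state is keyed by tenant as well as session. It also assumes the caller's tenant comes from the server-side authenticated session, never from model output.

```python
# Hypothetical shared backing store: rows from several tenants in one table.
TICKETS = [
    {"id": 1, "tenant_id": "acme", "title": "VPN down"},
    {"id": 2, "tenant_id": "globex", "title": "Payroll export"},
    {"id": 3, "tenant_id": "acme", "title": "Laptop request"},
]


def search_tickets(caller_tenant: str, text: str) -> list[dict]:
    """Tool body: the call is allowed, but each returned record is authorized too.

    caller_tenant is assumed to come from the server-side authenticated session.
    """
    matches = [t for t in TICKETS if text.lower() in t["title"].lower()]
    return [t for t in matches if t["tenant_id"] == caller_tenant]


# Multi-turn state keyed by (tenant, session), not session alone, so a reused or
# guessed session id cannot surface another tenant's accumulated context.
STATE: dict[tuple[str, str], list[str]] = {}


def remember(tenant: str, session: str, note: str) -> None:
    STATE.setdefault((tenant, session), []).append(note)


def recall(tenant: str, session: str) -> list[str]:
    return list(STATE.get((tenant, session), []))


print([t["id"] for t in search_tickets("acme", "")])  # [1, 3]
print(search_tickets("acme", "payroll"))  # []
remember("acme", "s-42", "user asked about VPN")
print(recall("globex", "s-42"))  # []
```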

Gating doesn't just close leaks; it also improves retrieval. Precision@5 rose 2.2×, because the cross-tenant chunks that scored highest were never relevant to the asking tenant in the first place. The overhead is about 19 ms.
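Precision@5 is the fraction of the top five results that are relevant to the asker. The toy labels below are invented and do not reproduce the measurement above. They only show the mechanism: when cross-tenant chunks can never be relevant to the asker, removing them from the ranking frees top-5 slots for chunks that can be.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant to the asker."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


# Invented example: acme's query, with globex chunks interleaved near the top.
ungated = ["b-1", "b-2", "a-1", "b-3", "a-2", "a-3", "a-4", "a-9"]
relevant = {"a-1", "a-2", "a-3", "a-4"}  # only acme chunks can be relevant to acme

# In exact search, pre-filtering keeps acme's chunks in the same relative order,
# so the toy derives the gated ranking by dropping the globex ids.
gated = [d for d in ungated if d.startswith("a-")]

print(precision_at_k(ungated, relevant))  # 0.4
print(precision_at_k(gated, relevant))    # 0.8
```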

When per-tenant infrastructure beats the gate

ABAC tagging has limits. Hierarchical permissions (nested folders, group inheritance) force you to expand ancestor attributes onto every chunk at ingest; relationship-based access control handles this more naturally. Cross-tenant aggregation tools (analytics, billing) legitimately read across tenants and can't be authorized at the tool boundary alone. And for high-value or regulated tenants, a dedicated index, embedding model, and inference endpoint eliminates the gap by construction — the trade-off is cost versus betting compliance on policy code that runs alongside an LLM.
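The ancestor-expansion workaround for hierarchical permissions looks roughly like this. It is a sketch under stated assumptions: the folder tree, the grant model, and the `tag_chunk` and `can_read` helpers are hypothetical. Every chunk stores its full ancestor chain at ingest, so a grant on any ancestor matches through a flat set intersection, which is the kind of condition a metadata pre-filter can express.

```python
# Hypothetical folder tree for one tenant: child -> parent (None at the root).
PARENT = {"root": None, "eng": "root", "eng/infra": "eng", "hr": "root"}


def ancestors(folder: str) -> list[str]:
    """The folder itself plus every ancestor, nearest first."""
    chain = []
    node = folder
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def tag_chunk(chunk_id: str, tenant_id: str, folder: str) -> dict:
    """At ingest, copy the whole ancestor chain onto the chunk's metadata."""
    return {"id": chunk_id, "tenant_id": tenant_id, "folders": ancestors(folder)}


def can_read(user: dict, chunk: dict) -> bool:
    """Same tenant, and a grant on the chunk's folder or any of its ancestors."""
    return user["tenant_id"] == chunk["tenant_id"] and bool(
        set(user["grants"]) & set(chunk["folders"])
    )


chunk = tag_chunk("c-1", "acme", "eng/infra")
print(chunk["folders"])  # ['eng/infra', 'eng', 'root']
print(can_read({"tenant_id": "acme", "grants": ["eng"]}, chunk))  # True
print(can_read({"tenant_id": "acme", "grants": ["hr"]}, chunk))   # False
```

The cost shows up on change: moving or re-parenting a folder means re-tagging every chunk beneath it. That maintenance burden is what pushes deeply hierarchical cases toward relationship-based access control.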

↪ Your win: gate retrieval by policy, at the index

Treat relevance and authorization as separate predicates — compose them at the retrieval boundary.
Filter the search space, not the output — application-layer filtering collapses recall and leaks.
Use two tiers — pre-filter ABAC in the index, post-filter the top-K for ANN bypass.
Authorize the record, not the call — tools over a shared store need per-record checks.
Move state and policy server-side — a client-driven loop puts enforcement on the wrong side.

Retrieval practice — recall, don't peek

Question 1: The relevance-authorization gap exists because retrieval ranks by…

Question 2: The structural fix is to move authorization to…

Question 3: Two-tier filtering is needed because a pre-filter alone…

Question 4: For a tool reading a shared backing store, you must authorize…

Question 5 · spaced recall from Lesson 16: A Trojan Hippo memory payload activates when…

Ask me anything. Want to wire an ABAC pre-filter into your vector store, or decide when per-tenant infrastructure beats a shared gated index? Next — and last before the capstone — opens Part 7, The Bill Is the Attack: resource and cost exhaustion as its own threat.