LAW 01
The Index Is Regulated Data Too
Regulation covers more than the source file. The search index, embeddings, cached snippets, highlights, thumbnails, OCR output, metadata, and query logs may all become regulated data. Protect every layer, not just the original asset.
LAW 02
Policy Before Retrieval
Never index before classification. Never search before authorization. Never return before masking. Never forget to audit. These are absolute. There are no exceptions.
LAW 03
Search = Data-Access Event
Every search produces an audit event. Pipeline: query → identity → policy → purpose → scope → filtered index → masked result → audit event → lineage record. This is the full pipeline, always.
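A minimal sketch of the pipeline above, assuming an in-memory index of already-projected documents. The function name `run_search` and the field names (`allowed_purposes`, `domains`, `projected_text`) are hypothetical, and the lineage record is not modeled. The point it illustrates is that the audit event is written on every path, including denials.

```python
def run_search(query, user, purpose, index, audit_log):
    """Run one search through the stages in order; always append an audit event."""
    event = {"user": user["id"], "purpose": purpose,
             "decision": "deny", "returned_ids": []}
    try:
        # identity -> policy -> purpose: stop before touching the index
        if purpose not in user["allowed_purposes"]:
            return []
        # scope -> filtered index: only documents in the user's access domains
        scoped = [d for d in index if d["domain"] in user["domains"]]
        hits = [d for d in scoped if query.lower() in d["projected_text"].lower()]
        # masked result: expose only fields that are safe to return
        results = [{"id": d["id"], "title": d["title"]} for d in hits]
        event["decision"] = "allow"
        event["returned_ids"] = [r["id"] for r in results]
        return results
    finally:
        # audit event: emitted whether the search was allowed, denied, or failed
        audit_log.append(event)
```

A denied search returns an empty list here and still leaves one entry in `audit_log`.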
LAW 04
Regulated Search Is Set Filtering
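R = Q ∩ A(u) ∩ P(p) ∩ J(j), where Q is the set of query matches, A(u) the set user u is authorized to see, P(p) the set allowed for purpose p, and J(j) the set allowed in jurisdiction j. Visible results are the intersection of all four. Rank ONLY after this intersection. Never rank then filter.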
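The intersection can be sketched with plain Python sets. The `Doc` fields, the substring match, and the term-frequency ranking are assumptions chosen for illustration; only the ordering (intersect first, rank second) comes from the law itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_roles: frozenset
    allowed_purposes: frozenset
    allowed_jurisdictions: frozenset


def regulated_search(docs, query, role, purpose, jurisdiction):
    """Return visible doc ids: R = Q & A(u) & P(p) & J(j), ranked afterwards."""
    needle = query.lower()
    q = {d.doc_id for d in docs if needle in d.text.lower()}
    a = {d.doc_id for d in docs if role in d.allowed_roles}
    p = {d.doc_id for d in docs if purpose in d.allowed_purposes}
    j = {d.doc_id for d in docs if jurisdiction in d.allowed_jurisdictions}
    visible = q & a & p & j
    by_id = {d.doc_id: d for d in docs}
    # Rank only the survivors: term frequency descending, then id for stability.
    return sorted(visible, key=lambda i: (-by_id[i].text.lower().count(needle), i))
```

Because scoring only ever sees `visible`, a document the user cannot access never influences the ordering, however well it matches.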
LAW 05
Never Index Raw — Always Project
Source asset NEVER goes directly into the index. Compute a SearchPolicy projection first. Projection hash = H(asset_hash + schema_hash + rules_version). The projection is what gets indexed.
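One way the projection hash could be computed, as a sketch. It assumes H is SHA-256 and reads "+" as concatenation; the unit-separator byte between fields is an added assumption so that ("ab", "c") and ("a", "bc") cannot collide. The full corpus may specify something different.

```python
import hashlib


def projection_hash(asset_hash: str, schema_hash: str, rules_version: str) -> str:
    """H(asset_hash + schema_hash + rules_version), with H assumed to be SHA-256."""
    # Assumption: fields are joined with a separator so field boundaries stay unambiguous.
    material = "\x1f".join([asset_hash, schema_hash, rules_version])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

A change to any input, including a bump of `rules_version` alone, yields a different hash, which is what would let stale projections be detected and re-indexed.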
LAW 06
Snippets Are Mini-Disclosures
Mask/redact text BEFORE generating snippets. Never generate snippet from raw text then mask after. For high-risk data: no snippets by default. Metadata-only results until explicit authorization.
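A small sketch of mask-then-snippet ordering. The SSN-style pattern stands in for whatever redaction rules actually apply, and `make_snippet` and its parameters are hypothetical names. Because the window is cut from the masked text, a snippet boundary can never expose part of a redacted value.

```python
import re

# Illustrative pattern only; real redaction rules would be far broader.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask(text: str) -> str:
    return SSN_RE.sub("[REDACTED]", text)


def make_snippet(raw_text, query, width=30, high_risk=False, authorized=False):
    """Return a snippet cut from masked text, or None for metadata-only results."""
    if high_risk and not authorized:
        return None  # high-risk data: no snippet without explicit authorization
    masked = mask(raw_text)  # mask first, so the window below never sees raw text
    pos = masked.lower().find(query.lower())
    if pos < 0:
        return masked[: 2 * width]
    start = max(0, pos - width)
    end = min(len(masked), pos + len(query) + width)
    return masked[start:end]
```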
LAW 07
Query Logs Are Regulated Data
A query can itself be sensitive: "patient HIV status" / "fraud investigation CFO" / "child custody abuse evidence". Log query hash for analytics. Raw query only in security audit zone. Search telemetry can be more sensitive than search results.
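The split between analytics and audit logging might look like the sketch below. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, on the assumption that short queries are easy to guess by dictionary attack; the key handling, log shapes, and function names are all hypothetical.

```python
import hashlib
import hmac
import time


def query_fingerprint(query: str, key: bytes) -> str:
    """Keyed hash of the normalized query, suitable for analytics."""
    normalized = " ".join(query.lower().split())
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def log_search(query, user_id, key, analytics_log, audit_log):
    fp = query_fingerprint(query, key)
    # Analytics zone: hash only, no raw text and no user identity.
    analytics_log.append({"query_hash": fp, "ts": time.time()})
    # Security audit zone: raw query kept, under stricter access control.
    audit_log.append({"query_hash": fp, "raw_query": query,
                      "user_id": user_id, "ts": time.time()})
    return fp
```

Analytics can still count repeated queries through the shared `query_hash` without ever holding the query text.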
LAW 08
Inverted Index Risk
A normal inverted index leaks regulated terms, because the term dictionary itself lists them. Mitigations: do not index sensitive tokens / redact before indexing / tokenize (replace sensitive values with surrogate tokens) before indexing / encrypt index fields / partition indexes by access domain / apply document-level security before results.
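As an illustration of the first mitigation only (do not index sensitive tokens), here is a toy inverted index builder that drops blocked terms before they reach the term dictionary. The tokenizer and the `sensitive_terms` list are assumptions; the other mitigations are not shown.

```python
import re
from collections import defaultdict


def build_inverted_index(docs, sensitive_terms):
    """Map token -> set of doc ids, skipping any token on the sensitive list."""
    blocked = {t.lower() for t in sensitive_terms}
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in blocked:
                index[token].add(doc_id)
    return dict(index)
```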
LAW 09
Existence Leakage
Even returning 0 results can leak. Facet count "3 hidden matches" leaks existence. Policy before scoring. Policy before facets. Policy before counts. Never reveal existence of a record the user cannot access.
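A sketch of "policy before facets, policy before counts": the access check runs before anything is counted, so hidden documents contribute to neither the facet buckets nor the total. The document shape and the two predicate arguments are hypothetical.

```python
from collections import Counter


def visible_facets(docs, matches_query, can_access, facet_field):
    """Count facets and totals over accessible matches only."""
    # Policy is applied first; inaccessible documents are never scored or counted.
    visible = [d for d in docs if can_access(d) and matches_query(d)]
    facets = Counter(d[facet_field] for d in visible)
    return facets, len(visible)
```

The caller gets no "hidden matches" figure at all, since no such number is ever computed.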
LAW 10
Vector Search Is Regulated Too
Embeddings may encode sensitive facts. Chunks may contain PII. RAG retrieval is data access. Prompt context is data disclosure. LLM output is derived data. The model cannot leak what it never receives. Retrieval policy is the main safety boundary.
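To make "the model cannot leak what it never receives" concrete, this sketch filters chunks by access domain before similarity ranking, so restricted chunks never reach the prompt context. The chunk fields, the `user_domains` set, and the plain cosine similarity are illustrative assumptions, not a description of any particular vector store.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_context(query_vec, chunks, user_domains, k=2):
    """Return the top-k chunk texts among chunks the user may access."""
    # Retrieval policy first: restricted chunks are removed before any scoring.
    allowed = [c for c in chunks if c["domain"] in user_domains]
    ranked = sorted(allowed, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]
```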
LAW 11
Silence Is First-Class Output
When L7 triggers: return structured refusal, not a 404, not an error. Refusal includes: what was asked (hashed), which layer triggered, why, what alternate scope might be authorized. Refusal IS the answer.
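A structured refusal could be modeled as a small record like the one below. The field names, the `refuse` helper, and the keyed hash are assumptions; the four pieces of content (hashed query, triggering layer, reason, alternate scope) come from the law above.

```python
import hashlib
import hmac
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Refusal:
    query_hash: str          # what was asked, hashed
    triggered_layer: str     # which layer triggered
    reason: str              # why
    alternate_scopes: tuple = ()  # scopes that might be authorized instead
    status: str = "refused"


def refuse(query, key, layer, reason, alternate_scopes=()):
    """Build a refusal that never carries the raw query text."""
    digest = hmac.new(key, query.encode("utf-8"), hashlib.sha256).hexdigest()
    return Refusal(digest, layer, reason, tuple(alternate_scopes))
```

`asdict(refuse(...))` gives a plain dictionary that can be returned as the response body in place of a 404 or an error.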
LAW 12
Transitive Deletion
Deleting a source asset triggers DeleteClosure(asset): source → OCR → projection → inverted index → vector index → snippet cache → autocomplete → LLM summary cache → query result cache. All derived artifacts. No orphans.
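DeleteClosure can be read as a transitive closure over a lineage graph. The sketch below assumes lineage is available as a mapping from each artifact to the artifacts derived from it; that mapping, the `purge` helper, and the dictionary-backed store are hypothetical.

```python
from collections import deque


def delete_closure(asset_id, children):
    """Return the asset plus every artifact transitively derived from it."""
    order = [asset_id]
    visited = {asset_id}
    queue = deque([asset_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in visited:  # an artifact may have several parents
                visited.add(child)
                order.append(child)
                queue.append(child)
    return order


def purge(asset_id, children, store):
    """Remove the whole closure from the store; return what was targeted."""
    closure = delete_closure(asset_id, children)
    for artifact in closure:
        store.pop(artifact, None)
    return closure
```

Artifacts reachable through more than one path, such as a query result cache fed by both snippets and summaries, appear once in the closure.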
LAW 13
Two-Index Pattern
Index A: safe metadata (broader access, C1–C2 queries). Index B: regulated full-text/chunks (narrow access, C3–C5 only). PEMCLAU = Index B. Merostone lattice = Index A. Separation reduces blast radius.
LAW 14
Break-Glass Search
Emergency access: requires stated reason + elevated role + time-limit + post-review commitment. Break-glass audit = full (raw query logged). Creates sorry-flow entry. The gap must be closed or justified. Not admin bypass — controlled exception.
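The four preconditions can be sketched as a grant that is refused unless all of them hold. The role name `break_glass`, the 60-minute ceiling, and the log fields are assumptions for illustration; the sorry-flow entry is represented here only by a `review_required` flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BreakGlassGrant:
    user_id: str
    reason: str
    expires_at: datetime


def request_break_glass(user_id, roles, reason, minutes, now, audit_log, max_minutes=60):
    """Issue a time-limited grant only with an elevated role and a stated reason."""
    if "break_glass" not in roles:
        audit_log.append({"event": "break_glass_denied", "user": user_id, "at": now})
        raise PermissionError("elevated role required")
    if not reason.strip():
        raise ValueError("a stated reason is required")
    if not 0 < minutes <= max_minutes:
        raise ValueError("time limit out of range")
    grant = BreakGlassGrant(user_id, reason.strip(), now + timedelta(minutes=minutes))
    audit_log.append({"event": "break_glass_granted", "user": user_id,
                      "reason": grant.reason, "expires_at": grant.expires_at,
                      "review_required": True})  # post-review commitment
    return grant


def is_active(grant, now):
    return now < grant.expires_at
```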
LAW 15
Autocomplete Is Regulated
Autocomplete should not learn or surface restricted terms for a user who is not authorized to see them. Typing "jo" should not suggest "John Smith HIV result." Separate autocomplete indexes by classification. Filter suggestions by user policy.
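A sketch of per-classification suggestion indexes filtered by user policy. The class labels, the `user_clearances` set, and the prefix matching are illustrative assumptions.

```python
def suggest(prefix, suggestions_by_class, user_clearances, limit=5):
    """Return suggestions only from classification indexes the user may read."""
    needle = prefix.lower()
    found = set()
    for classification, terms in suggestions_by_class.items():
        if classification not in user_clearances:
            continue  # this index is never consulted for this user
        for term in terms:
            if term.lower().startswith(needle):
                found.add(term)
    return sorted(found, key=str.lower)[:limit]
```

With a restricted index holding "John Smith HIV result", a user without that clearance who types "jo" gets only terms from the indexes they are cleared for.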
LAW 16–35
The Complete 35-Law Corpus
Full corpus in: TRB-MEEK-SEARCH-BOXINER-001 · ARB1-MEEK-SEARCH-BOXINER-001 · arch/merchant-spear/ · fleet-wiki :9400. All 35 laws + 12 ARB decisions + Perl/Ruby mode spec + projection hash protocol + sovereign Protobuf search envelope.