Alasdair Forsythe builds at ll.co. He's a Go performance specialist whose open-source portfolio targets real production gaps — not toy benchmarks. Every repo he ships answers a question that encoding/json, the standard library, or conventional NLP pipelines failed to answer correctly under load.
His flagship is tokenmonster (624★) — an ungreedy tokenizer that achieves 37.5% token reduction over standard BPE. But the portfolio extends across JSON decoding, branchless integer operations, uppercase-aware encoding, CPU RNG, search backends, and SLM evaluation. Each repo is a production-grade primitive for a specific throughput or correctness gap.
For EOSE Labs, the two highest-KCF repos form the Adelic Pouch Diamond: tokenmonster (throughput layer) × jsondec (safety layer). Together they seal the mefine-aks-server's API surface against the 5 structural gaps in Go's encoding/json — and push more signal through the PEMCLAU context window.
| REPO | DOMAIN | STARS | KCF | WHY IT MATTERS |
|---|---|---|---|---|
| tokenmonster | Ungreedy tokenizer | 624★ | 10 | 37.5% token reduction = PEMCLAU throughput — more corpus per context window |
| jsondec | JSON decoder | 1★ | 9 | 5 encoding/json gaps solved = mefine API safety (required fields, forbidden fields, size limits) |
| branchless | Int ops | 5★ | 8 | Zero branch miss = KCF scoring + γ₁ arithmetic without prediction penalties |
| capcode | Uppercase norm | 11★ | 7 | Lossless text encoding — preserves uppercase signals in PEMCLAU corpus |
| pansearch | Search backend | 1★ | 7 | PEMCLAU search architecture pattern — hierarchical index design |
| llm.c fork | LLM inference in C | — | 7 | LLM inference architecture — foundational pattern for fleet model scoring |
| rdrand | CPU RNG | — | 6 | Hardware entropy for SOSTLE wall tokens — non-predictable layer keys |
| slmqa | SLM benchmark | 2★ | 6 | Small model eval = fleet model scoring methodology |
| norm | Text normalization | 1★ | 5 | PEMCLAU ingest preprocessing — consistent token surface before embedding |
| buf | Buffer utils | — | 5 | Go buffer patterns — reduces allocations in high-throughput paths |
| shuffle | Shuffle | — | 4 | Deterministic shuffle — useful for corpus sampling in eval pipelines |
Together they form the diamond combination: faster + safer JSON + more PEMCLAU context. The throughput layer and the safety layer are complementary, not in tension. tokenmonster makes the system faster. jsondec makes it safer. Both make the fleet stronger.
ADELIC PRESSURE: tokenmonster operates at L6 (behavior — what gets encoded and transmitted). jsondec operates at L3 (expression — what structural form is valid). The pouch is sealed when both layers are active simultaneously in the mefine-aks-server adelic route layer.
- GAP 1: No required fields → our PEMCLAU endpoint silently accepts empty queries and returns garbage results. A query="" processes like a valid query.
- GAP 2: No presence detection → PATCH semantics broken in adelic endpoints. Can't distinguish "field not sent" from "field sent as zero value".
- GAP 3: No null vs absent → density=0 is ambiguous. Did the caller send density=0 (fluid state) or omit it (should default to 0.5)? encoding/json can't tell.
- GAP 4: No structural limits → malformed large payloads accepted. A 10MB JSON body is read before any validation runs. DoS vector.
- GAP 5: No forbidden fields → no protection against privileged field injection. encoding/json silently drops unknown JSON keys instead of rejecting them, so an injected privileged field sails through validation unnoticed.
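Gaps 2 and 3 are worth demonstrating concretely. A minimal sketch, assuming a hypothetical DensityReq body: a *float64 field recovers "was a value sent?" (closing part of Gap 2), but absent and explicit null still both decode to nil — exactly the residual ambiguity jsondec targets:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DensityReq is a hypothetical request body for illustration only.
// A *float64 lets us distinguish "value sent" (non-nil pointer) from
// "no value" (nil) — but absent and explicit null both land on nil.
type DensityReq struct {
	Density *float64 `json:"density"`
}

func describe(body string) string {
	var req DensityReq
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "error: " + err.Error()
	}
	if req.Density == nil {
		// Gap 3: absent and explicit null are indistinguishable here.
		return "absent or null"
	}
	return fmt.Sprintf("sent: %g", *req.Density)
}

func main() {
	fmt.Println(describe(`{}`))               // absent or null
	fmt.Println(describe(`{"density":null}`)) // absent or null
	fmt.Println(describe(`{"density":0}`))    // sent: 0
}
```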
Currently vulnerable endpoints:
/api/pemclau/query · /api/adelic/route · /api/plasma/fire · /api/send-legal-doc
This deploy adds safeDecode(w, r, &req) to all four endpoints, applying MaxBytesReader (64KB limit) and DisallowUnknownFields (forbidden-field detection) from the stdlib. The required-field check is layered on top via validateRequired(). Not a full jsondec, but it closes Gaps 1, 4, and 5 today.
// safeDecode — closes Gap 4 (size limits) + Gap 5 (forbidden fields)
// Uses actual ResponseWriter so MaxBytesReader signals 413 correctly
func safeDecode(w http.ResponseWriter, r *http.Request, v interface{}) error {
r.Body = http.MaxBytesReader(w, r.Body, maxAPIBodyBytes) // 64KB limit
dec := json.NewDecoder(r.Body)
dec.DisallowUnknownFields() // forbidden field detection
return dec.Decode(v)
}
// validateRequired — closes Gap 1 (required fields)
func validateRequired(fields map[string]string, required []string) error {
for _, f := range required {
if v, ok := fields[f]; !ok || v == "" {
return fmt.Errorf("required field missing or empty: %s", f)
}
}
return nil
}
The insight from jsondec is more valuable than the library itself. By encoding these patterns in stdlib Go, we get the same structural guarantees without breaking the no-external-deps constraint (go.mod = stdlib only). The mefine-aks-server binary stays lean and Alpine-compatible.
// TokenMonster pattern: compile once, reuse everywhere
// Inspired by RegisterDecoder — type-safe, concurrent-safe vocab
type PEMCLAUVocab struct {
vocab *tokenmonster.Vocab // compiled once at startup
modelID string // embedding model ID for alignment
maxTokens int // budget per query
}
// Pre-tokenize: count tokens BEFORE embedding call
func (v *PEMCLAUVocab) Budget(query string) (tokens int, truncated string) {
encoded := v.vocab.Tokenize(query)
if len(encoded) > v.maxTokens {
return v.maxTokens, v.vocab.Decode(encoded[:v.maxTokens])
}
return len(encoded), query
}
Mathematical impact: if the context window holds 4096 tokenmonster tokens and tokenmonster reduces token counts by 37.5% versus standard BPE, then 4096 / (1 - 0.375) ≈ 6554 BPE-equivalent tokens of text fit in the same window. That's ~2458 extra tokens of corpus per query, i.e. ~60% more corpus coverage per PEMCLAU call.
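That arithmetic can be sanity-checked in a few lines:

```go
package main

import (
	"fmt"
	"math"
)

// effectiveWindow converts a token budget under a reduced tokenizer back into
// standard-tokenizer terms: window / (1 - reduction).
func effectiveWindow(window int, reduction float64) int {
	return int(math.Round(float64(window) / (1 - reduction)))
}

func main() {
	window, reduction := 4096, 0.375
	eff := effectiveWindow(window, reduction)
	extra := eff - window
	fmt.Printf("effective=%d extra=%d gain=%.0f%%\n",
		eff, extra, 100*float64(extra)/float64(window))
	// prints: effective=6554 extra=2458 gain=60%
}
```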
γ₁ = 14.134725141734693 appears in comparisons throughout the fleet scoring code.
Every if kcf > threshold is a conditional branch, and a branch prediction miss costs
~10-20 CPU cycles. On scoring-heavy paths (KCF calculation per fleet event), these add up.
// Before: branch-heavy KCF clamping
if kcf > 10 { kcf = 10 }
if kcf < 1 { kcf = 1 }
// After: branchless equivalents
kcf = branchless.Min(kcf, 10)
kcf = branchless.Max(kcf, 1)
// Before: adelic pressure threshold
if adelicReynolds > 84.8 { flowState = "TURBULENT" }
// After: branchless comparison (84.8 = gamma1 * 6)
isTurbulent := branchless.GreaterThan(int(adelicReynolds*10), 848)
flowState = [2]string{"LAMINAR", "TURBULENT"}[isTurbulent]
Estimated improvement on scoring-heavy paths: 10-15%. The γ₁ arithmetic paths (Reynolds number, Strouhal number, φ-path blade selection) are the highest-frequency computation in the adelic router — and they're all branch-heavy today.
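The branchless.Min/Max calls above can be approximated in pure Go with sign-mask arithmetic. A minimal sketch, assuming int64 scores where x-y cannot overflow (true for KCF in [1, 10]); the actual branchless library may implement these differently:

```go
package main

import "fmt"

// bmin returns min(x, y) without a branch: when x < y, d is negative,
// d>>63 is an all-ones mask, and y + d == x.
func bmin(x, y int64) int64 {
	d := x - y
	return y + (d & (d >> 63))
}

// bmax returns max(x, y) by the same trick: when x < y, subtracting d
// from x lands back on y; otherwise the mask is zero and x survives.
func bmax(x, y int64) int64 {
	d := x - y
	return x - (d & (d >> 63))
}

// clampKCF matches the two-if version above, with no branches.
func clampKCF(kcf int64) int64 {
	return bmax(bmin(kcf, 10), 1)
}

func main() {
	for _, k := range []int64{-3, 0, 1, 7, 10, 42} {
		fmt.Println(k, "->", clampKCF(k))
	}
}
```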
The fleet's /api/live-graph endpoint returns the self-description of every route as a typed node: KCF score, fermentation school, boabixer domain, VSM system, next-hop links, and boons.
// GET /api/live-graph
{
"gamma1": 14.134725141734693,
"day": 97,
"schema": "live-graph-v1",
"nodes": [
{
"route": "/ssaf-v13-rebaseline",
"title": "SSAF V13 Rebaseline",
"kcf": 10,
"school": "LAB",
"boabixer": "ssaf-bonixer",
"vsm": "S3",
"links": ["/ssaf-v13-sub006", "/ssaf-v13-sub009"],
"boons": ["security", "bounty"]
}
]
}
AI agents query /api/live-graph, identify the highest-KCF node, navigate to it via the links
array, and continue until they've found what they need. The graph IS the fleet's immune memory — the map that rebuilds itself.
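The agent's first hop can be sketched against the live-graph-v1 schema above. A minimal sketch: the second node and its KCF are hypothetical, and in production the JSON body would come from GET /api/live-graph rather than a literal:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors the live-graph-v1 schema, trimmed to the fields traversal needs.
type Node struct {
	Route string   `json:"route"`
	KCF   int      `json:"kcf"`
	Links []string `json:"links"`
}

type Graph struct {
	Schema string `json:"schema"`
	Nodes  []Node `json:"nodes"`
}

// highestKCF returns the node an agent would navigate to first.
func highestKCF(g Graph) Node {
	best := g.Nodes[0]
	for _, n := range g.Nodes[1:] {
		if n.KCF > best.KCF {
			best = n
		}
	}
	return best
}

func main() {
	// Second node is hypothetical, for illustration.
	raw := `{"schema":"live-graph-v1","nodes":[
	  {"route":"/ssaf-v13-rebaseline","kcf":10,"links":["/ssaf-v13-sub006","/ssaf-v13-sub009"]},
	  {"route":"/ssaf-v13-sub006","kcf":7,"links":[]}]}`
	var g Graph
	if err := json.Unmarshal([]byte(raw), &g); err != nil {
		panic(err)
	}
	n := highestKCF(g)
	fmt.Println(n.Route, n.Links) // entry point, then follow links
}
```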
THE ROAST: KCF decays if you only admire the tool instead of using it. A 624★ tokenizer you don't integrate is worth exactly as much as a 0★ tokenizer you don't integrate. The KCF score is a claim about operational reality, not reading lists.