ALASDAIR FORSYTHE · GO PERFORMANCE CORPUS · KCF V13 · ADELIC POUCH DIAMOND · γ₁=14.134725141734693
EOSE LABS · V13 KCF · DAY 97
Who Is Alasdair Forsythe?

Alasdair Forsythe builds at ll.co. He's a Go performance specialist whose open-source portfolio targets real production gaps — not toy benchmarks. Every repo he ships answers a question that encoding/json, the standard library, or conventional NLP pipelines failed to answer correctly under load.

His flagship is tokenmonster (624★) — an ungreedy tokenizer that achieves 37.5% token reduction over standard BPE. But the corpus extends across JSON decoding, branchless integer operations, uppercase-aware encoding, CPU RNG, search backends, and SLM evaluation. Each repo is a production-grade primitive for a specific throughput or correctness gap.

For EOSE Labs, the two highest-KCF repos form the Adelic Pouch Diamond: tokenmonster (throughput layer) × jsondec (safety layer). Together they seal the mefine-aks-server's API surface against the 5 structural gaps in Go's encoding/json — and push more signal through the PEMCLAU context window.

All 11 Repos — V13 KCF Scores
REPO · DOMAIN · STARS · KCF · WHY IT MATTERS
tokenmonster · Ungreedy tokenizer · 624★ · 10 · 37.5% token reduction = PEMCLAU throughput — more corpus per context window
jsondec · JSON decoder · 1★ · 9 · 5 encoding/json gaps solved = mefine API safety (required fields, forbidden fields, size limits)
branchless · Int ops · 5★ · 8 · Zero branch miss = KCF scoring + γ₁ arithmetic without prediction penalties
capcode · Uppercase norm · 11★ · 7 · Lossless text encoding — preserves uppercase signals in PEMCLAU corpus
pansearch · Search backend · 1★ · 7 · PEMCLAU search architecture pattern — hierarchical index design
llm.c fork · LLM inference (C++) · — · 7 · LLM inference architecture — foundational pattern for fleet model scoring
rdrand · CPU RNG · — · 6 · Hardware entropy for SOSTLE wall tokens — non-predictable layer keys
slmqa · SLM benchmark · 2★ · 6 · Small model eval = fleet model scoring methodology
norm · Text normalization · 1★ · 5 · PEMCLAU ingest preprocessing — consistent token surface before embedding
buf · Buffer utils · — · 5 · Go buffer patterns — reduces allocations in high-throughput paths
shuffle · Shuffle · — · 4 · Deterministic shuffle — useful for corpus sampling in eval pipelines
tokenmonster × jsondec
TOKENMONSTER — THROUGHPUT LAYER
37.5% token reduction per query. More corpus signal per context window. Operates at L6 (behavior layer) in the adelic pouch model — it shapes what gets transmitted, not what gets blocked.
JSONDEC — SAFETY LAYER
Required fields. Forbidden fields. Structural limits. Operates at L3 (expression/validation layer) in the adelic pouch — it controls what can enter the system at all.

Together they form the diamond combination: faster + safer JSON + more PEMCLAU context. The throughput layer and the safety layer are complementary, not in tension. tokenmonster makes the system faster. jsondec makes it safer. Both make the fleet stronger.

ADELIC PRESSURE: tokenmonster operates at L6 (behavior — what gets encoded and transmitted). jsondec operates at L3 (expression — what structural form is valid). The pouch is sealed when both layers are active simultaneously in the mefine-aks-server adelic route layer.

What jsondec Fixes — And Our Current Exposure

Currently vulnerable endpoints: /api/pemclau/query · /api/adelic/route · /api/plasma/fire · /api/send-legal-doc

This deploy adds safeDecode(w, r, &req) to all four endpoints — applying MaxBytesReader (64KB limit) and DisallowUnknownFields (forbidden field detection) from the stdlib. The required field check is layered on top via validateRequired(). It is not a full jsondec integration, but it closes Gaps 4 and 5 today.

Stdlib-Only jsondec Patterns
// safeDecode — closes Gap 4 (size limits) + Gap 5 (forbidden fields)
// Uses actual ResponseWriter so MaxBytesReader signals 413 correctly
func safeDecode(w http.ResponseWriter, r *http.Request, v interface{}) error {
    r.Body = http.MaxBytesReader(w, r.Body, maxAPIBodyBytes) // 64KB limit
    dec := json.NewDecoder(r.Body)
    dec.DisallowUnknownFields()  // forbidden field detection
    return dec.Decode(v)
}

// validateRequired — closes Gap 1 (required fields)
func validateRequired(fields map[string]string, required []string) error {
    for _, f := range required {
        if v, ok := fields[f]; !ok || v == "" {
            return fmt.Errorf("required field missing or empty: %s", f)
        }
    }
    return nil
}

The insight from jsondec is more valuable than the library itself. By encoding these patterns in stdlib Go, we get the same structural guarantees without breaking the no-external-deps constraint (go.mod = stdlib only). The mefine-aks-server binary stays lean and Alpine-compatible.

37.5% More Signal Per Context Window
CURRENT STATE
nomic-embed-text tokenizes the corpus using its own internal BPE tokenizer. No pre-processing. No vocabulary alignment between ingest and query. Token budget: 4096 tokens = 4096 tokens used.
TOKENMONSTER PATTERN
Compile vocabulary once → reuse everywhere. RegisterDecoder-style initialization. Apply consistent tokenization for embed + query alignment. Token budget: 4096 tokens → effectively ~6554 tokens of corpus coverage.
// TokenMonster pattern: compile once, reuse everywhere
// Inspired by RegisterDecoder — type-safe, concurrent-safe vocab
type PEMCLAUVocab struct {
    vocab     *tokenmonster.Vocab  // compiled once at startup
    modelID   string               // embedding model ID for alignment
    maxTokens int                  // budget per query
}

// Pre-tokenize: count tokens BEFORE embedding call
func (v *PEMCLAUVocab) Budget(query string) (tokens int, truncated string) {
    encoded := v.vocab.Tokenize(query)
    if len(encoded) > v.maxTokens {
        return v.maxTokens, v.vocab.Decode(encoded[:v.maxTokens])
    }
    return len(encoded), query
}
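The budget logic itself can be sketched without the tokenmonster dependency, using a trivial whitespace tokenizer as a stand-in for the compiled vocab. This is a shape-of-the-pattern sketch only: the budget function and the Fields/Join calls are placeholders for vocab.Tokenize and vocab.Decode, not the real API.

```go
package main

import (
	"fmt"
	"strings"
)

// budget mirrors the Budget method above, with a whitespace
// tokenizer standing in for the compiled tokenmonster vocab.
func budget(query string, maxTokens int) (tokens int, truncated string) {
	encoded := strings.Fields(query) // stand-in for vocab.Tokenize
	if len(encoded) > maxTokens {
		// stand-in for vocab.Decode over the truncated token slice
		return maxTokens, strings.Join(encoded[:maxTokens], " ")
	}
	return len(encoded), query
}

func main() {
	n, q := budget("adelic pouch diamond seals the fleet", 4)
	fmt.Println(n, q) // 4 adelic pouch diamond seals
}
```

The key property is the same either way: the token count is known before the embedding call, so over-budget queries are truncated at a token boundary instead of failing downstream.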

Mathematical impact: If context window = 4096 tokens and tokenmonster reduces tokens by 37.5%, then 4096 / (1 - 0.375) ≈ 6554 original tokens fit in the same window. That's ~2458 extra tokens of corpus per query = 60% more corpus coverage per PEMCLAU call.

Zero Branch Prediction Miss on KCF Scoring

γ₁ = 14.134725141734693 appears in comparisons throughout the fleet scoring code. Every if kcf > threshold is a conditional branch — a branch prediction miss is ~10-20 CPU cycles. On scoring-heavy paths (KCF calculation per fleet event), these add up.

// Before: branch-heavy KCF clamping
if kcf > 10 { kcf = 10 }
if kcf < 1  { kcf = 1  }

// After: branchless equivalents
kcf = branchless.Min(kcf, 10)
kcf = branchless.Max(kcf, 1)

// Before: adelic pressure threshold
if adelicReynolds > 84.8 { flowState = "TURBULENT" }

// After: branchless comparison (84.8 = gamma1 * 6)
isTurbulent := branchless.GreaterThan(int(adelicReynolds*10), 848)
flowState = [2]string{"LAMINAR", "TURBULENT"}[isTurbulent]

Estimated improvement on scoring-heavy paths: 10-15%. The γ₁ arithmetic paths (Reynolds number, Strouhal number, φ-path blade selection) are the highest-frequency computation in the adelic router — and they're all branch-heavy today.
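For reference, the KCF clamp can also be written branch-free in pure Go with sign-bit arithmetic, with no library at all. A sketch under stated assumptions: it is valid only while a-b cannot overflow int64 (fine for KCF's 1..10 range), and on simple comparisons like these the Go compiler often emits conditional moves for the if form anyway, so measure before committing.

```go
package main

import "fmt"

// branchlessMin/Max use the sign bit of a-b as a mask:
// (a-b)>>63 is all ones when a < b, all zeros otherwise.
// Valid while a-b does not overflow int64.
func branchlessMin(a, b int64) int64 {
	d := a - b
	return b + (d & (d >> 63))
}

func branchlessMax(a, b int64) int64 {
	d := a - b
	return a - (d & (d >> 63))
}

// clampKCF pins a raw score into the 1..10 KCF range
// without any conditional branch.
func clampKCF(kcf int64) int64 {
	return branchlessMax(branchlessMin(kcf, 10), 1)
}

func main() {
	fmt.Println(clampKCF(14), clampKCF(-3), clampKCF(7)) // 10 1 7
}
```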

Every Route = A Node · AI-Navigable Fleet Map

The fleet's /api/live-graph endpoint returns the self-description of every route as a typed node: KCF score, fermentation school, boabixer domain, VSM system, next-hop links, and boons.

// GET /api/live-graph
{
  "gamma1": 14.134725141734693,
  "day": 97,
  "schema": "live-graph-v1",
  "nodes": [
    {
      "route": "/ssaf-v13-rebaseline",
      "title": "SSAF V13 Rebaseline",
      "kcf": 10,
      "school": "LAB",
      "boabixer": "ssaf-bonixer",
      "vsm": "S3",
      "links": ["/ssaf-v13-sub006", "/ssaf-v13-sub009"],
      "boons": ["security", "bounty"]
    }
  ]
}

AI agents query /api/live-graph, identify the highest-KCF node, navigate to it via the links array, and continue until they've found what they need. The graph IS the fleet's immune memory — the map that rebuilds itself.
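The first step of that navigation loop can be sketched against the live-graph-v1 schema. The Node/Graph types and the highestKCF helper are illustrative names, and the raw JSON is inlined here; in practice an agent would fetch it with an http.Get on /api/live-graph.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node and Graph mirror the live-graph-v1 fields an agent
// needs for navigation (other schema fields omitted).
type Node struct {
	Route string   `json:"route"`
	KCF   int      `json:"kcf"`
	Links []string `json:"links"`
}

type Graph struct {
	Nodes []Node `json:"nodes"`
}

// highestKCF returns the route of the highest-KCF node,
// i.e. the agent's first hop into the fleet map.
func highestKCF(g Graph) string {
	best, route := -1, ""
	for _, n := range g.Nodes {
		if n.KCF > best {
			best, route = n.KCF, n.Route
		}
	}
	return route
}

func main() {
	raw := `{"schema":"live-graph-v1","nodes":[
		{"route":"/ssaf-v13-sub006","kcf":7,"links":[]},
		{"route":"/ssaf-v13-rebaseline","kcf":10,"links":["/ssaf-v13-sub006","/ssaf-v13-sub009"]}]}`
	var g Graph
	if err := json.Unmarshal([]byte(raw), &g); err != nil {
		panic(err)
	}
	fmt.Println(highestKCF(g)) // /ssaf-v13-rebaseline
}
```

From there the agent recurses: follow each entry of the winning node's links array and repeat the same selection at every hop.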

KCF Is Not Static — It Decays If You Only Admire
TOKENMONSTER · ACTIVELY USED
KCF 10
Integrated into PEMCLAU pipeline. Vocabulary compiled at startup. Token budgets enforced per query.
TOKENMONSTER · JUST "WE SHOULD USE THIS"
KCF 7
Filed as raincheque. Acknowledged. Not deployed. Value potential exists but hasn't been realized.
JSONDEC · INTEGRATED INTO MEFINE
KCF 9
safeDecode() deployed to all 4 endpoints. MaxBytesReader + DisallowUnknownFields active in prod.
JSONDEC · JUST "WE KNOW ABOUT THIS"
KCF 5
Read the README. Knew it was better. Kept using json.NewDecoder() anyway. Structural exposure: active.

THE ROAST: KCF decays if you only admire the tool instead of using it. A 624★ tokenizer you don't integrate is worth exactly as much as a 0★ tokenizer you don't integrate. The KCF score is a claim about operational reality, not reading lists.