Alasdair Forsythe builds at ll.co. He's a Go performance specialist whose open-source portfolio targets real production gaps — not toy benchmarks. Every repo he ships answers a question that encoding/json, the standard library, or conventional NLP pipelines failed to answer correctly under load.
His flagship is tokenmonster (624★) — an ungreedy tokenizer that achieves 37.5% token reduction over standard BPE. But the portfolio extends across JSON decoding, branchless integer operations, uppercase-aware encoding, CPU RNG, search backends, and SLM evaluation. Each repo is a production-grade primitive for a specific throughput or correctness gap.
For EOSE Labs, the two highest-KCF repos form the Adelic Pouch Diamond: tokenmonster (throughput layer) × jsondec (safety layer). Together they seal the mefine-aks-server's API surface against the 5 structural gaps in Go's encoding/json — and push more signal through the PEMCLAU context window.
| REPO | DOMAIN | STARS | KCF | WHY IT MATTERS |
|---|---|---|---|---|
| tokenmonster | Ungreedy tokenizer | 624★ | 10 | 37.5% token reduction = PEMCLAU throughput — more corpus per context window |
| jsondec | JSON decoder | 1★ | 9 | 5 encoding/json gaps solved = mefine API safety (required fields, forbidden fields, size limits) |
| branchless | Int ops | 5★ | 8 | Zero branch miss = KCF scoring + γ₁ arithmetic without prediction penalties |
| capcode | Uppercase norm | 11★ | 7 | Lossless text encoding — preserves uppercase signals in PEMCLAU corpus |
| pansearch | Search backend | 1★ | 7 | PEMCLAU search architecture pattern — hierarchical index design |
| llm.c fork | LLM inference in C | — | 7 | LLM inference architecture — foundational pattern for fleet model scoring |
| rdrand | CPU RNG | — | 6 | Hardware entropy for SOSTLE wall tokens — non-predictable layer keys |
| slmqa | SLM benchmark | 2★ | 6 | Small model eval = fleet model scoring methodology |
| norm | Text normalization | 1★ | 5 | PEMCLAU ingest preprocessing — consistent token surface before embedding |
| buf | Buffer utils | — | 5 | Go buffer patterns — reduces allocations in high-throughput paths |
| shuffle | Shuffle | — | 4 | Deterministic shuffle — useful for corpus sampling in eval pipelines |
Together they form the diamond combination: faster + safer JSON + more PEMCLAU context. The throughput layer and the safety layer are complementary, not in tension. tokenmonster makes the system faster. jsondec makes it safer. Both make the fleet stronger.
ADELIC PRESSURE: tokenmonster operates at L6 (behavior — what gets encoded and transmitted). jsondec operates at L3 (expression — what structural form is valid). The pouch is sealed when both layers are active simultaneously in the mefine-aks-server adelic route layer.
- GAP 1: No required fields → our PEMCLAU endpoint silently accepts empty queries and returns garbage results. A query="" processes like a valid query.
- GAP 2: No presence detection → PATCH semantics broken in adelic endpoints. Can't distinguish "field not sent" from "field sent as zero value".
- GAP 3: No null vs absent → density=0 is ambiguous. Did the caller send density=0 (fluid state) or omit it (should default to 0.5)? encoding/json can't tell.
- GAP 4: No structural limits → malformed large payloads accepted. A 10MB JSON body is read before any validation runs. DoS vector.
- GAP 5: No forbidden fields → no protection against privileged field injection. encoding/json silently drops unknown JSON keys instead of rejecting them, so an injected privileged field sails through validation unnoticed.
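Gaps 2 and 3 are worth demonstrating concretely. A minimal sketch, assuming a hypothetical DensityReq body: a *float64 field recovers "was a value sent?" (closing part of Gap 2), but absent and explicit null still both decode to nil — exactly the residual ambiguity jsondec targets:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DensityReq is a hypothetical request body for illustration only.
// A *float64 lets us distinguish "value sent" (non-nil pointer) from
// "no value" (nil) — but absent and explicit null both land on nil.
type DensityReq struct {
	Density *float64 `json:"density"`
}

func describe(body string) string {
	var req DensityReq
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "error: " + err.Error()
	}
	if req.Density == nil {
		// Gap 3: absent and explicit null are indistinguishable here.
		return "absent or null"
	}
	return fmt.Sprintf("sent: %g", *req.Density)
}

func main() {
	fmt.Println(describe(`{}`))               // absent or null
	fmt.Println(describe(`{"density":null}`)) // absent or null
	fmt.Println(describe(`{"density":0}`))    // sent: 0
}
```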
Currently vulnerable endpoints:
/api/pemclau/query · /api/adelic/route · /api/plasma/fire · /api/send-legal-doc
This deploy adds safeDecode(w, r, &req) to all four endpoints, applying MaxBytesReader (64KB limit) and DisallowUnknownFields (forbidden-field detection) from the stdlib. The required-field check is layered on top via validateRequired(). Not a full jsondec, but it closes Gaps 1, 4, and 5 today.
// safeDecode — closes Gap 4 (size limits) + Gap 5 (forbidden fields)
// Uses actual ResponseWriter so MaxBytesReader signals 413 correctly
func safeDecode(w http.ResponseWriter, r *http.Request, v interface{}) error {
r.Body = http.MaxBytesReader(w, r.Body, maxAPIBodyBytes) // 64KB limit
dec := json.NewDecoder(r.Body)
dec.DisallowUnknownFields() // forbidden field detection
return dec.Decode(v)
}
// validateRequired — closes Gap 1 (required fields)
func validateRequired(fields map[string]string, required []string) error {
for _, f := range required {
if v, ok := fields[f]; !ok || v == "" {
return fmt.Errorf("required field missing or empty: %s", f)
}
}
return nil
}
The insight from jsondec is more valuable than the library itself. By encoding these patterns in stdlib Go, we get the same structural guarantees without breaking the no-external-deps constraint (go.mod = stdlib only). The mefine-aks-server binary stays lean and Alpine-compatible.
// TokenMonster pattern: compile once, reuse everywhere
// Inspired by RegisterDecoder — type-safe, concurrent-safe vocab
type PEMCLAUVocab struct {
vocab *tokenmonster.Vocab // compiled once at startup
modelID string // embedding model ID for alignment
maxTokens int // budget per query
}
// Pre-tokenize: count tokens BEFORE embedding call
func (v *PEMCLAUVocab) Budget(query string) (tokens int, truncated string) {
encoded := v.vocab.Tokenize(query)
if len(encoded) > v.maxTokens {
return v.maxTokens, v.vocab.Decode(encoded[:v.maxTokens])
}
return len(encoded), query
}
Mathematical impact: if the context window holds 4096 tokenmonster tokens and tokenmonster reduces token counts by 37.5% versus standard BPE, then 4096 / (1 - 0.375) ≈ 6554 BPE-equivalent tokens of text fit in the same window. That's ~2458 extra tokens of corpus per query, i.e. ~60% more corpus coverage per PEMCLAU call.
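That arithmetic can be sanity-checked in a few lines:

```go
package main

import (
	"fmt"
	"math"
)

// effectiveWindow converts a token budget under a reduced tokenizer back into
// standard-tokenizer terms: window / (1 - reduction).
func effectiveWindow(window int, reduction float64) int {
	return int(math.Round(float64(window) / (1 - reduction)))
}

func main() {
	window, reduction := 4096, 0.375
	eff := effectiveWindow(window, reduction)
	extra := eff - window
	fmt.Printf("effective=%d extra=%d gain=%.0f%%\n",
		eff, extra, 100*float64(extra)/float64(window))
	// prints: effective=6554 extra=2458 gain=60%
}
```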
γ₁ = 14.134725141734693 appears in comparisons throughout the fleet scoring code.
Every if kcf > threshold is a conditional branch, and a branch prediction miss costs
~10-20 CPU cycles. On scoring-heavy paths (KCF calculation per fleet event), these add up.
// Before: branch-heavy KCF clamping
if kcf > 10 { kcf = 10 }
if kcf < 1 { kcf = 1 }
// After: branchless equivalents
kcf = branchless.Min(kcf, 10)
kcf = branchless.Max(kcf, 1)
// Before: adelic pressure threshold
if adelicReynolds > 84.8 { flowState = "TURBULENT" }
// After: branchless comparison (84.8 = gamma1 * 6)
isTurbulent := branchless.GreaterThan(int(adelicReynolds*10), 848)
flowState = [2]string{"LAMINAR", "TURBULENT"}[isTurbulent]
Estimated improvement on scoring-heavy paths: 10-15%. The γ₁ arithmetic paths (Reynolds number, Strouhal number, φ-path blade selection) are the highest-frequency computation in the adelic router — and they're all branch-heavy today.
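The branchless.Min/Max calls above can be approximated in pure Go with sign-mask arithmetic. A minimal sketch, assuming int64 scores where x-y cannot overflow (true for KCF in [1, 10]); the actual branchless library may implement these differently:

```go
package main

import "fmt"

// bmin returns min(x, y) without a branch: when x < y, d is negative,
// d>>63 is an all-ones mask, and y + d == x.
func bmin(x, y int64) int64 {
	d := x - y
	return y + (d & (d >> 63))
}

// bmax returns max(x, y) by the same trick: when x < y, subtracting d
// from x lands back on y; otherwise the mask is zero and x survives.
func bmax(x, y int64) int64 {
	d := x - y
	return x - (d & (d >> 63))
}

// clampKCF matches the two-if version above, with no branches.
func clampKCF(kcf int64) int64 {
	return bmax(bmin(kcf, 10), 1)
}

func main() {
	for _, k := range []int64{-3, 0, 1, 7, 10, 42} {
		fmt.Println(k, "->", clampKCF(k))
	}
}
```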
The fleet's /api/live-graph endpoint returns the self-description of every route as a typed node: KCF score, fermentation school, boabixer domain, VSM system, next-hop links, and boons.
// GET /api/live-graph
{
"gamma1": 14.134725141734693,
"day": 97,
"schema": "live-graph-v1",
"nodes": [
{
"route": "/ssaf-v13-rebaseline",
"title": "SSAF V13 Rebaseline",
"kcf": 10,
"school": "LAB",
"boabixer": "ssaf-bonixer",
"vsm": "S3",
"links": ["/ssaf-v13-sub006", "/ssaf-v13-sub009"],
"boons": ["security", "bounty"]
}
]
}
AI agents query /api/live-graph, identify the highest-KCF node, navigate to it via the links
array, and continue until they've found what they need. The graph IS the fleet's immune memory — the map that rebuilds itself.
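The agent's first hop can be sketched against the live-graph-v1 schema above. A minimal sketch: the second node and its KCF are hypothetical, and in production the JSON body would come from GET /api/live-graph rather than a literal:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors the live-graph-v1 schema, trimmed to the fields traversal needs.
type Node struct {
	Route string   `json:"route"`
	KCF   int      `json:"kcf"`
	Links []string `json:"links"`
}

type Graph struct {
	Schema string `json:"schema"`
	Nodes  []Node `json:"nodes"`
}

// highestKCF returns the node an agent would navigate to first.
func highestKCF(g Graph) Node {
	best := g.Nodes[0]
	for _, n := range g.Nodes[1:] {
		if n.KCF > best.KCF {
			best = n
		}
	}
	return best
}

func main() {
	// Second node is hypothetical, for illustration.
	raw := `{"schema":"live-graph-v1","nodes":[
	  {"route":"/ssaf-v13-rebaseline","kcf":10,"links":["/ssaf-v13-sub006","/ssaf-v13-sub009"]},
	  {"route":"/ssaf-v13-sub006","kcf":7,"links":[]}]}`
	var g Graph
	if err := json.Unmarshal([]byte(raw), &g); err != nil {
		panic(err)
	}
	n := highestKCF(g)
	fmt.Println(n.Route, n.Links) // entry point, then follow links
}
```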
THE ROAST: KCF decays if you only admire the tool instead of using it. A 624★ tokenizer you don't integrate is worth exactly as much as a 0★ tokenizer you don't integrate. The KCF score is a claim about operational reality, not reading lists.