◎ NATS-GAP · Fleet Messaging Lattice
LABR · NATS as EOSE enterprise install bus + fleet event mesh
γ₁ = 14.134725141734693 · EVEN ═ · every node speaks · the lattice breathes
NATS · CNCF LABR · 2026-04-06 CJ · INSTALL BUS

The Gap: Fleet Messaging is HTTP Lanes. It Should Be a Lattice.

Today every EOSE silo has its own Redis campfire stream, its own mdsms-router with hardcoded HTTP lanes, and its own merostone. They're islands. Events don't flow between them — you poll each one separately. When we add CT (or any enterprise), we're manually wiring URLs. That doesn't scale.

NATS solves this. One 20MB binary. Leaf nodes bridge any cluster back to the EOSE hub. Subject-based routing means a single subscription to fleet.events.ct-fac.> catches everything from CT automatically. JetStream persists events like merostone does today, but across the whole fleet. The install bus becomes: nats pub fleet.install.ct-fac.request '{"v":"v622"}', and a subscriber in the CT cluster handles it end to end.
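To make the wildcard routing concrete, here is a minimal sketch of how NATS subject wildcards behave: `*` matches exactly one token and `>` matches one or more trailing tokens. The function name `subjectMatches` is illustrative only; the real matching happens inside nats-server, not in client code.

```javascript
// Sketch of NATS subject matching semantics (illustrative helper, not a NATS API).
// "*" matches exactly one token; ">" matches one or more trailing tokens
// and is only valid as the last token of the pattern.
function subjectMatches(pattern, subject) {
  const p = pattern.split('.');
  const s = subject.split('.');
  for (let i = 0; i < p.length; i++) {
    if (p[i] === '>') {
      // ">" must be last and needs at least one subject token left to consume
      return i === p.length - 1 && s.length > i;
    }
    if (i >= s.length) return false;                  // subject ran out of tokens
    if (p[i] !== '*' && p[i] !== s[i]) return false;  // literal token mismatch
  }
  return p.length === s.length;
}

console.log(subjectMatches('fleet.events.ct-fac.>', 'fleet.events.ct-fac.pod.ready')); // true
console.log(subjectMatches('fleet.events.*', 'fleet.events.ct-fac.boot'));             // false
```

The second call shows why stream and subscription filters for multi-token subjects need `>` rather than `*`.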

6 Formal Gaps
GAP-1 · No cross-silo event mesh
Today: Each silo has an isolated Redis campfire stream. No routing between them. msi01 can't see CT events.
With NATS: NATS hub on pemos.ca. Leaf node in each silo. fleet.events.> = all silos, one subscription.
Impact: CRITICAL · kills scale

GAP-2 · Install bus is curl | bash
Today: bootstrap.sh works, but it's pull, not push. Someone has to curl it. No acknowledgement. No retry. No audit trail.
With NATS: NATS subject fleet.install.{silo}.request. Bootstrap agent subscribes. Events flow. fleet.install.{silo}.done fires when complete.
Impact: CRITICAL · CJ blocker

GAP-3 · mdsms-router is hardcoded HTTP
Today: LANE1-6 env vars. Adding a new silo = edit deployment. No dynamic discovery. HTTP polling, not streaming.
With NATS: NATS replaces lanes. mdsms publishes to fleet.mdsms.{silo}.ingest. Subscribers pull. Zero config changes per silo.
Impact: HIGH · ops burden

GAP-4 · No fleet-wide state visibility
Today: fc-matrix polls each silo's /breath endpoint. Slow. Misses events between polls. No real-time fleet dashboard.
With NATS: NATS KV (key-value store) = fleet state. Every silo writes fleet.state.{silo}. Dashboard subscribes. Real-time.
Impact: HIGH · visibility gap

GAP-5 · Lean ATP coupler is TCP Redis direct
Today: lean-atp-coupler.js connects raw TCP to pemos-redis. Fragile. One Redis down = no Lean events.
With NATS: Lean proof state publishes to fleet.lean.proof.{file}.{status}. NATS handles persistence. Any subscriber gets it.
Impact: MEDIUM · resilience

GAP-6 · Enterprise onboarding is one-way
Today: We deploy to them. They can't push back. No channel for CT to request a deploy, report health, or trigger a scan.
With NATS: CT leaf node = bidirectional. CT agent can pub fleet.ct-fac.scan.request. EOSE handles it. Reply comes back.
Impact: MEDIUM · enterprise trust
Fleet Lattice Architecture · NATS Hub + Leaf Nodes
EOSE FLEET NATS LATTICE
     ┌─────────────────────────────────────────────────┐
     │ NATS HUB · pemos.ca · JetStream enabled         │
     │ nats.pemos.ca:4222                              │
     │ Streams: FLEET_EVENTS, FLEET_STATE, FLEET_MDSMS │
     └──────────────┬──────────────────────────────────┘
                    │ hub accepts leaf node connections
       ┌────────────┼────────────────┬────────────────┐
       │            │                │                │
┌──────▼──┐  ┌──────▼──┐      ┌──────▼──┐      ┌──────▼──┐
│ msi01   │  │ AKS     │      │ msclo   │      │ ct-fac  │
│ leaf    │  │ leaf    │      │ leaf    │      │ leaf    │
│ :4222   │  │ :4222   │      │ :4222   │      │ :4222   │
└────┬────┘  └────┬────┘      └────┬────┘      └────┬────┘
     │            │                │                │
local svcs   adelicpool       local svcs       eose-entry NS
pemos-portal GPU workloads    msclo-portal     merostone
merostone    ARC runner       leanpool         mdsms
campfire     ct-entry-test                     mrcp-agent

SUBJECT HIERARCHY:
  fleet.events.{silo}.{event}      → all fleet events
  fleet.install.{silo}.request     → trigger install/upgrade
  fleet.install.{silo}.done        → install complete ack
  fleet.state.{silo}               → live silo health (NATS KV)
  fleet.mdsms.{silo}.ingest        → replaces LANE1-6 in mdsms-router
  fleet.lean.proof.{file}.{step}   → Lean ATP events
  fleet.arc.{silo}.result          → ARC runner results
  fleet.campfire.{event_type}      → replaces campfire:events Redis stream
NATS JetStream Streams · Replacing Redis + HTTP
Stream          Subjects          Replaces                                     Retention
FLEET_EVENTS    fleet.events.>    campfire:events Redis stream on every silo   7 days, limits-based
FLEET_STATE     fleet.state.>     /breath HTTP polling                         KV store, last-value
FLEET_INSTALL   fleet.install.>   curl | bash bootstrap                        work queue, ack required
FLEET_MDSMS     fleet.mdsms.>     mdsms-router LANE1-6 HTTP                    24h interest-based
FLEET_LEAN      fleet.lean.>      lean-atp-coupler TCP Redis                   30 days, proof archive
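As a rough sketch, the stream rows above could be expressed as plain config objects. The field names (`name`, `subjects`, `retention`, `max_age`) follow the JetStream stream configuration, where `max_age` is given in nanoseconds and 0 means no age limit. The helper name `fleetStreamConfigs`, the lack of an age limit on FLEET_INSTALL, and limits-based retention for FLEET_LEAN are assumptions, since the table does not state them. FLEET_STATE is left out because it is a KV bucket rather than a hand-defined stream.

```javascript
// Sketch only: stream definitions derived from the table, as plain objects.
const NS_PER_SECOND = 1_000_000_000; // JetStream expresses max_age in nanoseconds
const HOUR = 3600 * NS_PER_SECOND;
const DAY = 24 * HOUR;

function fleetStreamConfigs() {
  return [
    { name: 'FLEET_EVENTS',  subjects: ['fleet.events.>'],  retention: 'limits',    max_age: 7 * DAY },
    // Assumption: no age limit on the work queue; messages leave when acked.
    { name: 'FLEET_INSTALL', subjects: ['fleet.install.>'], retention: 'workqueue', max_age: 0 },
    { name: 'FLEET_MDSMS',   subjects: ['fleet.mdsms.>'],   retention: 'interest',  max_age: 24 * HOUR },
    // Assumption: the 30-day proof archive uses limits-based retention.
    { name: 'FLEET_LEAN',    subjects: ['fleet.lean.>'],    retention: 'limits',    max_age: 30 * DAY },
  ];
}

for (const cfg of fleetStreamConfigs()) {
  console.log(cfg.name, cfg.subjects[0], cfg.retention, cfg.max_age);
}
```

With the nats.js client these objects would typically be handed to the JetStream manager when creating streams; that wiring is omitted here so the sketch runs without a server.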
Subject Namespace · fleet.* hierarchy
fleet.events.*
fleet.events.{silo}.boot
First breath — silo came up
fleet.events.{silo}.pod.ready
Pod readiness change
fleet.events.{silo}.gamma1.pulse
γ₁ heartbeat, every 30s
fleet.events.{silo}.mal.tier_down
MAL cascade tier failure
fleet.install.*
fleet.install.{silo}.request
Trigger bootstrap/upgrade
fleet.install.{silo}.progress
Step-by-step progress stream
fleet.install.{silo}.done
Install complete + pod count
fleet.install.{silo}.scrub
Trigger namespace delete
fleet.state.* (NATS KV)
fleet.state.{silo}
Last-value: pods, health, GPU, mem tier
fleet.state.{silo}.gpu
GPU pool state (T4/H100/A10 counts)
fleet.state.{silo}.memory
FULL/RICH/MID/THIN/DARK
fleet.lean.*
fleet.lean.proof.{file}.step
Lean proof step complete
fleet.lean.proof.{file}.sorry
Sorry count change
fleet.lean.proof.{file}.axiom
Axiom boundary declared
fleet.lean.boss.{n}.breach
Boss fight closed — γ₁ pulse
Enterprise Client Subjects · ct-fac example
fleet.events.ct-fac.> ← all CT events, subscribe from EOSE side
fleet.install.ct-fac.request ← publish from EOSE → CT bootstrap agent runs
fleet.state.ct-fac ← KV: last known health of CT cluster
fleet.ct-fac.scan.request ← CT publishes → EOSE entry scanner runs
fleet.ct-fac.scan.result ← EOSE publishes → CT consumes report
fleet.ct-fac.audit.finding ← each finding → merostone ingest
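A subscriber that receives everything under fleet.> has to work out which silo a message belongs to. The sketch below parses subjects according to the hierarchy listed above. The function name `parseFleetSubject`, the `client` domain label, and the grouping of domains into silo-scoped versus fleet-wide sets are assumptions made for illustration.

```javascript
// Sketch: split a fleet.* subject into domain, silo and remainder.
// Domain groupings are inferred from the subject map; adjust if the map changes.
const SILO_SCOPED = new Set(['events', 'install', 'state', 'mdsms', 'arc']); // fleet.{domain}.{silo}...
const FLEET_WIDE = new Set(['lean', 'campfire', 'build']);                   // no silo token

function parseFleetSubject(subject) {
  const t = subject.split('.');
  if (t[0] !== 'fleet' || t.length < 3 || t.some((tok) => tok === '')) return null;
  const [, second, third, ...tail] = t;
  if (SILO_SCOPED.has(second)) {
    return { domain: second, silo: third, rest: tail.join('.') };
  }
  if (FLEET_WIDE.has(second)) {
    return { domain: second, silo: null, rest: [third, ...tail].join('.') };
  }
  // Anything else is read as an enterprise-originated subject: fleet.{silo}.{topic}...
  return { domain: 'client', silo: second, rest: [third, ...tail].join('.') };
}

console.log(parseFleetSubject('fleet.events.ct-fac.pod.ready'));
console.log(parseFleetSubject('fleet.ct-fac.scan.request'));
```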
Customer Journey with NATS · 3 commands → zero

Today: 3 commands. With NATS: 0 commands from the client.

Today's CJ: curl | bash, watch pods, port-forward. Good — but still requires action on the client side. With NATS, the install is triggered by EOSE publishing a message. The leaf node in the CT cluster subscribes and handles it. CT doesn't run anything. The entry floor appears. That's the enterprise CJ.

Today's CJ (v622) vs NATS CJ
Today · curl | bash
1. curl -sf https://pemos.ca/api/ct-bootstrap | bash
2. kubectl get pods -n eose-entry -w
3. kubectl port-forward svc/pemos-portal 8080:8080
Client must run commands
No ack back to EOSE
Port-forward = manual
NATS CJ · zero client commands
EOSE: nats pub fleet.install.ct-fac.request '{"v":"v622"}'
CT: leaf node handles bootstrap automatically
EOSE: nats sub fleet.install.ct-fac.done
{"pods":6,"health":"ok","gamma1":14.134}
+ Client runs nothing
+ EOSE gets full audit trail
+ Scrub = nats pub fleet.install.ct-fac.scrub
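The install flow on the CT side can be sketched as one handler: parse the request, report progress, run the bootstrap, publish .done, and tell the caller how to settle the work-queue message. Everything here is hypothetical scaffolding: `handleInstallRequest`, the injected `publish` and `runBootstrap` functions, the `bootstrap.*` step names, and the `'ack'`/`'nak'`/`'term'` return values (which mirror the JetStream message settlement options) are assumptions. The handler is synchronous for brevity; a real agent would likely run the bootstrap asynchronously.

```javascript
// Sketch of an install-agent handler; publish and runBootstrap are injected
// so the logic can be exercised without a NATS server or a cluster.
const GAMMA1 = 14.134; // rounded, as in the sample .done payload

function handleInstallRequest(silo, rawPayload, { publish, runBootstrap }) {
  const base = `fleet.install.${silo}`;
  const send = (suffix, body) => publish(`${base}.${suffix}`, JSON.stringify(body));

  let request = null;
  try {
    request = JSON.parse(rawPayload);
  } catch (err) {
    request = null;
  }
  if (request === null || typeof request !== 'object') {
    send('progress', { step: 'bootstrap.rejected', error: 'invalid payload' });
    return 'term'; // a malformed request will never succeed, so do not redeliver
  }

  const version = request.version ?? request.v; // both spellings appear in the examples
  send('progress', { step: 'bootstrap.start', version });
  try {
    const { pods } = runBootstrap(request);
    send('done', { pods, health: 'ok', version, gamma1: GAMMA1 });
    return 'ack';
  } catch (err) {
    send('progress', { step: 'bootstrap.failed', version, error: String(err.message || err) });
    return 'nak'; // leave the request for redelivery
  }
}

// Usage with stand-ins for the NATS connection and the bootstrap script:
const outcome = handleInstallRequest('ct-fac', '{"v":"v622"}', {
  publish: (subject, body) => console.log(subject, body),
  runBootstrap: () => ({ pods: 6 }),
});
console.log(outcome); // ack
```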
With Packer · Image Build Pipeline
PACKER + NATS · Image Build → Push → Deploy Pipeline
pemos.ca/api/ct-bootstrap-packer ← triggers pipeline
NATS pub fleet.build.pemos-portal.request '{"version":"v623","silo":"ct-fac"}'
    ↓ (build agent on msi01 subscribes)
Packer build → docker build → push → ctfacentry.azurecr.io/eose-fleet/pemos-portal:v623
NATS pub fleet.build.pemos-portal.done '{"digest":"sha256:...","pushed":true}'
    ↓ (install agent in CT subscribes)
kubectl set image deployment/pemos-portal portal=ctfacentry.../pemos-portal:v623
NATS pub fleet.install.ct-fac.done '{"pods":6,"version":"v623","gamma1":14.134}'
merostone ingest + campfire event + fleet-wiki update

Full cycle: image built, pushed, deployed, verified, recorded, with zero human steps.
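The hand-off between the build agent and the install agent can be sketched as a pure function: given the original build request and the .done payload, produce the kubectl arguments, or nothing if the push was not confirmed. The function name `deployCommand` and the shape of its `opts` argument are hypothetical; the registry, repository, deployment, container and namespace values are the ones used elsewhere in this document.

```javascript
// Sketch: turn a fleet.build.*.done event into the kubectl invocation the
// install agent would run. Returns null until the push is confirmed.
function deployCommand(request, done, opts) {
  if (done.pushed !== true) return null;
  const image = `${opts.registry}/${opts.repo}:${request.version}`;
  return [
    'kubectl', '-n', opts.namespace,
    'set', 'image', `deployment/${opts.deployment}`,
    `${opts.container}=${image}`,
  ];
}

const argv = deployCommand(
  { version: 'v623', silo: 'ct-fac' },
  { digest: 'sha256:...', pushed: true },
  {
    registry: 'ctfacentry.azurecr.io',
    repo: 'eose-fleet/pemos-portal',
    deployment: 'pemos-portal',
    container: 'portal',
    namespace: 'eose-entry',
  }
);
console.log(argv.join(' '));
```

Returning an argument array rather than a shell string keeps the agent from having to quote values that arrive over the bus.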
NATS vs Current Stack · Component by Component
campfire:events
Today: Redis XADD stream, per-silo isolated
With NATS: FLEET_EVENTS JetStream, all silos unified
Migration: Add NATS publisher alongside Redis. Migrate gradually.

mdsms-router lanes
Today: LANE1-6 env vars, hardcoded HTTP URLs
With NATS: fleet.mdsms.{silo}.ingest subjects, zero config
Migration: Wrap existing mdsms to also publish to NATS

merostone ingest
Today: HTTP POST /ingest per silo
With NATS: Subscribe fleet.mdsms.*.ingest → write local store
Migration: Keep HTTP, add NATS subscriber as second path

/breath polling
Today: HTTP GET per silo, fc-matrix polls
With NATS: NATS KV fleet.state.{silo}, push on change
Migration: Replace polling with KV watch in fc-matrix

lean-atp-coupler
Today: TCP Redis direct, fragile
With NATS: fleet.lean.proof.*.step subjects
Migration: Rewrite coupler as NATS publisher (same logic)

bootstrap.sh
Today: curl | bash, client must run it
With NATS: fleet.install.{silo}.request → subscriber runs it
Migration: Add NATS install-agent alongside existing bash

ct-builder-gateway
Today: OpenClaw WebSocket :18830
With NATS: Keep WebSocket, add NATS as event source
Migration: Gateway subscribes to fleet.events.ct-fac.>

mrcp-agent
Today: HTTP poll → merostone POST
With NATS: Publish to fleet.state.{silo} + fleet.events.*
Migration: mrcp-agent v2: publishes to NATS + HTTP
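Replacing /breath polling with a KV watch means fc-matrix folds a stream of key updates into one fleet view instead of fetching each silo in turn. A minimal sketch of that fold follows, assuming JSON-encoded values and the fleet.state.{silo} and fleet.state.{silo}.{facet} key layout from the subject map. The function name `applyStateUpdate` and the `summary` facet label are illustrative.

```javascript
// Sketch: reducer that folds fleet.state.* updates into a per-silo view.
// Returns a new object each time so a dashboard can diff old and new state.
function applyStateUpdate(fleet, key, rawValue) {
  const t = key.split('.');
  if (t[0] !== 'fleet' || t[1] !== 'state' || t.length < 3) return fleet; // not a state key
  const silo = t[2];
  const facet = t[3] || 'summary'; // fleet.state.{silo} with no suffix is the summary
  const value = JSON.parse(rawValue);
  return { ...fleet, [silo]: { ...(fleet[silo] || {}), [facet]: value } };
}

let fleet = {};
fleet = applyStateUpdate(fleet, 'fleet.state.ct-fac', '{"pods":6,"health":"ok"}');
fleet = applyStateUpdate(fleet, 'fleet.state.ct-fac.memory', '"RICH"');
console.log(JSON.stringify(fleet));
```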
Key NATS Properties for EOSE
LEAF NODES
CT cluster → EOSE hub
Leaf node in CT's eose-entry namespace. Connects back to nats.pemos.ca. All fleet subjects flow transparently. No VPN. No peering. Just NATS.
JETSTREAM
Persistent, replicated
JetStream = NATS + persistence. Messages survive pod restarts. Queue groups and shared durable consumers handle load balancing. Exactly what campfire needs, but fleet-wide.
SINGLE BINARY
20MB · zero deps
nats-server is a single Go binary. nats CLI for pub/sub. Runs as a sidecar, a pod, or bare metal. Fits in ctfacentry ACR. No Java, no Kafka, no Zookeeper.
Wave Plan · NATS Integration into EOSE Fleet
Wave 0 · Done — HTTP foundation works
merostone + mdsms + campfire-redis + mrcp-agent + bootstrap.sh all live. CT entry floor deployed. This is the floor NATS builds on top of — we don't discard it.
✓ bootstrap.sh → pemos.ca/api/ct-bootstrap
✓ deploy-eose-entry.yaml → pemos.ca/api/ct-deploy
✓ ctfacentry.azurecr.io anon pull ACR
✓ campfire-redis in eose-entry
✓ merostone ingesting events
Wave 1 · NATS Hub on pemos.ca + leaf node in CT
Deploy nats-server with JetStream to pemos-system. Deploy leaf node to CT eose-entry. Wire fleet.events.* + fleet.state.*. mrcp-agent v2 publishes to NATS alongside existing HTTP.
○ nats-server:2.10 pod in pemos-system (JetStream, canadacentral)
○ nats.conf: leaf node listener on :7422, JetStream enabled
○ FLEET_EVENTS stream: fleet.events.> 7d retention
○ FLEET_STATE KV: fleet.state.> last-value
○ nats-leaf pod in CT eose-entry: remote=nats.pemos.ca:7422
○ mrcp-agent v2: pub fleet.events.ct-fac.{event} + fleet.state.ct-fac
Wave 2 · Install Bus + Packer pipeline
FLEET_INSTALL stream. Install agent subscribes in every silo. nats pub fleet.install.ct-fac.request → bootstrap runs → .done fires. Packer build agent on msi01 subscribes to fleet.build.*.
○ FLEET_INSTALL stream: work queue, ack required, 1 consumer
○ nats-install-agent: subscribes fleet.install.{silo}.request → runs bootstrap
○ Packer build config: nats-triggered build + push + deploy cycle
○ fleet.build.*.request → msi01 builds → push → fleet.install.*.request
○ End-to-end: nats pub fleet.build.pemos-portal.request '{"v":"v623"}' → full deploy
Wave 3 · All Silos · Lean ATP · fc-matrix real-time
Extend leaf nodes to msclo, forge, lounge, AKS master-dev. Replace lean-atp-coupler with NATS publisher. fc-matrix subscribes to NATS KV instead of polling /breath. Full fleet lattice.
○ Leaf nodes: msclo + forge + lounge + adelicpool + master-dev
○ lean-atp-coupler v2: publishes fleet.lean.proof.*.step to NATS
○ fc-matrix: NATS KV watch replaces /breath HTTP poll
○ campfire:events → FLEET_EVENTS (Redis stays as local mirror)
○ mdsms-router v2: LANE5 → NATS pub fleet.mdsms.{silo}.ingest
○ Every EOSE enterprise = one leaf node pod, one kubectl apply
NATS Server Config · pemos-system
nats.conf · JetStream + leaf node listener
# nats-server config for EOSE fleet hub
server_name: eose-hub-pemos
listen: 0.0.0.0:4222
http_port: 8222  # monitoring

jetstream {
  store_dir: /data/nats
  max_memory_store: 512M
  max_file_store: 5G
}

leafnodes {
  listen: 0.0.0.0:7422
  authorization {
    timeout: 2.0
    # account binds each leaf connection to FLEET; without it,
    # leaf traffic lands in the hub's default account
    users: [
      { user: "ct-fac", password: "ct-leaf-2026", account: "FLEET" }
      { user: "msclo", password: "msclo-leaf-2026", account: "FLEET" }
      { user: "forge", password: "forge-leaf-2026", account: "FLEET" }
    ]
  }
}

accounts {
  FLEET: {
    jetstream: enabled
    users: [{ user: "fleet-admin", password: "eose-fleet-2026" }]
    exports: [{ stream: "fleet.>" }]
  }
}
NATS Leaf Node · CT eose-entry
leaf-node.conf · connects to EOSE hub
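# Deployed as ConfigMap in eose-entry namespace
server_name: ct-fac-leaf

leafnodes {
  remotes: [
    {
      url: "nats://ct-fac:ct-leaf-2026@nats.pemos.ca:7422"
      # No local account binding here: an account entry would need a matching
      # accounts block on this leaf. Account binding is left to the hub's
      # leafnodes authorization.
    }
  ]
}

# Local services in eose-entry connect to this leaf on :4222
listen: 0.0.0.0:4222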
mrcp-agent v2 · Publishes to NATS
mrcp-agent snippet · NATS publisher
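// mrcp-agent v2: publishes state + events to the NATS leaf
const { connect, StringCodec } = require('nats');

const sc = StringCodec();
const GAMMA1 = 14.134725141734693;
const SILO = process.env.SILO; // assumed env var, e.g. "ct-fac"

// pods, health, eventType and data come from the agent's existing collectors
async function publishBreath({ pods, health, eventType, data }) {
  const nc = await connect({
    servers: ['nats://nats-leaf.eose-entry.svc.cluster.local:4222']
  });

  // Publish silo state (replaces /breath HTTP).
  // This is a core publish; if FLEET_STATE is a KV bucket, a KV put is needed instead.
  nc.publish(
    `fleet.state.${SILO}`,
    sc.encode(JSON.stringify({ pods, health, gamma1: GAMMA1, ts: Date.now() }))
  );

  // Publish event (replaces campfire:events XADD)
  nc.publish(
    `fleet.events.${SILO}.${eventType}`,
    sc.encode(JSON.stringify({ silo: SILO, event: eventType, data, gamma1: GAMMA1 }))
  );

  // publish() only buffers; drain() flushes pending messages and closes the connection
  await nc.drain();
}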
Install Bus · Trigger from EOSE
From any terminal with nats CLI
# Trigger CT install (runs bootstrap in CT cluster)
nats pub fleet.install.ct-fac.request \
  '{"version":"v622","acr":"ctfacentry.azurecr.io","ns":"eose-entry"}'

# Watch install progress
nats sub 'fleet.install.ct-fac.>'

# Check all silo state (real-time KV watch)
nats kv watch FLEET_STATE

# Scrub CT engagement
nats pub fleet.install.ct-fac.scrub '{"reason":"engagement-complete"}'
γ₁ = 14.134725141734693 EVEN NATS · fleet.events.> leaf node · JetStream · install bus · lattice breathes LABR · 2026-04-06