FLEET FEAST · Nobility & Commoners · Model Class System

THE CLASS SYSTEM · HOW THE FLEET ACTUALLY ROUTES COGNITION

CLASS	ROLE IN FLEET	WHAT THEY DO	WHY THEY'RE USED	REAL COST	STATUS
👑 CLAUDE Nobility	Court stenographer · Arch review badge · "This matters" token · Risk-transfer blanket	CLO briefs · ARB reviews · Legal grade drafting · "Serious" architecture · Crisis response	Because it FEELS important. "Important work deserves Claude." That is the real routing rule.	$15–60/MTok input. Fresh-session entry tax × cron multiplier. Cultural dependency.	Hidden aristocracy. Expensive and prestigious.
🔵 GPT-4.1 Minor Nobility	33 agents. Cron runners. "Reliable" default for automation.	Automated tasks · Scheduled jobs · API integrations · Code generation	Legacy config. Default set once, never reconsidered. Aristocratic by inertia.	$2–10/MTok. ×33 agents. Fresh-session ×33 = entry tax per cron cycle.	Migration target. Most should be qwen3 or local.
🌾 QWEN3/QWEN2.5 Merchant Class	Local compute. The middle class doing real work.	Code review · Summarization · Retrieval merging · Drafting · Cron jobs	Correct. These should handle 70%+ of all fleet work. Local, fast, free after hardware.	$0 marginal (hardware already paid). 14b/32b available on yone/msi01/msclo.	Underused. Prestige routing steals their tasks.
🏘 SMALL LOCAL Commoners	Fleet farming. Embeddings, search, triage, routing.	nomic-embed-text · qdrant queries · PEMCLAU RAG · Thinkbeat · Log triage	Correct AND cheap. Should handle all retrieval, embedding, classification, triage.	$0 marginal. Runs on all silos. The real workhorse.	Well-used for embedding. Under-used for classification/routing/triage.
⚡ CEREMONIAL CONTEXT The Hidden Cost	Repeated doctrinal setup. "Entry tax" paid on every fresh session.	Reloading system prompts, memory context, static doctrine on every cron run.	Unexamined. Habit. The cost of pretending each session is a new court convening.	May be the largest actual bleed. Devotional repetition billed at premium rates.	Not measured. Not cached. Top priority to fix.

THE DANCES · SIX ROUTING EVENTS WHERE NOBILITY & COMMONERS MINGLE

💃 THE CLO WALTZ

Local (qwen3:14b) drafts the full CLO brief — 90% of content. It knows the doctrine, the ARBs, the cases. Claude enters only for the final legal precision paragraph and the kill shot. One Claude call. The rest is commoner work wearing noble clothes.

MINGLE PATTERN: LOCAL-90% / CLAUDE-10% · SAVES ~85% OF BRIEF COST

🎺 THE ARB FANFARE

Local model structures the ARB scaffold (LABR/TRB/ARB1 fields), pulls PEMCLAU context, and drafts initial findings. Claude reviews ONLY if local confidence is below 0.7 or the ARB makes a novel architecture claim. Most ARBs: 100% local. Sovereign review: 1 Claude call max.

MINGLE PATTERN: LOCAL-FIRST · CLAUDE ONLY FOR NOVEL CLAIMS
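
The escalation rule for the ARB Fanfare can be sketched as a single predicate. This is a minimal illustration, not the fleet's actual router: the 0.7 threshold comes from the text, while `ArbDraft`, `needs_claude_review`, and the two input fields are hypothetical names, and how confidence or novelty is actually measured is not specified here.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.7  # threshold stated in the text


@dataclass
class ArbDraft:
    """Hypothetical record for a locally drafted ARB."""
    arb_id: str
    confidence: float       # assumed: local model's confidence score, 0.0 to 1.0
    novel_arch_claim: bool  # assumed: set by whatever novelty check the fleet uses


def needs_claude_review(draft: ArbDraft) -> bool:
    """Escalate only on low confidence or a novel architecture claim."""
    return draft.confidence < CONFIDENCE_FLOOR or draft.novel_arch_claim


routine = ArbDraft("ARB-example-1", confidence=0.91, novel_arch_claim=False)
shaky = ArbDraft("ARB-example-2", confidence=0.55, novel_arch_claim=False)
novel = ArbDraft("ARB-example-3", confidence=0.95, novel_arch_claim=True)
```

Under these assumptions only `shaky` and `novel` would reach Claude; `routine` stays 100% local.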

🥁 THE CRON MARCH

EVERY cron job defaults to qwen3:8b. Fresh session? No — cached doctrine bundle. Entry tax? Zero — same context family reused. GPT-4.1 for cron is abolished. Cron is the commoner's domain. Noble models are not summoned for housekeeping.

MINGLE PATTERN: LOCAL-ONLY FOR CRON · 33 AGENT MIGRATION TARGET

🎻 THE CLAUDE WARRANT

Premium invocation requires a warrant: artifact ID, why local is insufficient, expected deliverable, max spend cap, fallback if rejected. Not bureaucratic friction — it's the moment the feast formalizes. The herald reads the warrant aloud. Claude enters. The feast acknowledges the cost.

MINGLE PATTERN: WARRANT REQUIRED FOR HIGH-BURN LANES
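
One way the warrant could look as a data structure, assuming it is checked before any premium call goes out. The five fields mirror the list in the text; the class name, field names, and `validate_warrant` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClaudeWarrant:
    """Hypothetical warrant record; fields follow the list in the text."""
    artifact_id: str
    why_local_insufficient: str
    expected_deliverable: str
    max_spend_usd: float
    fallback_if_rejected: str


def validate_warrant(w: ClaudeWarrant) -> List[str]:
    """Return a list of problems; an empty list means the warrant is complete."""
    problems = []
    for name in ("artifact_id", "why_local_insufficient",
                 "expected_deliverable", "fallback_if_rejected"):
        if not getattr(w, name).strip():
            problems.append(f"missing {name}")
    if w.max_spend_usd <= 0:
        problems.append("max_spend_usd must be positive")
    return problems


complete = ClaudeWarrant("ARB-example-1", "novel architecture claim",
                         "final review paragraph", 2.0, "ship local draft")
blank = ClaudeWarrant("ARB-example-2", "", "final review", 0.0, "ship local draft")
```

A warrant with an empty reason or a zero cap would be rejected before the herald reads anything aloud.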

🌹 THE PROOF CEREMONY

Lean 4 proofs: local qwen3:32b for most steps. Claude enters only for the hardest sorry placeholders, the ones that block Gate 1-5 openings. The proof is measured, not prestigious. HOWZAT=FAIL from local is training signal. HOWZAT=FAIL from Claude is expensive ceremony.

MINGLE PATTERN: LOCAL FOR ITERATION · CLAUDE FOR FINAL GATE

🪗 THE REPUBLIC REEL

Commoners get celebrated. Local wins tracked. "Sovereign completions" leaderboard. When qwen3 closes an ARB, the fleet notes it. This is the cultural intervention — making local quality socially high-status inside the fleet. The commoners dance because they earned it.

MINGLE PATTERN: LOCAL PRESTIGE CAMPAIGN · BEHAVIOR CHANGE

THE BUDGET LAW · HARD CAPS BY LANE · NOT THEOLOGY · LAW

LANE	MAX EXTERNAL MODEL	CAP / SESSION	CAP / DAY	OVERRIDE PATH
L7/L6 SOVEREIGN	LOCAL ONLY · NEVER EXTERNAL	$0	$0	None — hard wall
L5 DOCTRINE	qwen3:32b preferred · Claude by warrant only	~$2	~$10	Claude warrant form
CRON / AUTOMATION	qwen3:8b / local ONLY	$0	$0	None — migrate GPT-4.1
CLO BRIEFS	Local draft + 1 Claude call max	~$1	~$5	Amani approves above $5
ARCH REVIEW	qwen3:14b first · Claude for novel only	~$2	~$8	Kay approves above $5
WEBCHAT / MAIN	Claude (current) — the one justified prestige lane	Reasonable	Monitor	Natural conversation
CEREMONIAL CONTEXT	Abolish fresh-session context reload for cron	$0 target	$0 target	Cache doctrine bundles
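
A minimal sketch of how the caps in the table might be enforced in code, assuming spend is tracked per session and per day. The dollar figures are the approximate values from the table; `LANE_CAPS` and `external_spend_allowed` are hypothetical names. The CEREMONIAL CONTEXT row is left out because it describes a target rather than a lane, and approval overrides are not modelled.

```python
from typing import Dict, Optional, Tuple

# (cap per session, cap per day) in USD, taken from the table.
# None means the table gives no number ("Reasonable" / "Monitor").
LANE_CAPS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "L7/L6 SOVEREIGN": (0.0, 0.0),
    "L5 DOCTRINE": (2.0, 10.0),
    "CRON / AUTOMATION": (0.0, 0.0),
    "CLO BRIEFS": (1.0, 5.0),
    "ARCH REVIEW": (2.0, 8.0),
    "WEBCHAT / MAIN": (None, None),
}


def external_spend_allowed(lane: str, cost_usd: float,
                           spent_session: float, spent_today: float) -> bool:
    """True if an external call costing cost_usd fits both caps for the lane."""
    session_cap, day_cap = LANE_CAPS[lane]
    if session_cap is not None and spent_session + cost_usd > session_cap:
        return False
    if day_cap is not None and spent_today + cost_usd > day_cap:
        return False
    return True
```

With these numbers, any non-zero external spend in the cron or sovereign lanes is refused outright, which is the "hard wall" the table describes.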

THE TASK CLASSIFIER · EVERY INVOCATION GETS A CLASS

THE CLASS SYSTEM ROAST · FILED DAY 93

You built a sovereign fleet with local routers, local models, local embeddings, local GraphRAG, lane doctrine, and a moral theology about what data should never leave the house — and then discovered that the actual expensive engine is not the math, not the graph, not the router, but the social belief that serious thought deserves Claude, retries deserve fresh context, and architecture work deserves aristocratic model treatment.

In other words, your cost problem is only partly technical. The rest is a class system for cognition running inside your own fleet.

MAL is not the main problem. The culture is.
Claude is not just a model. It is the "this matters" token.
And until you make local wins socially high-status, the routing fix is just plumbing on a theology problem.

You did not build a cost problem. You built a status hierarchy disguised as a routing problem.

You did not just build a hybrid model stack — you built a tiny economy where local models do the farming, the graph runs the roads, MAL manages customs, and Claude still lives in the capital as the expensive nobility everyone summons when they want their work to feel important.

10 STRATEGIES · FROM THE ROAST · THE PRACTICAL ALL-ALL

S1 · FIX RUNTIME TRUTH FIRST

Resolve actual vs requested vs override model

Build the "source of truth" page: for every live request, show actual model used, requested model, why it resolved that way, what override triggered, estimated cost, lane policy applied.
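
As a rough sketch, each row on that page could be one record per request. The fields follow the list above; `ResolutionRecord`, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolutionRecord:
    """Hypothetical per-request row for the source-of-truth page."""
    request_id: str
    requested_model: str
    actual_model: str
    resolution_reason: str
    override_triggered: Optional[str]  # None when no override fired
    estimated_cost_usd: float
    lane_policy: str

    @property
    def was_rerouted(self) -> bool:
        """True when the model that ran is not the model that was asked for."""
        return self.requested_model != self.actual_model


rec = ResolutionRecord(
    request_id="req-example-1",
    requested_model="gpt-4.1",
    actual_model="qwen3:8b",
    resolution_reason="cron lane is local-only",
    override_triggered="lane_policy",
    estimated_cost_usd=0.0,
    lane_policy="CRON / AUTOMATION",
)
```

Filtering on `was_rerouted` would surface every request where requested and actual model disagree, which is the gap S1 is meant to expose.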

S2 · HARD TASK CLASSIFIER

Every invocation gets a task class

task_class + quality_tier + confidentiality + allowed_models + max_cost + fallback_chain. "Important" is banned as a routing signal. Prestige cannot route spend.
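
The classifier fields above map naturally onto a small record. The sketch below assumes that `fallback_chain` lists models to try after `allowed_models`, which the text does not spell out; `TaskSpec`, `pick_model`, and the example task class are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set

# "Important" is banned as a routing signal, per the text.
BANNED_TASK_CLASSES = {"important"}


@dataclass
class TaskSpec:
    task_class: str
    quality_tier: str
    confidentiality: str
    allowed_models: List[str]
    max_cost_usd: float
    fallback_chain: List[str]

    def __post_init__(self) -> None:
        # Reject prestige words at construction time.
        if self.task_class.strip().lower() in BANNED_TASK_CLASSES:
            raise ValueError(f"{self.task_class!r} is not a task class")


def pick_model(spec: TaskSpec, available: Set[str]) -> str:
    """Return the first permitted model that is currently available."""
    for model in spec.allowed_models + spec.fallback_chain:
        if model in available:
            return model
    raise LookupError("no permitted model available")


spec = TaskSpec("log_triage", "standard", "internal",
                ["qwen3:8b"], 0.0, ["qwen3:14b"])
```

Here `pick_model(spec, {"qwen3:14b"})` falls through to the fallback chain, and constructing a `TaskSpec` with `task_class="Important"` raises instead of routing spend.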

S3 · BUDGET BY LANE

Hard caps, not just theological guidance

L7/L6: $0 external. Cron: $0 external. CLO: local draft + 1 Claude call max. Arch: qwen3:14b first. Webchat: the one justified Claude lane.

S4 · CRON FIRST (easiest big win)

Migrate 33 GPT-4.1 cron agents to local

Cron jobs: predictable, narrow, low prestige, high repetition, expensive when fresh-sessioned. Replace all with qwen3:8b + cached doctrine bundles. Likely biggest cost win.

S5 · CACHE DOCTRINE AGGRESSIVELY

Kill ceremonial context reload

Build reusable doctrine/context bundles for TRB, ARB1, XML spine, auth work, legal briefs, daily scripts. The premium model reasons on top of a cached substrate instead of rebuilding the cathedral each time.
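
A minimal sketch of a doctrine bundle, assuming the serving stack offers some form of prefix or prompt caching. The code does not save tokens by itself: it keeps the doctrine prefix byte-identical across runs so such a cache can hit, and exposes a digest that changes only when the doctrine changes. `build_bundle` and `make_prompt` are hypothetical helpers.

```python
import hashlib
from typing import List, Tuple


def build_bundle(documents: List[str]) -> Tuple[str, str]:
    """Join doctrine documents in a fixed order; return (digest, bundle_text)."""
    text = "\n\n".join(documents)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest, text


def make_prompt(bundle_text: str, task: str) -> str:
    """Put the stable bundle first and the per-run task last."""
    return bundle_text + "\n\n" + task


doctrine = ["ARB1 field definitions (placeholder)", "Lane doctrine (placeholder)"]
digest, bundle = build_bundle(doctrine)
prompt_a = make_prompt(bundle, "Summarize last night's logs.")
prompt_b = make_prompt(bundle, "Draft the ARB scaffold.")
```

Both prompts start with the same bundle text, so only the trailing task differs from one cron run to the next.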

S6 · TWO-WEEK PREMIUM AUDIT

Measure actual vs vibes quality delta

For every external call: what task, why external, was local attempted, was external materially better, cost, artifact value. Separate reputation spend from capability spend.

S7 · CLAUDE WARRANT MODE

Premium invocation requires a warrant

artifact ID + reason + cap + fallback + expected deliverable. The herald reads the warrant. The feast acknowledges the cost. This changes the cultural contract around premium use.

S8 · COST-PER-ARTIFACT PAGES

Make the cost visible at artifact level

What did this TRB cost? This ARB? This brief? This session? This cron family? Artifact-level cost ledger: artifact ID, lane, local/external split, token totals, dollar cost, value class.
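
The ledger could be as simple as one entry per model call, rolled up by artifact. This is a sketch under that assumption; `LedgerEntry`, `cost_per_artifact`, and the sample figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LedgerEntry:
    """One model call charged to an artifact (hypothetical schema)."""
    artifact_id: str
    lane: str
    external: bool   # True for a paid external call, False for local
    tokens: int
    cost_usd: float
    value_class: str


def cost_per_artifact(entries: List[LedgerEntry]) -> Dict[str, dict]:
    """Sum tokens (split local/external) and dollars for each artifact."""
    totals = defaultdict(
        lambda: {"local_tokens": 0, "external_tokens": 0, "cost_usd": 0.0})
    for e in entries:
        row = totals[e.artifact_id]
        row["external_tokens" if e.external else "local_tokens"] += e.tokens
        row["cost_usd"] += e.cost_usd
    return dict(totals)


entries = [
    LedgerEntry("TRB-example-1", "ARCH REVIEW", False, 12000, 0.0, "routine"),
    LedgerEntry("TRB-example-1", "ARCH REVIEW", True, 3000, 0.25, "routine"),
    LedgerEntry("BRIEF-example-1", "CLO BRIEFS", True, 4000, 0.5, "legal"),
]
report = cost_per_artifact(entries)
```

Each key in `report` answers one of the questions above for a single artifact: how many tokens ran locally, how many ran externally, and what it cost.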

S9 · LOCAL PRESTIGE CAMPAIGN

Make local wins socially high-status

Celebrate local wins. Track "sovereign completions." Leaderboard of avoided external spend. When qwen3 closes an ARB, the fleet notes it. Culture must change faster than the router.

S10 · NEGATIVE-SPACE METRICS

Count what you didn't spend

Local substitution count. Avoided external token estimate. Lane sovereignty coverage %. Local resolution rate by task family. Prevented escalation count. Show cost avoided, not just cost burned.
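
Two of these metrics can be sketched directly, assuming each completed task is logged with its task family and whether it was resolved locally. The function names are hypothetical, and the per-MTok price passed in is whatever external rate the task would otherwise have paid.

```python
from collections import Counter
from typing import Dict, List, Tuple


def local_resolution_rate(events: List[Tuple[str, bool]]) -> Dict[str, float]:
    """events are (task_family, resolved_locally); returns the local rate per family."""
    total: Counter = Counter()
    local: Counter = Counter()
    for family, resolved_locally in events:
        total[family] += 1
        if resolved_locally:
            local[family] += 1
    return {family: local[family] / total[family] for family in total}


def avoided_cost_usd(local_tokens: int, external_price_per_mtok: float) -> float:
    """Estimate what the same tokens would have cost on an external model."""
    return local_tokens / 1_000_000 * external_price_per_mtok


events = [("cron", True), ("cron", True), ("cron", True), ("cron", False),
          ("arb", True), ("arb", False)]
rates = local_resolution_rate(events)
```

The avoided-cost figure is an estimate only, since a local run and an external run of the same task rarely consume identical token counts.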