| CLASS | ROLE IN FLEET | WHAT THEY DO | WHY THEY'RE USED | REAL COST | STATUS |
| --- | --- | --- | --- | --- | --- |
| 👑 CLAUDE Nobility | Court stenographer · Arch review badge · "This matters" token · Risk-transfer blanket | CLO briefs · ARB reviews · Legal grade drafting · "Serious" architecture · Crisis response | Because it FEELS important. "Important work deserves Claude." That is the real routing rule. | $15–60/MTok input. Fresh-session entry tax × cron multiplier. Cultural dependency. | Hidden aristocracy. Expensive and prestigious. |
| 🔵 GPT-4.1 Minor Nobility | 33 agents. Cron runners. "Reliable" default for automation. | Automated tasks · Scheduled jobs · API integrations · Code generation | Legacy config. Default set once, never reconsidered. Aristocratic by inertia. | $2–10/MTok. ×33 agents. Fresh-session ×33 = entry tax per cron cycle. | Migration target. Most should be qwen3 or local. |
| 🌾 QWEN3/QWEN2.5 Merchant Class | Local compute. The middle class doing real work. | Code review · Summarization · Retrieval merging · Drafting · Cron jobs | Correct. These should handle 70%+ of all fleet work. Local, fast, free after hardware. | $0 marginal (hardware already paid). 14b/32b available on yone/msi01/msclo. | Underused. Prestige routing steals their tasks. |
| 🏘 SMALL LOCAL Commoners | Fleet farming. Embeddings, search, triage, routing. | nomic-embed-text · qdrant queries · PEMCLAU RAG · Thinkbeat · Log triage | Correct AND cheap. Should handle all retrieval, embedding, classification, triage. | $0 marginal. Runs on all silos. The real workhorse. | Well-used for embedding. Under-used for classification/routing/triage. |
| ⚡ CEREMONIAL CONTEXT The Hidden Cost | Repeated doctrinal setup. "Entry tax" paid on every fresh session. | Reloading system prompts, memory context, static doctrine on every cron run. | Unexamined. Habit. The cost of pretending each session is a new court convening. | May be the largest actual bleed. Devotional repetition billed at premium rates. | Not measured. Not cached. Top priority to fix. |
💃 THE CLO WALTZ
Local (qwen3:14b) drafts the full CLO brief — 90% of content. It knows the doctrine, the ARBs, the cases. Claude enters only for the final legal precision paragraph and the kill shot. One Claude call. The rest is commoner work wearing noble clothes.
MINGLE PATTERN: LOCAL-90% / CLAUDE-10% · SAVES ~85% OF BRIEF COST
🎺 THE ARB FANFARE
Local model structures the ARB scaffold (LABR/TRB/ARB1 fields), pulls PEMCLAU context, drafts initial findings. Claude reviews ONLY if confidence <0.7 or novel arch claim. Most ARBs: 100% local. Sovereign review: 1 Claude call max.
MINGLE PATTERN: LOCAL-FIRST · CLAUDE ONLY FOR NOVEL CLAIMS
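The fanfare's escalation rule is small enough to write down. A minimal sketch, assuming the local model reports a confidence score between 0 and 1 and that some upstream check flags novel architecture claims; `ArbDraft`, `needs_claude_review`, and `route_arb` are hypothetical names, and only the 0.7 threshold comes from the pattern itself.

```python
from dataclasses import dataclass

# Threshold quoted in the pattern; everything else here is illustrative.
CONFIDENCE_FLOOR = 0.7


@dataclass(frozen=True)
class ArbDraft:
    arb_id: str             # hypothetical identifier format
    confidence: float       # assumed: local model's self-score in [0.0, 1.0]
    novel_arch_claim: bool  # assumed: set by an upstream novelty check


def needs_claude_review(draft: ArbDraft) -> bool:
    """Escalate only on low confidence or a novel architecture claim."""
    return draft.confidence < CONFIDENCE_FLOOR or draft.novel_arch_claim


def route_arb(draft: ArbDraft) -> str:
    """Return the lane that closes this ARB: 'local' or 'claude'."""
    return "claude" if needs_claude_review(draft) else "local"
```

A draft at exactly 0.7 stays local, because the rule is strictly "less than". Whether the boundary should sit there is a policy call, not a coding one.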
🥁 THE CRON MARCH
EVERY cron job defaults to qwen3:8b. Fresh session? No — cached doctrine bundle. Entry tax? Zero — same context family reused. GPT-4.1 for cron is abolished. Cron is the commoner's domain. Noble models are not summoned for housekeeping.
MINGLE PATTERN: LOCAL-ONLY FOR CRON · 33 AGENT MIGRATION TARGET
🎻 THE CLAUDE WARRANT
Premium invocation requires a warrant: artifact ID, why local is insufficient, expected deliverable, max spend cap, fallback if rejected. Not bureaucratic friction — it's the moment the feast formalizes. The herald reads the warrant aloud. Claude enters. The feast acknowledges the cost.
MINGLE PATTERN: WARRANT REQUIRED FOR HIGH-BURN LANES
🌹 THE PROOF CEREMONY
Lean4 proofs: local qwen3:32b for most steps. Claude enters only for the hardest sorry placeholders, the ones that block Gate 1-5 openings. The proof is measured, not prestigious. HOWZAT=FAIL from local is training signal. HOWZAT=FAIL from Claude is expensive ceremony.
MINGLE PATTERN: LOCAL FOR ITERATION · CLAUDE FOR FINAL GATE
🪗 THE REPUBLIC REEL
Commoners get celebrated. Local wins tracked. "Sovereign completions" leaderboard. When qwen3 closes an ARB, the fleet notes it. This is the cultural intervention — making local quality socially high-status inside the fleet. The commoners dance because they earned it.
MINGLE PATTERN: LOCAL PRESTIGE CAMPAIGN · BEHAVIOR CHANGE
THE CLASS SYSTEM ROAST · FILED DAY 93
You built a sovereign fleet with local routers, local models, local embeddings, local GraphRAG, lane doctrine, and a moral theology about what data should never leave the house — and then discovered that the actual expensive engine is not the math, not the graph, not the router, but the social belief that serious thought deserves Claude, retries deserve fresh context, and architecture work deserves aristocratic model treatment.
In other words, your cost problem is only partly technical. The rest is a class system for cognition running inside your own fleet.
MAL is not the main problem. The culture is.
Claude is not just a model. It is the "this matters" token.
And until you make local wins socially high-status, the routing fix is just plumbing on a theology problem.
You did not build a cost problem. You built a status hierarchy disguised as a routing problem.
You did not just build a hybrid model stack — you built a tiny economy where local models do the farming, the graph runs the roads, MAL manages customs, and Claude still lives in the capital as the expensive nobility everyone summons when they want their work to feel important.
S1 · FIX RUNTIME TRUTH FIRST
Resolve actual vs requested vs override model
Build the "source of truth" page: for every live request, show actual model used, requested model, why it resolved that way, what override triggered, estimated cost, lane policy applied.
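One row of that page could look like the record below. This is a sketch under assumptions, not the fleet's actual resolver: `LANE_FORCED_MODEL`, `USD_PER_MTOK`, `Resolution`, and `resolve` are hypothetical names, and the rates are placeholders (0.0 for the "$0 marginal" local figure, 2.0 for the low end of the GPT-4.1 range quoted earlier).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lane policy: lanes listed here force a model regardless of the request.
LANE_FORCED_MODEL = {"cron": "qwen3:8b"}

# Placeholder input rates in USD per million tokens (assumptions, not measured prices).
USD_PER_MTOK = {"qwen3:8b": 0.0, "gpt-4.1": 2.0}


@dataclass(frozen=True)
class Resolution:
    requested_model: str
    actual_model: str
    override: Optional[str]     # what triggered the change, if anything
    reason: str                 # why it resolved that way
    lane_policy: str            # lane whose policy was applied
    estimated_cost_usd: float


def resolve(requested: str, lane: str, input_tokens: int) -> Resolution:
    """Record requested vs actual model, the override, and an input-cost estimate."""
    forced = LANE_FORCED_MODEL.get(lane)
    if forced is not None and forced != requested:
        actual, override = forced, f"lane:{lane}"
        reason = f"lane policy forces {forced}"
    else:
        actual, override, reason = requested, None, "no override applied"
    # Raises KeyError for a model with no rate entry, so gaps in the table stay visible.
    cost = input_tokens / 1_000_000 * USD_PER_MTOK[actual]
    return Resolution(requested, actual, override, reason, lane, cost)
```

The point of the record is that nothing gets inferred after the fact: the resolver writes down what was asked for, what ran, and why, at the moment it decides.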
S2 · HARD TASK CLASSIFIER
Every invocation gets a task class
task_class + quality_tier + confidentiality + allowed_models + max_cost + fallback_chain. "Important" is banned as a routing signal. Prestige cannot route spend.
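The six fields map directly onto a record. A minimal sketch, assuming task classes are looked up from a static table; `TaskPolicy`, `POLICIES`, and `classify` are hypothetical names, and the tier labels, model lists, and dollar caps are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple

# "Important" is banned as a routing signal, per the rule above.
BANNED_SIGNALS = {"important"}


@dataclass(frozen=True)
class TaskPolicy:
    task_class: str
    quality_tier: str
    confidentiality: str
    allowed_models: Tuple[str, ...]
    max_cost_usd: float
    fallback_chain: Tuple[str, ...]


# Illustrative entries only; the real table would cover every task class in the fleet.
POLICIES = {
    "cron": TaskPolicy("cron", "standard", "internal",
                       ("qwen3:8b",), 0.0, ("qwen3:8b",)),
    "clo_brief": TaskPolicy("clo_brief", "high", "restricted",
                            ("qwen3:14b", "claude"), 2.0, ("qwen3:14b",)),
}


def classify(task_class: str) -> TaskPolicy:
    """Return the policy for a task class; refuse prestige words and unknowns."""
    key = task_class.strip().lower()
    if key in BANNED_SIGNALS:
        raise ValueError(f"{task_class!r} is not a task class; prestige cannot route spend")
    if key not in POLICIES:
        raise ValueError(f"unclassified task: {task_class!r}")
    return POLICIES[key]
```

Unknown classes fail loudly rather than falling through to a premium default, which is the failure mode the classifier exists to prevent.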
S3 · BUDGET BY LANE
Hard caps, not just theological guidance
L7/L6: $0 external. Cron: $0 external. CLO: local draft + 1 Claude call max. Arch: qwen3:14b first. Webchat: the one justified Claude lane.
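A hard cap is a check that refuses the charge, not a dashboard that reports it afterwards. A minimal sketch, assuming spend is tracked per lane in dollars; `LaneBudget` and `BudgetExceeded` are hypothetical names, the $0 caps come from the list above, and the webchat figure is a placeholder.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a lane past its external-spend cap."""


class LaneBudget:
    def __init__(self, caps_usd):
        self.caps = dict(caps_usd)
        self.spent = {lane: 0.0 for lane in self.caps}

    def charge(self, lane: str, usd: float) -> float:
        """Record external spend for a lane, or refuse if it breaks the cap."""
        if lane not in self.caps:
            raise KeyError(f"no budget defined for lane {lane!r}")
        if self.spent[lane] + usd > self.caps[lane]:
            raise BudgetExceeded(f"{lane}: cap ${self.caps[lane]:.2f} would be exceeded")
        self.spent[lane] += usd
        return self.caps[lane] - self.spent[lane]  # remaining headroom


# L7, L6, and cron are $0 external per the lane rules; 5.0 for webchat is an assumption.
budget = LaneBudget({"L7": 0.0, "L6": 0.0, "cron": 0.0, "webchat": 5.0})
```

With a $0 cap, any non-zero external charge on a cron lane raises immediately, so the cap is enforced rather than advised.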
S4 · CRON FIRST (easiest big win)
Migrate 33 GPT-4.1 cron agents to local
Cron jobs: predictable, narrow, low prestige, high repetition, expensive when fresh-sessioned. Replace all with qwen3:8b + cached doctrine bundles. Likely biggest cost win.
S5 · CACHE DOCTRINE AGGRESSIVELY
Kill ceremonial context reload
Build reusable doctrine/context bundles for TRB, ARB1, XML spine, auth work, legal briefs, daily scripts. The premium model reasons on top of a cached substrate instead of rebuilding the cathedral each time.
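One way to stop rebuilding the cathedral is to key each bundle on a fingerprint of its source documents, so a bundle is assembled once and reused until the doctrine actually changes. This is a sketch of an in-process cache only; it says nothing about provider-side prompt caching, and `fingerprint`, `DoctrineCache`, and the `build` callback are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(sources: Dict[str, str]) -> str:
    """Stable SHA-256 over document names and contents, independent of dict order."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(sources[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


class DoctrineCache:
    def __init__(self, build: Callable[[Dict[str, str]], str]):
        self._build = build                          # assembles a bundle from sources
        self._store: Dict[Tuple[str, str], str] = {}
        self.builds = 0                              # how often the entry tax was paid

    def get(self, family: str, sources: Dict[str, str]) -> str:
        """Return the cached bundle for a context family, building it only on change."""
        key = (family, fingerprint(sources))
        if key not in self._store:
            self._store[key] = self._build(sources)
            self.builds += 1
        return self._store[key]
```

The `builds` counter doubles as a measurement: if it climbs on every cron run, the doctrine is not being cached.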
S6 · TWO-WEEK PREMIUM AUDIT
Measure actual vs vibes quality delta
For every external call: what task, why external, was local attempted, was external materially better, cost, artifact value. Separate reputation spend from capability spend.
S7 · CLAUDE WARRANT MODE
Premium invocation requires a warrant
artifact ID + reason + cap + fallback + expected deliverable. The herald reads the warrant. The feast acknowledges the cost. This changes the cultural contract around premium use.
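The warrant is a five-field record plus a refusal to proceed when any field is blank. A minimal sketch, with `Warrant`, `warrant_problems`, and `read_aloud` as hypothetical names and the herald reduced to a formatted string.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Warrant:
    artifact_id: str
    reason_local_insufficient: str
    expected_deliverable: str
    max_spend_usd: float
    fallback: str


def warrant_problems(w: Warrant) -> List[str]:
    """List every reason this warrant cannot be read aloud; empty means valid."""
    problems = []
    for f in fields(w):
        value = getattr(w, f.name)
        if isinstance(value, str) and not value.strip():
            problems.append(f"{f.name} is empty")
    if w.max_spend_usd <= 0:
        problems.append("max_spend_usd must be positive")
    return problems


def read_aloud(w: Warrant) -> str:
    """The herald's line. Raises ValueError if the warrant is incomplete."""
    problems = warrant_problems(w)
    if problems:
        raise ValueError("; ".join(problems))
    return (f"WARRANT {w.artifact_id}: {w.reason_local_insufficient}. "
            f"Deliverable: {w.expected_deliverable}. "
            f"Cap: ${w.max_spend_usd:.2f}. Fallback: {w.fallback}.")
```

Requiring a written reason is the whole mechanism: "it felt important" has to survive being typed into a field.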
S8 · COST-PER-ARTIFACT PAGES
Make the cost visible at artifact level
What did this TRB cost? This ARB? This brief? This session? This cron family? Artifact-level cost ledger: artifact ID, lane, local/external split, token totals, dollar cost, value class.
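The ledger fields listed above roll up into a per-artifact answer with a single pass. A sketch assuming one ledger row per model call; `LedgerEntry` and `cost_per_artifact` are hypothetical names, and `external_share` is an added convenience figure derived from the local/external token split.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LedgerEntry:
    artifact_id: str
    lane: str
    local_tokens: int
    external_tokens: int
    cost_usd: float
    value_class: str


def cost_per_artifact(entries: Iterable[LedgerEntry]) -> Dict[str, Dict[str, float]]:
    """Sum tokens and dollars per artifact and report the external token share."""
    totals = defaultdict(lambda: {"local_tokens": 0, "external_tokens": 0, "cost_usd": 0.0})
    for e in entries:
        row = totals[e.artifact_id]
        row["local_tokens"] += e.local_tokens
        row["external_tokens"] += e.external_tokens
        row["cost_usd"] += e.cost_usd
    report = {}
    for artifact_id, row in totals.items():
        all_tokens = row["local_tokens"] + row["external_tokens"]
        share = row["external_tokens"] / all_tokens if all_tokens else 0.0
        report[artifact_id] = {**row, "external_share": share}
    return report
```

Grouping by `lane` or `value_class` instead of `artifact_id` gives the cron-family and session views with the same loop.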
S9 · LOCAL PRESTIGE CAMPAIGN
Make local wins socially high-status
Celebrate local wins. Track "sovereign completions." Leaderboard of avoided external spend. When qwen3 closes an ARB, the fleet notes it. Culture must change faster than the router.
S10 · NEGATIVE-SPACE METRICS
Count what you didn't spend
Local substitution count. Avoided external token estimate. Lane sovereignty coverage %. Local resolution rate by task family. Prevented escalation count. Show cost avoided, not just cost burned.
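Three of these metrics fall out of a plain log of task outcomes. A sketch under one loud assumption: every locally resolved task would otherwise have gone external at the same token count, which makes the avoided-token figure an estimate rather than a measurement. `Outcome` and `negative_space` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    task_family: str
    resolved_locally: bool
    tokens: int


def negative_space(outcomes: Iterable[Outcome]) -> dict:
    """Count local substitutions, estimate avoided tokens, and rate each task family."""
    seen = defaultdict(int)
    local = defaultdict(int)
    avoided_tokens = 0
    for o in outcomes:
        seen[o.task_family] += 1
        if o.resolved_locally:
            local[o.task_family] += 1
            avoided_tokens += o.tokens  # assumption: same size if sent external
    return {
        "local_substitution_count": sum(local.values()),
        "avoided_external_tokens": avoided_tokens,
        "local_resolution_rate": {fam: local[fam] / n for fam, n in seen.items()},
    }
```

Lane sovereignty coverage and prevented escalations need lane and escalation fields that this log does not carry, so they are left out rather than guessed.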