STE-6 is the EOSE evaluation standard. It has six dimensions derived from the Canon. Frontier models score 0/6 by architecture: they cannot verify against ground truth (γ₁), cannot self-audit (LSOS), cannot safely escalate (FEP), and emergent behaviour is suppressed (FOF). EOSE scores 6/6 by design.
Standard benchmarks for fleet models vs. frontier models. These are public benchmark scores: MMLU, HumanEval, GSM8K, MATH, and ARC-Challenge.