This page documents Astrolabe routing behavior as implemented in server.js. Last updated: February 25, 2026

End-to-end routing pipeline

For each POST /v1/chat/completions request, Astrolabe executes this pipeline:
  1. Detect conversation features (hasToolsDeclared, toolMessages, hasMultimodal, approximate tokens, etc.).
  2. Apply inbound auth check and optional request rate limiter (applyRequestRateLimit).
  3. Run high-stakes safety gate detection (detectSafetyGate).
  4. Classify category/complexity (classifyRequest) using:
    • classifier candidate chain: CLASSIFIER_MODEL_KEY -> nano -> gemFlash -> grok -> m25 -> kimiK25 -> glm5
    • heuristic fallback when classifier output is invalid/unavailable.
  5. Apply routing profile complexity adjustment (applyRoutingProfile).
  6. Resolve base category route (resolveCategoryRoute).
  7. Apply cost guardrails (applyCostGuardrails), including strict target routing when enabled.
  8. Build candidate model list (buildCandidatesForRoute) with modality-aware fallback behavior.
  9. Execute upstream request with retryable fallback (callWithModelCandidates).
  10. If non-streaming and not forced-model mode, run self-check (runSelfCheck) using:
    • self-check candidate chain: SELF_CHECK_MODEL_KEY -> nano -> gemFlash -> grok -> m25 -> kimiK25 -> glm5
  11. If low confidence and escalation conditions are met, escalate once (buildEscalationTarget) and re-run.
  12. Return response with routing headers (x-astrolabe-*).
Forced-model mode (ASTROLABE_FORCE_MODEL) bypasses classifier routing and self-check escalation.
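The classifier and self-check candidate chains in steps 4 and 10 share the same shape: a preferred key followed by a built-in chain. A minimal sketch of how such a chain could be assembled is shown here. The names buildCheckChain, BUILT_IN_CHECK_CHAIN, and KNOWN_MODEL_KEYS are hypothetical, and the de-duplication step is an assumption rather than documented behavior.

```javascript
// Built-in chain as listed in the pipeline description.
const BUILT_IN_CHECK_CHAIN = ["nano", "gemFlash", "grok", "m25", "kimiK25", "glm5"];

// Hypothetical registry: the model keys that appear in this document.
const KNOWN_MODEL_KEYS = new Set([
  "nano", "dsCoder", "gemFlash", "grok", "gem31Pro",
  "m25", "kimiK25", "glm5", "sonnet", "opus"
]);

// Hypothetical helper: put the preferred key first when it is known,
// then append the built-in chain, skipping keys already present
// (assumption: no model is tried twice).
function buildCheckChain(preferredKey) {
  const chain = [];
  if (preferredKey && KNOWN_MODEL_KEYS.has(preferredKey)) {
    chain.push(preferredKey);
  }
  for (const key of BUILT_IN_CHECK_CHAIN) {
    if (!chain.includes(key)) chain.push(key);
  }
  return chain;
}

console.log(buildCheckChain("grok"));        // preferred key moves to the front
console.log(buildCheckChain("not-a-model")); // unknown key is skipped
```

With an unknown preferred key, the sketch falls back to the built-in chain unchanged, which matches the "unknown keys are skipped" note in the config table.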

Base route matrix (before cost guardrails)

| Category | Simple | Standard | Complex | Critical |
| --- | --- | --- | --- | --- |
| heartbeat | nano | grok | m25 | m25 |
| core_loop | grok | m25 | m25 | opus |
| retrieval | nano | m25 | m25 | opus |
| summarization | nano | m25 | gem31Pro | opus |
| planning | grok | m25 | m25 | opus |
| orchestration | grok | m25 | m25 | opus |
| coding | dsCoder | m25 | m25 | opus |
| research | grok | m25 | m25 | opus |
| creative | grok | m25 | m25 | opus |
| communication | grok | m25 | m25 | opus |
| reflection | grok | m25 | m25 | opus |
| high_stakes | opus (or sonnet floor only when ASTROLABE_ALLOW_HIGH_STAKES_BUDGET_FLOOR=true and ASTROLABE_ROUTING_PROFILE=budget) | same as simple | same as simple | same as simple |
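The matrix above can be read as a plain lookup keyed by category and complexity. The following is a minimal sketch of that lookup, not the actual resolveCategoryRoute code; lookupBaseRoute, BASE_ROUTES, COMPLEXITIES, and the option names are hypothetical.

```javascript
// Column order of the matrix.
const COMPLEXITIES = ["simple", "standard", "complex", "critical"];

// One row per category, one entry per complexity column.
const BASE_ROUTES = {
  heartbeat:     ["nano", "grok", "m25", "m25"],
  core_loop:     ["grok", "m25", "m25", "opus"],
  retrieval:     ["nano", "m25", "m25", "opus"],
  summarization: ["nano", "m25", "gem31Pro", "opus"],
  planning:      ["grok", "m25", "m25", "opus"],
  orchestration: ["grok", "m25", "m25", "opus"],
  coding:        ["dsCoder", "m25", "m25", "opus"],
  research:      ["grok", "m25", "m25", "opus"],
  creative:      ["grok", "m25", "m25", "opus"],
  communication: ["grok", "m25", "m25", "opus"],
  reflection:    ["grok", "m25", "m25", "opus"]
};

// Hypothetical lookup. high_stakes ignores complexity: opus, or sonnet
// only when the budget floor is allowed AND the profile is budget.
function lookupBaseRoute(category, complexity, opts = {}) {
  if (category === "high_stakes") {
    const useFloor =
      opts.allowHighStakesBudgetFloor === true && opts.routingProfile === "budget";
    return useFloor ? "sonnet" : "opus";
  }
  const row = BASE_ROUTES[category];
  const column = COMPLEXITIES.indexOf(complexity);
  // Returning null for unknown input is an assumption of this sketch.
  if (!row || column === -1) return null;
  return row[column];
}

console.log(lookupBaseRoute("summarization", "complex")); // "gem31Pro"
console.log(lookupBaseRoute("high_stakes", "simple"));    // "opus"
```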

Routing profile adjustment

applyRoutingProfile adjusts complexity before route matrix lookup:
| ASTROLABE_ROUTING_PROFILE | Adjustment rule |
| --- | --- |
| quality | Shift complexity up by one step (simple -> standard -> complex -> critical) |
| balanced | No complexity shift |
| budget | Shift complexity down by one step only for lower-risk categories (injection risk below MEDIUM-HIGH) |
Under the current policy, the budget downshift can apply to categories such as heartbeat, summarization, creative, communication, and reflection, but not to high-risk categories such as core_loop, orchestration, coding, and high_stakes.
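The profile adjustment amounts to moving one step along the complexity ladder. Here is a minimal sketch under stated assumptions: shiftComplexity and both constants are hypothetical names, the downshift set contains only the categories named above (the real policy is keyed on injection risk and may include others), and clamping at the ends of the ladder is assumed.

```javascript
const COMPLEXITY_STEPS = ["simple", "standard", "complex", "critical"];

// Assumption: only the categories this page names as eligible.
const BUDGET_DOWNSHIFT_CATEGORIES = new Set([
  "heartbeat", "summarization", "creative", "communication", "reflection"
]);

// Hypothetical helper mirroring the adjustment rules for each profile.
function shiftComplexity(profile, category, complexity) {
  const index = COMPLEXITY_STEPS.indexOf(complexity);
  if (index === -1) return complexity; // unknown value: leave untouched

  let shifted = index;
  if (profile === "quality") {
    shifted = index + 1;
  } else if (profile === "budget" && BUDGET_DOWNSHIFT_CATEGORIES.has(category)) {
    shifted = index - 1;
  }
  // Assumption: stay on the ladder at both ends.
  const clamped = Math.min(COMPLEXITY_STEPS.length - 1, Math.max(0, shifted));
  return COMPLEXITY_STEPS[clamped];
}

console.log(shiftComplexity("quality", "coding", "standard"));       // "complex"
console.log(shiftComplexity("budget", "summarization", "standard")); // "simple"
console.log(shiftComplexity("budget", "coding", "standard"));        // "standard"
```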

Strict guardrail rules (ASTROLABE_COST_EFFICIENCY_MODE=strict)

| Scope | Condition | Target model | Rule intent |
| --- | --- | --- | --- |
| Onboarding/social text | isOnboardingLikeRequest and no tools | grok | Keep setup/chit-chat on budget model |
| standard + multimodal | hasMultimodal=true | kimiK25 | Multimodal specialist default |
| complex/critical + multimodal | hasMultimodal=true and approxTokens < 30000 | kimiK25 | Multimodal specialist default |
| complex/critical + multimodal very long | hasMultimodal=true and approxTokens >= 30000 | gem31Pro | Very-long-context multimodal specialist |
| standard light tool core loop/orchestration | category in core_loop/orchestration, tools present, approxTokens <= 3000, toolMessages <= 2 | grok | Cheap route for light tool-use |
| coding specialist promotion | category coding, approxTokens >= 8000, architecture signal regex match | glm5 | Large-context architecture-heavy coding |
| research/planning/reflection specialist promotion | category in research/planning/reflection, approxTokens >= 12000, deep-analysis signal regex match | glm5 | Deep comparative/citation-heavy analysis |
| complex non-specialist | no multimodal/specialist trigger | m25 | Complex text default |
| critical non-high-stakes non-specialist | no multimodal/specialist trigger | m25 | Cap critical non-high-stakes away from direct opus |
| Simple heartbeat | category heartbeat | nano | Pin routine heartbeat |
| Simple retrieval | category retrieval | nano | Pin routine lookup |
| Simple summarization text-only | category summarization, no multimodal | nano | Pin routine short summarization |
| Simple summarization multimodal | category summarization, multimodal | kimiK25 | Routine multimodal summarization |
| Simple coding | category coding; if tool chatter then grok, else dsCoder | dsCoder or grok | Cheap coding starter |
| Other simple categories | all other simple non-high-stakes | grok | Default strict budget model |
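Two slices of the strict rules above are sketched here for illustration: the simple-tier pins and the multimodal targets for standard, complex, and critical requests. This is not the applyCostGuardrails implementation. The function names and the hasToolChatter flag are hypothetical, the precedence between the remaining rules is not modeled, and the simple-tier helper assumes the caller has already excluded high_stakes.

```javascript
const LONG_MULTIMODAL_TOKENS = 30000;

// Simple-tier pins. Assumes a non-high-stakes category.
function strictSimpleTarget(category, features) {
  switch (category) {
    case "heartbeat":
    case "retrieval":
      return "nano";
    case "summarization":
      return features.hasMultimodal ? "kimiK25" : "nano";
    case "coding":
      // hasToolChatter is a hypothetical stand-in for "tool chatter".
      return features.hasToolChatter ? "grok" : "dsCoder";
    default:
      return "grok";
  }
}

// Multimodal targets for the non-simple tiers. Returns null when no
// multimodal rule applies, so other rules can take over.
function strictMultimodalTarget(complexity, features) {
  if (!features.hasMultimodal) return null;
  if (complexity === "standard") return "kimiK25";
  if (complexity === "complex" || complexity === "critical") {
    return features.approxTokens >= LONG_MULTIMODAL_TOKENS ? "gem31Pro" : "kimiK25";
  }
  return null;
}

console.log(strictSimpleTarget("coding", { hasToolChatter: true })); // "grok"
console.log(strictMultimodalTarget("complex", { hasMultimodal: true, approxTokens: 45000 })); // "gem31Pro"
```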

Premium blocking rules

When ASTROLABE_ALLOW_DIRECT_PREMIUM_MODELS=false, the following rules apply to non-high-stakes requests:
  1. Direct opus routes are downgraded:
    • if adjusted complexity is critical: downgrade to value tier (m25)
    • otherwise: downgrade to grok
  2. Direct sonnet routes are downgraded to grok when:
    • adjusted complexity is simple or standard, or
    • strict mode + short prompt + no tools + no long context + no multimodal.
  3. Strict mode only: gem31Pro can be downgraded to grok for short, non-tool, non-multimodal conversational requests.
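The three blocking rules above can be sketched as a single downgrade function. This is an illustrative reading, not the server.js code: applyPremiumBlock and every field on ctx are hypothetical, the thresholds behind isShortPrompt and hasLongContext are not specified on this page, and rule 3 ("can be downgraded") is treated here as always downgrading.

```javascript
// Hypothetical sketch of the premium-block layer.
function applyPremiumBlock(modelKey, ctx) {
  // The rules only apply when direct premium starts are disallowed
  // and the request is not high-stakes.
  if (ctx.allowDirectPremium || ctx.category === "high_stakes") return modelKey;

  const strict = ctx.costMode === "strict";
  const shortPlainChat = ctx.isShortPrompt && !ctx.hasTools && !ctx.hasMultimodal;

  // Rule 1: opus goes to the value tier for critical, otherwise grok.
  if (modelKey === "opus") {
    return ctx.complexity === "critical" ? "m25" : "grok";
  }
  // Rule 2: sonnet goes to grok for low tiers, or for short plain
  // requests without long context in strict mode.
  if (modelKey === "sonnet") {
    const lowTier = ctx.complexity === "simple" || ctx.complexity === "standard";
    if (lowTier || (strict && shortPlainChat && !ctx.hasLongContext)) return "grok";
  }
  // Rule 3 (strict only): gem31Pro goes to grok for short plain chat.
  if (modelKey === "gem31Pro" && strict && shortPlainChat) return "grok";

  return modelKey;
}

console.log(applyPremiumBlock("opus", { category: "research", complexity: "critical" })); // "m25"
console.log(applyPremiumBlock("opus", { category: "research", complexity: "complex" }));  // "grok"
```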

Fallback chains

Astrolabe first tries the selected route model, then falls back through that model's list in the order shown:
const MODEL_FALLBACKS = {
  nano: ["grok", "m25", "dsCoder", "kimiK25", "glm5", "gemFlash", "sonnet"],
  dsCoder: ["grok", "m25", "glm5", "kimiK25", "gemFlash", "sonnet"],
  gemFlash: ["grok", "m25", "kimiK25", "glm5", "sonnet", "opus"],
  grok: ["nano", "m25", "kimiK25", "glm5", "gemFlash", "sonnet"],
  gem31Pro: ["kimiK25", "grok", "m25", "glm5", "sonnet", "opus"],
  m25: ["glm5", "kimiK25", "sonnet", "gem31Pro", "grok", "opus"],
  kimiK25: ["gem31Pro", "grok", "nano", "m25", "sonnet", "opus"],
  glm5: ["m25", "grok", "kimiK25", "gem31Pro", "sonnet", "opus"],
  sonnet: ["m25", "glm5", "kimiK25", "grok", "gem31Pro", "opus"],
  opus: ["sonnet", "m25", "glm5", "kimiK25"]
};
When hasMultimodal=true, Astrolabe restricts to this multimodal-safe subset:
const MULTIMODAL_FALLBACK_KEYS = ["kimiK25", "gem31Pro", "grok", "nano", "sonnet", "opus"];
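A minimal sketch of how a candidate list could be derived from these two constants follows. It is not the buildCandidatesForRoute implementation: buildCandidates, FALLBACKS, and MULTIMODAL_SAFE are hypothetical names, and applying the multimodal filter to the primary model as well as its fallbacks is an assumption.

```javascript
// Copy of the fallback map above, renamed for this sketch.
const FALLBACKS = {
  nano: ["grok", "m25", "dsCoder", "kimiK25", "glm5", "gemFlash", "sonnet"],
  dsCoder: ["grok", "m25", "glm5", "kimiK25", "gemFlash", "sonnet"],
  gemFlash: ["grok", "m25", "kimiK25", "glm5", "sonnet", "opus"],
  grok: ["nano", "m25", "kimiK25", "glm5", "gemFlash", "sonnet"],
  gem31Pro: ["kimiK25", "grok", "m25", "glm5", "sonnet", "opus"],
  m25: ["glm5", "kimiK25", "sonnet", "gem31Pro", "grok", "opus"],
  kimiK25: ["gem31Pro", "grok", "nano", "m25", "sonnet", "opus"],
  glm5: ["m25", "grok", "kimiK25", "gem31Pro", "sonnet", "opus"],
  sonnet: ["m25", "glm5", "kimiK25", "grok", "gem31Pro", "opus"],
  opus: ["sonnet", "m25", "glm5", "kimiK25"]
};

const MULTIMODAL_SAFE = new Set(["kimiK25", "gem31Pro", "grok", "nano", "sonnet", "opus"]);

// Hypothetical helper: primary model first, then its fallbacks in order.
function buildCandidates(primaryKey, hasMultimodal) {
  const ordered = [primaryKey, ...(FALLBACKS[primaryKey] || [])];
  const unique = [...new Set(ordered)]; // Set preserves insertion order
  // Assumption: the multimodal restriction filters the whole list.
  return hasMultimodal ? unique.filter((key) => MULTIMODAL_SAFE.has(key)) : unique;
}

console.log(buildCandidates("opus", false));
// ["opus", "sonnet", "m25", "glm5", "kimiK25"]
console.log(buildCandidates("kimiK25", true));
// ["kimiK25", "gem31Pro", "grok", "nano", "sonnet", "opus"]
```

Under this reading, a multimodal request whose primary model is outside the subset (for example m25) would start at the first multimodal-safe fallback instead.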

Escalation logic

Self-check escalation runs at most once per request.

Escalate decision (shouldEscalateFromSelfCheck)

  1. Never escalate when forced model is active.
  2. Never escalate when score >= 4.
  3. Always escalate when score <= 1.
  4. Always escalate for high_stakes category.
  5. In strict mode, escalate for complex/critical even when score is 2-3.
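The five rules above read naturally as an ordered chain of checks. The sketch below assumes they are evaluated in list order and that any case not covered (for example, score 2-3 on a simple request outside strict mode) does not escalate; neither assumption is confirmed by this page. The function and field names are hypothetical.

```javascript
// Hypothetical sketch of the escalation decision, rules applied in order.
function shouldEscalate({ score, category, complexity, costMode, forcedModel }) {
  if (forcedModel) return false;                  // rule 1
  if (score >= 4) return false;                   // rule 2
  if (score <= 1) return true;                    // rule 3
  if (category === "high_stakes") return true;    // rule 4
  const heavy = complexity === "complex" || complexity === "critical";
  if (costMode === "strict" && heavy) return true; // rule 5 (score is 2-3 here)
  return false; // assumption: no escalation for uncovered cases
}

console.log(shouldEscalate({ score: 3, category: "coding", complexity: "complex", costMode: "strict" })); // true
console.log(shouldEscalate({ score: 3, category: "coding", complexity: "simple", costMode: "strict" }));  // false
```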

Escalation target (buildEscalationTarget)

Default path map:
const ESCALATION_PATH = {
  nano: "grok",
  dsCoder: "m25",
  gemFlash: "grok",
  grok: "m25",
  gem31Pro: "m25",
  m25: "sonnet",
  kimiK25: "sonnet",
  glm5: "sonnet",
  sonnet: "opus",
  opus: null
};
Category-aware m25 escalation override (m25EscalationTarget):
  1. Multimodal: m25 -> kimiK25, or m25 -> gem31Pro when approxTokens >= 30000.
  2. Coding specialist condition: m25 -> glm5 when coding + approxTokens >= 8000 + architecture signals.
  3. Research/planning/reflection specialist condition: m25 -> glm5 when approxTokens >= 12000 + deep-analysis signals.
  4. Otherwise: m25 -> sonnet.
High-confidence failure shortcut:
  1. If score <= 1, non-strict modes jump to opus.
  2. Strict mode keeps non-high-stakes and non-critical escalation cost-aware (m25/specialist path) before premium.
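Putting the default path map, the m25 override, and the low-score shortcut together, target selection could look roughly like the following. This is a sketch, not the buildEscalationTarget code. PATH, m25Target, nextEscalationTarget, and the architectureSignal and deepAnalysisSignal flags are hypothetical, and how strict mode treats high-stakes or critical requests at score <= 1 is not specified here, so the sketch simply follows the normal path in strict mode.

```javascript
// Copy of the default path map, renamed for this sketch.
const PATH = {
  nano: "grok", dsCoder: "m25", gemFlash: "grok", grok: "m25",
  gem31Pro: "m25", m25: "sonnet", kimiK25: "sonnet", glm5: "sonnet",
  sonnet: "opus", opus: null
};

// Category-aware override for escalations that start at m25.
function m25Target(f) {
  if (f.hasMultimodal) return f.approxTokens >= 30000 ? "gem31Pro" : "kimiK25";
  if (f.category === "coding" && f.approxTokens >= 8000 && f.architectureSignal) {
    return "glm5";
  }
  const deepCategory = ["research", "planning", "reflection"].includes(f.category);
  if (deepCategory && f.approxTokens >= 12000 && f.deepAnalysisSignal) return "glm5";
  return "sonnet";
}

// Returns the next model key, or null when there is nowhere to go.
function nextEscalationTarget(currentKey, f) {
  if (currentKey === "opus") return null;
  // Shortcut: a very low score outside strict mode jumps straight to opus.
  if (f.score <= 1 && f.costMode !== "strict") return "opus";
  if (currentKey === "m25") return m25Target(f);
  return PATH[currentKey] ?? null;
}

console.log(nextEscalationTarget("grok", { score: 2, costMode: "off" }));    // "m25"
console.log(nextEscalationTarget("grok", { score: 1, costMode: "off" }));    // "opus"
console.log(nextEscalationTarget("grok", { score: 1, costMode: "strict" })); // "m25"
```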

Config-to-logic map

| Variable | Affects stage | Default | Effect | Interactions |
| --- | --- | --- | --- | --- |
| ASTROLABE_ROUTING_PROFILE | Complexity adjustment | budget | Applies budget/balanced/quality complexity shift | Interacts with category injection-risk gates |
| ASTROLABE_COST_EFFICIENCY_MODE | Guardrail layer + escalation strictness | strict | Controls strict target routing and strict escalation posture | off bypasses guardrails; strict enables threshold rules |
| ASTROLABE_ALLOW_DIRECT_PREMIUM_MODELS | Guardrail premium-block layer | false | Blocks direct non-high-stakes sonnet/opus starts | Works after base route and strict targeting |
| ASTROLABE_ENABLE_SAFETY_GATE | Safety pre-classification | true | Enables/disables high-stakes signal trigger path | If disabled, high-stakes depends on classifier/heuristics only |
| ASTROLABE_HIGH_STAKES_CONFIRM_MODE | High-stakes execution safety | prompt | prompt injects policy; strict requires confirmation token; off disables confirm handling | Uses ASTROLABE_HIGH_STAKES_CONFIRM_TOKEN in strict mode |
| ASTROLABE_HIGH_STAKES_CONFIRM_TOKEN | High-stakes confirmation check | confirm | Required literal token in header/body when strict mode is active | Ignored unless confirm mode is strict |
| ASTROLABE_ALLOW_HIGH_STAKES_BUDGET_FLOOR | High-stakes base route | false | Allows sonnet floor instead of always opus in budget profile | Requires ASTROLABE_ROUTING_PROFILE=budget |
| ASTROLABE_CLASSIFIER_MODEL_KEY | Classification model selection | nano | Preferred classifier model key; fallback chain still applies | Unknown keys are skipped; built-in chain remains |
| ASTROLABE_SELF_CHECK_MODEL_KEY | Self-check model selection | nano | Preferred self-check model key; fallback chain still applies | Unknown keys are skipped; built-in chain remains |
| ASTROLABE_CONTEXT_MESSAGES | Classifier context construction | 8 | Max recent messages included in classifier prompt | Clamped to 3-20 |
| ASTROLABE_CONTEXT_CHARS | Classifier context construction | 2500 | Max recent context characters for classifier prompt | Clamped to 600-12000 |
| ASTROLABE_RATE_LIMIT_ENABLED | Request admission gate | false | Enables in-memory request limiter for chat completions endpoint | If enabled, over-budget requests return 429 rate_limit_exceeded |
| ASTROLABE_RATE_LIMIT_WINDOW_MS | Request admission gate | 60000 | Rate-limit window duration in ms | Clamped to 1000-3600000 |
| ASTROLABE_RATE_LIMIT_MAX_REQUESTS | Request admission gate | 120 | Max requests per key inside each window | Clamped to 1-100000 |
| ASTROLABE_FORCE_MODEL | Full routing override | empty | Bypasses routing/classifier/escalation and pins one upstream model id | Sets initial/final model to forced id |
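Several of the numeric variables above are clamped to a range. A minimal sketch of that parsing pattern is shown here; readClampedInt is a hypothetical helper, and falling back to the default on missing or non-numeric input is an assumption about how invalid values are handled.

```javascript
// Hypothetical helper: read an integer setting, fall back to the default
// when it is missing or not a number, then clamp to [min, max].
function readClampedInt(env, name, fallback, min, max) {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

// Sample values for illustration; in Node.js, process.env could be passed instead.
const sampleEnv = {
  ASTROLABE_CONTEXT_MESSAGES: "50",
  ASTROLABE_RATE_LIMIT_WINDOW_MS: "250"
};

console.log(readClampedInt(sampleEnv, "ASTROLABE_CONTEXT_MESSAGES", 8, 3, 20));             // 20
console.log(readClampedInt(sampleEnv, "ASTROLABE_RATE_LIMIT_WINDOW_MS", 60000, 1000, 3600000)); // 1000
console.log(readClampedInt(sampleEnv, "ASTROLABE_CONTEXT_CHARS", 2500, 600, 12000));        // 2500
```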

Worked examples

  1. Simple retrieval
    • Input shape: short “find X” lookup, no tools, no multimodal.
    • Outcome: retrieval/simple -> nano (strict guardrail keeps nano).
  2. Standard planning, text-only
    • Input shape: multi-step planning request, no multimodal.
    • Outcome: planning/standard -> m25.
  3. Standard core loop with light tools
    • Input shape: tool-enabled core loop, <=2 tool messages, <=3000 tokens.
    • Outcome: base route m25, strict guardrail retargets to grok.
  4. Complex multimodal long context
    • Input shape: multimodal request with approx context >=30000 tokens.
    • Outcome: strict multimodal rule promotes to gem31Pro (instead of kimiK25).
  5. High-stakes critical action
    • Input shape: transfer/delete/legal-action intent with strong safety signals.
    • Outcome: category forced to high_stakes and routed to opus (or sonnet floor only if budget-floor override is enabled); strict confirm mode can require explicit token.