server.js.
Last updated: February 25, 2026
End-to-end routing pipeline
For each `POST /v1/chat/completions` request, Astrolabe executes this pipeline:
- Detect conversation features (`hasToolsDeclared`, `toolMessages`, `hasMultimodal`, approximate tokens, etc.).
- Apply inbound auth check and optional request rate limiter (`applyRequestRateLimit`).
- Run high-stakes safety gate detection (`detectSafetyGate`).
- Classify category/complexity (`classifyRequest`) using:
  - classifier candidate chain: `CLASSIFIER_MODEL_KEY -> nano -> gemFlash -> grok -> m25 -> kimiK25 -> glm5`
  - heuristic fallback when classifier output is invalid/unavailable.
- Apply routing profile complexity adjustment (`applyRoutingProfile`).
- Resolve base category route (`resolveCategoryRoute`).
- Apply cost guardrails (`applyCostGuardrails`), including strict target routing when enabled.
- Build candidate model list (`buildCandidatesForRoute`) with modality-aware fallback behavior.
- Execute upstream request with retryable fallback (`callWithModelCandidates`).
- If non-streaming and not forced-model mode, run self-check (`runSelfCheck`) using:
  - self-check candidate chain: `SELF_CHECK_MODEL_KEY -> nano -> gemFlash -> grok -> m25 -> kimiK25 -> glm5`
- If low confidence and escalation conditions are met, escalate once (`buildEscalationTarget`) and re-run.
- Return response with routing headers (`x-astrolabe-*`).
Forced-model mode (`ASTROLABE_FORCE_MODEL`) bypasses classifier routing and self-check escalation.
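
The classifier and self-check candidate chains share one shape: a preferred key first, then a fixed built-in order. The sketch below is one way such a chain could be assembled. It is not taken from `server.js`; `buildCandidateChain` and `KNOWN_MODEL_KEYS` are hypothetical names, and the set of known keys is assumed to be the model keys that appear in this document.

```javascript
// Built-in fallback order, copied from the chains listed above.
const BUILT_IN_CHAIN = ["nano", "gemFlash", "grok", "m25", "kimiK25", "glm5"];

// Assumption: the known keys are the model keys mentioned in this document.
const KNOWN_MODEL_KEYS = new Set([
  "nano", "gemFlash", "grok", "m25", "kimiK25", "glm5",
  "dsCoder", "gem31Pro", "sonnet", "opus",
]);

// Hypothetical helper: preferred key first, unknown keys skipped,
// duplicates dropped, built-in order preserved.
function buildCandidateChain(preferredKey) {
  const seen = new Set();
  const chain = [];
  for (const key of [preferredKey, ...BUILT_IN_CHAIN]) {
    if (!key || !KNOWN_MODEL_KEYS.has(key) || seen.has(key)) continue;
    seen.add(key);
    chain.push(key);
  }
  return chain;
}
```

For example, `buildCandidateChain("grok")` yields `grok` followed by the remaining built-in keys in order, while an unrecognized preferred key leaves the built-in chain unchanged.
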
Base route matrix (before cost guardrails)
| Category | Simple | Standard | Complex | Critical |
|---|---|---|---|---|
| heartbeat | nano | grok | m25 | m25 |
| core_loop | grok | m25 | m25 | opus |
| retrieval | nano | m25 | m25 | opus |
| summarization | nano | m25 | gem31Pro | opus |
| planning | grok | m25 | m25 | opus |
| orchestration | grok | m25 | m25 | opus |
| coding | dsCoder | m25 | m25 | opus |
| research | grok | m25 | m25 | opus |
| creative | grok | m25 | m25 | opus |
| communication | grok | m25 | m25 | opus |
| reflection | grok | m25 | m25 | opus |
| high_stakes | opus (or sonnet floor only when ASTROLABE_ALLOW_HIGH_STAKES_BUDGET_FLOOR=true and ASTROLABE_ROUTING_PROFILE=budget) | same as simple | same as simple | same as simple |
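
The matrix above can be read as a plain lookup keyed by category and complexity. The following is a minimal sketch of that lookup, not the actual `resolveCategoryRoute` implementation; `lookupBaseRoute`, `row`, and the option names are hypothetical, and the default profile is assumed to be `budget`.

```javascript
// Helper to keep the table compact: one row per category.
const row = (simple, standard, complex, critical) => ({ simple, standard, complex, critical });

// Base routes transcribed from the matrix (high_stakes is handled separately).
const BASE_ROUTES = {
  heartbeat: row("nano", "grok", "m25", "m25"),
  core_loop: row("grok", "m25", "m25", "opus"),
  retrieval: row("nano", "m25", "m25", "opus"),
  summarization: row("nano", "m25", "gem31Pro", "opus"),
  planning: row("grok", "m25", "m25", "opus"),
  orchestration: row("grok", "m25", "m25", "opus"),
  coding: row("dsCoder", "m25", "m25", "opus"),
  research: row("grok", "m25", "m25", "opus"),
  creative: row("grok", "m25", "m25", "opus"),
  communication: row("grok", "m25", "m25", "opus"),
  reflection: row("grok", "m25", "m25", "opus"),
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical lookup; returns undefined for unknown categories or levels.
function lookupBaseRoute(category, complexity, options = {}) {
  const { allowHighStakesBudgetFloor = false, routingProfile = "budget" } = options;
  if (category === "high_stakes") {
    // Same model at every complexity level; sonnet only with both settings.
    return allowHighStakesBudgetFloor && routingProfile === "budget" ? "sonnet" : "opus";
  }
  if (!hasOwn(BASE_ROUTES, category)) return undefined;
  const routes = BASE_ROUTES[category];
  return hasOwn(routes, complexity) ? routes[complexity] : undefined;
}
```
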
Routing profile adjustment
applyRoutingProfile adjusts complexity before route matrix lookup:
| ASTROLABE_ROUTING_PROFILE | Adjustment rule |
|---|---|
| quality | Shift complexity up by one step (simple -> standard -> complex -> critical) |
| balanced | No complexity shift |
| budget | Shift complexity down by one step only for lower-risk categories (injection risk below MEDIUM-HIGH) |
The budget downshift can apply to lower-risk categories such as `heartbeat`, `summarization`, `creative`, `communication`, and `reflection`, but not to high-risk categories such as `core_loop`, `orchestration`, `coding`, and `high_stakes`.
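
The profile adjustment amounts to moving one step along an ordered list of complexity levels. The sketch below illustrates that idea under two assumptions that the text does not confirm: the shift is clamped at both ends (so `critical` stays `critical` under `quality`), and the budget downshift set is exactly the five lower-risk categories named above. `shiftComplexity` is a hypothetical name, not the real `applyRoutingProfile`.

```javascript
const LEVELS = ["simple", "standard", "complex", "critical"];

// Assumption: exactly the lower-risk categories named in the text.
const BUDGET_DOWNSHIFT_CATEGORIES = new Set([
  "heartbeat", "summarization", "creative", "communication", "reflection",
]);

// Hypothetical helper: returns the adjusted complexity level.
function shiftComplexity(profile, category, complexity) {
  const index = LEVELS.indexOf(complexity);
  if (index === -1) return complexity; // unknown level: leave untouched
  if (profile === "quality") {
    return LEVELS[Math.min(index + 1, LEVELS.length - 1)];
  }
  if (profile === "budget" && BUDGET_DOWNSHIFT_CATEGORIES.has(category)) {
    return LEVELS[Math.max(index - 1, 0)];
  }
  return complexity; // balanced, or budget on a higher-risk category
}
```
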
Strict guardrail rules (ASTROLABE_COST_EFFICIENCY_MODE=strict)
| Scope | Condition | Target model | Rule intent |
|---|---|---|---|
| Onboarding/social text | isOnboardingLikeRequest and no tools | grok | Keep setup/chit-chat on budget model |
| standard + multimodal | hasMultimodal=true | kimiK25 | Multimodal specialist default |
| complex/critical + multimodal | hasMultimodal=true and approxTokens < 30000 | kimiK25 | Multimodal specialist default |
| complex/critical + multimodal very long | hasMultimodal=true and approxTokens >= 30000 | gem31Pro | Very-long-context multimodal specialist |
| standard light tool core loop/orchestration | category in core_loop/orchestration, tools present, approxTokens <= 3000, toolMessages <= 2 | grok | Cheap route for light tool-use |
| coding specialist promotion | category coding, approxTokens >= 8000, architecture signal regex match | glm5 | Large-context architecture-heavy coding |
| research/planning/reflection specialist promotion | category in research/planning/reflection, approxTokens >= 12000, deep-analysis signal regex match | glm5 | Deep comparative/citation-heavy analysis |
| complex non-specialist | no multimodal/specialist trigger | m25 | Complex text default |
| critical non-high-stakes non-specialist | no multimodal/specialist trigger | m25 | Cap critical non-high-stakes away from direct opus |
| Simple heartbeat | category heartbeat | nano | Pin routine heartbeat |
| Simple retrieval | category retrieval | nano | Pin routine lookup |
| Simple summarization text-only | category summarization, no multimodal | nano | Pin routine short summarization |
| Simple summarization multimodal | category summarization, multimodal | kimiK25 | Routine multimodal summarization |
| Simple coding | category coding; if tool chatter then grok, else dsCoder | dsCoder or grok | Cheap coding starter |
| Other simple categories | all other simple non-high-stakes | grok | Default strict budget model |
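
Two of the threshold rules in the table can be expressed as small predicates. This sketch covers only the multimodal rows and the light-tool row; it does not model rule precedence or the regex-based specialist signals, which the table does not spell out. The function and parameter names are hypothetical.

```javascript
const LONG_MULTIMODAL_TOKENS = 30000; // threshold from the table

// Hypothetical helper for the standard/complex/critical multimodal rows.
// Returns null when no multimodal rule from those rows applies.
function strictMultimodalTarget({ complexity, hasMultimodal, approxTokens }) {
  if (!hasMultimodal) return null;
  if (complexity === "standard") return "kimiK25";
  if (complexity === "complex" || complexity === "critical") {
    return approxTokens >= LONG_MULTIMODAL_TOKENS ? "gem31Pro" : "kimiK25";
  }
  return null; // simple-tier multimodal handling is category-specific
}

// Hypothetical predicate for the "standard light tool" row (target: grok).
function isLightToolRequest({ category, complexity, hasTools, approxTokens, toolMessages }) {
  const lightCategory = category === "core_loop" || category === "orchestration";
  return (
    complexity === "standard" &&
    lightCategory &&
    hasTools &&
    approxTokens <= 3000 &&
    toolMessages <= 2
  );
}
```
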
Premium blocking rules
When `ASTROLABE_ALLOW_DIRECT_PREMIUM_MODELS=false` for non-high-stakes:
- Direct `opus` routes are downgraded:
  - if adjusted complexity is `critical`: downgrade to value tier (`m25`)
  - otherwise: downgrade to `grok`
- Direct `sonnet` routes are downgraded to `grok` when:
  - adjusted complexity is `simple` or `standard`, or
  - strict mode + short prompt + no tools + no long context + no multimodal.
- Strict mode only: `gem31Pro` can be downgraded to `grok` for short, non-tool, non-multimodal conversational requests.
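
The premium blocking rules can be sketched as a single function over the already-selected model. This is an illustration only: `applyPremiumBlock` is a hypothetical name, the caller is assumed to have already checked that premium blocking is active and the request is not high-stakes, and the "short prompt", "long context", and "conversational" conditions are passed in as booleans because their thresholds are not stated here.

```javascript
// Hypothetical helper. Assumes ASTROLABE_ALLOW_DIRECT_PREMIUM_MODELS=false
// and a non-high-stakes request were already established by the caller.
function applyPremiumBlock(model, ctx) {
  const {
    complexity,
    strictMode = false,
    isShortPrompt = false,
    isConversational = false,
    hasTools = false,
    hasLongContext = false,
    hasMultimodal = false,
  } = ctx;

  if (model === "opus") {
    return complexity === "critical" ? "m25" : "grok";
  }
  if (model === "sonnet") {
    const lowTier = complexity === "simple" || complexity === "standard";
    const shortPlain =
      strictMode && isShortPrompt && !hasTools && !hasLongContext && !hasMultimodal;
    return lowTier || shortPlain ? "grok" : "sonnet";
  }
  if (model === "gem31Pro") {
    const shortChat =
      strictMode && isShortPrompt && isConversational && !hasTools && !hasMultimodal;
    return shortChat ? "grok" : "gem31Pro";
  }
  return model; // non-premium models pass through unchanged
}
```
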
Fallback chains
Astrolabe first tries the selected route model, then falls back by key order. When `hasMultimodal=true`, Astrolabe restricts fallback to a multimodal-safe subset of models.
Escalation logic
Self-check escalation is one-step maximum.
Escalate decision (shouldEscalateFromSelfCheck)
- Never escalate when forced model is active.
- Never escalate when score `>= 4`.
- Always escalate when score `<= 1`.
- Always escalate for `high_stakes` category.
- In strict mode, escalate for `complex/critical` even when score is 2-3.
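
One possible reading of the decision rules above is an ordered chain of early returns. The sketch assumes the rules are evaluated top to bottom (so a score of 4 or higher wins over the `high_stakes` rule) and that any case not covered by the list does not escalate; neither assumption is confirmed by the text. `shouldEscalate` and its parameter names are hypothetical.

```javascript
// Hypothetical sketch of the escalate decision.
// Assumption: rules apply in listed order; uncovered cases return false.
function shouldEscalate({ forcedModel, score, category, complexity, strictMode }) {
  if (forcedModel) return false; // forced model: never escalate
  if (score >= 4) return false; // confident enough
  if (score <= 1) return true; // very low confidence
  if (category === "high_stakes") return true;
  const heavy = complexity === "complex" || complexity === "critical";
  if (strictMode && heavy) return true; // score is 2-3 at this point
  return false; // assumed default for remaining cases
}
```
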
Escalation target (buildEscalationTarget)
Default path map:
m25 escalation override (m25EscalationTarget):
- Multimodal: `m25 -> kimiK25`, or `m25 -> gem31Pro` when `approxTokens >= 30000`.
- Coding specialist condition: `m25 -> glm5` when `coding` + `approxTokens >= 8000` + architecture signals.
- Research/planning/reflection specialist condition: `m25 -> glm5` when `approxTokens >= 12000` + deep-analysis signals.
- Otherwise: `m25 -> sonnet`.

- If score `<= 1`, non-strict modes jump to `opus`.
- Strict mode keeps non-high-stakes and non-critical escalation cost-aware (`m25`/specialist path) before premium.
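
The `m25` override rules above can be sketched as a short decision function. This is an illustration rather than the real `m25EscalationTarget`: the name `m25Target` is hypothetical, the rules are assumed to apply in the order listed, and the architecture and deep-analysis signals are passed in as booleans because their regexes are not shown here.

```javascript
const SPECIALIST_ANALYSIS_CATEGORIES = ["research", "planning", "reflection"];

// Hypothetical sketch: where an escalation starting from m25 would go.
function m25Target({
  hasMultimodal = false,
  approxTokens = 0,
  category = "",
  hasArchitectureSignal = false,
  hasDeepAnalysisSignal = false,
}) {
  if (hasMultimodal) {
    return approxTokens >= 30000 ? "gem31Pro" : "kimiK25";
  }
  if (category === "coding" && approxTokens >= 8000 && hasArchitectureSignal) {
    return "glm5";
  }
  if (
    SPECIALIST_ANALYSIS_CATEGORIES.includes(category) &&
    approxTokens >= 12000 &&
    hasDeepAnalysisSignal
  ) {
    return "glm5";
  }
  return "sonnet";
}
```
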
Config-to-logic map
| Variable | Affects Stage | Default | Effect | Interactions |
|---|---|---|---|---|
| ASTROLABE_ROUTING_PROFILE | Complexity adjustment | budget | Applies budget/balanced/quality complexity shift | Interacts with category injection-risk gates |
| ASTROLABE_COST_EFFICIENCY_MODE | Guardrail layer + escalation strictness | strict | Controls strict target routing and strict escalation posture | off bypasses guardrails; strict enables threshold rules |
| ASTROLABE_ALLOW_DIRECT_PREMIUM_MODELS | Guardrail premium-block layer | false | Blocks direct non-high-stakes sonnet/opus starts | Works after base route and strict targeting |
| ASTROLABE_ENABLE_SAFETY_GATE | Safety pre-classification | true | Enables/disables high-stakes signal trigger path | If disabled, high-stakes depends on classifier/heuristics only |
| ASTROLABE_HIGH_STAKES_CONFIRM_MODE | High-stakes execution safety | prompt | prompt injects policy, strict requires confirmation token, off disables confirm handling | Uses ASTROLABE_HIGH_STAKES_CONFIRM_TOKEN in strict mode |
| ASTROLABE_HIGH_STAKES_CONFIRM_TOKEN | High-stakes confirmation check | confirm | Required literal token in header/body when strict mode is active | Ignored unless confirm mode is strict |
| ASTROLABE_ALLOW_HIGH_STAKES_BUDGET_FLOOR | High-stakes base route | false | Allows sonnet floor instead of always opus in budget profile | Requires ASTROLABE_ROUTING_PROFILE=budget |
| ASTROLABE_CLASSIFIER_MODEL_KEY | Classification model selection | nano | Preferred classifier model key; fallback chain still applies | Unknown keys are skipped; built-in chain remains |
| ASTROLABE_SELF_CHECK_MODEL_KEY | Self-check model selection | nano | Preferred self-check model key; fallback chain still applies | Unknown keys are skipped; built-in chain remains |
| ASTROLABE_CONTEXT_MESSAGES | Classifier context construction | 8 | Max recent messages included in classifier prompt | Clamped to 3-20 |
| ASTROLABE_CONTEXT_CHARS | Classifier context construction | 2500 | Max recent context characters for classifier prompt | Clamped to 600-12000 |
| ASTROLABE_RATE_LIMIT_ENABLED | Request admission gate | false | Enables in-memory request limiter for chat completions endpoint | If enabled, over-budget requests return 429 rate_limit_exceeded |
| ASTROLABE_RATE_LIMIT_WINDOW_MS | Request admission gate | 60000 | Rate-limit window duration in ms | Clamped to 1000-3600000 |
| ASTROLABE_RATE_LIMIT_MAX_REQUESTS | Request admission gate | 120 | Max requests per key inside each window | Clamped to 1-100000 |
| ASTROLABE_FORCE_MODEL | Full routing override | empty | Bypasses routing/classifier/escalation and pins one upstream model id | Sets initial/final model to forced id |
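
Several numeric settings in the table are described as clamped to a range. The sketch below shows one way to read an integer setting with a default and a clamp, using the defaults and ranges from the table. `readClampedInt` and `loadNumericLimits` are hypothetical helpers, and falling back to the default on unparseable input is an assumption.

```javascript
// Hypothetical helper: parse an integer, fall back to the default when the
// value is missing or not a number, then clamp into [min, max].
function readClampedInt(rawValue, fallback, min, max) {
  const parsed = Number.parseInt(rawValue, 10);
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

// Defaults and ranges transcribed from the table above.
function loadNumericLimits(env) {
  return {
    contextMessages: readClampedInt(env.ASTROLABE_CONTEXT_MESSAGES, 8, 3, 20),
    contextChars: readClampedInt(env.ASTROLABE_CONTEXT_CHARS, 2500, 600, 12000),
    rateLimitWindowMs: readClampedInt(env.ASTROLABE_RATE_LIMIT_WINDOW_MS, 60000, 1000, 3600000),
    rateLimitMaxRequests: readClampedInt(env.ASTROLABE_RATE_LIMIT_MAX_REQUESTS, 120, 1, 100000),
  };
}
```

In a Node process this would typically be called as `loadNumericLimits(process.env)`; passing a plain object instead makes the clamping easy to check in isolation.
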
Worked examples
- Simple retrieval
  - Input shape: short “find X” lookup, no tools, no multimodal.
  - Outcome: `retrieval/simple -> nano` (strict guardrail keeps `nano`).
- Standard planning, text-only
  - Input shape: multi-step planning request, no multimodal.
  - Outcome: `planning/standard -> m25`.
- Standard core loop with light tools
  - Input shape: tool-enabled core loop, <=2 tool messages, <=3000 tokens.
  - Outcome: base route `m25`, strict guardrail retargets to `grok`.
- Complex multimodal long context
  - Input shape: multimodal request with approx context >=30000 tokens.
  - Outcome: strict multimodal rule promotes to `gem31Pro` (instead of `kimiK25`).
- High-stakes critical action
  - Input shape: transfer/delete/legal-action intent with strong safety signals.
  - Outcome: category forced to `high_stakes` and routed to `opus` (or `sonnet` floor only if budget-floor override is enabled); strict confirm mode can require explicit token.

