This page is the source of truth for Astrolabe’s model roster, the routing role of each model, and how benchmark results are interpreted. Last updated: February 25, 2026.

Snapshot and source policy

  1. Runtime availability, context, and modality data come from a snapshot of the OpenRouter models API taken on February 25, 2026.
  2. Cost values come from MODELS in Astrolabe's server.js and are cross-checked against OpenRouter metadata.
  3. Benchmark signals follow a mixed evidence policy, and each signal is labeled with its evidence type:
    • Direct: evidence exists for the exact model and version
    • Family: nearest proxy from the same model family or a nearby version
    • Vendor: self-reported claim from the vendor or its model card
  4. Missing exact benchmark rows are shown as N/A rather than filled with inferred values.
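Items 1 and 2 imply a mechanical cross-check between the documented prices and the OpenRouter snapshot. The TypeScript below is a sketch under stated assumptions: the ModelPricing shape and the crossCheckPrices helper are hypothetical (the real MODELS entries in server.js may be structured differently), and OpenRouterModel models only the id and pricing fields of an entry in the data array returned by OpenRouter's models API, which at the time of writing reports prompt and completion prices as decimal strings in USD per single token. The function takes an already-fetched snapshot so it can run offline against a saved copy.

```typescript
// Hypothetical shape for Astrolabe's documented prices (USD per 1M tokens).
// The real MODELS entries in server.js may be structured differently.
interface ModelPricing {
  id: string; // OpenRouter model ID, e.g. "minimax/minimax-m2.5"
  inputPer1M: number;
  outputPer1M: number;
}

// Only the fields this check needs from an OpenRouter models API entry.
// OpenRouter reports prices as decimal strings in USD per single token.
interface OpenRouterModel {
  id: string;
  pricing: { prompt: string; completion: string };
}

interface PriceMismatch {
  id: string;
  field: "input" | "output" | "missing";
  documented?: number;
  snapshot?: number;
}

function crossCheckPrices(
  documented: ModelPricing[],
  snapshot: OpenRouterModel[],
  relTol = 1e-6,
): PriceMismatch[] {
  const byId = new Map<string, OpenRouterModel>(
    snapshot.map((m): [string, OpenRouterModel] => [m.id, m]),
  );
  // Relative tolerance absorbs float noise from the per-token -> per-1M conversion.
  const close = (a: number, b: number): boolean =>
    Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b), 1e-12);

  const mismatches: PriceMismatch[] = [];
  for (const doc of documented) {
    const live = byId.get(doc.id);
    if (!live) {
      mismatches.push({ id: doc.id, field: "missing" }); // absent from the snapshot
      continue;
    }
    const liveIn = Number(live.pricing.prompt) * 1_000_000;
    const liveOut = Number(live.pricing.completion) * 1_000_000;
    if (!close(doc.inputPer1M, liveIn)) {
      mismatches.push({ id: doc.id, field: "input", documented: doc.inputPer1M, snapshot: liveIn });
    }
    if (!close(doc.outputPer1M, liveOut)) {
      mismatches.push({ id: doc.id, field: "output", documented: doc.outputPer1M, snapshot: liveOut });
    }
  }
  return mismatches;
}

// Illustrative data: per-token strings equivalent to $0.30 / $1.10 per 1M tokens.
const snapshot: OpenRouterModel[] = [
  { id: "minimax/minimax-m2.5", pricing: { prompt: "0.0000003", completion: "0.0000011" } },
];
const docs: ModelPricing[] = [
  { id: "minimax/minimax-m2.5", inputPer1M: 0.3, outputPer1M: 1.1 },
  { id: "z-ai/glm-5", inputPer1M: 0.95, outputPer1M: 2.55 },
];
console.log(crossCheckPrices(docs, snapshot)); // one "missing" entry, for z-ai/glm-5
```

Reporting a model that is absent from the snapshot as its own mismatch keeps a renamed or delisted model ID from passing the check silently.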

Runtime model matrix

| Key | Model ID | Provider name | Tier | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) | Input modalities | Tools/reasoning support | Primary routing role |
|---|---|---|---|---|---|---|---|---|---|
| opus | anthropic/claude-opus-4.6 | Anthropic: Claude Opus 4.6 | PREMIUM | 5.00 | 25.00 | 1,000,000 | text, image | Yes | High-stakes floor and final escalation |
| sonnet | anthropic/claude-sonnet-4.6 | Anthropic: Claude Sonnet 4.6 | STANDARD | 3.00 | 15.00 | 1,000,000 | text, image | Yes | Escalation layer and optional high-stakes budget floor |
| m25 | minimax/minimax-m2.5 | MiniMax: MiniMax M2.5 | VALUE | 0.30 | 1.10 | 196,608 | text | Yes | Primary standard/complex text workhorse |
| kimiK25 | moonshotai/kimi-k2.5 | MoonshotAI: Kimi K2.5 | VALUE | 0.45 | 2.20 | 262,144 | text, image | Yes | Multimodal specialist default |
| glm5 | z-ai/glm-5 | Z.ai: GLM 5 | VALUE | 0.95 | 2.55 | 204,800 | text | Yes | Large-context engineering and deep text-heavy specialist |
| grok | x-ai/grok-4.1-fast | xAI: Grok 4.1 Fast | BUDGET | 0.20 | 0.50 | 2,000,000 | text, image | Yes | Conversational and light-tool budget route |
| nano | openai/gpt-5-nano | OpenAI: GPT-5 Nano | ULTRA-CHEAP | 0.05 | 0.40 | 400,000 | text, image, file | Yes | Classifier/self-check default and trivial simple routes |
| dsCoder | deepseek/deepseek-v3.2 | DeepSeek: DeepSeek V3.2 | ULTRA-CHEAP | 0.25 | 0.40 | 163,840 | text | Yes | Simple coding starter route |
| gemFlash | google/gemini-3-flash-preview | Google: Gemini 3 Flash Preview | MID-TIER | 0.50 | 3.00 | 1,048,576 | text, image, file, audio, video | Yes | Conditional fallback/specialist path, not cheap-first default |
| gem31Pro | google/gemini-3.1-pro-preview | Google: Gemini 3.1 Pro Preview | MID-TIER | 2.00 | 12.00 | 1,048,576 | text, image, file, audio, video | Yes | Very-long-context multimodal specialist/escalation |
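The per-1M prices in the matrix turn into a per-request estimate by scaling each side by its token count. A minimal TypeScript sketch, assuming a hypothetical PRICES_PER_1M record and estimateCostUsd helper (neither is Astrolabe's code) and ignoring prompt caching or any other pricing adjustment:

```typescript
// Per-1M-token prices (USD) transcribed from the runtime model matrix.
// This record is illustrative; it is not Astrolabe's MODELS object.
const PRICES_PER_1M: Record<string, { input: number; output: number }> = {
  opus: { input: 5.0, output: 25.0 },
  sonnet: { input: 3.0, output: 15.0 },
  m25: { input: 0.3, output: 1.1 },
  kimiK25: { input: 0.45, output: 2.2 },
  glm5: { input: 0.95, output: 2.55 },
  grok: { input: 0.2, output: 0.5 },
  nano: { input: 0.05, output: 0.4 },
  dsCoder: { input: 0.25, output: 0.4 },
  gemFlash: { input: 0.5, output: 3.0 },
  gem31Pro: { input: 2.0, output: 12.0 },
};

// cost = (inputTokens * inputPrice + outputTokens * outputPrice) / 1,000,000
// Ignores prompt caching and any other provider-side pricing adjustments.
function estimateCostUsd(key: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES_PER_1M[key];
  if (!p) throw new Error(`unknown model key: ${key}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Example: 10,000 input tokens and 2,000 output tokens.
for (const key of ["nano", "m25", "opus"]) {
  console.log(key, estimateCostUsd(key, 10_000, 2_000).toFixed(4));
}
// nano 0.0013
// m25 0.0052
// opus 0.1000
```

Because output prices range from 1.6x (dsCoder) to 8x (nano) the input price, the token mix can reorder models: for 10,000 input and 2,000 output tokens grok ($0.0030) is cheaper than dsCoder ($0.0033), but for 1,000 input and 10,000 output tokens dsCoder ($0.00425) is cheaper than grok ($0.0052).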

Benchmark signal matrix

| Key | SWE-bench signal | BFCL signal | Other signal | Confidence | Notes |
|---|---|---|---|---|---|
| m25 | mini-SWE-agent + MiniMax M2.5 (high reasoning) = 75.8% resolved | N/A (no exact row in current BFCL overall snapshot) | MiniMax launch claims: 80.2 SWE-bench Verified, 76.9 BFCL multi-turn, 76.3 BrowseComp | Vendor + Family | Strong directional signal; treat vendor claims as non-independent |
| grok | N/A (no row found in SWE-bench board snapshot) | Grok-4-1-fast-reasoning (FC): rank 5, overall 69.57%, multi-turn 58.87% | OpenRouter metadata shows low cost and 2M context | Direct BFCL (family SWE unavailable) | Strong tool-calling signal; no direct SWE row in snapshot |
| kimiK25 | mini-SWE-agent + Kimi K2.5 (high reasoning) = 70.8% | Nearest BFCL row: Moonshotai-Kimi-K2-Instruct (FC) | Kimi model-card evidence available | Direct SWE + Family BFCL | Use as multimodal specialist; BFCL proxy is K2 family |
| glm5 | mini-SWE-agent + GLM-5 (high reasoning) = 72.8% | Nearest BFCL row: GLM-4.6 (FC thinking): rank 4, overall 72.38%, multi-turn 68.00% | GLM model-card evidence available | Direct SWE + Family BFCL | No exact GLM-5 BFCL row in snapshot |
| gem31Pro | mini-SWE-agent + Gemini 3 Pro Preview = 74.2% (live-SWE-agent row: 77.4%) | Gemini-3-Pro-Preview rows (Prompt/FC) | Large multimodal context support per OpenRouter | Family | Strong but expensive; used conditionally |
| sonnet | Nearest rows: Claude 4.5 Sonnet cluster on SWE-bench | Claude-Sonnet-4-5-20250929 (FC): rank 2, overall 73.24%, multi-turn 61.37% | High quality but materially higher cost | Family | Escalation-focused in Astrolabe policy |
| opus | Includes mini-SWE-agent + Claude Opus 4.6 = 75.6%, plus higher-scoring Claude Opus 4.5 cluster entries | Claude-Opus-4-5-20251101 (FC): rank 1, overall 77.47%, multi-turn 68.38% | Highest-cost tier in roster | Direct/Family mixed | Reserved for high-stakes or final escalation |
| nano | mini-SWE-agent + GPT-5 nano (2025-08-07) = 34.8% | GPT-5-nano-2025-08-07 (FC): rank 24, overall 51.45%, multi-turn 34.50% | Lowest-cost classifier/self-check role | Family | Kept away from hard reasoning/tool chains |
| dsCoder | mini-SWE-agent + DeepSeek V3.2 (high reasoning) = 70.0% | DeepSeek-V3.2-Exp rows (FC/Prompt+Thinking) | Very low cost for simple coding starts | Family | Escalates quickly for non-trivial tasks |
| gemFlash | mini-SWE-agent + Gemini 3 Flash (high reasoning) = 75.8% | N/A (no direct Gemini 3 Flash row in BFCL snapshot) | Broad multimodal input, but higher output cost than budget/value models | Family/Partial | Fallback/specialist, not cheap-first route |

Benchmark row mapping used on this page

  • m25: SWE signal mini-SWE-agent + MiniMax M2.5 (high reasoning), 75.8%; vendor claims from the MiniMax launch post (80.2 SWE-bench Verified, 76.3 BrowseComp, 76.9 BFCL multi-turn); confidence Vendor + Family.
  • grok: BFCL signal Grok-4-1-fast-reasoning (FC); no SWE row found in the SWE-bench board snapshot; confidence Direct BFCL, no SWE.
  • kimiK25: SWE signal mini-SWE-agent + Kimi K2.5 (high reasoning), 70.8%; BFCL nearest Kimi K2 family entry; confidence Direct SWE + Family BFCL.
  • glm5: SWE signal mini-SWE-agent + GLM-5 (high reasoning), 72.8%; BFCL nearest GLM-4.6 (FC thinking); confidence Direct SWE + Family BFCL.
  • gem31Pro: SWE signal mini-SWE-agent + Gemini 3 Pro Preview, 74.2% (plus live-SWE-agent, 77.4%); BFCL Gemini-3-Pro-Preview (FC/Prompt); confidence Family.
  • sonnet: SWE nearest Claude 4.5 Sonnet entries; BFCL Claude-Sonnet-4-5 entry; confidence Family.
  • opus: SWE includes mini-SWE-agent + Claude Opus 4.6, 75.6%; BFCL nearest Claude-Opus-4-5; confidence Direct/Family mixed.
  • nano: SWE GPT-5 nano row; BFCL GPT-5-nano-2025-08-07; confidence Family.
  • dsCoder: SWE DeepSeek V3.2 row; BFCL DeepSeek-V3.2-Exp rows; confidence Family.
  • gemFlash: SWE Gemini 3 Flash row; no direct BFCL entry found in the snapshot; confidence Family/Partial.
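Several rows in the mapping combine evidence types (for example, Direct SWE + Family BFCL), which suggests tagging provenance per cell rather than per model. The TypeScript below is a hypothetical sketch of that idea: the Evidence, BenchmarkCell, and formatCell names are illustrative and not taken from Astrolabe's code. A missing row is its own variant, so it can only render as N/A and never as an inferred number.

```typescript
// Provenance labels from the snapshot and source policy.
type Evidence = "Direct" | "Family" | "Vendor";

// A cell either carries a score with its provenance, or is explicitly missing.
// There is no variant for an estimated value, so N/A cannot be silently filled.
type BenchmarkCell =
  | { kind: "value"; score: number; evidence: Evidence; source: string }
  | { kind: "missing" };

function formatCell(cell: BenchmarkCell): string {
  if (cell.kind === "missing") return "N/A";
  return `${cell.score}% (${cell.evidence}: ${cell.source})`;
}

// Hypothetical encoding of a few rows from the mapping list.
const swe: Record<string, BenchmarkCell> = {
  kimiK25: { kind: "value", score: 70.8, evidence: "Direct", source: "mini-SWE-agent + Kimi K2.5 (high reasoning)" },
  grok: { kind: "missing" },
};
const bfcl: Record<string, BenchmarkCell> = {
  glm5: { kind: "value", score: 72.38, evidence: "Family", source: "GLM-4.6 (FC thinking)" },
};

console.log(formatCell(swe["kimiK25"])); // 70.8% (Direct: mini-SWE-agent + Kimi K2.5 (high reasoning))
console.log(formatCell(swe["grok"])); // N/A
console.log(formatCell(bfcl["glm5"])); // 72.38% (Family: GLM-4.6 (FC thinking))
```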

Interpretation note

Benchmark rows are not directly comparable: they differ in agent harness, tool scaffolding, prompt strategy, and model version/date. Astrolabe routing should therefore follow observed runtime outcomes (cost-first, with specialist promotion rules) rather than the rank order on any single leaderboard.
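A "cost-first with specialist promotion" policy can be sketched as three steps: filter candidates by modality and context fit, apply any promotion rule whose specialist is eligible, and otherwise take the cheapest expected cost. The TypeScript below is a hypothetical illustration of that shape only: the Candidate and RouteRequest types, the single image-to-kimiK25 promotion rule, and the route helper are assumptions, not Astrolabe's routing logic in server.js. The real policy also weighs task class, which is why the routing roles in the runtime matrix do not simply follow price. Prices, context windows, and modalities are transcribed from that matrix.

```typescript
// Hypothetical candidate shape; values below come from the runtime model matrix.
interface Candidate {
  key: string;
  inputPer1M: number; // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
  contextTokens: number;
  modalities: string[];
}

// Hypothetical request descriptor used only for this sketch.
interface RouteRequest {
  inputTokens: number;
  expectedOutputTokens: number;
  modalities: string[];
}

// Eligible if every requested modality is supported and the request fits the context window.
function fits(c: Candidate, r: RouteRequest): boolean {
  return (
    r.modalities.every((m) => c.modalities.includes(m)) &&
    r.inputTokens + r.expectedOutputTokens <= c.contextTokens
  );
}

function expectedCost(c: Candidate, r: RouteRequest): number {
  return (r.inputTokens * c.inputPer1M + r.expectedOutputTokens * c.outputPer1M) / 1_000_000;
}

// Assumed promotion rule: image input goes to the multimodal specialist default.
const PROMOTIONS: { when: (r: RouteRequest) => boolean; key: string }[] = [
  { when: (r) => r.modalities.includes("image"), key: "kimiK25" },
];

function route(pool: Candidate[], r: RouteRequest): string | undefined {
  const eligible = pool.filter((c) => fits(c, r));
  // Specialist promotion takes precedence over price when the specialist is eligible.
  for (const p of PROMOTIONS) {
    if (p.when(r) && eligible.some((c) => c.key === p.key)) return p.key;
  }
  // Otherwise cost-first: cheapest expected cost among eligible candidates.
  eligible.sort((a, b) => expectedCost(a, r) - expectedCost(b, r));
  return eligible[0]?.key; // undefined when nothing fits
}

// nano is left out: the matrix limits it to classifier/self-check and trivial routes.
const pool: Candidate[] = [
  { key: "grok", inputPer1M: 0.2, outputPer1M: 0.5, contextTokens: 2_000_000, modalities: ["text", "image"] },
  { key: "m25", inputPer1M: 0.3, outputPer1M: 1.1, contextTokens: 196_608, modalities: ["text"] },
  { key: "kimiK25", inputPer1M: 0.45, outputPer1M: 2.2, contextTokens: 262_144, modalities: ["text", "image"] },
  { key: "gem31Pro", inputPer1M: 2.0, outputPer1M: 12.0, contextTokens: 1_048_576, modalities: ["text", "image", "file", "audio", "video"] },
];

console.log(route(pool, { inputTokens: 5_000, expectedOutputTokens: 1_000, modalities: ["text", "image"] })); // kimiK25
console.log(route(pool, { inputTokens: 300_000, expectedOutputTokens: 4_000, modalities: ["text"] })); // grok
console.log(route(pool, { inputTokens: 5_000, expectedOutputTokens: 1_000, modalities: ["text", "audio"] })); // gem31Pro
```

Because promotion runs before the price sort, the image request lands on kimiK25 even though grok would be cheaper for the same tokens; the 300,000-token text request skips m25 and kimiK25 only because it exceeds their context windows.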

Sources