OpenRouter Free Models Tracker
Track free OpenRouter models, compare scenario fit, and find practical options for development and assistant use.
OpenRouter hosts a broad ecosystem of models. The key question is not whether free models exist, but which free models fit a given use case today.
This page is a practical tracker focused on:
- Coding and engineering-related use
- Roleplay / chat scenario prompts
- JSON and structured output use cases
- Long-context use cases
The list is generated from OpenRouter model metadata and filtered to models whose input and output pricing are both zero. The project scheduler can refresh the data automatically.
What is OpenRouter?
OpenRouter is a unified API gateway providing access to hundreds of AI models from OpenAI, Anthropic, Google, Meta, Mistral, and open-source alternatives through a single standardized interface.
Instead of integrating each provider separately, developers use one API key and one endpoint. OpenRouter handles routing, fallbacks, and billing. This makes it especially useful for:
- Comparing model quality and cost across providers through one API
- Building cost-optimized pipelines that route to free or cheap models first
- Accessing models only available through specific providers
- Experimenting with new models without separate API key provisioning
Understanding Free Models on OpenRouter
Free models on OpenRouter have both input and output pricing set to 0. This is not a permanent guarantee — providers may change pricing, rate limits, or retire free tiers at any time.
Why Free Models Matter for Real Projects
Free models are not just for hobbyists. Many production scenarios benefit from free-tier routing:
- Development and testing: Run CI pipelines and evaluation scripts without burning token budgets.
- Long-tail queries: Route low-value or repetitive queries to free models, reserving paid capacity for high-stakes requests.
- Fallback chains: Configure a primary paid model with a free fallback for rate-limit or downtime scenarios.
- Research and benchmarking: Compare multiple models on the same dataset without cost constraints.
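The fallback-chain idea in the list above can be sketched as a small helper. This is a minimal illustration under stated assumptions, not a built-in OpenRouter feature: `complete_with_fallback`, the `send` callable, and the `example/...` model IDs are hypothetical names, and `fake_send` stands in for a real API call so the sketch runs offline.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    send: Callable[[str, str], str],
    models: Sequence[str],
    prompt: str,
) -> tuple[str, str]:
    """Try each model in order and return (model_id, reply) from the first success.

    `send(model_id, prompt)` is a hypothetical callable that wraps the real API
    call and raises an exception on rate limits or downtime.
    """
    errors = []
    for model_id in models:
        try:
            return model_id, send(model_id, prompt)
        except Exception as exc:  # in real code, catch the SDK's specific errors
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real client call: the "paid" model is rate-limited here.
def fake_send(model_id: str, prompt: str) -> str:
    if model_id == "example/paid-model":
        raise TimeoutError("rate limited")
    return f"[{model_id}] echo: {prompt}"


used, reply = complete_with_fallback(
    fake_send, ["example/paid-model", "example/free-model:free"], "Hello"
)
print(used, reply)
```

In a real pipeline, `send` would wrap the client call and the model list would come from configuration, so the order of the chain can change without code edits.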
Rate Limits and Fair Use
Free models typically carry rate limits — requests per minute, tokens per day, or concurrent connections. These limits vary by provider and model. Verify limits directly on the OpenRouter model page before building a production dependency.
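One common way to live within rate limits is to retry with exponential backoff. The sketch below assumes the caller translates a rate-limit response into the hypothetical `RateLimited` exception; `call_with_backoff` and its parameters are illustrative names, and the delays are placeholders rather than values recommended by OpenRouter.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error type standing in for an HTTP 429 response."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Exponential backoff with jitter so parallel workers spread out.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Demo: fail twice, then succeed. Waits are recorded instead of slept.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits: list[float] = []
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.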
How to Access These Models
All free models are accessible through the standard OpenRouter API. The base URL is https://openrouter.ai/api/v1 and the interface is compatible with the OpenAI SDK.
Basic Python setup:
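from openai import OpenAI

# Point the OpenAI SDK at the OpenRouter endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    # Placeholder: use the model's API ID, not the display name shown in the tables.
    model="model-id-from-table-below",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)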
Filter free models via CLI:
curl https://openrouter.ai/api/v1/models | \
jq '[.data[] | select(.pricing.prompt == "0" and .pricing.completion == "0")]'
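The same filter can be expressed in Python once the response has been parsed. This is a sketch that assumes the records look like the entries of the `data` array used by the `jq` command, with string prices under `pricing.prompt` and `pricing.completion` and an integer `context_length`; `is_free` and `free_models` are hypothetical helper names, and the sample records are hand-written rather than real API output.

```python
from typing import Any


def is_free(model: dict[str, Any]) -> bool:
    """Return True when both prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        return float(pricing["prompt"]) == 0.0 and float(pricing["completion"]) == 0.0
    except (KeyError, TypeError, ValueError):
        return False  # missing or malformed pricing is treated as not free


def free_models(models: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep free models, largest context window first."""
    free = [m for m in models if is_free(m)]
    return sorted(free, key=lambda m: m.get("context_length") or 0, reverse=True)


# Hand-written sample records with hypothetical IDs, shaped like the `data` array.
sample = [
    {"id": "example/a:free", "context_length": 131072,
     "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/b", "context_length": 200000,
     "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
    {"id": "example/c:free", "context_length": 262144,
     "pricing": {"prompt": "0", "completion": "0"}},
]
print([m["id"] for m in free_models(sample)])
# → ['example/c:free', 'example/a:free']
```

Unlike the exact string comparison in the `jq` filter, parsing with `float` also accepts values such as `"0.0"`, which may or may not be what a given pipeline wants.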
Choosing the Right Free Model
For Coding Tasks
Look for high context length, tool-calling support, and JSON mode. Key benchmarks: HumanEval, MBPP, LiveCodeBench. Longer context lets the model see more of your codebase at once.
For JSON and Structured Output
Prioritize models that explicitly support JSON mode or tool calling. Test with complex nested schemas — some models produce syntactically valid but semantically incorrect JSON at depth.
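A quick way to catch replies that parse as JSON but do not match the expected nested shape is a structural check after `json.loads`. The sketch below uses a hypothetical mini-schema format (Python types, dicts of required keys, one-element lists) invented for illustration; it is not JSON Schema, and a production setup would more likely use a dedicated validation library.

```python
import json
from typing import Any


def find_schema_errors(value: Any, schema: Any, path: str = "$") -> list[str]:
    """Compare parsed JSON against a tiny schema made of types, dicts, and lists.

    A schema is a Python type (str, int, ...), a dict of required keys, or a
    one-element list describing every item. This is a hypothetical mini-format,
    not JSON Schema.
    """
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = []
        for key, sub in schema.items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(find_schema_errors(value[key], sub, f"{path}.{key}"))
        return errors
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        errors = []
        for i, item in enumerate(value):
            errors.extend(find_schema_errors(item, schema[0], f"{path}[{i}]"))
        return errors
    # bool is a subclass of int in Python, so reject it unless asked for.
    if isinstance(value, bool) and schema is not bool:
        return [f"{path}: expected {schema.__name__}"]
    if not isinstance(value, schema):
        return [f"{path}: expected {schema.__name__}"]
    return []


schema = {"order": {"id": int, "items": [{"sku": str, "qty": int}]}}
# Syntactically valid JSON whose nested `qty` has the wrong type.
reply = '{"order": {"id": 7, "items": [{"sku": "A1", "qty": "two"}]}}'
errors = find_schema_errors(json.loads(reply), schema)
print(errors)
# → ['$.order.items[0].qty: expected int']
```

Running a check like this over a batch of test prompts gives a rough per-model failure rate for the schema depth you actually need.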
For Long-Context Tasks
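Context length in the table is the model's advertised context window, which generally has to hold the prompt and the completion together. Effective quality can degrade well before the window is full, often somewhere past 50–70% of the advertised length. Test with your actual document lengths before committing.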
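The rule of thumb about the advertised window can be turned into a simple budget calculation. This is a sketch only: `usable_input_tokens` is a hypothetical helper, the default `utilization` of 0.5 is taken from the lower end of the range mentioned above, and `reserved_output` is an assumed allowance for the model's reply.

```python
def usable_input_tokens(
    context_length: int,
    reserved_output: int,
    utilization: float = 0.5,
) -> int:
    """Estimate an input budget that stays inside a conservative share of the window."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    budget = int(context_length * utilization) - reserved_output
    return max(budget, 0)


# A 262144-token window with 4096 tokens held back for the reply.
print(usable_input_tokens(262144, reserved_output=4096))
# → 126976
print(usable_input_tokens(262144, reserved_output=4096, utilization=0.7))
# → 179404
```

The numbers are planning estimates, not guarantees; measuring quality on your own documents remains the deciding test.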
For Roleplay and Chat
Instruction-following quality and personality consistency matter most. Models trained with RLHF or DPO tend to produce more natural conversational responses.
Tips for Production Use
- Pin model versions: use the full model ID, including any version suffix, to avoid silent model swaps.
- Implement fallbacks: free models can become rate-limited, so build a fallback chain into your routing logic.
- Cache responses: for repeated identical queries, application-layer caching eliminates redundant API calls.
- Use streaming: streaming reduces time-to-first-token and improves perceived performance.
- Monitor usage: requests to free models still show up in your OpenRouter usage dashboard, so track them to spot unexpected spikes.
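The caching tip can be sketched as a thin wrapper around whatever function sends the request. This is a minimal in-memory example under stated assumptions: `ResponseCache` and its `send` callable are hypothetical names, the lambda stands in for a real API call, and exact-match caching only makes sense when identical requests should return identical replies.

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """In-memory cache keyed by a hash of the model ID and message list."""

    def __init__(self, send: Callable[[str, list[dict]], str]) -> None:
        self._send = send  # hypothetical callable wrapping the real API call
        self._store: dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def _key(model: str, messages: list[dict]) -> str:
        # sort_keys makes the key stable regardless of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, messages: list[dict]) -> str:
        key = self._key(model, messages)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._send(model, messages)
        return self._store[key]


cache = ResponseCache(lambda model, messages: f"reply from {model}")
msgs = [{"role": "user", "content": "Hello"}]
first = cache.complete("example/free-model:free", msgs)
second = cache.complete("example/free-model:free", msgs)
print(first == second, cache.misses)
# → True 1
```

A persistent store with expiry would replace the dictionary in a long-running service; the keying scheme stays the same.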
Related Resources
- OpenRouter documentation: full API reference and model metadata schema
- OpenRouter model rankings: model popularity by usage, broken down by category
- MCP Hub: browse MCP servers that integrate with OpenRouter for agent workflows
Last update
2026-06-09T20:04:25.354Z
Coding
| Model | Provider | Context Length | Price (prompt/completion) | Note |
|---|---|---|---|---|
| Owl Alpha | Unknown | 1048756 | 0/0 | Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-co |
| Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt |
| Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co |
| Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin |
| MoonshotAI: Kimi K2.6 (free) | Unknown | 262144 | 0/0 | Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX gener |
| Qwen: Qwen3 Next 80B A3B Instruct (free) | Unknown | 262144 | 0/0 | Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo |
| NVIDIA: Nemotron 3 Nano 30B A3B (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers |
Roleplay
| Model | Provider | Context Length | Price (prompt/completion) | Note |
|---|---|---|---|---|
| Nous: Hermes 3 405B Instruct (free) | Unknown | 131072 | 0/0 | Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m |
JSON
| Model | Provider | Context Length | Price (prompt/completion) | Note |
|---|---|---|---|---|
| Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt |
| Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co |
| Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin |
| Google: Gemma 4 31B (free) | Unknown | 262144 | 0/0 | Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. |
| NVIDIA: Nemotron 3 Nano Omni (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age |
Long Context
| Model | Provider | Context Length | Price (prompt/completion) | Note |
|---|---|---|---|---|
| Owl Alpha | Unknown | 1048756 | 0/0 | Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-co |
| Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt |
| NVIDIA: Nemotron 3 Ultra (free) | Unknown | 1000000 | 0/0 | NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters ou |
| Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co |
| Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin |
| MoonshotAI: Kimi K2.6 (free) | Unknown | 262144 | 0/0 | Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX gener |
| Google: Gemma 4 31B (free) | Unknown | 262144 | 0/0 | Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. |
| Qwen: Qwen3 Next 80B A3B Instruct (free) | Unknown | 262144 | 0/0 | Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo |
| NVIDIA: Nemotron 3 Nano Omni (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age |
| OpenAI: gpt-oss-120b (free) | Unknown | 131072 | 0/0 | gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-rea |
| Meta: Llama 3.2 3B Instruct (free) | Unknown | 131072 | 0/0 | Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language process |
| Nous: Hermes 3 405B Instruct (free) | Unknown | 131072 | 0/0 | Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m |
| NVIDIA: Nemotron Nano 12B 2 VL (free) | Unknown | 128000 | 0/0 | NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and |
| NVIDIA: Nemotron Nano 9B V2 (free) | Unknown | 128000 | 0/0 | NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified mod |
| LiquidAI: LFM2.5-1.2B-Thinking (free) | Unknown | 32768 | 0/0 | LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—whil |
