OpenRouter Free Models Tracker

Track free OpenRouter models, compare scenario fit, and find practical options for development and assistant use.


OpenRouter hosts a broad model ecosystem, but the practical question is not whether free models exist; it is which free models fit your use case today.

This page is a practical tracker focused on:

Coding and engineering-related use
Roleplay / chat scenario prompts
JSON and structured output use cases
Long-context use cases

The list is generated from OpenRouter model metadata and filtered to models whose prompt (input) and completion (output) prices are both zero. The project's scheduler can refresh the data automatically.
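A minimal sketch of that free-pricing filter follows. It assumes the public https://openrouter.ai/api/v1/models response wraps model entries in a "data" array and gives each entry "pricing.prompt" and "pricing.completion" as decimal strings (the same fields the jq filter later on this page relies on). The helper names fetch_models, is_zero, and free_models are hypothetical, and the scheduling itself is not shown.

```python
import json
import urllib.request
from decimal import Decimal, InvalidOperation

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url: str = MODELS_URL) -> list[dict]:
    """Download model metadata; assumes the JSON body looks like {"data": [...]}."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def is_zero(price) -> bool:
    """Treat "0", "0.0", 0, etc. as free; missing or unparsable prices count as not free."""
    try:
        return Decimal(str(price)) == 0
    except (InvalidOperation, TypeError):
        return False


def free_models(models: list[dict]) -> list[dict]:
    """Keep only models whose prompt AND completion prices are both zero."""
    return [
        m for m in models
        if is_zero(m.get("pricing", {}).get("prompt"))
        and is_zero(m.get("pricing", {}).get("completion"))
    ]
```

Calling free_models(fetch_models()) returns the current free subset. Comparing prices as Decimal rather than as strings means a value such as "0.0" is still recognized as free.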

What is OpenRouter?

OpenRouter is a unified API gateway providing access to hundreds of AI models from OpenAI, Anthropic, Google, Meta, Mistral, and open-source alternatives through a single standardized interface.

Instead of integrating each provider separately, developers use one API key and one endpoint. OpenRouter handles routing, fallbacks, and billing. This makes it especially useful for:

Comparing model quality and cost across providers through a single integration
Building cost-optimized pipelines that route to free or cheap models first
Accessing models only available through specific providers
Experimenting with new models without separate API key provisioning

Understanding Free Models on OpenRouter

Free models on OpenRouter have both prompt (input) and completion (output) pricing set to 0, and their IDs typically end in ":free". This is not a permanent guarantee: providers may change pricing, tighten rate limits, or retire free tiers at any time.
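Because free pricing can disappear without notice, one hedged approach is to keep the previous snapshot of free model IDs and diff it against each refresh. The sketch assumes you already hold two lists of model IDs (for example, IDs collected from the /models endpoint on two different days); diff_free_ids and missing_dependencies are hypothetical helper names.

```python
def diff_free_ids(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Report which free model IDs appeared or disappeared between two snapshots."""
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),    # newly free (or newly listed) models
        "removed": sorted(prev - curr),  # free tier retired, repriced, or model delisted
    }


def missing_dependencies(pinned: list[str], current: list[str]) -> list[str]:
    """IDs your application relies on that are no longer in the current free list."""
    return sorted(set(pinned) - set(current))
```

A non-empty result from missing_dependencies is the signal worth alerting on; the "added" list is mostly useful for spotting new candidates to evaluate.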

Why Free Models Matter for Real Projects

Free models are not just for hobbyists. Many production scenarios benefit from free-tier routing:

Development and testing: Run CI pipelines and evaluation scripts without burning token budgets.
Long-tail queries: Route low-value or repetitive queries to free models, reserving paid capacity for high-stakes requests.
Fallback chains: Configure a primary paid model with a free fallback for rate-limit or downtime scenarios.
Research and benchmarking: Compare multiple models on the same dataset without cost constraints.
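The long-tail and fallback patterns above can be expressed as a request body. This is a sketch under assumptions: it relies on OpenRouter's "models" routing parameter, which lists models to try in order when an earlier one errors or is rate-limited (confirm the exact behavior in OpenRouter's routing documentation), and the model IDs and the build_payload helper are hypothetical placeholders.

```python
# Hypothetical placeholder IDs: substitute real, pinned IDs from the tables.
PAID_MODEL = "example-org/paid-model"
FREE_MODEL = "example-org/free-model:free"


def build_payload(prompt: str, high_stakes: bool) -> dict:
    """Chat-completions body encoding a paid-first or free-only routing policy.

    Assumes OpenRouter's `models` parameter (ordered fallback list).
    """
    if high_stakes:
        order = [PAID_MODEL, FREE_MODEL]  # paid primary, free fallback
    else:
        order = [FREE_MODEL]              # long-tail traffic stays on the free tier
    return {
        "model": order[0],  # the OpenAI SDK requires `model`; keep it equal to the first choice
        "models": order,
        "messages": [{"role": "user", "content": prompt}],
    }
```

With the OpenAI SDK, pass model=payload["model"] and messages=payload["messages"], and send the list via extra_body={"models": payload["models"]}, since extra_body is the SDK's hook for fields it does not define itself.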

Rate Limits and Fair Use

Free models typically carry rate limits — requests per minute, tokens per day, or concurrent connections. These limits vary by provider and model. Verify limits directly on the OpenRouter model page before building a production dependency.
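Once you have verified a model's per-minute cap, a client-side throttle keeps you from hitting it in the first place. The sliding-window limiter below is a hypothetical helper, not an OpenRouter API; the clock is injectable so the logic can be tested without waiting. It complements, rather than replaces, handling HTTP 429 responses, because the server-side limit is the authoritative one.

```python
import time
from collections import deque
from typing import Callable


class RequestsPerMinuteLimiter:
    """Client-side sliding-window throttle; set `limit` to the cap you verified."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: deque = deque()  # timestamps of requests inside the current window

    def try_acquire(self) -> bool:
        """Record and allow a request if it fits in the window; otherwise return False."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

Call try_acquire() before each request; when it returns False, either wait or route the request to another model.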

How to Access These Models

All free models are accessible through the standard OpenRouter API. The base URL is https://openrouter.ai/api/v1 and the interface is compatible with the OpenAI SDK.

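Basic Python setup (requires the openai package; the key is read from the OPENROUTER_API_KEY environment variable):

import os
from openai import OpenAI
# OpenRouter's endpoint is OpenAI-compatible, so only base_url and api_key change.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # avoid hard-coding secrets in source
)
response = client.chat.completions.create(
    model="model-id-from-table-below",  # use the API model ID, not the display name in the tables
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)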

List free models from the command line (requires curl and jq):

curl -s https://openrouter.ai/api/v1/models | \
  jq '[.data[] | select(.pricing.prompt == "0" and .pricing.completion == "0")]'
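The SDK is optional: because the endpoint follows the OpenAI chat-completions format, the same call can be made with the standard library alone. This sketch assumes the /chat/completions path under the base URL and Bearer-token authentication; build_chat_request is a hypothetical helper, and it only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; the JSON reply carries the text at choices[0].message.content, as in the SDK example.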

Choosing the Right Free Model

For Coding Tasks

Look for a large context window, tool-calling support, and JSON mode. Relevant benchmarks include HumanEval, MBPP, and LiveCodeBench. A longer context lets the model see more of your codebase at once.
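To shortlist coding candidates from metadata, a hedged sketch: keep models whose entry lists every required request parameter (here "tools") in "supported_parameters", then sort by "context_length". Both field names are assumptions based on OpenRouter's /models response and should be checked against the current schema; coding_candidates is a hypothetical helper.

```python
def coding_candidates(models: list[dict], min_context: int = 0,
                      required: tuple[str, ...] = ("tools",)) -> list[dict]:
    """Filter by required request parameters, then sort by context window (largest first)."""
    picked = [
        m for m in models
        if set(required) <= set(m.get("supported_parameters") or [])
        and (m.get("context_length") or 0) >= min_context
    ]
    return sorted(picked, key=lambda m: m.get("context_length") or 0, reverse=True)
```

Passing required=("tools", "response_format") narrows the list to models that also advertise JSON mode; benchmark scores still have to come from an external source.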

For JSON and Structured Output

Prioritize models that explicitly support JSON mode or tool calling. Test with complex nested schemas: some models return JSON that parses cleanly but has wrong types or missing fields deep in the structure.
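One way to catch "parses but wrong" output is to parse the reply and then check every nested field against the shape you expect. The checker below is a deliberately tiny, hypothetical stand-in for a full JSON Schema validator: the expected shape is written as Python types, dicts of fields, and one-element lists for arrays.

```python
import json


def check_shape(value, shape, path="$") -> list[str]:
    """Return mismatches between `value` and `shape` (an empty list means OK).

    `shape` is a type (e.g. str), a dict of field -> shape, or a one-item list [item_shape].
    """
    if isinstance(shape, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = []
        for key, sub in shape.items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(check_shape(value[key], sub, f"{path}.{key}"))
        return errors
    if isinstance(shape, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        errors = []
        for i, item in enumerate(value):
            errors.extend(check_shape(item, shape[0], f"{path}[{i}]"))
        return errors
    # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
    if shape in (int, float) and isinstance(value, bool):
        return [f"{path}: expected {shape.__name__}, got bool"]
    if not isinstance(value, shape):
        return [f"{path}: expected {shape.__name__}, got {type(value).__name__}"]
    return []


def validate_reply(text: str, shape) -> list[str]:
    """Parse a model reply, then check its structure; syntax errors are reported too."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    return check_shape(data, shape)
```

Running validate_reply over a batch of replies for the same prompt gives a quick per-model error rate at each nesting depth.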

For Long-Context Tasks

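Context length in the table is the model's maximum context window in tokens, which covers the prompt and the generated output together. Effective quality can degrade well before that limit, often after 50–70% of the advertised window is filled. Test with your actual document lengths before committing.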
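A worked budget calculation, under two stated assumptions: the advertised window covers prompt plus completion tokens, and roughly 4 characters per token is close enough for a first pass (a rule of thumb for English text, not a real tokenizer). The usable_fraction parameter encodes the 50–70% guidance as a tunable value; all function names are hypothetical.

```python
import math


def input_budget(context_length: int, max_output_tokens: int,
                 usable_fraction: float = 0.6) -> int:
    """Tokens left for the prompt after reserving output, within the 'effective' window."""
    effective = int(context_length * usable_fraction)
    return max(effective - max_output_tokens, 0)


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate; swap in the model's real tokenizer for anything important."""
    return math.ceil(len(text) / chars_per_token)


def fits(text: str, context_length: int, max_output_tokens: int,
         usable_fraction: float = 0.6) -> bool:
    """True if the estimated prompt size stays within the effective input budget."""
    return rough_token_count(text) <= input_budget(
        context_length, max_output_tokens, usable_fraction)
```

For a 262,144-token model with 8,192 tokens reserved for output and a 60% usable fraction, input_budget(262144, 8192) gives int(262144 × 0.6) − 8192 = 157,286 − 8,192 = 149,094 tokens.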

For Roleplay and Chat

Instruction-following quality and personality consistency matter most. Models trained with RLHF or DPO tend to produce more natural conversational responses.
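A common way to keep a persona consistent across a long chat is to resend the same system message on every request and trim only the oldest turns as history grows. This is a minimal sketch assuming OpenAI-style message dicts; build_roleplay_messages and its max_turns default are hypothetical choices, not an OpenRouter feature.

```python
def build_roleplay_messages(persona: str, history: list[dict], user_msg: str,
                            max_turns: int = 20) -> list[dict]:
    """System persona first, then the most recent turns, then the new user message.

    `history` holds {"role": "user" | "assistant", "content": ...} dicts in order.
    """
    # Guard against max_turns == 0, because history[-0:] would return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    return (
        [{"role": "system", "content": persona}]
        + recent
        + [{"role": "user", "content": user_msg}]
    )
```

Keeping max_turns even preserves user/assistant pairs; the persona never falls out of the window because it is re-added on every call.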

Tips for Production Use

Pin model versions: use the full model ID, including any version or date suffix, to avoid silent model swaps.
Implement fallbacks: free models can become rate-limited or unavailable; build a fallback chain into your routing logic.
Cache responses: for repeated identical queries, application-layer caching eliminates redundant API calls.
Use streaming: streaming delivers tokens as they are generated, so users see output sooner even though total generation time is unchanged.
Monitor usage: free-model requests still show up in your OpenRouter usage dashboard; track them to spot unexpected spikes.
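The fallback and caching tips combine naturally on the client side. A minimal sketch under stated assumptions: call_model(model_id, prompt) is a hypothetical function you supply (for example, a wrapper around the SDK call shown earlier on this page) that raises an exception when a model is rate-limited or unavailable, identical prompts may safely reuse a cached answer, and the model list holds full, pinned IDs. FallbackClient is not an OpenRouter API.

```python
from typing import Callable


class FallbackClient:
    """Try pinned model IDs in order and cache successful answers per prompt."""

    def __init__(self, models: list[str], call_model: Callable[[str, str], str]) -> None:
        self.models = models
        self.call_model = call_model
        self.cache: dict[str, tuple[str, str]] = {}  # prompt -> (model used, answer)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (model_id, answer), serving repeats from the cache."""
        if prompt in self.cache:
            return self.cache[prompt]
        errors = []
        for model in self.models:
            try:
                answer = self.call_model(model, prompt)
            except Exception as exc:  # narrow this to your client's rate-limit/HTTP errors
                errors.append(f"{model}: {exc}")
                continue
            self.cache[prompt] = (model, answer)
            return model, answer
        raise RuntimeError("all models failed: " + "; ".join(errors))
```

The cache here is an unbounded dict for clarity; a production version would bound it (for example with an LRU policy) and expire entries when the pinned model list changes.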

Related Resources

OpenRouter documentation: full API reference and model metadata schema
OpenRouter model rankings: leaderboards of models by token usage across apps, with per-category views
MCP Hub: browse MCP servers that integrate with OpenRouter for agent workflows

Last updated: 2026-07-09T20:16:38.417Z

Coding

Model	Provider	Context Length (tokens)	Price (USD, prompt/completion)	Note
Qwen: Qwen3 Coder 480B A35B (free)	Unknown	1048576	0/0	Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt
Poolside: Laguna XS 2.1 (free)	Unknown	262144	0/0	Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from Poolside and a step
Poolside: Laguna M.1 (free)	Unknown	262144	0/0	Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engi
Qwen: Qwen3 Next 80B A3B Instruct (free)	Unknown	262144	0/0	Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo
Cohere: North Mini Code (free)	Unknown	256000	0/0	North Mini Code is Cohere's first agentic coding model and the debut of its North family. A sparse mixture-of-experts mo
NVIDIA: Nemotron 3 Nano 30B A3B (free)	Unknown	256000	0/0	NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers

Roleplay

Model	Provider	Context Length (tokens)	Price (USD, prompt/completion)	Note
Nous: Hermes 3 405B Instruct (free)	Unknown	131072	0/0	Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m

JSON

Model	Provider	Context Length (tokens)	Price (USD, prompt/completion)	Note
Qwen: Qwen3 Coder 480B A35B (free)	Unknown	1048576	0/0	Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt
Poolside: Laguna M.1 (free)	Unknown	262144	0/0	Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engi
Google: Gemma 4 31B (free)	Unknown	262144	0/0	Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output.
NVIDIA: Nemotron 3 Nano Omni (free)	Unknown	256000	0/0	NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age

Long Context

Model	Provider	Context Length (tokens)	Price (USD, prompt/completion)	Note
Qwen: Qwen3 Coder 480B A35B (free)	Unknown	1048576	0/0	Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt
NVIDIA: Nemotron 3 Ultra (free)	Unknown	1000000	0/0	NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters ou
Tencent: Hy3 (free)	Unknown	262144	0/0	Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192 experts with top-8 routing) built for rea
Poolside: Laguna M.1 (free)	Unknown	262144	0/0	Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engi
Google: Gemma 4 31B (free)	Unknown	262144	0/0	Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output.
Qwen: Qwen3 Next 80B A3B Instruct (free)	Unknown	262144	0/0	Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo
NVIDIA: Nemotron 3 Nano Omni (free)	Unknown	256000	0/0	NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age
OpenAI: gpt-oss-120b (free)	Unknown	131072	0/0	gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-rea
Meta: Llama 3.2 3B Instruct (free)	Unknown	131072	0/0	Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language process
Nous: Hermes 3 405B Instruct (free)	Unknown	131072	0/0	Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m
NVIDIA: Nemotron Nano 12B 2 VL (free)	Unknown	128000	0/0	NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and
NVIDIA: Nemotron Nano 9B V2 (free)	Unknown	128000	0/0	NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified mod
LiquidAI: LFM2.5-1.2B-Thinking (free)	Unknown	32768	0/0	LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—whil