OpenRouter Free Models Tracker

Track free OpenRouter models, compare scenario fit, and find practical options for development and assistant use.

OpenRouter hosts a broad model ecosystem, but the useful question is not whether free models exist. It is which free models fit your use case today.

This page is a practical tracker focused on:

  • Coding and engineering-related use
  • Roleplay / chat scenario prompts
  • JSON and structured output use cases
  • Long-context use cases

The list is generated from OpenRouter model metadata and filtered to models whose input (prompt) and output (completion) prices are both zero. The project scheduler can refresh the data automatically.
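The filtering step can be sketched in a few lines of Python. This is a minimal sketch, not the tracker's actual code. It assumes the response shape of the public `GET https://openrouter.ai/api/v1/models` endpoint: a `data` list whose entries carry `id`, `context_length`, and string-valued `pricing.prompt` / `pricing.completion`. The helper names `is_free`, `free_models`, and `fetch_catalog` are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"


def is_free(model: dict) -> bool:
    """True when both prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        # Prices arrive as strings such as "0" or "0.000002"; parse them so
        # that "0.0" also counts as free and "-1" (special routers) does not.
        prompt = float(pricing.get("prompt", "nan"))
        completion = float(pricing.get("completion", "nan"))
    except (TypeError, ValueError):
        return False
    return prompt == 0 and completion == 0


def free_models(catalog: dict) -> list:
    """Free models from a /models response, longest context window first."""
    models = [m for m in catalog.get("data", []) if is_free(m)]
    return sorted(models, key=lambda m: m.get("context_length") or 0, reverse=True)


def fetch_catalog(url: str = MODELS_URL) -> dict:
    """Download the public model list (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Usage: `for m in free_models(fetch_catalog()): print(m["id"], m.get("context_length"))`. Parsing the price strings as numbers, rather than comparing them to the literal `"0"`, is a deliberate choice so that formatting differences do not silently drop models.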

What is OpenRouter?

OpenRouter is a unified API gateway providing access to hundreds of AI models from OpenAI, Anthropic, Google, Meta, Mistral, and open-source alternatives through a single standardized interface.

Instead of integrating each provider separately, developers use one API key and one endpoint. OpenRouter handles routing, fallbacks, and billing. This makes it especially useful for:

  • Comparing model quality and cost across providers through a single API
  • Building cost-optimized pipelines that route to free or cheap models first
  • Accessing models only available through specific providers
  • Experimenting with new models without separate API key provisioning

Understanding Free Models on OpenRouter

Free models on OpenRouter have both input and output pricing set to 0. This is not a permanent guarantee — providers may change pricing, rate limits, or retire free tiers at any time.

Why Free Models Matter for Real Projects

Free models are not just for hobbyists. Many production scenarios benefit from free-tier routing:

  • Development and testing: Run CI pipelines and evaluation scripts without burning token budgets.
  • Long-tail queries: Route low-value or repetitive queries to free models, reserving paid capacity for high-stakes requests.
  • Fallback chains: Configure a primary paid model with a free fallback for rate-limit or downtime scenarios.
  • Research and benchmarking: Compare multiple models on the same dataset without cost constraints.

Rate Limits and Fair Use

Free models typically carry rate limits — requests per minute, tokens per day, or concurrent connections. These limits vary by provider and model. Verify limits directly on the OpenRouter model page before building a production dependency.
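When a free model starts rejecting requests (typically HTTP 429), a common client-side response is to retry with exponential backoff and jitter. The following is a minimal sketch, not an OpenRouter feature. The helper `with_backoff` and its parameters are hypothetical, and the exception types to retry on are passed in by the caller so the logic does not depend on any particular SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying `retry_on` errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay ceiling grows 1s, 2s, 4s, ... up to max_delay; "full jitter"
            # picks a random wait below it so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

With the OpenAI SDK client from the setup example, a call could be wrapped as `with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`. Backoff only smooths over short bursts. If a daily quota is exhausted, retrying will not help and a fallback model is the better option.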

How to Access These Models

All free models are accessible through the standard OpenRouter API. The base URL is https://openrouter.ai/api/v1 and the interface is compatible with the OpenAI SDK.

Basic Python setup:

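import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the official SDK works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name; avoid hard-coding keys
)
response = client.chat.completions.create(
    model="model-id-from-table-below",  # use the model ID, not the display name; free variants usually end in ":free"
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)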

Filter free models via CLI:

curl https://openrouter.ai/api/v1/models | \
  jq '[.data[] | select(.pricing.prompt == "0" and .pricing.completion == "0")]'

Choosing the Right Free Model

For Coding Tasks

Look for a long context window, tool-calling support, and JSON mode. Useful benchmarks include HumanEval, MBPP, and LiveCodeBench. A longer context lets the model see more of your codebase at once.
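Tool calling through an OpenAI-compatible API means sending a `tools` list with the request and then running whatever function the model asks for. A minimal sketch of both halves follows. The `count_lines` tool, the `REGISTRY` mapping, and the `dispatch` helper are hypothetical examples. Only the shape of the tool schema follows the OpenAI chat-completions format.

```python
import json
from typing import Any, Callable, Dict

# Tool schema in the OpenAI chat-completions format; it would be passed as
# client.chat.completions.create(..., tools=TOOLS) to a model that supports tools.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "count_lines",
            "description": "Count the lines in a snippet of source code.",
            "parameters": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        },
    }
]


def count_lines(code: str) -> int:
    """Hypothetical local tool implementation."""
    return len(code.splitlines())


REGISTRY: Dict[str, Callable[..., Any]] = {"count_lines": count_lines}


def dispatch(name: str, arguments: str) -> Any:
    """Run one tool call; `arguments` is the JSON string the model produced."""
    if name not in REGISTRY:
        raise KeyError(f"model requested unknown tool: {name}")
    return REGISTRY[name](**json.loads(arguments))
```

In a real loop, each entry of `response.choices[0].message.tool_calls` supplies `function.name` and `function.arguments`. The result goes back to the model as a `"role": "tool"` message carrying the matching `tool_call_id`.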

For JSON and Structured Output

Prioritize models that explicitly support JSON mode or tool calling. Test with complex nested schemas — some models produce syntactically valid but semantically incorrect JSON at depth.
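Because output can parse cleanly and still have the wrong shape deep in the tree, it helps to check structure after `json.loads`. The following stdlib-only sketch assumes a small, hypothetical "shape" notation. A dict maps keys to nested shapes, a one-element list describes an array, and a Python type marks a leaf. For production use, a full JSON Schema validator is the more complete option.

```python
import json
from typing import Any, List


def shape_errors(value: Any, shape: Any, path: str = "$") -> List[str]:
    """Compare parsed JSON against a shape; return a list of mismatch descriptions."""
    if isinstance(shape, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object, got {type(value).__name__}"]
        errors = []
        for key, sub in shape.items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(shape_errors(value[key], sub, f"{path}.{key}"))
        return errors
    if isinstance(shape, list):
        if not isinstance(value, list):
            return [f"{path}: expected array, got {type(value).__name__}"]
        errors = []
        for i, item in enumerate(value):
            errors.extend(shape_errors(item, shape[0], f"{path}[{i}]"))
        return errors
    # Leaf: shape is a type such as str, int, float, or bool.
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) and shape is not bool:
        return [f"{path}: expected {shape.__name__}, got bool"]
    if shape is float and isinstance(value, int):
        return []  # JSON does not distinguish 3 from 3.0
    if not isinstance(value, shape):
        return [f"{path}: expected {shape.__name__}, got {type(value).__name__}"]
    return []
```

For example, with `SHAPE = {"name": str, "files": [{"path": str, "lines": int}]}`, a reply whose second file entry lacks `lines` is reported as `$.files[1].lines: missing`. That is exactly the kind of depth error a plain parse check would miss.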

For Long-Context Tasks

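The context length in the table is the maximum number of tokens a model accepts per request, counting the prompt and the completion together. Effective performance often degrades well before that limit, sometimes after only 50–70% of the advertised window, so test with your actual document lengths before committing.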
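A quick pre-flight check is to estimate a document's token count and compare it with a usable fraction of the listed context length, reserving room for the completion. The following is a minimal sketch under loud assumptions. It uses roughly 4 characters per token, a rule of thumb for English text that varies by tokenizer and language, and a hypothetical 60% usable fraction. Measure with the target model's own tokenizer before relying on the numbers.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(text: str, context_length: int, max_output_tokens: int = 1024,
                 usable_fraction: float = 0.6) -> bool:
    """True if the prompt plus reserved output fits in the 'effective' share of the window."""
    budget = int(context_length * usable_fraction)
    return estimate_tokens(text) + max_output_tokens <= budget
```

Under these defaults, a 400,000-character file (about 100,000 estimated tokens) passes for a 262144-token model. A 4,000,000-character corpus does not fit even a 1048576-token window.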

For Roleplay and Chat

Instruction-following quality and personality consistency matter most. Models trained with RLHF or DPO tend to produce more natural conversational responses.
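One common way to support persona consistency in long chats is to resend the persona as the system message on every request and trim only the oldest turns as history grows. The following is a minimal sketch assuming OpenAI-style message dicts. The `build_messages` helper and its turn limit are hypothetical choices, not an OpenRouter feature.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(persona: str, history: List[Message], user_input: str,
                   max_history: int = 20) -> List[Message]:
    """Pin the persona as the system message and keep only the newest turns."""
    # history[-0:] would return the whole list, so handle 0 explicitly.
    recent = history[-max_history:] if max_history > 0 else []
    return (
        [{"role": "system", "content": persona}]
        + recent
        + [{"role": "user", "content": user_input}]
    )
```

Trimming by message count can split a user/assistant pair. A token-based budget or a running summary of older turns are common refinements.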

Tips for Production Use

  1. Pin model versions: use the full model ID, including any version suffix, to avoid silent model swaps.
  2. Implement fallbacks: free models can become rate-limited, so build a fallback chain into your routing logic.
  3. Cache responses: for repeated identical queries, application-layer caching eliminates redundant API calls.
  4. Use streaming: streaming reduces time-to-first-token and improves perceived performance.
  5. Monitor usage: requests to free models still show up in your OpenRouter usage, so track them to spot unexpected spikes.
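Tips 2 and 3 can be combined in a small client-side wrapper. It tries each model ID in order until one succeeds and memoizes successful replies for identical prompts. The following is a minimal sketch, assuming the caller supplies a `send(model_id, prompt)` function, for example one that wraps `client.chat.completions.create` from the setup example. The `FallbackChat` class and the model IDs in the usage are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class FallbackChat:
    """Try models in order; cache successful replies for identical prompts."""

    def __init__(self, models: List[str], send: Callable[[str, str], str]):
        if not models:
            raise ValueError("need at least one model ID")
        self.models = list(models)
        self.send = send  # send(model_id, prompt) -> reply text; raises on failure
        self.cache: Dict[str, str] = {}

    def ask(self, prompt: str) -> str:
        if prompt in self.cache:
            return self.cache[prompt]  # identical query: no API call
        last_error: Optional[Exception] = None
        for model in self.models:
            try:
                reply = self.send(model, prompt)
            except Exception as exc:  # e.g. rate limit, timeout, model unavailable
                last_error = exc
                continue
            self.cache[prompt] = reply
            return reply
        raise RuntimeError("all models in the fallback chain failed") from last_error
```

Usage might look like `FallbackChat(["some-model:free", "some-paid-model"], send).ask("Summarize this diff")`. An exact-match cache only helps for verbatim repeats, and an unbounded dict grows with every new prompt, so a size-limited or time-limited cache is the safer choice in a long-running service.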

Last update: 2026-06-09T20:04:25.354Z

Coding

Model | Provider | Context length (tokens) | Price (prompt/completion) | Note
Owl Alpha | Unknown | 1048756 | 0/0 | Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-co…
Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt…
Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co…
Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin…
MoonshotAI: Kimi K2.6 (free) | Unknown | 262144 | 0/0 | Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX gener…
Qwen: Qwen3 Next 80B A3B Instruct (free) | Unknown | 262144 | 0/0 | Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo…
NVIDIA: Nemotron 3 Nano 30B A3B (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers…

Roleplay

Model | Provider | Context length (tokens) | Price (prompt/completion) | Note
Nous: Hermes 3 405B Instruct (free) | Unknown | 131072 | 0/0 | Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m…

JSON

Model | Provider | Context length (tokens) | Price (prompt/completion) | Note
Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt…
Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co…
Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin…
Google: Gemma 4 31B (free) | Unknown | 262144 | 0/0 | Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output.
NVIDIA: Nemotron 3 Nano Omni (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age…

Long Context

Model | Provider | Context length (tokens) | Price (prompt/completion) | Note
Owl Alpha | Unknown | 1048756 | 0/0 | Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-co…
Qwen: Qwen3 Coder 480B A35B (free) | Unknown | 1048576 | 0/0 | Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is opt…
NVIDIA: Nemotron 3 Ultra (free) | Unknown | 1000000 | 0/0 | NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters ou…
Poolside: Laguna XS.2 (free) | Unknown | 262144 | 0/0 | Laguna XS.2 is the second-generation model in the XS size class from Poolside, their efficient co…
Poolside: Laguna M.1 (free) | Unknown | 262144 | 0/0 | Laguna M.1 is the flagship coding agent model from Poolside, optimized for complex software engin…
MoonshotAI: Kimi K2.6 (free) | Unknown | 262144 | 0/0 | Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX gener…
Google: Gemma 4 31B (free) | Unknown | 262144 | 0/0 | Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output.
Qwen: Qwen3 Next 80B A3B Instruct (free) | Unknown | 262144 | 0/0 | Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable respo…
NVIDIA: Nemotron 3 Nano Omni (free) | Unknown | 256000 | 0/0 | NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-age…
OpenAI: gpt-oss-120b (free) | Unknown | 131072 | 0/0 | gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-rea…
Meta: Llama 3.2 3B Instruct (free) | Unknown | 131072 | 0/0 | Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language process…
Nous: Hermes 3 405B Instruct (free) | Unknown | 131072 | 0/0 | Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, m…
NVIDIA: Nemotron Nano 12B 2 VL (free) | Unknown | 128000 | 0/0 | NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and…
NVIDIA: Nemotron Nano 9B V2 (free) | Unknown | 128000 | 0/0 | NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified mod…
LiquidAI: LFM2.5-1.2B-Thinking (free) | Unknown | 32768 | 0/0 | LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—whil…