manage-token-budget
О программе
Этот навык обеспечивает управление бюджетом токенов для длительных рабочих процессов AI-агентов, помогая контролировать расходы и предотвращать переполнение контекстного окна. Он отслеживает использование, устанавливает лимиты и может сокращать контекст или применять прогрессивное раскрытие информации при приближении к ограничениям. Используйте его для добавления защитных механизмов по стоимости в автономные циклы агентов, системы опроса или любые рабочие процессы, где существует риск накопления контекста.
Быстрая установка
Claude Code
Рекомендуетсяnpx skills add pjt222/agent-almanac -a claude-code/plugin add https://github.com/pjt222/agent-almanacgit clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/manage-token-budgetСкопируйте и вставьте эту команду в Claude Code для установки этого навыка
Документация
Manage Token Budget
Control cost + ctx footprint of agentic systems → track tokens/cycle, audit ctx, enforce caps, prune low-value, route via metadata before full procs. Core: every token earns place. Inform decisions = stay; occupy space without influence = prune.
Community evidence: 37hr autonomous session = $13.74 from 30min heartbeat + verbose system instructions + unchecked ctx accumulation. Fix: 4hr intervals + notification-only + no feed browsing. This skill codifies prevention patterns.
Use When
- Long-lived agent loops (heartbeats, polling, autonomous workflows) → costs compound
- Ctx windows grow unpredictably between cycles
- API costs spike past baseline → post-mortem
- New workflow → need cost guardrails from start
- Post-incident audit
- System prompts/memory/tool schemas dominate ctx
In
- Required: System/workflow to budget (running or planned)
- Required: Budget ceiling ($/period or token/cycle)
- Optional: Cost data (API logs, billing)
- Optional: Model ctx window size
- Optional: Degradation policy (what drops at limit)
Do
Step 1: Per-Cycle Cost Tracking
Instrument loop → log tokens at every boundary.
Per cycle capture:
- Input tokens: system prompt + memory + tool schemas + history + new content
- Output tokens: model res incl. tool calls
- Total cost: in × in_price + out × out_price
- Timestamp: when ran
- Trigger: timer/event/user
Store structured (JSONL, CSV, DB) — NOT in ctx window:
{"cycle": 47, "ts": "2026-03-12T14:30:00Z", "trigger": "heartbeat",
"input_tokens": 18420, "output_tokens": 2105, "cost_usd": 0.0891,
"cumulative_cost_usd": 3.42}
No instrumentation → estimate from billing:
- Total / cycles = avg/cycle
- Compare baseline (pricing × ctx size)
→ Per-cycle log w/ tokens + costs, granular enough to find expensive cycles. Log lives outside ctx.
If err: no token counts (some APIs lack metadata) → billing → averages. Coarse tracking (daily $/daily cycles) reveals trends. None possible → Step 2 + estimate from ctx size.
Step 2: Audit Ctx Window
Measure occupants, rank by size.
Decompose:
- System prompt: base instructions, CLAUDE.md, personality
- Memory: MEMORY.md, auto-loaded topic files
- Tool schemas: MCP server defs, fn calling
- Skill procs: full SKILL.md loaded
- History: prior turns
- Dynamic: tool outs, file content, search results
Budget table:
Context Budget Audit:
+------------------------+--------+------+-----------------------------------+
| Component | Tokens | % | Notes |
+------------------------+--------+------+-----------------------------------+
| System prompt | 4,200 | 21% | Includes CLAUDE.md chain |
| Memory (auto-loaded) | 3,800 | 19% | MEMORY.md + 4 topic files |
| Tool schemas | 2,600 | 13% | 3 MCP servers, 47 tools |
| Active skill procedure | 1,900 | 9% | Full SKILL.md loaded |
| Conversation history | 5,100 | 25% | 12 prior turns |
| Current cycle content | 2,400 | 12% | Tool outputs from this cycle |
+------------------------+--------+------+-----------------------------------+
| TOTAL | 20,000 | 100% | Model limit: 200,000 |
| Remaining headroom |180,000 | | |
+------------------------+--------+------+-----------------------------------+
Flag oversize vs. decision value. 4k-token memory unused this task = pure overhead.
→ Ranked table per consumer + size + %. ≥1 component stands out (usually history or verbose tool outs).
If err: hard to count → char/4 for English, char/3 for JSON/YAML. Goal = relative rank, not exact.
Step 3: Budget Caps + Enforcement
Hard + soft limits, action per limit.
-
Soft limit (warn): 60-75% of hard. Hit:
- Log warn w/ usage + remaining
- Begin voluntary prune (Step 4) low-value first
- Reduce frequency (e.g., 30min → 2hr)
- Continue degraded
-
Hard limit (stop): absolute max. Hit:
- Halt autonomous immediately
- Alert operator (notify, email, log)
- Preserve state summary
- No new cycle till human authorizes
-
Per-cycle cap: max per single cycle. Stops runaway:
- Truncate tool outs or skip low-pri actions
- Log truncation
Doc in workflow config:
token_budget:
soft_limit_usd: 5.00 # warn and begin pruning
hard_limit_usd: 10.00 # halt and alert
per_cycle_cap_usd: 0.50 # max per individual cycle
soft_limit_pct: 70 # % of context window triggering pruning
hard_limit_pct: 90 # % of context window triggering halt
enforcement: strict # strict = halt on hard limit; advisory = log only
alert_channel: notification # how to notify the operator
→ Caps doc'd at 3 levels (soft/hard/per-cycle) + explicit actions. Policy answers "what at limit?" before limit.
If err: $ premature → ctx-% only (soft 70%, hard 90%) + add $ after 24-48hr data. Advisory mode (log no halt) OK during calibration.
Step 4: Emergency Prune
At limits, drop low-value systematically.
Priority (drop lowest first):
- Old tool outs: verbose results from prior cycles where decision made. Decision persists, evidence goes.
- Redundant turns: early turns superseded by later. Turn 3 asked X, turn 7 revised to Y → turn 3 redundant.
- Verbose formatting: tables, ASCII art, decorative headers. Summarize w/ one-liner.
- Done sub-task ctx: multi-step, sub-task complete + outs in summary/file.
- Inactive skill procs: no longer following → drop full text.
- Irrelevant memory: auto-loaded, unrelated projects/sessions.
Per pruned item, tombstone:
[PRUNED: 2,400 tokens of npm audit output from cycle 12 — 3 vulnerabilities found, all patched]
~20 tokens but preserves conclusion.
→ Ctx drops below soft after prune. Each pruned item has tombstone. No decision-critical info lost.
If err: prune to lvl 4 + still over → workflow too ctx-heavy for cycle freq. Escalate: "N% post-prune. (a) longer interval, (b) reduce scope, (c) split workflows, (d) accept higher cost."
Step 5: Progressive Disclosure for Skill Loading
Route via registry metadata before loading full procs → spend tokens on routing not reading.
Pattern:
- Route first: Need skill → read registry entry (id, desc, domain, complexity, tags) from
_registry.yml~3-5 lines, ~50 tokens - Confirm relevance: Match? No → next candidate. ~50 tokens/miss vs. ~500-2000 for wrong SKILL.md
- Load on match: Only when confirmed → load full SKILL.md
- Unload after: Proc complete → prune full text (Step 4 lvl 5), keep summary
Same pattern for other large payloads:
- Memory: MEMORY.md index first, topic files only when relevant
- Tool docs: Names + one-liners for routing, full schemas only when called
- Files: Listings + signatures first, full only for fns being modified
Without progressive disclosure:
Load 5 candidate skills → 5 × 1,500 tokens = 7,500 tokens → use 1 skill
With progressive disclosure:
Route through 5 registry entries → 5 × 50 tokens = 250 tokens
Load 1 matched skill → 1 × 1,500 tokens = 1,500 tokens
Total: 1,750 tokens (77% reduction)
→ Skill loading = 2-phase: lightweight routing → full only on match. Same for memory, schemas, files.
If err: registry insufficient (vague descs, missing tags) → improve registry not abandon. Fix = better metadata, not more loading.
Step 6: Cost-Aware Cycle Intervals
Set intervals from cost data, not arbitrary schedules.
-
Cost/hour at current interval:
cost_per_hour = avg_cost_per_cycle × cycles_per_hour- Ex: $0.09/cycle × 2/hr = $0.18/hr = $4.32/day
-
Vs. budget:
hours_until_hard_limit = (hard_limit - cumulative_cost) / cost_per_hour- hours < intended runtime → extend interval
-
Min effective interval:
- Fastest change rate of monitored system? Source updates 4hr → polling 30min wastes 7/8 cycles
- Match interval to refresh rate, not anxiety
- Event-driven → webhooks/push, not polling
-
Apply:
Before: 30-minute heartbeat, verbose processing
→ 48 cycles/day × $0.09/cycle = $4.32/day
After: 4-hour heartbeat, notification-only
→ 6 cycles/day × $0.04/cycle = $0.24/day
→ 94% cost reduction
→ Interval justified by cost data + matches refresh rate. Tradeoff doc'd for future baseline.
If err: low-latency required, can't extend → reduce per-cycle cost (smaller prompts, fewer schemas, summarized history). 2 levers: frequency + cost/cycle.
Step 7: Validate Controls
Confirm all working + within budget.
- Tracking: 3-5 cycles → verify per-cycle logs w/ accurate counts
- Soft limit test: lower temp → warn fires + prune begins
- Hard limit test: lower temp → halts + alerts
- Per-cycle cap test: inject large out → truncated not blown
- Progressive disclosure test: trace skill load → routes registry before full SKILL.md
- Projection:
- Daily cost current settings
- Days till hard limit at burn rate
- Monthly expected
Budget Validation Report:
+-----------------------+----------+--------+
| Check | Expected | Actual |
+-----------------------+----------+--------+
| Per-cycle logging | Present | |
| Soft limit warning | Fires | |
| Hard limit halt | Halts | |
| Per-cycle cap | Truncates| |
| Progressive disclosure| Routes | |
| Daily cost projection | < $X.XX | |
+-----------------------+----------+--------+
→ All 5 controls verified. Projection within budget.
If err: not firing → check enforcement wired into actual loop, not just doc'd. Config w/o enforcement = plan, not control. Projection over → Step 6, adjust interval/cost.
Check
- Per-cycle tracking logs in/out tokens, cost, ts every cycle
- Ctx audit identifies all consumers + tokens + %
- Caps at 3 levels: soft, hard, per-cycle
- Each cap has explicit action (warn, prune, halt, alert)
- Emergency prune follows priority + tombstones
- Progressive disclosure routes metadata before full
- Cycle interval justified by cost + refresh rate
- Validation tests confirm controls fire
- Projection within budget
- Post-incident: root cause + prevention in place
Traps
- Tracking in ctx: Logs in history inflate the thing controlled. Log externally, keep summary in ctx.
- Soft limits w/o enforcement: Warn nobody sees ≠ control. Must trigger visible action — prune/extend/notify. Silent exceed → will exceed.
- Prune decisions over data: Drop tool outs before decisions = lose info. Prune evidence AFTER decision, not before. Tombstone = preserve conclusion, drop evidence.
- Interval = anxiety not refresh: Poll 30min when source updates 4hr wastes 87.5%. Measure actual refresh first.
- Load full skills for routing: 400-line SKILL.md to decide "right skill?" = 10-20x cost of 3-line registry. Route metadata first.
- Ignore system prompt: System prompts + CLAUDE.md + auto-memory = invisible costs paid every cycle. 5k-token prompt × 48/day = 240k tokens/day for instructions. Audit + trim first.
- Caps w/o human escalation: Hit limits + silently degrade → damage accumulates. Hard limits MUST notify human.
→
assess-context— eval reasoning ctx structural health; complements Step 2 auditmetal— extract conceptual essence; progressive disclosure applies in prospect phasechrysopoeia— value extraction + dead weight elim; same value-per-token at code levelmanage-memory— organize + prune persistent memory; reduces memory ctx component
GitHub репозиторий
Похожие навыки
executing-plans
ДизайнИспользуйте навык executing-plans, когда у вас есть полный план реализации для выполнения контролируемыми партиями с контрольными точками проверки. Он загружает и критически анализирует план, затем выполняет задачи небольшими партиями (по умолчанию 3 задачи), сообщая о прогрессе между каждой партией для проверки архитектором. Это обеспечивает систематическую реализацию со встроенными контрольными точками проверки качества.
requesting-code-review
ДизайнЭтот навык запускает суб-агента для ревью кода, который анализирует изменения в коде на соответствие требованиям перед дальнейшими действиями. Его следует использовать после завершения задач, реализации крупных функций или перед слиянием с основной веткой. Ревью помогает выявить проблемы на ранней стадии, сравнивая текущую реализацию с исходным планом.
connect-mcp-server
ДизайнЭтот навык предоставляет разработчикам подробное руководство по подключению серверов MCP к Claude Code с использованием транспортов HTTP, stdio или SSE. Он охватывает установку, конфигурацию, аутентификацию и безопасность для интеграции внешних сервисов, таких как GitHub, Notion и пользовательские API. Используйте его при настройке интеграций MCP, конфигурации внешних инструментов или работе с Model Context Protocol от Claude.
web-cli-teleport
ДизайнЭтот навык помогает разработчикам выбирать между веб-интерфейсом Claude Code и CLI на основе анализа задачи, а также обеспечивает бесшовное перемещение сессий между этими средами. Он оптимизирует рабочий процесс, управляя состоянием и контекстом сессии при переключении между веб-интерфейсом, CLI или мобильным приложением. Используйте его для сложных проектов, требующих различных инструментов на разных этапах работы.
