manage-token-budget
정보
이 스킬은 장시간 실행되는 AI 에이전트 워크플로우를 위한 토큰 예산 관리 기능을 제공하여 비용을 통제하고 컨텍스트 창 오버플로우를 방지합니다. 사용량을 추적하고 상한을 적용하며, 한계에 도달했을 때 컨텍스트를 정리하거나 점진적 공개 방식을 사용할 수 있습니다. 자율 에이전트 루프, 폴링 시스템, 컨텍스트 누적 위험이 있는 모든 워크플로우에 비용 안전장치를 추가할 때 활용하세요.
빠른 설치
Claude Code
추천npx skills add pjt222/agent-almanac -a claude-code/plugin add https://github.com/pjt222/agent-almanacgit clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/manage-token-budgetClaude Code에서 이 명령을 복사하여 붙여넣어 스킬을 설치하세요
문서
Manage Token Budget
Control cost + ctx footprint of agentic systems → track tokens/cycle, audit ctx, enforce caps, prune low-value, route via metadata before full procs. Core: every token earns place. Inform decisions = stay; occupy space without influence = prune.
Community evidence: 37hr autonomous session = $13.74 from 30min heartbeat + verbose system instructions + unchecked ctx accumulation. Fix: 4hr intervals + notification-only + no feed browsing. This skill codifies prevention patterns.
Use When
- Long-lived agent loops (heartbeats, polling, autonomous workflows) → costs compound
- Ctx windows grow unpredictably between cycles
- API costs spike past baseline → post-mortem
- New workflow → need cost guardrails from start
- Post-incident audit
- System prompts/memory/tool schemas dominate ctx
In
- Required: System/workflow to budget (running or planned)
- Required: Budget ceiling ($/period or token/cycle)
- Optional: Cost data (API logs, billing)
- Optional: Model ctx window size
- Optional: Degradation policy (what drops at limit)
Do
Step 1: Per-Cycle Cost Tracking
Instrument loop → log tokens at every boundary.
Per cycle capture:
- Input tokens: system prompt + memory + tool schemas + history + new content
- Output tokens: model res incl. tool calls
- Total cost: in × in_price + out × out_price
- Timestamp: when ran
- Trigger: timer/event/user
Store structured (JSONL, CSV, DB) — NOT in ctx window:
{"cycle": 47, "ts": "2026-03-12T14:30:00Z", "trigger": "heartbeat",
"input_tokens": 18420, "output_tokens": 2105, "cost_usd": 0.0891,
"cumulative_cost_usd": 3.42}
No instrumentation → estimate from billing:
- Total / cycles = avg/cycle
- Compare baseline (pricing × ctx size)
→ Per-cycle log w/ tokens + costs, granular enough to find expensive cycles. Log lives outside ctx.
If err: no token counts (some APIs lack metadata) → billing → averages. Coarse tracking (daily $/daily cycles) reveals trends. None possible → Step 2 + estimate from ctx size.
Step 2: Audit Ctx Window
Measure occupants, rank by size.
Decompose:
- System prompt: base instructions, CLAUDE.md, personality
- Memory: MEMORY.md, auto-loaded topic files
- Tool schemas: MCP server defs, fn calling
- Skill procs: full SKILL.md loaded
- History: prior turns
- Dynamic: tool outs, file content, search results
Budget table:
Context Budget Audit:
+------------------------+--------+------+-----------------------------------+
| Component | Tokens | % | Notes |
+------------------------+--------+------+-----------------------------------+
| System prompt | 4,200 | 21% | Includes CLAUDE.md chain |
| Memory (auto-loaded) | 3,800 | 19% | MEMORY.md + 4 topic files |
| Tool schemas | 2,600 | 13% | 3 MCP servers, 47 tools |
| Active skill procedure | 1,900 | 9% | Full SKILL.md loaded |
| Conversation history | 5,100 | 25% | 12 prior turns |
| Current cycle content | 2,400 | 12% | Tool outputs from this cycle |
+------------------------+--------+------+-----------------------------------+
| TOTAL | 20,000 | 100% | Model limit: 200,000 |
| Remaining headroom |180,000 | | |
+------------------------+--------+------+-----------------------------------+
Flag oversize vs. decision value. 4k-token memory unused this task = pure overhead.
→ Ranked table per consumer + size + %. ≥1 component stands out (usually history or verbose tool outs).
If err: hard to count → char/4 for English, char/3 for JSON/YAML. Goal = relative rank, not exact.
Step 3: Budget Caps + Enforcement
Hard + soft limits, action per limit.
-
Soft limit (warn): 60-75% of hard. Hit:
- Log warn w/ usage + remaining
- Begin voluntary prune (Step 4) low-value first
- Reduce frequency (e.g., 30min → 2hr)
- Continue degraded
-
Hard limit (stop): absolute max. Hit:
- Halt autonomous immediately
- Alert operator (notify, email, log)
- Preserve state summary
- No new cycle till human authorizes
-
Per-cycle cap: max per single cycle. Stops runaway:
- Truncate tool outs or skip low-pri actions
- Log truncation
Doc in workflow config:
token_budget:
soft_limit_usd: 5.00 # warn and begin pruning
hard_limit_usd: 10.00 # halt and alert
per_cycle_cap_usd: 0.50 # max per individual cycle
soft_limit_pct: 70 # % of context window triggering pruning
hard_limit_pct: 90 # % of context window triggering halt
enforcement: strict # strict = halt on hard limit; advisory = log only
alert_channel: notification # how to notify the operator
→ Caps doc'd at 3 levels (soft/hard/per-cycle) + explicit actions. Policy answers "what at limit?" before limit.
If err: $ premature → ctx-% only (soft 70%, hard 90%) + add $ after 24-48hr data. Advisory mode (log no halt) OK during calibration.
Step 4: Emergency Prune
At limits, drop low-value systematically.
Priority (drop lowest first):
- Old tool outs: verbose results from prior cycles where decision made. Decision persists, evidence goes.
- Redundant turns: early turns superseded by later. Turn 3 asked X, turn 7 revised to Y → turn 3 redundant.
- Verbose formatting: tables, ASCII art, decorative headers. Summarize w/ one-liner.
- Done sub-task ctx: multi-step, sub-task complete + outs in summary/file.
- Inactive skill procs: no longer following → drop full text.
- Irrelevant memory: auto-loaded, unrelated projects/sessions.
Per pruned item, tombstone:
[PRUNED: 2,400 tokens of npm audit output from cycle 12 — 3 vulnerabilities found, all patched]
~20 tokens but preserves conclusion.
→ Ctx drops below soft after prune. Each pruned item has tombstone. No decision-critical info lost.
If err: prune to lvl 4 + still over → workflow too ctx-heavy for cycle freq. Escalate: "N% post-prune. (a) longer interval, (b) reduce scope, (c) split workflows, (d) accept higher cost."
Step 5: Progressive Disclosure for Skill Loading
Route via registry metadata before loading full procs → spend tokens on routing not reading.
Pattern:
- Route first: Need skill → read registry entry (id, desc, domain, complexity, tags) from
_registry.yml~3-5 lines, ~50 tokens - Confirm relevance: Match? No → next candidate. ~50 tokens/miss vs. ~500-2000 for wrong SKILL.md
- Load on match: Only when confirmed → load full SKILL.md
- Unload after: Proc complete → prune full text (Step 4 lvl 5), keep summary
Same pattern for other large payloads:
- Memory: MEMORY.md index first, topic files only when relevant
- Tool docs: Names + one-liners for routing, full schemas only when called
- Files: Listings + signatures first, full only for fns being modified
Without progressive disclosure:
Load 5 candidate skills → 5 × 1,500 tokens = 7,500 tokens → use 1 skill
With progressive disclosure:
Route through 5 registry entries → 5 × 50 tokens = 250 tokens
Load 1 matched skill → 1 × 1,500 tokens = 1,500 tokens
Total: 1,750 tokens (77% reduction)
→ Skill loading = 2-phase: lightweight routing → full only on match. Same for memory, schemas, files.
If err: registry insufficient (vague descs, missing tags) → improve registry not abandon. Fix = better metadata, not more loading.
Step 6: Cost-Aware Cycle Intervals
Set intervals from cost data, not arbitrary schedules.
-
Cost/hour at current interval:
cost_per_hour = avg_cost_per_cycle × cycles_per_hour- Ex: $0.09/cycle × 2/hr = $0.18/hr = $4.32/day
-
Vs. budget:
hours_until_hard_limit = (hard_limit - cumulative_cost) / cost_per_hour- hours < intended runtime → extend interval
-
Min effective interval:
- Fastest change rate of monitored system? Source updates 4hr → polling 30min wastes 7/8 cycles
- Match interval to refresh rate, not anxiety
- Event-driven → webhooks/push, not polling
-
Apply:
Before: 30-minute heartbeat, verbose processing
→ 48 cycles/day × $0.09/cycle = $4.32/day
After: 4-hour heartbeat, notification-only
→ 6 cycles/day × $0.04/cycle = $0.24/day
→ 94% cost reduction
→ Interval justified by cost data + matches refresh rate. Tradeoff doc'd for future baseline.
If err: low-latency required, can't extend → reduce per-cycle cost (smaller prompts, fewer schemas, summarized history). 2 levers: frequency + cost/cycle.
Step 7: Validate Controls
Confirm all working + within budget.
- Tracking: 3-5 cycles → verify per-cycle logs w/ accurate counts
- Soft limit test: lower temp → warn fires + prune begins
- Hard limit test: lower temp → halts + alerts
- Per-cycle cap test: inject large out → truncated not blown
- Progressive disclosure test: trace skill load → routes registry before full SKILL.md
- Projection:
- Daily cost current settings
- Days till hard limit at burn rate
- Monthly expected
Budget Validation Report:
+-----------------------+----------+--------+
| Check | Expected | Actual |
+-----------------------+----------+--------+
| Per-cycle logging | Present | |
| Soft limit warning | Fires | |
| Hard limit halt | Halts | |
| Per-cycle cap | Truncates| |
| Progressive disclosure| Routes | |
| Daily cost projection | < $X.XX | |
+-----------------------+----------+--------+
→ All 5 controls verified. Projection within budget.
If err: not firing → check enforcement wired into actual loop, not just doc'd. Config w/o enforcement = plan, not control. Projection over → Step 6, adjust interval/cost.
Check
- Per-cycle tracking logs in/out tokens, cost, ts every cycle
- Ctx audit identifies all consumers + tokens + %
- Caps at 3 levels: soft, hard, per-cycle
- Each cap has explicit action (warn, prune, halt, alert)
- Emergency prune follows priority + tombstones
- Progressive disclosure routes metadata before full
- Cycle interval justified by cost + refresh rate
- Validation tests confirm controls fire
- Projection within budget
- Post-incident: root cause + prevention in place
Traps
- Tracking in ctx: Logs in history inflate the thing controlled. Log externally, keep summary in ctx.
- Soft limits w/o enforcement: Warn nobody sees ≠ control. Must trigger visible action — prune/extend/notify. Silent exceed → will exceed.
- Prune decisions over data: Drop tool outs before decisions = lose info. Prune evidence AFTER decision, not before. Tombstone = preserve conclusion, drop evidence.
- Interval = anxiety not refresh: Poll 30min when source updates 4hr wastes 87.5%. Measure actual refresh first.
- Load full skills for routing: 400-line SKILL.md to decide "right skill?" = 10-20x cost of 3-line registry. Route metadata first.
- Ignore system prompt: System prompts + CLAUDE.md + auto-memory = invisible costs paid every cycle. 5k-token prompt × 48/day = 240k tokens/day for instructions. Audit + trim first.
- Caps w/o human escalation: Hit limits + silently degrade → damage accumulates. Hard limits MUST notify human.
→
assess-context— eval reasoning ctx structural health; complements Step 2 auditmetal— extract conceptual essence; progressive disclosure applies in prospect phasechrysopoeia— value extraction + dead weight elim; same value-per-token at code levelmanage-memory— organize + prune persistent memory; reduces memory ctx component
GitHub 저장소
연관 스킬
executing-plans
디자인executing-plans 스킬은 검토 체크포인트가 포함된 통제된 배치로 실행할 완전한 구현 계획이 있을 때 사용합니다. 이 스킬은 계획을 불러와 비판적으로 검토한 후, 소규모 배치(기본값 3개 작업)로 작업을 실행하면서 각 배치 사이에 진행 상황을 아키텍트 검토를 위해 보고합니다. 이를 통해 내재된 품질 관리 체크포인트를 갖춘 체계적인 구현이 보장됩니다.
requesting-code-review
디자인이 스킬은 코드 변경 사항을 요구 사항에 따라 분석하기 위해 코드 리뷰어 하위 에이전트를 호출합니다. 작업 완료 후, 주요 기능 구현 후, 또는 메인 브랜치에 병합하기 전에 사용해야 합니다. 이 리뷰는 현재 구현체와 원래 계획을 비교하여 문제를 조기에 발견하는 데 도움이 됩니다.
connect-mcp-server
디자인이 스킬은 개발자들이 HTTP, stdio 또는 SSE 전송 방식을 통해 MCP 서버를 Claude Code에 연결하는 포괄적인 가이드를 제공합니다. GitHub, Notion 및 사용자 정의 API와 같은 외부 서비스를 통합하기 위한 설치, 구성, 인증 및 보안을 다룹니다. MCP 통합 설정, 외부 도구 구성 또는 Claude의 모델 컨텍스트 프로토콜 작업 시 활용하세요.
web-cli-teleport
디자인이 스킬은 작업 분석을 기반으로 개발자가 Claude Code 웹 인터페이스와 CLI 인터페이스 중 선택할 수 있도록 돕고, 두 환경 간 원활한 세션 텔레포트를 가능하게 합니다. 웹, CLI 또는 모바일 환경 전환 시 세션 상태와 컨텍스트를 관리하여 워크플로를 최적화합니다. 다양한 단계에서 서로 다른 도구가 필요한 복잡한 프로젝트에 사용하세요.
