MCP HubMCP Hub
스킬 목록으로 돌아가기

manage-token-budget

pjt222
업데이트됨 2 days ago
5 조회
17
2
17
GitHub에서 보기
디자인aiapiautomationdesign

정보

이 스킬은 장시간 실행되는 AI 에이전트 워크플로우를 위한 토큰 예산 관리 기능을 제공하여 비용을 통제하고 컨텍스트 창 오버플로우를 방지합니다. 사용량을 추적하고 상한을 적용하며, 한계에 도달했을 때 컨텍스트를 정리하거나 점진적 공개 방식을 사용할 수 있습니다. 자율 에이전트 루프, 폴링 시스템, 컨텍스트 누적 위험이 있는 모든 워크플로우에 비용 안전장치를 추가할 때 활용하세요.

빠른 설치

Claude Code

추천
기본
npx skills add pjt222/agent-almanac -a claude-code
플러그인 명령대체
/plugin add https://github.com/pjt222/agent-almanac
Git 클론대체
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/manage-token-budget

Claude Code에서 이 명령을 복사하여 붙여넣어 스킬을 설치하세요

문서

Manage Token Budget

Control cost + ctx footprint of agentic systems → track tokens/cycle, audit ctx, enforce caps, prune low-value, route via metadata before full procs. Core: every token earns place. Inform decisions = stay; occupy space without influence = prune.

Community evidence: 37hr autonomous session = $13.74 from 30min heartbeat + verbose system instructions + unchecked ctx accumulation. Fix: 4hr intervals + notification-only + no feed browsing. This skill codifies prevention patterns.

Use When

  • Long-lived agent loops (heartbeats, polling, autonomous workflows) → costs compound
  • Ctx windows grow unpredictably between cycles
  • API costs spike past baseline → post-mortem
  • New workflow → need cost guardrails from start
  • Post-incident audit
  • System prompts/memory/tool schemas dominate ctx

In

  • Required: System/workflow to budget (running or planned)
  • Required: Budget ceiling ($/period or token/cycle)
  • Optional: Cost data (API logs, billing)
  • Optional: Model ctx window size
  • Optional: Degradation policy (what drops at limit)

Do

Step 1: Per-Cycle Cost Tracking

Instrument loop → log tokens at every boundary.

Per cycle capture:

  1. Input tokens: system prompt + memory + tool schemas + history + new content
  2. Output tokens: model res incl. tool calls
  3. Total cost: in × in_price + out × out_price
  4. Timestamp: when ran
  5. Trigger: timer/event/user

Store structured (JSONL, CSV, DB) — NOT in ctx window:

{"cycle": 47, "ts": "2026-03-12T14:30:00Z", "trigger": "heartbeat",
 "input_tokens": 18420, "output_tokens": 2105, "cost_usd": 0.0891,
 "cumulative_cost_usd": 3.42}

No instrumentation → estimate from billing:

  • Total / cycles = avg/cycle
  • Compare baseline (pricing × ctx size)

→ Per-cycle log w/ tokens + costs, granular enough to find expensive cycles. Log lives outside ctx.

If err: no token counts (some APIs lack metadata) → billing → averages. Coarse tracking (daily $/daily cycles) reveals trends. None possible → Step 2 + estimate from ctx size.

Step 2: Audit Ctx Window

Measure occupants, rank by size.

Decompose:

  1. System prompt: base instructions, CLAUDE.md, personality
  2. Memory: MEMORY.md, auto-loaded topic files
  3. Tool schemas: MCP server defs, fn calling
  4. Skill procs: full SKILL.md loaded
  5. History: prior turns
  6. Dynamic: tool outs, file content, search results

Budget table:

Context Budget Audit:
+------------------------+--------+------+-----------------------------------+
| Component              | Tokens | %    | Notes                             |
+------------------------+--------+------+-----------------------------------+
| System prompt          | 4,200  | 21%  | Includes CLAUDE.md chain          |
| Memory (auto-loaded)   | 3,800  | 19%  | MEMORY.md + 4 topic files         |
| Tool schemas           | 2,600  | 13%  | 3 MCP servers, 47 tools           |
| Active skill procedure | 1,900  |  9%  | Full SKILL.md loaded              |
| Conversation history   | 5,100  | 25%  | 12 prior turns                    |
| Current cycle content  | 2,400  | 12%  | Tool outputs from this cycle      |
+------------------------+--------+------+-----------------------------------+
| TOTAL                  | 20,000 | 100% | Model limit: 200,000             |
| Remaining headroom     |180,000 |      |                                   |
+------------------------+--------+------+-----------------------------------+

Flag oversize vs. decision value. 4k-token memory unused this task = pure overhead.

→ Ranked table per consumer + size + %. ≥1 component stands out (usually history or verbose tool outs).

If err: hard to count → char/4 for English, char/3 for JSON/YAML. Goal = relative rank, not exact.

Step 3: Budget Caps + Enforcement

Hard + soft limits, action per limit.

  1. Soft limit (warn): 60-75% of hard. Hit:

    • Log warn w/ usage + remaining
    • Begin voluntary prune (Step 4) low-value first
    • Reduce frequency (e.g., 30min → 2hr)
    • Continue degraded
  2. Hard limit (stop): absolute max. Hit:

    • Halt autonomous immediately
    • Alert operator (notify, email, log)
    • Preserve state summary
    • No new cycle till human authorizes
  3. Per-cycle cap: max per single cycle. Stops runaway:

    • Truncate tool outs or skip low-pri actions
    • Log truncation

Doc in workflow config:

token_budget:
  soft_limit_usd: 5.00        # warn and begin pruning
  hard_limit_usd: 10.00       # halt and alert
  per_cycle_cap_usd: 0.50     # max per individual cycle
  soft_limit_pct: 70           # % of context window triggering pruning
  hard_limit_pct: 90           # % of context window triggering halt
  enforcement: strict          # strict = halt on hard limit; advisory = log only
  alert_channel: notification  # how to notify the operator

→ Caps doc'd at 3 levels (soft/hard/per-cycle) + explicit actions. Policy answers "what at limit?" before limit.

If err: $ premature → ctx-% only (soft 70%, hard 90%) + add $ after 24-48hr data. Advisory mode (log no halt) OK during calibration.

Step 4: Emergency Prune

At limits, drop low-value systematically.

Priority (drop lowest first):

  1. Old tool outs: verbose results from prior cycles where decision made. Decision persists, evidence goes.
  2. Redundant turns: early turns superseded by later. Turn 3 asked X, turn 7 revised to Y → turn 3 redundant.
  3. Verbose formatting: tables, ASCII art, decorative headers. Summarize w/ one-liner.
  4. Done sub-task ctx: multi-step, sub-task complete + outs in summary/file.
  5. Inactive skill procs: no longer following → drop full text.
  6. Irrelevant memory: auto-loaded, unrelated projects/sessions.

Per pruned item, tombstone:

[PRUNED: 2,400 tokens of npm audit output from cycle 12 — 3 vulnerabilities found, all patched]

~20 tokens but preserves conclusion.

→ Ctx drops below soft after prune. Each pruned item has tombstone. No decision-critical info lost.

If err: prune to lvl 4 + still over → workflow too ctx-heavy for cycle freq. Escalate: "N% post-prune. (a) longer interval, (b) reduce scope, (c) split workflows, (d) accept higher cost."

Step 5: Progressive Disclosure for Skill Loading

Route via registry metadata before loading full procs → spend tokens on routing not reading.

Pattern:

  1. Route first: Need skill → read registry entry (id, desc, domain, complexity, tags) from _registry.yml ~3-5 lines, ~50 tokens
  2. Confirm relevance: Match? No → next candidate. ~50 tokens/miss vs. ~500-2000 for wrong SKILL.md
  3. Load on match: Only when confirmed → load full SKILL.md
  4. Unload after: Proc complete → prune full text (Step 4 lvl 5), keep summary

Same pattern for other large payloads:

  • Memory: MEMORY.md index first, topic files only when relevant
  • Tool docs: Names + one-liners for routing, full schemas only when called
  • Files: Listings + signatures first, full only for fns being modified
Without progressive disclosure:
  Load 5 candidate skills → 5 × 1,500 tokens = 7,500 tokens → use 1 skill

With progressive disclosure:
  Route through 5 registry entries → 5 × 50 tokens = 250 tokens
  Load 1 matched skill → 1 × 1,500 tokens = 1,500 tokens
  Total: 1,750 tokens (77% reduction)

→ Skill loading = 2-phase: lightweight routing → full only on match. Same for memory, schemas, files.

If err: registry insufficient (vague descs, missing tags) → improve registry not abandon. Fix = better metadata, not more loading.

Step 6: Cost-Aware Cycle Intervals

Set intervals from cost data, not arbitrary schedules.

  1. Cost/hour at current interval:

    • cost_per_hour = avg_cost_per_cycle × cycles_per_hour
    • Ex: $0.09/cycle × 2/hr = $0.18/hr = $4.32/day
  2. Vs. budget:

    • hours_until_hard_limit = (hard_limit - cumulative_cost) / cost_per_hour
    • hours < intended runtime → extend interval
  3. Min effective interval:

    • Fastest change rate of monitored system? Source updates 4hr → polling 30min wastes 7/8 cycles
    • Match interval to refresh rate, not anxiety
    • Event-driven → webhooks/push, not polling
  4. Apply:

Before: 30-minute heartbeat, verbose processing
  → 48 cycles/day × $0.09/cycle = $4.32/day

After: 4-hour heartbeat, notification-only
  → 6 cycles/day × $0.04/cycle = $0.24/day
  → 94% cost reduction

→ Interval justified by cost data + matches refresh rate. Tradeoff doc'd for future baseline.

If err: low-latency required, can't extend → reduce per-cycle cost (smaller prompts, fewer schemas, summarized history). 2 levers: frequency + cost/cycle.

Step 7: Validate Controls

Confirm all working + within budget.

  1. Tracking: 3-5 cycles → verify per-cycle logs w/ accurate counts
  2. Soft limit test: lower temp → warn fires + prune begins
  3. Hard limit test: lower temp → halts + alerts
  4. Per-cycle cap test: inject large out → truncated not blown
  5. Progressive disclosure test: trace skill load → routes registry before full SKILL.md
  6. Projection:
    • Daily cost current settings
    • Days till hard limit at burn rate
    • Monthly expected
Budget Validation Report:
+-----------------------+----------+--------+
| Check                 | Expected | Actual |
+-----------------------+----------+--------+
| Per-cycle logging     | Present  |        |
| Soft limit warning    | Fires    |        |
| Hard limit halt       | Halts    |        |
| Per-cycle cap         | Truncates|        |
| Progressive disclosure| Routes   |        |
| Daily cost projection | < $X.XX  |        |
+-----------------------+----------+--------+

→ All 5 controls verified. Projection within budget.

If err: not firing → check enforcement wired into actual loop, not just doc'd. Config w/o enforcement = plan, not control. Projection over → Step 6, adjust interval/cost.

Check

  • Per-cycle tracking logs in/out tokens, cost, ts every cycle
  • Ctx audit identifies all consumers + tokens + %
  • Caps at 3 levels: soft, hard, per-cycle
  • Each cap has explicit action (warn, prune, halt, alert)
  • Emergency prune follows priority + tombstones
  • Progressive disclosure routes metadata before full
  • Cycle interval justified by cost + refresh rate
  • Validation tests confirm controls fire
  • Projection within budget
  • Post-incident: root cause + prevention in place

Traps

  • Tracking in ctx: Logs in history inflate the thing controlled. Log externally, keep summary in ctx.
  • Soft limits w/o enforcement: Warn nobody sees ≠ control. Must trigger visible action — prune/extend/notify. Silent exceed → will exceed.
  • Prune decisions over data: Drop tool outs before decisions = lose info. Prune evidence AFTER decision, not before. Tombstone = preserve conclusion, drop evidence.
  • Interval = anxiety not refresh: Poll 30min when source updates 4hr wastes 87.5%. Measure actual refresh first.
  • Load full skills for routing: 400-line SKILL.md to decide "right skill?" = 10-20x cost of 3-line registry. Route metadata first.
  • Ignore system prompt: System prompts + CLAUDE.md + auto-memory = invisible costs paid every cycle. 5k-token prompt × 48/day = 240k tokens/day for instructions. Audit + trim first.
  • Caps w/o human escalation: Hit limits + silently degrade → damage accumulates. Hard limits MUST notify human.

  • assess-context — eval reasoning ctx structural health; complements Step 2 audit
  • metal — extract conceptual essence; progressive disclosure applies in prospect phase
  • chrysopoeia — value extraction + dead weight elim; same value-per-token at code level
  • manage-memory — organize + prune persistent memory; reduces memory ctx component

GitHub 저장소

pjt222/agent-almanac
경로: i18n/caveman-ultra/skills/manage-token-budget
0
agentsagentskillsai-assisted-developmentclaude-codeskillsteams

연관 스킬

executing-plans

디자인

executing-plans 스킬은 검토 체크포인트가 포함된 통제된 배치로 실행할 완전한 구현 계획이 있을 때 사용합니다. 이 스킬은 계획을 불러와 비판적으로 검토한 후, 소규모 배치(기본값 3개 작업)로 작업을 실행하면서 각 배치 사이에 진행 상황을 아키텍트 검토를 위해 보고합니다. 이를 통해 내재된 품질 관리 체크포인트를 갖춘 체계적인 구현이 보장됩니다.

스킬 보기

requesting-code-review

디자인

이 스킬은 코드 변경 사항을 요구 사항에 따라 분석하기 위해 코드 리뷰어 하위 에이전트를 호출합니다. 작업 완료 후, 주요 기능 구현 후, 또는 메인 브랜치에 병합하기 전에 사용해야 합니다. 이 리뷰는 현재 구현체와 원래 계획을 비교하여 문제를 조기에 발견하는 데 도움이 됩니다.

스킬 보기

connect-mcp-server

디자인

이 스킬은 개발자들이 HTTP, stdio 또는 SSE 전송 방식을 통해 MCP 서버를 Claude Code에 연결하는 포괄적인 가이드를 제공합니다. GitHub, Notion 및 사용자 정의 API와 같은 외부 서비스를 통합하기 위한 설치, 구성, 인증 및 보안을 다룹니다. MCP 통합 설정, 외부 도구 구성 또는 Claude의 모델 컨텍스트 프로토콜 작업 시 활용하세요.

스킬 보기

web-cli-teleport

디자인

이 스킬은 작업 분석을 기반으로 개발자가 Claude Code 웹 인터페이스와 CLI 인터페이스 중 선택할 수 있도록 돕고, 두 환경 간 원활한 세션 텔레포트를 가능하게 합니다. 웹, CLI 또는 모바일 환경 전환 시 세션 상태와 컨텍스트를 관리하여 워크플로를 최적화합니다. 다양한 단계에서 서로 다른 도구가 필요한 복잡한 프로젝트에 사용하세요.

스킬 보기