circuit-breaker-pattern

Name: circuit-breaker-pattern
Author: pjt222

Updated 1 month ago

Category: Meta. Tags: ai, automation, design

About

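This skill implements the circuit breaker pattern to prevent cascading tool failures in agent workflows. It tracks tool health, manages state transitions, and routes calls to alternatives through a capability map when failures occur. This lets you build fault-tolerant agents that handle unreliable tools gracefully and recover when work is interrupted.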

Documentation

Circuit Breaker Pattern

Graceful degradation when tools fail. Agent w/ 5 tools, 1 broken → don't fail whole → spot broken tool, stop calling, shrink scope to achievable, report honest what skipped. Codifies circuit breaker from distributed systems → agentic tool orchestration.

Core insight from kirapixelads' "Kitchen Fire Problem": expeditor (orch layer) must NOT cook. Separate what to attempt from how → orchestrator stays out of broken tool's retry loop.

Use When

Building agents w/ many tools, varying reliability
Fault-tolerant workflows → partial > total failure
Agent stuck in retry loop on broken tool, not moving forward
Mid-task tool outage → graceful recovery
Hardening existing agents vs. cascading failures
Stale/cached tool out being treated as fresh

In

Required: Tool list (names + purposes)
Required: Task to accomplish
Optional: Known reliability issues / past fail patterns
Optional: Fail threshold (default: 3 consecutive fails → open)
Optional: Fail budget per cycle (default: 5 total fails → pause)
Optional: Half-open probe interval (default: every 3rd attempt post-open)

Do

Step 1: Build Capability Map

Declare each tool's capability + alternatives. Map = foundation for scope reduction → w/o it, fail leaves agent guessing.

capability_map:
  - tool: Grep
    provides: content search across files
    alternatives:
      - tool: Bash
        method: "rg or grep command"
        degradation: "loses Grep's built-in output formatting"
      - tool: Read
        method: "read suspected files directly"
        degradation: "requires knowing which files to check; no broad search"
    fallback: "ask the user which files to examine"

  - tool: Bash
    provides: command execution, build tools, git operations
    alternatives: []
    fallback: "report commands that need to be run manually"

  - tool: Read
    provides: file content inspection
    alternatives:
      - tool: Bash
        method: "cat or head command"
        degradation: "loses line numbering and truncation safety"
    fallback: "ask the user to paste file contents"

  - tool: Write
    provides: file creation
    alternatives:
      - tool: Edit
        method: "create via full-file edit"
        degradation: "requires file to already exist for Edit"
      - tool: Bash
        method: "echo/cat heredoc"
        degradation: "loses Write's atomic file creation"
    fallback: "output file contents for the user to save manually"

  - tool: WebSearch
    provides: external information retrieval
    alternatives: []
    fallback: "state what information is needed; ask user to provide it"

Each tool, doc:

Capability (one line)
Alternative tools (w/ degradation notes)
Manual fallback when no tool alternative

→ Full map covers every tool agent uses. Each entry has fallback even if no tool alt. Map makes explicit what's implicit: critical tools (no alts) vs. substitutable.

If err: Tool list unclear → start w/ allowed-tools from skill frontmatter. Alts uncertain → mark degradation: "unknown — test before relying on this route" vs. omit.
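A minimal sketch of the capability map as typed data, assuming Python. The class names `Alternative` and `Capability` and the helper `critical_tools` are hypothetical, not part of the skill. Only two entries are copied from the map above. The helper makes the critical-vs-substitutable split explicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alternative:
    tool: str
    method: str
    degradation: str


@dataclass
class Capability:
    tool: str
    provides: str
    fallback: str  # manual fallback, required even when alternatives exist
    alternatives: List[Alternative] = field(default_factory=list)


# Two entries copied from the map above; the others follow the same shape.
CAPABILITY_MAP: Dict[str, Capability] = {
    "Grep": Capability(
        tool="Grep",
        provides="content search across files",
        fallback="ask the user which files to examine",
        alternatives=[
            Alternative("Bash", "rg or grep command",
                        "loses Grep's built-in output formatting"),
            Alternative("Read", "read suspected files directly",
                        "requires knowing which files to check; no broad search"),
        ],
    ),
    "Bash": Capability(
        tool="Bash",
        provides="command execution, build tools, git operations",
        fallback="report commands that need to be run manually",
    ),
}


def critical_tools(cap_map: Dict[str, Capability]) -> List[str]:
    """Return tools that have no tool alternative, only a manual fallback."""
    return sorted(name for name, cap in cap_map.items() if not cap.alternatives)


print(critical_tools(CAPABILITY_MAP))  # ['Bash']
```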

Step 2: Initialize Circuit Breaker State

State tracker per tool. All tools start CLOSED (healthy).

Circuit Breaker State Table:
+------------+--------+-------------------+------------------+-----------------+
| Tool       | State  | Consecutive Fails | Last Failure     | Last Success    |
+------------+--------+-------------------+------------------+-----------------+
| Grep       | CLOSED | 0                 | —                | —               |
| Bash       | CLOSED | 0                 | —                | —               |
| Read       | CLOSED | 0                 | —                | —               |
| Write      | CLOSED | 0                 | —                | —               |
| Edit       | CLOSED | 0                 | —                | —               |
| WebSearch  | CLOSED | 0                 | —                | —               |
+------------+--------+-------------------+------------------+-----------------+

Failure budget: 0 / 5 consumed

State defs:

CLOSED — Tool healthy. Use normally. Track consecutive fails.
OPEN — Tool known-broken. Don't call. Route to alts or shrink scope.
HALF-OPEN — Tool was broken, maybe recovered. Single probe call. Success → CLOSED. Fail → OPEN.

Transitions:

CLOSED → OPEN: Consecutive fails ≥ threshold (default: 3)
OPEN → HALF-OPEN: After interval (e.g., every 3rd task step)
HALF-OPEN → CLOSED: Probe success
HALF-OPEN → OPEN: Probe fail

→ State table init'd all tools CLOSED, zero fails. Threshold + budget declared.

If err: Can't enumerate tools upfront (dynamic discovery) → init state on first use. Pattern still works → build table incrementally.
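The transitions above can be sketched as a pure function, assuming Python. `State`, `init_table`, `next_state`, and the event names `success`, `failure`, `probe_due` are hypothetical labels chosen for illustration. `consecutive_fails` is assumed to be the count after the current failure has been recorded.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF-OPEN"


THRESHOLD = 3  # default from the inputs: 3 consecutive fails -> OPEN


def init_table(tools):
    """Every tool starts CLOSED with zero fails and no timestamps."""
    return {
        tool: {"state": State.CLOSED, "consecutive_fails": 0,
               "last_failure": None, "last_success": None}
        for tool in tools
    }


def next_state(state, consecutive_fails, event, threshold=THRESHOLD):
    """Apply one event ('success', 'failure', 'probe_due') to a breaker state."""
    if state is State.CLOSED and event == "failure" and consecutive_fails >= threshold:
        return State.OPEN
    if state is State.OPEN and event == "probe_due":
        return State.HALF_OPEN
    if state is State.HALF_OPEN and event == "success":
        return State.CLOSED
    if state is State.HALF_OPEN and event == "failure":
        return State.OPEN
    return state  # no transition applies


table = init_table(["Grep", "Bash", "Read", "Write", "Edit", "WebSearch"])
```

Keeping the transition function free of side effects lets the table be built incrementally: a tool discovered mid-task gets a fresh CLOSED row on first use.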

Step 3: Implement Call-and-Track Loop

Agent needs tool call → follow decision seq. This = expeditor logic → decides whether to call, not how to execute.

BEFORE each tool call:
  1. Check tool state in the circuit breaker table
  2. If OPEN:
     a. Check if it is time for a half-open probe
        - Yes → transition to HALF-OPEN, proceed with probe call
        - No  → skip this tool, route to alternative (Step 4)
  3. If HALF-OPEN:
     a. Make one probe call
     b. Success → transition to CLOSED, reset consecutive fails to 0
     c. Failure → transition to OPEN (failure counted against budget once, in AFTER step)
  4. If CLOSED:
     a. Make the call normally

AFTER each tool call:
  1. Success:
     - Reset consecutive fails to 0
     - Record last success timestamp
  2. Failure:
     - Increment consecutive fails
     - Record last failure timestamp and error message
     - Increment failure budget consumed
     - If consecutive fails >= threshold:
         transition to OPEN
         log: "Circuit OPENED for [tool]: [failure count] consecutive failures"
     - If failure budget exhausted:
         PAUSE — do not continue the task
         Report to user (Step 7)

Expeditor NEVER retries failed call immediately. Record fail, check thresholds, move on. Retries via HALF-OPEN probe at later step only.

→ Clear decision loop before + after every tool call. Tool health tracked continuous. Expeditor layer never blocks on failing tool.

If err: Tracking state across calls impractical (stateless exec) → degrade to simpler: count total fails, pause at budget. Three-state breaker = ideal; fail counter = min viable.
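A rough sketch of the before/after loop, assuming Python. `Expeditor`, `Breaker`, `attempt`, and `BudgetExhausted` are hypothetical names. The probe interval is interpreted here as "every 3rd attempt while OPEN", and a failed probe is counted against the budget once. Both are assumptions, not something the skill fixes.

```python
class BudgetExhausted(Exception):
    """Raised when the failure budget is used up and the task should pause."""


class Breaker:
    def __init__(self, threshold=3, probe_interval=3):
        self.state = "CLOSED"
        self.consecutive_fails = 0
        self.attempts_since_open = 0
        self.threshold = threshold
        self.probe_interval = probe_interval


class Expeditor:
    def __init__(self, tools, budget=5):
        self.breakers = {tool: Breaker() for tool in tools}
        self.budget = budget
        self.failures = 0

    def attempt(self, tool, call):
        """Return ('ok', result), ('failed', error), or ('skipped', None)."""
        b = self.breakers[tool]
        # BEFORE: decide whether to call at all.
        if b.state == "OPEN":
            b.attempts_since_open += 1
            if b.attempts_since_open < b.probe_interval:
                return ("skipped", None)  # caller routes to an alternative
            b.state = "HALF-OPEN"  # time for a single probe call
        try:
            result = call()
        except Exception as err:
            # AFTER (failure): record, check thresholds, never retry here.
            b.consecutive_fails += 1
            self.failures += 1
            if b.state == "HALF-OPEN" or b.consecutive_fails >= b.threshold:
                b.state = "OPEN"
                b.attempts_since_open = 0
            if self.failures >= self.budget:
                raise BudgetExhausted(
                    f"{self.failures}/{self.budget} failures consumed") from err
            return ("failed", err)
        # AFTER (success): reset the counter and close the circuit.
        b.consecutive_fails = 0
        b.state = "CLOSED"
        return ("ok", result)
```

The wrapper never loops: one call in, one outcome out. A retry happens only when a later step lands on the probe slot.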

Step 4: Route to Alternatives on Open Circuit

Tool OPEN → consult capability map (Step 1), route to best alt.

Routing priority:

Tool alt, low degradation — Similar capability tool. Note degradation in out.
Tool alt, high degradation — Big capability loss. Label what's missing.
Manual fallback — Report what agent can't do, what user needs to provide.
Scope reduction — No alt + no fallback → drop dependent sub-task (Step 5).

Example routing decision:

Tool needed: Grep (circuit OPEN)
Task: find all files containing "API_KEY"

Route 1: Bash with rg command
  → Degradation: loses Grep's built-in formatting
  → Decision: ACCEPTABLE — use this route

If Bash also OPEN:
Route 2: Read suspected config files directly
  → Degradation: requires guessing which files; no broad search
  → Decision: PARTIAL — try known config paths only

If Read also OPEN:
Route 3: Ask user
  → "I need to find files containing 'API_KEY' but my search
     tools are unavailable. Can you run: grep -r 'API_KEY' ."
  → Decision: FALLBACK — user provides the information

If user unavailable:
Route 4: Scope reduction
  → Remove "find API key references" from task scope
  → Document: "SKIPPED: API key search — no tools available"

→ Tool circuit opens → agent transparently routes to alt or degrades scope. Decision + degradation documented in out → user knows what's affected.

If err: Map incomplete (no alts listed) → default scope reduction + report. NEVER silently skip → always doc what + why.
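The routing priority can be sketched as a single lookup, assuming Python. The function name `route` and the return shape are hypothetical. Alternatives are assumed to be listed lowest-degradation first, as in the Step 1 map. Only CLOSED alternatives are used, so a HALF-OPEN tool is left for its own probe.

```python
GREP_ENTRY = {
    "alternatives": [
        {"tool": "Bash", "degradation": "loses Grep's built-in output formatting"},
        {"tool": "Read", "degradation": "requires knowing which files to check; no broad search"},
    ],
    "fallback": "ask the user which files to examine",
}


def route(tool, capability_map, states):
    """Return (kind, target_tool, note) for a tool whose circuit is OPEN."""
    entry = capability_map.get(tool, {})
    for alt in entry.get("alternatives", []):
        if states.get(alt["tool"], "CLOSED") == "CLOSED":
            return ("alternative", alt["tool"], alt["degradation"])
    if entry.get("fallback"):
        return ("manual_fallback", None, entry["fallback"])
    # Never skip silently: the note says what was dropped and why.
    return ("scope_reduction", None,
            f"SKIPPED: {tool} unavailable, no alternative or fallback")


cap_map = {"Grep": GREP_ENTRY}
print(route("Grep", cap_map, {"Grep": "OPEN", "Bash": "CLOSED", "Read": "CLOSED"}))
```

The third element of the tuple is what goes into the output, so the user sees the degradation along with the result.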

Step 5: Reduce Scope to Achievable Work

Tools OPEN + alts exhausted → shrink task to what working tools can do. Not failure → honest scope mgmt.

Scope reduction proc:

List remaining sub-tasks
Each sub-task → check tools required
All required tools CLOSED or viable alts → keep
Any required tool OPEN no alt → mark DEFERRED
Continue w/ reduced scope
Report deferred at end

Scope Reduction Report:

Original scope: 5 sub-tasks
  [x] 1. Read configuration files          (Read: CLOSED)
  [x] 2. Search for deprecated patterns    (Grep: CLOSED)
  [ ] 3. Run test suite                    (Bash: OPEN — no alternative)
  [x] 4. Update documentation             (Edit: CLOSED)
  [ ] 5. Deploy to staging                 (Bash: OPEN — no alternative)

Reduced scope: 3 sub-tasks achievable
Deferred: 2 sub-tasks require Bash (circuit OPEN)

Recommendation: Complete sub-tasks 1, 2, 4 now.
Sub-tasks 3 and 5 require Bash — will probe on next cycle
or user can run commands manually.

Do NOT attempt deferred. Do NOT retry open-circuited tools hoping they work. Breaker exists to prevent this → trust its state.

→ Clear partition of task → achievable + deferred. Agent finishes achievable, reports deferred w/ reason + unblock path.

If err: Scope reduction removes all sub-tasks (all tools broken) → skip to Step 7 pause-and-report. Agent w/ no working tools must not fake progress.
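A small sketch of the partition into achievable + deferred, assuming Python. `partition` and the `usable_alternatives` mapping (tool → True when a viable alt exists) are hypothetical. The sub-tasks mirror the report above.

```python
def partition(subtasks, states, usable_alternatives):
    """Split (name, required_tools) pairs into keep and deferred lists."""
    keep, deferred = [], []
    for name, required in subtasks:
        blocked = [tool for tool in required
                   if states.get(tool) == "OPEN"
                   and not usable_alternatives.get(tool)]
        if blocked:
            deferred.append((name, blocked))  # keep the reason for the report
        else:
            keep.append(name)
    return keep, deferred


subtasks = [
    ("Read configuration files", ["Read"]),
    ("Search for deprecated patterns", ["Grep"]),
    ("Run test suite", ["Bash"]),
    ("Update documentation", ["Edit"]),
    ("Deploy to staging", ["Bash"]),
]
states = {"Read": "CLOSED", "Grep": "CLOSED", "Bash": "OPEN", "Edit": "CLOSED"}

keep, deferred = partition(subtasks, states, usable_alternatives={})
print(len(keep), "achievable;", len(deferred), "deferred")  # 3 achievable; 2 deferred
```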

Step 6: Handle Staleness + Label Data Quality

Tool returns maybe-stale data (cached, old snapshot, prev fetched) → label explicit, not treat as fresh.

Staleness indicators:

Tool out matches prev call exactly (cache hit?)
Data timestamps older than current task
Tool doc mentions caching
Results contradict other recent observations

Labeling proc:

When presenting potentially stale data:

"[STALE DATA — retrieved at {timestamp}, may not reflect current state]
 File contents as of last successful Read:
 ..."

"[CACHED RESULT — Grep returned identical results to previous call;
 filesystem may have changed since]"

"[UNVERIFIED — WebSearch result from {date}; current status unknown]"

NEVER silently present stale as current. User / downstream agent must know data quality.

→ All maybe-stale outs labeled. Fresh not labeled (labels = uncertainty, not confirmation).

If err: Can't determine staleness (no timestamps, no baseline) → note: "[FRESHNESS UNKNOWN — no baseline for comparison]". Uncertainty = info.
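One way to sketch the labeling rule, assuming Python. `staleness_label` and `present` are hypothetical helpers. Only two indicators are checked (identical repeat output, timestamp older than task start). The short tags stand in for the fuller label text above.

```python
from datetime import datetime


def staleness_label(text, retrieved_at=None, task_started_at=None, previous_text=None):
    """Return a data-quality tag, or None when the output looks fresh."""
    if previous_text is not None and text == previous_text:
        return "CACHED RESULT"
    if retrieved_at is None or task_started_at is None:
        return "FRESHNESS UNKNOWN"  # no baseline: uncertainty is still info
    if retrieved_at < task_started_at:
        return "STALE DATA"
    return None  # fresh output stays unlabeled


def present(text, label):
    """Prefix the output with its label so downstream readers see data quality."""
    return f"[{label}]\n{text}" if label else text


start = datetime(2024, 1, 2, 12, 0)
old = datetime(2024, 1, 1, 9, 0)
print(present("key=value", staleness_label("key=value", old, start)))
```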

Step 7: Enforce Failure Budget

Track total fails across all tools. Budget exhausted → pause + report vs. keep accumulating errs.

Failure Budget Enforcement:

Budget: 5 failures per cycle
Current: 4 / 5 consumed

  Failure 1: Bash — "permission denied" (step 3)
  Failure 2: Bash — "command not found" (step 3)
  Failure 3: Bash — "timeout after 120s" (step 4)
  Failure 4: WebSearch — "connection refused" (step 5)

Status: 1 failure remaining before mandatory pause

→ Next tool call proceeds with heightened caution
→ If it fails: PAUSE and generate status report

On budget exhaustion:

FAILURE BUDGET EXHAUSTED — PAUSING

Completed work:
  - Sub-task 1: Read configuration files (SUCCESS)
  - Sub-task 2: Search for deprecated patterns (SUCCESS)

Incomplete work:
  - Sub-task 3: Run test suite (FAILED — Bash circuit OPEN)
  - Sub-task 4: Update documentation (NOT ATTEMPTED — paused)
  - Sub-task 5: Deploy to staging (NOT ATTEMPTED — paused)

Tool health:
  Grep: CLOSED (healthy)
  Read: CLOSED (healthy)
  Edit: CLOSED (healthy)
  Bash: OPEN (3 consecutive failures — permission/command/timeout)
  WebSearch: CLOSED (2 failures, connection refused; below threshold of 3)

Failures: 5 / 5 budget consumed

Recommendation:
  1. Investigate Bash failures — likely environment issue
  2. Check network connectivity for WebSearch
  3. Resume from sub-task 4 after resolution

Pause-and-report = breaker in electrical systems → prevents damage accumulating. Agent that keeps calling broken tools wastes ctx, confuses user w/ repeat errs, inconsistent partial results.

→ Agent stops clean on budget exhaust. Report covers done work, incomplete work, tool health, actionable next steps.

If err: Can't generate clean report (state tracking lost) → out whatever avail. Partial report > silent continuation.
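A minimal ledger for the budget, assuming Python. `FailureBudget` and its methods are hypothetical. The status text is loosely modeled on the report above, not an exact copy.

```python
class FailureBudget:
    def __init__(self, limit=5):
        self.limit = limit
        self.entries = []  # (tool, error message, task step)

    @property
    def exhausted(self):
        return len(self.entries) >= self.limit

    def record(self, tool, error, step):
        """Log one failure; return True when the agent must pause."""
        self.entries.append((tool, error, step))
        return self.exhausted

    def status(self):
        lines = [f"Budget: {self.limit} failures per cycle",
                 f"Current: {len(self.entries)} / {self.limit} consumed"]
        for i, (tool, error, step) in enumerate(self.entries, 1):
            lines.append(f'  Failure {i}: {tool}: "{error}" (step {step})')
        if self.exhausted:
            lines.append("FAILURE BUDGET EXHAUSTED: PAUSING")
        else:
            remaining = self.limit - len(self.entries)
            lines.append(f"Status: {remaining} failure(s) remaining before mandatory pause")
        return "\n".join(lines)


budget = FailureBudget(limit=5)
budget.record("Bash", "permission denied", 3)
budget.record("Bash", "command not found", 3)
budget.record("Bash", "timeout after 120s", 4)
budget.record("WebSearch", "connection refused", 5)
print(budget.status())
```

Because the ledger keeps every entry, a partial report stays possible even when other state tracking is lost.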

Step 8: Separation of Concerns — Expeditor vs. Executor

Validate: orchestration logic (Steps 2-7) cleanly separated from tool exec.

Expeditor (orch) does:

Track tool health state
Decide call, skip, probe
Route to alts on open
Enforce fail budget
Generate status reports

Expeditor does NOT:

Retry failed calls immediately
Modify call params to work around errs
Catch + suppress tool errs
Assume why tool failed
Exec fallback logic that itself needs tools

Expeditor "cooking" (calling tools to work around other fails) → separation broken. Expeditor routes to alt or shrinks scope, NOT fixes broken tool.

→ Clean boundary orch decisions vs. tool exec. Expeditor described w/o ref to specific tool APIs or err types.

If err: Orch + exec entangled → refactor → extract decision logic to separate step before each tool call. Decision step outs: CALL, SKIP, PROBE, PAUSE. Exec step acts on out.
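The decision/exec split can be sketched as two functions that share only the decision value, assuming Python. `Decision`, `decide`, and `execute` are hypothetical names. `decide` sees breaker state but no tool API. `execute` sees the call but no breaker state.

```python
from enum import Enum


class Decision(Enum):
    CALL = "CALL"
    SKIP = "SKIP"
    PROBE = "PROBE"
    PAUSE = "PAUSE"


def decide(state, probe_due=False, budget_exhausted=False):
    """Expeditor: choose what to attempt from breaker state alone."""
    if budget_exhausted:
        return Decision.PAUSE
    if state == "CLOSED":
        return Decision.CALL
    if state == "HALF-OPEN":
        return Decision.PROBE
    if state == "OPEN":
        return Decision.PROBE if probe_due else Decision.SKIP
    raise ValueError(f"unknown breaker state: {state!r}")


def execute(decision, call):
    """Executor: act on the decision; errors propagate, nothing is suppressed."""
    if decision in (Decision.CALL, Decision.PROBE):
        return call()
    return None  # SKIP and PAUSE make no tool call


print(decide("OPEN").value)  # SKIP
```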

Step 9: Detect Cascading Failures

Many tools share infra (network, fs, perms) → single root cause trips many breakers at once. Detect correlated pattern vs. treat each breaker indep.

Cascade indicators:

3+ tools OPEN in same task step / narrow window
Fails share common err signature (e.g., "connection refused," "permission denied")
Tools w/ indep fail history suddenly fail together

Response proc:

Second breaker opens → check if fail category matches first
Correlated → flag systemic failure → pause all tool calls, not just broken
Report suspected root: "Multiple tools failing with [shared pattern] — likely [network/filesystem/permissions] issue"
Don't probe half-open during systemic fail → probes also fail, waste budget
Resume probing only after user confirms infra fixed

Backoff compounding: Cascade trips → exponential backoff for half-open probes: probe step 3, then 6, then 12. Cap max interval 20 steps → prevent permanent circuit lock. Stops rapid-fire probes overwhelming recovering system.

→ Correlated fails detected + treated as single systemic event, not N indep trips. Fail budget counts systemic event once, not N times.

If err: Correlation detection impractical (diff err sigs, shared cause) → fallback indep per-tool breakers. System still degrades → just consumes budget faster.
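A sketch of the correlation check and the capped backoff, assuming Python. `systemic_category` and `probe_interval` are hypothetical names. "3, then 6, then 12" is read as successive intervals that double from a base of 3, which is an interpretation. Error categories are assumed to be normalized strings already.

```python
from collections import Counter


def systemic_category(open_failures, min_tools=2):
    """open_failures maps each OPEN tool to its error category.

    Return the shared category when at least min_tools breakers tripped
    with the same one, else None.
    """
    counts = Counter(open_failures.values())
    if not counts:
        return None
    category, n = counts.most_common(1)[0]
    return category if n >= min_tools else None


def probe_interval(trips, base=3, cap=20):
    """Steps to wait before the next half-open probe after `trips` cascade trips."""
    return min(base * 2 ** (max(trips, 1) - 1), cap)


print([probe_interval(n) for n in range(1, 6)])  # [3, 6, 12, 20, 20]
```

When `systemic_category` returns a value, the caller would pause all tool calls and charge the budget once for the event. That step is not shown here.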

Step 10: Pre-Call Tool Selection Layer

Before circuit breaker loop (Step 3) → optionally validate that tool is available + likely to succeed. Cuts unnecessary trips from predictable fails.

Pre-call checks:

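+-----------------------+------------------------------------------+----------------------------------+
| Check                 | Method                                   | Action on failure                |
+-----------------------+------------------------------------------+----------------------------------+
| Tool exists           | Verify tool is in the allowed-tools list | Skip (do not even attempt)       |
| MCP server health     | Check server process/connection status   | Route to alternative immediately |
| Resource availability | Verify target file/URL/endpoint exists   | Route or degrade scope           |
+-----------------------+------------------------------------------+----------------------------------+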

Decision table:

Pre-call score:
  AVAILABLE  → proceed to circuit breaker loop (Step 3)
  DEGRADED   → proceed with caution, lower the failure threshold by 1
  UNAVAILABLE → skip tool, route to alternative (Step 4) without consuming budget

Pre-call = advisory, not authoritative. Tool passing pre-call can still fail during exec. Breaker = primary reliability mechanism.

→ Predictable fails (missing tools, unreachable servers) caught before budget consumed. Breaker handles only genuine runtime fails.

If err: Pre-call checks unavail or too much overhead → skip entirely. Step 3 breaker loop handles all fails → pre-call = opt, not req.
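A sketch of the advisory pre-call score, assuming Python. `pre_call_score` and `effective_threshold` are hypothetical. The skill does not say what produces DEGRADED, so the `server_status` values `"up"`, `"slow"`, `"down"` and the mapping of `"slow"` to DEGRADED are assumptions made for illustration.

```python
def pre_call_score(tool, allowed_tools, server_status="up", resource_exists=True):
    """Return AVAILABLE, DEGRADED, or UNAVAILABLE before entering the breaker loop."""
    if tool not in allowed_tools:
        return "UNAVAILABLE"  # tool does not exist: do not even attempt
    if server_status == "down" or not resource_exists:
        return "UNAVAILABLE"  # route to alternative without consuming budget
    if server_status == "slow":
        return "DEGRADED"  # assumed trigger; proceed with caution
    return "AVAILABLE"


def effective_threshold(score, threshold=3):
    """DEGRADED lowers the failure threshold by 1, never below 1."""
    return max(1, threshold - 1) if score == "DEGRADED" else threshold


score = pre_call_score("Grep", {"Grep", "Read"}, server_status="slow")
print(score, effective_threshold(score))  # DEGRADED 2
```

The score is advisory only: an AVAILABLE tool still goes through the Step 3 loop and can still trip its breaker.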

Check

Traps

Retry vs. circuit-break: Calling broken tool repeat wastes budget + ctx. 3 consecutive fails = pattern, not bad luck. OPEN it.
Cooking in expeditor: Orch decides what, not how to fix broken. Expeditor crafting workaround cmds for Bash fails → crossed the boundary.
Silent scope reduction: Dropping sub-tasks w/o doc → looks complete, isn't. Always report skipped.
Treat stale as fresh: Cached/prev results maybe not current. Label uncertainty, don't ignore.
Open circuits too eagerly: Single transient fail shouldn't open. Threshold (default: 3) filters noise from signal.
Never probe post-open: Permanently open → agent never finds recovery. Half-open probes essential.
Ignore fail budget: W/o budget → agent accumulates dozens of fails across tools while "progressing" on paper. Budget forces honest checkpoint.
Cascade backoff multiply: Many tools in dep chain each apply own exp backoff → compound delay grows multiplicatively. Cap total aggregate backoff across chain, not just per tool.
Stale discovery scores: Step 10 caches tool avail. No invalidation when conditions change → agent skips recovered tool or attempts unavail. Re-check scores after systemic fail event.

Related Skills

fail-early-pattern — complementary: fail-early validates in before work; circuit-breaker manages fails during work
escalate-issues — budget exhausted / scope reduction significant → escalate to specialist or human
write-incident-runbook — doc recurring fail patterns as runbooks
assess-context — eval if current approach can adapt when many tools degraded; pairs w/ scope reduction
du-dum — two-clock arch sep'ng observe from decide; complement for cutting observe cost in agent loops

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman-ultra/skills/circuit-breaker-pattern

Topics: agents, agentskills, ai-assisted-development, claude-code, skills, teams

Frequently asked questions

What is the circuit-breaker-pattern skill?

circuit-breaker-pattern is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform circuit-breaker-pattern-related tasks without extra prompting.

How do I install circuit-breaker-pattern?

Use the install commands on this page: add circuit-breaker-pattern to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does circuit-breaker-pattern belong to?

circuit-breaker-pattern is in the Meta category, tagged ai, automation and design.

Is circuit-breaker-pattern free to use?

Yes. circuit-breaker-pattern is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
