probe-feature-flag-state
About
This skill probes the runtime state of a named feature flag in a CLI binary using a four-prong evidence protocol. It classifies flags as LIVE, DARK, INDETERMINATE, or UNKNOWN, handling scenarios like gate-vs-event disambiguation. Use it to verify if a capability is rolled out, audit dark-launched features, or refresh probes against new binary versions.
Quick Install
Claude Code
Recommended:
npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/probe-feature-flag-state
Copy and paste this command into Claude Code to install the skill.
Skill Documentation
Probe Feature-Flag State
Determine if named flag in shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN via 4-prong evidence protocol pairing every state claim w/ specific observation.
Use When
- Capability rumored/documented/inferred → verify gate fires for running session
- Audit dark-launched features (ships in bundle, gated off) → plan integrations responsibly
- Prior probe needs refresh against new binary (flag may have flipped, removed, merged into conjunction)
- Phase 1 (monitor-binary-version-baselines) follow-up → classify candidates before Phase 4 wire capture
- User-visible behavior changed → flag flip or code change drove it?
In
- Required: Flag name as in binary (string-literal form)
- Required: CLI binary or bundle file readable + invocable
- Required: Authenticated session against harness's normal backend (own account; never another user's)
- Optional: Binary version ID — strongly recommended for diff-able evidence table
- Optional: Suspected co-gates list (other flags in conjunction)
- Optional: Prior probe artifact at different version, for delta analysis
Do
Step 1: Confirm Flag Name in Binary (Prong A — Binary Strings)
Extract candidate from bundle → confirm exists as string literal. Without this, all prongs probing thin air.
# Locate the bundle (common shapes: .js, .mjs, .bun, packaged binary)
BUNDLE=/path/to/cli/bundle.js
FLAG=acme_widget_v3 # synthetic placeholder; replace with the candidate
# Confirm the literal exists (-F: fixed string, -a: read a packaged binary as text)
# Note: -c counts matching lines, not occurrences
grep -acF -- "$FLAG" "$BUNDLE"
# Capture every line where it appears, with surrounding context for Step 2
grep -anF -C 3 -- "$FLAG" "$BUNDLE" > /tmp/flag-context.txt
wc -l /tmp/flag-context.txt
Inspect /tmp/flag-context.txt, tag each occurrence:
- gate-call: first arg to gate-shaped fn (gate("$FLAG", default), isEnabled("$FLAG"), flag("$FLAG", ...)).
- telemetry-call: first arg to emit/log/track fn.
- env-var-check: process.env.X lookup.
- string-table: static map/registry, role unclear.
→ ≥1 occurrence in bundle, each tagged w/ call-site role.
If err: grep -c returns 0 → flag not in build. Wrong input (typo, wrong namespace) or removed. Re-check Phase 1 markers, correct input or classify REMOVED + stop.
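Minified bundles often sit on a single line, so grep -c (which counts matching lines) can report 1 for several occurrences. A minimal sketch of an occurrence count, assuming a grep that supports -o (GNU and BSD grep both do). count_occurrences is a hypothetical helper name, not part of any harness:

```shell
# Count occurrences of a fixed string, not matching lines.
# count_occurrences is a hypothetical helper for illustration only.
count_occurrences() {
  needle=$1
  file=$2
  # -o prints each match on its own line; -F disables regex;
  # -a reads a packaged binary as text. tr strips wc's padding.
  grep -aoF -- "$needle" "$file" | wc -l | tr -d ' '
}
```

Usage: count_occurrences "$FLAG" "$BUNDLE". A result above the grep -c count means several call sites share a line, so each still needs its own tag.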
Step 2: Disambiguate Gate from Event from Env Var
Same string can be gate, telemetry event, env var, or all three. Classification by call-site, not string. Mistaking telemetry for gate → nonsense reasoning.
Per tagged occurrence:
- gate-call → eligible for LIVE/DARK/INDETERMINATE classification. Capture default value passed (gate("$FLAG", false) defaults off; gate("$FLAG", true) defaults on). Record literal default + gate fn name.
- telemetry-call → NOT a gate. Label fired when other gate already passed. Only telemetry-calls found → string is event-only, classification = UNKNOWN.
- env-var-check → usually kill switch (default-on disabled by env var) or opt-in (default-off enabled by env var). Note polarity: if (process.env.X) { return null; } = kill switch; if (process.env.X) { enable(); } = opt-in.
- string-table → cross-ref, look at downstream consumption.
→ Per occurrence, definite call-site role + (for gate-calls) recorded default value.
If err: gate-call's surrounding context too minified to read default → expand context (-C 10), inspect full callee. Default still unreadable → record default=?, downgrade LIVE/DARK to INDETERMINATE.
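The tagging above can be roughed out as a first-pass filter before manual review. A sketch only: the callee names matched (gate, isEnabled, flag, emit, log, track) are the examples from this skill, not a claim about any real bundle, and tag_call_site is a hypothetical helper. It assumes double-quoted literals; bundles using single quotes or backticks need extra patterns.

```shell
# Tag one context line with a call-site role.
# Usage: tag_call_site FLAG LINE
# Callee names are assumptions taken from this skill's examples.
tag_call_site() {
  flag=$1
  line=$2
  case $line in
    *"gate(\"$flag\""*|*"isEnabled(\"$flag\""*|*"flag(\"$flag\""*)
      echo gate-call ;;
    *"emit(\"$flag\""*|*"log(\"$flag\""*|*"track(\"$flag\""*)
      echo telemetry-call ;;
    *"process.env.$flag"*)
      echo env-var-check ;;
    *)
      # Anything else: role unclear until downstream use is read
      echo string-table ;;
  esac
}
```

Substring matching is deliberately loose (navigate("...") would match the gate pattern), so treat the output as a hint and confirm each tag by reading the call site.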
Step 3: Observe Live Invocation Behavior (Prong B — Runtime Probe)
Run harness in authenticated session, observe gated capability surfaces. Highest-signal prong: bundle says what can happen, runtime shows what does happen.
Pick probe action revealing gate-pass — typically user-visible behavior gate guards (tool in tool list, command flag valid, UI element rendering, output field appearing).
# Example shape — adapt to the harness
$CLI --list-capabilities | grep -i widget # does the gated capability appear?
$CLI --help 2>&1 | grep -i "$FLAG" # is a flag-related option exposed?
$CLI run-some-command --debug 2>&1 | tee probe-runtime.log
Record one of three:
- gate-pass observed: capability surfaced. Candidate: LIVE.
- gate-pass not observed: capability didn't surface. Candidate depends on default from Step 2 (default-false → DARK; default-true → re-check, suspicious).
- gate-pass conditional on input/context not reproducible here: record condition; candidate: INDETERMINATE.
→ Recorded probe action, observed outcome, candidate classification.
If err: probe action errors (auth fail, network unreachable, wrong subcommand) → runtime prong unusable. Fix session or pick different probe action. Don't infer DARK from runtime that never ran.
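The three outcomes map to a candidate as sketched below. The outcome vocabulary (pass, no-pass, conditional) and the RECHECK marker are assumptions of this sketch, and runtime_candidate is a hypothetical helper. Any other outcome is rejected, so a probe that never ran cannot yield DARK.

```shell
# Map a Prong B outcome plus the Step 2 default to a candidate state.
# Usage: runtime_candidate OUTCOME DEFAULT
#   OUTCOME: pass | no-pass | conditional   (assumed vocabulary)
#   DEFAULT: true | false | ?               (? = default unreadable)
runtime_candidate() {
  outcome=$1
  default=$2
  case $outcome in
    pass) echo LIVE ;;
    conditional) echo INDETERMINATE ;;
    no-pass)
      case $default in
        false) echo DARK ;;
        true) echo RECHECK ;;        # default-on but not surfaced: suspicious
        *) echo INDETERMINATE ;;     # default unreadable
      esac ;;
    *)
      echo "unusable probe outcome: $outcome" >&2
      return 1 ;;
  esac
}
```

The result is a candidate only; Prongs C and D and the conjunction check can still change it.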
Step 4: Inspect On-Disk State (Prong C — Config, Cache, Session)
Many harnesses persist gate evals or override values to disk. Inspecting shows what harness believed at last eval.
Common locations (shapes, not specific paths):
HARNESS=mycli # synthetic placeholder; replace with the harness's directory name
# User-level config
ls ~/.config/"$HARNESS"/ 2>/dev/null
ls ~/."$HARNESS"/ 2>/dev/null
# Per-project state
ls ./."$HARNESS"/ 2>/dev/null
# Cache directories
ls ~/.cache/"$HARNESS"/ 2>/dev/null
# Search any of these for the flag name (-F: fixed string)
grep -rF -- "$FLAG" ~/.config/"$HARNESS"/ ~/.cache/"$HARNESS"/ ./."$HARNESS"/ 2>/dev/null
Record each hit's path, value w/ flag, last-modified time. Recently-modified cache entry overriding binary default = strongest possible evidence either way.
→ Confirmed override value w/ timestamp, OR confirmed absence (no on-disk state mentions flag).
If err: flag mentioned but can't tell if recorded value = cached server response, user override, or stale → flag entry for Step 5 (platform cache) reconciliation, don't guess.
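Recording path, value, and last-modified time per hit can be sketched as one helper. list_flag_hits is a hypothetical name, and the directories passed in are whatever locations apply to the harness. It relies on grep -r, which GNU and BSD grep provide but POSIX does not require.

```shell
# For every file under the given directories that mentions the flag, print
# an ls -ld line (includes last-modified time and path) followed by the
# matching lines. Missing directories are skipped silently.
# Usage: list_flag_hits FLAG DIR...
list_flag_hits() {
  flag=$1
  shift
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    grep -rlF -- "$flag" "$dir" 2>/dev/null | while IFS= read -r path; do
      ls -ld -- "$path"             # permissions, size, mtime, path
      grep -nF -- "$flag" "$path"   # line number + recorded value
    done
  done
}
```

Empty output is the "confirmed absence" case, provided every directory that could hold state was actually passed in.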
Step 5: Inspect Platform Flag-Service Cache (Prong D)
If harness uses external flag service (LaunchDarkly, Statsig, GrowthBook, vendor-internal), locally-cached service response = authoritative current rollout state.
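HARNESS=mycli # synthetic placeholder; replace with the harness's directory name
# Look for service-shaped cache files (parentheses group the -name tests)
find ~/.cache ~/.config \( -name "*flag*" -o -name "*feature*" -o -name "*config*" \) 2>/dev/null | head
# If a cache file is present, parse it for the flag name
# (flags.json and its array-of-objects shape with a .key field are illustrative)
jq --arg flag "$FLAG" '.[] | select(.key == $flag)' ~/.cache/"$HARNESS"/flags.json 2>/dev/null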
Record cached value, timestamp, TTL (if present). Platform cache false overrides binary default true; cache true overrides binary default false.
→ Definite cached value w/ timestamp, OR confirmed absence of flag-service cache.
If err: no flag-service or can't locate cache → prong contributes nothing. Note "Prong D: not applicable" in evidence table. Don't guess.
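The override rule reduces to "cached value wins when present, binary default otherwise". A minimal sketch, assuming the cached value has already been read out as true, false, or absent; effective_value is a hypothetical helper:

```shell
# Resolve the value the session most likely received.
# Usage: effective_value DEFAULT CACHED
#   DEFAULT: true | false | ?        (binary default from Step 2)
#   CACHED:  true | false | absent   (Prong D; absent = not applicable)
effective_value() {
  default=$1
  cached=$2
  case $cached in
    true|false) echo "$cached" ;;   # platform cache overrides the binary default
    *) echo "$default" ;;           # no cache: fall back to the binary default
  esac
}
```

This ignores TTL. An expired cache entry is weaker evidence, so note its timestamp in the evidence table.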
Step 6: Handle Conjunction Gates
Some capabilities guarded by multiple flags all true: gate("A") && gate("B") && gate("C"). Any one DARK → capability DARK, but per-flag classification still belongs to each.
# After finding the gate-call site for the primary flag in Step 2, scan the
# enclosing predicate for other gate(...) calls
grep -n -C 5 "$FLAG" "$BUNDLE" | grep -oE 'gate\("[^"]+"' | sort -u
Per co-gate string surfaced:
- Repeat Steps 1-5 for that flag (treat each as own probe)
- Record per-flag classification
- Compute capability-level classification: LIVE iff all conjuncts LIVE; DARK if any DARK; INDETERMINATE if no DARK + at least one INDETERMINATE.
→ Every conjunct ID'd + individually classified, plus derived capability-level classification.
If err: predicate too minified to enumerate (call site inlined or wrapped) → record "≥1 additional gate, structure unreadable", downgrade capability-level to INDETERMINATE even if primary looks LIVE.
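The capability-level rule can be written as a small fold over the per-flag states. A sketch: capability_state is a hypothetical helper, and treating any state other than LIVE or DARK (for example UNKNOWN) as INDETERMINATE is an assumption, not something the rule above spells out.

```shell
# Derive the capability-level state from per-conjunct states.
# Usage: capability_state STATE...
capability_state() {
  [ "$#" -gt 0 ] || { echo "no conjuncts given" >&2; return 1; }
  saw_other=0
  for state in "$@"; do
    case $state in
      DARK) echo DARK; return 0 ;;   # any DARK conjunct darkens the capability
      LIVE) ;;
      *) saw_other=1 ;;              # assumption: non-LIVE, non-DARK blocks LIVE
    esac
  done
  if [ "$saw_other" -eq 1 ]; then
    echo INDETERMINATE
  else
    echo LIVE
  fi
}
```

Example: capability_state LIVE INDETERMINATE LIVE prints INDETERMINATE. The per-flag states still go into the evidence table unchanged.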
Step 7: Check for Skill-Substitution
Flag may legitimately be DARK while user-facing capability reachable through different fully-supported route — different command, user-invocable skill, alternate API. Honest finding "flag DARK, capability LIVE via substitution" common + important; missing produces panicked dark-launch reports about capabilities users actually have.
For any DARK or INDETERMINATE candidate:
- Documented user-invokable command, slash command, skill delivering same outcome?
- Alternate API surface (different endpoint, tool name) returning equivalent data?
- Harness publishes user-facing extension point (plugins, custom tools, hooks) → users assemble equivalent themselves?
If yes to any → append substitution: note to evidence row recording alternate route + observability (how user reaches it, documented).
→ For every DARK/INDETERMINATE, explicit substitution check — route or "no substitution route identified."
If err: suspect substitution but can't confirm route → mark "substitution suspected; not confirmed" rather than asserting either way.
Step 8: Assemble Evidence Table + Final Classification
Combine 4 prongs → single table. Every state claim paired w/ supporting observation. Re-running at new version → diff-able artifact.
| Field | Value |
|---|---|
| Flag | acme_widget_v3 (synthetic placeholder) |
| Binary version | <version-id> |
| Probe date | YYYY-MM-DD |
| Prong A — strings | present (3 occurrences: 1 gate-call default=false, 2 telemetry) |
| Prong B — runtime | gate-pass not observed in capability list |
| Prong C — on-disk | no override found in ~/.config/<harness>/ |
| Prong D — platform cache | service cache absent / not applicable |
| Conjunction | none — single-gate predicate |
| Substitution | user-invokable widget slash command delivers equivalent UX |
| Final state | DARK (capability LIVE via substitution) |
Apply classification rules:
- LIVE: ≥1 prong observed gate-pass this session AND no prong contradicts.
- DARK: flag string present, gate-call default false, no prong observed gate-pass, no override flips on.
- INDETERMINATE: gate-pass conditional on input/context not reproducible, OR default unreadable, OR conjunct INDETERMINATE.
- UNKNOWN: string present but not used as gate (telemetry-only, string-table-only, env-var-only label).
Save table as probe artifact (e.g., probes/<flag>-<version>.md) → future probes diff against it.
→ Complete evidence table covering all 4 prongs, conjunction status, substitution status, single final classification.
If err: no prong yields usable signal (binary unreadable, runtime uninvocable, on-disk + platform cache both absent) → don't invent classification. Record INDETERMINATE w/ reason "no prong yielded signal" + stop.
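The classification rules for a single flag can be sketched as one function. classify_flag and its argument vocabulary are hypothetical. Two choices here are assumptions of the sketch rather than rules stated above: an override of false alongside an observed gate-pass counts as a contradiction, and a default-true flag with no gate-pass stays INDETERMINATE pending re-check.

```shell
# Usage: classify_flag ROLE DEFAULT RUNTIME OVERRIDE
#   ROLE:     gate-call | telemetry-call | env-var-check | string-table
#   DEFAULT:  true | false | ?                      (Step 2)
#   RUNTIME:  pass | no-pass | conditional | none   (Prong B)
#   OVERRIDE: true | false | absent                 (Prongs C/D)
classify_flag() {
  role=$1
  default=$2
  runtime=$3
  override=$4
  # Not used as a gate: no rollout state to report
  [ "$role" = gate-call ] || { echo UNKNOWN; return 0; }
  case $runtime in
    pass)
      # Assumption: override=false contradicts the observed gate-pass
      if [ "$override" = false ]; then echo INDETERMINATE; else echo LIVE; fi ;;
    no-pass)
      if [ "$default" = false ] && [ "$override" != true ]; then
        echo DARK
      else
        echo INDETERMINATE   # unreadable or true default, or override flips on
      fi ;;
    *)
      # Conditional gate-pass, or runtime never ran: never infer DARK
      echo INDETERMINATE ;;
  esac
}
```

For the example table, classify_flag gate-call false no-pass absent prints DARK. The substitution note is recorded separately and does not change the flag's own state.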
Check
- Every state claim paired w/ specific observation (no bare assertions)
- Gate-call default value recorded (or explicitly noted as unreadable)
- Telemetry-event occurrences not counted as gate evidence
- Conjunction gates have per-flag + capability-level classifications
- Every DARK/INDETERMINATE row has explicit substitution check
- Artifact records binary version → diff-able
- No real product names, version-pinned IDs, dark-only flag names in publication artifacts (see redact-for-public-disclosure)
Traps
- Conflate telemetry events w/ gates. String in emit("$FLAG", ...) = label, not gate. Telemetry-only = no rollout state, classify UNKNOWN not DARK.
- Skip Prong B (live invocation). Static evidence (binary says default=false) ≠ runtime evidence (capability didn't appear). Flag w/ default-false in binary may be flipped true by server-side override; only runtime probe shows what session got.
- Miss the conjunction. Classify primary LIVE because single occurrence shows default=true while ignoring && gate("B") && gate("C") → falsely confident LIVE for capability gated by B or C.
- Call DARK w/o substitution check. Many DARK genuinely unreachable; many have fully-supported user-invokable route. Substitution check turns "alarming dark-launch" into "honest finding."
- Probe stale binary. Artifact w/ no version stamp = useless. Always record version, diff future probes.
- Activate gate to confirm. Flipping flag to test = not part of this skill. Some dark gates off for safety (incomplete capability, regulatory hold, unfinished migration). Document; never bypass.
- Capture other users' state. Prong C + D inspect own state + cache. Reading another's cache = exfiltration, out of scope.
- Treat INDETERMINATE as failure. Not — honest classification when evidence partial. Forcing INDETERMINATE → LIVE/DARK to look decisive = fastest way to be wrong.
→
- monitor-binary-version-baselines: Phase 1 of parent guide; marker tracking supplies candidate flag inventory
- conduct-empirical-wire-capture: Phase 4; deeper runtime evidence (network capture, lifecycle hooks) when Prong B insufficient
- security-audit-codebase: dark-launched code = attack-surface archaeology; this skill = discovery half of audit
- redact-for-public-disclosure: Phase 5; redaction discipline deciding which artifacts can leave private workspace
