probe-feature-flag-state
О программе
Этот навык анализирует CLI-бинарник, чтобы определить состояние запуска именованного функционального флага, используя четырёхуровневый протокол сбора доказательств. Он классифицирует флаги как LIVE, DARK, INDETERMINATE или UNKNOWN, обрабатывая сложные сценарии, такие как логические условия (conjunction gates) и подмена навыков. Используйте его для проверки запуска функциональных возможностей, аудита скрыто запущенных функций или обновления выводов при работе с новыми версиями бинарников.
Быстрая установка
Claude Code
Рекомендуетсяnpx skills add pjt222/agent-almanac -a claude-code/plugin add https://github.com/pjt222/agent-almanacgit clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/probe-feature-flag-stateСкопируйте и вставьте эту команду в Claude Code для установки этого навыка
Документация
Probe Feature-Flag State
Determine whether a named feature flag in a shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN, using a four-pronged evidence protocol that pairs every state claim with a specific observation.
When to Use
- A capability is rumored, documented, or inferred and you need to verify whether the gate actually fires for the running session.
- You are auditing dark-launched features — code that ships in the bundle but is gated off — to plan integrations responsibly.
- A prior probe's conclusions need refreshing against a new binary version (the flag may have flipped, been removed, or been merged into a conjunction).
- You are following up Phase 1 (
monitor-binary-version-baselines) markers and need to classify each candidate flag's rollout state before moving to Phase 4 wire capture. - A user-visible behavior changed and you need to know whether a flag flip or a code change drove it.
Inputs
- Required: the flag name as it appears in the binary (string-literal form).
- Required: the CLI binary or bundle file you can read and invoke.
- Required: an authenticated session against the harness's normal backend (your own account; never another user's).
- Optional: the binary version identifier — strongly recommended so the evidence table is diff-able against future probes.
- Optional: a list of suspected co-gates (other flag names that may participate in a conjunction with this one).
- Optional: a prior probe artifact for the same flag at a different version, for delta analysis.
Procedure
Step 1: Confirm the Flag Name Is Present in the Binary (Prong A — Binary Strings)
Extract the candidate flag name from the bundle to confirm it actually exists as a string literal. Without this, all later prongs are probing thin air.
# Locate the bundle (common shapes: .js, .mjs, .bun, packaged binary)
BUNDLE=/path/to/cli/bundle.js
FLAG=acme_widget_v3 # synthetic placeholder — replace with the candidate
# Confirm the literal exists
grep -c "$FLAG" "$BUNDLE"
# Capture every line where it appears, with surrounding context for Step 2
grep -n -C 3 "$FLAG" "$BUNDLE" > /tmp/flag-context.txt
wc -l /tmp/flag-context.txt
Inspect /tmp/flag-context.txt and tag each occurrence as one of:
- gate-call — appears as the first argument to a gate-shaped function (
gate("$FLAG", default),isEnabled("$FLAG"),flag("$FLAG", ...)). - telemetry-call — appears as the first argument to an emit/log/track function.
- env-var-check — appears in a
process.env.X(or equivalent) lookup. - string-table — appears in a static map or registry whose role is unclear.
Got: at least one occurrence of the flag string in the bundle, and each occurrence tagged with its call-site role.
If fail: if grep -c returns 0, the flag is not in this build. Either the input name is wrong (typo, wrong namespace) or the flag was removed in this version. Re-check Phase 1 marker output, then either correct the input or classify as REMOVED and stop.
Step 2: Disambiguate Gate from Event from Env Var
The same string can appear as a gate, a telemetry event name, an env var, or all three. The classification depends on call-site, not on the string. Mistaking a telemetry name for a gate produces nonsense reasoning ("this gate must be off") about something that was never a gate.
For each tagged occurrence from Step 1:
- A gate-call occurrence makes this string eligible for LIVE / DARK / INDETERMINATE classification. Capture the default value passed to the gate (
gate("$FLAG", false)defaults the flag to off;gate("$FLAG", true)defaults it to on). Record both the literal default and the gate function name. - A telemetry-call occurrence does not make the string a gate. It is a label fired when some other gate has already passed. If the only occurrences are telemetry-call, the string is event-only and final classification is
UNKNOWN(name present but not a gate). - An env-var-check occurrence usually indicates a kill switch (default-on capability disabled by an env var) or an explicit opt-in (default-off capability enabled by an env var). Note the polarity —
if (process.env.X) { return null; }is a kill switch;if (process.env.X) { enable(); }is an opt-in. - A string-table occurrence must be cross-referenced — look at how the table is consumed downstream.
Got: for every occurrence, a definite call-site role and (for gate-calls) the recorded default value.
If fail: if a gate-call's surrounding context is too minified to read the default, expand the grep context (-C 10) and inspect the full callee. If the default still cannot be determined, record it as default=? and downgrade any LIVE/DARK conclusion to INDETERMINATE.
Step 3: Observe Live Invocation Behavior (Prong B — Runtime Probe)
Run the harness in an authenticated session you control and observe whether the gated capability surfaces. This is the single highest-signal prong: the bundle says what can happen, the runtime shows what does happen.
Pick a probe action that would reveal the gate-pass — typically the user-visible behavior the gate guards (a tool appearing in a tool list, a command flag becoming valid, a UI element rendering, an output field appearing in a response).
# Example shape — adapt to the harness
$CLI --list-capabilities | grep -i widget # does the gated capability appear?
$CLI --help 2>&1 | grep -i "$FLAG" # is a flag-related option exposed?
$CLI run-some-command --debug 2>&1 | tee probe-runtime.log
Record one of three outcomes:
- gate-pass observed — the capability surfaced in the session. Classification candidate:
LIVE. - gate-pass not observed — the capability did not surface. Classification candidate depends on the default from Step 2 (default-false →
DARK; default-true → re-check, this is suspicious). - gate-pass conditional on a specific input or context not reproducible here — record the condition; classification candidate:
INDETERMINATE.
Got: a recorded probe action, the observed outcome, and the candidate classification it points to.
If fail: if the probe action itself errors (auth failure, network unreachable, wrong subcommand), the runtime prong is unusable for this round. Fix the session or pick a different probe action; do not infer DARK from a runtime that never ran.
Step 4: Inspect On-Disk State (Prong C — Config, Cache, Session)
Many harnesses persist gate evaluations or override values to disk so they need not be re-fetched. Inspecting this state shows what the harness believed about the flag at last evaluation.
Common locations (adapt to the harness — these are shapes, not specific paths):
# User-level config
ls ~/.config/<harness>/ 2>/dev/null
ls ~/.<harness>/ 2>/dev/null
# Per-project state
ls .<harness>/ 2>/dev/null
# Cache directories
ls ~/.cache/<harness>/ 2>/dev/null
# Search any of these for the flag name
grep -r "$FLAG" ~/.config/<harness>/ ~/.cache/<harness>/ .<harness>/ 2>/dev/null
Record each hit's path, the value associated with the flag, and the file's last-modified time. A recently-modified cache entry overriding a binary default is the strongest possible evidence either way.
Got: either a confirmed override value with timestamp, or a confirmed absence (no on-disk state mentions this flag).
If fail: if you find the flag mentioned but cannot tell whether the recorded value is a cached server response, a user override, or a stale value, flag the entry for Step 5 (platform cache) reconciliation rather than guessing.
Step 5: Inspect Platform Flag-Service Cache (Prong D)
If the harness uses an external feature-flag service (LaunchDarkly, Statsig, GrowthBook, vendor-internal, etc.), the locally-cached service response is the authoritative current rollout state. Inspect it where available.
# Look for service-shaped cache files
find ~/.cache ~/.config -name "*flag*" -o -name "*feature*" -o -name "*config*" 2>/dev/null | head
# If a cache file is present, parse it for the flag name
jq ".[] | select(.key == \"$FLAG\")" ~/.cache/<harness>/flags.json 2>/dev/null
Record the cached value, the cache timestamp, and (if present) the cache TTL. A platform cache that says false overrides a binary default of true; a platform cache that says true overrides a binary default of false.
Got: either a definite cached value with timestamp, or confirmed absence of a flag-service cache for this harness.
If fail: if the harness has no flag-service or you cannot locate the cache, this prong contributes nothing — that is acceptable. Note "Prong D: not applicable" in the evidence table; do not guess.
Step 6: Handle Conjunction Gates
Some capabilities are guarded by multiple flags that must all be true: gate("A") && gate("B") && gate("C"). Any one being DARK is sufficient to make the capability DARK, but the per-flag classification still belongs to each flag individually.
# After finding the gate-call site for the primary flag in Step 2, scan the
# enclosing predicate for other gate(...) calls
grep -n -C 5 "$FLAG" "$BUNDLE" | grep -oE 'gate\("[^"]+"' | sort -u
For each co-gate string surfaced:
- Repeat Steps 1–5 for that flag (treat each as its own probe).
- Record the per-flag classification.
- Compute the capability-level classification: LIVE iff all conjuncts are LIVE; DARK if any conjunct is DARK; INDETERMINATE if no conjunct is DARK and at least one is INDETERMINATE.
Got: every conjunct identified and individually classified, plus a derived capability-level classification.
If fail: if the predicate is too minified to enumerate cleanly (call site is inlined or wrapped), record the conjunction as "≥1 additional gate, structure unreadable" and downgrade the capability-level classification to INDETERMINATE even if the primary flag looks LIVE.
Step 7: Check for Skill-Substitution
A flag may legitimately be DARK while the user-facing capability it would unlock is reachable through a different, fully-supported route — a different command, a user-invocable skill, an alternate API. The honest finding "flag DARK, capability LIVE via substitution" is common and important; missing it produces panicked dark-launch reports about capabilities users actually have.
For any candidate classification of DARK or INDETERMINATE, ask:
- Is there a documented user-invokable command, slash command, or skill that delivers the same end-user outcome?
- Is there an alternate API surface (different endpoint, different tool name) that returns equivalent data?
- Does the harness publish a user-facing extension point (plugins, custom tools, hooks) that allows users to assemble the equivalent themselves?
If yes to any, append a substitution: note to the evidence row recording the alternate route and its observability (how a user reaches it, whether it is documented).
Got: for every DARK / INDETERMINATE classification, an explicit substitution check — either the route, or the explicit note "no substitution route identified."
If fail: if you suspect a substitution exists but cannot confirm the route, mark "substitution suspected; not confirmed" rather than asserting either way.
Step 8: Assemble the Evidence Table and Final Classification
Combine the four prongs into a single table. Every state claim must be paired with the observation that supports it; re-running the probe at a new version produces a diff-able artifact.
| Field | Value |
|---|---|
| Flag | acme_widget_v3 (synthetic placeholder) |
| Binary version | <version-id> |
| Probe date | YYYY-MM-DD |
| Prong A — strings | present (3 occurrences: 1 gate-call default=false, 2 telemetry) |
| Prong B — runtime | gate-pass not observed in capability list |
| Prong C — on-disk | no override found in ~/.config/<harness>/ |
| Prong D — platform cache | service cache absent / not applicable |
| Conjunction | none — single-gate predicate |
| Substitution | user-invokable widget slash command delivers equivalent UX |
| Final state | DARK (capability LIVE via substitution) |
Apply the classification rules:
- LIVE — at least one prong observed gate-pass this session AND no prong contradicts.
- DARK — flag string present, gate-call default is
false, no prong observed gate-pass, no override flips it on. - INDETERMINATE — gate-pass is conditional on an input or context not reproducible in this probe, OR the gate's default could not be determined, OR a conjunct is INDETERMINATE.
- UNKNOWN — string present but not used as a gate (telemetry-only, string-table-only, env-var-only label).
Save the table as a probe artifact (e.g., probes/<flag>-<version>.md) so future probes diff against it.
Got: a complete evidence table covering all four prongs, conjunction status, substitution status, and a single final classification.
If fail: if no prong yields a usable signal (binary cannot be read, runtime cannot be invoked, on-disk and platform cache both absent), do not invent a classification. Record INDETERMINATE with the reason "no prong yielded signal" and stop.
Validation
- Every state claim in the evidence table is paired with a specific observation (no bare assertions).
- The flag's gate-call default value is recorded (or explicitly noted as unreadable).
- Telemetry-event occurrences are not counted as gate evidence.
- Conjunction gates have per-flag classifications and a capability-level classification.
- Every DARK / INDETERMINATE row has an explicit substitution check.
- The artifact records the binary version so future probes are diff-able.
- No real product names, version-pinned identifiers, or dark-only flag names appear in any artifact intended for publication (see
redact-for-public-disclosure).
Pitfalls
- Conflating telemetry events with gates. A string that appears in
emit("$FLAG", ...)is a label, not a gate. A flag that is "telemetry-only" has no rollout state and should be classified UNKNOWN, not DARK. - Skipping Prong B (live invocation). Static evidence alone (the binary says
default=false) is not the same as runtime evidence (the capability did not appear). A flag with default-false in the binary may be flipped to true by a server-side override; only the runtime probe shows what the session actually got. - Missing the conjunction. Classifying the primary flag as LIVE because its single occurrence shows
default=truewhile ignoring the surrounding&& gate("B") && gate("C")produces a falsely confident LIVE for a capability that is actually gated by B or C. - Calling DARK without a substitution check. Many DARK flags are genuinely unreachable, but many others have a fully-supported user-invokable route. The substitution check is what turns "alarming dark-launch" into "honest finding."
- Probing a stale binary version. A probe artifact with no version stamp is useless — you cannot tell whether it reflects current state or last quarter's state. Always record the version, and diff future probes against the artifact.
- Activating the gate to confirm it. Flipping a flag to test it is not part of this skill. Some dark gates are off for safety reasons (incomplete capability, regulatory hold, unfinished migration). Document; never bypass.
- Capturing other users' state. Prong C and Prong D inspect your own on-disk state and your own cache. Reading another user's cache is exfiltration and is out of scope.
- Treating INDETERMINATE as a failure. It is not — it is the honest classification when evidence is partial. Forcing INDETERMINATE results into LIVE or DARK to make the report look decisive is the fastest way to be wrong.
Related Skills
monitor-binary-version-baselines— Phase 1 of the parent guide; the marker tracking this skill builds on supplies the candidate flag inventory.conduct-empirical-wire-capture— Phase 4; deeper runtime evidence (network capture, lifecycle hooks) when Prong B's surface-level probe is insufficient.security-audit-codebase— dark-launched code is part of attack-surface archaeology; this skill is the discovery half of that audit.redact-for-public-disclosure— Phase 5; the redaction discipline that decides which probe artifacts can leave the private workspace.
GitHub репозиторий
Похожие навыки
content-collections
МетаЭтот навык предоставляет проверенную в продакшене настройку для Content Collections — TypeScript-ориентированного инструмента, который преобразует файлы Markdown/MDX в типобезопасные коллекции данных с валидацией Zod. Используйте его при создании блогов, сайтов документации или контентных приложений на Vite + React для обеспечения типобезопасности и автоматической проверки содержимого. Он охватывает всё: от настройки плагина Vite и компиляции MDX до оптимизации развертывания и валидации схем.
polymarket
МетаЭтот навык позволяет разработчикам создавать приложения на платформе прогнозных рынков Polymarket, включая интеграцию с API для торговли и получения рыночных данных. Он также обеспечивает потоковую передачу данных в реальном времени через WebSocket для отслеживания текущих сделок и рыночной активности. Используйте его для реализации торговых стратегий или создания инструментов, обрабатывающих обновления рынка в реальном времени.
creating-opencode-plugins
МетаЭтот навык помогает разработчикам создавать плагины OpenCode, которые подключаются к более чем 25 типам событий, таким как команды, файлы и операции LSP. Он предоставляет структуру плагина, спецификации API событий и шаблоны реализации для модулей на JavaScript/TypeScript. Используйте его, когда вам нужно перехватывать, отслеживать или расширять жизненный цикл ассистента OpenCode AI с помощью пользовательской событийно-ориентированной логики.
sglang
МетаSGLang — это высокопроизводительный фреймворк для обслуживания больших языковых моделей (LLM), специализирующийся на быстрой структурированной генерации JSON, regex и рабочих процессов агентов с использованием кэширования префиксов RadixAttention. Он обеспечивает значительно более высокую скорость вывода, особенно для задач с повторяющимися префиксами, что делает его идеальным для сложных структурированных результатов и многократных диалогов. Выбирайте SGLang вместо альтернатив, таких как vLLM, когда вам требуется ограниченное декодирование или вы создаете приложения с интенсивным совместным использованием префиксов.
