MCP Hub

probe-feature-flag-state

pjt222
Updated 2 days ago

About

This skill probes a CLI binary to determine the runtime state of a named feature flag using a four-pronged evidence protocol. It classifies flags as LIVE, DARK, INDETERMINATE, or UNKNOWN, and handles complex scenarios such as conjunction gates and skill substitution. Use it to verify feature rollouts, audit dark-launched features, or refresh findings against new binary versions.

Quick Install

Claude Code

Recommended (primary)
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative)
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative)
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/probe-feature-flag-state

Copy and paste this command into Claude Code to install this skill.

Documentation

Probe Feature-Flag State

Determine whether a named feature flag in a shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN, using a four-pronged evidence protocol that pairs every state claim with a specific observation.

When to Use

  • A capability is rumored, documented, or inferred and you need to verify whether the gate actually fires for the running session.
  • You are auditing dark-launched features — code that ships in the bundle but is gated off — to plan integrations responsibly.
  • A prior probe's conclusions need refreshing against a new binary version (the flag may have flipped, been removed, or been merged into a conjunction).
  • You are following up on Phase 1 (monitor-binary-version-baselines) markers and need to classify each candidate flag's rollout state before moving to Phase 4 wire capture.
  • A user-visible behavior changed and you need to know whether a flag flip or a code change drove it.

Inputs

  • Required: the flag name as it appears in the binary (string-literal form).
  • Required: the CLI binary or bundle file you can read and invoke.
  • Required: an authenticated session against the harness's normal backend (your own account; never another user's).
  • Optional: the binary version identifier — strongly recommended so the evidence table is diff-able against future probes.
  • Optional: a list of suspected co-gates (other flag names that may participate in a conjunction with this one).
  • Optional: a prior probe artifact for the same flag at a different version, for delta analysis.

Procedure

Step 1: Confirm the Flag Name Is Present in the Binary (Prong A — Binary Strings)

Extract the candidate flag name from the bundle to confirm it actually exists as a string literal. Without this, all later prongs are probing thin air.

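# Locate the bundle (common shapes: .js, .mjs, .bun, packaged binary)
BUNDLE=/path/to/cli/bundle.js
FLAG=acme_widget_v3   # synthetic placeholder: replace with the candidate

# Confirm the literal exists (-F: fixed string; -a: read a packaged binary as text).
# Note that -c counts matching lines, not occurrences.
grep -acF "$FLAG" "$BUNDLE"

# Count individual occurrences (a minified bundle is often a single line)
grep -aoF "$FLAG" "$BUNDLE" | wc -l

# Capture every line where it appears, with surrounding context for Step 2
grep -anF -C 3 "$FLAG" "$BUNDLE" > /tmp/flag-context.txt
wc -l /tmp/flag-context.txt

# On a single-line minified bundle, -C 3 returns the whole file; capture a
# character window around each match instead:
grep -aoE ".{0,120}$FLAG.{0,120}" "$BUNDLE" > /tmp/flag-context.txt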

Inspect /tmp/flag-context.txt and tag each occurrence as one of:

  • gate-call — appears as the first argument to a gate-shaped function (gate("$FLAG", default), isEnabled("$FLAG"), flag("$FLAG", ...)).
  • telemetry-call — appears as the first argument to an emit/log/track function.
  • env-var-check — appears in a process.env.X (or equivalent) lookup.
  • string-table — appears in a static map or registry whose role is unclear.
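The tagging rules above can be approximated with a regex heuristic. This is a minimal sketch, not a parser: the function name tag_occurrence and the GATE_FNS / TELEMETRY_FNS name lists are assumptions, the flag name is spliced into the regex unescaped (fine for names like acme_widget_v3), and minified bundles will usually need the lists extended. Treat its output as a first pass to confirm by eye.

```shell
# Hypothetical call-site tagger for Step 1. Reads one context snippet on stdin
# and prints gate-call, telemetry-call, env-var-check, or string-table.
# GATE_FNS and TELEMETRY_FNS are assumed name lists (grep -E alternations);
# extend them with the possibly minified names you actually see in the bundle.
GATE_FNS='gate|isEnabled|flag'
TELEMETRY_FNS='emit|log|track'

tag_occurrence() {
  local flag=$1 snippet
  snippet=$(cat)
  # First argument of a gate-shaped call: gate("flag", ...), x.isEnabled('flag')
  if printf '%s' "$snippet" | grep -qE "(^|[^A-Za-z0-9_])($GATE_FNS)\([\"']$flag[\"']"; then
    echo gate-call
  # First argument of an emit/log/track-shaped call
  elif printf '%s' "$snippet" | grep -qE "(^|[^A-Za-z0-9_])($TELEMETRY_FNS)\([\"']$flag[\"']"; then
    echo telemetry-call
  # process.env.flag or process.env["flag"]
  elif printf '%s' "$snippet" | grep -qE "process\.env(\.$flag([^A-Za-z0-9_]|\$)|\[[\"']$flag[\"']\])"; then
    echo env-var-check
  else
    # Anything else needs manual cross-referencing downstream
    echo string-table
  fi
}

# Example usage against the Step 1 bundle (one tag per match window):
# grep -aoE ".{0,80}$FLAG.{0,80}" "$BUNDLE" | while IFS= read -r w; do printf '%s\n' "$w" | tag_occurrence "$FLAG"; done
```

The checks run in the order of the list above, so a window that contains both a gate call and a telemetry call is tagged gate-call.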

Got: at least one occurrence of the flag string in the bundle, and each occurrence tagged with its call-site role.

If fail: if grep -c returns 0, the flag is not in this build. Either the input name is wrong (typo, wrong namespace) or the flag was removed in this version. Re-check Phase 1 marker output, then either correct the input or classify as REMOVED and stop.

Step 2: Disambiguate Gate from Event from Env Var

The same string can appear as a gate, a telemetry event name, an env var, or all three. The classification depends on call-site, not on the string. Mistaking a telemetry name for a gate produces nonsense reasoning ("this gate must be off") about something that was never a gate.

For each tagged occurrence from Step 1:

  • A gate-call occurrence makes this string eligible for LIVE / DARK / INDETERMINATE classification. Capture the default value passed to the gate (gate("$FLAG", false) defaults the flag to off; gate("$FLAG", true) defaults it to on). Record both the literal default and the gate function name.
  • A telemetry-call occurrence does not make the string a gate. It is a label fired when some other gate has already passed. If the only occurrences are telemetry-call, the string is event-only and final classification is UNKNOWN (name present but not a gate).
  • An env-var-check occurrence usually indicates a kill switch (default-on capability disabled by an env var) or an explicit opt-in (default-off capability enabled by an env var). Note the polarity — if (process.env.X) { return null; } is a kill switch; if (process.env.X) { enable(); } is an opt-in.
  • A string-table occurrence must be cross-referenced — look at how the table is consumed downstream.
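Reading the default off a gate call can be sketched the same way. The helper gate_default is hypothetical; it assumes the flag is the first argument and the default the second, and it accepts the !0 / !1 forms that JavaScript minifiers commonly substitute for true / false. Anything it cannot read comes back as ?, matching the default=? convention in the If fail note below.

```shell
# Hypothetical default-value extractor for a gate-call snippet read on stdin.
# Assumes (flag, default) argument order; prints true, false, or ?.
gate_default() {
  local flag=$1 snippet call
  snippet=$(cat)
  # Capture text from "(" through the second argument, e.g. ("acme_widget_v3", false
  call=$(printf '%s' "$snippet" |
    grep -oE "[(][\"']$flag[\"'][[:space:]]*,[[:space:]]*[^,)[:space:]]+" | head -n 1)
  # Keep only what follows the last comma, then match the literal default
  case ${call##*,} in
    *true | *'!0') echo true ;;
    *false | *'!1') echo false ;;
    *) echo '?' ;;   # variable, expression, or missing argument
  esac
}
```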

Got: for every occurrence, a definite call-site role and (for gate-calls) the recorded default value.

If fail: if a gate-call's surrounding context is too minified to read the default, expand the grep context (-C 10) and inspect the full callee. If the default still cannot be determined, record it as default=? and downgrade any LIVE/DARK conclusion to INDETERMINATE.

Step 3: Observe Live Invocation Behavior (Prong B — Runtime Probe)

Run the harness in an authenticated session you control and observe whether the gated capability surfaces. This is the single highest-signal prong: the bundle says what can happen, the runtime shows what does happen.

Pick a probe action that would reveal the gate-pass — typically the user-visible behavior the gate guards (a tool appearing in a tool list, a command flag becoming valid, a UI element rendering, an output field appearing in a response).

# Example shape — adapt to the harness
$CLI --list-capabilities | grep -i widget         # does the gated capability appear?
$CLI --help 2>&1 | grep -i "$FLAG"                # is a flag-related option exposed?
$CLI run-some-command --debug 2>&1 | tee probe-runtime.log

Record one of three outcomes:

  • gate-pass observed — the capability surfaced in the session. Classification candidate: LIVE.
  • gate-pass not observed — the capability did not surface. Classification candidate depends on the default from Step 2 (default-false → DARK; default-true → re-check, this is suspicious).
  • gate-pass conditional on a specific input or context not reproducible here — record the condition; classification candidate: INDETERMINATE.
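A sketch of how a probe run and its outcome might be recorded, assuming the gated capability shows up as text in the command's output. run_probe and runtime_candidate are hypothetical helpers, and the pass / no-pass / error labels and the RECHECK and UNUSABLE markers are this sketch's own vocabulary, not classifications defined by this skill. The conditional outcome has to be recorded by hand.

```shell
# Hypothetical Step 3 helper: run a probe command, print pass, no-pass, or error.
run_probe() {
  local pattern=$1 out
  shift
  # A command that fails (auth, network, wrong subcommand) yields no runtime signal
  if ! out=$("$@" 2>&1); then
    echo error
    return
  fi
  if printf '%s\n' "$out" | grep -qiE -- "$pattern"; then echo pass; else echo no-pass; fi
}

# Map an outcome plus the Step 2 default (true, false, or ?) to a candidate state.
runtime_candidate() {
  local outcome=$1 default=$2
  case $outcome in
    pass) echo LIVE ;;
    conditional) echo INDETERMINATE ;;
    error) echo UNUSABLE ;;        # probe never ran: never infer DARK from it
    no-pass)
      case $default in
        false) echo DARK ;;
        true) echo RECHECK ;;      # default-on yet not observed: suspicious
        *) echo INDETERMINATE ;;   # default unreadable (Step 2)
      esac ;;
    *) echo "unknown outcome: $outcome" >&2; return 1 ;;
  esac
}

# Example shape: runtime_candidate "$(run_probe widget "$CLI" --list-capabilities)" false
```

One caveat of this design: run_probe treats any non-zero exit as error, so a CLI that exits non-zero after printing its --help text would be reported as unusable rather than no-pass.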

Got: a recorded probe action, the observed outcome, and the candidate classification it points to.

If fail: if the probe action itself errors (auth failure, network unreachable, wrong subcommand), the runtime prong is unusable for this round. Fix the session or pick a different probe action; do not infer DARK from a runtime that never ran.

Step 4: Inspect On-Disk State (Prong C — Config, Cache, Session)

Many harnesses persist gate evaluations or override values to disk so they need not be re-fetched. Inspecting this state shows what the harness believed about the flag at last evaluation.

Common locations (adapt to the harness — these are shapes, not specific paths):

# Synthetic placeholder: replace with the harness's directory name
HARNESS=acmecli

# User-level config
ls ~/.config/"$HARNESS"/ 2>/dev/null
ls ~/."$HARNESS"/ 2>/dev/null

# Per-project state
ls ."$HARNESS"/ 2>/dev/null

# Cache directories
ls ~/.cache/"$HARNESS"/ 2>/dev/null

# Search any of these for the flag name (-F: fixed string; -n: line numbers)
grep -rnF "$FLAG" ~/.config/"$HARNESS"/ ~/."$HARNESS"/ ~/.cache/"$HARNESS"/ ."$HARNESS"/ 2>/dev/null

Record each hit's path, the value associated with the flag, and the file's last-modified time. A recently modified cache entry that overrides a binary default is strong evidence either way; only direct runtime observation (Prong B) outranks it.
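Recording each hit with its path and modification time can be scripted. This is a sketch under stated assumptions: report_disk_hits is a hypothetical name, it relies on GNU grep and a date that accepts -r FILE (GNU coreutils does), and it leaves interpreting each recorded value to you.

```shell
# Sketch for Step 4: for every on-disk mention of the flag under the given
# directories, print path, last-modified time, and line-number:content.
# Paths containing ":" would be split incorrectly.
report_disk_hits() {
  local flag=$1 hit file
  shift
  grep -rHnF -- "$flag" "$@" 2>/dev/null |
    while IFS= read -r hit; do
      file=${hit%%:*}
      # Tab-separated: path, mtime, line-number:content
      printf '%s\t%s\t%s\n' "$file" "$(date -r "$file" '+%Y-%m-%d %H:%M:%S')" "${hit#*:}"
    done
}

# Example (acmecli is a placeholder harness name):
# report_disk_hits "$FLAG" ~/.config/acmecli ~/.cache/acmecli .acmecli
```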

Got: either a confirmed override value with timestamp, or a confirmed absence (no on-disk state mentions this flag).

If fail: if you find the flag mentioned but cannot tell whether the recorded value is a cached server response, a user override, or a stale value, flag the entry for Step 5 (platform cache) reconciliation rather than guessing.

Step 5: Inspect Platform Flag-Service Cache (Prong D)

If the harness uses an external feature-flag service (LaunchDarkly, Statsig, GrowthBook, a vendor-internal service, etc.), the locally cached service response is the most authoritative record of the rollout state as of the time it was fetched. Inspect it where available.

# Look for service-shaped cache files
find ~/.cache ~/.config -name "*flag*" -o -name "*feature*" -o -name "*config*" 2>/dev/null | head

# If a cache file is present, parse it for the flag name
jq ".[] | select(.key == \"$FLAG\")" ~/.cache/<harness>/flags.json 2>/dev/null

Record the cached value, the cache timestamp, and (if present) the cache TTL. A platform cache that says false overrides a binary default of true; a platform cache that says true overrides a binary default of false.
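The precedence rule above, plus a staleness check against the recorded TTL, fits in two small functions. Both are hypothetical helpers: effective_value takes the binary default and the cached value (empty when there is no cache entry), and cache_freshness assumes the cache timestamp and TTL have already been converted to epoch seconds.

```shell
# A cached platform value, when present, wins over the binary default in either direction.
effective_value() {
  local binary_default=$1 cache_value=$2
  if [ -n "$cache_value" ]; then echo "$cache_value"; else echo "$binary_default"; fi
}

# Compare the cache age with its TTL; "now" defaults to the current epoch time.
cache_freshness() {
  local fetched=$1 ttl=$2 now=${3:-$(date +%s)}
  if [ -z "$ttl" ]; then
    echo unknown-ttl
  elif [ $((now - fetched)) -gt "$ttl" ]; then
    echo stale
  else
    echo fresh
  fi
}
```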

Got: either a definite cached value with timestamp, or confirmed absence of a flag-service cache for this harness.

If fail: if the harness has no flag-service or you cannot locate the cache, this prong contributes nothing — that is acceptable. Note "Prong D: not applicable" in the evidence table; do not guess.

Step 6: Handle Conjunction Gates

Some capabilities are guarded by multiple flags that must all be true: gate("A") && gate("B") && gate("C"). Any one being DARK is sufficient to make the capability DARK, but the per-flag classification still belongs to each flag individually.

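# After finding the gate-call site for the primary flag in Step 2, scan the
# enclosing predicate for other gate(...) calls. Replace "gate" with the gate
# function name recorded in Step 2 (minifiers usually rename it). The character
# window keeps a single-line bundle from matching every gate call in the file.
grep -aoE ".{0,200}$FLAG.{0,200}" "$BUNDLE" | grep -oE 'gate\("[^"]+"' | sort -u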

For each co-gate string surfaced:

  • Repeat Steps 1–5 for that flag (treat each as its own probe).
  • Record the per-flag classification.
  • Compute the capability-level classification: LIVE iff all conjuncts are LIVE; DARK if any conjunct is DARK; INDETERMINATE if no conjunct is DARK and at least one is INDETERMINATE.
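The capability-level rule can be written as a small reducer over per-flag states. In this sketch (conjunction_state is a hypothetical name), any state the rule does not mention, such as UNKNOWN, yields UNCLASSIFIED when no DARK or INDETERMINATE conjunct decides the result, so it gets a human look instead of a silent verdict; that fallback is an assumption, not part of the rule.

```shell
# Combine per-flag states of a conjunction into a capability-level state.
conjunction_state() {
  local s dark=0 indet=0 other=0
  [ $# -gt 0 ] || { echo UNCLASSIFIED; return; }
  for s in "$@"; do
    case $s in
      LIVE) ;;
      DARK) dark=1 ;;
      INDETERMINATE) indet=1 ;;
      *) other=1 ;;
    esac
  done
  if [ $dark -eq 1 ]; then echo DARK              # any DARK conjunct
  elif [ $indet -eq 1 ]; then echo INDETERMINATE  # no DARK, at least one INDETERMINATE
  elif [ $other -eq 1 ]; then echo UNCLASSIFIED   # state outside the rule
  else echo LIVE                                  # all conjuncts LIVE
  fi
}
```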

Got: every conjunct identified and individually classified, plus a derived capability-level classification.

If fail: if the predicate is too minified to enumerate cleanly (call site is inlined or wrapped), record the conjunction as "≥1 additional gate, structure unreadable" and downgrade the capability-level classification to INDETERMINATE even if the primary flag looks LIVE.

Step 7: Check for Skill-Substitution

A flag may legitimately be DARK while the user-facing capability it would unlock is reachable through a different, fully-supported route — a different command, a user-invocable skill, an alternate API. The honest finding "flag DARK, capability LIVE via substitution" is common and important; missing it produces panicked dark-launch reports about capabilities users actually have.

For any candidate classification of DARK or INDETERMINATE, ask:

  • Is there a documented user-invokable command, slash command, or skill that delivers the same end-user outcome?
  • Is there an alternate API surface (different endpoint, different tool name) that returns equivalent data?
  • Does the harness publish a user-facing extension point (plugins, custom tools, hooks) that allows users to assemble the equivalent themselves?

If yes to any, append a substitution: note to the evidence row recording the alternate route and its observability (how a user reaches it, whether it is documented).

Got: for every DARK / INDETERMINATE classification, an explicit substitution check — either the route, or the explicit note "no substitution route identified."

If fail: if you suspect a substitution exists but cannot confirm the route, mark "substitution suspected; not confirmed" rather than asserting either way.

Step 8: Assemble the Evidence Table and Final Classification

Combine the four prongs into a single table. Every state claim must be paired with the observation that supports it; re-running the probe at a new version produces a diff-able artifact.

Field | Value
--- | ---
Flag | acme_widget_v3 (synthetic placeholder)
Binary version | <version-id>
Probe date | YYYY-MM-DD
Prong A (strings) | present (3 occurrences: 1 gate-call default=false, 2 telemetry)
Prong B (runtime) | gate-pass not observed in capability list
Prong C (on-disk) | no override found in ~/.config/<harness>/
Prong D (platform cache) | service cache absent / not applicable
Conjunction | none (single-gate predicate)
Substitution | user-invokable widget slash command delivers equivalent UX
Final state | DARK (capability LIVE via substitution)

Apply the classification rules:

  • LIVE — at least one prong observed gate-pass this session AND no prong contradicts.
  • DARK — flag string present, gate-call default is false, no prong observed gate-pass, no override flips it on.
  • INDETERMINATE — gate-pass is conditional on an input or context not reproducible in this probe, OR the gate's default could not be determined, OR a conjunct is INDETERMINATE.
  • UNKNOWN — string present but not used as a gate (telemetry-only, string-table-only, env-var-only label).
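The four rules can be condensed into one decision function for a single flag. This is a hedged sketch: classify_flag and its input vocabulary (role: gate or anything else; default: true, false, or ?; gate_pass: yes, no, or conditional; override: on, off, or none) are assumptions, conjunct states are left to the separate conjunction rule in Step 6, and any combination the rules do not decide falls back to INDETERMINATE rather than forcing LIVE or DARK.

```shell
# Hypothetical single-flag classifier following the LIVE/DARK/INDETERMINATE/UNKNOWN rules.
classify_flag() {
  local role=$1 default=$2 gate_pass=$3 override=$4
  if [ "$role" != gate ]; then
    echo UNKNOWN          # telemetry-only, string-table-only, or env-var-only label
  elif [ "$gate_pass" = conditional ] || [ "$default" = '?' ]; then
    echo INDETERMINATE    # non-reproducible condition or unreadable default
  elif [ "$gate_pass" = yes ] && [ "$override" != off ]; then
    echo LIVE             # gate-pass observed and no override contradicts it
  elif [ "$gate_pass" = no ] && [ "$default" = false ] && [ "$override" != on ]; then
    echo DARK             # default off, not observed, nothing flips it on
  else
    echo INDETERMINATE    # conflicting evidence: do not force a verdict
  fi
}
```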

Save the table as a probe artifact (e.g., probes/<flag>-<version>.md) so future probes diff against it.
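Saving and diffing the artifact might look like the following. save_probe_artifact is a hypothetical helper; it copies a finished table to probes/<flag>-<version>.md and, when given the path of a prior artifact for the same flag, prints a unified diff against it.

```shell
# Store the evidence table and optionally diff it against a prior probe.
save_probe_artifact() {
  local flag=$1 version=$2 table=$3 prior=${4:-} dest
  dest="probes/${flag}-${version}.md"
  mkdir -p probes
  cp -- "$table" "$dest"
  echo "saved $dest"
  if [ -n "$prior" ] && [ -f "$prior" ]; then
    diff -u -- "$prior" "$dest" || true   # diff exits 1 when the files differ
  fi
}
```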

Got: a complete evidence table covering all four prongs, conjunction status, substitution status, and a single final classification.

If fail: if no prong yields a usable signal (binary cannot be read, runtime cannot be invoked, on-disk and platform cache both absent), do not invent a classification. Record INDETERMINATE with the reason "no prong yielded signal" and stop.

Validation

  • Every state claim in the evidence table is paired with a specific observation (no bare assertions).
  • The flag's gate-call default value is recorded (or explicitly noted as unreadable).
  • Telemetry-event occurrences are not counted as gate evidence.
  • Conjunction gates have per-flag classifications and a capability-level classification.
  • Every DARK / INDETERMINATE row has an explicit substitution check.
  • The artifact records the binary version so future probes are diff-able.
  • No real product names, version-pinned identifiers, or dark-only flag names appear in any artifact intended for publication (see redact-for-public-disclosure).

Pitfalls

  • Conflating telemetry events with gates. A string that appears in emit("$FLAG", ...) is a label, not a gate. A flag that is "telemetry-only" has no rollout state and should be classified UNKNOWN, not DARK.
  • Skipping Prong B (live invocation). Static evidence alone (the binary says default=false) is not the same as runtime evidence (the capability did not appear). A flag with default-false in the binary may be flipped to true by a server-side override; only the runtime probe shows what the session actually got.
  • Missing the conjunction. Classifying the primary flag as LIVE because its single occurrence shows default=true while ignoring the surrounding && gate("B") && gate("C") produces a falsely confident LIVE for a capability that is actually gated by B or C.
  • Calling DARK without a substitution check. Many DARK flags are genuinely unreachable, but many others have a fully-supported user-invokable route. The substitution check is what turns "alarming dark-launch" into "honest finding."
  • Probing a stale binary version. A probe artifact with no version stamp is useless — you cannot tell whether it reflects current state or last quarter's state. Always record the version, and diff future probes against the artifact.
  • Activating the gate to confirm it. Flipping a flag to test it is not part of this skill. Some dark gates are off for safety reasons (incomplete capability, regulatory hold, unfinished migration). Document; never bypass.
  • Capturing other users' state. Prong C and Prong D inspect your own on-disk state and your own cache. Reading another user's cache is exfiltration and is out of scope.
  • Treating INDETERMINATE as a failure. It is not — it is the honest classification when evidence is partial. Forcing INDETERMINATE results into LIVE or DARK to make the report look decisive is the fastest way to be wrong.

Related Skills

  • monitor-binary-version-baselines — Phase 1 of the parent guide; its marker tracking supplies the candidate flag inventory that this skill builds on.
  • conduct-empirical-wire-capture — Phase 4; deeper runtime evidence (network capture, lifecycle hooks) when Prong B's surface-level probe is insufficient.
  • security-audit-codebase — dark-launched code is part of attack-surface archaeology; this skill is the discovery half of that audit.
  • redact-for-public-disclosure — Phase 5; the redaction discipline that decides which probe artifacts can leave the private workspace.

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman-lite/skills/probe-feature-flag-state
Topics: agents, agentskills, ai-assisted-development, claude-code, skills, teams

Associated Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

polymarket

Meta

This skill helps developers build applications on the Polymarket prediction-market platform, including API integration for trading and market data. It also provides real-time data streaming over WebSocket to monitor live trades and market activity. Use it to implement trading strategies or to build tools that process live market updates.

creating-opencode-plugins

Meta

This skill helps developers create OpenCode plugins that hook into more than 25 event types, such as commands, files, and LSP operations. It provides the plugin structure, the event API specifications, and implementation patterns for JavaScript/TypeScript modules. Use it when you need to intercept, monitor, or extend the lifecycle of the OpenCode AI assistant with custom event-driven logic.

sglang

Meta

SGLang is a high-performance LLM serving framework specialized in fast, structured generation for JSON, regex, and agentic workflows through its RadixAttention prefix cache. It offers significantly faster inference, especially for tasks with repeated prefixes, which makes it well suited to complex structured outputs and multi-turn conversations. Choose SGLang over alternatives such as vLLM when you need constrained decoding or are building applications with extensive prefix sharing.