SKILL·9183EC

monitor-binary-version-baselines

Name: monitor-binary-version-baselines
Author: pjt222

pjt222

Mis à jour 1 month ago

9 vues

Développementaiapi

À propos

Cette compétence suit les changements des binaires CLI à travers les versions en détectant des marqueurs catégorisés comme les API et les drapeaux, en utilisant un score pondéré pour établir des références. Elle aide les développeurs à surveiller les cycles de vie des fonctionnalités, à découvrir des capacités cachées et à valider les outils d'analyse par rapport aux binaires historiques. Utilisez-la pour une analyse longitudinale de l'évolution du contenu binaire entre les versions.

Installation rapide

Claude Code

Recommandé

Principal

npx skills add pjt222/agent-almanac -a claude-code

Commande PluginAlternatif

/plugin add https://github.com/pjt222/agent-almanac

Git CloneAlternatif

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/monitor-binary-version-baselines

Copiez et collez cette commande dans Claude Code pour installer cette compétence

Documentation

Monitor Binary Version Baselines

Build + maintain comparable, version-keyed records of feature-system markers in CLI harness binary → additions/removals/dark-launched detected mechanically across releases.

Use When

Track feature lifecycle across releases of closed-source CLI harness
Probe dark-launched (shipped but gated off) or quietly-removed
Verify scanner still detects known-good markers on old binaries (regression-test scanner)
Phase 1 substrate consumed by later phases (flag discovery, dark-launch detection, wire capture)
Ad-hoc grep answers "X present today" but you need "how has system X+Y+Z moved across versions"

In

Required: ≥1 installed binary versions of same CLI harness (or extracted bundles)
Required: Working catalog file for marker defs (created on first run, extended)
Optional: Prior baseline file (extended in place, never rewritten)
Optional: List of never-published versions (skipped, withdrawn)
Optional: List of feature-systems already tracked, to extend not re-discover

Do

Step 1: Markers by Category

Pick stable semantically meaningful identifiers — not minified names bundler will rename next release.

Six categories:

API — endpoint paths, method names exposed in network surface
Identity — internal product names, codenames, version sentinels
Config — recognized keys in user-facing config files
Telemetry — event names emitted to analytics pipeline
Flag — feature-gate keys consumed by gate predicates
Function — well-known string constants in handlers (err msgs, log labels)

Avoid: short minified-looking (_a1, bX, two-letter+digits), inline literals that change w/ text revision, anything matching bundler's internal naming.

→ Each candidate has category tag + short justification ("appears in user-facing docs," "stable across N prior releases"). Typical first pass: 20-50 markers per system.

If err: markers vanish across consecutive minor versions → catalog captured rebuild-volatile not stable. Drop entries, broaden to longer semantically anchored substrings.

Step 2: Group by Feature-System

Bundle markers per system table for independently-evolving capability. "System" = coherent marker set whose presence/absence moves together (shared lifecycle).

Why: per-system scoring prevents cross-contamination. Absence of one system's markers must not suppress detection of another. Aggregate counts across unrelated systems = uninformative.

Working catalog shape (pseudocode):

catalog:
  acme_widget_v3:
    markers:
      - { id: "acme_widget_v3_init",         category: function, weight: 10 }
      - { id: "acme.widget.v3.dialog.open",  category: telemetry, weight: 5 }
      - { id: "ACME_WIDGET_V3_DISABLE",      category: flag,     weight: 10 }
  acme_other_system:
    markers:
      - ...

→ Each system has own marker list; no marker in two systems. New system = new top-level entry, never move retroactively.

If err: hard to assign (overlap, ambiguity) → defs too coarse. Split system or accept "shared substrate", exclude from per-system scoring.

Step 3: Weight by Signal Strength

Per marker:

10 = diagnostic-alone — unique enough alone confirms (long, system-specific, no other code path emits)
3-5 = corroborating only — too generic alone, contributes to aggregate (short telemetry suffix reused)

Convention not specific numbers. Spread "diagnostic" vs. "corroborating" matters more than exact integers. Thresholds Step 5 must distinguish "one strong signal" from "many weak signals."

→ Each marker weighted. Distribution skews toward corroborating (3-5), small number diagnostic-alone (10) per system.

If err: every marker = 10 → scoring loses resolution, partial-presence impossible. Demote markers recurring across systems or in unrelated handlers.

Step 4: Per-Version Baselines

Per scanned version, record both present + absent keyed by version. Both = evidence: absent in N is informative when N+1 reintroduces.

Shape:

baselines:
  "1.4.0":
    acme_widget_v3:
      present: ["acme_widget_v3_init", "ACME_WIDGET_V3_DISABLE"]
      absent:  ["acme.widget.v3.dialog.open"]
      score:   20
  "1.5.0":
    acme_widget_v3:
      present: ["acme_widget_v3_init", "ACME_WIDGET_V3_DISABLE", "acme.widget.v3.dialog.open"]
      absent:  []
      score:   25
  "1.4.1":
    _annotation: "never-published; skipped from upstream release timeline"

Never-published get explicit annotation, not silent omission. Silent skips look like data loss to next reader.

→ Every version has one record per tracked system: present, absent, score populated, or explicit _annotation if never-published.

If err: scan yields zero markers for previously-present system → don't assume removal until you confirm binary path correct, strings cmd produced output, marker IDs match catalog exactly. False zeroes corrupt longitudinal record.

Step 5: Thresholds for Full + Partial Detection

Two gates per system on aggregate score:

full — score above which system = present-and-active in this version
partial — score above which system = shipped-but-incomplete (some markers, below full)

Below partial = absent (or not-yet-present, depending on direction).

thresholds:
  acme_widget_v3:
    full:    25
    partial: 10

Choose: full = sum of weights healthy install would emit; partial = one diagnostic + corroborating signal. Re-tune w/ several versions evidence.

→ Each scan: per-system finding full | partial | absent. partial warrants investigation — dark-launch + removal candidates.

If err: every system reports partial across every version → thresholds too sensitive (set higher than markers can sum). Recalibrate vs. known-good version where system verifiably live.

Step 6: Scan w/ `strings -n 8`

strings w/ min length filter as extraction primitive. -n 8 floor filters most noise (short fragments, padding, address-table junk) w/o losing meaningful identifiers (almost always >8 chars).

strings -n 8 path/to/binary > /tmp/binary-strings.txt

Then catalog match vs. /tmp/binary-strings.txt (any line-oriented matcher: grep -F -f markers.txt, ripgrep, small script).

Caveats:

Lower (-n 4, -n 6) → flood w/ binary garbage + minified-symbol noise; diagnostic/corroborating collapses
Higher (-n 12+) → miss short flag identifiers + config keys
Some bundlers compress/encode → near-empty output → may need bundle-extraction first (out of scope)

→ Line-per-string out 1k-100k lines depending on binary size. Manual inspection reveals recognizable identifiers in first 100 lines.

If err: empty/unrecognizable → binary likely packed, encrypted, or bytecode format strings can't read. Stop, resolve at extraction layer. Don't record baseline from unreadable scan.

Step 7: Extend Forward, Don't Rewrite Past Records

New system or marker added to catalog → only forward versions scanned. Past records remain as originally written.

Why: prior baselines = empirical evidence of what was scanned at the time, not current model of what past version contained. Retroactive rewriting conflates "what we know now" w/ "what we observed then." Both useful, only one in baseline file.

Retroactive scan genuinely needed (test if new marker was in N-3) → record as separate addendum:

addenda:
  "1.4.0":
    scan_date: "2026-04-15"
    catalog_revision: "v7"
    findings:
      acme_new_system:
        present: ["..."]

Original baselines["1.4.0"] untouched. Reader sees both original + later retroactive, w/ respective catalog revisions.

→ Baseline grows monotonically forward; past records append-only w/ optional addenda. Catalog revisions versioned so each scan tied back to catalog state used.

If err: urge to edit past version's present list directly → stop. Add addendum. Mutating past records loses ability to detect scanner regressions (later scanner-validation relies on historical record being immutable).

Check

Catalog has explicit category tags every marker (API/identity/config/telemetry/flag/function)
Every marker assigned to exactly one system; no duplicates
Weights span real range (some 10s, some 3-5); not all identical
Each scanned version: present + absent + score per tracked system
Never-published explicitly annotated, not silent omission
Each system: full + partial thresholds; findings labeled
strings -n 8 extraction primitive (or documented equivalent for non-text binaries)
Past version records unchanged by latest scan; new findings in addenda if retroactive

Traps

Recording specific findings as catalog: Catalog should describe marker categories + shapes, not enumerate version-pinned literals. Finding-shaped entries decay fast + highest leak risk if published.
Capturing minified identifiers: _p3a, q9X rename every rebuild. Match today = noise tomorrow. Stay w/ semantically meaningful.
Conflating telemetry vs. flags: Share naming conventions in many harnesses but different roles. Tag by category (Step 1) so per-category analysis stays clean.
Silent skip never-published: Gap w/ no annotation looks like missed scan. Annotate: _annotation: "never-published".
Thresholds before baseline data: First scan establishes empirical weight totals; tune vs. that, not in advance.
Rewriting prior records when catalog grows: Past records = evidence; addenda = supported pattern for retroactive.
Trust empty scan output: Zero found ≠ "absent." Confirm binary readable + catalog IDs match exactly before declaring removal.
strings -n 4 more thorough than -n 8: Lower mins add noise faster than signal. Diagnostic markers essentially always 8+ chars.

→

security-audit-codebase — shared discipline; both treat marker presence as finding, different downstream consumers
audit-dependency-versions — extends same version-tracking rigor to external dep manifests; this applies to binary artifacts
probe-feature-flag-state — Phase 2-3 follow-up; consumes baselines to classify flag rollout state (live/opt-in/dark/removed)
conduct-empirical-wire-capture — Phase 4 follow-up; validates inferred behavior vs. actual harness traffic
redact-for-public-disclosure — Phase 5 follow-up; governs which findings can leave private workspace

Dépôt GitHub

pjt222/agent-almanac

Chemin: i18n/caveman-ultra/skills/monitor-binary-version-baselines

agentsagentskillsai-assisted-developmentclaude-codeskillsteams

FAQ

Frequently asked questions

What is the monitor-binary-version-baselines skill?

monitor-binary-version-baselines is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform monitor-binary-version-baselines-related tasks without extra prompting.

How do I install monitor-binary-version-baselines?

Use the install commands on this page: add monitor-binary-version-baselines to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does monitor-binary-version-baselines belong to?

monitor-binary-version-baselines is in the Development category, tagged ai and api.

Is monitor-binary-version-baselines free to use?

Yes. monitor-binary-version-baselines is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Compétences associées

qmd

Développement

qmd est un outil CLI de recherche et d'indexation locale qui permet aux développeurs d'indexer et de rechercher dans des fichiers locaux en utilisant une recherche hybride combinant BM25, des embeddings vectoriels et du reranking. Il prend en charge à la fois une utilisation en ligne de commande et un mode MCP (Model Context Protocol) pour l'intégration avec Claude. L'outil utilise Ollama pour les embeddings et stocke les index localement, ce qui le rend idéal pour rechercher dans de la documentation ou des bases de code directement depuis le terminal.

Voir la compétence

subagent-driven-development

Développement

Cette compétence exécute des plans de mise en œuvre en déployant un nouveau sous-agent pour chaque tâche indépendante, avec une revue de code entre les tâches. Elle permet une itération rapide tout en maintenant des contrôles de qualité grâce à ce processus de revue. Utilisez-la lorsque vous travaillez sur des tâches principalement indépendantes au sein d'une même session pour assurer une progression continue avec des vérifications de qualité intégrées.

Voir la compétence

mcporter

Développement

La compétence mcporter permet aux développeurs de gérer et d'appeler des serveurs Model Context Protocol (MCP) directement depuis Claude. Elle fournit des commandes pour lister les serveurs disponibles, appeler leurs outils avec des arguments, et gérer l'authentification ainsi que le cycle de vie du démon. Utilisez cette compétence pour intégrer et tester les fonctionnalités des serveurs MCP dans votre flux de travail de développement.

Voir la compétence

adk-deployment-specialist

Développement

Cette compétence déploie et orchestre des agents Vertex AI ADK en utilisant le protocole A2A, gérant la découverte d'AgentCard, la soumission de tâches, et prenant en charge des outils tels que le bac à sable d'exécution de code et la banque de mémoire. Elle permet de construire des systèmes multi-agents avec des modèles d'orchestration séquentiels, parallèles ou en boucle en Python, Java ou Go. Utilisez-la lorsqu'on vous demande de déployer des agents ADK ou d'orchestrer des flux de travail d'agents sur Google Cloud.

Voir la compétence

monitor-binary-version-baselines

À propos

Installation rapide

Claude Code

Documentation

Monitor Binary Version Baselines

Use When

In

Do

Step 1: Markers by Category

Step 2: Group by Feature-System

Step 3: Weight by Signal Strength

Step 4: Per-Version Baselines

Step 5: Thresholds for Full + Partial Detection

Step 6: Scan w/ strings -n 8

Step 7: Extend Forward, Don't Rewrite Past Records

Check

Traps

→

Dépôt GitHub

Frequently asked questions

What is the monitor-binary-version-baselines skill?

How do I install monitor-binary-version-baselines?

What category does monitor-binary-version-baselines belong to?

Is monitor-binary-version-baselines free to use?

Compétences associées

Step 6: Scan w/ `strings -n 8`