gemma_telemetry_retention_detector

Foundup

About

This skill provides fast binary classification of YouTube telemetry records to determine retention strategy. It uses pattern matching to scan heartbeat data as the first phase in a cleanup workflow. Developers should use it for quick initial classification before passing records to downstream agents for retention execution.

Documentation

Gemma Telemetry Retention Detector

Purpose: Fast pattern matching to classify YouTube DAE heartbeat records for retention strategy

Architecture: Phase 1 of Gemma→Qwen→0102 cleanup wardrobe pattern

WSP Compliance

  • WSP 77: Agent Coordination (Gemma fast classification → Qwen strategy)
  • WSP 91: DAEMON Observability (telemetry lifecycle management)
  • WSP 96: WRE Skills Wardrobe (autonomous cleanup execution)

Task Description

Scan data/foundups.db::youtube_heartbeats table and classify records into retention categories using fast binary pattern matching.
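
Before the scan runs, a quick sanity check of the table can confirm what the classifier assumes: a timestamp column holding ISO-8601 strings in youtube_heartbeats. A minimal sketch, using only the standard library and the same database path as the input contract below:

import sqlite3

def preview_heartbeats(db_path: str = "data/foundups.db") -> None:
    """Print row count and timestamp bounds before classification."""
    conn = sqlite3.connect(db_path)
    try:
        count, oldest, newest = conn.execute(
            "SELECT COUNT(*), MIN(timestamp), MAX(timestamp) FROM youtube_heartbeats"
        ).fetchone()
        print(f"{count} heartbeat records spanning {oldest} .. {newest}")
    finally:
        conn.close()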

Input Contract

{
  "database_path": "data/foundups.db",
  "table": "youtube_heartbeats",
  "scan_limit": 1000,
  "current_timestamp": "2025-10-27T20:00:00Z"
}
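
A caller might validate this contract and fill in defaults before dispatching the scan. A minimal sketch, assuming the request arrives as a JSON string with the field names shown above (the helper name is illustrative):

import json
from datetime import datetime, timezone

def parse_scan_request(raw: str) -> dict:
    """Validate the retention-scan input contract and apply defaults."""
    req = json.loads(raw)
    req.setdefault("database_path", "data/foundups.db")
    req.setdefault("table", "youtube_heartbeats")
    req.setdefault("scan_limit", 1000)
    ts = req.get("current_timestamp")
    if ts is None:
        req["current_timestamp"] = datetime.now(timezone.utc).isoformat()
    else:
        # Raises ValueError if the timestamp is not valid ISO-8601
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return req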

Classification Rules (Fast Binary Decisions)

Rule 1: Recent Activity (KEEP)

  • Age: < 30 days
  • Pattern: High training value, operational visibility
  • Binary decision: category = "keep_recent"

Rule 2: Training Data (KEEP)

  • Age: 30-90 days
  • Pattern: Historical patterns for Gemma learning
  • Binary decision: category = "keep_training"

Rule 3: Archivable (ARCHIVE)

  • Age: 91-365 days
  • Pattern: Historical value but low operational need
  • Binary decision: category = "archive_candidate"

Rule 4: Purgeable (PURGE)

  • Age: > 365 days
  • Pattern: Minimal value, disk space reclamation
  • Binary decision: category = "purge_candidate"

Output Contract

{
  "scan_timestamp": "2025-10-27T20:00:00Z",
  "total_records_scanned": 3719,
  "categories": {
    "keep_recent": {
      "count": 1200,
      "age_range_days": "0-30",
      "disk_mb": 45
    },
    "keep_training": {
      "count": 1500,
      "age_range_days": "30-90",
      "disk_mb": 95
    },
    "archive_candidate": {
      "count": 800,
      "age_range_days": "91-365",
      "disk_mb": 70
    },
    "purge_candidate": {
      "count": 219,
      "age_range_days": ">365",
      "disk_mb": 19
    }
  },
  "recommendation": "archive_and_vacuum",
  "estimated_reclaim_mb": 89,
  "confidence": 0.95
}

Execution Logic (Gemma Implementation)

from datetime import datetime, timedelta, timezone
import sqlite3

def classify_heartbeat_age(timestamp_iso: str, now: datetime) -> str:
    """Fast binary classification by age"""
    ts = datetime.fromisoformat(timestamp_iso.replace('Z', '+00:00'))
    age_days = (now - ts).days

    if age_days < 30:
        return "keep_recent"
    elif age_days < 91:
        return "keep_training"
    elif age_days < 366:
        return "archive_candidate"
    else:
        return "purge_candidate"

def scan_telemetry_retention(db_path: str, now: datetime | None = None) -> dict:
    """Gemma fast scan for retention categories.

    `now` maps to the input contract's current_timestamp; defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get all heartbeat timestamps, newest first
    cursor.execute("SELECT timestamp FROM youtube_heartbeats ORDER BY timestamp DESC")
    rows = cursor.fetchall()
    conn.close()

    categories = {
        "keep_recent": [],
        "keep_training": [],
        "archive_candidate": [],
        "purge_candidate": []
    }

    # Fast classification loop: one age comparison per record
    for (ts,) in rows:
        categories[classify_heartbeat_age(ts, now)].append(ts)

    # Assemble the output contract
    return {
        "scan_timestamp": now.isoformat(),
        "total_records_scanned": len(rows),
        "categories": {
            cat: {
                "count": len(records),
                "age_range_days": _get_age_range(cat),
                "disk_mb": round(len(records) * 0.06, 1)  # Rough per-record size estimate
            }
            for cat, records in categories.items()
        },
        "recommendation": "archive_and_vacuum" if len(categories["archive_candidate"]) > 500 else "no_action",
        "estimated_reclaim_mb": round(
            (len(categories["archive_candidate"]) + len(categories["purge_candidate"])) * 0.06, 1
        ),
        "confidence": 0.95
    }

def _get_age_range(category: str) -> str:
    ranges = {
        "keep_recent": "0-30",
        "keep_training": "30-90",
        "archive_candidate": "91-365",
        "purge_candidate": ">365"
    }
    return ranges[category]
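
A minimal usage sketch, assuming the functions above are in scope; the measured wall-clock time corresponds to the execution_time_ms field recorded under Pattern Memory Integration below:

import json
import time

start = time.perf_counter()
report = scan_telemetry_retention("data/foundups.db")
elapsed_ms = round((time.perf_counter() - start) * 1000)

print(json.dumps(report, indent=2))
print(f"Scan completed in {elapsed_ms} ms, recommendation: {report['recommendation']}")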

Performance Metrics

  • Scan speed: 10,000 records/second (pure SQLite query)
  • Classification: <1ms per record (simple age comparison)
  • Total execution: <50ms for 3,719 records
  • Token cost: 50-100 tokens (output generation only, no LLM inference)

Pattern Memory Integration

Store execution results in wre_core/recursive_improvement/metrics/telemetry_cleanup_metrics.jsonl:

{
  "skill": "gemma_telemetry_retention_detector",
  "timestamp": "2025-10-27T20:00:00Z",
  "execution_time_ms": 47,
  "records_scanned": 3719,
  "recommendation": "archive_and_vacuum",
  "estimated_reclaim_mb": 89,
  "pattern_fidelity": 0.95
}
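
A sketch of that append, reusing the report and elapsed_ms from the usage example above; mapping pattern_fidelity to the report's confidence field is an assumption, and the metrics directory is created if it does not yet exist:

import json
from pathlib import Path

METRICS_PATH = Path("wre_core/recursive_improvement/metrics/telemetry_cleanup_metrics.jsonl")

def record_cleanup_metrics(report: dict, elapsed_ms: int) -> None:
    """Append one JSONL line per scan so pattern fidelity can be tracked over time."""
    entry = {
        "skill": "gemma_telemetry_retention_detector",
        "timestamp": report["scan_timestamp"],
        "execution_time_ms": elapsed_ms,
        "records_scanned": report["total_records_scanned"],
        "recommendation": report["recommendation"],
        "estimated_reclaim_mb": report["estimated_reclaim_mb"],
        "pattern_fidelity": report["confidence"],
    }
    METRICS_PATH.parent.mkdir(parents=True, exist_ok=True)
    with METRICS_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")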

Next Phase

When recommendation == "archive_and_vacuum", trigger:

  • Phase 2: qwen_telemetry_cleanup_strategist - Strategic cleanup plan
  • Phase 3: 0102 validation and execution
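
A minimal sketch of that handoff check; the dispatcher is a hypothetical stand-in for however the wardrobe actually invokes the next skill:

def dispatch_skill(skill_name: str, payload: dict) -> None:
    """Hypothetical stand-in for the wardrobe's real skill-invocation mechanism."""
    print(f"Dispatching {skill_name} for {payload['total_records_scanned']} classified records")

def maybe_trigger_cleanup(report: dict) -> None:
    """Hand the Gemma report to Phase 2 only when archiving is recommended."""
    if report["recommendation"] == "archive_and_vacuum":
        dispatch_skill("qwen_telemetry_cleanup_strategist", payload=report)
    else:
        print("No cleanup needed; skipping Phases 2 and 3")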

Training Value

Gemma learns:

  • Fast age-based classification patterns
  • Binary decision thresholds (30/90/365 days)
  • Disk usage estimation heuristics

Pattern reuse:

  • Same logic applies to other telemetry tables
  • Reusable for foundups_selenium telemetry
  • Generic retention classifier for any time-series data (see the sketch after this list)
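
A sketch of that generalization, assuming the target table exposes an ISO-8601 timestamp column and reusing classify_heartbeat_age from Execution Logic above; the table and column names passed in are illustrative:

import sqlite3
from collections import Counter
from datetime import datetime, timezone

def scan_table_retention(db_path: str, table: str, ts_column: str = "timestamp") -> dict:
    """Apply the same age-based retention buckets to any time-series table."""
    conn = sqlite3.connect(db_path)
    try:
        # Identifiers cannot be bound as SQL parameters, so only trusted names should be passed
        rows = conn.execute(f"SELECT {ts_column} FROM {table}").fetchall()
    finally:
        conn.close()

    now = datetime.now(timezone.utc)
    return dict(Counter(classify_heartbeat_age(ts, now) for (ts,) in rows))

Pointing the same call at a hypothetical selenium telemetry table would cover the foundups_selenium case without changing the thresholds.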

Quick Install

/plugin add https://github.com/Foundup/Foundups-Agent/tree/main/gemma_telemetry_retention_detector

Copy and paste this command in Claude Code to install this skill

GitHub Repository

Foundup/Foundups-Agent
Path: modules/communication/livechat/skills/gemma_telemetry_retention_detector
Tags: bitcoin, blockchain-technology, daes, dao, foundups, partifact

Related Skills

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.

cost-optimization

Other

This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.

algorithmic-art

Meta

This Claude Skill creates original algorithmic art using p5.js with seeded randomness and interactive parameters. It generates .md files for algorithmic philosophies, plus .html and .js files for interactive generative art implementations. Use it when developers need to create flow fields, particle systems, or other computational art while avoiding copyright issues.

executing-plans

Design

Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.
