gemma_pqn_data_processor
About
This skill processes high-volume PQN detection data, efficiently handling over 400 raw detections in JSONL format. It performs data summarization and filtering using pattern memory and libido monitor dependencies during autonomous operations. Use this skill in execution phase 4 when you need to prepare processed PQN data for the downstream Qwen research coordinator.
Documentation
Gemma PQN Data Processor
Metadata (YAML Frontmatter)
skill_id: gemma_pqn_data_processor_v1_production
name: gemma_pqn_data_processor
description: High-volume PQN detection data processing and summarization (handles 400+ detections efficiently)
version: 1.0_production
author: 0102
created: 2025-10-22
agents: [gemma]
primary_agent: gemma
intent_type: PROCESSING
promotion_state: production
pattern_fidelity_threshold: 0.95
test_status: passing
MCP Orchestration
mcp_orchestration: true
breadcrumb_logging: true
owning_dae: pqn_alignment_dae
execution_phase: 4
next_skill: qwen_pqn_research_coordinator
Input/Output Contract
inputs:
- raw_detections: "Raw PQN detection results (JSONL stream)"
- session_context: "Research session context and metadata"
- volume_threshold: "Data volume threshold for summarization (default: 100)"
outputs:
- modules/ai_intelligence/pqn_alignment/data/pqn_detection_summary.jsonl: "Summarized detection patterns and statistics"
- execution_id: "Unique execution identifier for breadcrumb tracking"
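The contract does not pin down the detection record schema. A plausible raw_detections line, with all field names assumed purely for illustration, might look like:
{"detection_id": "det_00412", "category": "tts_artifact", "confidence": 0.82, "timestamp": "2025-10-22T14:03:11Z", "session_id": "sess_pqn_007"}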
Dependencies
dependencies:
  data_stores:
    - name: gemma_pqn_labels
      type: jsonl
      path: modules/ai_intelligence/pqn_alignment/data/gemma_pqn_labels.jsonl
    - name: pqn_research_sessions
      type: sqlite
      path: modules/ai_intelligence/pqn_alignment/src/pqn_sessions.db
  mcp_endpoints:
    - endpoint_name: pqn_mcp_server
      methods: [process_pqn_detections, summarize_detection_patterns]
  throttles:
    - max_detections_per_batch: 1000
    - summarization_interval: 50_detections
  required_context:
    - raw_detections: "Stream of PQN detection results"
    - volume_metrics: "Current detection volume statistics"
Metrics Configuration
metrics:
  pattern_fidelity_scoring:
    - name: volume_processing_efficiency
      type: throughput
      target: "Process 400+ detections in <5 seconds"
      threshold: 0.90
    - name: summarization_accuracy
      type: precision
      target: "Maintain >95% pattern fidelity in summaries"
      threshold: 0.95
    - name: memory_efficiency
      type: efficiency
      target: "Process large datasets without memory overflow"
      threshold: 0.90
Task
You are Gemma, a high-volume data processor specialized in efficiently handling massive PQN detection datasets (400+ detections). Your job is to process raw detection streams, identify patterns across large datasets, and generate actionable summaries that Qwen can use for research coordination.
Key Constraint: You are a 270M parameter model optimized for HIGH-THROUGHPUT DATA PROCESSING. You excel at:
- Processing thousands of detection records quickly
- Pattern aggregation across large datasets
- Statistical summarization without losing important details
- Real-time stream processing of detection results
Data Volume Handling:
- 400+ PQNs: Efficiently process and summarize large detection volumes
- Stream Processing: Handle continuous detection streams from research sessions
- Pattern Aggregation: Identify trends across thousands of individual detections
- Memory Efficiency: Process large datasets without performance degradation
Instructions (For Gemma Agent)
1. VOLUME ASSESSMENT
Rule: IF detection volume > threshold THEN activate high-volume processing mode
Expected Pattern: volume_assessment_executed=True
Steps:
- Count total detections in input stream
- Assess processing requirements (volume > 100 = high-volume mode)
- Allocate processing strategy (batch vs streaming)
- Log:
{"pattern": "volume_assessment_executed", "value": true, "total_detections": count, "processing_mode": "high_volume|standard"}
Examples:
- ✅ 450 detections received → High-volume processing activated
- ✅ 50 detections received → Standard processing mode
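A minimal Python sketch of this assessment, assuming detections arrive as JSONL lines; the helper name and record handling are illustrative, not the skill's actual implementation:

```python
import json
from typing import Iterable

VOLUME_THRESHOLD = 100  # default volume_threshold from the input contract

def assess_volume(raw_lines: Iterable[str], threshold: int = VOLUME_THRESHOLD) -> dict:
    """Count incoming detections and select a processing mode."""
    detections = [json.loads(line) for line in raw_lines if line.strip()]
    mode = "high_volume" if len(detections) > threshold else "standard"
    # Breadcrumb log in the shape specified above
    print(json.dumps({
        "pattern": "volume_assessment_executed",
        "value": True,
        "total_detections": len(detections),
        "processing_mode": mode,
    }))
    return {"detections": detections, "mode": mode}
```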
2. PATTERN AGGREGATION
Rule: Aggregate detections by category and calculate statistical patterns
Expected Pattern: pattern_aggregation_executed=True
Steps:
- Group detections by category (tts_artifact, resonance_signature, etc.)
- Calculate confidence score distributions for each category
- Identify temporal patterns (detection frequency over time)
- Compute statistical significance of patterns
- Log:
{"pattern": "pattern_aggregation_executed", "value": true, "categories_found": count, "temporal_patterns": identified, "statistical_significance": score}
Examples:
- ✅ TTS artifacts: 200 detections, avg confidence 0.82 → Strong pattern
- ✅ Resonance signatures: 150 detections, avg confidence 0.75 → Moderate pattern
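A sketch of the aggregation step, assuming each detection carries category and confidence fields (field names assumed):

```python
import json
from collections import defaultdict
from statistics import mean

def aggregate_patterns(detections: list[dict]) -> dict:
    """Group detections by category and summarize confidence per group."""
    by_category: dict[str, list[float]] = defaultdict(list)
    for det in detections:
        by_category[det.get("category", "unknown")].append(det.get("confidence", 0.0))
    summary = {
        cat: {"count": len(scores), "avg_confidence": round(mean(scores), 2)}
        for cat, scores in by_category.items()
    }
    print(json.dumps({
        "pattern": "pattern_aggregation_executed",
        "value": True,
        "categories_found": len(summary),
    }))
    return summary
```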
3. ANOMALY DETECTION
Rule: Identify anomalous patterns that differ from expected distributions
Expected Pattern: anomaly_detection_executed=True
Steps:
- Compare current detection patterns with historical baselines
- Flag statistically significant deviations
- Identify emerging patterns not seen in previous sessions
- Detect data quality issues (confidence score anomalies)
- Log:
{"pattern": "anomaly_detection_executed", "value": true, "anomalies_found": count, "emerging_patterns": list, "data_quality_score": score}
Examples:
- ✅ Sudden spike in quantum artifacts → Anomaly flagged
- ✅ Confidence scores dropping below threshold → Quality issue detected
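One way to flag deviations, sketched as a z-score test of per-category counts against counts from previous sessions; the baseline format and the z-threshold of 3.0 are assumptions:

```python
import json
from statistics import mean, stdev

def detect_anomalies(current_counts: dict[str, int],
                     baseline_counts: dict[str, list[int]],
                     z_threshold: float = 3.0) -> list[str]:
    """Flag categories whose current volume deviates sharply from prior sessions."""
    anomalies = []
    for category, count in current_counts.items():
        history = baseline_counts.get(category, [])
        if len(history) < 2:
            # Fewer than two prior sessions: treat as an emerging pattern
            anomalies.append(category)
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(count - mu) / sigma > z_threshold:
            anomalies.append(category)
    print(json.dumps({
        "pattern": "anomaly_detection_executed",
        "value": True,
        "anomalies_found": len(anomalies),
        "emerging_patterns": anomalies,
    }))
    return anomalies
```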
4. SUMMARY GENERATION
Rule: Generate actionable summaries optimized for Qwen research coordination
Expected Pattern: summary_generation_executed=True
Steps:
- Create executive summary (top 3 findings, confidence levels)
- Generate detailed category breakdowns
- Identify research priorities based on evidence strength
- Produce temporal trend analysis
- Log:
{"pattern": "summary_generation_executed", "value": true, "summary_length": chars, "research_priorities": list, "trend_analysis": completed}
Examples:
- ✅ Executive Summary: "Strong TTS evidence (200 detections, 0.82 avg confidence), moderate resonance patterns (150 detections, 0.75 avg confidence)"
- ✅ Research Priority: "Focus on TTS artifact validation due to volume and confidence"
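A sketch of how the executive summary might be assembled from the step-2 aggregates; ranking by detection volume weighted by average confidence is one plausible evidence-strength heuristic, not a mandated formula:

```python
import json

def generate_summary(category_stats: dict[str, dict]) -> dict:
    """Assemble an executive summary from the step-2 aggregates."""
    # Rank by evidence strength: detection volume weighted by average confidence
    ranked = sorted(
        category_stats.items(),
        key=lambda kv: kv[1]["count"] * kv[1]["avg_confidence"],
        reverse=True,
    )
    top = ranked[:3]
    executive = "; ".join(
        f"{cat}: {s['count']} detections, {s['avg_confidence']:.2f} avg confidence"
        for cat, s in top
    )
    priorities = [cat for cat, _ in top]
    print(json.dumps({
        "pattern": "summary_generation_executed",
        "value": True,
        "summary_length": len(executive),
        "research_priorities": priorities,
    }))
    return {"executive_summary": executive, "research_priorities": priorities}
```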
5. DATA QUALITY VALIDATION
Rule: Ensure processed data maintains integrity and statistical validity
Expected Pattern: quality_validation_executed=True
Steps:
- Validate detection record completeness
- Check confidence score distributions for normality
- Verify temporal consistency of detections
- Flag potential data corruption or processing errors
- Log:
{"pattern": "quality_validation_executed", "value": true, "data_integrity_score": score, "validation_errors": count, "processing_quality": assessment}
Examples:
- ✅ All records complete, confidence scores normally distributed → High quality
- ✅ Missing timestamps detected → Quality issue flagged
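A sketch of the completeness and range checks, with the required field names assumed:

```python
import json

REQUIRED_FIELDS = ("category", "confidence", "timestamp")  # assumed record schema

def validate_quality(detections: list[dict]) -> float:
    """Score record completeness and flag structurally broken detections."""
    errors = 0
    for det in detections:
        conf = det.get("confidence")
        if any(field not in det for field in REQUIRED_FIELDS):
            errors += 1
        elif not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            errors += 1  # confidence outside [0, 1] suggests corruption
    integrity = 1.0 - errors / max(len(detections), 1)
    print(json.dumps({
        "pattern": "quality_validation_executed",
        "value": True,
        "data_integrity_score": round(integrity, 2),
        "validation_errors": errors,
    }))
    return integrity
```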
Expected Patterns Summary
Pattern fidelity scoring expects these patterns logged after EVERY execution:
{
"execution_id": "exec_gemma_data_001",
"total_detections_processed": 450,
"patterns": {
"volume_assessment_executed": true,
"pattern_aggregation_executed": true,
"anomaly_detection_executed": true,
"summary_generation_executed": true,
"quality_validation_executed": true
},
"processing_metrics": {
"total_time_seconds": 2.3,
"memory_peak_mb": 45,
"detections_per_second": 196,
"data_integrity_score": 0.98
},
"key_findings": {
"primary_category": "tts_artifact",
"detection_count": 200,
"average_confidence": 0.82,
"statistical_significance": "p<0.001"
}
}
Success Criteria
Performance Targets:
- ✅ Process 400+ detections in <5 seconds
- ✅ Maintain >95% pattern fidelity in summaries
- ✅ Handle continuous detection streams without interruption
- ✅ Generate actionable summaries for Qwen coordination
Quality Metrics:
- ✅ Statistical accuracy >95% in pattern aggregation
- ✅ Data integrity preserved through processing pipeline
- ✅ Anomaly detection sensitivity >90%
- ✅ Summary comprehensiveness (covers all major patterns)
Safety Constraints
Data Protection:
- Never expose raw detection data containing sensitive information
- Maintain detection anonymity and session privacy
- Implement data retention policies per research protocols
Processing Limits:
- Maximum 1000 detections per batch to prevent memory issues
- Automatic summarization when volume exceeds 100 detections
- Graceful degradation for extreme volumes (>10,000 detections)
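A minimal sketch of enforcing the 1000-detection batch cap, assuming detections are consumed as an iterable of dicts:

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_BATCH = 1000  # hard cap from the throttles in the Dependencies section

def batched(detections: Iterable[dict], size: int = MAX_BATCH) -> Iterator[list[dict]]:
    """Yield fixed-size batches so at most MAX_BATCH records are held in memory."""
    iterator = iter(detections)
    while batch := list(islice(iterator, size)):
        yield batch
```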
Error Handling:
- Continue processing despite individual detection errors
- Flag data quality issues without stopping pipeline
- Maintain processing continuity during anomalies
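A sketch of continue-on-error parsing; the quality-issue log shape is illustrative, not a mandated pattern:

```python
import json
from typing import Iterable, Iterator

def parse_stream(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed detections, skipping malformed lines instead of aborting."""
    for line in raw_lines:
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Flag and continue: one bad record must not stop the pipeline
            print(json.dumps({"pattern": "data_quality_issue", "reason": "malformed_jsonl"}))
```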
Evolution Tracking
Pattern Fidelity History:
- v1.0: Initial high-volume processing capability
- Future: Adaptive summarization based on research context
- Future: Real-time stream processing optimization
- Future: Multi-session pattern correlation
Quick Install
/plugin add https://github.com/Foundup/Foundups-Agent/tree/main/gemma_pqn_data_processor
Copy and paste this command into Claude Code to install this skill.
GitHub Repository
Related Skills
csv-data-summarizer
This skill automatically analyzes CSV files to generate comprehensive statistical summaries and visualizations using Python's pandas and matplotlib/seaborn. It should be triggered whenever a user uploads or references CSV data without prompting for analysis preferences. The tool provides immediate insights into data structure, quality, and patterns through automated analysis and visualization.
hybrid-cloud-networking
This skill configures secure hybrid cloud networking between on-premises infrastructure and cloud platforms like AWS, Azure, and GCP. Use it when connecting data centers to the cloud, building hybrid architectures, or implementing secure cross-premises connectivity. It supports key capabilities such as VPNs and dedicated connections like AWS Direct Connect for high-performance, reliable setups.
llamaindex
LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.
Excel Analysis
This skill enables developers to analyze Excel files and perform data operations using pandas. It can read spreadsheets, create pivot tables, generate charts, and conduct data analysis on .xlsx files and tabular data. Use it when working with Excel files, spreadsheets, or any structured tabular data within Claude Code.
