discover-data

rand
Tags: Meta, automation, data

About

The discover-data skill automatically activates Claude's data pipeline and ETL capabilities when working with data development tasks. It provides access to nine specialized data skills including batch processing, data validation, and differential dataflow for orchestrating complex data pipelines. Use this skill when working with ETL, Airflow, stream processing, or data validation requirements.

Quick Install

Claude Code

Plugin command (recommended):
/plugin add https://github.com/rand/cc-polymath

Git clone (alternative):
git clone https://github.com/rand/cc-polymath.git ~/.claude/skills/discover-data

Run the plugin command inside Claude Code to install this skill.

Documentation

Data Skills Discovery

This gateway skill provides automatic access to the nine skills in the Data category.

When This Skill Activates

This skill auto-activates when you're working with:

  • ETL
  • data pipelines
  • batch processing
  • stream processing
  • data validation
  • orchestration
  • Airflow
  • timely dataflow
  • differential dataflow
  • streaming aggregations
  • windowing
  • real-time analytics

Available Skills

Quick Reference

The Data category contains 9 skills:

  1. batch-processing - Orchestrating complex data pipelines with dependencies
  2. data-validation - Validating data schema before processing
  3. dataflow-coordination - Coordination patterns for distributed dataflow systems
  4. differential-dataflow - Differential computation for incremental updates and efficient joins
  5. etl-patterns - Designing data extraction from multiple sources
  6. pipeline-orchestration - Coordinating complex multi-step data workflows
  7. stream-processing - Processing real-time event streams (Kafka, Flink)
  8. streaming-aggregations - Windowing, sessionization, time-series aggregation
  9. timely-dataflow - Low-latency streaming computation with progress tracking

Load Full Category Details

For complete descriptions and workflows:

cat skills/data/INDEX.md

This loads the full Data category index with:

  • Detailed skill descriptions
  • Usage triggers for each skill
  • Common workflow combinations
  • Cross-references to related skills

Load Specific Skills

Load individual skills as needed:

# Traditional ETL/Batch
cat skills/data/batch-processing.md
cat skills/data/data-validation.md
cat skills/data/etl-patterns.md
cat skills/data/pipeline-orchestration.md

# Stream Processing
cat skills/data/stream-processing.md
cat skills/data/streaming-aggregations.md

# Advanced Dataflow Systems
cat skills/data/timely-dataflow.md
cat skills/data/differential-dataflow.md
cat skills/data/dataflow-coordination.md
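
When several skills are needed at once, the individual cat commands above can be wrapped in a small shell function. The following is a minimal sketch, not part of cc-polymath: the function name load_skills and the SKILLS_DIR variable are hypothetical, and it assumes the skill files live under skills/data/ as the commands above suggest.

```shell
# Hypothetical helper (not shipped with cc-polymath).
# SKILLS_DIR is an assumed variable; it defaults to the path used above.
SKILLS_DIR="${SKILLS_DIR:-skills/data}"

load_skills() {
  # Print each named skill file in the order given.
  for name in "$@"; do
    file="$SKILLS_DIR/$name.md"
    if [ -f "$file" ]; then
      cat "$file"
    else
      # Report a missing file on stderr and continue with the rest.
      echo "skill not found: $file" >&2
    fi
  done
}
```

For example, load_skills stream-processing streaming-aggregations would print both files in order. Skipping missing files instead of aborting is a design choice for this sketch; a stricter variant could return a non-zero status.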

Common Workflow Combinations

Real-Time Analytics Pipeline

# Load these skills together:
cat skills/data/stream-processing.md          # Kafka setup
cat skills/data/streaming-aggregations.md     # Windowing patterns
cat skills/data/dataflow-coordination.md      # Coordination

Incremental Computation System

# Load these skills together:
cat skills/data/timely-dataflow.md           # Foundation
cat skills/data/differential-dataflow.md     # Incremental updates
cat skills/data/dataflow-coordination.md     # Distributed coordination

Hybrid Batch + Stream

# Load these skills together:
cat skills/data/batch-processing.md          # Batch jobs
cat skills/data/stream-processing.md         # Stream processing
cat skills/data/pipeline-orchestration.md    # Overall coordination
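
The three combinations above can also be captured as a lookup from a workflow label to its skill files. This is only an illustrative sketch: the function name workflow_skills and the labels realtime, incremental, and hybrid are hypothetical shorthand for the headings above, not names defined by the repository.

```shell
# Hypothetical helper: print the skill file paths for a workflow combination.
# The labels realtime, incremental, and hybrid are illustrative only.
workflow_skills() {
  case "$1" in
    realtime)    set -- stream-processing streaming-aggregations dataflow-coordination ;;
    incremental) set -- timely-dataflow differential-dataflow dataflow-coordination ;;
    hybrid)      set -- batch-processing stream-processing pipeline-orchestration ;;
    *)           echo "unknown workflow: $1" >&2; return 1 ;;
  esac
  # After set --, the positional parameters hold the skill names.
  for name in "$@"; do
    echo "skills/data/$name.md"
  done
}
```

Under these assumptions, cat $(workflow_skills realtime) would load the three real-time analytics skills in one step. The unquoted command substitution is safe here only because the paths contain no spaces.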

Progressive Loading

This gateway skill enables progressive loading:

  • Level 1: Gateway loads automatically (you're here now)
  • Level 2: Load category INDEX.md for full overview
  • Level 3: Load specific skills as needed

Usage Instructions

  1. Auto-activation: This skill loads automatically when Claude Code detects data work
  2. Browse skills: Run cat skills/data/INDEX.md for full category overview
  3. Load specific skills: Use the cat commands above to load individual skills

Next Steps: Run cat skills/data/INDEX.md to see full category details.

GitHub Repository

rand/cc-polymath
Path: skills/discover-data
Topics: ai, claude-code, skills
