discover-data

rand


About

The discover-data skill activates automatically when Claude works on data development tasks. It acts as a gateway to nine specialized data skills, including batch processing, data validation, and differential dataflow. Use it when working with ETL, Airflow, stream processing, or data validation.

Quick Install

Claude Code

Plugin Command (Recommended)

/plugin add https://github.com/rand/cc-polymath

Git Clone (Alternative)

git clone https://github.com/rand/cc-polymath.git ~/.claude/skills/discover-data

Copy and paste either command to install this skill. The plugin command runs inside Claude Code; the git clone command runs in a terminal.

Documentation

Data Skills Discovery

Provides automatic access to the nine skills in the Data category.

When This Skill Activates

This skill auto-activates when you're working with:

ETL
data pipelines
batch processing
stream processing
data validation
orchestration
Airflow
timely dataflow
differential dataflow
streaming aggregations
windowing
real-time analytics

Available Skills

Quick Reference

The Data category contains 9 skills:

batch-processing - Orchestrating complex data pipelines with dependencies
data-validation - Validating data schema before processing
dataflow-coordination - Coordination patterns for distributed dataflow systems
differential-dataflow - Differential computation for incremental updates and efficient joins
etl-patterns - Designing data extraction from multiple sources
pipeline-orchestration - Coordinating complex multi-step data workflows
stream-processing - Processing real-time event streams (Kafka, Flink)
streaming-aggregations - Windowing, sessionization, time-series aggregation
timely-dataflow - Low-latency streaming computation with progress tracking

Load Full Category Details

For complete descriptions and workflows:

cat skills/data/INDEX.md

This loads the full Data category index with:

Detailed skill descriptions
Usage triggers for each skill
Common workflow combinations
Cross-references to related skills

Load Specific Skills

Load individual skills as needed:

# Traditional ETL/Batch
cat skills/data/batch-processing.md
cat skills/data/data-validation.md
cat skills/data/etl-patterns.md
cat skills/data/pipeline-orchestration.md

# Stream Processing
cat skills/data/stream-processing.md
cat skills/data/streaming-aggregations.md

# Advanced Dataflow Systems
cat skills/data/timely-dataflow.md
cat skills/data/differential-dataflow.md
cat skills/data/dataflow-coordination.md

Common Workflow Combinations

Real-Time Analytics Pipeline

# Load these skills together:
cat skills/data/stream-processing.md          # Kafka setup
cat skills/data/streaming-aggregations.md     # Windowing patterns
cat skills/data/dataflow-coordination.md      # Coordination

Incremental Computation System

# Load these skills together:
cat skills/data/timely-dataflow.md           # Foundation
cat skills/data/differential-dataflow.md     # Incremental updates
cat skills/data/dataflow-coordination.md     # Distributed coordination

Hybrid Batch + Stream

# Load these skills together:
cat skills/data/batch-processing.md          # Batch jobs
cat skills/data/stream-processing.md         # Stream processing
cat skills/data/pipeline-orchestration.md    # Overall coordination
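
The workflow combinations above each load several files with repeated cat calls. As a minimal sketch, the same thing can be wrapped in a small shell function. The function name load_skills and the SKILLS_DIR variable are hypothetical and not part of cc-polymath; the default path is the skills/data directory used in the commands above.

```shell
# Hypothetical helper (not part of cc-polymath): load several skills in one call.
# SKILLS_DIR is an assumed variable; it defaults to the path used in this document.
load_skills() {
  local dir="${SKILLS_DIR:-skills/data}"
  local missing=0
  local name file
  for name in "$@"; do
    file="$dir/$name.md"
    if [ -f "$file" ]; then
      # Print a header so the skills are easy to tell apart in the output.
      printf '=== %s ===\n' "$name"
      cat "$file"
    else
      # Report missing files on stderr and remember the failure.
      printf 'skill not found: %s\n' "$file" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, the Real-Time Analytics Pipeline combination would become: load_skills stream-processing streaming-aggregations dataflow-coordination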

Progressive Loading

This gateway skill enables progressive loading:

Level 1: Gateway loads automatically (you're here now)
Level 2: Load category INDEX.md for full overview
Level 3: Load specific skills as needed
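
Between Level 2 and Level 3 it can help to see which skill files are actually present before loading one. The following is a sketch under the assumption that each skill is a single .md file next to INDEX.md, as the cat commands in this document suggest; the function name list_skills is hypothetical.

```shell
# Hypothetical helper (not part of cc-polymath): list the skill names in a category.
# Assumes one .md file per skill, stored alongside INDEX.md.
list_skills() {
  local dir="${1:-skills/data}"
  local file name
  for file in "$dir"/*.md; do
    # Skip the literal pattern when the glob matches nothing.
    if [ ! -f "$file" ]; then
      continue
    fi
    name="$(basename "$file" .md)"
    # INDEX.md is the Level 2 overview, not a skill.
    if [ "$name" = "INDEX" ]; then
      continue
    fi
    printf '%s\n' "$name"
  done
}
```

Running list_skills with no argument reads skills/data; pass another directory to inspect a different category.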

Usage Instructions

Auto-activation: This skill loads automatically when Claude Code detects data work
Browse skills: Run cat skills/data/INDEX.md for full category overview
Load specific skills: Use the cat commands above to load individual skills

Next Steps: Run cat skills/data/INDEX.md to see full category details.

GitHub Repository

rand/cc-polymath

Path: skills/discover-data

Tags: ai, claude-code, skills
