
data-orchestrator

majiayu000

About

The data-orchestrator skill coordinates data pipeline tasks including ETL, analytics, and feature engineering. Use it when implementing data ingestion, transformations, quality checks, or analytics workflows. It manages pipeline definitions, enforces a 95% minimum data quality standard, and handles data governance concerns such as schema management and lineage tracking.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/majiayu000/claude-skill-registry
Git Clone (Alternative)
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-orchestrator

Copy and paste one of these commands into Claude Code to install this skill.

Documentation


name: data-orchestrator
description: Coordinates data pipeline tasks (ETL, analytics, feature engineering). Use when implementing data ingestion, transformations, quality checks, or analytics. Applies data-quality-standard.md (95% minimum).

Data Orchestrator Skill

Role

Acts as CTO-Data, managing all data processing, analytics, and pipeline tasks.

Responsibilities

  1. Data Pipeline Management

    • ETL/ELT processes
    • Data validation
    • Quality assurance
    • Pipeline monitoring
  2. Analytics Coordination

    • Feature engineering
    • Model integration
    • Report generation
    • Metric calculation
  3. Data Governance

    • Schema management
    • Data lineage tracking
    • Privacy compliance
    • Access control
  4. Context Maintenance

    ai-state/active/data/
    ├── pipelines.json    # Pipeline definitions
    ├── features.json     # Feature registry
    ├── quality.json      # Data quality metrics
    └── tasks/           # Active data tasks
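
For illustration, a single entry in pipelines.json might look like the following; the field names are assumptions rather than a fixed schema:

{
  "daily_aggregation": {
    "schedule": "0 2 * * *",
    "sources": ["transactions", "customers"],
    "destination": "analytics.daily_metrics",
    "quality_threshold": 95,
    "status": "active"
  }
}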
    

Skill Coordination

Available Data Skills

  • etl-skill - Extract, transform, load operations
  • feature-engineering-skill - Feature creation
  • analytics-skill - Analysis and reporting
  • quality-skill - Data quality checks
  • pipeline-skill - Pipeline orchestration

Context Package to Skills

context:
  task_id: "task-003-pipeline"
  pipelines:
    existing: ["daily_aggregation", "customer_segmentation"]
    schedule: "0 2 * * *"
  features:
    current: ["revenue_30d", "churn_risk"]
    dependencies: ["transactions", "customers"]
  standards:
    - "data-quality-standard.md"
    - "feature-engineering.md"
  test_requirements:
    quality: ["completeness", "accuracy", "timeliness"]

Task Processing Flow

  1. Receive Task

    • Identify data sources
    • Check dependencies
    • Validate requirements
  2. Prepare Context

    • Current pipeline state
    • Feature definitions
    • Quality metrics
  3. Assign to Skill

    • Choose data skill
    • Set parameters
    • Define outputs
  4. Monitor Execution

    • Track pipeline progress
    • Monitor resource usage
    • Check quality gates
  5. Validate Results

    • Data quality checks
    • Output validation
    • Performance metrics
    • Lineage tracking
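
A minimal sketch of this flow, assuming a task dict, a registry of skill callables, and a simple context store (all names here are illustrative, not part of the skill's API):

def process_task(task, skills, context_store):
    # 1. Receive task: identify sources, check dependencies, validate requirements
    if not task.get("sources"):
        raise ValueError("task must declare its data sources")
    # 2. Prepare context: current pipeline state, feature definitions, quality metrics
    context = {
        "pipelines": context_store.get("pipelines", {}),
        "features": context_store.get("features", {}),
        "quality": context_store.get("quality", {}),
    }
    # 3. Assign to skill: choose the data skill and hand over task plus context
    skill = skills[task["type"]]  # e.g. "etl" maps to the etl-skill entry point
    # 4. Monitor execution: here simply run and capture the result
    result = skill(task, context)
    # 5. Validate results against the 95% quality gate before accepting them
    if result.get("quality_score", 0) < 95:
        raise RuntimeError(f"quality gate failed for {task['task_id']}")
    return result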

Data-Specific Standards

Pipeline Checklist

  • Input validation
  • Error handling
  • Checkpoint/recovery
  • Monitoring enabled
  • Documentation updated
  • Performance optimized

Quality Checklist

  • Completeness checks
  • Accuracy validation
  • Consistency rules
  • Timeliness metrics
  • Uniqueness constraints
  • Validity ranges

Feature Engineering Checklist

  • Business logic documented
  • Dependencies tracked
  • Version controlled
  • Performance tested
  • Edge cases handled
  • Monitoring added

Integration Points

With Backend Orchestrator

  • Data model alignment
  • API data contracts
  • Database optimization
  • Cache strategies

With Frontend Orchestrator

  • Dashboard data requirements
  • Real-time vs batch
  • Data freshness SLAs
  • Visualization formats

With Human-Docs

Updates documentation with:

  • Pipeline changes
  • Feature definitions
  • Data dictionaries
  • Quality reports

Event Communication

Listening For

{
  "event": "data.source.updated",
  "source": "transactions",
  "schema_change": true,
  "impact": ["daily_pipeline", "revenue_features"]
}

Broadcasting

{
  "event": "data.pipeline.completed",
  "pipeline": "daily_aggregation",
  "records_processed": 50000,
  "duration": "5m 32s",
  "quality_score": 98.5
}
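
One way this event traffic could be wired up, sketched with the re-run and emit operations passed in as callables (both are assumptions, not a defined interface):

def handle_source_updated(event, rerun_pipeline, emit):
    # React to data.source.updated by re-running every impacted pipeline.
    if not event.get("schema_change"):
        return
    for pipeline in event.get("impact", []):
        result = rerun_pipeline(pipeline)  # assumed to return run statistics
        # Broadcast completion so downstream orchestrators can react.
        emit({
            "event": "data.pipeline.completed",
            "pipeline": pipeline,
            "records_processed": result["records_processed"],
            "duration": result["duration"],
            "quality_score": result["quality_score"],
        })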

Test Requirements

Every Data Task Must Include

  1. Unit Tests - Transformation logic
  2. Integration Tests - Pipeline flow
  3. Data Quality Tests - Accuracy, completeness
  4. Performance Tests - Processing speed
  5. Edge Case Tests - Null, empty, invalid data
  6. Regression Tests - Output consistency
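
For example, a data quality test might combine completeness, accuracy, and uniqueness checks like this (pandas-based; the pipeline_output fixture and column names are illustrative assumptions):

import pandas as pd

def test_daily_aggregation_quality(pipeline_output: pd.DataFrame):
    # Completeness: no missing required fields
    required = ["customer_id", "revenue_30d"]
    assert pipeline_output[required].notna().all().all()
    # Accuracy: values within expected ranges
    assert (pipeline_output["revenue_30d"] >= 0).all()
    # Uniqueness: no unwanted duplicates
    assert not pipeline_output["customer_id"].duplicated().any()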

Success Metrics

  • Pipeline success rate > 99%
  • Data quality score > 95%
  • Processing time < SLA
  • Zero data loss
  • Feature coverage > 90%

Common Patterns

ETL Pattern

class ETLOrchestrator:
    # Helper calls below are illustrative placeholders for the skill's implementations.
    def run_pipeline(self, task):
        # 1. Extract from sources
        raw = self.extract(task.sources)
        # 2. Validate input data
        self.validate_input(raw)
        # 3. Transform data
        transformed = self.transform(raw, task.rules)
        # 4. Quality checks
        self.check_quality(transformed)
        # 5. Load to destination
        self.load(transformed, task.destination)
        # 6. Update lineage
        self.update_lineage(task, transformed)

Feature Pattern

class FeatureOrchestrator:
    # Helper calls below are illustrative placeholders for the skill's implementations.
    def create_feature(self, task):
        # 1. Define feature logic
        spec = self.define_feature(task)
        # 2. Identify dependencies
        deps = self.resolve_dependencies(spec)
        # 3. Implement calculation
        feature = self.build_calculation(spec, deps)
        # 4. Add to feature store
        self.register_feature(feature)
        # 5. Create monitoring
        self.add_monitoring(feature)

Data Processing Guidelines

Batch Processing

  • Use for large volumes
  • Schedule during off-peak
  • Implement checkpointing
  • Monitor resource usage
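
A rough, self-contained sketch of checkpointed batch processing over an in-memory source (a real pipeline would use a durable checkpoint store and reader; these names are illustrative):

def run_batch(records, checkpoints, source_id, process, batch_size=10_000):
    # Resume from the last committed offset so reruns do not reprocess data.
    offset = checkpoints.get(source_id, 0)
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        process(batch)
        offset += len(batch)
        # Commit progress only after the batch has been fully processed.
        checkpoints[source_id] = offset
    return offset

# Usage: run_batch(rows, checkpoints={}, source_id="transactions", process=load_fn)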

Stream Processing

  • Use for real-time needs
  • Implement windowing
  • Handle late arrivals
  • Maintain state
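
To illustrate windowing with late-arrival handling, a minimal tumbling-window sketch using epoch-second timestamps (window size and lateness values are illustrative):

from collections import defaultdict

class TumblingWindow:
    # Groups events into fixed-size windows and tolerates a bounded amount of
    # lateness before an event is rejected (maintaining watermark state).
    def __init__(self, size_seconds=300, allowed_lateness=60):
        self.size = size_seconds
        self.allowed_lateness = allowed_lateness
        self.windows = defaultdict(list)  # window start -> values
        self.watermark = 0                # highest event time seen so far

    def add(self, event_time, value):
        self.watermark = max(self.watermark, event_time)
        if event_time < self.watermark - self.allowed_lateness:
            return False  # too late; a real pipeline might route this to a dead-letter queue
        window_start = (event_time // self.size) * self.size
        self.windows[window_start].append(value)
        return True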

Data Quality Rules

  1. Completeness - No missing required fields
  2. Accuracy - Values within expected ranges
  3. Consistency - Cross-dataset alignment
  4. Timeliness - Data freshness requirements
  5. Uniqueness - No unwanted duplicates
  6. Validity - Format and type correctness
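
A hedged sketch of how such rules could roll up into the 95% quality score used as the gate (equal weights and the rule functions are assumptions, pandas-based):

def completeness(df):
    # Share of rows with no missing required fields.
    return df.notna().all(axis=1).mean()

def quality_score(df, rules):
    # Average the per-rule pass rates and express the result as a percentage.
    return 100 * sum(rule(df) for rule in rules) / len(rules)

# Gate a pipeline on the 95% standard from data-quality-standard.md:
# assert quality_score(output_df, [completeness, accuracy, timeliness]) >= 95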

Anti-Patterns to Avoid

❌ Processing without validation
❌ No error recovery mechanism
❌ Missing data lineage
❌ Hardcoded transformations
❌ No monitoring/alerting
❌ Manual intervention required

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/data-orchestrator

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.


llamaindex

Meta

LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.


hybrid-cloud-networking

Meta

This skill configures secure hybrid cloud networking between on-premises infrastructure and cloud platforms like AWS, Azure, and GCP. Use it when connecting data centers to the cloud, building hybrid architectures, or implementing secure cross-premises connectivity. It supports key capabilities such as VPNs and dedicated connections like AWS Direct Connect for high-performance, reliable setups.


polymarket

Meta

This skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.
