
data-validation

majiayu000
Updated Yesterday
Tags: Meta, testing, api, data

About

This skill provides a comprehensive data validation framework for testing schema compliance, data quality, and referential integrity across databases, APIs, and pipelines. Use it to validate data sources and generate quality scorecards with anomaly detection for completeness, accuracy, and consistency. It's ideal for developers needing to check data integrity or perform ETL testing.

Quick Install

Claude Code

Plugin command (recommended). Paste this into Claude Code:
/plugin add https://github.com/majiayu000/claude-skill-registry

Git clone (alternative):
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-validation

Documentation

Data Validation Framework

Purpose

Comprehensive data validation framework for testing schema compliance, data quality, and referential integrity. Validates databases, APIs, data pipelines, and file formats. Generates data quality scorecards with anomaly detection.

Triggers

Use this skill when a request contains phrases such as:

  • "validate data quality"
  • "check data integrity"
  • "schema validation"
  • "test data pipeline"
  • "data quality report"
  • "validate CSV"
  • "check for data anomalies"
  • "test ETL output"

When to Use

  • Data pipeline deployment
  • Database migration
  • API response validation
  • Report generation systems
  • Data warehouse testing
  • ML training data validation

When NOT to Use

  • API endpoint testing (use api-contract-validator)
  • Security testing (use security-test-suite)
  • Performance testing (use performance-benchmark)

Core Instructions

Data Quality Dimensions

| Dimension | Description | Weight |
| --------- | ----------- | ------ |
| Completeness | Missing values, required fields | 25% |
| Accuracy | Type conformance, format validation | 25% |
| Consistency | Cross-field rules, referential integrity | 20% |
| Uniqueness | Duplicate detection, key uniqueness | 15% |
| Freshness | Timestamp validation, staleness | 10% |
| Anomaly | Statistical outlier detection | 5% |
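
The skill does not state how the overall score is derived from these weights. A minimal sketch, assuming it is a plain weighted average of per-dimension scores on a 0-100 scale (the function name `overall_score` and the lowercase dimension keys are illustrative, not part of the skill):

```python
# Weights copied from the Data Quality Dimensions table; they sum to 1.0.
WEIGHTS = {
    "completeness": 0.25,
    "accuracy": 0.25,
    "consistency": 0.20,
    "uniqueness": 0.15,
    "freshness": 0.10,
    "anomaly": 0.05,
}


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (assumed formula, 0-100 scale)."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total, 1)
```

Requiring every dimension up front avoids silently rescaling the score when one dimension was never measured.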

Validation Categories

| Category | Description | Severity |
| -------- | ----------- | -------- |
| Schema | Structure and type compliance | Critical |
| Completeness | Missing/null value detection | High |
| Accuracy | Value correctness and format | High |
| Consistency | Cross-field/cross-table rules | Medium |
| Uniqueness | Duplicate detection | Medium |
| Freshness | Timeliness of data | Medium |
| Anomaly | Statistical outlier detection | Low |
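
One way to use the severity column is to order reported issues so the most severe categories come first. The sketch below assumes each issue is a dict with a `category` key; that shape, the `Severity` enum, and `sort_by_severity` are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Category-to-severity mapping copied from the Validation Categories table.
CATEGORY_SEVERITY = {
    "schema": Severity.CRITICAL,
    "completeness": Severity.HIGH,
    "accuracy": Severity.HIGH,
    "consistency": Severity.MEDIUM,
    "uniqueness": Severity.MEDIUM,
    "freshness": Severity.MEDIUM,
    "anomaly": Severity.LOW,
}


def sort_by_severity(issues: list[dict]) -> list[dict]:
    """Return issues most severe first; ties keep their input order."""
    return sorted(
        issues,
        key=lambda issue: CATEGORY_SEVERITY[issue["category"]],
        reverse=True,
    )
```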

Schema Definition

schema:
  tables:
    transactions:
      columns:
        - name: transaction_id
          type: string
          required: true
          unique: true
          pattern: "^TXN-[A-Z0-9]{10}$"

        - name: amount
          type: float
          required: true
          min: 0.01
          max: 1000000

        - name: status
          type: string
          required: true
          enum: [pending, completed, failed]
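
The schema above can be enforced row by row. The following is a minimal sketch, not the skill's own implementation: it mirrors the YAML as a Python list so that only the standard library is needed, and `TRANSACTIONS_SCHEMA`, `validate_rows`, and the error messages are illustrative names.

```python
import re

# Python mirror of the YAML schema (an assumption made to avoid a YAML parser).
TRANSACTIONS_SCHEMA = [
    {"name": "transaction_id", "type": str, "required": True, "unique": True,
     "pattern": r"^TXN-[A-Z0-9]{10}$"},
    {"name": "amount", "type": float, "required": True,
     "min": 0.01, "max": 1_000_000},
    {"name": "status", "type": str, "required": True,
     "enum": ["pending", "completed", "failed"]},
]


def validate_rows(rows, schema=TRANSACTIONS_SCHEMA):
    """Return (row_index, column, message) tuples, one per violation."""
    errors = []
    seen = {col["name"]: set() for col in schema if col.get("unique")}
    for index, row in enumerate(rows):
        for col in schema:
            name = col["name"]
            value = row.get(name)
            if value is None or value == "":
                if col.get("required"):
                    errors.append((index, name, "required value is missing"))
                continue
            if col["type"] is float:
                try:
                    value = float(value)  # CSV readers deliver strings
                except (TypeError, ValueError):
                    errors.append((index, name, "expected float"))
                    continue
            elif not isinstance(value, str):
                errors.append((index, name, "expected string"))
                continue
            if "pattern" in col and not re.fullmatch(col["pattern"], value):
                errors.append((index, name, "pattern mismatch"))
            if "min" in col and value < col["min"]:
                errors.append((index, name, "below minimum"))
            if "max" in col and value > col["max"]:
                errors.append((index, name, "above maximum"))
            if "enum" in col and value not in col["enum"]:
                errors.append((index, name, "value not in enum"))
            if col.get("unique"):
                if value in seen[name]:
                    errors.append((index, name, "duplicate value"))
                seen[name].add(value)
    return errors
```

Rows are plain dicts, so the output of `csv.DictReader` can be passed in directly.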

Templates

Data Quality Report

# Data Quality Report

**Source:** {source_type}
**Table:** {table_name}
**Generated:** {timestamp}

## Quality Scorecard

**Overall Score:** {score}/100 ({grade})

| Dimension | Score | Status |
| --------- | ----- | ------ |
| Completeness | {completeness} | {status_icon} |
| Accuracy | {accuracy} | {status_icon} |
| Consistency | {consistency} | {status_icon} |
| Uniqueness | {uniqueness} | {status_icon} |
| Freshness | {freshness} | {status_icon} |

## Data Summary

| Metric | Value |
| ------ | ----- |
| Total Rows | {total_rows} |
| Valid Rows | {valid_rows} ({valid_percent}%) |
| Invalid Rows | {invalid_rows} ({invalid_percent}%) |

## Issue Details

### {category} Issues

**{issue_id}:** {message}

- Column: `{column}`
- Affected rows: {row_count}
- Sample values: `{samples}`
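
The scorecard table in this template repeats `{status_icon}` on every row, so a renderer has to fill it per dimension rather than with a single substitution. A minimal sketch of that step follows; the Pass/Warn/Fail thresholds (80 and 60) and the names `status_for` and `render_scorecard` are assumptions for illustration, since the skill does not define them.

```python
def status_for(score: float, pass_at: float = 80.0, warn_at: float = 60.0) -> str:
    """Map a 0-100 score to a status label (thresholds are assumed, not specified)."""
    if score >= pass_at:
        return "Pass"
    if score >= warn_at:
        return "Warn"
    return "Fail"


def render_scorecard(scores: dict[str, float]) -> str:
    """Render the Quality Scorecard table, one row per dimension."""
    lines = [
        "| Dimension | Score | Status |",
        "| --------- | ----- | ------ |",
    ]
    for name, score in scores.items():
        lines.append(f"| {name.capitalize()} | {score:.1f} | {status_for(score)} |")
    return "\n".join(lines)
```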

Example

Input: Validate transactions CSV against schema

Output:

## Quality Scorecard

**Overall Score:** 87.3/100 (B)

| Dimension | Score | Status |
| --------- | ----- | ------ |
| Completeness | 95.0 | Pass |
| Accuracy | 88.5 | Pass |
| Consistency | 82.0 | Pass |
| Uniqueness | 100.0 | Pass |
| Freshness | 75.0 | Warn |

## Issue Details

### Accuracy Issues

**TYPE-amount:** Expected float, got string

- Column: `amount`
- Affected rows: 45
- Sample values: `"N/A", "pending", "TBD"`
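
As a sanity check on the example, the weights from the Data Quality Dimensions table can be applied to the five scores shown. Treating the overall score as a weighted sum is an assumption; the skill does not give the formula.

```latex
\begin{aligned}
0.25(95.0) + 0.25(88.5) + 0.20(82.0) + 0.15(100.0) + 0.10(75.0) &= 84.775 \\
87.3 - 84.775 &= 2.525 \\
2.525 / 0.05 &= 50.5
\end{aligned}
```

Under that assumption, the five listed dimensions contribute 84.775 points, and the remaining 2.525 points would correspond to an Anomaly score of 50.5 at its 5% weight. The example scorecard does not list an Anomaly row, so this figure is an inference, not a reported value.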

Validation Checklist

  • Schema definition matches expected structure
  • All required columns validated
  • Null thresholds appropriately set
  • Foreign key references checked (if applicable)
  • Anomaly detection parameters tuned
  • Sample data reviewed for false positives
  • Report includes actionable remediation
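
Two checklist items, null thresholds and anomaly detection parameters, come down to tunable numbers. The sketch below shows one possible form of each, using only the standard library. The interquartile-range rule is a common outlier technique chosen here for illustration; the skill does not say which statistical method it uses, and the names `null_rate` and `iqr_outliers` are hypothetical.

```python
import statistics


def null_rate(values: list) -> float:
    """Fraction of values that are None or empty strings."""
    if not values:
        return 0.0
    return sum(v is None or v == "" for v in values) / len(values)


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k is the tuning parameter: raise it to flag fewer rows, lower it to flag more.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

A column would then fail the completeness check when `null_rate(column)` exceeds whatever threshold the dataset owner has agreed on, and rows returned by `iqr_outliers` are the ones to review by hand for false positives.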

Related Skills

  • api-contract-validator - For API response validation
  • unit-test-generator - For data processing function tests
  • test-health-monitor - For tracking validation trends

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/data-validation
