data-aggregation

About

This skill aggregates and merges data from multiple sources such as App Store sales, GitHub commits, and Skillz events. It is designed for combining data into reports, dashboards, or analyses. Key capabilities include time-based aggregation and tools for transforming and merging different data streams.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/majiayu000/claude-skill-registry
Git Clone (Alternative)
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-aggregation

Copy and paste one of these commands into Claude Code to install the skill

Documentation

Data Aggregation

Tools for aggregating, transforming, and merging data from multiple sources.

Quick Start

Aggregate App Store sales:

python scripts/aggregate_sales.py --input sales_reports/ --output aggregated.json

Aggregate GitHub commits:

python scripts/aggregate_commits.py --input commits.json --period week --output summary.json

Merge multiple sources:

python scripts/merge_sources.py --sources app_store.json github.json skillz.json --output combined.json

Aggregation Types

1. Time-Based Aggregation

Group data by time periods (day, week, month).

Example: Daily sales totals

from aggregate_sales import aggregate_by_time

# Input: List of sales records
sales = [
    {"date": "2026-01-14", "revenue": 123.45, "units": 5},
    {"date": "2026-01-14", "revenue": 67.89, "units": 3},
    {"date": "2026-01-15", "revenue": 234.56, "units": 8}
]

# Output: Aggregated by day
result = aggregate_by_time(sales, period='day')
# {
#     "2026-01-14": {"revenue": 191.34, "units": 8},
#     "2026-01-15": {"revenue": 234.56, "units": 8}
# }
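
The shipped implementation lives in scripts/aggregate_sales.py; the grouping logic itself is small. As a rough sketch only (assuming each record carries an ISO date string plus numeric metric fields; the helper name is illustrative, not the script's API):

from collections import defaultdict

def rollup_by_day(records):
    # Sum every numeric field per ISO day string; week/month periods would truncate the key
    buckets = defaultdict(lambda: defaultdict(float))
    for record in records:
        day = record["date"]
        for field, value in record.items():
            if isinstance(value, (int, float)):
                buckets[day][field] += value
    # day -> summed metrics; round float sums before reporting
    return {day: dict(totals) for day, totals in buckets.items()}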

2. Entity-Based Aggregation

Group data by entities (apps, users, repos, etc.).

Example: Per-app metrics

from aggregate_sales import aggregate_by_entity

sales = [
    {"app": "App A", "revenue": 100, "units": 5},
    {"app": "App A", "revenue": 50, "units": 2},
    {"app": "App B", "revenue": 200, "units": 10}
]

result = aggregate_by_entity(sales, entity_field='app')
# {
#     "App A": {"revenue": 150, "units": 7},
#     "App B": {"revenue": 200, "units": 10}
# }
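
For comparison only (pandas is not a dependency of this skill), the same per-entity roll-up can be written as a groupby when pandas happens to be installed:

import pandas as pd

# Sum the numeric columns per app, then convert back to the nested-dict shape shown above
df = pd.DataFrame(sales)
result = df.groupby("app")[["revenue", "units"]].sum().to_dict(orient="index")
# {'App A': {'revenue': 150, 'units': 7}, 'App B': {'revenue': 200, 'units': 10}}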

3. Statistical Aggregation

Calculate statistics (sum, avg, min, max, percentiles).

Example: Commit statistics

from aggregate_commits import calculate_stats

commits = [
    {"author": "John", "lines": 125},
    {"author": "Jane", "lines": 87},
    {"author": "John", "lines": 43}
]

result = calculate_stats(commits, group_by='author', metric='lines')
# {
#     "John": {"sum": 168, "avg": 84, "min": 43, "max": 125, "count": 2},
#     "Jane": {"sum": 87, "avg": 87, "min": 87, "max": 87, "count": 1}
# }
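
The output above covers sum, avg, min, max, and count. Percentiles, also mentioned, are not shown; if you need them, one standard-library option (a sketch; calculate_stats may expose them differently) is statistics.quantiles:

from statistics import quantiles

def pct(values, p):
    # p-th percentile from 99 inclusive cut points; needs at least two samples
    if len(values) < 2:
        return values[0]
    return quantiles(values, n=100, method="inclusive")[p - 1]

lines_per_author = {"John": [125, 43], "Jane": [87]}
result = {a: {"p50": pct(v, 50), "p90": pct(v, 90)} for a, v in lines_per_author.items()}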

Data Sources

App Store Sales

Input format (TSV from App Store Connect):

Provider	Provider Country	SKU	Developer	Title	Version	Product Type Identifier	Units	Developer Proceeds	Begin Date	End Date	Customer Currency	Country Code	Currency of Proceeds	Apple Identifier	Customer Price	Promo Code	Parent Identifier	Subscription	Period	Category	CMB	Device	Supported Platforms	Proceeds Reason	Preserved Pricing	Client

Aggregated output:

{
  "period": "2026-01-14",
  "apps": {
    "com.example.app": {
      "name": "My App",
      "downloads": 1234,
      "revenue": 567.89,
      "updates": 45,
      "countries": ["US", "CA", "UK"]
    }
  },
  "totals": {
    "total_downloads": 5678,
    "total_revenue": 2345.67,
    "total_apps": 5
  }
}
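
The header above is the raw report header from App Store Connect, so a csv.DictReader with a tab delimiter is enough to turn each report row into a record before aggregation. A minimal sketch (the field selection is illustrative; aggregate_sales.py may keep more columns):

import csv

def read_sales_report(path):
    # Yield one dict per row of a tab-separated App Store Connect sales report
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            yield {
                "sku": row["SKU"],
                "title": row["Title"],
                "units": int(row["Units"]),
                "proceeds": float(row["Developer Proceeds"]),
                "country": row["Country Code"],
                "date": row["Begin Date"],
            }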

GitHub Commits

Input format (from GitHub API):

[
  {
    "sha": "abc123",
    "author": {"name": "John Doe", "email": "[email protected]"},
    "commit": {
      "message": "Add feature X",
      "author": {"date": "2026-01-14T10:30:00Z"}
    },
    "stats": {"additions": 125, "deletions": 45}
  }
]

Aggregated output:

{
  "period": "week",
  "date_range": "2026-01-07 to 2026-01-14",
  "summary": {
    "total_commits": 45,
    "total_contributors": 5,
    "total_lines": 2345,
    "total_files": 123
  },
  "by_author": {
    "John Doe": {
      "commits": 15,
      "lines_added": 1234,
      "lines_deleted": 456,
      "files_changed": 45
    }
  },
  "by_day": {
    "2026-01-14": {"commits": 8, "lines": 567}
  }
}
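
Given commit objects shaped like the input above, the by_author block is a plain accumulation over the author name and stats fields. A sketch (assuming every commit already includes a stats object, which the GitHub list-commits endpoint does not always return):

from collections import defaultdict

def rollup_by_author(commits):
    # Accumulate commit counts and line totals per author name
    totals = defaultdict(lambda: {"commits": 0, "lines_added": 0, "lines_deleted": 0})
    for c in commits:
        t = totals[c["author"]["name"]]
        t["commits"] += 1
        t["lines_added"] += c["stats"]["additions"]
        t["lines_deleted"] += c["stats"]["deletions"]
    return dict(totals)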

Skillz Events

Input format (from Skillz Developer Portal):

{
  "event_id": "888831",
  "name": "Winter Tournament",
  "status": "active",
  "start_date": "2026-01-10",
  "end_date": "2026-01-20",
  "prize_pool": 1000,
  "entries": 234
}

Aggregated output:

{
  "period": "active",
  "summary": {
    "total_events": 8,
    "total_prize_pool": 8000,
    "total_entries": 1234
  },
  "by_status": {
    "active": {"count": 5, "prize_pool": 5000},
    "completed": {"count": 3, "prize_pool": 3000}
  }
}

Aggregation Scripts

aggregate_sales.py

Aggregate App Store sales data.

Usage:

python scripts/aggregate_sales.py \
    --input sales_reports/ \
    --period week \
    --group-by app \
    --output aggregated.json

Arguments:

  • --input: Input directory or file (TSV/JSON)
  • --period: Time period (day, week, month)
  • --group-by: Grouping field (app, country, category)
  • --output: Output JSON file

aggregate_commits.py

Aggregate GitHub commit data.

Usage:

python scripts/aggregate_commits.py \
    --input commits.json \
    --period week \
    --metrics lines,files,commits \
    --output summary.json

Arguments:

  • --input: Input JSON file (commits array)
  • --period: Time period (day, week, month)
  • --metrics: Metrics to calculate (comma-separated)
  • --output: Output JSON file

aggregate_events.py

Aggregate Skillz event data.

Usage:

python scripts/aggregate_events.py \
    --input events/ \
    --status active,completed \
    --output summary.json

Arguments:

  • --input: Input directory with event JSON files
  • --status: Filter by status (comma-separated)
  • --output: Output JSON file

merge_sources.py

Merge data from multiple sources.

Usage:

python scripts/merge_sources.py \
    --sources app_store.json github.json skillz.json \
    --strategy combine \
    --output combined.json

Arguments:

  • --sources: Space-separated list of JSON files
  • --strategy: Merge strategy (combine, average, latest)
  • --output: Output JSON file

Merge strategies:

  • combine: Combine all data (keep all fields)
  • average: Average numeric fields
  • latest: Keep latest values (by timestamp); a sketch follows below
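
As a rough illustration of the latest strategy (the input schema for merge_sources.py is not shown here, so this assumes each source dict carries a timestamp plus a flat metrics mapping):

def merge_latest(sources):
    # For every metric key, keep the value from the most recent source
    newest = {}
    for source in sources:
        ts = source["timestamp"]
        for key, value in source["metrics"].items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    return {key: value for key, (_, value) in newest.items()}

merge_latest([
    {"timestamp": "2026-01-13", "metrics": {"revenue": 100}},
    {"timestamp": "2026-01-14", "metrics": {"revenue": 120, "commits": 45}},
])
# {'revenue': 120, 'commits': 45}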

Data Transformations

Filtering

from aggregate_sales import filter_data

sales = [...]

# Filter by country
us_sales = filter_data(sales, country='US')

# Filter by date range
recent_sales = filter_data(sales, start_date='2026-01-01', end_date='2026-01-14')

# Filter by value
high_revenue = filter_data(sales, min_revenue=100)

Grouping

from aggregate_commits import group_data

commits = [...]

# Group by author
by_author = group_data(commits, group_by='author')

# Group by repository
by_repo = group_data(commits, group_by='repository')

# Group by date
by_date = group_data(commits, group_by='date', period='day')

Sorting

from merge_sources import sort_data

data = [...]

# Sort by revenue (descending)
sorted_data = sort_data(data, field='revenue', reverse=True)

# Sort by date (ascending)
sorted_data = sort_data(data, field='date')

Integration with Agents

Reporting Agent

# Aggregate App Store sales
from aggregate_sales import aggregate_sales

sales_data = appstore_client.get_sales_report(days=7)
aggregated = aggregate_sales(sales_data, period='day', group_by='app')

# Use for report
html = render_template('appstore-metrics', aggregated)

Automation Agent

# Aggregate GitHub commits
from aggregate_commits import aggregate_commits

commits = github_client.get_commits(repo='owner/repo', days=7)
summary = aggregate_commits(commits, period='week')

# Create ClickUp task if high activity
if summary['total_commits'] > 50:
    clickup_client.create_task(
        title='High GitHub Activity',
        description=f"Total commits: {summary['total_commits']}"
    )

Examples

See the examples/ directory for:

  • sample_sales_aggregation.json - App Store sales example
  • sample_commit_aggregation.json - GitHub commits example
  • sample_multi_source_merge.json - Multi-source merge example

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/data-aggregation

Related Skills

content-collections

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

llamaindex

LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.

hybrid-cloud-networking

This skill configures secure hybrid cloud networking between on-premises infrastructure and cloud platforms like AWS, Azure, and GCP. Use it when connecting data centers to the cloud, building hybrid architectures, or implementing secure cross-premises connectivity. It supports key capabilities such as VPNs and dedicated connections like AWS Direct Connect for high-performance, reliable setups.

polymarket

This skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.
