data-aggregation
About
This skill aggregates and merges data from multiple sources, such as App Store sales, GitHub commits, and Skillz events. It is designed for combining data into reports, dashboards, and analyses. Key capabilities include time-based aggregation and tools for transforming and merging data streams.
Quick Install
Claude Code
Recommended: /plugin add https://github.com/majiayu000/claude-skill-registry
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-aggregation
Copy and paste this command in Claude Code to install this skill.
Documentation
Data Aggregation
Tools for aggregating, transforming, and merging data from multiple sources.
Quick Start
Aggregate App Store sales:
python scripts/aggregate_sales.py --input sales_reports/ --output aggregated.json
Aggregate GitHub commits:
python scripts/aggregate_commits.py --input commits.json --period week --output summary.json
Merge multiple sources:
python scripts/merge_sources.py --sources app_store.json github.json skillz.json --output combined.json
Aggregation Types
1. Time-Based Aggregation
Group data by time periods (day, week, month).
Example: Daily sales totals
from aggregate_sales import aggregate_by_time
# Input: List of sales records
sales = [
    {"date": "2026-01-14", "revenue": 123.45, "units": 5},
    {"date": "2026-01-14", "revenue": 67.89, "units": 3},
    {"date": "2026-01-15", "revenue": 234.56, "units": 8}
]
# Output: Aggregated by day
result = aggregate_by_time(sales, period='day')
# {
# "2026-01-14": {"revenue": 191.34, "units": 8},
# "2026-01-15": {"revenue": 234.56, "units": 8}
# }
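The grouping step above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `period_key` and `aggregate_by_time_sketch` are hypothetical names, and labelling a week by its Monday and a month as `YYYY-MM` are assumptions of this sketch.

```python
from collections import defaultdict
from datetime import date, timedelta


def period_key(day: str, period: str) -> str:
    """Map an ISO date string to the label of the period that contains it."""
    d = date.fromisoformat(day)
    if period == "day":
        return d.isoformat()
    if period == "week":
        # Assumption: a week is labelled by the date of its Monday.
        return (d - timedelta(days=d.weekday())).isoformat()
    if period == "month":
        return d.strftime("%Y-%m")
    raise ValueError(f"unsupported period: {period}")


def aggregate_by_time_sketch(records, period="day", metrics=("revenue", "units")):
    """Sum the given metrics per period (hypothetical helper)."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for record in records:
        bucket = totals[period_key(record["date"], period)]
        for metric in metrics:
            bucket[metric] += record.get(metric, 0)
    # Round to two decimals so float addition noise does not leak into the output.
    return {
        key: {metric: round(value, 2) for metric, value in bucket.items()}
        for key, bucket in sorted(totals.items())
    }
```

Keeping the period mapping in its own function means the summing loop is identical for day, week, and month; only the bucket label changes.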
2. Entity-Based Aggregation
Group data by entities (apps, users, repos, etc.).
Example: Per-app metrics
from aggregate_sales import aggregate_by_entity
sales = [
    {"app": "App A", "revenue": 100, "units": 5},
    {"app": "App A", "revenue": 50, "units": 2},
    {"app": "App B", "revenue": 200, "units": 10}
]
result = aggregate_by_entity(sales, entity_field='app')
# {
# "App A": {"revenue": 150, "units": 7},
# "App B": {"revenue": 200, "units": 10}
# }
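A rough sketch of how entity grouping might work internally is shown below. The function name `aggregate_by_entity_sketch`, the `metrics` parameter, and the `"unknown"` fallback bucket are assumptions made for illustration; the skill's real function may behave differently.

```python
from collections import defaultdict


def aggregate_by_entity_sketch(records, entity_field, metrics=("revenue", "units")):
    """Sum numeric metrics per distinct value of entity_field (hypothetical helper)."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for record in records:
        # Assumption: records without the entity field are collected under "unknown".
        entity = record.get(entity_field, "unknown")
        for metric in metrics:
            totals[entity][metric] += record.get(metric, 0)
    return dict(totals)
```

The same function covers apps, users, or repos, because the grouping key is just a field name passed in by the caller.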
3. Statistical Aggregation
Calculate statistics (sum, avg, min, max, percentiles).
Example: Commit statistics
from aggregate_commits import calculate_stats
commits = [
    {"author": "John", "lines": 125},
    {"author": "Jane", "lines": 87},
    {"author": "John", "lines": 43}
]
result = calculate_stats(commits, group_by='author', metric='lines')
# {
# "John": {"sum": 168, "avg": 84, "min": 43, "max": 125, "count": 2},
# "Jane": {"sum": 87, "avg": 87, "min": 87, "max": 87, "count": 1}
# }
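The statistics above can be reproduced with the standard library alone. The sketch below is illustrative only: `calculate_stats_sketch` is a hypothetical name, and it adds a `median` field (the 50th percentile) as one example of a percentile, which the documented output does not include.

```python
import statistics
from collections import defaultdict


def calculate_stats_sketch(records, group_by, metric):
    """Compute per-group summary statistics for one numeric metric (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record[metric])

    result = {}
    for name, values in groups.items():
        result[name] = {
            "sum": sum(values),
            "avg": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
            # Assumption: the median stands in for percentile support in this sketch.
            "median": statistics.median(values),
        }
    return result
```

Collecting the raw values per group first, rather than keeping running totals, is what makes order statistics such as the median possible.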
Data Sources
App Store Sales
Input format (TSV from App Store Connect):
Provider, Provider Country, SKU, Developer, Title, Version, Product Type Identifier, Units, Developer Proceeds, Begin Date, End Date, Customer Currency, Country Code, Currency of Proceeds, Apple Identifier, Customer Price, Promo Code, Parent Identifier, Subscription, Period, Category, CMB, Device, Supported Platforms, Proceeds Reason, Preserved Pricing, Client (columns are tab-separated in the report)
Aggregated output:
{
  "period": "2026-01-14",
  "apps": {
    "com.example.app": {
      "name": "My App",
      "downloads": 1234,
      "revenue": 567.89,
      "updates": 45,
      "countries": ["US", "CA", "UK"]
    }
  },
  "totals": {
    "total_downloads": 5678,
    "total_revenue": 2345.67,
    "total_apps": 5
  }
}
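One way to get from the tab-separated report to an output of this shape is sketched below. Several things here are assumptions rather than documented behavior: the function name `aggregate_sales_rows`, the use of SKU as the app key, counting every unit as a download (updates are not separated out), and treating "Developer Proceeds" as a per-unit amount. The sample report uses only a subset of the columns.

```python
import csv
import io
from collections import defaultdict


def aggregate_sales_rows(tsv_text):
    """Build per-app totals from a tab-separated sales report (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    apps = defaultdict(lambda: {"name": "", "downloads": 0, "revenue": 0.0, "countries": set()})
    for row in reader:
        app = apps[row["SKU"]]
        app["name"] = row["Title"]
        units = int(row["Units"])
        app["downloads"] += units
        # Assumption: "Developer Proceeds" is a per-unit amount.
        app["revenue"] += units * float(row["Developer Proceeds"])
        app["countries"].add(row["Country Code"])

    for app in apps.values():
        app["revenue"] = round(app["revenue"], 2)
        app["countries"] = sorted(app["countries"])  # sets are not JSON-serializable

    return {
        "apps": dict(apps),
        "totals": {
            "total_downloads": sum(a["downloads"] for a in apps.values()),
            "total_revenue": round(sum(a["revenue"] for a in apps.values()), 2),
            "total_apps": len(apps),
        },
    }


# Made-up sample rows for illustration only.
SAMPLE_TSV = (
    "SKU\tTitle\tUnits\tDeveloper Proceeds\tCountry Code\n"
    "com.example.app\tMy App\t3\t0.70\tUS\n"
    "com.example.app\tMy App\t2\t0.70\tCA\n"
)
report = aggregate_sales_rows(SAMPLE_TSV)
```

Note that this sketch adds proceeds across rows without checking "Currency of Proceeds", so a real report with mixed currencies would need conversion before summing.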
GitHub Commits
Input format (from GitHub API):
[
  {
    "sha": "abc123",
    "author": {"name": "John Doe", "email": "[email protected]"},
    "commit": {
      "message": "Add feature X",
      "author": {"date": "2026-01-14T10:30:00Z"}
    },
    "stats": {"additions": 125, "deletions": 45}
  }
]
Aggregated output:
{
  "period": "week",
  "date_range": "2026-01-07 to 2026-01-14",
  "summary": {
    "total_commits": 45,
    "total_contributors": 5,
    "total_lines": 2345,
    "total_files": 123
  },
  "by_author": {
    "John Doe": {
      "commits": 15,
      "lines_added": 1234,
      "lines_deleted": 456,
      "files_changed": 45
    }
  },
  "by_day": {
    "2026-01-14": {"commits": 8, "lines": 567}
  }
}
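The mapping from the input records to the `summary`, `by_author`, and `by_day` sections could look roughly like this. It is a sketch under assumptions: `summarize_commits` is a hypothetical name, "lines" is taken to mean additions plus deletions, and file counts are omitted because the input format shown above carries no file list.

```python
from collections import defaultdict


def summarize_commits(commits):
    """Roll commit records up by author and by day (hypothetical helper)."""
    by_author = defaultdict(lambda: {"commits": 0, "lines_added": 0, "lines_deleted": 0})
    by_day = defaultdict(lambda: {"commits": 0, "lines": 0})

    for item in commits:
        stats = item.get("stats", {})
        added = stats.get("additions", 0)
        deleted = stats.get("deletions", 0)

        author = by_author[item["author"]["name"]]
        author["commits"] += 1
        author["lines_added"] += added
        author["lines_deleted"] += deleted

        # The timestamp is ISO 8601; its first 10 characters are the date.
        day = by_day[item["commit"]["author"]["date"][:10]]
        day["commits"] += 1
        # Assumption: "lines" counts both added and deleted lines.
        day["lines"] += added + deleted

    return {
        "summary": {
            "total_commits": len(commits),
            "total_contributors": len(by_author),
            "total_lines": sum(d["lines"] for d in by_day.values()),
        },
        "by_author": dict(by_author),
        "by_day": dict(by_day),
    }
```

A single pass fills both breakdowns, and the summary is derived from them afterwards so the three sections cannot disagree.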
Skillz Events
Input format (from Skillz Developer Portal):
{
  "event_id": "888831",
  "name": "Winter Tournament",
  "status": "active",
  "start_date": "2026-01-10",
  "end_date": "2026-01-20",
  "prize_pool": 1000,
  "entries": 234
}
Aggregated output:
{
  "period": "active",
  "summary": {
    "total_events": 8,
    "total_prize_pool": 8000,
    "total_entries": 1234
  },
  "by_status": {
    "active": {"count": 5, "prize_pool": 5000},
    "completed": {"count": 3, "prize_pool": 3000}
  }
}
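As an illustration of how event records might be rolled up into the `summary` and `by_status` sections, consider the sketch below. `summarize_events` and its optional `statuses` filter are hypothetical; they only mirror the idea of filtering by status before counting.

```python
from collections import defaultdict


def summarize_events(events, statuses=None):
    """Count events and sum prize pools per status (hypothetical helper)."""
    # Keep every event when no status filter is given.
    selected = [e for e in events if statuses is None or e["status"] in statuses]

    by_status = defaultdict(lambda: {"count": 0, "prize_pool": 0})
    for event in selected:
        bucket = by_status[event["status"]]
        bucket["count"] += 1
        bucket["prize_pool"] += event["prize_pool"]

    return {
        "summary": {
            "total_events": len(selected),
            "total_prize_pool": sum(e["prize_pool"] for e in selected),
            "total_entries": sum(e["entries"] for e in selected),
        },
        "by_status": dict(by_status),
    }
```

Filtering before aggregating keeps the totals consistent with the per-status breakdown, which is the relationship visible in the sample output (5 + 3 events, 5000 + 3000 prize pool).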
Aggregation Scripts
aggregate_sales.py
Aggregate App Store sales data.
Usage:
python scripts/aggregate_sales.py \
--input sales_reports/ \
--period week \
--group-by app \
--output aggregated.json
Arguments:
--input: Input directory or file (TSV/JSON)
--period: Time period (day, week, month)
--group-by: Grouping field (app, country, category)
--output: Output JSON file
aggregate_commits.py
Aggregate GitHub commit data.
Usage:
python scripts/aggregate_commits.py \
--input commits.json \
--period week \
--metrics lines,files,commits \
--output summary.json
Arguments:
--input: Input JSON file (commits array)
--period: Time period (day, week, month)
--metrics: Metrics to calculate (comma-separated)
--output: Output JSON file
aggregate_events.py
Aggregate Skillz event data.
Usage:
python scripts/aggregate_events.py \
--input events/ \
--status active,completed \
--output summary.json
Arguments:
--input: Input directory with event JSON files
--status: Filter by status (comma-separated)
--output: Output JSON file
merge_sources.py
Merge data from multiple sources.
Usage:
python scripts/merge_sources.py \
--sources app_store.json github.json skillz.json \
--strategy combine \
--output combined.json
Arguments:
--sources: Space-separated list of JSON files
--strategy: Merge strategy (combine, average, latest)
--output: Output JSON file
Merge strategies:
combine: Combine all data (keep all fields)
average: Average numeric fields
latest: Keep latest values (by timestamp)
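To make the three strategies concrete, here is one possible reading of each, applied to a list of flat records. This is a sketch, not the behavior of merge_sources.py: the function name `merge_records`, the `timestamp` field name, and the rule that later sources win on key collisions under `combine` are all assumptions.

```python
def merge_records(records, strategy="combine"):
    """Merge flat dicts from several sources using one of three strategies (hypothetical helper)."""
    if strategy == "combine":
        merged = {}
        for record in records:
            # Assumption: on a key collision, the later source overrides the earlier one.
            merged.update(record)
        return merged

    if strategy == "average":
        values = {}
        for record in records:
            for key, value in record.items():
                # bool is a subclass of int, so exclude it explicitly.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    values.setdefault(key, []).append(value)
        return {key: sum(vals) / len(vals) for key, vals in values.items()}

    if strategy == "latest":
        # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
        return dict(max(records, key=lambda r: r["timestamp"]))

    raise ValueError(f"unknown strategy: {strategy}")
```

In this reading, `average` drops non-numeric fields entirely and `latest` requires every record to carry a timestamp; a real implementation would need to decide both points explicitly.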
Data Transformations
Filtering
from aggregate_sales import filter_data
sales = [...]
# Filter by country
us_sales = filter_data(sales, country='US')
# Filter by date range
recent_sales = filter_data(sales, start_date='2026-01-01', end_date='2026-01-14')
# Filter by value
high_revenue = filter_data(sales, min_revenue=100)
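A filter with this calling style could be built along the following lines. The sketch is hypothetical: `filter_data_sketch` is not the skill's function, and treating any extra keyword argument (such as `country='US'`) as an exact-match condition is an assumption.

```python
def filter_data_sketch(records, start_date=None, end_date=None, min_revenue=None, **equals):
    """Return the records that satisfy every given condition (hypothetical helper)."""

    def keep(record):
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if start_date is not None and record["date"] < start_date:
            return False
        if end_date is not None and record["date"] > end_date:
            return False
        if min_revenue is not None and record.get("revenue", 0) < min_revenue:
            return False
        # Assumption: remaining keyword arguments are exact-match field filters.
        return all(record.get(field) == value for field, value in equals.items())

    return [record for record in records if keep(record)]
```

Because all conditions are combined with a logical AND, calls such as `filter_data_sketch(sales, country='US', min_revenue=100)` narrow the result rather than widen it.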
Grouping
from aggregate_commits import group_data
commits = [...]
# Group by author
by_author = group_data(commits, group_by='author')
# Group by repository
by_repo = group_data(commits, group_by='repository')
# Group by date
by_date = group_data(commits, group_by='date', period='day')
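Grouping differs from aggregation in that it keeps the original records and only buckets them. A minimal sketch, assuming flat records with ISO 8601 date strings, is shown below; `group_data_sketch` is a hypothetical name, and only the `day` and `month` periods are handled here (week grouping would need real date arithmetic).

```python
from collections import defaultdict

# Number of leading characters of an ISO 8601 timestamp that identify each period.
PERIOD_PREFIX = {"day": 10, "month": 7}


def group_data_sketch(records, group_by, period=None):
    """Bucket records by a field, optionally truncating dates to a period (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        key = record[group_by]
        if period is not None:
            # "2026-01-14T10:30:00Z" becomes "2026-01-14" (day) or "2026-01" (month).
            key = key[: PERIOD_PREFIX[period]]
        groups[key].append(record)
    return dict(groups)
```

The returned lists can then be passed to a statistics or summing step, so grouping and aggregation stay separate concerns.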
Sorting
from merge_sources import sort_data
data = [...]
# Sort by revenue (descending)
sorted_data = sort_data(data, field='revenue', reverse=True)
# Sort by date (ascending)
sorted_data = sort_data(data, field='date')
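Merged data often contains records that lack the sort field, which makes a plain `sorted` call fail on mixed types. The sketch below shows one way to handle that; `sort_data_sketch` is a hypothetical name, and placing records without the field at the end is an assumption, not documented behavior.

```python
def sort_data_sketch(records, field, reverse=False):
    """Sort records by one field, keeping records without it at the end (hypothetical helper)."""
    present = [r for r in records if r.get(field) is not None]
    missing = [r for r in records if r.get(field) is None]
    # sorted() is stable, so records with equal values keep their input order.
    return sorted(present, key=lambda r: r[field], reverse=reverse) + missing
```

Records without the field stay last in both ascending and descending order, so a top-N slice of the result never includes incomplete rows before complete ones.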
Integration with Agents
Reporting Agent
# Aggregate App Store sales
from aggregate_sales import aggregate_sales
sales_data = appstore_client.get_sales_report(days=7)
aggregated = aggregate_sales(sales_data, period='day', group_by='app')
# Use for report
html = render_template('appstore-metrics', aggregated)
Automation Agent
# Aggregate GitHub commits
from aggregate_commits import aggregate_commits
commits = github_client.get_commits(repo='owner/repo', days=7)
summary = aggregate_commits(commits, period='week')
# Create ClickUp task if high activity
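# total_commits is nested under "summary" in the aggregated output format
total_commits = summary['summary']['total_commits']
if total_commits > 50:
    clickup_client.create_task(
        title='High GitHub Activity',
        description=f"Total commits: {total_commits}"
    )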
Examples
See examples/ directory for:
sample_sales_aggregation.json - App Store sales example
sample_commit_aggregation.json - GitHub commits example
sample_multi_source_merge.json - Multi-source merge example
