csv-excel-merger
について
csv-excel-mergerスキルは、複数のCSVまたはExcelファイルをインテリジェントにマージし、列を自動的に照合して異なるスキーマを処理します。データソースを結合する際の重複排除や競合解決といった主要な機能を提供します。開発者がスプレッドシートを統合したり、データエクスポートを結合したり、複数ファイルを単一のデータセットにまとめる必要がある場合に、このスキルをご利用ください。
クイックインストール
Claude Code
推奨/plugin add https://github.com/OneWave-AI/claude-skillsgit clone https://github.com/OneWave-AI/claude-skills.git ~/.claude/skills/csv-excel-mergerこのコマンドをClaude Codeにコピー&ペーストしてスキルをインストールします
ドキュメント
CSV/Excel Merger
Intelligently merge multiple CSV or Excel files with automatic column matching and data deduplication.
Instructions
When a user needs to merge CSV or Excel files:
-
Analyze Input Files:
- How many files need to be merged?
- What format (CSV, Excel, TSV)?
- Are the files provided or need to be read from disk?
- Do columns have the same names across files?
- What is the primary key (unique identifier)?
-
Inspect File Structures:
- Read headers from each file
- Identify column names and data types
- Detect encoding (UTF-8, Latin-1, etc.)
- Check for missing columns
- Look for duplicate column names
-
Create Merge Strategy:
Column Matching:
- Exact name match: "email" = "email"
- Case-insensitive: "Email" = "email"
- Fuzzy match: "E-mail" ≈ "email"
- Common patterns:
- "first_name", "firstname", "First Name" → "first_name"
- "phone", "phone_number", "tel" → "phone"
- "email", "e-mail", "email_address" → "email"
Conflict Resolution (when same record appears in multiple files):
- Keep first: Use value from first file
- Keep last: Use value from last file (most recent)
- Keep longest: Use most complete value
- Manual review: Flag conflicts for user review
- Merge: Combine non-conflicting fields
Deduplication:
- Identify duplicate rows based on primary key
- Options: keep first, keep last, keep all, merge values
- Track source file for each row
-
Perform Merge:
# Example merge logic import pandas as pd # Read files df1 = pd.read_csv('file1.csv') df2 = pd.read_csv('file2.csv') # Normalize column names df1.columns = df1.columns.str.lower().str.strip() df2.columns = df2.columns.str.lower().str.strip() # Map similar columns column_mapping = { 'firstname': 'first_name', 'e_mail': 'email', # ... } df2 = df2.rename(columns=column_mapping) # Merge merged = pd.concat([df1, df2], ignore_index=True) # Deduplicate merged = merged.drop_duplicates(subset=['email'], keep='last') # Save merged.to_csv('merged_output.csv', index=False) -
Format Output:
📊 CSV/EXCEL MERGER REPORT ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 📁 INPUT FILES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ File 1: contacts_jan.csv Rows: 1,245 Columns: 8 (name, email, phone, company, ...) File 2: contacts_feb.csv Rows: 987 Columns: 9 (firstname, lastname, email, mobile, ...) File 3: leads_export.xlsx Rows: 2,103 Columns: 12 (full_name, email_address, phone, ...) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 🔄 COLUMN MAPPING ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Unified Schema: • first_name ← [firstname, first name, fname] • last_name ← [lastname, last name, lname] • email ← [email, e-mail, email_address] • phone ← [phone, mobile, phone_number, tel] • company ← [company, organization, org] • title ← [title, job_title, position] • source ← [file origin tracking] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 🔍 MERGE ANALYSIS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Total rows before merge: 4,335 Duplicate records found: 892 Conflicts detected: 47 Deduplication Strategy: Keep most recent (by source file date) Primary Key: email ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ⚠️ CONFLICTS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Record: [email protected] File 1 phone: (555) 123-4567 File 2 phone: (555) 987-6543 Resolution: Kept most recent (File 2) [List top 10 conflicts] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✅ MERGE RESULTS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Output File: merged_contacts.csv Total Rows: 3,443 Columns: 7 Duplicates Removed: 892 Breakdown by Source: • contacts_jan.csv: 1,245 rows (398 unique) • contacts_feb.csv: 987 rows (521 unique) • leads_export.xlsx: 2,103 rows (2,524 unique) Data Quality: • Email completeness: 98.2% • Phone completeness: 87.5% • Company completeness: 91.3% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 💡 RECOMMENDATIONS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • Review 47 conflict records manually • Standardize phone number format • Fill missing company names (8.7% incomplete) • Export conflicts to: conflicts_review.csv -
Handle Special Cases:
Multiple Primary Keys:
- Use compound keys: (email + company)
- Offer options when ambiguous
Different Data Types:
- Convert dates to standard format
- Normalize phone numbers
- Standardize country codes
- Clean whitespace and casing
Missing Columns:
- Fill with empty values
- Flag missing data
- Offer to create new columns
Large Files:
- Use chunking for files > 100MB
- Show progress indicator
- Estimate memory usage
-
Generate Code: Provide Python/pandas script that:
- Reads all files
- Performs intelligent column matching
- Deduplicates based on strategy
- Resolves conflicts
- Saves merged output
- Generates detailed report
-
Export Options:
- CSV (UTF-8)
- Excel (.xlsx)
- JSON
- SQL INSERT statements
- Parquet (for large datasets)
Example Triggers
- "Merge these three CSV files"
- "Combine multiple Excel sheets into one file"
- "Deduplicate and merge customer data"
- "Join spreadsheets with different column names"
- "Consolidate contact lists from different sources"
Best Practices
Column Matching:
- Use fuzzy matching for similar names
- Maintain original column name mapping report
- Allow manual override of auto-matching
Data Quality:
- Trim whitespace
- Standardize formats (phone, email, dates)
- Detect and flag invalid data
- Preserve data types
Performance:
- Use chunking for large files
- Process in batches
- Show progress for long operations
- Optimize memory usage
Transparency:
- Log all merge decisions
- Track source file for each row
- Report conflicts and resolutions
- Generate detailed merge report
Output Quality
Ensure merges:
- Intelligently match columns
- Handle different schemas
- Deduplicate properly
- Preserve data integrity
- Flag conflicts for review
- Generate comprehensive report
- Maintain data quality
- Track data lineage (source)
- Handle edge cases gracefully
- Provide validation statistics
Generate clean, deduplicated merged files with full transparency and data quality checks.
GitHub リポジトリ
関連スキル
content-collections
メタThis skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.
polymarket
メタThis skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.
cloudflare-turnstile
メタThis skill provides comprehensive guidance for implementing Cloudflare Turnstile as a CAPTCHA-alternative bot protection system. It covers integration for forms, login pages, API endpoints, and frameworks like React/Next.js/Hono, while handling invisible challenges that maintain user experience. Use it when migrating from reCAPTCHA, debugging error codes, or implementing token validation and E2E tests.
hybrid-cloud-networking
メタThis skill configures secure hybrid cloud networking between on-premises infrastructure and cloud platforms like AWS, Azure, and GCP. Use it when connecting data centers to the cloud, building hybrid architectures, or implementing secure cross-premises connectivity. It supports key capabilities such as VPNs and dedicated connections like AWS Direct Connect for high-performance, reliable setups.
