Back to Skills

csv-excel-merger

OneWave-AI
Updated Today
19 views
11
4
11
View on GitHub
Documentsexceldata

About

The csv-excel-merger skill intelligently merges multiple CSV or Excel files by automatically matching columns and handling different schemas. It provides key capabilities like data deduplication and conflict resolution when combining data sources. Use this skill when developers need to consolidate spreadsheets, combine data exports, or merge multiple files into a single dataset.

Documentation

CSV/Excel Merger

Intelligently merge multiple CSV or Excel files with automatic column matching and data deduplication.

Instructions

When a user needs to merge CSV or Excel files:

  1. Analyze Input Files:

    • How many files need to be merged?
    • What format (CSV, Excel, TSV)?
    • Are the files provided or need to be read from disk?
    • Do columns have the same names across files?
    • What is the primary key (unique identifier)?
  2. Inspect File Structures:

    • Read headers from each file
    • Identify column names and data types
    • Detect encoding (UTF-8, Latin-1, etc.)
    • Check for missing columns
    • Look for duplicate column names
  3. Create Merge Strategy:

    Column Matching:

    • Exact name match: "email" = "email"
    • Case-insensitive: "Email" = "email"
    • Fuzzy match: "E-mail" ≈ "email"
    • Common patterns:
      • "first_name", "firstname", "First Name" → "first_name"
      • "phone", "phone_number", "tel" → "phone"
      • "email", "e-mail", "email_address" → "email"

    Conflict Resolution (when same record appears in multiple files):

    • Keep first: Use value from first file
    • Keep last: Use value from last file (most recent)
    • Keep longest: Use most complete value
    • Manual review: Flag conflicts for user review
    • Merge: Combine non-conflicting fields

    Deduplication:

    • Identify duplicate rows based on primary key
    • Options: keep first, keep last, keep all, merge values
    • Track source file for each row
  4. Perform Merge:

    # Example merge logic
    import pandas as pd
    
    # Read files
    df1 = pd.read_csv('file1.csv')
    df2 = pd.read_csv('file2.csv')
    
    # Normalize column names
    df1.columns = df1.columns.str.lower().str.strip()
    df2.columns = df2.columns.str.lower().str.strip()
    
    # Map similar columns
    column_mapping = {
        'firstname': 'first_name',
        'e_mail': 'email',
        # ...
    }
    df2 = df2.rename(columns=column_mapping)
    
    # Merge
    merged = pd.concat([df1, df2], ignore_index=True)
    
    # Deduplicate
    merged = merged.drop_duplicates(subset=['email'], keep='last')
    
    # Save
    merged.to_csv('merged_output.csv', index=False)
    
  5. Format Output:

    📊 CSV/EXCEL MERGER REPORT
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    📁 INPUT FILES
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    File 1: contacts_jan.csv
      Rows: 1,245
      Columns: 8 (name, email, phone, company, ...)
    
    File 2: contacts_feb.csv
      Rows: 987
      Columns: 9 (firstname, lastname, email, mobile, ...)
    
    File 3: leads_export.xlsx
      Rows: 2,103
      Columns: 12 (full_name, email_address, phone, ...)
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    🔄 COLUMN MAPPING
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Unified Schema:
    • first_name ← [firstname, first name, fname]
    • last_name ← [lastname, last name, lname]
    • email ← [email, e-mail, email_address]
    • phone ← [phone, mobile, phone_number, tel]
    • company ← [company, organization, org]
    • title ← [title, job_title, position]
    • source ← [file origin tracking]
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    🔍 MERGE ANALYSIS
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Total rows before merge: 4,335
    Duplicate records found: 892
    Conflicts detected: 47
    
    Deduplication Strategy: Keep most recent (by source file date)
    Primary Key: email
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    ⚠️ CONFLICTS
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Record: [email protected]
      File 1 phone: (555) 123-4567
      File 2 phone: (555) 987-6543
      Resolution: Kept most recent (File 2)
    
    [List top 10 conflicts]
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    ✅ MERGE RESULTS
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Output File: merged_contacts.csv
    Total Rows: 3,443
    Columns: 7
    Duplicates Removed: 892
    
    Breakdown by Source:
    • contacts_jan.csv: 1,245 rows (398 unique)
    • contacts_feb.csv: 987 rows (521 unique)
    • leads_export.xlsx: 2,103 rows (2,524 unique)
    
    Data Quality:
    • Email completeness: 98.2%
    • Phone completeness: 87.5%
    • Company completeness: 91.3%
    
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    💡 RECOMMENDATIONS
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    • Review 47 conflict records manually
    • Standardize phone number format
    • Fill missing company names (8.7% incomplete)
    • Export conflicts to: conflicts_review.csv
    
  6. Handle Special Cases:

    Multiple Primary Keys:

    • Use compound keys: (email + company)
    • Offer options when ambiguous

    Different Data Types:

    • Convert dates to standard format
    • Normalize phone numbers
    • Standardize country codes
    • Clean whitespace and casing

    Missing Columns:

    • Fill with empty values
    • Flag missing data
    • Offer to create new columns

    Large Files:

    • Use chunking for files > 100MB
    • Show progress indicator
    • Estimate memory usage
  7. Generate Code: Provide Python/pandas script that:

    • Reads all files
    • Performs intelligent column matching
    • Deduplicates based on strategy
    • Resolves conflicts
    • Saves merged output
    • Generates detailed report
  8. Export Options:

    • CSV (UTF-8)
    • Excel (.xlsx)
    • JSON
    • SQL INSERT statements
    • Parquet (for large datasets)

Example Triggers

  • "Merge these three CSV files"
  • "Combine multiple Excel sheets into one file"
  • "Deduplicate and merge customer data"
  • "Join spreadsheets with different column names"
  • "Consolidate contact lists from different sources"

Best Practices

Column Matching:

  • Use fuzzy matching for similar names
  • Maintain original column name mapping report
  • Allow manual override of auto-matching

Data Quality:

  • Trim whitespace
  • Standardize formats (phone, email, dates)
  • Detect and flag invalid data
  • Preserve data types

Performance:

  • Use chunking for large files
  • Process in batches
  • Show progress for long operations
  • Optimize memory usage

Transparency:

  • Log all merge decisions
  • Track source file for each row
  • Report conflicts and resolutions
  • Generate detailed merge report

Output Quality

Ensure merges:

  • Intelligently match columns
  • Handle different schemas
  • Deduplicate properly
  • Preserve data integrity
  • Flag conflicts for review
  • Generate comprehensive report
  • Maintain data quality
  • Track data lineage (source)
  • Handle edge cases gracefully
  • Provide validation statistics

Generate clean, deduplicated merged files with full transparency and data quality checks.

Quick Install

/plugin add https://github.com/OneWave-AI/claude-skills/tree/main/csv-excel-merger

Copy and paste this command in Claude Code to install this skill

GitHub 仓库

OneWave-AI/claude-skills
Path: csv-excel-merger

Related Skills

llamaindex

Meta

LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.

View skill

csv-data-summarizer

Meta

This skill automatically analyzes CSV files to generate comprehensive statistical summaries and visualizations using Python's pandas and matplotlib/seaborn. It should be triggered whenever a user uploads or references CSV data without prompting for analysis preferences. The tool provides immediate insights into data structure, quality, and patterns through automated analysis and visualization.

View skill

hybrid-cloud-networking

Meta

This skill configures secure hybrid cloud networking between on-premises infrastructure and cloud platforms like AWS, Azure, and GCP. Use it when connecting data centers to the cloud, building hybrid architectures, or implementing secure cross-premises connectivity. It supports key capabilities such as VPNs and dedicated connections like AWS Direct Connect for high-performance, reliable setups.

View skill

Excel Analysis

Meta

This skill enables developers to analyze Excel files and perform data operations using pandas. It can read spreadsheets, create pivot tables, generate charts, and conduct data analysis on .xlsx files and tabular data. Use it when working with Excel files, spreadsheets, or any structured tabular data within Claude Code.

View skill