archiving-databases
About
This Claude Skill automates database archival by moving historical records from primary tables to archive storage (S3, Azure Blob, GCS, or archive tables) to reduce database size. It helps implement data retention policies and is triggered by requests to archive old data or move it to cold storage. The skill requires appropriate database and storage credentials and works with SQL databases and cloud storage services.
Quick Install
Claude Code
Recommended (plugin command):
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Or clone the skill manually:
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/archiving-databases

Copy and paste the recommended command into Claude Code to install this skill.
Documentation
Prerequisites
Before using this skill, ensure:
- Database credentials with SELECT and DELETE permissions on source tables
- Access to destination storage (archive table or cloud storage credentials)
- Network connectivity to cloud storage services if using S3/Azure/GCS
- Backup of database before first archival run
- Understanding of data retention requirements and compliance policies
- Monitoring tools configured to track archival job success
Instructions
Step 1: Define Archival Criteria
- Identify tables containing historical data for archival
- Define age threshold for archival (e.g., records older than 1 year)
- Determine additional criteria (status flags, record size, access frequency)
- Calculate expected data volume to be archived (see the sketch after this list)
- Document business requirements and compliance policies
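A minimal sketch of the volume estimate, assuming PostgreSQL, the psycopg2 driver, and a hypothetical orders table with created_at and status columns; the one-year cutoff and the status filter are illustrative criteria, not part of the skill itself.

```python
import psycopg2

ARCHIVE_CUTOFF = "1 year"  # illustrative age threshold

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Count candidate rows and roughly estimate their on-disk payload size.
    cur.execute(
        """
        SELECT count(*),
               pg_size_pretty(sum(pg_column_size(o.*))::bigint)
        FROM orders o
        WHERE o.created_at < now() - %s::interval
          AND o.status = 'closed'          -- example of an additional criterion
        """,
        (ARCHIVE_CUTOFF,),
    )
    rows, approx_size = cur.fetchone()
    print(f"Archival candidates: {rows} rows, ~{approx_size}")
conn.close()
```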
Step 2: Choose Archival Destination
- Evaluate options: archive table in same database, separate archive database, or cold storage
- For cloud storage: select S3, Azure Blob, or GCS based on infrastructure
- Configure destination storage with appropriate security and access controls
- Set up compression settings for storage efficiency
- Define data format for archived records (CSV, Parquet, JSON); see the Parquet sketch after this list
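For the format and compression choices above, one option is Snappy-compressed Parquet written with pandas and pyarrow; the bucket name and partition layout below are placeholders, and writing straight to an s3:// path assumes s3fs is installed.

```python
import pandas as pd

# Hypothetical batch of extracted rows; in practice this comes from the source query.
batch = pd.DataFrame(
    {
        "order_id": [101, 102],
        "status": ["closed", "closed"],
        "created_at": ["2023-01-05", "2023-02-11"],
    }
)

# Parquet preserves column types and compresses well; Snappy favors speed over ratio.
batch.to_parquet(
    "s3://example-archive-bucket/orders/year=2023/part-0001.parquet",  # placeholder path
    engine="pyarrow",
    compression="snappy",
    index=False,
)
```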
Step 3: Create Archive Schema
- Design archive table schema matching source table structure (see the DDL sketch after this list)
- Add metadata columns (archived_at, source_table, archive_reason)
- Create indexes on commonly queried archive columns
- For cloud storage: define bucket structure and naming conventions
- Test archive schema with sample data
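A sketch of the archive DDL for the hypothetical orders table, assuming PostgreSQL; LIKE copies the source columns, and the three metadata columns and indexes are added on top.

```python
import psycopg2

# Archive table mirroring the hypothetical `orders` source table (PostgreSQL syntax).
ARCHIVE_DDL = """
CREATE TABLE IF NOT EXISTS orders_archive (
    LIKE orders INCLUDING DEFAULTS INCLUDING CONSTRAINTS,
    archived_at    timestamptz NOT NULL DEFAULT now(),
    source_table   text        NOT NULL DEFAULT 'orders',
    archive_reason text
);
-- Index the columns the archive is most often queried by.
CREATE INDEX IF NOT EXISTS orders_archive_created_at_idx  ON orders_archive (created_at);
CREATE INDEX IF NOT EXISTS orders_archive_archived_at_idx ON orders_archive (archived_at);
"""

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(ARCHIVE_DDL)
conn.close()
```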
Step 4: Implement Archival Logic
- Write SQL query to identify records meeting archival criteria
- Create extraction script to export records from source tables
- Implement transformation logic if archive format differs from source
- Build verification queries to confirm data integrity after archival
- Add transaction handling to ensure atomicity (delete only if archive succeeds); see the sketch after this list
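A minimal sketch of the copy-then-delete pattern, again assuming PostgreSQL, psycopg2, and the hypothetical orders/orders_archive tables from the previous steps. Both statements run in one transaction, so the delete is rolled back if anything fails or the row counts disagree; now() is stable within a transaction, so both statements see the same cutoff.

```python
import psycopg2

CUTOFF = "1 year"  # records older than this are archived

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
try:
    with conn.cursor() as cur:
        # Copy matching rows into the archive table, appending the metadata columns.
        cur.execute(
            """
            INSERT INTO orders_archive
            SELECT o.*, now(), 'orders', 'age-based retention'
            FROM orders o
            WHERE o.created_at < now() - %s::interval
            """,
            (CUTOFF,),
        )
        copied = cur.rowcount

        # Delete only rows matching the same criteria, within the same transaction.
        cur.execute(
            "DELETE FROM orders WHERE created_at < now() - %s::interval",
            (CUTOFF,),
        )
        deleted = cur.rowcount

        if copied != deleted:
            raise RuntimeError(f"count mismatch: copied={copied}, deleted={deleted}")
    conn.commit()
    print(f"Archived and removed {copied} rows")
except Exception:
    conn.rollback()  # source rows are left untouched if anything failed
    raise
finally:
    conn.close()
```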
Step 5: Execute Archival Process
- Run archival in staging environment first with subset of data
- Verify archived data integrity and completeness
- Execute archival in production during low-traffic window
- Monitor database performance during archival operation
- Generate archival report with record counts and storage savings (see the sketch after this list)
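One way to assemble the report described above, assuming PostgreSQL: measure pg_total_relation_size before and after the run. Note that a plain DELETE does not immediately shrink the table on disk; the freed space is reused internally and is only returned to the operating system after VACUUM FULL or a table rebuild.

```python
import json
from datetime import datetime, timezone

import psycopg2

def relation_size(cur, table: str) -> int:
    # Physical size of the table plus its indexes and TOAST data, in bytes.
    cur.execute("SELECT pg_total_relation_size(%s)", (table,))
    return cur.fetchone()[0]

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
with conn, conn.cursor() as cur:
    size_before = relation_size(cur, "orders")
    # ... run the archival batch from Step 4 here ...
    size_after = relation_size(cur, "orders")

report = {
    "run_at": datetime.now(timezone.utc).isoformat(),
    "source_table": "orders",
    "rows_archived": None,  # fill in with the count returned by the Step 4 run
    "bytes_before": size_before,
    "bytes_after": size_after,
}
print(json.dumps(report, indent=2))
```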
Step 6: Automate Retention Policy
- Schedule periodic archival jobs (weekly, monthly)
- Configure automated monitoring and alerting for job failures
- Implement cleanup of successfully archived records from source tables
- Set up expiration policies on archived data per compliance requirements (for S3, see the lifecycle sketch after this list)
- Document archival schedule and retention periods
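If archives land in S3, the expiration part of the retention policy can be enforced with a bucket lifecycle rule; below is a sketch using boto3 with a placeholder bucket, prefix, and seven-year retention.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket, prefix, and retention period; align these with your compliance policy.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-archived-orders",
                "Filter": {"Prefix": "orders/"},
                "Status": "Enabled",
                # Delete archived objects once the retention period has elapsed.
                "Expiration": {"Days": 7 * 365},
            }
        ]
    },
)
```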
Output
This skill produces:
Archival Scripts: SQL and shell scripts to extract, transform, and load data to the archive destination
Archive Tables/Files: Structured storage containing historical records with metadata and timestamps
Verification Reports: Row counts, data checksums, and integrity checks confirming successful archival
Storage Metrics: Database size reduction, archive storage utilization, and cost savings estimates
Archival Logs: Detailed logs of each archival run with timestamps, record counts, and any errors
Error Handling
Insufficient Storage Space:
- Check available disk space on archive destination before execution
- Implement storage monitoring and alerting
- Use compression to reduce archive size
- Clean up old archives per retention policy before new archival
Data Integrity Issues:
- Run checksums on source data before and after archival (see the sketch after this list)
- Implement row count verification between source and archive
- Keep source data until archive verification completes
- Roll back the archive transaction if verification fails
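A sketch of the checksum comparison, assuming PostgreSQL and the hypothetical tables from the Instructions: each row in the archived range is hashed, the hashes are concatenated in key order, and the digests for source and archive are compared before any source rows are removed. A plain row-count comparison is a cheaper first check, and in a real run you would pin the exact cutoff timestamp used by the archival job instead of recomputing now().

```python
import psycopg2

def table_checksum(cur, table: str, cutoff: str) -> str:
    # Digest of the shared columns in the archived range: hash each row, order by
    # the primary key, then hash the concatenation. Table names cannot be bound as
    # parameters, so only pass trusted identifiers here.
    cur.execute(
        f"""
        SELECT md5(string_agg(md5(t::text), '' ORDER BY t.order_id))
        FROM (SELECT order_id, status, created_at
              FROM {table}
              WHERE created_at < now() - %s::interval) AS t
        """,
        (cutoff,),
    )
    return cur.fetchone()[0]

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
with conn, conn.cursor() as cur:
    src = table_checksum(cur, "orders", "1 year")
    dst = table_checksum(cur, "orders_archive", "1 year")
conn.close()

if src != dst:
    raise RuntimeError("checksum mismatch: keep the source rows and investigate")
print("checksums match; source rows are safe to remove")
```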
Permission Denied Errors:
- Verify database user has SELECT on source tables and INSERT on archive tables
- Confirm cloud storage credentials have write permissions
- Check network security groups allow connections to cloud storage
- Document required permissions for archival automation
Timeout During Large Archival:
- Split archival into smaller batches by date ranges (see the sketch after this list)
- Run archival incrementally over multiple days
- Increase database timeout settings for archival sessions
- Schedule archival during maintenance windows with extended timeouts
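A sketch of date-range batching that reuses the Step 4 pattern: each month-sized slice is copied and deleted in its own committed transaction, so a timeout or failure only affects the current batch and completed batches never have to be redone. The month granularity and date range are illustrative.

```python
from datetime import date

import psycopg2

def month_starts(first: date, last: date):
    # Yield the first day of each month in [first, last).
    d = first
    while d < last:
        yield d
        d = date(d.year + (d.month == 12), d.month % 12 + 1, 1)

conn = psycopg2.connect("dbname=app user=archiver")  # placeholder DSN
for start in month_starts(date(2018, 1, 1), date(2023, 1, 1)):
    with conn, conn.cursor() as cur:  # one committed transaction per monthly batch
        cur.execute(
            """
            INSERT INTO orders_archive
            SELECT o.*, now(), 'orders', 'age-based batch'
            FROM orders o
            WHERE o.created_at >= %s
              AND o.created_at < %s::date + interval '1 month'
            """,
            (start, start),
        )
        cur.execute(
            """
            DELETE FROM orders
            WHERE created_at >= %s
              AND created_at < %s::date + interval '1 month'
            """,
            (start, start),
        )
    print(f"archived batch starting {start}")
conn.close()
```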
Resources
Archival Configuration Templates:
- PostgreSQL archival: {baseDir}/templates/postgresql-archive-config.yaml
- MySQL archival: {baseDir}/templates/mysql-archive-config.yaml
- S3 cold storage: {baseDir}/templates/s3-archive-config.yaml
- Azure Blob storage: {baseDir}/templates/azure-archive-config.yaml
Retention Policy Definitions: {baseDir}/policies/retention-policies.yaml
Archival Scripts Library: {baseDir}/scripts/archival/
- Extract to CSV script
- Extract to Parquet script
- S3 upload with compression
- Archive verification queries
Monitoring Dashboards: {baseDir}/monitoring/archival-dashboard.json
Cost Analysis Tools: {baseDir}/tools/storage-cost-calculator.py
GitHub Repository: https://github.com/jeremylongshore/claude-code-plugins-plus
Related Skills
sglang
SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
Algorithmic Art Generation
This skill helps developers create algorithmic art using p5.js, focusing on generative art, computational aesthetics, and interactive visualizations. It automatically activates for topics like "generative art" or "p5.js visualization" and guides you through creating unique algorithms with features like seeded randomness, flow fields, and particle systems. Use it when you need to build reproducible, code-driven artistic patterns.
csv-data-summarizer
This skill automatically analyzes CSV files to generate comprehensive statistical summaries and visualizations using Python's pandas and matplotlib/seaborn. It should be triggered whenever a user uploads or references CSV data without prompting for analysis preferences. The tool provides immediate insights into data structure, quality, and patterns through automated analysis and visualization.
llamaindex
LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.
