test-data-management
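About
This skill helps developers generate and manage synthetic test data while staying privacy-compliant. It provides strategies for creating realistic data at scale, anonymizing personally identifiable information (PII), and maintaining GDPR/CCPA compliance. Use it when building test datasets, handling sensitive information, or when you need isolated data for different test scenarios.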
Quick Install
Claude Code
Recommended: /plugin add https://github.com/proffesor-for-testing/agentic-qe
Or install manually: git clone https://github.com/proffesor-for-testing/agentic-qe.git ~/.claude/skills/test-data-management
Copy and paste the command into Claude Code to install the skill.
Documentation
Test Data Management
<default_to_action> When creating or managing test data:
- NEVER use production PII directly
- GENERATE synthetic data with faker libraries
- ANONYMIZE production data if used (mask, hash)
- ISOLATE test data (transactions, per-test cleanup)
- SCALE with batch generation (10k+ records/sec)
Quick Data Strategy:
- Unit tests: Minimal data (just enough)
- Integration: Realistic data (full complexity)
- Performance: Volume data (10k+ records)
Critical Success Factors:
- Inadequate data reportedly causes about 40% of test failures
- GDPR fines reach up to €20M or 4% of global annual turnover, whichever is higher
- Never store production PII in test environments </default_to_action>
Quick Reference Card
When to Use
- Creating test datasets
- Handling sensitive data
- Performance testing with volume
- GDPR/CCPA compliance
Data Strategies
| Type | When | Size |
|---|---|---|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
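The Edge cases row calls for targeted rather than random data. A minimal sketch of classic boundary-value analysis for an inclusive integer range; `boundaryCases` is a hypothetical helper, not part of any library, and the 18 to 90 range reuses the age constraint that appears in the agent example of this skill.

```typescript
// Hypothetical helper: boundary-value analysis for an inclusive integer range.
// Produces values just outside, exactly on, and just inside each bound.
interface BoundaryCase {
  value: number;
  valid: boolean; // true when the value lies inside [min, max]
}

function boundaryCases(min: number, max: number): BoundaryCase[] {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError(`invalid range [${min}, ${max}]`);
  }
  // A Set removes duplicates when min and max are close together.
  const candidates = new Set([min - 1, min, min + 1, max - 1, max, max + 1]);
  return Array.from(candidates)
    .sort((a, b) => a - b)
    .map((value) => ({ value, valid: value >= min && value <= max }));
}

// Example: age constraint { min: 18, max: 90 }
const ageCases = boundaryCases(18, 90);
// values 17, 18, 19, 89, 90, 91; only 17 and 91 are invalid
```

Each case doubles as a test row: valid values should be accepted, invalid ones rejected with a validation error.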
Privacy Techniques
| Technique | Use Case |
|---|---|
| Synthetic | Generate fake data (preferred) |
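| Masking | Partial display, e.g. j***@example.com |
| Keyed hashing (HMAC) | One-way pseudonymization; needs a secret key, since plain hashes of emails or phone numbers can be reversed by hashing guesses. Still personal data under GDPR |
| Tokenization | Reversible only by an authorized service holding the token vault or key |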
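The difference between keyed hashing and tokenization can be sketched with Node's built-in `node:crypto`. This is a minimal illustration, not a production design: `pseudonymize` and `TokenVault` are hypothetical names, the in-memory `Map` stands in for an access-controlled token store, and the hard-coded key is for the example only.

```typescript
import { createHmac, randomUUID } from "node:crypto";

// Keyed hashing (HMAC-SHA256): deterministic for a given key and one-way.
// Without the key, an attacker cannot confirm guesses by hashing candidate emails.
function pseudonymize(value: string, secretKey: string): string {
  return createHmac("sha256", secretKey).update(value).digest("hex");
}

// Tokenization: swap the value for a random token and keep the mapping.
// Reversal requires access to the vault. This in-memory Map is a stand-in
// for a real, access-controlled token store.
class TokenVault {
  private tokenToValue = new Map<string, string>();
  private valueToToken = new Map<string, string>();

  tokenize(value: string): string {
    const existing = this.valueToToken.get(value);
    if (existing !== undefined) return existing; // same input, same token
    const token = `tok_${randomUUID()}`;
    this.tokenToValue.set(token, value);
    this.valueToToken.set(value, token);
    return token;
  }

  detokenize(token: string): string | undefined {
    return this.tokenToValue.get(token);
  }
}

const key = "test-only-secret"; // assumption: load from a secrets manager in practice
const hashed = pseudonymize("jane@example.com", key);
const vault = new TokenVault();
const token = vault.tokenize("4242424242424242");
```

The hash cannot be turned back into the email even by its owner; the token can, but only by code that can read the vault.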
Synthetic Data Generation
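import { faker } from '@faker-js/faker'; // v8+ API
// Seed for reproducibility: the same seed yields the same sequence of values
faker.seed(123);
function generateUser() {
  return {
    id: faker.string.uuid(),
    // exampleEmail() uses reserved example.* domains, so no real inbox is targeted
    email: faker.internet.exampleEmail(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
    // Dates are relative to "now" unless refDate is fixed, which would break reproducibility
    createdAt: faker.date.past({ refDate: '2024-01-01T00:00:00.000Z' })
  };
}
// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);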
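Faker draws values at random, so a large run can contain repeated emails, and a bulk insert then fails on a unique constraint. A minimal sketch of a retry-based uniqueness wrapper; `generateUnique` is a hypothetical helper, and the generator passed in could be `generateUser` above.

```typescript
// Hypothetical helper: draw from `generate` until `count` items with distinct
// keys are collected. Gives up after `maxAttempts` draws so a value space that
// is too small fails loudly instead of looping forever.
function generateUnique<T>(
  count: number,
  generate: () => T,
  keyOf: (item: T) => string,
  maxAttempts = count * 10,
): T[] {
  const seen = new Set<string>();
  const result: T[] = [];
  for (let attempt = 0; attempt < maxAttempts && result.length < count; attempt++) {
    const item = generate();
    const itemKey = keyOf(item);
    if (seen.has(itemKey)) continue; // duplicate: discard and draw again
    seen.add(itemKey);
    result.push(item);
  }
  if (result.length < count) {
    throw new Error(`only ${result.length}/${count} unique items after ${maxAttempts} attempts`);
  }
  return result;
}

// Usage with the faker-based generator:
// const users = generateUnique(1000, generateUser, (u) => u.email);
```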
Test Data Builder Pattern
import { faker } from '@faker-js/faker';
// Minimal User shape assumed for this example
interface User {
  id: string;
  email: string;
  role: 'admin' | 'customer';
  permissions: string[];
}
class UserBuilder {
  private user: Partial<User> = {};
  asAdmin(): this {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }
  asCustomer(): this {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }
  withEmail(email: string): this {
    this.user.email = email;
    return this;
  }
  build(): User {
    // Explicitly set fields win; anything unset gets a safe default
    return {
      id: this.user.id ?? faker.string.uuid(),
      email: this.user.email ?? faker.internet.exampleEmail(),
      role: this.user.role ?? 'customer',
      permissions: this.user.permissions ?? ['read']
    };
  }
}
// Usage
const admin = new UserBuilder().asAdmin().withEmail('admin@example.com').build();
const customer = new UserBuilder().asCustomer().build();
Data Anonymization
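// Masking: keep just enough to recognize the value
function maskEmail(email: string): string {
  const [user, domain] = email.split('@');
  if (!user || !domain) return '***'; // not a well-formed email: mask everything
  return `${user[0]}***@${domain}`;
}
// jane@example.com → j***@example.com
function maskCreditCard(cc: string): string {
  const digits = cc.replace(/\D/g, ''); // drop spaces and dashes first
  return `****-****-****-${digits.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242
// Anonymize production data (prodUsers: rows loaded from a production snapshot)
// Any field not listed below (e.g. lastName, address) is dropped entirely
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id,                           // Keep ID for relationships
  email: `user-${user.id}@example.com`,  // Fake email
  firstName: faker.person.firstName(),   // Generated
  phone: null,                           // Remove PII
  createdAt: user.createdAt              // Keep non-PII
}));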
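Keeping raw IDs is not always enough: an ID that also appears in logs or other systems can re-identify a person. A minimal sketch, assuming two hypothetical tables `users` and `orders`, of replacing IDs with an HMAC-based pseudonym so that foreign keys still join after anonymization; `pseudoId` and `anonymize` are illustrative names.

```typescript
import { createHmac } from "node:crypto";

// Assumed shapes of two related production tables (hypothetical).
interface ProdUser { id: string; email: string; phone: string | null; createdAt: Date; }
interface ProdOrder { id: string; userId: string; total: number; }

// Deterministic keyed pseudonym: the same production ID always maps to the
// same test ID for a given key, so references between tables survive.
const pseudoId = (id: string, key: string): string =>
  createHmac("sha256", key).update(id).digest("hex").slice(0, 16);

function anonymize(users: ProdUser[], orders: ProdOrder[], key: string) {
  const anonUsers = users.map((u) => {
    const id = pseudoId(u.id, key);
    return { id, email: `user-${id}@example.com`, phone: null, createdAt: u.createdAt };
  });
  const anonOrders = orders.map((o) => ({
    id: pseudoId(o.id, key),
    userId: pseudoId(o.userId, key), // same function, so the reference still resolves
    total: o.total, // non-PII business value kept as-is
  }));
  return { anonUsers, anonOrders };
}
```

Because the mapping is keyed, whoever holds the key can re-link records, so the key should stay out of the test environment.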
Database Transaction Isolation
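// Best practice: wrap each test in a transaction and roll it back
// (db.beginTransaction/rollbackTransaction stand for your DB client's equivalents).
// The code under test must use the same connection and must not commit on its own,
// or the rollback will not undo its writes.
beforeEach(async () => {
  await db.beginTransaction();
});
afterEach(async () => {
  await db.rollbackTransaction(); // Discards everything the test wrote
});
test('user registration', async () => {
  const user = await userService.register({
    email: 'newuser@example.com'
  });
  expect(user.id).toBeDefined();
  // No manual cleanup needed: afterEach rolls the transaction back
});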
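When a rollback cannot wrap the test (for example, the code under test opens its own connection, commits, or runs behind an HTTP API), per-test cleanup is the fallback. A minimal sketch of a cleanup registry; `CleanupRegistry` is hypothetical, and the `api.createUser`/`api.deleteUser` calls in the usage comment are placeholders.

```typescript
// Hypothetical per-test cleanup registry. Each creation registers an undo step;
// steps run in reverse (LIFO) order, so if parents are created before children,
// children (orders) are deleted before the rows they reference (users).
type CleanupStep = () => Promise<void> | void;

class CleanupRegistry {
  private steps: CleanupStep[] = [];

  register(step: CleanupStep): void {
    this.steps.push(step);
  }

  async runAll(): Promise<void> {
    const errors: unknown[] = [];
    while (this.steps.length > 0) {
      const step = this.steps.pop()!;
      try {
        await step();
      } catch (err) {
        errors.push(err); // keep going so one failure does not leak other data
      }
    }
    if (errors.length > 0) {
      throw new Error(`${errors.length} cleanup step(s) failed`);
    }
  }
}

// In a Jest-style suite (sketch):
// const cleanup = new CleanupRegistry();
// afterEach(() => cleanup.runAll());
// const user = await api.createUser({ email: 'x@example.com' });
// cleanup.register(() => api.deleteUser(user.id));
```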
Volume Data Generation
// Generate 10,000 users efficiently in batches
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);
  for (let i = 0; i < batches; i++) {
    const start = i * batchSize;
    // The last batch may be smaller, so exactly `count` users are created
    const size = Math.min(batchSize, count - start);
    const users = Array.from({ length: size }, (_, index) => ({
      id: start + index,
      email: `user${start + index}@example.com`,
      firstName: faker.person.firstName()
    }));
    await db.users.insertMany(users); // One round trip per batch instead of per row
    console.log(`Batch ${i + 1}/${batches}`);
  }
}
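A variant that separates generation from insertion uses a generator function, so only one batch is held in memory at a time and the batch source can be reused with any writer. A minimal sketch; `userBatches` and the `VolumeUser` shape are illustrative assumptions.

```typescript
// Sketch: yield fixed-size batches lazily. The final batch is shortened so
// exactly `count` records are produced.
interface VolumeUser { id: number; email: string; }

function* userBatches(count: number, batchSize = 1000): Generator<VolumeUser[]> {
  for (let start = 0; start < count; start += batchSize) {
    const size = Math.min(batchSize, count - start);
    yield Array.from({ length: size }, (_, i) => ({
      id: start + i,
      email: `user${start + i}@example.com`,
    }));
  }
}

// Usage (db.users.insertMany as in the example above):
// for (const batch of userBatches(10000)) await db.users.insertMany(batch);
```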
Agent-Driven Data Generation
// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");
// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
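The agent's internals are not documented here, but what `preserveReferentialIntegrity` and `constraints` ask for can be illustrated directly. A minimal, dependency-free sketch under stated assumptions: `generateEcommerce` and its record shapes are hypothetical, the price range is arbitrary, and the small seeded PRNG (mulberry32) replaces faker only to keep the example self-contained and reproducible.

```typescript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Inclusive random integer in [min, max].
const randInt = (rand: () => number, min: number, max: number): number =>
  min + Math.floor(rand() * (max - min + 1));

interface GenUser { id: number; age: number; role: "customer" | "admin"; }
interface Product { id: number; price: number; }
interface Order { id: number; userId: number; productId: number; }

function generateEcommerce(
  counts: { users: number; products: number; orders: number },
  seed = 123,
) {
  if (counts.orders > 0 && (counts.users === 0 || counts.products === 0)) {
    throw new RangeError("orders need at least one user and one product");
  }
  const rand = mulberry32(seed);
  const roles = ["customer", "admin"] as const;
  const users: GenUser[] = Array.from({ length: counts.users }, (_, id) => ({
    id,
    age: randInt(rand, 18, 90), // constraint: age in [18, 90]
    role: roles[randInt(rand, 0, roles.length - 1)],
  }));
  const products: Product[] = Array.from({ length: counts.products }, (_, id) => ({
    id,
    price: randInt(rand, 100, 10000) / 100, // arbitrary price range (assumption)
  }));
  // Orders are generated last and only reference IDs that already exist.
  const orders: Order[] = Array.from({ length: counts.orders }, (_, id) => ({
    id,
    userId: users[randInt(rand, 0, users.length - 1)].id,
    productId: products[randInt(rand, 0, products.length - 1)].id,
  }));
  return { users, products, orders };
}
```

Generating parents before children is what keeps every foreign key valid; the fixed seed makes a failing test reproducible from the same data.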
Agent Coordination Hints
Memory Namespace
aqe/test-data-management/
├── schemas/* - Data schemas
├── generators/* - Generator configs
├── anonymization/* - PII handling rules
└── fixtures/* - Reusable fixtures
Fleet Coordination
const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect', // Generate data
    'qe-test-executor',       // Execute with data
    'qe-security-scanner'     // Validate no PII exposure
  ],
  topology: 'sequential'
});
Related Skills
- database-testing - Schema and integrity testing
- compliance-testing - GDPR/CCPA compliance
- performance-testing - Volume data for perf tests
Remember
Test data is infrastructure, not an afterthought. Inadequate test data is reportedly behind about 40% of test failures. Poor data = poor tests.
Never use production PII directly. GDPR fines can reach €20M or 4% of global annual turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.
With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents help enforce GDPR/CCPA compliance automatically and remove test data bottlenecks.
Related Skills
compliance-testing
Other: This skill automates regulatory compliance testing for standards like GDPR, HIPAA, and PCI-DSS. It validates data rights, encryption, and access controls to prepare for audits. Use it when handling sensitive data or needing audit-ready evidence reports.
compliance-testing
Other: This Claude Skill automates regulatory compliance testing for standards like GDPR, HIPAA, and PCI-DSS. It maps requirements to testable controls, validates data rights and encryption, and generates audit-ready reports. Developers should use it when handling sensitive data or preparing for legal audits.
regression-testing
Other: This skill strategically selects and runs regression tests based on code changes and risk, ensuring fixes don't break existing functionality. It analyzes impact, optimizes test execution for faster feedback, and helps manage continuous regression within CI/CD. Use it for verifying changes, planning test suites, or streamlining test execution.
test-environment-management
Other: This Claude Skill manages test infrastructure using infrastructure as code, Docker/Kubernetes for consistent environments, and service virtualization. It helps developers ensure environment parity with production and optimize testing costs through auto-shutdown and spot instances. Use it when provisioning test environments or managing testing infrastructure.
