
grey-haven-testing-strategy

greyhaven-ai

About

This skill provides Grey Haven's testing standards for TypeScript (Vitest) and Python (pytest) projects, including configuration for unit/integration/e2e tests, fixture patterns, and an 80%+ coverage requirement. Use it when writing tests, debugging failures, improving coverage, or setting up CI/CD pipelines. It offers practical examples and references for implementing the complete testing strategy.

Quick Install

Claude Code

Plugin command (recommended):

/plugin add https://github.com/greyhaven-ai/claude-code-config

Git clone (alternative):

git clone https://github.com/greyhaven-ai/claude-code-config.git ~/.claude/skills/grey-haven-testing-strategy

Copy and paste the plugin command into Claude Code to install this skill.

Documentation

Grey Haven Testing Strategy

Comprehensive testing approach for TypeScript (Vitest) and Python (pytest) projects.

Follow these standards when writing tests, setting up test infrastructure, or improving test coverage in Grey Haven codebases.

Supporting Documentation

  • EXAMPLES.md - Copy-paste test examples for Vitest and pytest
  • REFERENCE.md - Complete configurations, project structures, and CI setup
  • templates/ - Ready-to-use test templates
  • checklists/ - Testing quality checklists
  • scripts/ - Helper scripts for coverage and test execution

Testing Philosophy

Coverage Requirements

  • Minimum: 80% code coverage for all projects (enforced in CI)
  • Target: 90%+ coverage for critical paths
  • Required: 100% coverage for security-critical code (auth, payments, multi-tenant isolation)

Test Types (Markers)

Grey Haven uses consistent test markers across languages:

  1. unit: Fast, isolated tests of single functions/classes
  2. integration: Tests involving multiple components or external dependencies
  3. e2e: End-to-end tests through full user flows
  4. benchmark: Performance tests measuring speed/memory
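
For example, in pytest these markers are applied as decorators, which lets CI select subsets with -m. A minimal sketch (app.utils.slugify is a hypothetical function used only for illustration):

import pytest

from app.utils import slugify  # hypothetical module path


@pytest.mark.unit
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Grey Haven") == "grey-haven"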

TypeScript Testing (Vitest)

Quick Setup

Project Structure:

tests/
├── unit/                    # Fast, isolated tests
├── integration/             # Multi-component tests
└── e2e/                     # Playwright tests

Key Configuration:

// vitest.config.ts
export default defineConfig({
  test: {
    globals: true,
    environment: "jsdom",
    setupFiles: ["./tests/setup.ts"],
    coverage: {
      thresholds: { lines: 80, functions: 80, branches: 80, statements: 80 },
    },
  },
});

Running Tests:

bun run test                 # Run all tests
bun run test:coverage        # With coverage report
bun run test:watch           # Watch mode
bun run test:ui              # UI mode
bun run test tests/unit/     # Unit tests only

See EXAMPLES.md for complete test examples.

Python Testing (pytest)

Quick Setup

Project Structure:

tests/
├── conftest.py             # Shared fixtures
├── unit/                   # @pytest.mark.unit
├── integration/            # @pytest.mark.integration
├── e2e/                    # @pytest.mark.e2e
└── benchmark/              # @pytest.mark.benchmark

Key Configuration:

# pyproject.toml
[tool.pytest.ini_options]
addopts = ["--cov=app", "--cov-fail-under=80"]
markers = [
    "unit: Fast, isolated unit tests",
    "integration: Tests involving multiple components",
    "e2e: End-to-end tests through full flows",
    "benchmark: Performance tests",
]

Running Tests:

# ⚠️ ALWAYS activate virtual environment first!
source .venv/bin/activate

# Run with Doppler for environment variables
doppler run -- pytest                      # All tests
doppler run -- pytest --cov=app            # With coverage
doppler run -- pytest -m unit              # Unit tests only
doppler run -- pytest -m integration       # Integration tests only
doppler run -- pytest -m e2e               # E2E tests only
doppler run -- pytest -v                   # Verbose output

See EXAMPLES.md for complete test examples.

Test Markers Explained

Unit Tests

Characteristics:

  • Fast execution (< 100ms per test)
  • No external dependencies (database, API, file system)
  • Mock all external services
  • Test single function/class in isolation

Use for:

  • Utility functions
  • Business logic
  • Data transformations
  • Component rendering (React Testing Library)
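
As a sketch of the "mock all external services" rule, a unit test can hand a mocked client to the code under test (send_welcome is a toy stand-in, and an async test like this assumes pytest-asyncio is configured):

import pytest
from unittest.mock import AsyncMock


async def send_welcome(client, address):
    # Toy stand-in for business logic that talks to an external service.
    await client.send(to=address, template="welcome")


@pytest.mark.unit
async def test_send_welcome_uses_client():
    client = AsyncMock()  # mock at the external service boundary
    await send_welcome(client, "test@example.com")
    client.send.assert_awaited_once_with(to="test@example.com", template="welcome")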

Integration Tests

Characteristics:

  • Test multiple components together
  • May use real database/Redis (with cleanup)
  • Test API endpoints with FastAPI TestClient
  • Test React Query + server functions

Use for:

  • API endpoint flows
  • Database operations with repositories
  • Authentication flows
  • Multi-component interactions
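
A minimal sketch of an endpoint test with FastAPI's TestClient; the inline app stands in for a real application:

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()


@app.get("/health")
def health():
    return {"status": "ok"}


@pytest.mark.integration
def test_health_endpoint():
    client = TestClient(app)
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}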

E2E Tests

Characteristics:

  • Test complete user flows
  • Use Playwright (TypeScript) or httpx (Python)
  • Test from user perspective
  • Slower execution (seconds per test)

Use for:

  • Registration/login flows
  • Critical user journeys
  • Form submissions
  • Multi-page workflows
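
On the Python side, an e2e test can drive a running instance over HTTP with httpx, reading the base URL Doppler provides (a sketch; it assumes the service is up and pytest-asyncio is configured):

import os

import httpx
import pytest


@pytest.mark.e2e
async def test_homepage_responds():
    # PLAYWRIGHT_BASE_URL is injected by Doppler; localhost is a fallback.
    base_url = os.environ.get("PLAYWRIGHT_BASE_URL", "http://localhost:3000")
    async with httpx.AsyncClient(base_url=base_url) as client:
        response = await client.get("/")
    assert response.status_code == 200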

Benchmark Tests

Characteristics:

  • Measure performance metrics
  • Track execution time
  • Monitor memory usage
  • Detect performance regressions

Use for:

  • Database query performance
  • Algorithm optimization
  • API response times
  • Batch operations
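
A sketch using the pytest-benchmark plugin (an assumption; the skill's marker config does not mandate a specific plugin), whose benchmark fixture times a callable over many rounds:

import pytest


@pytest.mark.benchmark
def test_sorted_performance(benchmark):
    data = list(range(10_000, 0, -1))
    # benchmark() runs the callable repeatedly and records timing stats.
    result = benchmark(sorted, data)
    assert result[0] == 1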

Environment Variables with Doppler

⚠️ CRITICAL: Grey Haven uses Doppler for ALL environment variables.

# Install Doppler
brew install dopplerhq/cli/doppler

# Authenticate and setup
doppler login
doppler setup

# Run tests with Doppler
doppler run -- bun run test          # TypeScript
doppler run -- pytest                # Python

# Use specific config
doppler run --config test -- pytest

Doppler provides:

  • DATABASE_URL_TEST - Test database connection
  • REDIS_URL - Redis for tests (separate DB)
  • BETTER_AUTH_SECRET - Auth secrets
  • STRIPE_SECRET_KEY - External service keys (test mode)
  • PLAYWRIGHT_BASE_URL - E2E test URL
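
Doppler injects these as ordinary environment variables, so test code reads them with os.environ; a sketch of a fixture that skips cleanly when a variable is missing:

import os

import pytest


@pytest.fixture(scope="session")
def database_url():
    url = os.environ.get("DATABASE_URL_TEST")
    if url is None:
        pytest.skip("DATABASE_URL_TEST not set; run via `doppler run -- pytest`")
    return url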

See REFERENCE.md for complete setup.

Test Fixtures and Factories

TypeScript Factories

// tests/factories/user.factory.ts
import { faker } from "@faker-js/faker";

export function createMockUser(overrides = {}) {
  return {
    id: faker.string.uuid(),
    tenant_id: faker.string.uuid(),
    email_address: faker.internet.email(),
    name: faker.person.fullName(),
    ...overrides,
  };
}

Python Fixtures

# tests/conftest.py
import pytest

from app.models import User  # adjust to your models module


@pytest.fixture
async def test_user(session, tenant_id):
    """Create a test user with tenant isolation (async fixtures need pytest-asyncio)."""
    user = User(
        tenant_id=tenant_id,
        email_address="test@example.com",
        name="Test User",
    )
    session.add(user)
    await session.commit()
    return user
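
Factories work on the Python side as well; a sketch pairing the faker library with keyword overrides, mirroring the TypeScript factory above (make_user is a hypothetical helper):

from uuid import uuid4

from faker import Faker

fake = Faker()


def make_user(**overrides):
    """Build a plain dict of user attributes for tests (hypothetical helper)."""
    data = {
        "id": str(uuid4()),
        "tenant_id": str(uuid4()),
        "email_address": fake.email(),
        "name": fake.name(),
    }
    data.update(overrides)
    return data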

See EXAMPLES.md for more patterns.

Multi-Tenant Testing

⚠️ ALWAYS test tenant isolation in multi-tenant projects:

import pytest
from uuid import uuid4

from app.repositories import UserRepository  # adjust to your project layout


@pytest.mark.integration
async def test_tenant_isolation(session, test_user, tenant_id):
    """Verify queries filter by tenant_id."""
    repo = UserRepository(session)

    # Should find the user with the correct tenant
    user = await repo.get_by_id(test_user.id, tenant_id)
    assert user is not None

    # Should NOT find the user with a different tenant
    different_tenant = uuid4()
    user = await repo.get_by_id(test_user.id, different_tenant)
    assert user is None

Continuous Integration

GitHub Actions with Doppler:

# .github/workflows/test.yml
- name: Run tests with Doppler
  env:
    DOPPLER_TOKEN: ${{ secrets.DOPPLER_TOKEN_TEST }}
  run: doppler run --config test -- bun run test:coverage

See REFERENCE.md for complete workflow.

When to Apply This Skill

Use this skill when:

  • ✅ Writing new tests for features
  • ✅ Setting up test infrastructure (Vitest/pytest)
  • ✅ Configuring CI/CD test pipelines
  • ✅ Debugging failing tests
  • ✅ Improving test coverage (<80%)
  • ✅ Reviewing test code quality
  • ✅ Setting up Doppler for test environments
  • ✅ Creating test fixtures and factories
  • ✅ Implementing TDD workflow
  • ✅ User mentions: "test", "testing", "pytest", "vitest", "coverage", "TDD", "unit test", "integration test", "e2e", "test setup", "CI testing"

Template References

These testing patterns come from Grey Haven production templates:

  • Frontend: cvi-template (Vitest + Playwright + React Testing Library)
  • Backend: cvi-backend-template (pytest + FastAPI TestClient + async fixtures)

Critical Reminders

  1. Coverage: 80% minimum (enforced in CI, blocks merge)
  2. Test markers: unit, integration, e2e, benchmark (use consistently)
  3. Doppler: ALWAYS use for test environment variables (never commit .env!)
  4. Virtual env: MUST activate for Python tests (source .venv/bin/activate)
  5. Tenant isolation: ALWAYS test multi-tenant scenarios
  6. Fixtures: Use factories for test data generation (faker library)
  7. Mocking: Mock external services in unit tests (use vi.mock or pytest mocks)
  8. CI: Run tests with doppler run --config test
  9. Database: Use separate test database (Doppler provides DATABASE_URL_TEST)
  10. Cleanup: Clean up test data after each test (use fixtures with cleanup)
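
Reminder 10 in practice: a yield-based fixture runs its teardown after every test that uses it, so data cannot leak between tests. A minimal sketch, assuming SQLAlchemy and the same hypothetical User model as in conftest.py above:

import pytest
from sqlalchemy import delete

from app.models import User  # hypothetical import path, as above


@pytest.fixture
async def clean_users(session):
    yield  # the test runs here
    # Teardown: remove any users the test created.
    await session.execute(delete(User))
    await session.commit()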

Next Steps

  • Need test examples? See EXAMPLES.md for copy-paste code
  • Need configurations? See REFERENCE.md for complete configs
  • Need templates? See templates/ for starter files
  • Need checklists? Use checklists/ for systematic test reviews
  • Need to run tests? Use scripts/ for helper utilities

GitHub Repository

greyhaven-ai/claude-code-config
Path: grey-haven-plugins/testing/skills/testing-strategy
