
Conducting Chaos Engineering

jeremylongshore
Meta · ai · testing · design

About

This skill enables Claude to design and execute chaos engineering experiments that test system resilience. It covers failure injection, latency simulation, and resource exhaustion, which are used to validate recovery mechanisms such as circuit breakers. The skill works with tools such as Chaos Mesh and AWS FIS to simulate real-world failures.

Quick Install

Claude Code

Plugin command (recommended):
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Git clone (alternative):
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git "$HOME/.claude/skills/Conducting Chaos Engineering"

Run the plugin command inside Claude Code. Run the git clone command in a terminal; because the destination directory name contains spaces, the path must be quoted.

Documentation

Overview

This skill lets Claude act as a chaos engineering specialist. It guides users through designing and running controlled failure scenarios that expose weaknesses in their systems, so they can confirm that resilience and recovery mechanisms work before a real outage tests them.

How It Works

  1. Experiment Design: Claude helps define the scope, target system, and failure scenarios for the chaos experiment based on the user's objectives.
  2. Tool Selection: Claude recommends appropriate chaos engineering tools (e.g., Chaos Mesh, Gremlin, Toxiproxy, AWS FIS) based on the target environment and desired failure types.
  3. Execution and Monitoring: Claude helps configure and run the chaos experiment and monitors key metrics to observe how the system behaves under stress.
  4. Analysis and Recommendations: Claude analyzes the results of the experiment, identifies vulnerabilities, and provides recommendations for improving system resilience.
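The four steps above follow the usual chaos-experiment loop: confirm a steady-state hypothesis, inject a fault, observe, then roll back and check recovery. The Python sketch below shows that loop in tool-agnostic form. `Experiment` and `run_experiment` are hypothetical names, and the `inject`/`rollback` callables are assumptions standing in for whatever tool (Chaos Mesh, Gremlin, Toxiproxy, AWS FIS) actually applies and removes the fault.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical container for one experiment; the callables are supplied by the user."""
    name: str
    steady_state: Callable[[], bool]  # hypothesis check: True while the system looks healthy
    inject: Callable[[], None]        # start the fault (tool-specific in practice)
    rollback: Callable[[], None]      # remove the fault


def run_experiment(exp: Experiment, observations: int = 5, abort_after: int = 2,
                   interval_s: float = 0.0) -> dict:
    # 1. Never inject into a system that is already unhealthy.
    if not exp.steady_state():
        return {"name": exp.name, "status": "skipped", "violations": 0,
                "recovered": False, "hypothesis_held": False}
    exp.inject()
    violations = 0
    try:
        # 2. Observe while the fault is active; stop early once the guardrail trips.
        for _ in range(observations):
            time.sleep(interval_s)
            if not exp.steady_state():
                violations += 1
                if violations >= abort_after:
                    break
    finally:
        # 3. Always remove the fault, even if an observation raises.
        exp.rollback()
    # 4. Confirm the system returns to steady state once the fault is gone.
    recovered = exp.steady_state()
    return {
        "name": exp.name,
        "status": "aborted" if violations >= abort_after else "completed",
        "violations": violations,
        "recovered": recovered,
        "hypothesis_held": violations == 0 and recovered,
    }
```

The `try`/`finally` is the important design choice: whatever happens during observation, the fault is rolled back, and the result distinguishes "the system degraded" (violations) from "the system failed to recover" (`recovered` is False).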

When to Use This Skill

This skill activates when you need to:

  • Design a chaos experiment to test the resilience of a specific service or application.
  • Implement failure injection strategies to simulate real-world outages.
  • Validate the effectiveness of circuit breakers and retry mechanisms.
  • Analyze system behavior under stress and identify potential vulnerabilities.
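One way to validate a circuit breaker is to inject faults into the dependency it guards and assert on the breaker's state transitions: it should trip after repeated failures, fail fast while open, and close again once the dependency recovers. The sketch below is self-contained; `CircuitBreaker`, `flaky_dependency`, and `check_breaker_under_faults` are hypothetical stand-ins for your own breaker and fault injector, and the threshold and timeout values are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without reaching the dependency."""


class CircuitBreaker:
    """Minimal stand-in breaker: closed -> open after N failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cooldown elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or re-trip if a half-open trial fails.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


def flaky_dependency(fail_first: int):
    """Hypothetical fault injector: the first `fail_first` calls raise ConnectionError."""
    calls = {"count": 0}

    def dependency() -> str:
        calls["count"] += 1
        if calls["count"] <= fail_first:
            raise ConnectionError("injected fault")
        return "ok"

    return dependency, calls


def check_breaker_under_faults() -> dict:
    now = [0.0]  # fake clock so the check is deterministic
    breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0, clock=lambda: now[0])
    dependency, calls = flaky_dependency(fail_first=3)
    for _ in range(3):  # inject three consecutive failures
        try:
            breaker.call(dependency)
        except ConnectionError:
            pass
    tripped = breaker.state == "open"
    try:
        breaker.call(dependency)
        failed_fast = False
    except CircuitOpenError:
        failed_fast = True
    calls_while_open = calls["count"]  # unchanged if the open breaker shielded the dependency
    now[0] += 30.0  # let the cooldown elapse
    recovered = breaker.call(dependency) == "ok" and breaker.state == "closed"
    return {"tripped": tripped, "failed_fast": failed_fast,
            "calls_while_open": calls_while_open, "recovered": recovered}
```

Injecting a fake clock keeps the test fast and repeatable; against a real service, the same assertions would be checked while a tool such as Toxiproxy or Chaos Mesh injects the failures.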

Examples

Example 1: Database Failover Testing

User request: "Help me design a chaos experiment to test our database failover process."

The skill will:

  1. Design a chaos experiment involving simulated database failures and automated failover.
  2. Recommend using Chaos Mesh for Kubernetes environments or AWS FIS for AWS-hosted databases.
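For the Kubernetes path, a Chaos Mesh experiment is declared as a custom resource. The Python sketch below builds a `PodChaos` object that kills one pod of the database primary; it assumes Chaos Mesh is already installed in the cluster, and the resource name, namespace (`databases`), and labels (`app: postgres`, `role: primary`) are hypothetical placeholders for whatever your database operator actually uses.

```python
import json
from typing import Dict


def pod_kill_manifest(name: str, namespace: str, labels: Dict[str, str]) -> dict:
    """Build a Chaos Mesh PodChaos resource that kills one pod matching `labels`."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "pod-kill",  # kill the pod; its controller is expected to replace it
            "mode": "one",         # pick a single pod at random from the selection
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": labels,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical labels for the current primary; adjust to your operator's conventions.
    manifest = pod_kill_manifest("db-primary-kill", "databases",
                                 {"app": "postgres", "role": "primary"})
    print(json.dumps(manifest, indent=2))  # kubectl accepts JSON as well as YAML
```

Piping the output into `kubectl apply -f -` starts the experiment. The failover itself is what you then observe: whether a replica is promoted, how long promotion takes, and how many client requests fail in the meantime.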

Example 2: API Latency Simulation

User request: "Create a latency injection test for our API gateway to simulate network congestion."

The skill will:

  1. Design a latency injection test using Toxiproxy to introduce delays in API requests.
  2. Monitor API response times and error rates to assess the impact of latency.
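Toxiproxy is driven through a small HTTP API: you create a proxy between clients and the upstream service, then attach "toxics" such as `latency` to it. The sketch below assumes a Toxiproxy server on its default API address (`localhost:8474`); the proxy name, ports, and upstream host used in the usage note are hypothetical. It also includes a hypothetical `summarize` helper for the monitoring step, computing nearest-rank p50/p95 response times and the error rate from samples you collect while the toxic is active.

```python
import json
import math
import urllib.request
from typing import List

TOXIPROXY_API = "http://localhost:8474"  # Toxiproxy's default API address; adjust as needed


def proxy_payload(name: str, listen: str, upstream: str) -> dict:
    """Body for POST /proxies: Toxiproxy listens on `listen` and forwards to `upstream`."""
    return {"name": name, "listen": listen, "upstream": upstream, "enabled": True}


def latency_toxic_payload(latency_ms: int, jitter_ms: int = 0, toxicity: float = 1.0) -> dict:
    """Body for POST /proxies/{proxy}/toxics: delay responses by latency_ms +/- jitter_ms."""
    return {
        "name": "latency_downstream",
        "type": "latency",
        "stream": "downstream",  # delay data flowing back to the client
        "toxicity": toxicity,    # probability a connection is affected (1.0 = all)
        "attributes": {"latency": latency_ms, "jitter": jitter_ms},
    }


def post_json(path: str, payload: dict, base: str = TOXIPROXY_API) -> dict:
    """POST a JSON body to the Toxiproxy API and return the decoded JSON response."""
    req = urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def percentile(sorted_vals: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def summarize(latencies_ms: List[float], failed: int) -> dict:
    """Percentiles over successful requests plus the error rate over all requests."""
    if not latencies_ms:
        raise ValueError("need at least one successful request")
    s = sorted(latencies_ms)
    return {
        "p50_ms": percentile(s, 50),
        "p95_ms": percentile(s, 95),
        "error_rate": failed / (len(s) + failed),
    }
```

Usage, with hypothetical names: call `post_json("/proxies", proxy_payload("api_gateway", "127.0.0.1:18080", "gateway.internal:8080"))`, point test clients at `127.0.0.1:18080`, then call `post_json("/proxies/api_gateway/toxics", latency_toxic_payload(500, jitter_ms=100))`. Compare `summarize(...)` from a baseline run against a run with the toxic enabled, and remove the toxic afterwards with `DELETE /proxies/api_gateway/toxics/latency_downstream`.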

Best Practices

  • Define Clear Objectives: State the goal of each chaos experiment and the specific system behavior you want to test.
  • Start Small: Begin with small-scale experiments and gradually increase the scope and intensity of the failures.
  • Automate and Monitor: Automate the execution and monitoring of chaos experiments to ensure repeatability and accurate data collection.
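"Start small" and "automate and monitor" combine naturally into an automated ramp: apply the fault to a small share of targets, check an error budget, and only then widen the blast radius. The sketch below is tool-agnostic; `ramp_schedule`, `run_ramp`, and the `run_stage` callback are hypothetical, and `run_stage` is assumed to apply the fault to the given percentage of targets (for example, via Chaos Mesh's `fixed-percent` selection mode) and return the observed error rate.

```python
from typing import Callable, Iterator


def ramp_schedule(start_pct: float = 1, max_pct: float = 50, factor: float = 2) -> Iterator[float]:
    """Yield blast-radius percentages: start small, grow geometrically, end at max_pct."""
    if start_pct <= 0 or factor <= 1:
        raise ValueError("start_pct must be > 0 and factor must be > 1")
    pct = start_pct
    while pct < max_pct:
        yield pct
        pct *= factor
    yield max_pct


def run_ramp(run_stage: Callable[[float], float], error_budget: float,
             start_pct: float = 1, max_pct: float = 50) -> dict:
    """Run each stage in order and stop at the first one whose error rate exceeds the budget."""
    results = []
    for pct in ramp_schedule(start_pct, max_pct):
        error_rate = run_stage(pct)  # hypothetical callback: inject at pct%, return error rate
        results.append((pct, error_rate))
        if error_rate > error_budget:
            return {"passed": False, "stopped_at": pct, "results": results}
    return {"passed": True, "stopped_at": None, "results": results}
```

With the defaults the stages are 1, 2, 4, 8, 16, 32, and 50 percent. Stopping at the first breached stage keeps the damage bounded, and the recorded `results` show exactly where the system's tolerance ran out.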

Integration

This skill integrates with various chaos engineering tools, allowing Claude to orchestrate failure injection, latency simulation, and resource exhaustion testing across different environments. It can also be used in conjunction with monitoring tools to track system behavior and identify potential vulnerabilities.

GitHub Repository

jeremylongshore/claude-code-plugins-plus
Path: backups/plugin-enhancements/plugin-backups/chaos-engineering-toolkit_20251019_155039/skills/skill-adapter
ai · automation · claude-code · devops · marketplace · mcp
