autoscaling-configuration
About
This Claude Skill configures autoscaling for Kubernetes, VMs, and serverless workloads to automatically adjust resource capacity based on demand. It enables scaling driven by metrics, schedules, and custom indicators to optimize performance and cost-efficiency. Use it for traffic-driven scaling, handling high-traffic events, and resource utilization optimization.
Documentation
Autoscaling Configuration
Overview
Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
When to Use
- Traffic-driven workload scaling
- Time-based scheduled scaling
- Resource utilization optimization
- Cost reduction
- High-traffic event handling
- Batch processing optimization
- Database connection pooling
Implementation Examples
1. Kubernetes Horizontal Pod Autoscaler
```yaml
# hpa-configuration.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
---
# Vertical Pod Autoscaler for resource optimization
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory
```
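Before relying on the HPA above, confirm the metrics pipeline is actually feeding it. A quick check sketch, assuming metrics-server is installed in kube-system (its default location) and using the names from the manifest above:

```bash
# HPA resource metrics come from metrics-server; confirm it is running.
kubectl get deployment metrics-server -n kube-system

# If metrics flow, current pod usage is visible here.
kubectl top pods -n production

# Current vs. target utilization, replica count, and recent scaling events.
kubectl get hpa myapp-hpa -n production
kubectl describe hpa myapp-hpa -n production
```

One caveat: running the VPA in "Auto" mode alongside an HPA that also scales on CPU and memory lets the two controllers fight each other; a common compromise is to set the VPA's updateMode to "Off" and use it for recommendations only.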
2. AWS Auto Scaling
```yaml
# aws-autoscaling.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    # TARGET_MEMORY is defined for completeness but unused below: ASGs have no
    # predefined memory metric; memory-based target tracking needs a custom
    # CloudWatch metric.
    TARGET_MEMORY=80

    echo "Creating Auto Scaling Group..."

    # Create launch template (|| true tolerates a template that already exists).
    # NOTE: AWS requires UserData in launch template JSON to be base64-encoded;
    # encode the script before real use.
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "#!/bin/bash\ncd /app && docker-compose up -d",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU target-tracking scaling policy
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
```
A CronJob supports exactly one schedule, so the weekday scale-up and scale-down jobs are defined as two separate CronJob resources (both assume the pod can reach AWS with valid credentials, for example via IRSA):

```yaml
# Scale up to 10 instances at 8 AM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
# Scale down to 3 instances at 6 PM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
```
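Once setup-asg.sh has run, it is worth confirming the group and its target-tracking policy exist and watching the first scaling activities. A sketch, assuming the AWS CLI is configured with credentials for the right account:

```bash
# Inspect the group, its scaling policies, and recent scaling activity.
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names myapp-asg
aws autoscaling describe-policies --auto-scaling-group-name myapp-asg
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name myapp-asg --max-items 5
```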
3. Custom Metrics Autoscaling
```yaml
# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"
---
# Prometheus ServiceMonitor for custom metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
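Pods-type metrics such as job_queue_depth and http_requests_per_second are not built into Kubernetes; this HPA assumes a custom metrics adapter (for example, prometheus-adapter) is translating the scraped Prometheus series into the custom metrics API. A quick way to check what the adapter actually exposes (jq is assumed to be installed):

```bash
# Is anything serving the custom metrics API?
kubectl get apiservices | grep custom.metrics

# List every metric the adapter exposes.
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

# Fetch the per-pod values the HPA will average.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
```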
4. Autoscaling Script
```bash
#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF

echo "HPA created successfully"

# Monitor autoscaling
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
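To see the script's HPA react end to end, run it against a non-production namespace and generate load; a sketch, where the Service name myapp and its in-cluster URL are assumptions about your Deployment:

```bash
# Create the HPA in staging for the myapp Deployment.
./autoscaling-setup.sh staging myapp

# Hypothetical load generator: hammer the service in a loop from a busybox pod.
kubectl run load-generator --image=busybox:1.36 --restart=Never -n staging \
  -- /bin/sh -c "while true; do wget -q -O- http://myapp.staging.svc.cluster.local; done"

# In another terminal, watch replicas climb once CPU crosses the 70% target.
kubectl get hpa myapp-hpa -n staging -w
```

Remember to delete the load generator afterwards (kubectl delete pod load-generator -n staging).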
5. Monitoring Autoscaling
```yaml
# autoscaling-monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
      - name: autoscaling
        rules:
          # kube_hpa_* metrics come from kube-state-metrics; v2 renames them to
          # kube_horizontalpodautoscaler_*; adjust to your installed version.
          - alert: HpaMaxedOut
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and
              kube_hpa_status_desired_replicas == kube_hpa_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.hpa }} is at maximum replicas"
          - alert: HpaAtMinReplicas
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and
              kube_hpa_status_desired_replicas == kube_hpa_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.hpa }} is at minimum replicas"
          # Requires an exporter publishing ASG metrics (e.g., a CloudWatch exporter).
          - alert: AsgCapacityLow
            expr: |
              aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
```
Best Practices
✅ DO
- Set appropriate min/max replicas
- Monitor metric aggregation window
- Implement cooldown periods
- Use multiple metrics
- Test scaling behavior
- Monitor scaling events (see the event-listing sketch after these lists)
- Plan for peak loads
- Implement fallback strategies
❌ DON'T
- Set min replicas to 1
- Scale too aggressively
- Ignore cooldown periods
- Use single metric only
- Forget to test scaling
- Scale below resource needs
- Neglect monitoring
- Deploy without capacity tests
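For the "monitor scaling events" item above, HPA decisions are recorded as ordinary Kubernetes events; a sketch for reviewing them, reusing the production namespace from the earlier examples:

```bash
# List recent events emitted by any HPA, oldest first.
kubectl get events -n production \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler \
  --sort-by=.lastTimestamp
```

Frequent alternating scale-up and scale-down events here usually mean the stabilization windows or cooldowns need lengthening.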
Scaling Metrics
- CPU Utilization: Most common metric
- Memory Utilization: Heap-bound applications
- Request Rate: API-driven scaling
- Queue Depth: Async job processing
- Custom Metrics: Business-specific indicators
Resources
Quick Install
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/autoscaling-configuration
Copy and paste this command in Claude Code to install this skill.
GitHub Repository
Related Skills
subagent-driven-development
Development: This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.
algorithmic-art
Meta: This Claude Skill creates original algorithmic art using p5.js with seeded randomness and interactive parameters. It generates .md files for algorithmic philosophies, plus .html and .js files for interactive generative art implementations. Use it when developers need to create flow fields, particle systems, or other computational art while avoiding copyright issues.
executing-plans
Design: Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.
cost-optimization
Other: This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.
