autoscaling-configuration
About
This Claude Skill configures autoscaling for Kubernetes, VMs, and serverless workloads to automatically adjust resource capacity based on demand. It enables scaling driven by metrics, schedules, and custom indicators to optimize performance and cost-efficiency. Use it for traffic-driven scaling, handling high-traffic events, and resource utilization optimization.
Documentation
Autoscaling Configuration
Overview
Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
When to Use
- Traffic-driven workload scaling
- Time-based scheduled scaling
- Resource utilization optimization
- Cost reduction
- High-traffic event handling
- Batch processing optimization
- Database connection pooling
Implementation Examples
1. Kubernetes Horizontal Pod Autoscaler
```yaml
# hpa-configuration.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
---
# Vertical Pod Autoscaler for resource optimization
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory
```
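Before relying on the HPA above, confirm the metrics pipeline is actually feeding it. A quick check sketch, assuming metrics-server is installed in kube-system (its default location) and using the names from the manifest above:

```bash
# HPA resource metrics come from metrics-server; confirm it is running.
kubectl get deployment metrics-server -n kube-system

# If metrics flow, current pod usage is visible here.
kubectl top pods -n production

# Current vs. target utilization, replica count, and recent scaling events.
kubectl get hpa myapp-hpa -n production
kubectl describe hpa myapp-hpa -n production
```

One caveat: running the VPA in "Auto" mode alongside an HPA that also scales on CPU and memory lets the two controllers fight each other; a common compromise is to set the VPA's updateMode to "Off" and use it for recommendations only.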
2. AWS Auto Scaling
```yaml
# aws-autoscaling.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    # TARGET_MEMORY is defined for completeness but unused below: ASGs have no
    # predefined memory metric; memory-based target tracking needs a custom
    # CloudWatch metric.
    TARGET_MEMORY=80

    echo "Creating Auto Scaling Group..."

    # Create launch template (|| true tolerates a template that already exists).
    # NOTE: AWS requires UserData in launch template JSON to be base64-encoded;
    # encode the script before real use.
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "#!/bin/bash\ncd /app && docker-compose up -d",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU target-tracking scaling policy
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
```
A CronJob supports exactly one schedule, so the weekday scale-up and scale-down jobs are defined as two separate CronJob resources (both assume the pod can reach AWS with valid credentials, for example via IRSA):

```yaml
# Scale up to 10 instances at 8 AM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
# Scale down to 3 instances at 6 PM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
```
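Once setup-asg.sh has run, it is worth confirming the group and its target-tracking policy exist and watching the first scaling activities. A sketch, assuming the AWS CLI is configured with credentials for the right account:

```bash
# Inspect the group, its scaling policies, and recent scaling activity.
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names myapp-asg
aws autoscaling describe-policies --auto-scaling-group-name myapp-asg
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name myapp-asg --max-items 5
```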
3. Custom Metrics Autoscaling
```yaml
# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"
---
# Prometheus ServiceMonitor for custom metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
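Pods-type metrics such as job_queue_depth and http_requests_per_second are not built into Kubernetes; this HPA assumes a custom metrics adapter (for example, prometheus-adapter) is translating the scraped Prometheus series into the custom metrics API. A quick way to check what the adapter actually exposes (jq is assumed to be installed):

```bash
# Is anything serving the custom metrics API?
kubectl get apiservices | grep custom.metrics

# List every metric the adapter exposes.
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

# Fetch the per-pod values the HPA will average.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
```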
4. Autoscaling Script
```bash
#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF

echo "HPA created successfully"

# Monitor autoscaling
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
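To see the script's HPA react end to end, run it against a non-production namespace and generate load; a sketch, where the Service name myapp and its in-cluster URL are assumptions about your Deployment:

```bash
# Create the HPA in staging for the myapp Deployment.
./autoscaling-setup.sh staging myapp

# Hypothetical load generator: hammer the service in a loop from a busybox pod.
kubectl run load-generator --image=busybox:1.36 --restart=Never -n staging \
  -- /bin/sh -c "while true; do wget -q -O- http://myapp.staging.svc.cluster.local; done"

# In another terminal, watch replicas climb once CPU crosses the 70% target.
kubectl get hpa myapp-hpa -n staging -w
```

Remember to delete the load generator afterwards (kubectl delete pod load-generator -n staging).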
5. Monitoring Autoscaling
```yaml
# autoscaling-monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
      - name: autoscaling
        rules:
          # kube_hpa_* metrics come from kube-state-metrics; v2 renames them to
          # kube_horizontalpodautoscaler_*; adjust to your installed version.
          - alert: HpaMaxedOut
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and
              kube_hpa_status_desired_replicas == kube_hpa_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.hpa }} is at maximum replicas"
          - alert: HpaAtMinReplicas
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and
              kube_hpa_status_desired_replicas == kube_hpa_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.hpa }} is at minimum replicas"
          # Requires an exporter publishing ASG metrics (e.g., a CloudWatch exporter).
          - alert: AsgCapacityLow
            expr: |
              aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
```
Best Practices
✅ DO
- Set appropriate min/max replicas
- Monitor metric aggregation window
- Implement cooldown periods
- Use multiple metrics
- Test scaling behavior
- Monitor scaling events (see the event-listing sketch after these lists)
- Plan for peak loads
- Implement fallback strategies
❌ DON'T
- Set min replicas to 1
- Scale too aggressively
- Ignore cooldown periods
- Use single metric only
- Forget to test scaling
- Scale below resource needs
- Neglect monitoring
- Deploy without capacity tests
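For the "monitor scaling events" item above, HPA decisions are recorded as ordinary Kubernetes events; a sketch for reviewing them, reusing the production namespace from the earlier examples:

```bash
# List recent events emitted by any HPA, oldest first.
kubectl get events -n production \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler \
  --sort-by=.lastTimestamp
```

Frequent alternating scale-up and scale-down events here usually mean the stabilization windows or cooldowns need lengthening.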
Scaling Metrics
- CPU Utilization: Most common metric
- Memory Utilization: Heap-bound applications
- Request Rate: API-driven scaling
- Queue Depth: Async job processing
- Custom Metrics: Business-specific indicators
Resources
Quick Install
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/autoscaling-configuration
Copy and paste this command in Claude Code to install this skill.
GitHub Repository
Related Skills
subagent-driven-development
Development: This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.
algorithmic-art
Meta: This Claude Skill creates original algorithmic art using p5.js with seeded randomness and interactive parameters. It generates .md files for algorithmic philosophies, plus .html and .js files for interactive generative art implementations. Use it when developers need to create flow fields, particle systems, or other computational art while avoiding copyright issues.
executing-plans
Design: Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.
cost-optimization
Other: This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.
