infrastructure-cost-optimization
About
This skill helps developers optimize cloud infrastructure costs through resource rightsizing, reserved instances, and spot instance integration. It is designed for use in cloud cost reduction, budget management, and identifying and eliminating resource waste. The skill provides actionable strategies and implementation examples for continuous cost optimization without sacrificing performance.
Documentation
Infrastructure Cost Optimization
Overview
Reduce infrastructure costs through intelligent resource allocation, reserved instances, spot instances, and continuous optimization without sacrificing performance.
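Reserved capacity only pays off if the instance would otherwise run most of the time. The sketch below shows the break-even arithmetic. It assumes a simplified no-upfront reservation that bills every hour of the term at a flat discounted rate. Real Reserved Instance pricing also depends on term, payment option, and region. The function names and the $0.10/hour price are hypothetical.

```python
"""Reserved-instance break-even sketch (simplified no-upfront pricing model)."""


def break_even_utilization(discount: float) -> float:
    """Fraction of hours an instance must run for a reservation to pay off.

    A reservation bills every hour of the term at (1 - discount) x the
    on-demand rate, while on-demand bills only the hours actually used,
    so the two costs are equal when utilization == 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(on_demand_hourly: float, discount: float,
                   utilization: float, hours: float = 730) -> str:
    """Compare one month of on-demand vs. reserved cost for a single instance."""
    on_demand_cost = on_demand_hourly * hours * utilization
    reserved_cost = on_demand_hourly * (1 - discount) * hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical $0.10/hour instance with a 40% reservation discount
    print(break_even_utilization(0.40))      # about 0.6
    print(cheaper_option(0.10, 0.40, 0.90))  # reserved
    print(cheaper_option(0.10, 0.40, 0.50))  # on-demand
```

With a 40% discount, a reservation beats on-demand only when the instance runs more than about 60% of the hours. This is why reservations suit steady baseline load rather than bursty capacity.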
When to Use
- Cloud cost reduction
- Budget management and tracking
- Resource utilization optimization
- Multi-environment cost allocation
- Waste identification and elimination
- Reserved instance planning
- Spot instance integration
Implementation Examples
1. AWS Cost Optimization Configuration
# cost-optimization-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-scripts
  namespace: operations
data:
  analyze-costs.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "=== AWS Cost Analysis ==="

    # Daily cost trend (no --group-by, so Total is populated; uses GNU date)
    echo "Daily costs for last 7 days:"
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
      --granularity DAILY \
      --metrics "BlendedCost" \
      --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
      --output table

    # Find unattached resources
    echo -e "\n=== Unattached EBS Volumes ==="
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

    # Unassociated Elastic IPs have no AssociationId field
    echo -e "\n=== Unattached Elastic IPs ==="
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
      --output table

    # Lists running databases; check CloudWatch (e.g. DatabaseConnections) to confirm which are idle
    echo -e "\n=== RDS Instances to Review for Idle Usage ==="
    aws rds describe-db-instances \
      --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,DBInstanceClass,Engine,AllocatedStorage]' \
      --output table

    # Estimate savings with Reserved Instances
    echo -e "\n=== Reserved Instance Savings Potential ==="
    aws ce get-reservation-purchase-recommendation \
      --service "Amazon Elastic Compute Cloud - Compute" \
      --lookback-period-in-days THIRTY_DAYS \
      --query 'Recommendations[0].[RecommendationSummary.TotalEstimatedMonthlySavingsAmount,RecommendationSummary.TotalEstimatedMonthlySavingsPercentage]' \
      --output table

  optimize-resources.sh: |
    #!/bin/bash
    set -euo pipefail

    # Deletes and releases resources. Defaults to a dry run; set DRY_RUN=false to apply.
    DRY_RUN="${DRY_RUN:-true}"
    run() {
      if [ "$DRY_RUN" = "true" ]; then echo "[dry-run] $*"; else "$@"; fi
    }

    echo "Starting resource optimization..."

    # Remove unattached volumes. Text output is tab-separated, so split it
    # into one ID per line before reading.
    echo "Removing unattached volumes..."
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].VolumeId' \
      --output text | tr '\t' '\n' | \
    while read -r volume_id; do
      [ -n "$volume_id" ] || continue
      echo "Deleting volume: $volume_id"
      run aws ec2 delete-volume --volume-id "$volume_id" || echo "Failed: $volume_id" >&2
    done

    # Release unused Elastic IPs
    echo "Releasing unused Elastic IPs..."
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].AllocationId' \
      --output text | tr '\t' '\n' | \
    while read -r alloc_id; do
      [ -n "$alloc_id" ] || continue
      echo "Releasing EIP: $alloc_id"
      run aws ec2 release-address --allocation-id "$alloc_id" || echo "Failed: $alloc_id" >&2
    done

    # Downsize RDS: not implemented. Review CloudWatch metrics (CPUUtilization,
    # DatabaseConnections) first, then resize with `aws rds modify-db-instance`.
    echo "Analyzing RDS for downsizing..."

    echo "Optimization complete"
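# Terraform cost optimization

# Spot instance for non-critical workloads
resource "aws_instance" "spot" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price          = "0.05" # Maximum USD per hour you are willing to pay
      spot_instance_type = "persistent"
      # Persistent requests launched with the instance must stop or hibernate on interruption
      instance_interruption_behavior = "stop"
      valid_until                    = "2025-12-31T23:59:59Z" # Must be a future timestamp
    }
  }

  tags = {
    Name       = "spot-instance"
    CostCenter = "engineering"
  }
}

# Baseline capacity billed at Reserved Instance rates. An RI is a billing discount that
# AWS applies automatically to running instances with matching attributes (instance
# type or family, region, platform, tenancy); tags play no part in RI matching.
resource "aws_instance" "reserved" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name            = "reserved-instance"
    ReservationType = "reserved" # For cost reporting only
  }
}

# Mixed on-demand/spot fleet (assumes aws_launch_template.app is defined elsewhere)
resource "aws_ec2_fleet" "mixed" {
  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.app.id
      version            = "$Latest"
    }

    # weighted_capacity: units each instance counts toward the targets below
    # priority: lower number = preferred
    override {
      instance_type     = "t3.medium"
      weighted_capacity = 1
      priority          = 1
    }
    override {
      instance_type     = "t3.large"
      weighted_capacity = 2
      priority          = 2
    }
    override {
      instance_type     = "t3a.medium"
      weighted_capacity = 1
      priority          = 3
    }
    override {
      instance_type     = "t3a.large"
      weighted_capacity = 2
      priority          = 4
    }
  }

  # Priorities drive on-demand launches only with the "prioritized" strategy;
  # spot capacity honors them only with a capacity-optimized-prioritized strategy.
  on_demand_options {
    allocation_strategy = "prioritized"
  }

  target_capacity_specification {
    total_target_capacity        = 10
    on_demand_target_capacity    = 6 # Size this to your Reserved Instance coverage
    spot_target_capacity         = 4
    default_target_capacity_type = "on-demand"
  }

  type = "maintain"

  tags = {
    Name = "mixed-capacity"
  }
}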
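In an EC2 Fleet, `weighted_capacity` sets how many capacity units each instance contributes toward `total_target_capacity`. The 10-unit target above is therefore not necessarily 10 instances. The sketch below tallies units and hourly cost for one hypothetical fleet composition. The `FleetSlot` type is illustrative only. The hourly prices are placeholders, not quoted AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetSlot:
    """One group of identical instances in a fleet (illustrative type)."""
    instance_type: str
    market: str          # "on-demand" or "spot"
    weight: int          # capacity units per instance (weighted_capacity)
    hourly_price: float  # hypothetical USD per instance-hour
    count: int           # number of instances running


def capacity_units(slots, market=None):
    """Sum capacity units, optionally restricted to one market type."""
    return sum(s.weight * s.count for s in slots if market in (None, s.market))


def hourly_cost(slots):
    """Total hourly cost of all running instances."""
    return sum(s.hourly_price * s.count for s in slots)


if __name__ == "__main__":
    fleet = [
        FleetSlot("t3.medium", "on-demand", 1, 0.0416, 2),
        FleetSlot("t3.large", "on-demand", 2, 0.0832, 2),
        FleetSlot("t3a.medium", "spot", 1, 0.0150, 4),
    ]
    print(capacity_units(fleet, "on-demand"))  # 6
    print(capacity_units(fleet, "spot"))       # 4
    print(round(hourly_cost(fleet), 4))        # 0.3096
```

Here 6 on-demand units come from only 4 instances, because the two `t3.large` instances each count as 2 units.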
2. Kubernetes Cost Optimization
# k8s-cost-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-policies
  namespace: kube-system
data:
  policies.yaml: |
    # Resource quotas per namespace. The scopeSelector means only pods whose
    # priorityClassName is "high" or "medium" count against this quota.
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: "200Gi"
        limits.cpu: "200"
        limits.memory: "400Gi"
        pods: "500"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high", "medium"]
---
# Pod Disruption Budget: keeps at least one backend pod available during
# voluntary disruptions such as node drains when the cluster scales down
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cost-optimized-pdb
  namespace: production
spec:
  minAvailable: 1
  selector:
    matchLabels:
      tier: backend
---
# Taint spot/preemptible nodes so only interruption-tolerant workloads land there.
# Illustrative: taints are normally set on the node pool or with `kubectl taint`,
# because nodes register themselves rather than being created from a manifest.
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
    - key: cloud.google.com/gke-preemptible
      value: "true"
      effect: NoSchedule
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-optimized-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Tolerate the spot/preemptible taint above
      tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
      # Prefer (but do not require) spot nodes. karpenter.sh/capacity-type is set
      # by Karpenter on AWS; on GKE, match cloud.google.com/gke-spot instead.
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
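ResourceQuota totals are checked against the sum of requests of all matching pods in the namespace. A quick capacity check is therefore "how many replicas fit". Note that the quota's `scopeSelector` only counts pods whose priority class is `high` or `medium`. The Deployment needs a matching `priorityClassName` to be governed by it. The sketch below assumes this Deployment is the only consumer of the quota. It parses only a simplified subset of Kubernetes quantity syntax (`m` for CPU and binary `Ki`/`Mi`/`Gi`/`Ti` for memory). The helper names are hypothetical.

```python
"""How many replicas fit a ResourceQuota (simplified quantity parsing)."""

# Binary memory suffixes only; decimal suffixes (k, M, G) and exponents are not handled.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "200Gi" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def max_replicas(cpu_request: str, memory_request: str,
                 quota_cpu: str, quota_memory: str) -> int:
    """Largest replica count whose total requests stay within the quota."""
    by_cpu = cpu_millicores(quota_cpu) // cpu_millicores(cpu_request)
    by_memory = memory_bytes(quota_memory) // memory_bytes(memory_request)
    return min(by_cpu, by_memory)


if __name__ == "__main__":
    # Requests 100m / 128Mi against the requests.cpu "100" / requests.memory "200Gi" quota
    print(max_replicas("100m", "128Mi", "100", "200Gi"))  # 1000
```

CPU is the binding constraint here: 1000 replicas by CPU versus 1600 by memory. Raising the CPU request is therefore the first thing to shrink headroom.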
3. Cost Monitoring Dashboard
# cost-monitoring.py
from datetime import datetime, timedelta, timezone

import boto3


class CostOptimizer:
    def __init__(self):
        self.ce_client = boto3.client('ce')
        self.ec2_client = boto3.client('ec2')
        self.rds_client = boto3.client('rds')
        self.cloudwatch = boto3.client('cloudwatch')

    def get_daily_costs(self, days=30):
        """Get daily costs for the past N days, grouped by service."""
        end_date = datetime.now(timezone.utc).date()
        start_date = end_date - timedelta(days=days)
        # Cost Explorer treats End as exclusive
        return self.ce_client.get_cost_and_usage(
            TimePeriod={'Start': str(start_date), 'End': str(end_date)},
            Granularity='DAILY',
            Metrics=['BlendedCost'],
            GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
        )

    def find_underutilized_instances(self, cpu_threshold=10.0):
        """Find running EC2 instances averaging below cpu_threshold % CPU over 7 days."""
        instances = []
        end_time = datetime.now(timezone.utc)
        paginator = self.ec2_client.get_paginator('describe_instances')
        pages = paginator.paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for page in pages:
            for reservation in page['Reservations']:
                for instance in reservation['Instances']:
                    instance_id = instance['InstanceId']
                    # Hourly average CPU utilization over the last 7 days
                    response = self.cloudwatch.get_metric_statistics(
                        Namespace='AWS/EC2',
                        MetricName='CPUUtilization',
                        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                        StartTime=end_time - timedelta(days=7),
                        EndTime=end_time,
                        Period=3600,
                        Statistics=['Average'],
                    )
                    datapoints = response['Datapoints']
                    if not datapoints:
                        continue
                    avg_cpu = sum(d['Average'] for d in datapoints) / len(datapoints)
                    if avg_cpu < cpu_threshold:
                        instances.append({
                            'InstanceId': instance_id,
                            'Type': instance['InstanceType'],
                            'AverageCPU': avg_cpu,
                            'Recommendation': 'Downsize or terminate',
                        })
        return instances

    def estimate_reserved_instance_savings(self):
        """Sum estimated monthly savings across EC2 Reserved Instance recommendations."""
        response = self.ce_client.get_reservation_purchase_recommendation(
            Service='Amazon Elastic Compute Cloud - Compute',
            LookbackPeriodInDays='THIRTY_DAYS',
            PageSize=100,
        )
        total_savings = 0.0
        for recommendation in response.get('Recommendations', []):
            summary = recommendation.get('RecommendationSummary', {})
            total_savings += float(summary.get('TotalEstimatedMonthlySavingsAmount', 0))
        return total_savings

    def generate_report(self):
        """Print a cost optimization report."""
        print("=== Cost Optimization Report ===\n")

        # Daily costs: with GroupBy, each day's amounts are in Groups and Total is empty
        print("Daily Costs:")
        costs = self.get_daily_costs(7)
        for result in costs['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = sum(float(g['Metrics']['BlendedCost']['Amount']) for g in result['Groups'])
            print(f"  {date}: ${total:.2f}")

        # Underutilized instances
        print("\nUnderutilized Instances:")
        for instance in self.find_underutilized_instances():
            print(f"  {instance['InstanceId']}: {instance['AverageCPU']:.1f}% CPU - {instance['Recommendation']}")

        # Reserved instance savings
        print("\nReserved Instance Savings Potential:")
        savings = self.estimate_reserved_instance_savings()
        print(f"  Estimated Monthly Savings: ${savings:.2f}")


# Usage
if __name__ == '__main__':
    optimizer = CostOptimizer()
    optimizer.generate_report()
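Because `get_daily_costs` groups by `SERVICE`, the same response also shows where spend concentrates. The sketch below works on the response dictionary alone, so it runs without AWS credentials. `cost_by_service` is a hypothetical helper, and the sample data is invented to match the shape of a grouped `get_cost_and_usage` response.

```python
"""Rank services by cost from a SERVICE-grouped get_cost_and_usage response."""
from collections import defaultdict


def cost_by_service(response: dict, metric: str = "BlendedCost"):
    """Return (service, total) pairs sorted from most to least expensive."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented two-day sample in the grouped response shape
    sample = {
        "ResultsByTime": [
            {"TimePeriod": {"Start": "2024-06-01", "End": "2024-06-02"},
             "Groups": [
                 {"Keys": ["Amazon Simple Storage Service"],
                  "Metrics": {"BlendedCost": {"Amount": "1.25", "Unit": "USD"}}},
                 {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                  "Metrics": {"BlendedCost": {"Amount": "10.5", "Unit": "USD"}}},
             ]},
            {"TimePeriod": {"Start": "2024-06-02", "End": "2024-06-03"},
             "Groups": [
                 {"Keys": ["Amazon Simple Storage Service"],
                  "Metrics": {"BlendedCost": {"Amount": "0.75", "Unit": "USD"}}},
                 {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                  "Metrics": {"BlendedCost": {"Amount": "9.5", "Unit": "USD"}}},
             ]},
        ]
    }
    for service, total in cost_by_service(sample):
        print(f"{service}: ${total:.2f}")
```

Against a live account, you would pass `CostOptimizer().get_daily_costs(30)` instead of `sample`.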
Cost Optimization Strategies
✅ DO
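- Use reserved instances for steady baseline capacity
- Use spot instances for fault-tolerant, interruptible workloads
- Right-size resources based on measured utilization
- Monitor cost trends
- Implement auto-scaling
- Compare pricing across regions
- Tag resources consistently
- Schedule non-essential resources (for example, stop development environments outside business hours)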
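Scheduling non-essential resources means stopping them when nobody uses them. The sketch below shows the decision logic only. It assumes a weekday 08:00-20:00 schedule in the resources' local time zone. A real scheduler (for example, a cron job calling `aws ec2 stop-instances`) would act on the result. `should_run` and `weekly_running_hours` are hypothetical names. Stopped EC2 instances stop accruing compute charges, but their attached EBS volumes are still billed.

```python
"""Office-hours schedule check for non-essential (e.g. development) resources."""
from datetime import datetime, timedelta


def should_run(moment: datetime, start_hour: int = 8, end_hour: int = 20) -> bool:
    """True on weekdays from start_hour (inclusive) to end_hour (exclusive)."""
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour


def weekly_running_hours(start_hour: int = 8, end_hour: int = 20) -> int:
    """Count the hours per week the schedule keeps a resource running."""
    monday = datetime(2024, 6, 3)  # any Monday works as the reference week
    return sum(
        should_run(monday + timedelta(hours=h), start_hour, end_hour)
        for h in range(7 * 24)
    )


if __name__ == "__main__":
    hours = weekly_running_hours()
    print(hours)                                    # 60
    print(f"{1 - hours / 168:.0%} of hours saved")  # 64% of hours saved
```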
❌ DON'T
- Over-provision resources
- Ignore unused resources
- Neglect cost monitoring
- Run everything on-demand
- Forget to release unused Elastic IPs
- Mix cost centers without clear allocation tags
- Ignore savings opportunities
- Deploy without budgets
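Deploying with a budget is most useful when you learn early that a month is trending over it. The sketch below assumes spend accrues evenly across the month. Real bills are often front-loaded by upfront or monthly fees. It also assumes `spent_to_date` covers day 1 through `today`. AWS Budgets can alert on forecasted spend natively, so this only illustrates the arithmetic. The function names and the 80% warning ratio are hypothetical.

```python
"""Linear month-end spend projection for a simple budget check."""
import calendar
from datetime import date


def projected_month_end_spend(spent_to_date: float, today: date) -> float:
    """Extrapolate spend so far (days 1..today.day) to the whole month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spent_to_date / today.day * days_in_month


def budget_status(spent_to_date: float, budget: float, today: date,
                  warn_ratio: float = 0.8) -> str:
    """Return "over", "warning", or "ok" based on the projected spend."""
    projected = projected_month_end_spend(spent_to_date, today)
    if projected > budget:
        return "over"
    if projected > warn_ratio * budget:
        return "warning"
    return "ok"


if __name__ == "__main__":
    today = date(2024, 6, 10)
    print(projected_month_end_spend(300.0, today))  # 900.0
    print(budget_status(300.0, 1000.0, today))      # warning
```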
Cost Saving Opportunities
- Reserved Instances: 40-70% savings vs. on-demand
- Spot Instances: 70-90% savings vs. on-demand
- Committed Use Discounts (GCP): 25-55% savings vs. on-demand
- Right-sizing: 10-30% savings
- Resource cleanup: 5-20% savings
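Each discount range applies only to the share of usage that pricing model covers, not to the whole bill. The worked sketch below uses the low end of the reserved (40%) and spot (70%) ranges and a hypothetical 60/20/20 split of a $10,000 on-demand-equivalent monthly bill. `blended_cost` is an illustrative helper, not a provider API.

```python
"""Blended cost of a workload split across pricing models."""


def blended_cost(on_demand_equivalent: float, mix) -> float:
    """Cost after discounts.

    mix is a list of (share_of_usage, discount) pairs whose shares sum to 1;
    for example, (0.6, 0.40) means 60% of usage is covered at a 40% discount.
    """
    if abs(sum(share for share, _ in mix) - 1) > 1e-9:
        raise ValueError("usage shares must sum to 1")
    return on_demand_equivalent * sum(share * (1 - discount) for share, discount in mix)


if __name__ == "__main__":
    # 60% reserved at 40% off, 20% spot at 70% off, 20% left on-demand
    cost = blended_cost(10_000, [(0.6, 0.40), (0.2, 0.70), (0.2, 0.0)])
    print(f"${cost:,.0f} ({1 - cost / 10_000:.0%} saved)")  # $6,200 (38% saved)
```

Even with aggressive per-model discounts, the overall saving (38% here) lands well below the headline spot rate, because part of the usage stays on-demand.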
Resources
Quick Install
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/infrastructure-cost-optimization
Copy and paste this command in Claude Code to install this skill.
