infrastructure-cost-optimization

aj-geddes

About

This skill helps developers optimize cloud infrastructure costs through resource rightsizing, reserved instances, and spot instance integration. It is designed for use in cloud cost reduction, budget management, and identifying and eliminating resource waste. The skill provides actionable strategies and implementation examples for continuous cost optimization without sacrificing performance.

Documentation

Infrastructure Cost Optimization

Overview

Reduce infrastructure costs through intelligent resource allocation, reserved instances, spot instances, and continuous optimization without sacrificing performance.

When to Use

Cloud cost reduction
Budget management and tracking
Resource utilization optimization
Multi-environment cost allocation
Waste identification and elimination
Reserved instance planning
Spot instance integration

Implementation Examples

1. AWS Cost Optimization Configuration

# cost-optimization-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-scripts
  namespace: operations
data:
  analyze-costs.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "=== AWS Cost Analysis ==="

    # Get daily cost trend (requires GNU date for the -d option).
    # No --group-by here: with grouping, Total is empty and the query returns nulls.
    echo "Daily costs for last 7 days:"
    aws ce get-cost-and-usage \
      --time-period "Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d)" \
      --granularity DAILY \
      --metrics "BlendedCost" \
      --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
      --output table

    # Find unattached resources
    echo -e "\n=== Unattached EBS Volumes ==="
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

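    echo -e "\n=== Unattached Elastic IPs ==="
    # A filter of association-id=none matches nothing, so select addresses
    # that have no AssociationId on the client side instead.
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
      --output table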

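    # This lists every available RDS instance, not only idle ones. Check
    # CloudWatch metrics such as DatabaseConnections before treating any as unused.
    echo -e "\n=== RDS Instances to Review ==="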
    aws rds describe-db-instances \
      --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,DBInstanceClass,Engine,AllocatedStorage]' \
      --output table

    # Estimate savings with Reserved Instances
    echo -e "\n=== Reserved Instance Savings Potential ==="
    aws ce get-reservation-purchase-recommendation \
      --service "Amazon Elastic Compute Cloud - Compute" \
      --lookback-period THIRTY_DAYS \
      --query 'Recommendations[0].[RecommendationSummary.TotalEstimatedMonthlySavingsAmount,RecommendationSummary.TotalEstimatedMonthlySavingsPercentage]' \
      --output table

  optimize-resources.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "Starting resource optimization..."

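    # Remove unattached volumes (destructive: review the analysis output first)
    echo "Removing unattached volumes..."
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].VolumeId' \
      --output text | \
    tr '\t' '\n' | \
    while read -r volume_id; do
      # Text output puts all IDs on one tab-separated line, hence the tr above
      [ -n "$volume_id" ] || continue
      echo "Deleting volume: $volume_id"
      aws ec2 delete-volume --volume-id "$volume_id" || echo "Failed to delete $volume_id" >&2
    done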

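    # Release unused Elastic IPs (destructive: a released address cannot be recovered)
    echo "Releasing unused Elastic IPs..."
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==null].AllocationId' \
      --output text | \
    tr '\t' '\n' | \
    while read -r alloc_id; do
      # Text output puts all IDs on one tab-separated line, hence the tr above
      [ -n "$alloc_id" ] || continue
      echo "Releasing EIP: $alloc_id"
      aws ec2 release-address --allocation-id "$alloc_id" || echo "Failed to release $alloc_id" >&2
    done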

    # Modify RDS to smaller instances
    echo "Analyzing RDS for downsizing..."
    # Implement logic to check CloudWatch metrics and downsize if needed

    echo "Optimization complete"

---
# Terraform cost optimization
resource "aws_instance" "spot" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  # Use spot instances for non-critical workloads
  instance_market_options {
    market_type = "spot"

    spot_options {
      max_price          = "0.05" # Maximum hourly price in USD
      spot_instance_type = "persistent"
      # Persistent requests launched this way must stop or hibernate on interruption
      instance_interruption_behavior = "stop"
      valid_until                    = "2025-12-31T23:59:59Z" # Must be a future UTC timestamp
    }
  }

  tags = {
    Name = "spot-instance"
    CostCenter = "engineering"
  }
}

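# On-demand instance for baseline capacity. A Reserved Instance is a billing
# discount that AWS applies automatically to running instances whose attributes
# match the reservation; it is purchased separately, not declared in Terraform.
resource "aws_instance" "reserved" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  # Tags are for cost reporting only; they do not affect reservation matching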
  tags = {
    Name = "reserved-instance"
    ReservationType = "reserved"
  }
}

resource "aws_ec2_fleet" "mixed" {
  # Assumes an aws_launch_template.app resource is defined elsewhere
  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.app.id
      version            = "$Latest"
    }

    # priority orders the overrides (lower number = tried first). It applies to
    # on-demand capacity when the allocation strategy is "prioritized"; it does
    # not mark an override as reserved or spot.
    override {
      instance_type     = "t3.medium"
      weighted_capacity = 1
      priority          = 1
    }

    override {
      instance_type     = "t3.large"
      weighted_capacity = 2
      priority          = 2
    }

    override {
      instance_type     = "t3a.medium"
      weighted_capacity = 1
      priority          = 3
    }

    override {
      instance_type     = "t3a.large"
      weighted_capacity = 2
      priority          = 4
    }
  }

  # The on-demand/spot split is set here, not in the overrides
  target_capacity_specification {
    total_target_capacity        = 10
    on_demand_target_capacity    = 6
    spot_target_capacity         = 4
    default_target_capacity_type = "on-demand"
  }

  type = "maintain"

  tags = {
    Name = "mixed-capacity"
  }
}
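
To see what the on-demand/spot split in the fleet above could mean for the bill, a rough blended-cost calculation helps. The sketch below is illustrative only: `blended_hourly_cost` is a hypothetical helper, and the hourly prices are placeholder assumptions, not real AWS rates.

```python
def blended_hourly_cost(on_demand_units, spot_units, on_demand_price, spot_price):
    """Return (hourly_cost, savings_fraction) versus running every unit on-demand."""
    total_units = on_demand_units + spot_units
    if total_units <= 0:
        raise ValueError("fleet must have at least one capacity unit")
    mixed = on_demand_units * on_demand_price + spot_units * spot_price
    all_on_demand = total_units * on_demand_price
    return mixed, 1 - mixed / all_on_demand


# Capacity split taken from the fleet above (6 on-demand units, 4 spot units).
# Prices are made-up placeholders per capacity unit per hour.
cost, savings = blended_hourly_cost(6, 4, on_demand_price=0.04, spot_price=0.012)
print(f"hourly cost: ${cost:.3f}, savings vs all on-demand: {savings:.0%}")
```

Substituting current prices for your region and instance types shows how much of the discount survives once the on-demand baseline is included.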

2. Kubernetes Cost Optimization

# k8s-cost-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-policies
  namespace: kube-system
data:
  policies.yaml: |
    # Resource quotas per namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: "200Gi"
        limits.cpu: "200"
        limits.memory: "400Gi"
        pods: "500"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high", "medium"]

---
# Pod Disruption Budget for cost-effective scaling
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cost-optimized-pdb
  namespace: production
spec:
  minAvailable: 1
  selector:
    matchLabels:
      tier: backend

---
# Prioritize spot instances with taints/tolerations
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
    - key: cloud.google.com/gke-preemptible
      value: "true"
      effect: NoSchedule

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-optimized-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Tolerate spot instances
      tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule

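      # Prefer spot capacity. The karpenter.sh/capacity-type label is set by
      # Karpenter; on GKE, preemptible nodes carry the label
      # cloud.google.com/gke-preemptible=true instead, so match the label
      # your cluster actually uses.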
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]

      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
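
As a quick sanity check on the numbers above, the following sketch estimates how much of the namespace ResourceQuota the Deployment's requests consume. It is a simplified illustration: the helper names are hypothetical, and the parsers handle only millicore CPU values and binary memory suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary suffixes only; decimal suffixes (k, M, G) are deliberately not handled.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(value):
    """Convert a CPU quantity such as "100m" or "2" to cores."""
    value = str(value)
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


def parse_memory(value):
    """Convert a memory quantity such as "128Mi" to bytes."""
    value = str(value)
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    return Decimal(value)  # no suffix: plain bytes


def quota_usage(replicas, cpu_request, memory_request, cpu_quota, memory_quota):
    """Return the fraction of each quota consumed by `replicas` identical pods."""
    cpu = replicas * parse_cpu(cpu_request) / parse_cpu(cpu_quota)
    memory = replicas * parse_memory(memory_request) / parse_memory(memory_quota)
    return float(cpu), float(memory)


# Values copied from the Deployment and ResourceQuota above.
cpu_frac, mem_frac = quota_usage(3, "100m", "128Mi", "100", "200Gi")
print(f"CPU requests: {cpu_frac:.2%} of quota, memory requests: {mem_frac:.2%} of quota")
```

A large gap between requested and allowed capacity can mean the quota is too loose to act as a cost guardrail.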

3. Cost Monitoring Dashboard

# cost-monitoring.py
import boto3
import json
from datetime import datetime, timedelta

class CostOptimizer:
    def __init__(self):
        self.ce_client = boto3.client('ce')
        self.ec2_client = boto3.client('ec2')
        self.rds_client = boto3.client('rds')

    def get_daily_costs(self, days=30):
        """Get daily costs for past N days"""
        end_date = datetime.now().date()
        start_date = end_date - timedelta(days=days)

        response = self.ce_client.get_cost_and_usage(
            TimePeriod={
                'Start': str(start_date),
                'End': str(end_date)
            },
            Granularity='DAILY',
            Metrics=['BlendedCost'],
            GroupBy=[
                {'Type': 'DIMENSION', 'Key': 'SERVICE'}
            ]
        )

        return response

    def find_underutilized_instances(self):
        """Find EC2 instances with low CPU usage"""
        cloudwatch = boto3.client('cloudwatch')
        instances = []

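        # Paginate so large accounts are fully covered, and only look at
        # running instances (stopped ones report no recent CPU datapoints).
        paginator = self.ec2_client.get_paginator('describe_instances')
        pages = paginator.paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        reservations = [r for page in pages for r in page['Reservations']]
        for reservation in reservations: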
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']

                # Check CPU utilization
                response = cloudwatch.get_metric_statistics(
                    Namespace='AWS/EC2',
                    MetricName='CPUUtilization',
                    Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                    StartTime=datetime.now() - timedelta(days=7),
                    EndTime=datetime.now(),
                    Period=3600,
                    Statistics=['Average']
                )

                if response['Datapoints']:
                    avg_cpu = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
                    if avg_cpu < 10:  # Less than 10% average
                        instances.append({
                            'InstanceId': instance_id,
                            'Type': instance['InstanceType'],
                            'AverageCPU': avg_cpu,
                            'Recommendation': 'Downsize or terminate'
                        })

        return instances

    def estimate_reserved_instance_savings(self):
        """Estimate potential savings from reserved instances"""
        response = self.ce_client.get_reservation_purchase_recommendation(
            # Cost Explorer expects the full service name, not "EC2"
            Service='Amazon Elastic Compute Cloud - Compute',
            LookbackPeriod='THIRTY_DAYS',
            PageSize=100
        )

        total_savings = 0.0
        for recommendation in response.get('Recommendations', []):
            summary = recommendation.get('RecommendationSummary', {})
            # Amounts are returned as strings, so convert before summing
            total_savings += float(summary.get('TotalEstimatedMonthlySavingsAmount', 0))

        return total_savings

    def generate_report(self):
        """Generate comprehensive cost optimization report"""
        print("=== Cost Optimization Report ===\n")

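        # Daily costs. get_daily_costs groups by service, which leaves
        # 'Total' empty, so sum the per-service groups for each day.
        print("Daily Costs:")
        costs = self.get_daily_costs(7)
        for result in costs['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = sum(
                float(group['Metrics']['BlendedCost']['Amount'])
                for group in result.get('Groups', [])
            )
            print(f"  {date}: ${total:.2f}")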

        # Underutilized instances
        print("\nUnderutilized Instances:")
        underutilized = self.find_underutilized_instances()
        for instance in underutilized:
            print(f"  {instance['InstanceId']}: {instance['AverageCPU']:.1f}% CPU - {instance['Recommendation']}")

        # Reserved instance savings
        print("\nReserved Instance Savings Potential:")
        savings = self.estimate_reserved_instance_savings()
        print(f"  Estimated Monthly Savings: ${savings:.2f}")

# Usage
if __name__ == '__main__':
    optimizer = CostOptimizer()
    optimizer.generate_report()
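
Because the cost query groups results by service, a per-service ranking is often more useful than the raw response. The sketch below is one way to aggregate it. `total_cost_by_service` is a hypothetical helper, and the sample data is hand-written in the shape Cost Explorer returns, not real billing output.

```python
from collections import defaultdict


def total_cost_by_service(response, metric="BlendedCost"):
    """Sum a grouped get_cost_and_usage response per service, highest cost first."""
    totals = defaultdict(float)
    for result in response.get("ResultsByTime", []):
        for group in result.get("Groups", []):
            service = group["Keys"][0]
            # Amounts arrive as strings, so convert before adding
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hand-written sample data (assumed values, two days, two services).
sample = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2024-01-01", "End": "2024-01-02"},
            "Groups": [
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"BlendedCost": {"Amount": "1.25", "Unit": "USD"}}},
                {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                 "Metrics": {"BlendedCost": {"Amount": "10.5", "Unit": "USD"}}},
            ],
        },
        {
            "TimePeriod": {"Start": "2024-01-02", "End": "2024-01-03"},
            "Groups": [
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"BlendedCost": {"Amount": "1.25", "Unit": "USD"}}},
                {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
                 "Metrics": {"BlendedCost": {"Amount": "12.0", "Unit": "USD"}}},
            ],
        },
    ]
}

for service, amount in total_cost_by_service(sample):
    print(f"{service}: ${amount:.2f}")
```

In practice the output of `CostOptimizer().get_daily_costs()` could be passed in place of `sample`.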

Cost Optimization Strategies

✅ DO

Use reserved instances for baseline
Leverage spot instances
Right-size resources
Monitor cost trends
Implement auto-scaling
Use multi-region pricing
Tag resources consistently
Schedule non-essential resources

❌ DON'T

Over-provision resources
Ignore unused resources
Neglect cost monitoring
Run everything on-demand
Forget to release unused Elastic IPs
Mix cost centers
Ignore savings opportunities
Deploy without budgets
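
Consistent tagging is easier to enforce with an automated check. The following is a minimal sketch, assuming resources are represented as dictionaries with an `Id` and an EC2-style `Tags` list. The required tag set is an assumption borrowed from the Terraform example's `Name` and `CostCenter` tags, and the resource IDs are made up.

```python
# Assumption: these are the tags every resource is expected to carry.
REQUIRED_TAGS = frozenset({"Name", "CostCenter"})


def find_missing_tags(resources, required=REQUIRED_TAGS):
    """Map each resource ID to the sorted list of required tags it lacks."""
    report = {}
    for resource in resources:
        present = {tag["Key"] for tag in resource.get("Tags", [])}
        missing = required - present
        if missing:
            report[resource["Id"]] = sorted(missing)
    return report


# Hypothetical inventory for illustration.
inventory = [
    {"Id": "i-aaa", "Tags": [{"Key": "Name", "Value": "web"},
                             {"Key": "CostCenter", "Value": "engineering"}]},
    {"Id": "i-bbb", "Tags": [{"Key": "Name", "Value": "batch"}]},
    {"Id": "vol-ccc"},  # no tags at all
]

for resource_id, missing in find_missing_tags(inventory).items():
    print(f"{resource_id} is missing: {', '.join(missing)}")
```

Resources that show up in this report cannot be attributed to a cost center, which makes per-team cost allocation unreliable.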

Cost Saving Opportunities

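Reserved Instances: 40-70% savings versus on-demand pricing
Spot Instances: 70-90% savings versus on-demand pricing
Committed Use Discounts: 25-55% savings versus on-demand pricing
Right-sizing: 10-30% savings
Resource cleanup: 5-20% savings

These are typical ranges; actual savings depend on workload, region, and commitment terms.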
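
The ranges above can be turned into a rough low/high estimate for a given bill. This sketch assumes monthly spend has already been split into non-overlapping buckets, one per lever. `estimate_savings` is a hypothetical helper and the spend figures are invented for illustration.

```python
# Savings ranges copied from the list above, as (low, high) fractions.
SAVINGS_RANGES = {
    "reserved_instances": (0.40, 0.70),
    "spot_instances": (0.70, 0.90),
    "committed_use": (0.25, 0.55),
    "right_sizing": (0.10, 0.30),
    "cleanup": (0.05, 0.20),
}


def estimate_savings(eligible_spend):
    """Return (low, high) monthly savings for spend bucketed by lever.

    Buckets must not overlap; otherwise the same dollars are counted twice.
    """
    low = high = 0.0
    for lever, spend in eligible_spend.items():
        low_rate, high_rate = SAVINGS_RANGES[lever]
        low += spend * low_rate
        high += spend * high_rate
    return low, high


# Invented example: monthly spend eligible for each lever, in USD.
low, high = estimate_savings(
    {"reserved_instances": 5000, "spot_instances": 2000, "cleanup": 1000}
)
print(f"Estimated monthly savings: ${low:,.0f} to ${high:,.0f}")
```

The result is only as good as the bucketing: spend moved to spot, for example, should not also be counted as reserved-instance eligible.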

Resources

Quick Install

/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/infrastructure-cost-optimization

Copy and paste this command in Claude Code to install this skill

GitHub Repository

aj-geddes/useful-ai-prompts

Path: skills/infrastructure-cost-optimization
