HyperShift AWS Provider

openshift-eng

Updated Today

24 views

110

Metageneral

About

The HyperShift AWS Provider skill enables developers to deploy HyperShift clusters on AWS infrastructure using the `/hcp:generate aws` command. It handles AWS-specific requirements including STS credentials, IAM roles, and VPC configuration. This provides implementation guidance for setting up clusters with proper security and regional best practices.

Quick Install

Claude Code

Recommended

Plugin CommandRecommended

/plugin add https://github.com/openshift-eng/ai-helpers

Git CloneAlternative

git clone https://github.com/openshift-eng/ai-helpers.git ~/.claude/skills/HyperShift AWS Provider

Copy and paste this command in Claude Code to install this skill

Documentation

HyperShift AWS Provider

This skill provides implementation guidance for creating HyperShift clusters on AWS, handling AWS-specific requirements including STS credentials, IAM roles, VPC configuration, and regional best practices.

When to Use This Skill

This skill is automatically invoked by the /hcp:generate aws command to guide the AWS provider cluster creation process.

Prerequisites

AWS CLI configured with appropriate credentials
HyperShift operator installed and configured
STS credentials file for the target AWS account
IAM role with required permissions for HyperShift
Pull secret for accessing OpenShift images

AWS Provider Overview

AWS Provider Peculiarities

Requires AWS credentials (STS): Must have valid STS credentials file
Region selection affects availability zones: Different regions have different AZ availability
Instance types vary by region: Not all instance types available in all regions
VPC CIDR must not conflict: Must not overlap with existing infrastructure
IAM roles: Can be auto-created or use pre-existing roles

Common AWS Configurations

Development Environment:

Single replica control plane (cost-effective)
m5.large instances (balanced performance/cost)
2 availability zones (basic redundancy)
Basic networking (public endpoints)

Production Environment:

Highly available control plane
m5.xlarge+ instances (better performance)
3+ availability zones (high availability)
Custom VPC configuration
KMS encryption enabled

Cost-Optimized Environment:

Single NAT gateway
Smaller instance types
Minimal replicas
Spot instances (where applicable)

Implementation Steps

Step 1: Analyze Cluster Description

Parse the natural language description for AWS-specific requirements:

Environment Type Detection:

Development: "dev", "development", "testing", "demo", "sandbox"
Production: "prod", "production", "critical", "enterprise"
Cost-Optimized: "cheap", "cost", "minimal", "budget", "demo"

Performance Indicators:

High Performance: "performance", "fast", "high-compute", "intensive"
Standard: Default moderate configuration
Minimal: "small", "minimal", "basic", "simple"

Security/Compliance:

FIPS: "fips", "compliance", "security", "regulated"
Private: "private", "isolated", "secure", "internal"

Special Requirements:

Multi-AZ: "highly available", "ha", "multi-zone", "resilient"
Single-AZ: "single zone", "simple", "minimal"

Step 2: Apply AWS Provider Defaults

Required Parameters:

--region: AWS region (default: us-east-1)
--pull-secret: Path to pull secret file
--release-image: OpenShift release image
--sts-creds: REQUIRED - Path to STS credentials file
--role-arn: REQUIRED - ARN of the IAM role to assume
--base-domain: REQUIRED - Base domain for the cluster

Smart Defaults by Environment:

Development Environment:

--instance-type m5.large
--node-pool-replicas 2
--control-plane-availability-policy SingleReplica
--endpoint-access Public
--root-volume-size 120
--zones auto-select 2 zones based on region

Production Environment:

--instance-type m5.xlarge
--node-pool-replicas 3
--control-plane-availability-policy HighlyAvailable
--endpoint-access PublicAndPrivate
--root-volume-size 120
--auto-repair true
--zones auto-select 3+ zones based on region

Cost-Optimized Environment:

--instance-type m5.large
--node-pool-replicas 2
--control-plane-availability-policy SingleReplica
--endpoint-access Public
--root-volume-size 120
--zones auto-select 2 zones (minimal redundancy)

Step 3: Interactive Parameter Collection

Required Information Collection:

Cluster Name

🔹 **Cluster Name**: What would you like to name your cluster?
   - Must be DNS-compatible (lowercase, hyphens allowed)
   - Used for AWS resource naming
   - Example: dev-cluster, prod-app, demo-env

AWS Region

🔹 **AWS Region**: Which AWS region should host your cluster?
   - Consider latency to your users
   - Verify desired instance types are available
   - [Press Enter for default: us-east-1]

   Popular regions:
   - us-east-1 (N. Virginia) - Largest service availability
   - us-west-2 (Oregon) - West coast, latest services
   - eu-west-1 (Ireland) - Europe
   - ap-southeast-1 (Singapore) - Asia Pacific

STS Credentials

🔹 **STS Credentials**: Path to your AWS STS credentials file?
   - Required for AWS authentication
   - Generate using: aws sts get-session-token
   - Example: /home/user/.aws/sts-creds.json
   - Format: {"AccessKeyId": "...", "SecretAccessKey": "...", "SessionToken": "..."}

IAM Role ARN

🔹 **IAM Role ARN**: ARN of the IAM role for HyperShift?
   - Role must have required HyperShift permissions
   - Example: arn:aws:iam::123456789012:role/hypershift-operator-role
   - See: https://hypershift.openshift.io/aws-setup/

Base Domain

🔹 **Base Domain**: What base domain should be used for cluster DNS?
   - Must be a domain you control in Route53
   - Used for cluster API and application routes
   - Example: example.com, clusters.mycompany.com

Pull Secret

🔹 **Pull Secret**: Path to your OpenShift pull secret file?
   - Required for accessing OpenShift container images
   - Download from: https://console.redhat.com/openshift/install/pull-secret
   - Example: /home/user/pull-secret.json

OpenShift Version

🔹 **OpenShift Version**: Which OpenShift version do you want to use?

   📋 **Check supported versions**: https://amd64.ocp.releases.ci.openshift.org/

   - Enter release image URL: quay.io/openshift-release-dev/ocp-release:X.Y.Z-multi
   - [Press Enter for default: quay.io/openshift-release-dev/ocp-release:4.18.0-multi]

Optional Configuration (based on description analysis):

Instance Type (if performance requirements detected)

🔹 **Instance Type**: Select instance type based on your performance needs:
   - m5.large (2 vCPU, 8GB RAM) - Development, light workloads
   - m5.xlarge (4 vCPU, 16GB RAM) - Production, balanced workloads
   - m5.2xlarge (8 vCPU, 32GB RAM) - High-performance workloads
   - c5.xlarge (4 vCPU, 8GB RAM) - Compute-optimized
   - [Press Enter for default based on environment type]

Node Pool Replicas

🔹 **Node Pool Replicas**: How many worker nodes do you need?
   - Minimum: 2 (for basic redundancy)
   - Production recommended: 3+
   - [Press Enter for default based on environment type]

Availability Zones (auto-selected, but confirmed)

🔹 **Availability Zones**: Detected region: us-east-1
   Auto-selecting zones for optimal distribution:
   - Development: us-east-1a, us-east-1b (2 zones)
   - Production: us-east-1a, us-east-1b, us-east-1c (3 zones)

   Modify zone selection? [y/N]

Step 4: Advanced Configuration (Conditional)

For FIPS Compliance (if detected):

🔹 **FIPS Mode**: Enable FIPS mode for compliance?
   - Required for government/regulated workloads
   - May impact performance
   - [yes/no] [Press Enter for default: no]

For High-Performance Workloads:

🔹 **Root Volume Size**: Increase root volume size?
   - Default: 120GB
   - High-performance workloads: 200GB+
   - [Press Enter for default: 120]

For Production Environments:

🔹 **Auto-Repair**: Enable automatic node repair?
   - Automatically replaces unhealthy nodes
   - Recommended for production
   - [yes/no] [Press Enter for default: yes for production]

Step 5: Generate Command

Basic AWS Cluster Command:

hypershift create cluster aws \
  --name <cluster-name> \
  --namespace <cluster-name>-ns \
  --region <region> \
  --instance-type <instance-type> \
  --pull-secret <pull-secret-path> \
  --node-pool-replicas <replica-count> \
  --zones <zone-list> \
  --control-plane-availability-policy <policy> \
  --sts-creds <sts-creds-path> \
  --role-arn <role-arn> \
  --base-domain <base-domain> \
  --release-image <release-image>

Development Configuration Example:

hypershift create cluster aws \
  --name dev-cluster \
  --namespace dev-cluster-ns \
  --region us-east-1 \
  --instance-type m5.large \
  --pull-secret /path/to/pull-secret.json \
  --node-pool-replicas 2 \
  --zones us-east-1a,us-east-1b \
  --control-plane-availability-policy SingleReplica \
  --endpoint-access Public \
  --root-volume-size 120 \
  --sts-creds /path/to/sts-creds.json \
  --role-arn arn:aws:iam::123456789012:role/hypershift-role \
  --base-domain example.com \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi

Production Configuration Example:

hypershift create cluster aws \
  --name production-cluster \
  --namespace production-cluster-ns \
  --region us-west-2 \
  --instance-type m5.xlarge \
  --pull-secret /path/to/pull-secret.json \
  --node-pool-replicas 3 \
  --zones us-west-2a,us-west-2b,us-west-2c \
  --control-plane-availability-policy HighlyAvailable \
  --endpoint-access PublicAndPrivate \
  --root-volume-size 120 \
  --auto-repair \
  --sts-creds /path/to/sts-creds.json \
  --role-arn arn:aws:iam::123456789012:role/hypershift-prod-role \
  --base-domain clusters.company.com \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi

FIPS-Enabled Configuration:

hypershift create cluster aws \
  --name compliance-cluster \
  --namespace compliance-cluster-ns \
  --region us-gov-east-1 \
  --instance-type m5.xlarge \
  --pull-secret /path/to/pull-secret.json \
  --node-pool-replicas 3 \
  --zones us-gov-east-1a,us-gov-east-1b,us-gov-east-1c \
  --control-plane-availability-policy HighlyAvailable \
  --fips \
  --sts-creds /path/to/sts-creds.json \
  --role-arn arn:aws-us-gov:iam::123456789012:role/hypershift-fips-role \
  --base-domain secure.gov.example.com \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi

Step 6: Pre-Flight Validation

Provide validation commands:

## Pre-Flight Checks

Before creating the cluster, verify your setup:

1. **AWS Credentials:**
   aws sts get-caller-identity

2. **STS Credentials File:**
   cat /path/to/sts-creds.json | jq .

3. **IAM Role Access:**
   aws iam get-role --role-name hypershift-role

4. **Route53 Domain:**
   aws route53 list-hosted-zones --query "HostedZones[?Name=='example.com.']"

5. **Region Availability:**
   aws ec2 describe-availability-zones --region us-east-1

6. **Instance Type Availability:**
   aws ec2 describe-instance-type-offerings --location-type availability-zone --filters Name=instance-type,Values=m5.large --region us-east-1

Step 7: Post-Generation Instructions

Next Steps:

## Next Steps

1. **Verify prerequisites are met:**
   - AWS credentials configured
   - STS credentials file exists and is valid
   - IAM role has required permissions
   - Base domain exists in Route53

2. **Run the generated command:**
   Copy and paste the command above

3. **Monitor cluster creation:**
   kubectl get hostedcluster -n <cluster-namespace>
   kubectl get nodepool -n <cluster-namespace>

4. **Check AWS resources:**
   - EC2 instances in AWS console
   - Load balancers created
   - VPC and networking resources

5. **Access cluster when ready:**
   hypershift create kubeconfig --name <cluster-name> --namespace <cluster-namespace>
   export KUBECONFIG=<cluster-name>-kubeconfig
   oc get nodes

Error Handling

Invalid AWS Credentials

Scenario: AWS credentials are invalid or expired.

Action:

AWS credentials validation failed.

Please check:
1. AWS CLI configuration: aws configure list
2. STS credentials file validity
3. IAM permissions

Regenerate STS credentials:
  aws sts get-session-token --duration-seconds 3600

IAM Role Not Found

Scenario: Specified IAM role doesn't exist or can't be assumed.

Action:

IAM role "arn:aws:iam::123456789012:role/hypershift-role" not found or inaccessible.

Please verify:
1. Role exists: aws iam get-role --role-name hypershift-role
2. Role has required permissions
3. Trust relationship allows your account to assume the role

See HyperShift AWS setup guide: https://hypershift.openshift.io/aws-setup/

Region/Zone Issues

Scenario: Instance type not available in selected region/zones.

Action:

Instance type "m5.large" not available in zone "us-east-1f".

Checking alternative zones in us-east-1:
✅ us-east-1a (available)
✅ us-east-1b (available)
❌ us-east-1f (not available)

Suggested zones: us-east-1a,us-east-1b

Would you like me to update the command?

Route53 Domain Issues

Scenario: Base domain not found in Route53 or not accessible.

Action:

Base domain "example.com" not found in Route53.

Please ensure:
1. Domain exists in Route53: aws route53 list-hosted-zones
2. Account has access to the hosted zone
3. Domain spelling is correct

Alternative: Use a subdomain you control (e.g., clusters.mydomain.com)

Resource Limits

Scenario: AWS account limits would be exceeded.

Action:

AWS service limits may be exceeded:
- EC2 instances: Current: 18/20, Requested: 5 more
- Elastic IPs: Current: 4/5, Requested: 2 more

Consider:
1. Request limit increases via AWS Support
2. Choose smaller instance types
3. Reduce node count
4. Clean up unused resources

Best Practices

Cost Optimization

Right-size instances: Don't over-provision for development
Use Spot instances: Where appropriate for non-critical workloads
Monitor resource usage: Regularly review AWS costs
Clean up unused clusters: Delete development clusters when not needed

Security

Least privilege IAM: Use minimal required permissions
STS credentials: Use short-lived credentials when possible
Private networking: Use PrivateAndPublic endpoints for production
KMS encryption: Enable for sensitive workloads

High Availability

Multi-AZ deployment: Use 3+ availability zones for production
Instance distribution: Spread nodes across zones
Auto-repair: Enable for automatic recovery
Monitoring: Set up CloudWatch monitoring

Network Planning

VPC design: Plan CIDR ranges carefully
Subnet strategy: Use public/private subnet design
Load balancer: Configure appropriate load balancer types
DNS: Ensure proper Route53 configuration

Anti-Patterns to Avoid

❌ Using root AWS credentials

Never use root account credentials for HyperShift

✅ Use IAM roles and STS credentials

❌ Single availability zone for production

--zones us-east-1a  # Single point of failure

✅ Use multiple zones: --zones us-east-1a,us-east-1b,us-east-1c

❌ Over-provisioning for development

--instance-type m5.8xlarge --node-pool-replicas 10  # Expensive for dev

✅ Use appropriate sizing: --instance-type m5.large --node-pool-replicas 2

❌ Ignoring region-specific limitations

Choosing regions without checking instance type availability

✅ Verify instance types and services are available in target region

Example Workflows

Startup Development Environment

Input: "cheap AWS cluster for testing our new microservice"

Analysis:
- Environment: Development
- Cost focus: High priority
- Scale: Minimal

Generated Command:
hypershift create cluster aws \
  --name dev-microservice \
  --namespace dev-microservice-ns \
  --region us-east-1 \
  --instance-type m5.large \
  --node-pool-replicas 2 \
  --control-plane-availability-policy SingleReplica \
  --endpoint-access Public

Enterprise Production

Input: "highly available AWS production cluster for customer-facing applications"

Analysis:
- Environment: Production
- Availability: High priority
- Scale: Enterprise

Generated Command:
hypershift create cluster aws \
  --name prod-customer-apps \
  --namespace prod-customer-apps-ns \
  --region us-west-2 \
  --instance-type m5.xlarge \
  --node-pool-replicas 5 \
  --zones us-west-2a,us-west-2b,us-west-2c \
  --control-plane-availability-policy HighlyAvailable \
  --endpoint-access PublicAndPrivate \
  --auto-repair

GitHub Repository

openshift-eng/ai-helpers

Path: plugins/hcp/skills/hcp-create-aws

Related Skills

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.

View skill

algorithmic-art

executing-plans

Design

Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.

View skill

cost-optimization

Other

This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.

View skill

HyperShift AWS Provider

About

Quick Install

Claude Code

Documentation

HyperShift AWS Provider

When to Use This Skill

Prerequisites

AWS Provider Overview

AWS Provider Peculiarities

Common AWS Configurations

Implementation Steps

Step 1: Analyze Cluster Description

Step 2: Apply AWS Provider Defaults

Step 3: Interactive Parameter Collection

Step 4: Advanced Configuration (Conditional)

Step 5: Generate Command

Step 6: Pre-Flight Validation

Step 7: Post-Generation Instructions

Error Handling

Invalid AWS Credentials

IAM Role Not Found

Region/Zone Issues

Route53 Domain Issues

Resource Limits

Best Practices

Cost Optimization

Security

High Availability

Network Planning

Anti-Patterns to Avoid

Example Workflows

Startup Development Environment

Enterprise Production

See Also

GitHub Repository

Related Skills

subagent-driven-development

algorithmic-art

executing-plans

cost-optimization