setup-service-mesh

pjt222
Updated 2 days ago

About

This skill automates the deployment and configuration of a service mesh (Istio or Linkerd) in a Kubernetes environment. It enables secure service-to-service communication with mTLS, advanced traffic management, and observability without requiring application code changes. Use it when your microservices need encrypted communication, fine-grained traffic control like canary releases, or consistent circuit-breaking and retry policies.

Quick Install

Claude Code

Primary (recommended):
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative):
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative):
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/setup-service-mesh

Copy and paste this command in Claude Code to install this skill

Documentation

Setup Service Mesh

Deploy+configure mesh → secure svc-to-svc + advanced traffic mgmt.

Use When

  • Microservices arch needs encrypted svc-to-svc
  • Fine traffic ctrl (canary, A/B, splitting)
  • Observability across all svc interactions w/o app changes
  • Enforce security policies (mTLS, authz) at infra level
  • Impl circuit break, retries, timeouts consistent
  • Distributed tracing + svc dependency mapping

In

  • Required: K8s cluster w/ admin
  • Required: Mesh choice (Istio|Linkerd)
  • Required: Namespace(s) to enable
  • Optional: Monitoring stack (Prometheus, Grafana, Jaeger)
  • Optional: Custom traffic mgmt reqs
  • Optional: CA config for mTLS

Do

See Extended Examples for complete config + templates.

Step 1: Install Control Plane

Istio:

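curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.2 sh -
export PATH="$PWD/istio-1.20.2/bin:$PATH"   # put the downloaded istioctl on PATH
# Istio ships no "production" profile; "default" is the built-in profile meant for production
istioctl install --set profile=default -y
kubectl get pods -n istio-system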

Linkerd:

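curl -sL https://run.linkerd.io/install | sh
export PATH="$HOME/.linkerd2/bin:$PATH"   # the installer places the CLI here
linkerd check --pre
linkerd install --crds | kubectl apply -f -   # CRDs first (Linkerd 2.12+)
linkerd install --ha | kubectl apply -f -
linkerd check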

Mesh config w/ resource limits + tracing:

# service-mesh-config.yaml (abbreviated)
spec:
  profile: default   # built-in profile; Istio has no "production" profile
  meshConfig:
    enableTracing: true
  components:
    pilot:
      k8s:
        resources: { requests: { cpu: 500m, memory: 2Gi } }
# See EXAMPLES.md Step 1 for complete configuration

→ Control plane pods running in istio-system|linkerd ns. istioctl version|linkerd version shows matching client+server.

If err:

  • Cluster has resources (≥4 CPU, 8GB RAM prod)
  • K8s ver compat (check mesh docs)
  • Logs: kubectl logs -n istio-system -l app=istiod (Istio) or kubectl logs -n linkerd -l linkerd.io/control-plane-component=destination (Linkerd)
  • Conflicting CRDs: kubectl get crd | grep -E 'istio|linkerd'
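
A minimal sketch of the resource check above, assuming node allocatable values are piped in as "<cpu> <memory>" pairs. check_capacity is a hypothetical helper, not part of kubectl or either mesh CLI. The defaults mirror the prod minimums listed above; treating 8GB as 8 GiB is an assumption.

```shell
#!/usr/bin/env bash
# check_capacity [min_cpu] [min_mem_gib]  (hypothetical helper)
# Reads "<cpu> <memory>" per line (e.g. "3920m 16266500Ki"), sums across
# nodes, prints the totals, and exits non-zero if below the minimums.
check_capacity() {
  awk -v min_cpu="${1:-4}" -v min_mem="${2:-8}" '
    function cpu(v) { return (v ~ /m$/) ? substr(v, 1, length(v) - 1) / 1000 : v + 0 }
    function mem(v) {
      if (v ~ /Ki$/) return substr(v, 1, length(v) - 2) / 1048576
      if (v ~ /Mi$/) return substr(v, 1, length(v) - 2) / 1024
      if (v ~ /Gi$/) return substr(v, 1, length(v) - 2) + 0
      return v / 1073741824   # assume plain bytes
    }
    NF >= 2 { c += cpu($1); m += mem($2) }
    END {
      printf "cpu=%.1f mem_gib=%.1f\n", c, m
      exit !(c >= min_cpu && m >= min_mem)
    }'
}

# Usage (needs cluster access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.status.allocatable.cpu}{" "}{.status.allocatable.memory}{"\n"}{end}' | check_capacity
```

Allocatable is an upper bound; pods already scheduled reduce what is actually free.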

Step 2: Auto Sidecar Injection

Istio:

# Label namespace for automatic injection
kubectl label namespace default istio-injection=enabled
kubectl get namespace -L istio-injection

Linkerd:

# Annotate namespace for injection
kubectl annotate namespace default linkerd.io/inject=enabled

Test:

# test-deployment.yaml (abbreviated)
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: app
        image: nginx:alpine
# See EXAMPLES.md Step 2 for complete test deployment

kubectl apply -f test-deployment.yaml
kubectl get pods -n default
# Expect 2/2 containers (app + proxy)

→ New pods 2/2 (app + sidecar). Describe shows istio-proxy|linkerd-proxy. Logs show successful proxy startup.

If err:

  • Labels|annotations: kubectl get ns default -o yaml
  • Webhook active: kubectl get mutatingwebhookconfiguration
  • Inject logs: kubectl logs -n istio-system -l app=istiod (Istio; the injector runs inside istiod)
  • Manual inject test: kubectl get deploy test-app -o yaml | istioctl kube-inject -f - | kubectl apply -f -
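
To spot pods that missed injection, a sketch that filters a pod/container listing. pods_missing_sidecar is a hypothetical helper; it assumes the proxy container is named istio-proxy or linkerd-proxy, as in the expected result above. If the proxy runs as a native sidecar (init container), it will not appear in this listing.

```shell
#!/usr/bin/env bash
# pods_missing_sidecar  (hypothetical helper)
# Input: "<pod-name> <container> [<container> ...]" per line.
# Prints pod names whose container list contains no mesh proxy.
pods_missing_sidecar() {
  awk '{
    found = 0
    for (i = 2; i <= NF; i++)
      if ($i == "istio-proxy" || $i == "linkerd-proxy") found = 1
    if (!found) print $1
  }'
}

# Usage (needs cluster access):
# kubectl get pods -n default -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' | pods_missing_sidecar
```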

Step 3: mTLS Policy

Istio:

# mtls-policy.yaml (abbreviated)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
# See EXAMPLES.md Step 3 for per-namespace and permissive mode examples

Linkerd:

# Linkerd enforces mTLS by default for meshed pods
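linkerd viz tap deploy/test-app -n default   # needs the viz extension (Step 5)
# Check for tls=true on each request line
# Secured edges: linkerd viz edges deployment -n default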

Apply + verify:

kubectl apply -f mtls-policy.yaml
# Istio: authn tls-check was removed in Istio 1.5; describe reports the mTLS mode applied to the pod
istioctl x describe pod $(kubectl get pod -n default -l app=test-app -o jsonpath='{.items[0].metadata.name}') -n default

→ All meshed conns mTLS enabled. Istio describe shows workload mTLS mode STRICT. Linkerd tap tls=true all conns. No TLS errs in logs.

If err:

  • Cert issuance: kubectl get certificates -A (cert-manager)
  • CA healthy: kubectl logs -n istio-system -l app=istiod | grep -i cert
  • PERMISSIVE first → STRICT
  • Svcs w/o sidecars: kubectl get pods --all-namespaces -o json | jq '.items[] | select(.spec.containers | length == 1) | .metadata.name'
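
The PERMISSIVE first → STRICT advice can be sketched as one manifest generator called twice. peer_auth_manifest is a hypothetical helper. The fields are those of the PeerAuthentication shown above, and the per-namespace scope is an assumption.

```shell
#!/usr/bin/env bash
# peer_auth_manifest <namespace> <mode>  (hypothetical helper)
# Prints a PeerAuthentication manifest; rejects unknown modes.
peer_auth_manifest() {
  local ns="$1" mode="$2"
  case "$mode" in
    PERMISSIVE|STRICT|DISABLE) ;;
    *) echo "unsupported mTLS mode: $mode" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: ${ns}
spec:
  mtls:
    mode: ${mode}
EOF
}

# Usage (needs cluster access):
# peer_auth_manifest default PERMISSIVE | kubectl apply -f -
# ...verify all workloads are meshed, then:
# peer_auth_manifest default STRICT | kubectl apply -f -
```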

Step 4: Traffic Mgmt Rules

# traffic-management.yaml (abbreviated)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  http:
  - match:
    - uri: { prefix: /api/v2 }
    route:
    - destination: { host: api-service, subset: v2 }
      weight: 10
    - destination: { host: api-service, subset: v1 }
      weight: 90
    retries: { attempts: 3, perTryTimeout: 2s }
# See EXAMPLES.md Step 4 for complete routing, circuit breaker, and gateway configs

Linkerd traffic split:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
spec:
  service: api-service
  backends:
  - service: api-service-v1
    weight: 900
  - service: api-service-v2
    weight: 100

Apply + test:

kubectl apply -f traffic-management.yaml
# Test traffic distribution
for i in {1..100}; do curl -s http://api.example.com/api/v2 | grep version; done | sort | uniq -c
# Monitor: istioctl dashboard kiali or linkerd viz dashboard

→ Splits per weights. Circuit breaker trips after consecutive errs. Retries on transient. Kiali|Linkerd dashboard shows flow viz.

If err:

  • Dest hosts resolve: kubectl get svc -n production
  • Subset labels match pod: kubectl get pods -n production --show-labels
  • Pilot logs: kubectl logs -n istio-system -l app=istiod
  • Test w/o circuit breaker first → add incrementally
  • istioctl analyze -n production
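
To turn the curl loop above into a pass/fail check, a sketch that compares the observed share with the configured weight. split_within_tolerance is a hypothetical helper; it assumes each response has been reduced to one label per line (e.g. v1 or v2), which depends on what the service returns.

```shell
#!/usr/bin/env bash
# split_within_tolerance <label> <expected_pct> <tolerance_pct>  (hypothetical)
# Reads one label per line; succeeds if the share of lines equal to <label>
# is within <tolerance_pct> percentage points of <expected_pct>.
split_within_tolerance() {
  awk -v label="$1" -v want="$2" -v tol="$3" '
    { total++; if ($0 == label) hits++ }
    END {
      if (total == 0) exit 2          # no samples, nothing to judge
      pct = 100 * hits / total
      diff = pct - want; if (diff < 0) diff = -diff
      printf "%s: %.1f%% of %d requests\n", label, pct, total
      exit !(diff <= tol)
    }'
}

# Usage, for the 10/90 split above with 5 points of slack:
# <label stream from the curl loop> | split_within_tolerance v2 10 5
```

With only 100 requests the observed share varies by a few points, so keep the tolerance loose or raise the sample count.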

Step 5: Observability Integration

Install addons:

# Istio: Prometheus, Grafana, Kiali, Jaeger
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/grafana.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/jaeger.yaml

# Linkerd
linkerd viz install | kubectl apply -f -
linkerd jaeger install | kubectl apply -f -

Custom metrics + dashboards:

# service-monitor.yaml (abbreviated)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-mesh-metrics
spec:
  selector: { matchLabels: { app: istiod } }
  endpoints:
  - port: http-monitoring
    interval: 30s
# See EXAMPLES.md Step 5 for Grafana dashboards and telemetry config

Access:

istioctl dashboard grafana  # or: linkerd viz dashboard
istioctl dashboard kiali
istioctl dashboard jaeger

→ Dashboards show topology, request rates, latency percentiles, err rates. Distributed traces in Jaeger. Prometheus scraping mesh metrics. Custom metrics in queries.

If err:

  • Prometheus scraping: kubectl get servicemonitor -A
  • Addon pods running: kubectl get pods -n istio-system
  • Telemetry config: istioctl proxy-config log <pod-name> -n <namespace>
  • Mesh config has tracing: kubectl get configmap istio -n istio-system -o yaml | grep -A 5 enableTracing
  • Port conflicts if port-forward fails
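
For the addon check, a sketch that reports which of the four Istio addons are absent. missing_addons is a hypothetical helper; it assumes the sample manifests create Deployments named prometheus, grafana, kiali, and jaeger, which should be confirmed against the applied release.

```shell
#!/usr/bin/env bash
# missing_addons  (hypothetical helper)
# Input: one deployment name per line. Prints each expected addon not listed.
missing_addons() {
  local have addon
  have="$(cat)"
  for addon in prometheus grafana kiali jaeger; do
    printf '%s\n' "$have" | grep -qx "$addon" || echo "$addon"
  done
}

# Usage (needs cluster access):
# kubectl get deploy -n istio-system -o name | sed 's|.*/||' | missing_addons
```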

Step 6: Validate + Monitor Mesh Health

# Istio validation
istioctl analyze --all-namespaces
istioctl verify-install
istioctl proxy-status

# Linkerd validation
linkerd check
linkerd viz check
linkerd diagnostics policy -n <namespace> svc/<service> <port>   # takes a resource and a port

# Check proxy sync status
kubectl get pods -n production -o json | \
  jq '.items[] | {name: .metadata.name, proxy: .status.containerStatuses[] | select(.name=="istio-proxy").ready}'

# Monitor control plane health
kubectl get pods -n istio-system -w
kubectl top pods -n istio-system

Health check + alerts:

#!/bin/bash
# mesh-health-check.sh (abbreviated)
echo "=== Service Mesh Health Check ==="
kubectl get pods -n istio-system
istioctl analyze --all-namespaces
# See EXAMPLES.md Step 6 for complete health check script and alert configs

→ All checks pass no warns. Proxy-status all synced. mTLS check confirms encryption. Metrics show traffic. Control plane stable, low resource use.

If err:

  • Address istioctl analyze output
  • Proxy logs per pod: kubectl logs <pod> -c istio-proxy -n <namespace>
  • Net policies not blocking mesh
  • Control plane logs: kubectl logs -n istio-system deploy/istiod --tail=100
  • Restart problematic: kubectl rollout restart deploy/<deployment> -n <namespace>
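
To find which workloads to restart, a sketch that pulls out-of-sync proxies from istioctl proxy-status output. stale_proxies is a hypothetical helper; it assumes a header row and that a lagging xDS column reads STALE, so check the column layout of the installed version.

```shell
#!/usr/bin/env bash
# stale_proxies  (hypothetical helper)
# Input: `istioctl proxy-status` table. Prints the name (first column) of
# every row where some column is exactly STALE; the header row is skipped.
stale_proxies() {
  awk 'NR > 1 {
    for (i = 2; i <= NF; i++)
      if ($i == "STALE") { print $1; break }
  }'
}

# Usage (needs cluster access):
# istioctl proxy-status | stale_proxies
```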

Check

  • Control plane pods running healthy (istiod|linkerd-destination)
  • Sidecars injected all app pods (2/2)
  • mTLS enabled+functioning (describe|tap verified)
  • Traffic rules route correctly (curl tests)
  • Circuit breaker trips on repeated fails (fault inject)
  • Observability dashboards show metrics (Grafana|Kiali|Linkerd Viz)
  • Distributed traces in Jaeger
  • No warnings from istioctl analyze|linkerd check
  • Proxy sync status all in sync
  • Svc-to-svc encrypted (logs|dashboards verified)

Traps

  • Resource exhaustion: Mesh adds 100-200MB/pod for sidecars. Cluster needs capacity. Set limits in inject config.
  • Config conflicts: Multi VirtualServices same host = undefined behavior. Single VS per host w/ multi match conditions.
  • Cert expiration: mTLS auto-rotate but CA root managed. Monitor expiry: kubectl get certificate -A + alerts.
  • Sidecar not injected: Pods pre-label won't have sidecars. Recreate: kubectl rollout restart deploy/<name> -n <namespace>.
  • DNS issues: Mesh intercepts DNS. Use FQ names (service.namespace.svc.cluster.local) cross-ns.
  • Port naming req: Istio reads protocol from port name, pattern <protocol>[-<suffix>] (http-web, tcp-db), or from appProtocol. Unnamed → protocol sniffing (HTTP|HTTP/2), else plain TCP.
  • Gradual rollout req: Don't enable STRICT mTLS immediate prod. PERMISSIVE during migration → verify all meshed → STRICT.
  • Observability overhead: 100% tracing sampling = perf issues. Use 1-10% prod: sampling: 1.0 in mesh config (value is a percentage, 1.0 = 1%).
  • Gateway vs VS confusion: Gateway = ingress (LB), VS = routing. Both needed for external.
  • Ver compat: Mesh ver compat w/ K8s. Istio supports n-1 minor; Linkerd typically last 3 K8s vers.
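
The port naming trap can be linted before deploy. A sketch, with valid_istio_port_name as a hypothetical helper; the protocol list is an assumption drawn from Istio's protocol-selection docs and may differ by version.

```shell
#!/usr/bin/env bash
# valid_istio_port_name <port-name>  (hypothetical helper)
# Succeeds if the name is <protocol> or <protocol>-<suffix> for a protocol
# prefix Istio is assumed to recognize.
valid_istio_port_name() {
  case "$1" in
    http|http2|https|tcp|tls|grpc|grpc-web|mongo|mysql|redis) return 0 ;;
    http-*|http2-*|https-*|tcp-*|tls-*|grpc-*|mongo-*|mysql-*|redis-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# valid_istio_port_name http-web && echo ok
# valid_istio_port_name web || echo "rename port or set appProtocol"
```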

Related

  • configure-ingress-networking: Gateway complements mesh ingress
  • deploy-to-kubernetes: app deploy patterns w/ mesh
  • setup-prometheus-monitoring: Prometheus integ for mesh metrics
  • manage-kubernetes-secrets: cert mgmt for mTLS
  • enforce-policy-as-code: OPA policies alongside mesh authz

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman-ultra/skills/setup-service-mesh