MCP Hub

Prow Job Analyze Test Failure

openshift-eng
Updated: Today
Tags: test, ai, testing

About

This skill analyzes Prow CI test failures: it downloads the job artifacts and examines test logs, resources, and events. It also inspects the project's source code and produces a detailed, structured failure analysis. Use it as a first-pass investigation to get comprehensive diagnostic information when a Prow job fails.

Quick Install

Claude Code

Plugin command (recommended)
/plugin add https://github.com/openshift-eng/ai-helpers
Git clone (alternative)
git clone https://github.com/openshift-eng/ai-helpers.git "$HOME/.claude/skills/Prow Job Analyze Test Failure"

Copy and paste this command into Claude Code to install the skill.

Documentation

Prow Job Analyze Test Failure

This skill analyzes the given test failure by downloading artifacts using the "Prow Job Analyze Resource" skill, checking the test logs, inspecting resources, logs, and events from the artifacts, and examining the test source code.

When to Use This Skill

Use this skill when the user wants to do an initial analysis of a Prow CI test failure.

Prerequisites

Identical to those of the "Prow Job Analyze Resource" skill.

Input Format

The user will provide:

  1. Prow job URL - gcsweb URL containing test-platform-results/

    • Example: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/6731/pull-ci-openshift-hypershift-main-e2e-aws/1962527613477982208
    • URL may or may not have trailing slash
  2. Test name - the name of the test that failed

    • Examples:
      • TestKarpenter/EnsureHostedCluster/ValidateMetricsAreExposed
      • TestCreateClusterCustomConfig
      • The openshift-console downloads pods [apigroup:console.openshift.io] should be scheduled on different nodes

Implementation Steps

Step 1: Parse and Validate URL

Use the "Parse and Validate URL" steps from "Prow Job Analyze Resource" skill

Step 2: Create Working Directory

  1. Check for existing artifacts first (a shell sketch follows this list)

    • Check if .work/prow-job-analyze-test-failure/{build_id}/logs/ directory exists and has content
    • If it exists with content:
      • Use AskUserQuestion tool to ask:
        • Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
        • Options:
          • "Use existing" - Skip to step Analyze Test Failure
          • "Re-download" - Continue to clean and re-download
      • If user chooses "Re-download":
        • Remove all existing content: rm -rf .work/prow-job-analyze-test-failure/{build_id}/logs/
        • Also remove tmp directory: rm -rf .work/prow-job-analyze-test-failure/{build_id}/tmp/
        • This ensures clean state before downloading new content
      • If user chooses "Use existing":
        • Skip directly to Step 4 (Analyze Test Failure)
        • Still need to download prowjob.json if it doesn't exist
  2. Create directory structure

    mkdir -p .work/prow-job-analyze-test-failure/{build_id}/logs
    mkdir -p .work/prow-job-analyze-test-failure/{build_id}/tmp
    
    • Use .work/prow-job-analyze-test-failure/ as the base directory (already in .gitignore)
    • Use build_id as subdirectory name
    • Create logs/ subdirectory for all downloads
    • Create tmp/ subdirectory for temporary files (intermediate JSON, etc.)
    • Working directory: .work/prow-job-analyze-test-failure/{build_id}/
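
The check-then-create flow above can be approximated with the shell sketch below. It is a minimal sketch, assuming a POSIX shell; BUILD_ID carries over from the parsing sketch in Step 1, and the actual AskUserQuestion prompt is handled by the agent, so only the filesystem side is shown.

    BASE=".work/prow-job-analyze-test-failure/$BUILD_ID"
    if [ -d "$BASE/logs" ] && [ -n "$(ls -A "$BASE/logs" 2>/dev/null)" ]; then
      # Existing artifacts: ask the user whether to reuse or re-download.
      # On "Re-download", clean both directories before fetching anything:
      # rm -rf "$BASE/logs" "$BASE/tmp"
      :
    fi
    mkdir -p "$BASE/logs" "$BASE/tmp"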

Step 3: Download and Validate prowjob.json

Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resource" skill.

Step 4: Analyze Test Failure

  1. Download build-log.txt

    gcloud storage cp gs://test-platform-results/{bucket-path}/build-log.txt .work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt --no-user-output-enabled
    
  2. Parse and validate (a grep sketch follows this list)

    • Read .work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt
    • Search for the Test name
    • Gather the stack trace related to the test
  3. Examine interval files for cluster activity during E2E failures (a jq sketch follows this list)

    • Search recursively for E2E timeline artifacts (known as "interval files") within the bucket-path:
      gcloud storage ls 'gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json'
      
    • The files can be nested at unpredictable levels below the bucket-path
    • There could be as many as two matching files
    • Download all matching interval files (use the full paths from the search results):
      gcloud storage cp gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json .work/prow-job-analyze-test-failure/{build_id}/logs/ --no-user-output-enabled
      
    • If the wildcard copy doesn't work, copy each file individually using the full paths from the search results
    • Scan interval files for test failure timing:
      • Look for intervals where source = "E2ETest" and message.annotations.status = "Failed"
      • Note the from and to timestamps on this interval - this indicates when the test was running
    • Scan interval files for related cluster events:
      • Look for intervals that overlap the timeframe when the failed test was running
      • Filter for intervals with:
        • level = "Error" or level = "Warning"
        • source = "OperatorState"
      • These events may indicate cluster issues that caused or contributed to the test failure
  4. Determine root cause

    • Determine a possible root cause for the test failure
    • Analyze stack traces
    • Analyze related code in the code repository
    • Store artifacts from the Prow CI job (json/yaml files) related to the failure under .work/prow-job-analyze-test-failure/{build_id}/tmp/
    • Store logs under .work/prow-job-analyze-test-failure/{build_id}/logs/
    • Provide evidence for the failure
    • Try to find additional evidence, for example in logs, events, and other json/yaml files
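
The log search in step 2 can be sketched with grep, reusing the BUILD_ID variable from the Step 1 sketch. This assumes Go-style test output (the "--- FAIL:" marker), which matches the example test names above; TEST_NAME is the user-supplied test name, and -F treats it literally so names containing [ or / are not read as regex metacharacters.

    TEST_NAME='TestCreateClusterCustomConfig'   # user-supplied test name
    LOG=".work/prow-job-analyze-test-failure/$BUILD_ID/logs/build-log.txt"
    # All mentions of the test, with line numbers:
    grep -nF -- "$TEST_NAME" "$LOG"
    # Stack traces usually follow the failure marker; keep trailing context:
    grep -nF -A 40 -- "--- FAIL: $TEST_NAME" "$LOG"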
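
The interval scan in step 3 can be sketched with jq, assuming each interval file holds a top-level items array whose entries carry the level, source, from, to, and message.annotations.status fields named above (adjust the paths if the layout differs). The overlap test relies on the ISO-8601 timestamps ordering lexicographically as strings.

    cd ".work/prow-job-analyze-test-failure/$BUILD_ID/logs"
    # Fallback download, copying each matching file individually:
    gcloud storage ls "gs://test-platform-results/$BUCKET_PATH/**/e2e-timelines_spyglass_*.json" \
      | while read -r f; do gcloud storage cp "$f" . --no-user-output-enabled; done
    # 1) Find the failed test's window (from/to timestamps):
    jq -r '.items[]
           | select(.source == "E2ETest" and .message.annotations.status == "Failed")
           | "\(.from) \(.to)"' e2e-timelines_spyglass_*.json
    # 2) Error/Warning OperatorState intervals overlapping that window:
    FROM='2025-01-01T00:00:00Z'; TO='2025-01-01T00:10:00Z'   # values from query 1
    jq -r --arg from "$FROM" --arg to "$TO" '.items[]
           | select(.source == "OperatorState" and (.level == "Error" or .level == "Warning"))
           | select(.from <= $to and .to >= $from)
           | "\(.from) [\(.level)] \(.message)"' e2e-timelines_spyglass_*.json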

Step 5: Present Results to User

  1. Display summary

    Test Failure Analysis Complete
    
    Prow Job: {prowjob-name}
    Build ID: {build_id}
    Error: {error message}
    
    Summary: {failure analysis}
    Evidence: {evidence}
    Additional evidence: {additional evidence}
    
    Artifacts downloaded to: .work/prow-job-analyze-test-failure/{build_id}/logs/
    

Error Handling

Handle errors in the same way as the "Error Handling" section of the "Prow Job Analyze Resource" skill.

Performance Considerations

Follow the instructions in the "Performance Considerations" section of the "Prow Job Analyze Resource" skill.

GitHub Repository

openshift-eng/ai-helpers
Path: plugins/prow-job/skills/prow-job-analyze-test-failure

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

View Skill

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

View Skill

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View Skill

cloudflare-turnstile

Meta

This skill provides comprehensive guidance for implementing Cloudflare Turnstile as a CAPTCHA-alternative bot protection system. It covers integration for forms, login pages, API endpoints, and frameworks like React/Next.js/Hono, while handling invisible challenges that maintain user experience. Use it when migrating from reCAPTCHA, debugging error codes, or implementing token validation and E2E tests.

View Skill