testability-scoring
About
This skill provides AI-powered testability assessment for web applications using Playwright and optional Vibium integration. It evaluates applications against 10 principles of intrinsic testability including Observability, Controllability, and Stability. Use it when assessing software testability, identifying improvements, or generating testability reports.
Quick Install
Claude Code
Recommended: copy and paste one of the following commands into Claude Code to install the skill.
/plugin add https://github.com/proffesor-for-testing/agentic-qe
Or clone the repository directly:
git clone https://github.com/proffesor-for-testing/agentic-qe.git ~/.claude/skills/testability-scoring
Skill Documentation
Testability Scoring
<default_to_action> When assessing testability:
- RUN assessment against target URL
- ANALYZE all 10 principles automatically
- GENERATE HTML report with radar chart
- PRIORITIZE improvements by impact/effort
- INTEGRATE with QX Partner for holistic view
Quick Assessment:
# Run assessment on any URL
TEST_URL='https://example.com/' npx playwright test tests/testability-scoring/testability-scoring.spec.js --project=chromium --workers=1
# Or use shell script wrapper
.claude/skills/testability-scoring/scripts/run-assessment.sh https://example.com/
The 10 Principles at a Glance:
| Principle | Weight | Key Question |
|---|---|---|
| Observability | 15% | Can we see what's happening? |
| Controllability | 15% | Can we control the application? |
| Algorithmic Simplicity | 10% | Are behaviors predictable? |
| Algorithmic Transparency | 10% | Can we understand what it does? |
| Algorithmic Stability | 10% | Does behavior remain consistent? |
| Explainability | 10% | Is the interface understandable? |
| Unbugginess | 10% | How error-free is it? |
| Smallness | 10% | Are components appropriately sized? |
| Decomposability | 5% | Can we test parts in isolation? |
| Similarity | 5% | Is the tech stack familiar? |
Grade Scale:
- A (90-100): Excellent testability
- B (80-89): Good testability
- C (70-79): Adequate testability
- D (60-69): Below average
- F (0-59): Poor testability </default_to_action>
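The weighting and grading rules above can be sketched as a small helper. This is an illustration only: the principle key names and function names are assumptions, not the skill's actual API, but the weights and grade cut-offs mirror the tables above.

```javascript
// Weights from the principles table (sum to 1.0).
const WEIGHTS = {
  observability: 0.15,
  controllability: 0.15,
  algorithmicSimplicity: 0.10,
  algorithmicTransparency: 0.10,
  algorithmicStability: 0.10,
  explainability: 0.10,
  unbugginess: 0.10,
  smallness: 0.10,
  decomposability: 0.05,
  similarity: 0.05,
};

// Weighted overall score from per-principle scores (each 0-100).
function overallScore(scores) {
  return Object.entries(WEIGHTS)
    .reduce((sum, [key, weight]) => sum + (scores[key] ?? 0) * weight, 0);
}

// Letter grade per the scale above.
function grade(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}
```

Because the two 15% principles dominate, poor Observability or Controllability alone can drop an otherwise clean application a full letter grade.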
Quick Reference Card
Running Assessments
| Method | Command | When to Use |
|---|---|---|
| Shell Script | ./scripts/run-assessment.sh URL | One-time assessment |
| ENV Override | TEST_URL='URL' npx playwright test... | CI/CD integration |
| Config File | Update tests/testability-scoring/config.js | Repeated runs |
Principle Details
High Weight (15% each)
| Principle | Measures | Indicators |
|---|---|---|
| Observability | State visibility, logging, monitoring | Console output, network tracking, error visibility |
| Controllability | Input control, state manipulation | API access, test data injection, determinism |
Medium Weight (10% each)
| Principle | Measures | Indicators |
|---|---|---|
| Simplicity | Predictable behavior | Clear I/O relationships, low complexity |
| Transparency | Understanding what system does | Visible processes, readable code |
| Stability | Consistent behavior | Change resilience, maintainability |
| Explainability | Interface understanding | Good docs, semantic structure, help text |
| Unbugginess | Error-free operation | Console errors, warnings, runtime issues |
| Smallness | Component size | Element count, script bloat, page complexity |
Low Weight (5% each)
| Principle | Measures | Indicators |
|---|---|---|
| Decomposability | Isolation testing | Component separation, modular design |
| Similarity | Technology familiarity | Standard frameworks, known patterns |
Assessment Workflow
1. Navigate to URL
2. Collect metrics
3. Score principles
4. Apply weights
5. Calculate grades
6. Generate JSON
7. Generate HTML report with radar chart
8. Open report in browser (auto-opens)
Output Files
tests/reports/
├── testability-results-<timestamp>.json # Raw data
├── testability-report-<timestamp>.html # Visual report
└── latest.json # Symlink
Integration Examples
CI/CD Integration
# GitHub Actions
- name: Testability Assessment
  run: |
    timeout 180 .claude/skills/testability-scoring/scripts/run-assessment.sh ${{ env.APP_URL }}
- name: Upload Reports
  uses: actions/upload-artifact@v3
  with:
    name: testability-reports
    path: tests/reports/testability-*.html
QX Partner Integration
// Combine testability with QX analysis
const qxAnalysis = await Task("QX Analysis", {
target: 'https://example.com',
integrateTestability: true
}, "qx-partner");
// Returns combined insights:
// - QX Score: 78/100
// - Testability Integration: Observability 72/100
// - Combined Insight: Low observability may mask UX issues
Programmatic Usage
import { runTestabilityAssessment } from './testability';
const results = await runTestabilityAssessment('https://example.com');
console.log(`Overall: ${results.overallScore}/100 (${results.grade})`);
console.log('Recommendations:', results.recommendations);
Agent Integration
// Run testability assessment
const assessment = await Task("Testability Assessment", {
url: 'https://example.com',
generateReport: true,
openBrowser: true
}, "qe-quality-analyzer");
// Use with QX Partner for holistic analysis
const qxReport = await Task("Full QX Analysis", {
target: 'https://example.com',
integrateTestability: true,
detectOracleProblems: true
}, "qx-partner");
Vibium Integration (Optional)
Overview
Vibium browser automation can be used alongside Playwright for enhanced testability assessment. While Playwright remains the primary engine, Vibium offers complementary capabilities for certain metrics.
Installation:
claude mcp add vibium -- npx -y vibium
Vibium-Enhanced Metrics
| Principle | Vibium Enhancement | Benefit |
|---|---|---|
| Observability | Auto-wait duration tracking | Measures DOM stability (30s timeout, 100ms polling) |
| Controllability | Element interaction success rate | Validates automation readiness via MCP |
| Stability | Screenshot consistency | Visual regression detection for layout stability |
| Explainability | Element attribute extraction | ARIA labels, semantic HTML validation |
When to Use Vibium
✅ USE Vibium for:
- Element stability metrics (auto-wait duration analysis)
- Visual consistency checks (screenshot comparison)
- MCP-native AI agent integration
- Lightweight Docker images (400MB vs 1.2GB)
❌ USE Playwright instead for:
- Console error detection (Vibium V1 lacks console API)
- Network performance metrics (BiDi network APIs coming in V2)
- Comprehensive browser coverage (Firefox, Safari)
- Production-proven stability (Vibium V1 released Dec 2024)
Hybrid Assessment Example
// Testability assessment using both engines
const assessment = {
// Playwright: Comprehensive metrics
playwright: await runPlaywrightAssessment(url),
// Vibium: Stability metrics
vibium: {
elementStability: await measureAutoWaitDuration(url),
visualConsistency: await compareScreenshots(url),
accessibilityAttributes: await extractARIALabels(url)
}
};
// Enhanced Observability score. Both inputs are assumed to be
// normalized 0-100 scores (higher = better), not raw error counts.
const observability =
  (assessment.playwright.consoleErrors * 0.6) +
  (assessment.vibium.elementStability * 0.4);
Vibium MCP Tools for Testability
// 1. Element Stability Measurement
const browser = await browser_launch();
await browser_navigate({ url });
const startTime = Date.now();
const element = await browser_find({ selector: ".critical-element" });
const autoWaitDuration = Date.now() - startTime;
// Lower duration = better stability
// 2. Visual Consistency Check
const screenshot1 = await browser_screenshot();
await browser_navigate({ url }); // Reload
const screenshot2 = await browser_screenshot();
const visualDiff = compareImages(screenshot1.png, screenshot2.png);
// Lower diff = better stability
// 3. Accessibility Attribute Extraction
const elements = await browser_find({ selector: "button, a, input" });
const ariaLabels = elements.map(el => el.attributes["aria-label"]);
const semanticScore = (ariaLabels.filter(Boolean).length / elements.length) * 100;
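The raw measurements above still need normalizing before they can feed a 0-100 principle score. A minimal sketch, with illustrative thresholds (the 5-second ceiling is an assumption, not a Vibium default):

```javascript
// Fold raw Vibium measurements into 0-100 sub-scores.

// Auto-wait duration: 0 ms maps to 100; maxMs or slower maps to 0.
function stabilityScore(autoWaitMs, maxMs = 5000) {
  return Math.max(0, 100 * (1 - autoWaitMs / maxMs));
}

// Visual diff ratio: 0 = identical screenshots, 1 = completely different.
function consistencyScore(diffRatio) {
  return Math.round(100 * (1 - Math.min(1, diffRatio)));
}
```

These sub-scores can then be blended with the Playwright metrics, as in the hybrid Observability example above.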
Migration Strategy
Current (V2.2): Hybrid approach
- Playwright: Primary engine for all 10 principles
- Vibium: Optional enhancement for stability metrics
Future (V3.0): When Vibium V2 ships
- Evaluate Vibium as primary engine if:
- Console/Network APIs available
- Production stability proven
- Community adoption increases
Agent Coordination Hints
Memory Namespace
aqe/testability/
├── assessments/* - Assessment results by URL
├── historical/* - Historical scores for trend analysis
├── recommendations/* - Improvement recommendations
├── integration/* - QX integration data
└── vibium/* - Vibium-specific metrics (optional)
Fleet Coordination
const testabilityFleet = await FleetManager.coordinate({
strategy: 'testability-assessment',
agents: [
'qe-quality-analyzer', // Primary assessment
'qx-partner', // UX integration
'qe-visual-tester' // Visual validation
],
topology: 'sequential'
});
Common Issues & Solutions
| Issue | Solution |
|---|---|
| Tests timing out | Increase timeout: timeout 300 ./scripts/run-assessment.sh URL |
| Partial results | Check console errors, increase network timeout |
| Report not opening | Use AUTO_OPEN=false, open manually |
| Config not updating | Use TEST_URL env var instead |
| Vibium not available | Install via claude mcp add vibium -- npx -y vibium (optional) |
| Hybrid mode errors | Vibium is optional; assessments work without it |
Related Skills
- accessibility-testing - WCAG compliance (overlaps with Explainability)
- visual-testing-advanced - UI consistency
- performance-testing - Load time metrics
Credits & References
Framework Origin
- Heuristics for Software Testability by James Bach and Michael Bolton
- Available at: https://www.satisfice.com/download/heuristics-of-software-testability
Implementation
- Based on https://github.com/fndlalit/testability-scorer (contributed by @fndlalit)
- Playwright v1.49.0+ with AI capabilities (primary engine)
- Vibium v1.0+ with MCP integration (optional enhancement)
- Chart.js for radar visualizations
Vibium Resources
- GitHub: https://github.com/VibiumDev/vibium
- MCP Integration: claude mcp add vibium -- npx -y vibium
- Created by Jason Huggins (creator of Selenium/Appium)
Remember
Testability is an investment, not an afterthought.
Good testability:
- Reduces debugging time
- Enables faster feedback loops
- Makes defects easier to find
- Supports continuous testing
Low scores = High risk. Prioritize improvements by weight × impact.
