usability-testing

rampstackco

About

This skill helps developers plan and run usability tests on designs or prototypes to identify issues before launch. It handles test design, task scripting, moderation, and synthesizing findings from both moderated and unmoderated tests. Use it to validate designs, improve task completion, and ensure real users can successfully interact with your build.

Quick install

Claude Code

Recommended (primary method):
npx skills add rampstackco/claude-skills -a claude-code
Alternative (plugin command):
/plugin add https://github.com/rampstackco/claude-skills
Alternative (Git clone):
git clone https://github.com/rampstackco/claude-skills.git ~/.claude/skills/usability-testing

Copy and paste this command into Claude Code to install the skill.

Skill documentation

Usability Testing

Plan and run tests that find usability problems before users hit them in production. Stack-agnostic. Tool-agnostic.

This skill is for testing existing designs or prototypes. For broader discovery research, use ux-research. For conversion testing in production, use cro-optimization.


When to use

  • Before launching a new flow or major redesign
  • After a redesign to verify it doesn't introduce new problems
  • When analytics show drop-off but you don't know why
  • When customer support tickets cluster around specific UI areas
  • Pre-launch user validation
  • Comparing two design directions

When NOT to use

  • Discovery / generative research (use ux-research)
  • Live conversion optimization (use cro-optimization)
  • Mapping the broader experience (use journey-mapping)
  • Pure quantitative measurement (use analytics-strategy)

Required inputs

  • The design or prototype to test (functional or near-functional)
  • Specific tasks users would do
  • The audience (who should be tested)
  • Testing infrastructure (moderated tool, unmoderated tool, in-person setup)

The framework: 5 phases

1. Define what to test

Don't test the whole product. Test specific tasks.

Task selection criteria:

  • The task represents a real user goal (not "click around and explore")
  • The task has a clear start and end
  • The task is achievable in 2 to 10 minutes
  • The task is one of: most common, most strategic, most problematic

Examples of testable tasks:

"You want to find a contractor near you who can install a fence. Show me how you'd do that on this site."

"You're a first-time visitor. You want to understand if this product fits your needs. Walk me through how you'd evaluate it."

"Your team needs a new tool to manage projects. Use this site to figure out which plan is right for a 12-person team."

Task framing rules:

  • State the user goal, not the system action ("find a place to stay" not "click the search button")
  • Provide context (why are you doing this?)
  • Don't reveal the path
  • Don't use product terminology in the task framing
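
Two of the framing rules above (don't reveal the path, don't use product terminology) can be roughly screened with a script before a pilot session. The following is a minimal sketch, not part of the skill itself: the word lists and the `lint_task` helper are hypothetical, and a clean result only means no obvious flags were found. Whether a task gives enough context still needs human review.

```python
import re
from typing import List, Sequence

# Hypothetical word lists; tune them to your own product and UI vocabulary.
PATH_REVEALING = ("click", "tap", "press", "select", "menu", "button", "navigate to")
EXPLORATORY = ("explore", "click around", "look around")


def lint_task(prompt: str, product_terms: Sequence[str] = ()) -> List[str]:
    """Return framing warnings for one task prompt (empty list means no flags)."""
    text = prompt.lower()

    def mentions(term: str) -> bool:
        # Whole-word match so "press" does not fire on "impressive".
        return re.search(r"\b" + re.escape(term.lower()) + r"\b", text) is not None

    warnings = []
    revealed = [t for t in PATH_REVEALING if mentions(t)]
    if revealed:
        warnings.append("reveals the path: " + ", ".join(revealed))
    vague = [t for t in EXPLORATORY if mentions(t)]
    if vague:
        warnings.append("no clear user goal: " + ", ".join(vague))
    jargon = [t for t in product_terms if mentions(t)]
    if jargon:
        warnings.append("uses product terminology: " + ", ".join(jargon))
    return warnings


if __name__ == "__main__":
    print(lint_task("Click the menu and find the pricing page."))
    print(lint_task("You want to find a contractor near you who can install a fence."))
```

Pass your own feature and plan names as `product_terms` so the check reflects the language your participants would not know yet.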

2. Choose moderated or unmoderated

Moderated (live, with researcher):

  • Researcher observes and probes in real time
  • Best for early-stage prototypes, complex tasks, novel concepts
  • Higher cost, smaller sample (5 to 8 participants typical)
  • Catches surprises and lets the researcher probe deeper

Unmoderated (recorded, asynchronous):

  • Participant completes alone, often via tool (UserTesting, Maze, Lookback)
  • Best for stable designs, simple tasks, larger sample
  • Lower cost, larger sample (15 to 30 participants typical)
  • Catches patterns at scale, less depth per session

For most teams: moderated for early/critical decisions, unmoderated for ongoing validation.
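
The rule of thumb above can be written down as a small decision helper. This is only a sketch: the `suggest_method` function and its input labels are hypothetical, and the sample ranges are the typical figures quoted in this section rather than hard limits.

```python
def suggest_method(design_stage: str, task_complexity: str, novel_concept: bool = False) -> dict:
    """Suggest a test method.

    design_stage: "early" or "stable" (hypothetical labels)
    task_complexity: "simple" or "complex" (hypothetical labels)
    """
    if design_stage not in ("early", "stable"):
        raise ValueError("design_stage must be 'early' or 'stable'")
    if task_complexity not in ("simple", "complex"):
        raise ValueError("task_complexity must be 'simple' or 'complex'")

    # Any one of these signals favors a live researcher who can probe.
    if design_stage == "early" or task_complexity == "complex" or novel_concept:
        return {"method": "moderated", "participants": (5, 8)}
    return {"method": "unmoderated", "participants": (15, 30)}


if __name__ == "__main__":
    print(suggest_method("early", "simple"))
    print(suggest_method("stable", "simple"))
```

Treat the result as a starting point. A critical decision on a stable design may still justify a moderated round.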

3. Recruit

Recruit from the target audience, not from whoever is convenient.

Recruit criteria:

  • Match real users (target audience, not just "anyone")
  • Mix of experience levels with the product (new and existing if applicable)
  • Mix of relevant device types (mobile, desktop, tablet if relevant)
  • Exclude friends, family, employees

Sample size:

  • Moderated: 5 to 8 participants (Nielsen's "5 users find 85% of usability issues" for the most common segment)
  • Unmoderated: 15 to 30 participants (more participants compensate for less probing)
  • Multi-segment testing: 5 to 8 per segment
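
The "5 users" guideline comes from a simple discovery model published by Nielsen and Landauer: if each participant independently surfaces a given issue with probability p, then n participants surface it with probability 1 - (1 - p)^n. The sketch below assumes their reported average of p = 0.31. That figure is not from this skill, and your own product may differ, so read the output as a rough planning aid.

```python
def share_of_issues_found(participants: int, per_user_discovery_rate: float = 0.31) -> float:
    """Expected share of issues seen at least once, under the independence assumption."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    if not 0.0 <= per_user_discovery_rate <= 1.0:
        raise ValueError("per_user_discovery_rate must be between 0 and 1")
    return 1 - (1 - per_user_discovery_rate) ** participants


if __name__ == "__main__":
    for n in (1, 3, 5, 8):
        print(f"{n} participants: {share_of_issues_found(n):.0%} of issues")
```

With the default rate, five participants land at roughly 84%, which is where the commonly quoted 85% figure comes from. Returns flatten quickly after that, so extra budget is usually better spent on another segment or a re-test.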

4. Run the test

Pre-task setup:

  • Confirm recording works
  • Brief participant (purpose, anonymity, recording, "no wrong answers")
  • Get verbal consent
  • Have participant share screen if remote

Moderated session structure:

  1. Warm-up (2 to 3 min). Easy questions to put participant at ease.
  2. Pre-test questions (3 to 5 min). Background context, current behavior with similar products.
  3. Task 1 (5 to 10 min). Describe task. Have participant attempt while thinking aloud.
  4. Post-task questions (1 to 2 min). What was easy/hard? Anything confusing?
  5. Repeat for tasks 2, 3, 4 (typically 3 to 5 tasks per 60-minute session).
  6. Overall debrief (5 to 10 min). General reactions, comparisons to alternatives, anything else.
  7. Close (2 min).
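
Before scheduling, it helps to check that the planned number of tasks fits the session length. The sketch below adds up the minute ranges from the structure above, assuming post-task questions follow every task. The constant names and `session_minutes` are illustrative.

```python
from typing import Tuple

# (min, max) minutes per segment, copied from the session structure above.
WARM_UP = (2, 3)
PRE_TEST = (3, 5)
TASK = (5, 10)
POST_TASK = (1, 2)  # assumed to repeat after every task
DEBRIEF = (5, 10)
CLOSE = (2, 2)


def session_minutes(task_count: int) -> Tuple[int, int]:
    """Return the (shortest, longest) expected session length in minutes."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    fixed = (WARM_UP, PRE_TEST, DEBRIEF, CLOSE)
    low = sum(segment[0] for segment in fixed) + task_count * (TASK[0] + POST_TASK[0])
    high = sum(segment[1] for segment in fixed) + task_count * (TASK[1] + POST_TASK[1])
    return low, high


if __name__ == "__main__":
    for tasks in (3, 4, 5):
        low, high = session_minutes(tasks)
        print(f"{tasks} tasks: {low} to {high} minutes")
```

Under these ranges, three tasks fit a 60-minute slot even at the upper bound, while four or five tasks fit only if most tasks finish near the short end. That is a useful thing to confirm in the pilot.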

Moderation principles:

  • Encourage think-aloud ("What's going through your mind?")
  • Don't help unless they're truly stuck (and even then, only after a long pause)
  • Don't lead ("Are you looking for the menu?" - bad)
  • Note where they hesitate, scroll, or backtrack
  • Note their language vs the product's language
  • Note emotional reactions

Anti-patterns:

  • Talking too much (researcher should talk maybe 20% of the time)
  • Defending the design when participants struggle
  • Helping prematurely
  • Asking participants to predict their future behavior
  • Treating participant suggestions as features ("Users want X" - test demand for X separately)

5. Synthesize and report

Patterns across participants are signal. Single-participant complaints are weaker (but worth investigating).

Synthesis steps:

  1. Issue inventory. Every issue observed, with which participant, which task, severity.
  2. Cluster. Group issues that share the same root problem.
  3. Severity.
    • Critical: Blocks task completion. Most users hit this.
    • Major: Significantly slows task. Many users hit this.
    • Minor: Friction. Some users hit this. Workaround exists.
    • Cosmetic: Polish. Doesn't affect task.
  4. Recommendations. For each issue, propose specific fixes.
  5. Prioritize. By severity and effort.
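
The synthesis steps above map onto a small data structure. The sketch below is one possible shape, not a prescribed tool: the `Observation` fields, the 1 to 3 effort scale, and the sample data are all invented for illustration. Clustering is still done by the researcher, who assigns each observation a root-problem label.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "cosmetic": 3}


@dataclass(frozen=True)
class Observation:
    participant: str
    task: str
    root_problem: str  # cluster label assigned by the researcher
    severity: str      # one of the SEVERITY_RANK keys


def cluster(observations: List[Observation]) -> Dict[str, List[Observation]]:
    """Group observations that share a root problem."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs.root_problem].append(obs)
    return dict(groups)


def prioritize(observations: List[Observation], effort: Dict[str, int]) -> List[dict]:
    """Rank clusters by worst severity, then lowest effort, then most participants affected.

    effort maps root problem to a hypothetical 1 (small) to 3 (large) estimate.
    """
    rows = []
    for problem, group in cluster(observations).items():
        worst = min(group, key=lambda o: SEVERITY_RANK[o.severity]).severity
        rows.append({
            "problem": problem,
            "severity": worst,
            "affected": len({o.participant for o in group}),
            "effort": effort.get(problem, 2),  # default to medium when unestimated
        })
    rows.sort(key=lambda r: (SEVERITY_RANK[r["severity"]], r["effort"], -r["affected"]))
    return rows


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        Observation("P1", "checkout", "promo field hidden", "major"),
        Observation("P2", "checkout", "promo field hidden", "major"),
        Observation("P3", "signup", "password rules unclear", "critical"),
        Observation("P1", "signup", "button label casing", "cosmetic"),
    ]
    estimates = {"promo field hidden": 1, "password rules unclear": 3, "button label casing": 1}
    for row in prioritize(sample, estimates):
        print(row)
```

Sorting by severity first keeps a cheap cosmetic fix from jumping ahead of a blocking issue. Swap the sort key if your team weighs effort differently.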

Report structure:

# Usability Test: [Design / flow]

## Summary
[2 to 3 paragraphs covering: what was tested, headline findings, top 3 priorities]

## Method
[Moderated/unmoderated, sample size, audience, dates, tasks]

## Critical findings
[Each with description, frequency, supporting evidence (quotes/clips), recommendation]

## Major findings
[Same structure]

## Minor findings
[Brief]

## Cosmetic findings
[Briefest]

## What worked well
[Calibration: capture successes too]

## Recommendations
[Prioritized list with effort estimates]

## Next steps
[Test re-run schedule, design iteration plan]

Workflow

  1. Define the goals. What decisions hinge on this? What tasks matter most?
  2. Design tasks. 3 to 5 specific, realistic, goal-framed tasks.
  3. Choose moderated vs unmoderated. Match to stage and depth needed.
  4. Recruit. Specific to audience.
  5. Pilot. 1 to 2 sessions before main batch. Refine tasks if needed.
  6. Run. Follow the protocol. Stay disciplined.
  7. Synthesize during, not just after. Patterns emerge by session 4 or 5.
  8. Report. Multiple formats - written report + highlight clips.
  9. Track fixes. Every critical issue should have an owner and date.
  10. Re-test after fixes. Verify the fix worked and didn't introduce new issues.

Failure patterns

  • Testing the whole product instead of specific tasks. Vague results.
  • Tasks that reveal the path. ("Click the menu and find...")
  • Friends and family as participants. Biased, not representative.
  • Researcher leading the participant. Findings reflect the researcher.
  • Defending the design when participants struggle. Misses real issues.
  • Helping too quickly. Participant doesn't experience the friction.
  • Treating participant suggestions as features. Users solve their problem; product team designs the solution.
  • Treating one participant as a finding. A single strong opinion is a data point, not a pattern.
  • Skipping severity scoring. All findings treated equally; team can't prioritize.
  • Reports no one reads. Highlight clips and live walkthroughs work better than 80-page decks.
  • Testing once, never re-testing. Fixes that introduce new problems go undetected.

Output format

Default outputs:

  1. Test plan (before testing) - usability-test-plan-[topic].md
  2. Task script (per session) - usability-tasks-[topic].md
  3. Findings report (after synthesis) - usability-findings-[topic].md
  4. Highlight clips (separately produced)
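
The three file names above follow one pattern, so they can be generated from a topic string. A minimal sketch, assuming a lowercase hyphenated slug for `[topic]`; the slug rule and the `output_filenames` helper are assumptions, not something the skill specifies.

```python
import re
from typing import Dict


def output_filenames(topic: str) -> Dict[str, str]:
    """Build the default output file names for a test topic."""
    # Assumed slug convention: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return {
        "test_plan": f"usability-test-plan-{slug}.md",
        "task_script": f"usability-tasks-{slug}.md",
        "findings_report": f"usability-findings-{slug}.md",
    }


if __name__ == "__main__":
    for kind, name in output_filenames("Checkout Flow v2").items():
        print(kind, name)
```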

Reference files

GitHub repository

rampstackco/claude-skills
Path: skills/usability-testing
