usability-testing
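About
This skill helps developers plan and run usability tests on designs or prototypes so that problems are caught before launch. It covers test design, task script writing, facilitation, and synthesis of findings for both moderated and unmoderated tests. Use it to validate designs, improve task completion rates, and make sure real users can succeed with what you build.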
Quick install
Claude Code
- Recommended: npx skills add rampstackco/claude-skills -a claude-code
- /plugin add https://github.com/rampstackco/claude-skills
- git clone https://github.com/rampstackco/claude-skills.git ~/.claude/skills/usability-testing
Copy and paste one of these commands to install the skill.
Documentation
Usability Testing
Plan and run tests that find usability problems before users hit them in production. Stack-agnostic. Tool-agnostic.
This skill is for testing existing designs or prototypes. For broader discovery research, use ux-research. For conversion testing in production, use cro-optimization.
When to use
- Before launching a new flow or major redesign
- After a redesign to verify it doesn't introduce new problems
- When analytics show drop-off but you don't know why
- When customer support tickets pattern around specific UI areas
- Pre-launch user validation
- Comparing two design directions
When NOT to use
- Discovery / generative research (use ux-research)
- Live conversion optimization (use cro-optimization)
- Mapping the broader experience (use journey-mapping)
- Pure quantitative measurement (use analytics-strategy)
Required inputs
- The design or prototype to test (functional or near-functional)
- Specific tasks users would do
- The audience (who should be tested)
- Testing infrastructure (moderated tool, unmoderated tool, in-person setup)
The framework: 5 phases
1. Define what to test
Don't test the whole product. Test specific tasks.
Task selection criteria:
- The task represents a real user goal (not "click around and explore")
- The task has a clear start and end
- The task is achievable in 2 to 10 minutes
- The task is one of: most common, most strategic, most problematic
Examples of testable tasks:
"You want to find a contractor near you who can install a fence. Show me how you'd do that on this site."
"You're a first-time visitor. You want to understand if this product fits your needs. Walk me through how you'd evaluate it."
"Your team needs a new tool to manage projects. Use this site to figure out which plan is right for a 12-person team."
Task framing rules:
- State the user goal, not the system action ("find a place to stay" not "click the search button")
- Provide context (why are you doing this?)
- Don't reveal the path
- Don't use product terminology in the task framing
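The framing rules above can be partly checked before a pilot session. The sketch below is a rough heuristic, not a substitute for reading the script: the word list `UI_ACTION_WORDS` and the function `lint_task` are hypothetical names invented for this illustration, and it only matches single words.

```python
import re
from collections.abc import Iterable

# Hypothetical list of words that tend to reveal the path; tune it for your product.
UI_ACTION_WORDS = {"click", "tap", "press", "select", "scroll",
                   "menu", "button", "dropdown", "tab"}


def lint_task(task: str, product_terms: Iterable[str] = ()) -> list[str]:
    """Return warnings for a task prompt that may break the framing rules."""
    words = set(re.findall(r"[a-z']+", task.lower()))
    warnings = []

    # Rule: state the user goal and don't reveal the path.
    revealed = sorted(words & UI_ACTION_WORDS)
    if revealed:
        warnings.append("may reveal the path: " + ", ".join(revealed))

    # Rule: don't use product terminology (single-word terms only in this sketch).
    jargon = sorted(words & {term.lower() for term in product_terms})
    if jargon:
        warnings.append("uses product terminology: " + ", ".join(jargon))

    return warnings


print(lint_task("Click the menu and find pricing"))
print(lint_task("You want to find a contractor near you who can install a fence."))
```

A prompt with no warnings can still be badly framed (for example, missing context), so treat a clean result as a first pass only.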
2. Choose moderated or unmoderated
Moderated (live, with researcher):
- Researcher observes and probes in real time
- Best for early-stage prototypes, complex tasks, novel concepts
- Higher cost, smaller sample (5 to 8 participants typical)
- Catches surprises and lets the researcher probe deeper
Unmoderated (recorded, asynchronous):
- Participant completes alone, often via tool (UserTesting, Maze, Lookback)
- Best for stable designs, simple tasks, larger sample
- Lower cost, larger sample (15 to 30 participants typical)
- Catches patterns at scale, less depth per session
For most teams: moderated for early/critical decisions, unmoderated for ongoing validation.
3. Recruit
Recruit from the target audience, not just whoever is convenient.
Recruit criteria:
- Match real users (target audience, not just "anyone")
- Mix of experience levels with the product (new and existing if applicable)
- Mix of relevant device types (mobile, desktop, tablet if relevant)
- Exclude friends, family, employees
Sample size:
- Moderated: 5 to 8 participants (Nielsen's "5 users find 85% of usability issues" for the most common segment)
- Unmoderated: 15 to 30 participants (more participants compensate for less probing)
- Multi-segment testing: 5 to 8 per segment
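The "5 users find 85%" figure is usually traced to the Nielsen and Landauer model, in which each participant independently uncovers a fixed share of the issues. The sketch below assumes their reported average of about 31% per participant; that rate is an assumption here, and it can be quite different for your product and tasks. The function name `share_of_issues_found` is made up for this example.

```python
def share_of_issues_found(participants: int, detect_rate: float = 0.31) -> float:
    """Expected share of issues seen at least once: 1 - (1 - L)^n.

    detect_rate (L) is the assumed chance that one participant hits a given issue.
    """
    return 1 - (1 - detect_rate) ** participants


for n in (1, 3, 5, 8):
    print(n, round(share_of_issues_found(n), 3))
# With the default rate, 5 participants gives 0.844
```

The curve flattens quickly, which is why adding participants to a single moderated round pays off less than running another round after fixes.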
4. Run the test
Pre-task setup:
- Confirm recording works
- Brief participant (purpose, anonymity, recording, "no wrong answers")
- Get verbal consent
- Have participant share screen if remote
Moderated session structure:
- Warm-up (2 to 3 min). Easy questions to put participant at ease.
- Pre-test questions (3 to 5 min). Background context, current behavior with similar products.
- Task 1 (5 to 10 min). Describe task. Have participant attempt while thinking aloud.
- Post-task questions (1 to 2 min). What was easy/hard? Anything confusing?
- Repeat for tasks 2, 3, 4 (typically 3 to 5 tasks per 60-minute session).
- Overall debrief (5 to 10 min). General reactions, comparisons to alternatives, anything else.
- Close (2 min).
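As a quick sanity check on the session structure above, the time ranges can be added up for a given number of tasks. This is a minimal sketch using only the minute ranges listed; the names `FIXED_SEGMENTS`, `PER_TASK`, and `session_minutes` are invented for illustration.

```python
# (min, max) minutes per segment, taken from the session structure above.
FIXED_SEGMENTS = {
    "warm-up": (2, 3),
    "pre-test questions": (3, 5),
    "overall debrief": (5, 10),
    "close": (2, 2),
}
PER_TASK = {
    "task": (5, 10),
    "post-task questions": (1, 2),
}


def session_minutes(task_count: int) -> tuple[int, int]:
    """Return the (shortest, longest) expected session length in minutes."""
    fixed_low = sum(low for low, _ in FIXED_SEGMENTS.values())
    fixed_high = sum(high for _, high in FIXED_SEGMENTS.values())
    task_low = sum(low for low, _ in PER_TASK.values())
    task_high = sum(high for _, high in PER_TASK.values())
    return fixed_low + task_count * task_low, fixed_high + task_count * task_high


for tasks in (3, 4, 5):
    print(tasks, session_minutes(tasks))
# 3 tasks -> (30, 56), 4 tasks -> (36, 68), 5 tasks -> (42, 80)
```

Three tasks fit a 60-minute slot even at the long end of every range; four or five tasks fit only if most segments run short, so plan which task to drop if the session runs over.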
Moderation principles:
- Encourage think-aloud ("What's going through your mind?")
- Don't help unless they're truly stuck (and even then, only after a long pause)
- Don't lead ("Are you looking for the menu?" - bad)
- Note where they hesitate, scroll, or backtrack
- Note their language vs the product's language
- Note emotional reactions
Anti-patterns:
- Talking too much (researcher should talk maybe 20% of the time)
- Defending the design when participants struggle
- Helping prematurely
- Asking participants to predict their future behavior
- Treating participant suggestions as features ("Users want X" - test demand for X separately)
5. Synthesize and report
Patterns across participants are signal. Single-participant complaints are weaker (but worth investigating).
Synthesis steps:
- Issue inventory. Every issue observed, with which participant, which task, severity.
- Cluster. Issues that are the same root problem.
- Severity.
  - Critical: Blocks task completion. Most users hit this.
  - Major: Significantly slows task. Many users hit this.
  - Minor: Friction. Some users hit this. Workaround exists.
  - Cosmetic: Polish. Doesn't affect task.
- Recommendations. For each issue, propose specific fixes.
- Prioritize. By severity and effort.
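The synthesis steps above map onto a small data structure. The sketch below is one possible shape, not a prescribed format: `Observation`, `cluster`, and `prioritize` are hypothetical names, the root-cause label is assumed to be assigned by the researcher, a cluster takes the worst severity seen in it (an assumption), and effort is an assumed 1 (small) to 3 (large) estimate supplied by the team.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "cosmetic": 3}


@dataclass(frozen=True)
class Observation:
    """One row of the issue inventory."""
    participant: str
    task: str
    root_cause: str  # label assigned by the researcher when clustering
    severity: str    # one of the SEVERITY_RANK keys


def cluster(observations: list[Observation]) -> list[dict]:
    """Group observations that share a root cause."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs.root_cause].append(obs)
    return [
        {
            "root_cause": root_cause,
            # Assumption: a cluster is as severe as its worst observation.
            "severity": min((o.severity for o in items), key=SEVERITY_RANK.__getitem__),
            "participants": sorted({o.participant for o in items}),
        }
        for root_cause, items in groups.items()
    ]


def prioritize(clusters: list[dict], effort: dict[str, int]) -> list[dict]:
    """Order clusters by severity first, then by lower effort."""
    return sorted(
        clusters,
        key=lambda c: (SEVERITY_RANK[c["severity"]], effort.get(c["root_cause"], 99)),
    )


# Made-up sample data, for illustration only.
inventory = [
    Observation("P1", "checkout", "hidden coupon field", "minor"),
    Observation("P2", "checkout", "hidden coupon field", "major"),
    Observation("P3", "signup", "unclear password rules", "critical"),
    Observation("P1", "signup", "low-contrast footer", "cosmetic"),
]
effort = {"hidden coupon field": 2, "unclear password rules": 3, "low-contrast footer": 1}
for item in prioritize(cluster(inventory), effort):
    print(item["severity"], item["root_cause"], item["participants"])
```

Keeping the participant list on each cluster makes it easy to report frequency alongside severity in the findings.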
Report structure:
# Usability Test: [Design / flow]
## Summary
[2 to 3 paragraphs covering: what was tested, headline findings, top 3 priorities]
## Method
[Moderated/unmoderated, sample size, audience, dates, tasks]
## Critical findings
[Each with description, frequency, supporting evidence (quotes/clips), recommendation]
## Major findings
[Same structure]
## Minor findings
[Brief]
## Cosmetic findings
[Briefest]
## What worked well
[Calibration: capture successes too]
## Recommendations
[Prioritized list with effort estimates]
## Next steps
[Test re-run schedule, design iteration plan]
Workflow
- Define the goals. What decisions hinge on this? What tasks matter most?
- Design tasks. 3 to 5 specific, realistic, goal-framed tasks.
- Choose moderated vs unmoderated. Match to stage and depth needed.
- Recruit. Specific to audience.
- Pilot. 1 to 2 sessions before main batch. Refine tasks if needed.
- Run. Follow the protocol. Stay disciplined.
- Synthesize during, not just after. Patterns emerge by session 4 or 5.
- Report. Multiple formats - written report + highlight clips.
- Track fixes. Every critical issue should have an owner and date.
- Re-test after fixes. Verify the fix worked and didn't introduce new issues.
Failure patterns
- Testing the whole product instead of specific tasks. Vague results.
- Tasks that reveal the path. ("Click the menu and find...")
- Friends and family as participants. Biased, not representative.
- Researcher leading the participant. Findings reflect the researcher.
- Defending the design when participants struggle. Misses real issues.
- Helping too quickly. Participant doesn't experience the friction.
- Treating participant suggestions as features. Users solve their problem; product team designs the solution.
- Treating one participant as a pattern. A single strong opinion is a data point, not a finding.
- Skipping severity scoring. All findings treated equally; team can't prioritize.
- Reports no one reads. Highlight clips and live walkthroughs work better than 80-page decks.
- Testing once, never re-testing. Fixes that introduce new problems go undetected.
Output format
Default outputs:
- Test plan (before testing) - usability-test-plan-[topic].md
- Task script (per session) - usability-tasks-[topic].md
- Findings report (after synthesis) - usability-findings-[topic].md
- Highlight clips (separately produced)
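To keep the three Markdown outputs consistently named, the `[topic]` placeholder can be filled from a single slug. This is a small sketch; the slug rule (lowercase, non-alphanumerics collapsed to hyphens) and the function name `output_filenames` are assumptions, not something the skill specifies.

```python
import re


def output_filenames(topic: str) -> dict[str, str]:
    """Build the default output filenames for one test topic."""
    # Assumed slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return {
        "test plan": f"usability-test-plan-{slug}.md",
        "task script": f"usability-tasks-{slug}.md",
        "findings report": f"usability-findings-{slug}.md",
    }


print(output_filenames("Checkout Flow v2"))
```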
Reference files
references/task-script-patterns.md - Task framing patterns by common product type, with good and bad examples.
