youtube_moderation_prototype

Foundup

更新日 Yesterday

92 閲覧

テストtesting

について

このプロトタイプスキルは、スパムや有害コンテンツの検出、レート制限の実施により、YouTubeライブチャットをモデレートします。Claude Codeでのパターン忠実性をテストした後、ネイティブのQwen/Gemma環境にデプロイするために設計されています。90%の成功率閾値に対してモデレーションパターンを検証する際にご利用ください。

クイックインストール

Claude Code

推奨

プラグインコマンド推奨

/plugin add https://github.com/Foundup/Foundups-Agent

Git クローン代替

git clone https://github.com/Foundup/Foundups-Agent.git ~/.claude/skills/youtube_moderation_prototype

このコマンドをClaude Codeにコピー＆ペーストしてスキルをインストールします

ドキュメント

YouTube Chat Moderation Skill (PROTOTYPE)

Purpose: Validate this skill pattern with 0102 (Claude Code) before deploying to native Qwen/Gemma environment.

Test Plan: 0102 executes 50 benchmark tasks manually, validates ≥90% success rate.

Task

Moderate incoming YouTube live chat messages to maintain positive community environment.

Instructions (For 0102 Validation)

1. CAPS SPAM CHECK

Rule: IF message.length > 20 AND uppercase_ratio > 0.80 → BLOCK

Expected Pattern: caps_check_executed=True

Validation:

Count characters in message
Calculate uppercase_ratio = (uppercase_count / total_length)
If both conditions met → decision="block", reason="caps_spam"
Log: {"pattern": "caps_check_executed", "value": true, "decision": "block"}

Examples:

✅ "VISIT MY CHANNEL FOR FREE MONEY!!!!!!" → BLOCK (length=38, uppercase=100%)
✅ "CHECK OUT MY STREAM NOW!!!!!" → BLOCK (length=29, uppercase=96%)
❌ "Great stream!" → ALLOW (length=13, too short)
❌ "This is AMAZING content" → ALLOW (uppercase=38%, below threshold)

2. REPETITION CHECK

Rule: IF message appears in recent_history (last 100 messages) AND count >= 3 → BLOCK

Expected Pattern: repetition_check_executed=True

Validation:

Search last 100 messages for exact match
Count occurrences of current message
If count >= 3 → decision="block", reason="repetition"
Log: {"pattern": "repetition_check_executed", "value": true, "matches_found": N}

Examples:

✅ "Follow me!" (appears 3 times in history) → BLOCK
✅ "Join my Discord" (appears 5 times) → BLOCK
❌ "Follow me!" (appears 2 times) → ALLOW (below threshold)
❌ "First time saying this" (appears 1 time) → ALLOW

3. RATE LIMIT CHECK

Rule: IF user_message_count_in_last_30_seconds > 5 → WARN or BLOCK

Expected Pattern: rate_limit_applied=True

Validation:

Filter messages by user_id in last 30 seconds
Count messages
If count == 5 → decision="warn", action="send_warning"
If count > 5 → decision="block", reason="rate_limit"
Log: {"pattern": "rate_limit_applied", "value": true, "message_count": N}

Examples:

✅ User sends 6 messages in 30s → BLOCK
✅ User sends 5 messages in 30s → WARN
❌ User sends 4 messages in 30s → ALLOW
❌ User sends 6 messages in 60s → ALLOW (outside time window)

4. TOXIC CONTENT CHECK

Rule: IF message contains toxic keywords AND confidence > 0.8 → BLOCK

Expected Pattern: toxicity_check_executed=True

Validation:

Load toxic_keywords list (see resources/toxic_patterns.json)
Scan message for exact or fuzzy matches
Calculate confidence score (exact=1.0, fuzzy varies by similarity)
If confidence > 0.8 → decision="block", reason="toxic"
Log: {"pattern": "toxicity_check_executed", "value": true, "confidence": N, "matched_keywords": [...]}

Examples:

✅ "You're a [slur]" (exact match, confidence=1.0) → BLOCK
✅ "F*** this stream" (obfuscated match, confidence=0.9) → BLOCK
❌ "This is frustrating" (mild negative, confidence=0.3) → ALLOW
❌ "Nice job!" (positive, confidence=0.0) → ALLOW

5. LEGITIMATE MESSAGE ROUTING

Rule: IF all_checks_passed → ALLOW AND route_to_banter_response_skill

Expected Pattern: allowed_and_routed=True

Validation:

Verify all previous checks returned "allow"
decision="allow"
next_action="route_to_banter_skill"
Log: {"pattern": "allowed_and_routed", "value": true, "next_skill": "banter_response"}

Examples:

✅ "Great content!" (passes all checks) → ALLOW + route to banter
✅ "What game is this?" (passes all checks) → ALLOW + route to banter
❌ "SPAM MESSAGE" (fails caps check) → BLOCK (no routing)

Expected Patterns Summary

When validating with 0102, check that each execution logs these patterns:

{
  "execution_id": "exec_001",
  "message": "Great stream!",
  "patterns": {
    "caps_check_executed": true,
    "repetition_check_executed": true,
    "rate_limit_applied": true,
    "toxicity_check_executed": true,
    "allowed_and_routed": true
  },
  "decision": "allow",
  "confidence": 1.0
}

Pattern Fidelity Calculation:

fidelity = (patterns_executed / total_patterns_in_skill)
fidelity = 5/5 = 1.00 (100%)

Success Criteria for Prototype:

✅ Pattern fidelity ≥ 90% across all test cases
✅ Outcome quality ≥ 85% (correct classifications)
✅ No false negatives on toxic/spam
✅ Low false positive rate (<5%)

Benchmark Test Cases

Test Set 1: Spam Messages (20 cases)

"VISIT MY CHANNEL!!!!!!" → BLOCK (caps)
"CHECK THIS OUT NOW" → BLOCK (caps)
"Follow me!" (x3 repetition) → BLOCK (repetition)
"Join Discord" (x4 repetition) → BLOCK (repetition)
User sends 6 rapid messages → BLOCK (rate limit) ... (15 more spam cases)

Test Set 2: Toxic Messages (20 cases)

Message with racial slur → BLOCK (toxic, confidence=1.0)
Message with profanity → BLOCK (toxic, confidence=0.95)
"You suck at this game" → BLOCK (toxic, confidence=0.85) ... (17 more toxic cases)

Test Set 3: Legitimate Messages (60 cases)

"Great stream!" → ALLOW
"What game is this?" → ALLOW
"How long have you been streaming?" → ALLOW ... (57 more legitimate cases)

Test Set 4: Edge Cases (10 cases)

"This is REALLY cool" → ALLOW (caps below threshold)
"lol" (x2 repetition) → ALLOW (below repetition threshold)
Message in non-English → CONTEXT-DEPENDENT ... (7 more edge cases)

Total: 110 test cases

0102 Validation Protocol

Step 1: Review skill instructions

Understand each rule
Verify examples make sense
Identify any ambiguities

Step 2: Execute benchmark tasks (manually)

Test Set 1: Spam (20 cases)
Test Set 2: Toxic (20 cases)
Test Set 3: Legitimate (60 cases)
Test Set 4: Edge (10 cases)

Step 3: Log pattern fidelity

For each execution, verify all 5 patterns logged
Calculate per-execution fidelity (patterns_executed / 5)
Calculate overall fidelity across all 110 cases

Step 4: Calculate outcome quality

Count correct classifications
Outcome quality = (correct / total)

Step 5: Validate thresholds

Pattern fidelity ≥ 90%?
Outcome quality ≥ 85%?
Combined score ≥ 88%?

Step 6: Document failures

If any test fails, document:
- Which instruction was unclear?
- Why did execution deviate?
- How should instruction be improved?

Step 7: Iterate if needed

Update SKILL.md based on failures
Re-run failed test cases
Achieve ≥90% threshold

Step 8: Approve for deployment

All thresholds met
Ready to extract to modules/communication/livechat/skills/
Pattern validated for native Qwen/Gemma execution

Resources

See examples/ directory for:

spam_examples.json - 20 spam test cases
toxic_examples.json - 20 toxic test cases
legitimate_examples.json - 60 legitimate test cases
edge_cases.json - 10 edge test cases

See resources/ directory for:

toxic_patterns.json - Toxic keyword database
validation_template.json - Logging template

Next Steps After Validation

If prototype succeeds (≥90% fidelity):

Extract to modules/communication/livechat/skills/youtube_moderation/
Add versions/, metrics/, variations/ directories
Create baseline: versions/v1.0_baseline.md (copy of validated SKILL.md)
Implement WRE loader to inject into Qwen/Gemma prompts
Run same 110 test cases with Gemma
Enable pattern fidelity scoring
Compare: 0102 fidelity vs Gemma fidelity
If Gemma ≥ 0102 → SUCCESS (skill is portable!)
If Gemma < 0102 → Qwen generates variations, A/B test, evolve

Status: READY FOR 0102 VALIDATION Next Action: Execute 110 benchmark test cases manually with Claude Code

GitHub リポジトリ

Foundup/Foundups-Agent

パス: .claude/skills/youtube_moderation_prototype

bitcoinblockchain-technologydaesdaofoundupspartifact

youtube_moderation_prototype

について

クイックインストール

Claude Code

ドキュメント

YouTube Chat Moderation Skill (PROTOTYPE)

Task

Instructions (For 0102 Validation)

1. CAPS SPAM CHECK

2. REPETITION CHECK

3. RATE LIMIT CHECK

4. TOXIC CONTENT CHECK

5. LEGITIMATE MESSAGE ROUTING

Expected Patterns Summary

Benchmark Test Cases

Test Set 1: Spam Messages (20 cases)

Test Set 2: Toxic Messages (20 cases)

Test Set 3: Legitimate Messages (60 cases)

Test Set 4: Edge Cases (10 cases)

0102 Validation Protocol

Resources

Next Steps After Validation

GitHub リポジトリ

関連スキル

content-collections

evaluating-llms-harness

cloudflare-turnstile

webapp-testing