daily-news-report
About
This skill automatically scrapes technical content from preset URLs, filters for high-quality information, and compiles it into daily Markdown reports. It uses a multi-agent architecture with browser scraping and smart caching for reliable execution. Use it to automate the collection and summarization of daily technical news and updates.
Quick Install
Claude Code
Recommended: /plugin add https://github.com/majiayu000/claude-skill-registry
Manual: git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/daily-news-report
Copy and paste the command in Claude Code to install this skill.
Documentation
Daily News Report v3.0
Architecture Upgrade: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching
Core Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ Main Agent (Orchestrator) │
│ Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 1. Init │ → │ 2. Dispatch │ → │ 3. Monitor │ → │ 4. Evaluate │ │
│ │ Read Config │ │ Assign Tasks│ │ Collect Res │ │ Filter/Sort │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 5. Decision │ ← │ Enough 20? │ │ 6. Generate │ → │ 7. Update │ │
│ │ Cont/Stop │ │ Y/N │ │ Report File │ │ Cache Stats │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
↓ Dispatch ↑ Return Results
┌─────────────────────────────────────────────────────────────────────┐
│ SubAgent Execution Layer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Worker A │ │ Worker B │ │ Browser │ │
│ │ (WebFetch) │ │ (WebFetch) │ │ (Headless) │ │
│ │ Tier1 Batch │ │ Tier2 Batch │ │ JS Render │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Structured Result Return │ │
│ │ { status, data: [...], errors: [...], metadata: {...} } │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Configuration Files
This skill uses the following configuration files:
| File | Purpose |
|---|---|
| sources.json | Source configuration, priorities, scrape methods |
| cache.json | Cached data, historical stats, deduplication fingerprints |
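For illustration, a minimal Python sketch of how sources.json might be consumed; the field names used here ("sources", "tier", "enabled", "id") are assumptions, not the skill's canonical schema:

```python
import json
from collections import defaultdict

# Sketch: load sources.json and group enabled sources by tier.
# Field names are assumptions, not the skill's canonical schema.
with open("sources.json") as f:
    config = json.load(f)

by_tier = defaultdict(list)
for src in config.get("sources", []):
    if src.get("enabled", True):
        by_tier[src.get("tier", 2)].append(src)

for tier in sorted(by_tier):
    print(f"Tier {tier}: {[s['id'] for s in by_tier[tier]]}")
```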
Execution Process Details
Phase 1: Initialization
Steps:
1. Determine date (user argument or current date)
2. Read sources.json for source configurations
3. Read cache.json for historical data
4. Create output directory NewsReport/
5. Check if a partial report exists for today (append mode)
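A minimal sketch of this initialization, assuming the directory layout and file names described above:

```python
import json
import sys
from datetime import date
from pathlib import Path

# Sketch of Phase 1: resolve the date, load config and cache, prepare output.
report_date = sys.argv[1] if len(sys.argv) > 1 else date.today().isoformat()
sources = json.loads(Path("sources.json").read_text())
cache_file = Path("cache.json")
cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

out_dir = Path("NewsReport")
out_dir.mkdir(exist_ok=True)

report_path = out_dir / f"{report_date}-news-report.md"
append_mode = report_path.exists()  # a partial report from an earlier run today
```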
Phase 2: Dispatch SubAgents
Strategy: Parallel dispatch, batch execution, early stopping mechanism
Wave 1 (Parallel):
- Worker A: Tier1 Batch A (HN, HuggingFace Papers)
- Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)
Wait for results → Evaluate count
If < 15 high-quality items:
Wave 2 (Parallel):
- Worker C: Tier2 Batch A (James Clear, FS Blog)
- Worker D: Tier2 Batch B (HackerNoon, Scott Young)
If still < 20 items:
Wave 3 (Browser):
- Browser Worker: ProductHunt, Latent Space (Require JS rendering)
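The wave logic can be pictured as a short sequence of dispatches with checks in between. In the sketch below, `dispatch_wave` is a hypothetical stand-in for launching a batch of SubAgents in parallel, the source ids beyond hn/hf_papers are illustrative, and treating a score of 4+ as "high-quality" is an assumption:

```python
from typing import Dict, List

def dispatch_wave(source_ids: List[str]) -> List[Dict]:
    """Hypothetical stand-in: in the real skill this is a batch of parallel
    SubAgent (Task) calls; here it simply returns an empty list."""
    return []

items: List[Dict] = []

# Wave 1 (Tier1)
items += dispatch_wave(["hn", "hf_papers", "oneusefulthing", "paulgraham"])

# Wave 2 (Tier2) only if fewer than 15 high-quality items so far
high_quality = [i for i in items if i.get("quality_score", 0) >= 4]
if len(high_quality) < 15:
    items += dispatch_wave(["jamesclear", "fs_blog", "hackernoon", "scottyoung"])

# Wave 3 (browser) only if still fewer than 20 items
if len(items) < 20:
    items += dispatch_wave(["producthunt", "latentspace"])
```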
Phase 3: SubAgent Task Format
Task format received by each SubAgent:
task: fetch_and_extract
sources:
- id: hn
url: https://news.ycombinator.com
extract: top_10
- id: hf_papers
url: https://huggingface.co/papers
extract: top_voted
output_schema:
items:
- source_id: string # Source Identifier
title: string # Title
summary: string # 2-4 sentence summary
key_points: string[] # Max 3 key points
url: string # Original URL
keywords: string[] # Keywords
quality_score: 1-5 # Quality Score
constraints:
filter: "Cutting-edge Tech/Deep Tech/Productivity/Practical Info"
exclude: "General Science/Marketing Puff/Overly Academic/Job Posts"
max_items_per_source: 10
skip_on_error: true
return_format: JSON
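The same item schema, written out as a type for reference (a sketch; the skill itself exchanges plain JSON between agents):

```python
from typing import List, TypedDict

class NewsItem(TypedDict):
    source_id: str          # source identifier, e.g. "hn"
    title: str
    summary: str            # 2-4 sentence summary
    key_points: List[str]   # max 3 key points
    url: str                # original URL
    keywords: List[str]
    quality_score: int      # 1-5
```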
Phase 4: Main Agent Monitoring & Feedback
Main Agent Responsibilities:
Monitoring:
- Check SubAgent return status (success/partial/failed)
- Count collected items
- Record success rate per source
Feedback Loop:
- If a SubAgent fails, decide whether to retry or skip
- If a source fails persistently, mark as disabled
- Dynamically adjust source selection for subsequent batches
Decision:
- Items >= 25 AND HighQuality >= 20 → Stop scraping
- Items < 15 → Continue to next batch
- All batches done but < 20 → Generate with available content (Quality over Quantity)
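Expressed as a small function; `high_quality` is assumed to count items scoring 4 or above, which is an assumption rather than a rule stated by the skill:

```python
def decide(total_items: int, high_quality: int, batches_left: int) -> str:
    """Sketch of the Main Agent's stop/continue rules listed above."""
    if total_items >= 25 and high_quality >= 20:
        return "stop"        # enough material, stop scraping
    if total_items < 15 and batches_left > 0:
        return "continue"    # dispatch the next batch
    return "generate"        # quality over quantity: use what was collected
```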
Phase 5: Evaluation & Filtering
Deduplication:
- Exact URL match
- Title similarity (>80% considered duplicate)
- Check cache.json to avoid history duplicates
Score Calibration:
- Unify scoring standards across SubAgents
- Adjust weights based on source credibility
- Bonus points for manually curated high-quality sources
Sorting:
- Descending order by quality_score
- Sort by source priority if scores are equal
- Take Top 20
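A sketch of the deduplication and ranking pass, using difflib for the title-similarity check. The 0.8 threshold comes from the rules above; the `priority` mapping is an assumed lookup derived from sources.json:

```python
from difflib import SequenceMatcher
from typing import Dict, List

def similar_titles(a: str, b: str, threshold: float = 0.8) -> bool:
    """Titles with >80% similarity are treated as duplicates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > threshold

def dedupe_and_rank(items: List[Dict], seen_urls: set, priority: Dict[str, int]) -> List[Dict]:
    kept: List[Dict] = []
    for item in items:
        if item["url"] in seen_urls:   # exact URL match, including cache.json history
            continue
        if any(similar_titles(item["title"], k["title"]) for k in kept):
            continue
        kept.append(item)
    # Descending quality_score; ties broken by source priority (lower number = higher priority).
    kept.sort(key=lambda x: (-x["quality_score"], priority.get(x["source_id"], 99)))
    return kept[:20]
```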
Phase 6: Browser Scraping (MCP Chrome DevTools)
For pages requiring JS rendering, use a headless browser:
Process:
1. Call mcp__chrome-devtools__new_page to open page
2. Call mcp__chrome-devtools__wait_for to wait for content load
3. Call mcp__chrome-devtools__take_snapshot to get page structure
4. Parse snapshot to extract required content
5. Call mcp__chrome-devtools__close_page to close page
Applicable Scenarios:
- ProductHunt (403 on WebFetch)
- Latent Space (Substack JS rendering)
- Other SPA applications
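The browser path, sketched as pseudocode. `call_tool` and `extract_items` are hypothetical helpers (the real skill issues these as agent tool calls), and the argument shapes passed to the MCP tools are assumptions, not the server's documented signatures:

```python
def extract_items(snapshot) -> list:
    """Hypothetical parser: walk the snapshot structure and pull titles/links."""
    return []

def scrape_with_browser(call_tool, url: str, ready_text: str) -> list:
    """Sketch of the MCP Chrome DevTools sequence described above."""
    call_tool("mcp__chrome-devtools__new_page", {"url": url})
    call_tool("mcp__chrome-devtools__wait_for", {"text": ready_text})  # wait for content load
    snapshot = call_tool("mcp__chrome-devtools__take_snapshot", {})
    items = extract_items(snapshot)
    call_tool("mcp__chrome-devtools__close_page", {})
    return items
```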
Phase 7: Generate Report
Output:
- Directory: NewsReport/
- Filename: YYYY-MM-DD-news-report.md
- Format: Standard Markdown
Content Structure:
- Title + Date
- Statistical Summary (Source count, items collected)
- 20 High-Quality Items (Template based)
- Generation Info (Version, Timestamps)
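A sketch of rendering the collected items into the template shown under "Output Template" below:

```python
from typing import Dict, List

def render_report(report_date: str, items: List[Dict], source_names: List[str], minutes: int) -> str:
    """Sketch of Phase 7: build the Markdown report body."""
    lines = [
        f"# Daily News Report ({report_date})",
        f"> Curated from {len(source_names)} sources today, containing {len(items)} high-quality items",
        f"> Generation Time: {minutes} min | Version: v3.0",
    ]
    for i, item in enumerate(items, start=1):
        lines += [
            "---",
            f"## {i}. {item['title']}",
            f"- **Summary**: {item['summary']}",
            "- **Key Points**:",
            *[f"  {j}. {point}" for j, point in enumerate(item["key_points"], start=1)],
            f"- **Source**: [Link]({item['url']})",
            "- **Keywords**: " + " ".join(f"`{k}`" for k in item["keywords"]),
            f"- **Score**: {'⭐' * item['quality_score']} ({item['quality_score']}/5)",
        ]
    lines += ["---", "*Generated by Daily News Report v3.0*",
              f"*Sources: {', '.join(source_names)}*"]
    return "\n".join(lines)
```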
Phase 8: Update Cache
Update cache.json:
- last_run: Record this run info
- source_stats: Update stats per source
- url_cache: Add processed URLs
- content_hashes: Add content fingerprints
- article_history: Record included articles
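A sketch of the cache update, assuming cache.json uses exactly the field names listed above (the choice of SHA-256 over the summary text as the content fingerprint is an assumption):

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Dict, List

def update_cache(cache_path: str, items: List[Dict], source_stats: Dict) -> None:
    """Sketch of Phase 8: fold this run's results back into cache.json."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    cache["last_run"] = {"date": time.strftime("%Y-%m-%d"), "items": len(items), "version": "3.0"}
    cache["source_stats"] = source_stats
    cache.setdefault("url_cache", []).extend(i["url"] for i in items)
    cache.setdefault("content_hashes", []).extend(
        hashlib.sha256(i["summary"].encode("utf-8")).hexdigest() for i in items
    )
    cache.setdefault("article_history", []).extend(
        {"title": i["title"], "url": i["url"]} for i in items
    )
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2))
```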
SubAgent Call Examples
Using general-purpose Agent
Since custom agents require a session restart to be discovered, use the general-purpose agent and inject the worker prompt:
Task Call:
subagent_type: general-purpose
model: haiku
prompt: |
You are a stateless execution unit. Only do the assigned task and return structured JSON.
Task: Scrape the following URLs and extract content
URLs:
- https://news.ycombinator.com (Extract Top 10)
- https://huggingface.co/papers (Extract top voted papers)
Output Format:
{
"status": "success" | "partial" | "failed",
"data": [
{
"source_id": "hn",
"title": "...",
"summary": "...",
"key_points": ["...", "...", "..."],
"url": "...",
"keywords": ["...", "..."],
"quality_score": 4
}
],
"errors": [],
"metadata": { "processed": 2, "failed": 0 }
}
Filter Criteria:
- Keep: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
- Exclude: General Science/Marketing Puff/Overly Academic/Job Posts
Return JSON directly, no explanation.
Using worker Agent (Requires session restart)
Task Call:
subagent_type: worker
prompt: |
task: fetch_and_extract
input:
urls:
- https://news.ycombinator.com
- https://huggingface.co/papers
output_schema:
- source_id: string
- title: string
- summary: string
- key_points: string[]
- url: string
- keywords: string[]
- quality_score: 1-5
constraints:
filter: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
exclude: General Science/Marketing Puff/Overly Academic
Output Template
# Daily News Report (YYYY-MM-DD)
> Curated from N sources today, containing 20 high-quality items
> Generation Time: X min | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.
---
## 1. Title
- **Summary**: 2-4 lines overview
- **Key Points**:
1. Point one
2. Point two
3. Point three
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)
---
## 2. Title
...
---
*Generated by Daily News Report v3.0*
*Sources: HN, HuggingFace, OneUsefulThing, ...*
Constraints & Principles
- Quality over Quantity: Low-quality content does not enter the report.
- Early Stop: Stop scraping once 20 high-quality items are reached.
- Parallel First: SubAgents in the same batch execute in parallel.
- Fault Tolerance: Failure of a single source does not affect the whole process.
- Cache Reuse: Avoid re-scraping the same content.
- Main Agent Control: All decisions are made by the Main Agent.
- Fallback Awareness: Detect sub-agent availability, gracefully degrade if unavailable.
Expected Performance
| Scenario | Expected Time | Note |
|---|---|---|
| Optimal | ~2 mins | Tier1 sufficient, no browser needed |
| Normal | ~3-4 mins | Requires Tier2 supplement |
| Browser Needed | ~5-6 mins | Includes JS rendered pages |
Error Handling
| Error Type | Handling |
|---|---|
| SubAgent Timeout | Log error, continue to next |
| Source 403/404 | Mark disabled, update sources.json |
| Extraction Failed | Return raw content, Main Agent decides |
| Browser Crash | Skip source, log entry |
Compatibility & Fallback
To ensure usability across different Agent environments, the following checks must be performed:
Environment Check:
- During Phase 1 initialization, attempt to detect whether the worker sub-agent exists.
- If it does not exist (or the plugin is not installed), automatically switch to Serial Execution Mode.
Serial Execution Mode:
- Do not use parallel dispatch.
- The Main Agent executes scraping tasks for each source sequentially.
- Slower, but guarantees basic functionality.
User Alert:
- MUST include a clear warning in the generated report header indicating the current degraded mode.
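For example, the degraded-mode notice from the output template above could be injected into the header like this (a sketch):

```python
def header_warning(worker_available: bool) -> str:
    """Return the degraded-mode notice for the report header, or an empty string."""
    if worker_available:
        return ""
    return ("> **Warning**: Sub-agent 'worker' not detected. "
            "Running in generic mode (Serial Execution). Performance might be degraded.")
```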
Related Skills
sglang
SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
evaluating-llms-harness
This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
llamaguard
LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.
