Web Application Reconnaissance
About
This skill provides a systematic methodology for mapping web application attack surfaces during security assessments. It focuses on discovering hidden endpoints, identifying technologies, and uncovering functionality that automated scanners often miss. Developers should use it when starting security assessments of web applications or hunting for forgotten endpoints and hidden features.
Skill Documentation
Web Application Reconnaissance
Overview
Web application reconnaissance goes beyond simple subdomain discovery to map the full attack surface of a web application. This includes discovering hidden endpoints, analyzing client-side code, identifying backend technologies, and understanding the application's architecture.
Core principle: Systematic enumeration combined with intelligent analysis reveals hidden attack surface that automated scanners miss.
When to Use
Use this skill when:
- Starting security assessment of a web application
- Building comprehensive understanding of app structure
- Looking for hidden admin panels, APIs, or debug endpoints
- Analyzing JavaScript for hardcoded secrets or endpoints
- Mapping application functionality before deeper testing
Don't use when:
- Not authorized to test the target
- Application has strict rate limiting (adjust methodology)
- Need to remain completely passive (use only public sources)
The Four-Phase Methodology
Phase 1: Initial Discovery and Fingerprinting
Goal: Understand what you're dealing with - technologies, frameworks, and basic structure.
Techniques:
- Technology Detection

  ```bash
  # Comprehensive tech stack identification
  whatweb -v -a 3 https://target.com

  # HTTP headers analysis
  curl -I https://target.com

  # Wappalyzer or similar
  wappalyzer https://target.com
  ```
- Common Files and Directories

  ```bash
  # robots.txt - often reveals hidden directories
  curl https://target.com/robots.txt

  # sitemap.xml - complete site structure
  curl https://target.com/sitemap.xml

  # security.txt - contact info, may reveal scope
  curl https://target.com/.well-known/security.txt

  # Common config/info files
  for file in readme.md humans.txt crossdomain.xml; do
    curl -s https://target.com/$file
  done
  ```
- SSL/TLS Analysis

  ```bash
  # Certificate information may reveal additional domains
  echo | openssl s_client -connect target.com:443 2>/dev/null | \
    openssl x509 -noout -text | \
    grep -A1 "Subject Alternative Name"
  ```
Phase 2: Content Discovery
Goal: Find hidden endpoints, forgotten files, backup directories, and undocumented functionality.
Techniques:
- Directory and File Fuzzing

  ```bash
  # ffuf - fast web fuzzer
  ffuf -w /path/to/wordlist.txt \
    -u https://target.com/FUZZ \
    -mc 200,301,302,403 \
    -o directories.json

  # gobuster for directory brute-forcing
  gobuster dir -u https://target.com \
    -w /path/to/wordlist.txt \
    -x php,html,js,txt,json \
    -o gobuster_results.txt

  # feroxbuster - recursive directory discovery
  feroxbuster -u https://target.com \
    -w /path/to/wordlist.txt \
    --depth 3 \
    -x php js json
  ```
- Intelligent Wordlist Selection (see the sketch below)

  ```bash
  # Technology-specific wordlists
  # For WordPress:
  ffuf -w wordpress_wordlist.txt -u https://target.com/FUZZ

  # For APIs:
  ffuf -w api_wordlist.txt -u https://target.com/api/FUZZ

  # Custom wordlist from discovered technologies:
  # if the tech stack is Python/Django, use Django-specific paths
  ```
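  One way to operationalize this is to drive the wordlist choice from the Phase 1 fingerprinting output. A minimal sketch, assuming whatweb results were saved to `whatweb.txt` and SecLists is installed (the exact list paths below are illustrative and vary by install):

  ```bash
  # Pick a wordlist based on the detected stack (paths are illustrative)
  if grep -qi wordpress whatweb.txt; then
    wl=/usr/share/seclists/Discovery/Web-Content/CMS/wordpress.fuzz.txt
  elif grep -qi django whatweb.txt; then
    wl=/usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt
  else
    wl=/usr/share/seclists/Discovery/Web-Content/common.txt
  fi
  ffuf -w "$wl" -u https://target.com/FUZZ -mc 200,301,302,403
  ```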
- Backup and Sensitive File Discovery

  ```bash
  # Common backup patterns ('~' is quoted so bash doesn't expand it to $HOME)
  for ext in .bak .old .backup .swp '~'; do
    ffuf -w discovered_files.txt -u "https://target.com/FUZZ$ext" -mc 200
  done

  # Source code disclosure
  ffuf -w discovered_files.txt -u https://target.com/FUZZ.txt -mc 200

  # Git exposure
  curl -s https://target.com/.git/HEAD
  # If found, use git-dumper or similar to extract the repository (see below)
  ```
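  If `.git/HEAD` returns a ref, the repository can often be recovered in full. A sketch using git-dumper (a separate pip-installable tool; invocation per its README):

  ```bash
  # Confirm exposure, then pull the repository locally
  curl -s https://target.com/.git/HEAD        # expect "ref: refs/heads/..."
  git-dumper https://target.com/.git/ dumped_repo/

  # Review the recovered source and its commit history
  cd dumped_repo && git log --oneline | head
  ```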
Phase 3: JavaScript Analysis
Goal: Extract hardcoded secrets, discover API endpoints, and understand client-side logic.
Techniques:
- Enumerate All JavaScript Files

  ```bash
  # Extract JS URLs from HTML
  curl -s https://target.com | \
    grep -oP 'src="[^"]+\.js"' | \
    sed 's/src="//;s/"$//' > js_files.txt

  # Use LinkFinder or similar
  python3 linkfinder.py -i https://target.com -o results.html
  ```
- Search for Sensitive Data in JS

  ```bash
  # Download all JS files
  mkdir -p js
  while read -r url; do
    curl -s "$url" > "js/$(basename "$url")"
  done < js_files.txt

  # Search for patterns
  grep -r -E "(api_key|apikey|secret|password|token|aws_access)" js/
  grep -r -E "https?://[^\"' ]+" js/ | grep -v "fonts\|cdn"

  # Find API endpoints
  grep -r -E "(/api/|/v[0-9]+/)" js/
  ```
- Beautify and Analyze Minified Code

  ```bash
  # Beautify JS for easier analysis
  mkdir -p js_beautified
  for file in js/*.js; do
    js-beautify "$file" > "js_beautified/$(basename "$file")"
  done

  # Look for interesting functions
  grep -r "function" js_beautified/ | grep -i "admin\|debug\|test"
  ```
- Extract Subdomains and Endpoints from JS

  ```bash
  # Use tools like JSFinder or relative-url-extractor
  # (relative-url-extractor is a Ruby script that takes a downloaded JS file)
  for f in js/*.js; do ruby extract.rb "$f"; done > endpoints.txt
  ```
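  If those tools aren't handy, a plain grep over the downloaded files recovers most subdomain references; `target.com` below is a placeholder for the assessed domain:

  ```bash
  # Pull every target.com subdomain referenced anywhere in the JS
  grep -rhoE '[a-zA-Z0-9._-]+\.target\.com' js/ | sort -u > js_subdomains.txt
  ```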
Phase 4: Architecture Mapping
Goal: Understand application structure, authentication flows, and data flows.
Techniques:
- Crawling and Spidering

  ```bash
  # Burp Suite spider (manual)
  # Or use automated crawlers
  gospider -s https://target.com -d 3 -c 10 -o spider_output

  # katana - fast crawler
  katana -u https://target.com -d 5 -ps -jc -o crawl_results.txt
  ```
- Parameter Discovery

  ```bash
  # Find URL parameters
  arjun -u https://target.com/search -m GET

  # ParamSpider - discover parameters from the Wayback Machine
  python3 paramspider.py -d target.com
  ```
- API Endpoint Enumeration

  ```bash
  # If an API is discovered, enumerate versions and endpoints
  for version in v1 v2 v3; do
    ffuf -w api_endpoints.txt -u "https://api.target.com/$version/FUZZ"
  done

  # Swagger/OpenAPI documentation
  curl https://api.target.com/swagger.json
  curl https://api.target.com/openapi.json
  curl https://api.target.com/api-docs
  ```
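  An exposed Swagger/OpenAPI spec usually lists more endpoints than fuzzing ever finds. A minimal sketch with jq (assuming the spec lives at the URL above) flattens the documented paths into a wordlist:

  ```bash
  # Extract every documented path from an OpenAPI/Swagger spec
  curl -s https://api.target.com/swagger.json | \
    jq -r '.paths | keys[]' > documented_paths.txt
  ```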
- Authentication and Session Analysis

  ```bash
  # Analyze authentication mechanisms:
  # - Cookie attributes (HttpOnly, Secure, SameSite)
  # - JWT tokens (decode and analyze claims)
  # - OAuth flows
  # - Session management

  # Check for JWT: decode the token (use jwt_tool or jwt.io)
  echo "eyJhbG..." | base64 -d
  ```
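  Note that a bare `base64 -d` often fails on real JWTs: the segments are base64url-encoded and the padding is usually stripped. A small helper sketch that decodes the header and payload (the full token is passed as the first argument):

  ```bash
  #!/bin/bash
  # jwt_decode.sh - print the header and payload of a JWT
  # Usage: ./jwt_decode.sh 'eyJhbGciOi...'
  token="$1"
  for i in 1 2; do
    # Take segment i, convert base64url to base64
    segment=$(echo "$token" | cut -d. -f"$i" | tr '_-' '/+')
    # Restore stripped padding before decoding
    while [ $(( ${#segment} % 4 )) -ne 0 ]; do segment="${segment}="; done
    echo "$segment" | base64 -d; echo
  done
  ```

  jwt_tool or jq can then pretty-print the claims; avoid pasting production tokens into third-party sites.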
Automation Pipeline
Complete reconnaissance pipeline:
```bash
#!/bin/bash
# web_app_recon.sh
TARGET=$1
OUTPUT_DIR="${TARGET//[.:\/]/_}_webapp_recon"
mkdir -p "$OUTPUT_DIR"/{js,crawl,endpoints}
echo "[*] Starting web application reconnaissance for $TARGET"
# Phase 1: Fingerprinting
echo "[*] Phase 1: Technology fingerprinting"
whatweb -v -a 3 "$TARGET" > "$OUTPUT_DIR/whatweb.txt"
curl -I "$TARGET" > "$OUTPUT_DIR/headers.txt"
curl -s "$TARGET/robots.txt" > "$OUTPUT_DIR/robots.txt"
curl -s "$TARGET/sitemap.xml" > "$OUTPUT_DIR/sitemap.xml"
# Phase 2: Content Discovery
echo "[*] Phase 2: Content discovery"
feroxbuster -u "$TARGET" \
-w /usr/share/wordlists/seclists/Discovery/Web-Content/common.txt \
-x php,html,js,txt,json \
--depth 2 \
-o "$OUTPUT_DIR/feroxbuster.txt"
# Phase 3: JavaScript Analysis
echo "[*] Phase 3: JavaScript analysis"
katana -u "$TARGET" -jc -o "$OUTPUT_DIR/crawl/katana_js.txt"
# Download and analyze JS files
grep "\.js$" "$OUTPUT_DIR/crawl/katana_js.txt" | while read js_url; do
filename=$(echo "$js_url" | md5sum | cut -d' ' -f1)
curl -s "$js_url" > "$OUTPUT_DIR/js/${filename}.js"
done
# Search for secrets in JS
echo "[*] Searching for sensitive data in JavaScript"
grep -r -E "(api[_-]?key|secret|password|token)" "$OUTPUT_DIR/js/" > "$OUTPUT_DIR/js_secrets.txt"
# Phase 4: Endpoint extraction
echo "[*] Phase 4: Endpoint extraction"
cat "$OUTPUT_DIR/js"/*.js | grep -oP '(/api/[^"'"'"'\s]+)' | sort -u > "$OUTPUT_DIR/endpoints/api_endpoints.txt"
echo "[+] Reconnaissance complete. Results in $OUTPUT_DIR/"
echo "[+] Review the following files:"
echo " - whatweb.txt: Technology stack"
echo " - feroxbuster.txt: Discovered directories/files"
echo " - js_secrets.txt: Potential secrets in JavaScript"
echo " - endpoints/api_endpoints.txt: API endpoints found"
Tool Recommendations
Content Discovery:
- ffuf (fast, flexible, modern)
- feroxbuster (recursive, Rust-based)
- gobuster (reliable, simple)
Crawling:
- katana (fast, modern)
- gospider (feature-rich)
- Burp Suite spider (manual, thorough)
JavaScript Analysis:
- LinkFinder (extract endpoints from JS)
- JSFinder (find subdomains/endpoints)
- relative-url-extractor
- js-beautify (beautify minified code)
General:
- httpx (probing and tech detection)
- nuclei (vulnerability templates)
- waybackurls (historical URLs)
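As an example of the last item, historical URLs often resurface long-removed endpoints. A sketch combining waybackurls with ProjectDiscovery's httpx (both assumed to be on PATH):

```bash
# Pull historical URLs from the Wayback Machine, deduplicated
waybackurls target.com | sort -u > wayback_urls.txt

# Probe which of them still respond
httpx -silent -mc 200,301,302,403 < wayback_urls.txt > live_wayback.txt
```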
Common Patterns and Findings
High-value targets to look for:
- Admin/Debug Panels

  ```
  /admin, /administrator, /admin.php
  /debug, /test, /dev
  /phpinfo.php, /info.php
  /console, /terminal
  ```

- Configuration Files

  ```
  /config.php, /.env, /settings.py
  /web.config, /application.yml
  /config.json, /.git/config
  ```

- API Documentation

  ```
  /api-docs, /swagger, /api/v1/docs
  /graphql, /graphiql
  /redoc, /openapi.json
  ```

- Backup Files

  ```
  /backup, /backups, /old
  index.php.bak, database.sql.old
  site.tar.gz, backup.zip
  ```
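A low-noise way to triage these is a single curl pass that records only status codes; a sketch, with `high_value_paths.txt` as a hypothetical one-path-per-line list built from the patterns above (keep 403s — they confirm the path exists):

```bash
# Probe high-value paths and report everything that isn't a 404
while read -r path; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://target.com$path")
  [ "$code" != "404" ] && echo "$code $path"
done < high_value_paths.txt
```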
Organizing Findings
Create structured documentation:
```markdown
# Web App Recon: target.com
## Executive Summary
- Application Type: [E-commerce, API, CMS, etc.]
- Primary Technology: [PHP/Laravel, Python/Django, Node.js, etc.]
- Notable Findings: [X hidden endpoints, Y exposed configs]
## Technology Stack
- Frontend: React 18.2, Bootstrap 5
- Backend: Laravel 9.x
- Server: Nginx 1.21
- Database: MySQL (inferred from error messages)
## Discovered Endpoints
### Public
- /api/v1/products - Product listing API
- /api/v1/users - User profiles (requires auth)
### Hidden/Interesting
- /api/v1/admin - Admin API (403, exists!)
- /api/internal/metrics - Internal metrics endpoint
- /debug/routes - Laravel route list (exposed!)
## Sensitive Files Found
- /storage/logs/laravel.log - Application logs exposed
- /.env.backup - Backup of environment config
- /phpinfo.php - Server info disclosure
## JavaScript Findings
- API keys found: 2 (one appears to be test key)
- Hardcoded API endpoints: 15 additional endpoints
- Subdomains discovered: api-staging.target.com
## Priority Items for Further Testing
1. /debug/routes - Full route disclosure
2. /.env.backup - May contain database credentials
3. /api/internal/metrics - Potential IDOR or info disclosure
4. Staging subdomain - May have weaker security
## Next Steps
- Test IDOR on /api/v1/users endpoints
- Attempt to access admin API with discovered tokens
- Manual review of staging environment
- Test for SQL injection in search parameters
```
Legal and Ethical Considerations
CRITICAL - Always follow these rules:
- Authorization Required
  - Never test without explicit permission
  - Understand scope and boundaries
  - Don't access sensitive data unless authorized
- Responsible Disclosure
  - Report findings through proper channels
  - Don't publicly disclose before remediation
  - Follow responsible disclosure timelines
- Data Handling
  - Don't exfiltrate sensitive data
  - Don't store credentials or PII
  - Delete reconnaissance data after the assessment
- Avoid DoS Conditions
  - Rate limit your requests (see the throttling sketch below)
  - Don't overload servers
  - Use appropriate concurrency settings
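Most discovery tools expose throttling flags for exactly this reason; for example (conservative starting values — verify flag names against your installed versions):

```bash
# Cap ffuf at 10 requests/second with 5 concurrent threads
ffuf -w wordlist.txt -u https://target.com/FUZZ -rate 10 -t 5

# feroxbuster equivalent: scan-wide request rate limit
feroxbuster -u https://target.com --rate-limit 10 --threads 5
```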
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Relying only on automated tools | Miss context-specific findings | Combine automation with manual analysis |
| Skipping JavaScript analysis | Miss API endpoints and secrets | Always analyze client-side code |
| Not checking robots.txt first | Waste time on known paths | Start with obvious information sources |
| Ignoring error messages | Miss technology fingerprinting | Pay attention to verbose errors |
| Too aggressive fuzzing | Detection, IP blocking | Start with smaller wordlists, increase gradually |
Integration with Other Skills
This skill works with:
- skills/reconnaissance/automated-subdomain-enum - Feeds discovered subdomains here
- skills/exploitation/* - Use discovered endpoints for exploitation
- skills/analysis/static-vuln-analysis - Analyze discovered source code
- skills/documentation/* - Document findings systematically
Success Metrics
A successful web app reconnaissance should:
- Identify all major technologies used
- Discover hidden or forgotten functionality
- Extract API endpoints and parameters
- Find configuration or sensitive file exposures
- Map authentication and authorization flows
- Prioritize findings for further testing
- Complete without triggering security alerts (if stealth required)
References and Further Reading
- OWASP Web Security Testing Guide
- "The Web Application Hacker's Handbook" by Dafydd Stuttard
- "Bug Bounty Bootcamp" by Vickie Li (Chapters 4-5)
- PortSwigger Web Security Academy
- HackerOne disclosed reports for real-world examples
Quick Install
Copy and paste this command into Claude Code to install the skill:

/plugin add https://github.com/macaugh/super-rouge-hunter-skills/tree/main/web-app-recon