# Web Application Reconnaissance
## About
This skill provides a systematic methodology for mapping web application attack surfaces during security assessments. It focuses on discovering hidden endpoints, identifying technologies, and uncovering functionality that automated scanners often miss. Developers should use it when starting security assessments of web applications or hunting for forgotten endpoints and hidden features.
## Documentation
### Overview
Web application reconnaissance goes beyond simple subdomain discovery to map the full attack surface of a web application. This includes discovering hidden endpoints, analyzing client-side code, identifying backend technologies, and understanding the application's architecture.
**Core principle:** systematic enumeration combined with intelligent analysis reveals hidden attack surface that automated scanners miss.
### When to Use
Use this skill when:
- Starting security assessment of a web application
- Building comprehensive understanding of app structure
- Looking for hidden admin panels, APIs, or debug endpoints
- Analyzing JavaScript for hardcoded secrets or endpoints
- Mapping application functionality before deeper testing
Don't use when:
- Not authorized to test the target
- Application has strict rate limiting (adjust the methodology; see the throttled example below)
- Need to remain completely passive (use only public sources)
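For rate-limited targets, slow the tooling down rather than skipping discovery. As a sketch using ffuf's documented `-rate`, `-t`, and `-p` options (tune the numbers to the target's limits):

```bash
# Throttled content discovery: ~10 req/s, 5 threads, 0.5 s delay between requests
ffuf -w /path/to/wordlist.txt -u https://target.com/FUZZ -rate 10 -t 5 -p 0.5
```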
### The Four-Phase Methodology
#### Phase 1: Initial Discovery and Fingerprinting

**Goal:** understand what you're dealing with: technologies, frameworks, and basic structure.

Techniques:
- **Technology Detection**

  ```bash
  # Comprehensive tech stack identification
  whatweb -v -a 3 https://target.com

  # HTTP headers analysis
  curl -I https://target.com

  # Wappalyzer or similar
  wappalyzer https://target.com
  ```

- **Common Files and Directories**

  ```bash
  # robots.txt - often reveals hidden directories
  curl https://target.com/robots.txt

  # sitemap.xml - complete site structure
  curl https://target.com/sitemap.xml

  # security.txt - contact info, may reveal scope
  curl https://target.com/.well-known/security.txt

  # Common config/info files
  for file in readme.md humans.txt crossdomain.xml; do
    curl -s "https://target.com/$file"
  done
  ```

- **SSL/TLS Analysis**

  ```bash
  # Certificate information may reveal additional domains
  echo | openssl s_client -connect target.com:443 2>/dev/null | \
    openssl x509 -noout -text | \
    grep -A1 "Subject Alternative Name"
  ```
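Response headers alone often narrow the stack considerably. A minimal sketch that pulls the headers most likely to leak server or framework details (header names vary by stack):

```bash
# One request, keep only the fingerprint-relevant headers
curl -sI https://target.com | \
  grep -iE '^(server|x-powered-by|x-aspnet-version|set-cookie|via):'
```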
#### Phase 2: Content Discovery

**Goal:** find hidden endpoints, forgotten files, backup directories, and undocumented functionality.

Techniques:
- **Directory and File Fuzzing**

  ```bash
  # ffuf - fast web fuzzer
  ffuf -w /path/to/wordlist.txt \
    -u https://target.com/FUZZ \
    -mc 200,301,302,403 \
    -o directories.json

  # gobuster for directory brute-forcing
  gobuster dir -u https://target.com \
    -w /path/to/wordlist.txt \
    -x php,html,js,txt,json \
    -o gobuster_results.txt

  # feroxbuster - recursive directory discovery
  feroxbuster -u https://target.com \
    -w /path/to/wordlist.txt \
    --depth 3 \
    -x php js json
  ```

- **Intelligent Wordlist Selection**

  ```bash
  # Technology-specific wordlists

  # For WordPress:
  ffuf -w wordpress_wordlist.txt -u https://target.com/FUZZ

  # For APIs:
  ffuf -w api_wordlist.txt -u https://target.com/api/FUZZ

  # Build a custom wordlist from discovered technologies:
  # if the stack is Python/Django, use Django-specific paths
  ```

- **Backup and Sensitive File Discovery**

  ```bash
  # Common backup patterns ('~' is quoted so it isn't expanded to $HOME)
  for ext in .bak .old .backup .swp '~'; do
    ffuf -w discovered_files.txt -u "https://target.com/FUZZ$ext" -mc 200
  done

  # Source code disclosure
  ffuf -w discovered_files.txt -u https://target.com/FUZZ.txt -mc 200

  # Git exposure
  curl -s https://target.com/.git/HEAD
  # If found, use git-dumper or similar to extract the repository
  ```
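Fuzzing results are only as trustworthy as your filters: some stacks answer every path with 200. A quick baseline check before trusting the output above:

```bash
# How does the server answer a path that cannot exist?
RAND=$(head -c 16 /dev/urandom | md5sum | cut -d' ' -f1)
curl -s -o /dev/null -w 'status=%{http_code} size=%{size_download}\n' \
  "https://target.com/$RAND"
# If this returns 200, filter on response size (ffuf -fs <size>)
# or let ffuf auto-calibrate its filters with -ac
```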
#### Phase 3: JavaScript Analysis

**Goal:** extract hardcoded secrets, discover API endpoints, and understand client-side logic.

Techniques:
- **Enumerate All JavaScript Files**

  ```bash
  # Extract JS URLs from HTML
  curl -s https://target.com | \
    grep -oP 'src="[^"]+\.js"' | \
    sed 's/src="//;s/"$//' > js_files.txt

  # Use LinkFinder or similar
  python3 linkfinder.py -i https://target.com -o results.html
  ```

- **Search for Sensitive Data in JS**

  ```bash
  # Download all JS files
  mkdir -p js
  while read -r url; do
    curl -s "$url" > "js/$(basename "$url")"
  done < js_files.txt

  # Search for patterns
  grep -rE "(api_key|apikey|secret|password|token|aws_access)" js/
  grep -rE "https?://[^\"' ]+" js/ | grep -v "fonts\|cdn"

  # Find API endpoints
  grep -rE "(/api/|/v[0-9]+/)" js/
  ```

- **Beautify and Analyze Minified Code**

  ```bash
  # Beautify JS for easier analysis
  mkdir -p js_beautified
  for file in js/*.js; do
    js-beautify "$file" > "js_beautified/$(basename "$file")"
  done

  # Look for interesting functions
  grep -r "function" js_beautified/ | grep -i "admin\|debug\|test"
  ```

- **Extract Subdomains and Endpoints from JS**

  ```bash
  # Use tools like JSFinder, relative-url-extractor
  python3 relative-url-extractor.py -u https://target.com > endpoints.txt
  ```
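Many endpoints found in bundled JS are relative paths. A minimal sketch, assuming the files were downloaded to `js/` as above and that `https://target.com` is the correct root (both assumptions), to normalize them into absolute URLs:

```bash
# Pull quoted absolute paths out of downloaded JS and prefix the site root
BASE="https://target.com"
grep -rhoE '"/[A-Za-z0-9_./-]+"' js/ | tr -d '"' | sort -u | \
  sed "s|^|$BASE|" > js_endpoints.txt
```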
#### Phase 4: Architecture Mapping

**Goal:** understand application structure, authentication flows, and data flows.

Techniques:
- **Crawling and Spidering**

  ```bash
  # Burp Suite spider (manual), or use automated crawlers
  gospider -s https://target.com -d 3 -c 10 -o spider_output

  # katana - fast crawler
  katana -u https://target.com -d 5 -ps -jc -o crawl_results.txt
  ```

- **Parameter Discovery**

  ```bash
  # Find URL parameters
  arjun -u https://target.com/search -m GET

  # ParamSpider - discover parameters from the Wayback Machine
  python3 paramspider.py -d target.com
  ```

- **API Endpoint Enumeration**

  ```bash
  # If an API is discovered, enumerate versions and endpoints
  for version in v1 v2 v3; do
    ffuf -w api_endpoints.txt -u "https://api.target.com/$version/FUZZ"
  done

  # Swagger/OpenAPI documentation
  curl https://api.target.com/swagger.json
  curl https://api.target.com/openapi.json
  curl https://api.target.com/api-docs
  ```

- **Authentication and Session Analysis**

  ```bash
  # Analyze authentication mechanisms:
  # - Cookie attributes (HttpOnly, Secure, SameSite)
  # - JWT tokens (decode and analyze claims)
  # - OAuth flows
  # - Session management

  # Decode a JWT payload: base64url-decode the second dot-separated segment
  # (use jwt_tool or jwt.io for full analysis)
  echo "eyJhbG..." | cut -d. -f2 | base64 -d
  ```
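The one-liner above breaks when the payload length isn't a multiple of four or contains base64url characters. A small helper (a sketch assuming GNU coreutils `base64`) that handles both:

```bash
# Hypothetical helper: decode a JWT payload (second dot-separated segment)
jwt_payload() {
  local part
  part=$(cut -d. -f2 <<< "$1" | tr '_-' '/+')    # base64url -> base64
  while (( ${#part} % 4 )); do part+="="; done   # pad to a multiple of 4
  base64 -d <<< "$part"
}

# Usage: jwt_payload "$TOKEN"
```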
### Automation Pipeline

Complete reconnaissance pipeline:

```bash
#!/bin/bash
# web_app_recon.sh
TARGET=$1
OUTPUT_DIR="${TARGET//[.:\/]/_}_webapp_recon"
mkdir -p "$OUTPUT_DIR"/{js,crawl,endpoints}
echo "[*] Starting web application reconnaissance for $TARGET"
# Phase 1: Fingerprinting
echo "[*] Phase 1: Technology fingerprinting"
whatweb -v -a 3 "$TARGET" > "$OUTPUT_DIR/whatweb.txt"
curl -I "$TARGET" > "$OUTPUT_DIR/headers.txt"
curl -s "$TARGET/robots.txt" > "$OUTPUT_DIR/robots.txt"
curl -s "$TARGET/sitemap.xml" > "$OUTPUT_DIR/sitemap.xml"
# Phase 2: Content Discovery
echo "[*] Phase 2: Content discovery"
feroxbuster -u "$TARGET" \
-w /usr/share/wordlists/seclists/Discovery/Web-Content/common.txt \
-x php,html,js,txt,json \
--depth 2 \
-o "$OUTPUT_DIR/feroxbuster.txt"
# Phase 3: JavaScript Analysis
echo "[*] Phase 3: JavaScript analysis"
katana -u "$TARGET" -jc -o "$OUTPUT_DIR/crawl/katana_js.txt"
# Download and analyze JS files
grep "\.js$" "$OUTPUT_DIR/crawl/katana_js.txt" | while read js_url; do
filename=$(echo "$js_url" | md5sum | cut -d' ' -f1)
curl -s "$js_url" > "$OUTPUT_DIR/js/${filename}.js"
done
# Search for secrets in JS
echo "[*] Searching for sensitive data in JavaScript"
grep -r -E "(api[_-]?key|secret|password|token)" "$OUTPUT_DIR/js/" > "$OUTPUT_DIR/js_secrets.txt"
# Phase 4: Endpoint extraction
echo "[*] Phase 4: Endpoint extraction"
cat "$OUTPUT_DIR/js"/*.js | grep -oP '(/api/[^"'"'"'\s]+)' | sort -u > "$OUTPUT_DIR/endpoints/api_endpoints.txt"
echo "[+] Reconnaissance complete. Results in $OUTPUT_DIR/"
echo "[+] Review the following files:"
echo " - whatweb.txt: Technology stack"
echo " - feroxbuster.txt: Discovered directories/files"
echo " - js_secrets.txt: Potential secrets in JavaScript"
echo " - endpoints/api_endpoints.txt: API endpoints found"
### Tool Recommendations

**Content Discovery:**
- ffuf (fast, flexible, modern)
- feroxbuster (recursive, Rust-based)
- gobuster (reliable, simple)
**Crawling:**
- katana (fast, modern)
- gospider (feature-rich)
- Burp Suite spider (manual, thorough)
**JavaScript Analysis:**
- LinkFinder (extract endpoints from JS)
- JSFinder (find subdomains/endpoints)
- relative-url-extractor
- js-beautify (beautify minified code)
**General:**
- httpx (probing and tech detection)
- nuclei (vulnerability templates)
- waybackurls (historical URLs)
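These tools chain naturally. A sketch that feeds subdomains from an earlier enumeration step (hypothetical `subdomains.txt`) through httpx and waybackurls:

```bash
# Probe which hosts are alive and fingerprint them
httpx -l subdomains.txt -silent -status-code -title -tech-detect > live_hosts.txt

# Pull historical URLs for the same domains
cat subdomains.txt | waybackurls | sort -u > historical_urls.txt
```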
### Common Patterns and Findings
High-value targets to look for:
- **Admin/Debug Panels**

  ```
  /admin, /administrator, /admin.php
  /debug, /test, /dev
  /phpinfo.php, /info.php
  /console, /terminal
  ```

- **Configuration Files**

  ```
  /config.php, /.env, /settings.py
  /web.config, /application.yml
  /config.json, /.git/config
  ```

- **API Documentation**

  ```
  /api-docs, /swagger, /api/v1/docs
  /graphql, /graphiql
  /redoc, /openapi.json
  ```

- **Backup Files**

  ```
  /backup, /backups, /old
  index.php.bak, database.sql.old
  site.tar.gz, backup.zip
  ```
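To triage lists like these quickly, a small loop over a path file (hypothetical `paths.txt`, one path per line) that reports anything other than a 404:

```bash
# Report status codes for candidate high-value paths
while read -r path; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://target.com$path")
  [ "$code" != "404" ] && echo "$code $path"
done < paths.txt
```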
### Organizing Findings

Create structured documentation:

```markdown
# Web App Recon: target.com
## Executive Summary
- Application Type: [E-commerce, API, CMS, etc.]
- Primary Technology: [PHP/Laravel, Python/Django, Node.js, etc.]
- Notable Findings: [X hidden endpoints, Y exposed configs]
## Technology Stack
- Frontend: React 18.2, Bootstrap 5
- Backend: Laravel 9.x
- Server: Nginx 1.21
- Database: MySQL (inferred from error messages)
## Discovered Endpoints
### Public
- /api/v1/products - Product listing API
- /api/v1/users - User profiles (requires auth)
### Hidden/Interesting
- /api/v1/admin - Admin API (403, exists!)
- /api/internal/metrics - Internal metrics endpoint
- /debug/routes - Laravel route list (exposed!)
## Sensitive Files Found
- /storage/logs/laravel.log - Application logs exposed
- /.env.backup - Backup of environment config
- /phpinfo.php - Server info disclosure
## JavaScript Findings
- API keys found: 2 (one appears to be test key)
- Hardcoded API endpoints: 15 additional endpoints
- Subdomains discovered: api-staging.target.com
## Priority Items for Further Testing
1. /debug/routes - Full route disclosure
2. /.env.backup - May contain database credentials
3. /api/internal/metrics - Potential IDOR or info disclosure
4. Staging subdomain - May have weaker security
## Next Steps
- Test IDOR on /api/v1/users endpoints
- Attempt to access admin API with discovered tokens
- Manual review of staging environment
- Test for SQL injection in search parameters
```
### Legal and Ethical Considerations

**CRITICAL** - always follow these rules:

- **Authorization Required**
  - Never test without explicit permission
  - Understand scope and boundaries
  - Don't access sensitive data unless authorized
- **Responsible Disclosure**
  - Report findings through proper channels
  - Don't publicly disclose before remediation
  - Follow responsible disclosure timelines
- **Data Handling**
  - Don't exfiltrate sensitive data
  - Don't store credentials or PII
  - Delete reconnaissance data after the assessment
- **Avoid DoS Conditions**
  - Rate-limit your requests
  - Don't overload servers
  - Use appropriate concurrency settings
### Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Relying only on automated tools | Miss context-specific findings | Combine automation with manual analysis |
| Skipping JavaScript analysis | Miss API endpoints and secrets | Always analyze client-side code |
| Not checking robots.txt first | Waste time on known paths | Start with obvious information sources |
| Ignoring error messages | Miss technology fingerprinting | Pay attention to verbose errors |
| Too aggressive fuzzing | Detection, IP blocking | Start with smaller wordlists, increase gradually |
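For the last pitfall, one way to escalate gradually is to run small-to-large wordlists in sequence and stop once results plateau (SecLists filenames shown; adjust the path to your install):

```bash
# Escalate from small to large wordlists instead of starting aggressive
WL_DIR=/usr/share/seclists/Discovery/Web-Content
for wl in common.txt raft-medium-directories.txt raft-large-directories.txt; do
  ffuf -w "$WL_DIR/$wl" -u https://target.com/FUZZ -rate 20 \
       -o "ffuf_${wl%.txt}.json"
done
```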
### Integration with Other Skills
This skill works with:
- `skills/reconnaissance/automated-subdomain-enum` - feeds discovered subdomains into this skill
- `skills/exploitation/*` - use discovered endpoints for exploitation
- `skills/analysis/static-vuln-analysis` - analyze discovered source code
- `skills/documentation/*` - document findings systematically
### Success Metrics
A successful web app reconnaissance should:
- Identify all major technologies used
- Discover hidden or forgotten functionality
- Extract API endpoints and parameters
- Find configuration or sensitive file exposures
- Map authentication and authorization flows
- Prioritize findings for further testing
- Complete without triggering security alerts (if stealth is required)
### References and Further Reading
- OWASP Web Security Testing Guide
- "The Web Application Hacker's Handbook" by Dafydd Stuttard
- "Bug Bounty Bootcamp" by Vickie Li (Chapters 4-5)
- PortSwigger Web Security Academy
- HackerOne disclosed reports for real-world examples
## Quick Install

```
/plugin add https://github.com/macaugh/super-rouge-hunter-skills/tree/main/web-app-recon
```

Copy and paste this command in Claude Code to install this skill.