analyze-codebase-workflow
About
This skill automatically analyzes codebases to detect workflows, data pipelines, and file dependencies using putior's `put_auto()` engine. It generates an annotation plan mapping I/O patterns across 30+ languages, ideal for onboarding or starting putior integration. Use it to understand data flow in unfamiliar projects or to prepare for source file annotation.
Quick Install
Claude Code
Recommended: npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/analyze-codebase-workflow
Documentation
Analyze Codebase Workflow
Survey repo → auto-detect data flows, file I/O, script deps → structured annotation plan for manual refinement.
Use When
- Onboard unfamiliar codebase → understand data flow
- Start putior integration, no PUT annotations
- Audit existing data pipeline pre-doc
- Prep annotation plan before annotate-source-files
In
- Required: Path to repo/src dir
- Optional: Subdirs focus (default: entire repo)
- Optional: Langs include/exclude (default: all detected)
- Optional: Scope: inputs only, outputs only, both (default: both + deps)
Do
Step 1: Survey Repo Structure
Identify src files + langs → what putior can analyze.
library(putior)
# List all supported languages and their extensions
list_supported_languages()
list_supported_languages(detection_only = TRUE) # Only languages with auto-detection
# Get supported extensions
exts <- get_supported_extensions()
File listing → repo composition:
# Count files by extension in the target directory
find /path/to/repo -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -20
→ File extensions in repo + counts. Map against get_supported_extensions() → coverage.
If err: No files match supported → putior can't auto-detect. Check if lang supported but non-standard ext.
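The coverage mapping above can be sketched in base R. This is a minimal illustration, not putior's own logic: the `files` vector and the `supported` vector are hypothetical stand-ins (in practice `supported` would come from get_supported_extensions(), assumed here to return lowercase extensions without a leading dot).

```r
# Hypothetical file list; in practice this could come from
# list.files("/path/to/repo", recursive = TRUE).
files <- c("src/etl/extract.R", "src/etl/transform.py",
           "src/api/handler.js", "src/legacy/process.f90", "README.md")

# Stand-in for get_supported_extensions(); the real set comes from putior.
supported <- c("r", "py", "js", "sql", "sh")

# Count files per lowercased extension.
ext <- tolower(tools::file_ext(files))
counts <- table(ext)

# One row per extension, flagged by whether it is in the supported set.
coverage <- data.frame(
  ext = names(counts),
  n = as.integer(counts),
  supported = names(counts) %in% supported
)
print(coverage)
```

Rows with `supported = FALSE` are the candidates for the non-standard-extension check described above.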
Step 2: Check Detection Coverage
Per detected lang → verify auto-detect pattern available.
# Check which languages have auto-detection patterns (18 languages, 902 patterns)
detection_langs <- list_supported_languages(detection_only = TRUE)
cat("Languages with auto-detection:\n")
print(detection_langs)
# Get pattern counts for specific languages found in the repo
for (lang in c("r", "python", "javascript", "sql", "dockerfile", "makefile")) {
  patterns <- get_detection_patterns(lang)
  cat(sprintf("%s: %d input, %d output, %d dependency patterns\n",
    lang,
    length(patterns$input),
    length(patterns$output),
    length(patterns$dependency)
  ))
}
→ Pattern counts printed. R 124, Python 159, JS 71, etc.
If err: No patterns → supports manual only, not auto. Plan manual annotations.
Step 3: Run Auto-Detection
Execute put_auto() → discover workflow elements.
# Full auto-detection
workflow <- put_auto("./src/",
  detect_inputs = TRUE,
  detect_outputs = TRUE,
  detect_dependencies = TRUE
)
# Exclude build scripts and test helpers from scanning
workflow <- put_auto("./src/",
  detect_inputs = TRUE,
  detect_outputs = TRUE,
  detect_dependencies = TRUE,
  exclude = c("build-", "test_helper")
)
# View detected workflow nodes
print(workflow)
# Check node count
cat(sprintf("Detected %d workflow nodes\n", nrow(workflow)))
Large repos → analyze subdirs incrementally:
# Analyze specific subdirectories
etl_workflow <- put_auto("./src/etl/")
api_workflow <- put_auto("./src/api/")
→ Df w/ id, label, input, output, source_file cols. Row = detected step.
If err: Empty → src may lack recognizable I/O patterns. Try workflow <- put_auto("./src/", log_level = "DEBUG") → see scanned + matched.
Step 4: Initial Diagram
Visualize auto-detected → assess coverage + gaps.
# Generate diagram from auto-detected workflow
cat(put_diagram(workflow, theme = "github"))
# With source file info for traceability
cat(put_diagram(workflow, show_source_info = TRUE))
# Save to file for review
writeLines(put_diagram(workflow, theme = "github"), "workflow-auto.md")
→ Mermaid flowchart, detected nodes + data flow edges. Meaningful fn/file labels.
If err: Disconnected nodes → auto-detect found I/O but couldn't infer connections. Normal: edges come from matching output filenames → input filenames, so a name mismatch leaves a node unlinked. Annotation plan (next step) fills the gaps.
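Why nodes end up disconnected is easier to see with a small sketch of filename matching. This is an illustration under stated assumptions, not putior's actual implementation: the `nodes` data frame is hypothetical, it reuses the id/input/output column names described above, and it assumes multiple files in one field are comma-separated.

```r
# Hypothetical detected nodes with id, input, and output columns.
nodes <- data.frame(
  id = c("extract", "transform", "load"),
  input = c("raw.csv", "data.csv", "clean_data.parquet"),
  output = c("data.csv", "clean.parquet", "report.html"),
  stringsAsFactors = FALSE
)

# Split a comma-separated field into trimmed filenames.
split_files <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

# Link producer -> consumer whenever an output filename equals an input filename.
edges <- do.call(rbind, lapply(seq_len(nrow(nodes)), function(i) {
  outs <- split_files(nodes$output[i])
  do.call(rbind, lapply(seq_len(nrow(nodes)), function(j) {
    shared <- intersect(outs, split_files(nodes$input[j]))
    if (i == j || length(shared) == 0) return(NULL)
    data.frame(from = nodes$id[i], to = nodes$id[j], via = shared)
  }))
}))
print(edges)

# Nodes that appear in no edge stay disconnected in the diagram.
disconnected <- setdiff(nodes$id, c(edges$from, edges$to))
print(disconnected)
```

Here `transform` writes `clean.parquet` while `load` reads `clean_data.parquet`, so exact matching finds no edge and `load` is left unlinked. That is the kind of gap a manual annotation closes.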
Step 5: Annotation Plan
Generate plan → what found + what needs manual.
# Generate annotation suggestions
put_generate("./src/", style = "single")
# For multiline style (more readable for complex workflows)
put_generate("./src/", style = "multiline")
# Copy suggestions to clipboard for easy pasting
put_generate("./src/", output = "clipboard")
Doc plan w/ coverage assessment:
## Annotation Plan
### Auto-Detected (no manual work needed)
- `src/etl/extract.R` — 3 inputs, 2 outputs detected
- `src/etl/transform.py` — 1 input, 1 output detected
### Needs Manual Annotation
- `src/api/handler.js` — Language supported but no I/O patterns matched
- `src/config/setup.sh` — Only 12 shell patterns; complex logic missed
### Not Supported
- `src/legacy/process.f90` — Fortran not in detection languages
### Recommended Connections
- extract.R output `data.csv` → transform.py input `data.csv` (auto-linked)
- transform.py output `clean.parquet` → load.R input (needs annotation)
→ Clear plan: auto-detected vs manual, specific recs per file.
If err: put_generate() no out → verify path correct + has supported src files.
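The three buckets in the plan template can be drafted mechanically before manual review. A minimal base R sketch, assuming three hypothetical inputs: the full file list, the files where detection found I/O, and the extensions assumed to have auto-detection patterns. None of these names are putior API.

```r
# Hypothetical inputs for illustration only.
files <- c("src/etl/extract.R", "src/etl/transform.py",
           "src/api/handler.js", "src/legacy/process.f90")
detected_files <- c("src/etl/extract.R", "src/etl/transform.py")
detection_exts <- c("r", "py", "js", "sh")

# Assign each file to one heading of the plan template.
classify <- function(path) {
  if (path %in% detected_files) return("Auto-Detected")
  if (tolower(tools::file_ext(path)) %in% detection_exts) {
    return("Needs Manual Annotation")
  }
  "Not Supported"
}

plan <- split(files, vapply(files, classify, character(1)))

# Emit the plan as headings with one bullet per file.
for (bucket in names(plan)) {
  cat(sprintf("### %s\n", bucket))
  cat(sprintf("- `%s`\n", plan[[bucket]]), sep = "")
}
```

The per-file detail (pattern counts, recommended connections) still has to be written by hand; the sketch only sorts files into the right section.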
Check
- put_auto() no err on target
- Detected workflow has ≥1 node (unless no recognizable I/O)
- put_diagram() produces valid Mermaid
- put_generate() produces suggestions for detected files
- Annotation plan doc created w/ coverage assessment
Traps
- Scan too broad: put_auto(".") → includes node_modules/, .git/, venv/. Target specific src dirs.
- Expect full coverage: Auto-detect finds I/O + lib calls, not business logic. 40-60% typical; rest manual.
- Ignore deps: detect_dependencies = TRUE catches source(), import, require() → links scripts. Disable → lose cross-file connections.
- Lang mismatch: Non-standard ext (.R vs .r, .jsx vs .js) may not detect. Use get_comment_prefix(). Extensionless Dockerfile, Makefile supported via filename match.
- Large repos: 100+ src files → analyze by module/dir → diagrams readable.
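To avoid the broad-scan and large-repo traps, the scan targets can be chosen from a filtered file listing first. A minimal base R sketch with a hypothetical `files` vector; the skipped directory names are the ones listed in the trap above, and the resulting directories are only candidates to pass to put_auto() one at a time.

```r
# Hypothetical recursive listing, e.g. from list.files(".", recursive = TRUE).
files <- c("src/etl/extract.R", "node_modules/lib/index.js",
           ".git/hooks/pre-commit", "venv/lib/site.py", "src/api/handler.js")

# Directories that should never be scanned.
skip_dirs <- c("node_modules", ".git", "venv")

# Drop any path that has a skipped directory among its components.
in_skipped <- vapply(strsplit(files, "/", fixed = TRUE),
                     function(parts) any(parts %in% skip_dirs), logical(1))
src_files <- files[!in_skipped]

# Group the remaining files by directory: one candidate scan target per group.
modules <- split(src_files, dirname(src_files))
print(names(modules))
```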
→
- install-putior: prereq
- annotate-source-files: next, add manual
- generate-workflow-diagram: final after annotation
- configure-putior-mcp: MCP tools for interactive
