| name | description | version | author | license | platforms | metadata |
|---|---|---|---|---|---|---|
| codebase-knowledge-graphs | Build/query persistent codebase knowledge graphs for architecture discovery, cross-repo impact analysis, and agent handoffs. | 1.1.0 | Hermes Agent | MIT | | |
Codebase Knowledge Graphs
Overview
Use this skill when a task benefits from a persistent map of a repository or a set of related repositories: architecture discovery, cross-repo dependency questions, agent onboarding, impact analysis, or long-lived project navigation.
Default tool: Graphify (graphifyy package, graphify CLI). It creates:
graphify-out/graph.json
graphify-out/graph.html
graphify-out/GRAPH_REPORT.md
Treat the graph as a navigation aid, not an authority. Before editing code, verify graph answers against source files.
When to Use
Use for:
- Broad architecture questions: "how does X connect to Y?"
- Multi-agent projects where agents repeatedly lose context.
- Cross-repo flows, e.g. source repo -> build repo -> deployment repo.
- Impact analysis before changing scripts, skills, manifests, deployment contracts, or public interfaces.
- Producing a durable project map for future sessions.
Do not use as the first tool for:
- A tiny local grep/read-file task.
- Security-sensitive repositories where generated graph artifacts would leak secrets or paths.
- Critical runtime/boot paths until dependency and offline-install behavior has been tested.
Setup and Commands
Prefer uvx for one-off runs without permanently installing:
uvx --from graphifyy graphify --help
Build a graph for a project (AST-only, no API key needed):
cd <project> && uvx --from graphifyy graphify update .
This performs AST extraction only: it parses all source files and builds the graph with function/import/define edges. No LLM is required. For semantic extraction (LLM-powered chunk labeling), you need an API key and graphify extract; see "Pitfalls" below.
Query an existing graph:
uvx --from graphifyy graphify query "how does deployment work?" --graph graphify-out/graph.json
uvx --from graphifyy graphify path "hostd" "webroot" --graph graphify-out/graph.json
uvx --from graphifyy graphify explain "iso-publish" --graph graphify-out/graph.json
Export an architecture/call-flow page:
uvx --from graphifyy graphify export callflow-html
Merge graphs for cross-repo questions:
uvx --from graphifyy graphify merge-graphs \
../repo-a/graphify-out/graph.json \
../repo-b/graphify-out/graph.json \
--out graphify-out/merged-graph.json
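The build and merge commands above can be chained for several repositories. The following is a minimal sketch, not part of Graphify: build_and_merge is a hypothetical helper name, and it assumes that graphify update writes graphify-out/graph.json inside each repository, as described above.

```shell
# Hypothetical helper: rebuild the AST graph in each repo, then merge them.
# Usage: build_and_merge <merged-output.json> <repo-dir> <repo-dir> [...]
build_and_merge() {
    if [ "$#" -lt 3 ]; then
        echo "usage: build_and_merge <out.json> <repo> <repo> [...]" >&2
        return 2
    fi
    out=$1
    shift

    # Build each graph in a subshell so the caller's working directory is unchanged.
    for repo in "$@"; do
        (cd "$repo" && uvx --from graphifyy graphify update .) || return 1
    done

    # Replace each repo path in the argument list with its graph.json path.
    for repo in "$@"; do
        set -- "$@" "$repo/graphify-out/graph.json"
        shift
    done

    uvx --from graphifyy graphify merge-graphs "$@" --out "$out"
}
```

Example call: build_and_merge graphify-out/merged-graph.json ../repo-a ../repo-b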
Repository Integration Pattern
Default posture: generated graph artifacts are a temporary map, not the territory. The source code is the durable truth; regenerate graphs locally when needed instead of committing stale JSON.
For a mature repo that repeatedly uses Graphify, consider adding:
.graphifyignore
docs/GRAPHIFY.md
scripts/graphify-refresh.sh
Do not add a wrapper script before Graphify has been used on at least one real debugging/navigation task in that repo. First use the tool manually, observe what was annoying, then script the recurring parts.
Recommended .gitignore additions:
graphify-out/
*.graph.json
Commit graphify-out/graph.json, GRAPH_REPORT.md, graph.html, or other generated graph output only with an explicit project decision. Reasons to avoid committing by default:
- Graphs get stale as soon as source changes.
- Generated JSON creates noisy diffs and harder reviews.
- Checked-in graph output looks more authoritative than it is, even though Graphify can produce fake/noisy nodes or guessed edges.
Do not commit local caches, cost/mtime manifests, or generated graph output for Clawdie ISO unless the project explicitly reverses this rule.
Clawdie-ISO policy (2026-05-23): Graphify is prohibited entirely in the ISO repo. Do NOT add .graphifyignore, docs/GRAPHIFY.md, skills, or any graph-related files to clawdie-iso. The repo composition (shell scripts + markdown + archived planning docs) makes the graph mislead agents toward retired decisions and a deprecated QML installer. Clawdie-AI allows Linux-local on-demand use but still avoids formal integration. The Colibri and Herdr repos have no graph policy — use at your discretion.
.graphifyignore Guidance
Always exclude:
.git/
tmp/
node_modules/
dist/
build/
.cache/
.env
*.key
*.pem
*.sqlite
*.db
For ISO/build repos, also exclude:
*.img
*.img.gz
*.iso
*.sha256
packages/
downloads/
html/
webroot/
Include source, docs, skills, scripts, package lists, and manifest schemas.
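The exclusion lists above can be written out in one step. This is a sketch under stated assumptions: write_graphifyignore is a hypothetical helper name, and it assumes .graphifyignore takes one pattern per line exactly as listed above. It refuses to overwrite an existing file.

```shell
# Hypothetical helper: write a starter .graphifyignore unless one already exists.
# Usage: write_graphifyignore [repo-dir] [iso]
write_graphifyignore() {
    target="${1:-.}/.graphifyignore"
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it untouched" >&2
        return 0
    fi

    # Baseline exclusions from the "Always exclude" list.
    cat > "$target" <<'EOF'
.git/
tmp/
node_modules/
dist/
build/
.cache/
.env
*.key
*.pem
*.sqlite
*.db
EOF

    # Extra exclusions for ISO/build repos, appended only when requested.
    if [ "${2:-}" = "iso" ]; then
        cat >> "$target" <<'EOF'
*.img
*.img.gz
*.iso
*.sha256
packages/
downloads/
html/
webroot/
EOF
    fi
}
```

Example call: write_graphifyignore . for a regular repo, or write_graphifyignore . iso for an ISO/build repo where graph use is permitted.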
Agent Usage Rules
- If graphify-out/graph.json exists and the task is broad, query the graph before deep grep.
- Use scoped questions and explicit graph paths.
- Cite graph findings as leads, not facts.
- Verify relevant source files before proposing or making code changes.
- Regenerate the graph when the repo has materially changed and graph freshness matters.
- Keep graph generation out of critical boot/build paths unless tested on the target platform.
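One rough way to judge graph freshness is to compare file modification times against graph.json. The sketch below is an illustration, not a Graphify feature: graph_is_stale is a hypothetical helper name, and modification time is only a proxy for "materially changed", so treat the result as a hint.

```shell
# Hypothetical helper: exit 0 when the graph is missing or older than any
# file in the repo, exit 1 when the graph looks current.
# Usage: graph_is_stale [repo-dir]
graph_is_stale() {
    repo=${1:-.}
    graph="$repo/graphify-out/graph.json"

    # A missing graph counts as stale.
    [ -f "$graph" ] || return 0

    # Find the first regular file newer than the graph, skipping .git and
    # the generated output directory itself.
    newer=$(find "$repo" \
        \( -path "$repo/.git" -o -path "$repo/graphify-out" \) -prune \
        -o -type f -newer "$graph" -print | head -n 1)

    [ -n "$newer" ]
}
```

Example use: if graph_is_stale .; then uvx --from graphifyy graphify update .; fi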
Pitfalls
Build with update, not extract (different commands, different requirements)
graphify update and graphify extract are different commands with different requirements:
| Command | API key? | What it does | Use case |
|---|---|---|---|
| graphify update . | No | AST-only: parses source files, builds import/call/define graph | Primary build command; always works |
| graphify extract . --out . | Yes (Gemini/DeepSeek/OpenAI) | AST + semantic: LLM-powered chunk labeling on top of AST graph | Only when semantic labels are needed |
graphify extract may not appear in graphify --help (version-dependent). Always try graphify update . first — it produces a fully usable graph with zero configuration. A 971-file TypeScript repo produced 11,277 nodes and 16,820 edges with update alone.
If you accidentally run extract without an API key, you'll get:
error: no LLM API key found. Set GEMINI_API_KEY or GOOGLE_API_KEY (gemini), ... or pass --backend.
Fix: use graphify update . instead.
Create .graphifyignore BEFORE the first build
Without .graphifyignore, graphify walks every file in the repo including build artifacts, caches, and vendored dependencies. On a JS/TS project this means parsing node_modules/, dist/, tmp/, etc. — thousands of irrelevant files that bloat the graph and slow extraction.
Always create .graphifyignore before the first graphify update. Use the templates in the ".graphifyignore Guidance" section above. A 971-file repo with proper exclusions produces a clean graph; without exclusions it's easily 10x larger with noise nodes.
DeepSeek / OpenAI-compatible backends need the openai package (semantic extraction only)
When using graphify extract with --backend deepseek (or any OpenAI-compatible backend), the semantic phase requires the openai Python package. uvx sandboxes don't include it:
Gemini/Kimi/Ollama/OpenAI-compatible extraction requires the openai package. Run: pip install openai
Fix: Install system-wide:
pip install --break-system-packages openai
Or use a venv:
python3 -m venv /tmp/graphify-venv
/tmp/graphify-venv/bin/pip install openai
PATH="/tmp/graphify-venv/bin:$PATH" uvx --from graphifyy graphify extract . --out . --backend deepseek --model deepseek-chat
AST-only (graphify update) does not need this. Only the LLM-powered semantic chunk labeling in extract does.
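Another option worth trying is uvx's --with flag, which adds extra packages to the ephemeral tool environment. The sketch below is untested against Graphify and is an assumption rather than a documented fix; semantic_extract is a hypothetical helper name, and the backend and model defaults are the ones used in the command above.

```shell
# Hypothetical helper: run semantic extraction with the openai package added
# to the uvx environment via --with (assumed, not verified against Graphify).
# Usage: semantic_extract [backend] [model]
semantic_extract() {
    backend=${1:-deepseek}
    model=${2:-deepseek-chat}
    uvx --with openai --from graphifyy graphify extract . --out . \
        --backend "$backend" --model "$model"
}
```

If this works on your setup, it avoids both the system-wide install and the throwaway venv.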
Cross-repo AST-only graphs cannot trace data flow across repo boundaries
graphify update builds a graph from AST edges: imports, function calls, defines, contains. These edges only exist within a single codebase. Cross-repo data flow — e.g., TypeScript config values in one repo consumed by shell scripts in another — does not produce graph edges because the AST parser cannot see across repository boundaries.
graphify path will return "No path found" for cross-repo queries. The merged graph still has value (you can query both repos in one graph), but use graphify explain on individual nodes and trace the human-documented contracts (AGENTS.md, handoff docs) for cross-repo connections. The graph shows you what exists where; it does not replace cross-repo documentation.
FreeBSD and Offline Caution
Graphify is Python-based and depends on packages such as tree-sitter-*, networkx, rapidfuzz, and related scientific/runtime dependencies. On FreeBSD or offline images:
- Start with optional host/developer tooling, not a mandatory runtime dependency.
- Investigate target-platform package availability before proposing runtime/ISO integration.
- Test uvx --from graphifyy graphify --help and a small extraction on FreeBSD before adding it to an ISO.
- If offline use is required, cache/test wheels or ports on the target FreeBSD version first.
- Do not block image build, boot, USB flashing, or webroot publishing on graph generation.
For Clawdie/FreeBSD-specific package findings and the recommended smoke test, see references/graphify-freebsd-clawdie.md.
Validation
sh -n scripts/graphify-refresh.sh
uvx --from graphifyy graphify query "what are the main deployment paths?" --graph graphify-out/graph.json
python3 -m json.tool graphify-out/graph.json >/dev/null
For generated docs, open or serve graphify-out/graph.html only after confirming it does not expose secrets.
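A quick text scan can support the secrets check before publishing. This is a minimal sketch: scan_graph_output is a hypothetical helper name and the patterns are illustrative assumptions, not a complete secret detector. A clean result does not prove the output is safe.

```shell
# Hypothetical helper: list files under the graph output directory that match
# a few common secret patterns. Exit 0 if none match, 1 if any do, 2 if the
# directory is missing.
# Usage: scan_graph_output [output-dir]
scan_graph_output() {
    out_dir=${1:-graphify-out}
    if [ ! -d "$out_dir" ]; then
        echo "no such directory: $out_dir" >&2
        return 2
    fi

    # Illustrative patterns only; extend them for the secrets your project uses.
    pattern='BEGIN [A-Z ]*PRIVATE KEY|API_KEY[[:space:]]*[=:]|PASSWORD[[:space:]]*[=:]'

    # grep -l prints each matching file name and succeeds if any file matched.
    if grep -rlE "$pattern" "$out_dir"; then
        echo "possible secrets found in $out_dir; do not publish" >&2
        return 1
    fi
    return 0
}
```

Example use: scan_graph_output graphify-out && echo "no obvious secrets; review manually before serving"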
References
- references/clawdie-graphify-integration.md: concrete Clawdie-AI/Clawdie-ISO integration plan and boundaries.
- references/graphify-freebsd-clawdie.md: FreeBSD package-availability findings, smoke-test commands, and Clawdie ISO/runtime boundary.
- references/clawdie-multi-repo-agents-structure.md: Clawdie's four-repo layout (AI, ISO, Colibri, herdr), AGENTS.md conventions per repo, platform split, and cross-repo update procedures.
- references/clawdie-cross-repo-graph-sizes.md: concrete node/edge counts from the 2026-05-27 Clawdie cross-repo graph build; merge command and cross-repo visibility limitations.