layered-soul/skills/codebase-knowledge-graphs/SKILL.md
Sam & Claude 4d8ce07fa7 docs: apply Prettier to current markdown (Sam & Codex)
Normalize markdown formatting after the latest main updates.

Checks: python3 scripts/layered_soul.py validate .; npx --yes prettier@3 --check '**/*.md'; git diff --check.
2026-06-14 01:48:32 +02:00

name: codebase-knowledge-graphs
description: Build/query persistent codebase knowledge graphs for architecture discovery, cross-repo impact analysis, and agent handoffs.
version: 1.1.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, freebsd]
metadata:
  hermes:
    tags: [codebase-analysis, knowledge-graphs, architecture, graphify, cross-repo]
    related_skills: [writing-plans, codebase-inspection, requesting-code-review]

Codebase Knowledge Graphs

Overview

Use this skill when a task benefits from a persistent map of a repository or a set of related repositories: architecture discovery, cross-repo dependency questions, agent onboarding, impact analysis, or long-lived project navigation.

Default tool: Graphify (PyPI package graphifyy, with a double y; CLI command graphify). It creates:

graphify-out/graph.json
graphify-out/graph.html
graphify-out/GRAPH_REPORT.md

Treat the graph as a navigation aid, not an authority. Before editing code, verify graph answers against source files.

When to Use

Use for:

  • Broad architecture questions: "how does X connect to Y?"
  • Multi-agent projects where agents repeatedly lose context.
  • Cross-repo flows, e.g. source repo -> build repo -> deployment repo.
  • Impact analysis before changing scripts, skills, manifests, deployment contracts, or public interfaces.
  • Producing a durable project map for future sessions.

Do not use as the first tool for:

  • A tiny local grep/read-file task.
  • Security-sensitive repositories where generated graph artifacts would leak secrets or paths.
  • Critical runtime/boot paths, until dependency and offline-install behavior have been tested.

Setup and Commands

Prefer uvx for one-off runs without a permanent install:

uvx --from graphifyy graphify --help

Build a graph for a project (AST-only, no API key needed):

cd <project> && uvx --from graphifyy graphify update .

This runs AST extraction only: it parses every source file not excluded by .graphifyignore and builds the graph from function, import, and define edges. No LLM is required. Semantic extraction (LLM-powered chunk labeling) needs graphify extract and an API key; see "Pitfalls" below.

Query an existing graph:

uvx --from graphifyy graphify query "how does deployment work?" --graph graphify-out/graph.json
uvx --from graphifyy graphify path "hostd" "webroot" --graph graphify-out/graph.json
uvx --from graphifyy graphify explain "iso-publish" --graph graphify-out/graph.json
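When an agent drives these queries from a script, a small wrapper can insist on an explicit --graph path and fail early if graph.json is missing. A minimal sketch in Python: the helper names build_query_argv and run_query are hypothetical, and they only assemble the query, path, and explain invocations shown above, nothing beyond them.

```python
import subprocess
from pathlib import Path

GRAPHIFY = ["uvx", "--from", "graphifyy", "graphify"]
# Positional arguments each read-only subcommand takes in the examples above.
ARITY = {"query": 1, "path": 2, "explain": 1}


def build_query_argv(subcommand, args, graph="graphify-out/graph.json"):
    """Assemble a graphify query/path/explain call with an explicit --graph path."""
    if subcommand not in ARITY:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    if len(args) != ARITY[subcommand]:
        raise ValueError(f"{subcommand} takes {ARITY[subcommand]} argument(s), got {len(args)}")
    return [*GRAPHIFY, subcommand, *args, "--graph", str(graph)]


def run_query(subcommand, args, graph="graphify-out/graph.json"):
    """Fail early if the graph is missing, then run graphify and return its stdout."""
    if not Path(graph).is_file():
        raise FileNotFoundError(f"no graph at {graph}; build one with 'graphify update .' first")
    argv = build_query_argv(subcommand, args, graph)
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, build_query_argv("path", ["hostd", "webroot"]) yields the argv of the second command above, so the graph path is never left implicit.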

Export an architecture/call-flow page:

uvx --from graphifyy graphify export callflow-html

Merge graphs for cross-repo questions:

uvx --from graphifyy graphify merge-graphs \
  ../repo-a/graphify-out/graph.json \
  ../repo-b/graphify-out/graph.json \
  --out graphify-out/merged-graph.json

Repository Integration Pattern

Default posture: generated graph artifacts are a temporary map, not the territory. The source code is the durable truth; regenerate graphs locally when needed instead of committing stale JSON.

For a mature repo that repeatedly uses Graphify, consider adding:

.graphifyignore
docs/GRAPHIFY.md
scripts/graphify-refresh.sh

Do not add a wrapper script before Graphify has been used on at least one real debugging/navigation task in that repo. First use the tool manually, observe what was annoying, then script the recurring parts.

Recommended .gitignore additions:

graphify-out/
*.graph.json

Commit graphify-out/graph.json, GRAPH_REPORT.md, graph.html, or other generated graph output only with an explicit project decision. Reasons to avoid committing by default:

  • Graphs get stale as soon as source changes.
  • Generated JSON creates noisy diffs and harder reviews.
  • Checked-in graph output looks more authoritative than it is, while Graphify can produce fake/noisy nodes or guessed edges.

Do not commit local caches, cost/mtime manifests, or generated graph output for Clawdie ISO unless the project explicitly reverses this rule.

Clawdie-ISO policy (2026-05-23): Graphify is prohibited entirely in the ISO repo. Do NOT add .graphifyignore, docs/GRAPHIFY.md, skills, or any other graph-related files to clawdie-iso. The repo's composition (shell scripts, markdown, and archived planning docs) makes the graph steer agents toward retired decisions and a deprecated QML installer. Clawdie-AI allows Linux-local, on-demand use but still avoids formal integration. The Colibri and Herdr repos have no graph policy; use Graphify there at your discretion.

.graphifyignore Guidance

Always exclude:

.git/
tmp/
node_modules/
dist/
build/
.cache/
.env
*.key
*.pem
*.sqlite
*.db

For ISO/build repos, also exclude:

*.img
*.img.gz
*.iso
*.sha256
packages/
downloads/
html/
webroot/

Include source, docs, skills, scripts, package lists, and manifest schemas.
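To create the file from these templates before the first build, a short script is enough (the Clawdie-ISO policy above still applies: no .graphifyignore in clawdie-iso). A minimal sketch: the function name write_graphifyignore and its build_repo flag are hypothetical, the pattern lists are copied from the guidance above, and it refuses to overwrite an existing file so hand-tuned exclusions are not lost.

```python
from pathlib import Path

# Patterns copied from the .graphifyignore guidance above.
ALWAYS_EXCLUDE = [".git/", "tmp/", "node_modules/", "dist/", "build/", ".cache/",
                  ".env", "*.key", "*.pem", "*.sqlite", "*.db"]
BUILD_REPO_EXCLUDE = ["*.img", "*.img.gz", "*.iso", "*.sha256",
                      "packages/", "downloads/", "html/", "webroot/"]


def write_graphifyignore(repo, build_repo=False):
    """Create <repo>/.graphifyignore from the templates; refuse to overwrite an existing one."""
    target = Path(repo) / ".graphifyignore"
    if target.exists():
        raise FileExistsError(f"{target} already exists; edit it by hand")
    patterns = ALWAYS_EXCLUDE + (BUILD_REPO_EXCLUDE if build_repo else [])
    target.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return target
```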

Agent Usage Rules

  1. If graphify-out/graph.json exists and the task is broad, query the graph before deep grep.
  2. Use scoped questions and explicit graph paths.
  3. Cite graph findings as leads, not facts.
  4. Verify relevant source files before proposing or making code changes.
  5. Regenerate the graph when the repo has materially changed and graph freshness matters.
  6. Keep graph generation out of critical boot/build paths unless tested on the target platform.
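Rule 5 leaves "materially changed" to judgment. One cheap signal is whether any source file is newer than graph.json. A minimal sketch, assuming file modification times are a usable heuristic; the function name, the suffix set, and the skipped directories are hypothetical defaults to adjust per repo.

```python
from pathlib import Path

# Hypothetical defaults: adjust the suffixes and skipped directories per repo.
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".sh", ".md"}
SKIP_DIRS = {".git", "node_modules", "dist", "build", "tmp", ".cache", "graphify-out"}


def stale_sources(repo, graph="graphify-out/graph.json"):
    """Return repo-relative source files modified after graph.json (mtime heuristic).

    Returns None when there is no graph yet, meaning: build one before querying.
    """
    repo = Path(repo)
    graph_path = repo / graph
    if not graph_path.is_file():
        return None
    built_at = graph_path.stat().st_mtime
    newer = []
    for path in repo.rglob("*"):
        rel = path.relative_to(repo)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES and path.stat().st_mtime > built_at:
            newer.append(rel)
    return sorted(newer)
```

An empty list suggests the graph is still current; a long list is a prompt to regenerate before trusting query answers.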

Pitfalls

Build with update, not extract (different commands, different requirements)

graphify update and graphify extract are different commands with different requirements:

| Command | API key? | What it does | Use case |
| --- | --- | --- | --- |
| graphify update . | No | AST-only: parses source files, builds import/call/define graph | Primary build command; always works |
| graphify extract . --out . | Yes (Gemini/DeepSeek/OpenAI) | AST + semantic: LLM-powered chunk labeling on top of the AST graph | Only when semantic labels are needed |

graphify extract may not appear in graphify --help (version-dependent). Always try graphify update . first — it produces a fully usable graph with zero configuration. A 971-file TypeScript repo produced 11,277 nodes and 16,820 edges with update alone.

If you accidentally run extract without an API key, you'll get:

error: no LLM API key found. Set GEMINI_API_KEY or GOOGLE_API_KEY (gemini), ... or pass --backend.

Fix: use graphify update . instead.
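A refresh script can encode this rule: default to update, and only build the extract command when semantic labels are wanted and a key or explicit backend is available. A minimal sketch with a hypothetical build_command helper; it checks only GEMINI_API_KEY and GOOGLE_API_KEY because those are the variables the error message names (the other backends' variables are elided there), and it uses only the flags shown in this document.

```python
import os

GRAPHIFY = ["uvx", "--from", "graphifyy", "graphify"]
# Only the variables named in the error message above; other backends are not listed here.
GEMINI_KEYS = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def build_command(semantic=False, backend=None, model=None, env=None):
    """Return argv for an AST-only update, or for extract when semantic labels are requested."""
    env = os.environ if env is None else env
    if not semantic:
        return [*GRAPHIFY, "update", "."]
    if backend is None and not any(env.get(k) for k in GEMINI_KEYS):
        raise RuntimeError("semantic extraction needs an API key or --backend; "
                           "use 'graphify update .' for an AST-only graph")
    argv = [*GRAPHIFY, "extract", ".", "--out", "."]
    if backend:
        argv += ["--backend", backend]
    if model:
        argv += ["--model", model]
    return argv
```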

Create .graphifyignore BEFORE the first build

Without .graphifyignore, graphify walks every file in the repo including build artifacts, caches, and vendored dependencies. On a JS/TS project this means parsing node_modules/, dist/, tmp/, etc. — thousands of irrelevant files that bloat the graph and slow extraction.

Always create .graphifyignore before the first graphify update, using the templates in the ".graphifyignore Guidance" section above. A 971-file repo with proper exclusions produces a clean graph; without exclusions, the graph is easily 10x larger and padded with noise nodes.
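To see how much the ignore file matters before building, count the files a walker would visit with and without the exclusions. A rough sketch: it approximates gitignore-style matching with fnmatch (a trailing slash means "a directory of that name anywhere in the path"; other patterns match the file name). That matching rule is an assumption, not Graphify's documented behavior.

```python
from fnmatch import fnmatch
from pathlib import Path


def is_excluded(rel, patterns):
    """Approximate gitignore-style matching for a repo-relative path."""
    for pat in patterns:
        if pat.endswith("/"):
            # Directory pattern: excluded if any parent directory has that name.
            if pat.rstrip("/") in rel.parts[:-1]:
                return True
        elif fnmatch(rel.name, pat):
            return True
    return False


def count_files(repo, patterns=()):
    """Count regular files under repo that survive the exclusion patterns."""
    repo = Path(repo)
    return sum(1 for p in repo.rglob("*")
               if p.is_file() and not is_excluded(p.relative_to(repo), patterns))
```

Reading the patterns from an existing .graphifyignore (skipping blank lines and lines starting with #) and comparing count_files(".") with count_files(".", patterns) gives a quick before/after estimate.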

DeepSeek / OpenAI-compatible backends need the openai package (semantic extraction only)

When using graphify extract with --backend deepseek (or any other OpenAI-compatible backend), the semantic phase requires the openai Python package. The uvx environment does not include it, so the run fails with:

Gemini/Kimi/Ollama/OpenAI-compatible extraction requires the openai package. Run: pip install openai

Fix: install it system-wide:

pip install --break-system-packages openai

Or use a venv:

python3 -m venv /tmp/graphify-venv
/tmp/graphify-venv/bin/pip install openai
PATH="/tmp/graphify-venv/bin:$PATH" uvx --from graphifyy graphify extract . --out . --backend deepseek --model deepseek-chat

AST-only (graphify update) does not need this. Only the LLM-powered semantic chunk labeling in extract does.
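Before starting a long semantic run, a preflight check can confirm that a given interpreter can import openai. A minimal sketch with a hypothetical has_module helper; which interpreter Graphify's semantic phase actually uses depends on how it was launched (uvx environment, system Python, or the venv above), so point the check at that one.

```python
import subprocess
import sys


def has_module(module="openai", python=sys.executable):
    """Return True if `python` can find `module`, without importing it in this process."""
    probe = ("import importlib.util, sys; "
             f"sys.exit(0 if importlib.util.find_spec({module!r}) else 1)")
    return subprocess.run([python, "-c", probe]).returncode == 0
```

For the venv route, has_module("openai", "/tmp/graphify-venv/bin/python") should return True once the pip install has succeeded.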

Cross-repo AST-only graphs cannot trace data flow across repo boundaries

graphify update builds a graph from AST edges: imports, function calls, defines, contains. These edges only exist within a single codebase. Cross-repo data flow — e.g., TypeScript config values in one repo consumed by shell scripts in another — does not produce graph edges because the AST parser cannot see across repository boundaries.

graphify path returns "No path found" when the two endpoints live in different repos. The merged graph still has value, since you can query both repos in one graph, but for cross-repo connections run graphify explain on individual nodes and follow the human-documented contracts (AGENTS.md, handoff docs). The graph shows you what exists where; it does not replace cross-repo documentation.
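To check this limitation on a real merged graph, count edges whose endpoints belong to different repos; for an AST-only merge the expected answer is zero. A sketch under loudly stated assumptions: graph.json is taken to be networkx-style node-link JSON (a nodes list with id fields, plus a links or edges list with source and target), and how a node records its repo is left to a caller-supplied repo_of function because that attribute is not documented here. Check a real file for the actual key names first.

```python
import json


def load_graph(path):
    """Load a graph.json file (assumed node-link JSON: "nodes" plus "links" or "edges")."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cross_repo_edges(graph, repo_of):
    """Return (source, target) pairs whose endpoints map to different repos.

    repo_of is a caller-supplied function: node dict -> repo name, or None if unknown.
    """
    repos = {n["id"]: repo_of(n) for n in graph["nodes"]}
    edges = graph.get("links", graph.get("edges", []))
    pairs = []
    for e in edges:
        a, b = repos.get(e["source"]), repos.get(e["target"])
        if a is not None and b is not None and a != b:
            pairs.append((e["source"], e["target"]))
    return pairs
```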

FreeBSD and Offline Caution

Graphify is Python-based and depends on packages such as tree-sitter-*, networkx, rapidfuzz, and related scientific/runtime dependencies. On FreeBSD or offline images:

  • Start by treating it as optional host/developer tooling, not a mandatory runtime dependency.
  • Investigate target-platform package availability before proposing runtime/ISO integration.
  • Test uvx --from graphifyy graphify --help and a small extraction on FreeBSD before adding it to an ISO.
  • If offline use is required, cache/test wheels or ports on the target FreeBSD version first.
  • Do not block image build, boot, USB flashing, or webroot publishing on graph generation.
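One way to honor the last point is to wrap graph generation so a missing tool or a failed run is reported and then ignored. A minimal sketch: the function name is hypothetical, the default command is the AST-only update from "Setup and Commands", and the 600-second timeout is an arbitrary placeholder.

```python
import subprocess
import sys

# AST-only build from "Setup and Commands"; no API key involved.
DEFAULT_CMD = ["uvx", "--from", "graphifyy", "graphify", "update", "."]


def optional_graph_refresh(cmd=DEFAULT_CMD, cwd=".", timeout=600):
    """Try to rebuild the graph; report and swallow any failure so callers are never blocked."""
    try:
        subprocess.run(cmd, cwd=cwd, check=True, timeout=timeout)
        return True
    except (OSError, subprocess.SubprocessError) as exc:
        # OSError covers a missing uvx/graphify; SubprocessError covers non-zero exit and timeout.
        print(f"graph refresh skipped: {exc}", file=sys.stderr)
        return False
```

The caller can log the boolean but should never turn False into a build failure.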

For Clawdie/FreeBSD-specific package findings and the recommended smoke test, see references/graphify-freebsd-clawdie.md.

Validation

sh -n scripts/graphify-refresh.sh
uvx --from graphifyy graphify query "what are the main deployment paths?" --graph graphify-out/graph.json
python3 -m json.tool graphify-out/graph.json >/dev/null
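The json.tool check only proves the file parses. A slightly stronger smoke test also confirms the graph is non-empty. A minimal sketch, assuming node-link JSON with a nodes list and a links or edges list (verify those key names against a real graph.json); the function name is hypothetical.

```python
import json


def summarize_graph(path):
    """Parse graph.json and return (node_count, edge_count); raise if it is empty or malformed."""
    with open(path, encoding="utf-8") as f:
        graph = json.load(f)  # raises json.JSONDecodeError, like `python3 -m json.tool`
    nodes = graph.get("nodes")
    edges = graph.get("links", graph.get("edges"))
    if not isinstance(nodes, list) or not isinstance(edges, list):
        raise ValueError("expected 'nodes' and 'links'/'edges' lists (assumed node-link layout)")
    if not nodes:
        raise ValueError("graph has no nodes; check .graphifyignore and rebuild")
    return len(nodes), len(edges)
```

Calling summarize_graph("graphify-out/graph.json") after a build gives counts comparable to the figures quoted under Pitfalls.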

For generated docs, open or serve graphify-out/graph.html only after confirming it does not expose secrets.
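A crude pre-publication check can scan the generated files for obvious secret markers before graph.html is opened or served. A heuristic sketch only: the patterns are illustrative assumptions, a clean result does not prove the file is safe, and home-directory paths are flagged because leaked paths are one of the concerns named under "When to Use".

```python
import re
from pathlib import Path

# Illustrative patterns only; extend for the secrets your repos actually hold.
SUSPICIOUS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "api key assignment": re.compile(r"\b[A-Z0-9_]*API_KEY\s*[=:]\s*\S+"),
    "home directory path": re.compile(r"/(?:home|Users|usr/home)/[^/\s'\"]+"),
}


def scan_for_secrets(path):
    """Return (label, line_number) hits for suspicious patterns in a generated file."""
    hits = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                hits.append((label, lineno))
    return hits
```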

References

  • references/clawdie-graphify-integration.md — concrete Clawdie-AI/Clawdie-ISO integration plan and boundaries.
  • references/graphify-freebsd-clawdie.md — FreeBSD package-availability findings, smoke-test commands, and Clawdie ISO/runtime boundary.
  • references/clawdie-multi-repo-agents-structure.md — Clawdie's four-repo layout (AI, ISO, Colibri, herdr), AGENTS.md conventions per repo, platform split, and cross-repo update procedures.
  • references/clawdie-cross-repo-graph-sizes.md — concrete node/edge counts from the 2026-05-27 Clawdie cross-repo graph build; merge command and cross-repo visibility limitations.