---
name: codebase-knowledge-graphs
description: "Build/query persistent codebase knowledge graphs for architecture discovery, cross-repo impact analysis, and agent handoffs."
version: 1.1.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, freebsd]
metadata:
  hermes:
    tags:
      [codebase-analysis, knowledge-graphs, architecture, graphify, cross-repo]
    related_skills: [writing-plans, codebase-inspection, requesting-code-review]
---

# Codebase Knowledge Graphs

## Overview

Use this skill when a task benefits from a persistent map of a repository or a set of related repositories: architecture discovery, cross-repo dependency questions, agent onboarding, impact analysis, or long-lived project navigation.

Default tool: **Graphify** (`graphifyy` package, `graphify` CLI). It creates:

```text
graphify-out/graph.json
graphify-out/graph.html
graphify-out/GRAPH_REPORT.md
```

Treat the graph as a navigation aid, not an authority. Before editing code, verify graph answers against source files.

## When to Use

Use for:

- Broad architecture questions: "how does X connect to Y?"
- Multi-agent projects where agents repeatedly lose context.
- Cross-repo flows, e.g. source repo -> build repo -> deployment repo.
- Impact analysis before changing scripts, skills, manifests, deployment contracts, or public interfaces.
- Producing a durable project map for future sessions.

Do not use as the first tool for:

- A tiny local grep/read-file task.
- Security-sensitive repositories where generated graph artifacts would leak secrets or paths.
- Critical runtime/boot paths until dependency and offline-install behavior have been tested.

## Setup and Commands

Prefer `uvx` for one-off runs without permanently installing:

```sh
uvx --from graphifyy graphify --help
```

Build a graph for a project (AST-only, no API key needed):

```sh
cd <project> && uvx --from graphifyy graphify update .
```

This performs AST extraction only: it parses all source files and builds the graph with function/import/define edges. No LLM is required. Semantic extraction (LLM-powered chunk labeling) needs an API key and `graphify extract`; see "Pitfalls" below.

Query an existing graph:

```sh
uvx --from graphifyy graphify query "how does deployment work?" --graph graphify-out/graph.json
uvx --from graphifyy graphify path "hostd" "webroot" --graph graphify-out/graph.json
uvx --from graphifyy graphify explain "iso-publish" --graph graphify-out/graph.json
```

Export an architecture/call-flow page:

```sh
uvx --from graphifyy graphify export callflow-html
```

Merge graphs for cross-repo questions:

```sh
uvx --from graphifyy graphify merge-graphs \
  ../repo-a/graphify-out/graph.json \
  ../repo-b/graphify-out/graph.json \
  --out graphify-out/merged-graph.json
```

## Repository Integration Pattern

Default posture: generated graph artifacts are a temporary map, not the territory. The source code is the durable truth; regenerate graphs locally when needed instead of committing stale JSON.

For a mature repo that repeatedly uses Graphify, consider adding:

```text
.graphifyignore
docs/GRAPHIFY.md
scripts/graphify-refresh.sh
```

Do **not** add a wrapper script before Graphify has been used on at least one real debugging/navigation task in that repo. First use the tool manually, observe what was annoying, then script the recurring parts.

Recommended `.gitignore` additions:

```text
graphify-out/
*.graph.json
```

Commit `graphify-out/graph.json`, `GRAPH_REPORT.md`, `graph.html`, or other generated graph output only with an explicit project decision. Reasons to avoid committing by default:

- Graphs get stale as soon as source changes.
- Generated JSON creates noisy diffs and harder reviews.
- Checked-in graph output looks more authoritative than it is, even though Graphify can produce fake/noisy nodes or guessed edges.

Do not commit local caches, cost/mtime manifests, or generated graph output for Clawdie ISO unless the project explicitly reverses this rule.

**Clawdie-ISO policy (2026-05-23):** Graphify is prohibited entirely in the ISO repo. Do NOT add `.graphifyignore`, `docs/GRAPHIFY.md`, skills, or any graph-related files to `clawdie-iso`. The repo composition (shell scripts + markdown + archived planning docs) makes the graph mislead agents toward retired decisions and a deprecated QML installer. Clawdie-AI allows Linux-local on-demand use but still avoids formal integration. The Colibri and Herdr repos have no graph policy; use at your discretion.

## `.graphifyignore` Guidance

Always exclude:

```text
.git/
tmp/
node_modules/
dist/
build/
.cache/
.env
*.key
*.pem
*.sqlite
*.db
```

For ISO/build repos, also exclude:

```text
*.img
*.img.gz
*.iso
*.sha256
packages/
downloads/
html/
webroot/
```

Include source, docs, skills, scripts, package lists, and manifest schemas.

## Agent Usage Rules

1. If `graphify-out/graph.json` exists and the task is broad, query the graph before deep grep.
2. Use scoped questions and explicit graph paths.
3. Cite graph findings as leads, not facts.
4. Verify relevant source files before proposing or making code changes.
5. Regenerate the graph when the repo has materially changed and graph freshness matters.
6. Keep graph generation out of critical boot/build paths unless tested on the target platform.
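
Rules 1 and 5 can be approximated with a file-timestamp check. The following is a minimal sketch, assuming file modification times are a good-enough staleness signal; the function name `graph_is_fresh` is hypothetical and not a Graphify feature.

```sh
# Hypothetical helper: succeed (0) if graphify-out/graph.json exists in repo $1
# and no other file is newer than it; return 1 if stale, 2 if the graph is missing.
graph_is_fresh() {
    repo=${1:-.}
    graph="$repo/graphify-out/graph.json"
    [ -f "$graph" ] || return 2
    # First file modified after the graph, ignoring .git/ and the graph output.
    newer=$(find "$repo" -type f -newer "$graph" \
        ! -path "$repo/.git/*" ! -path "$repo/graphify-out/*" | head -n 1)
    [ -z "$newer" ]
}
```

An agent could call `graph_is_fresh .` before querying and fall back to `uvx --from graphifyy graphify update .` when it returns non-zero. A "fresh" result still does not make the graph authoritative; rule 4 applies either way.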

## Pitfalls

### Build with `update`, not `extract` (different commands, different requirements)

`graphify update` and `graphify extract` are **different commands** with different requirements:

| Command                      | API key?                     | What it does                                                   | Use case                             |
| ---------------------------- | ---------------------------- | -------------------------------------------------------------- | ------------------------------------ |
| `graphify update .`          | No                           | AST-only: parses source files, builds import/call/define graph | Primary build command; always works  |
| `graphify extract . --out .` | Yes (Gemini/DeepSeek/OpenAI) | AST + semantic: LLM-powered chunk labeling on top of AST graph | Only when semantic labels are needed |

`graphify extract` may not appear in `graphify --help` (version-dependent). Always try `graphify update .` first; it produces a fully usable graph with zero configuration. A 971-file TypeScript repo produced 11,277 nodes and 16,820 edges with `update` alone.

If you accidentally run `extract` without an API key, you'll get:

```text
error: no LLM API key found. Set GEMINI_API_KEY or GOOGLE_API_KEY (gemini), ... or pass --backend.
```

Fix: use `graphify update .` instead.
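
The choice between the two commands can be sketched as a small shell helper. This is an illustration only, not part of Graphify: the function name `graphify_build_args` and the `WANT_SEMANTIC` variable are hypothetical, and the sketch checks only the two key variables named in the error message above (`GEMINI_API_KEY`, `GOOGLE_API_KEY`), since the remaining names are elided there.

```sh
# Hypothetical helper: print the graphify arguments to use for a build.
# Defaults to the AST-only `update .`; picks `extract . --out .` only when
# semantic labels were requested AND one of the two key variables is set.
graphify_build_args() {
    if [ "${WANT_SEMANTIC:-0}" = "1" ] &&
        [ -n "${GEMINI_API_KEY:-}${GOOGLE_API_KEY:-}" ]; then
        echo "extract . --out ."
    else
        echo "update ."
    fi
}

# Show which command would run, without running it.
echo "uvx --from graphifyy graphify $(graphify_build_args)"
```

Defaulting to `update` keeps the zero-configuration path as the normal case, so a missing key degrades to an AST-only graph instead of an error.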

### Create `.graphifyignore` BEFORE the first build

Without `.graphifyignore`, Graphify walks every file in the repo, including build artifacts, caches, and vendored dependencies. On a JS/TS project this means parsing `node_modules/`, `dist/`, `tmp/`, and similar directories, which adds thousands of irrelevant files that bloat the graph and slow extraction.

**Always create `.graphifyignore` before the first `graphify update`.** Use the templates in the "`.graphifyignore` Guidance" section above. A 971-file repo with proper exclusions produces a clean graph; without exclusions it's easily 10x larger with noise nodes.
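
One way to enforce this ordering is a guard that writes the baseline exclusions only when no ignore file exists yet. This is a minimal sketch; the function name `ensure_graphifyignore` is hypothetical, and the entries are copied from the "Always exclude" template above.

```sh
# Hypothetical helper: write a baseline .graphifyignore into directory $1
# (default: current directory) unless one already exists.
ensure_graphifyignore() {
    dir=${1:-.}
    if [ -e "$dir/.graphifyignore" ]; then
        return 0 # never overwrite a hand-tuned ignore file
    fi
    cat >"$dir/.graphifyignore" <<'EOF'
.git/
tmp/
node_modules/
dist/
build/
.cache/
.env
*.key
*.pem
*.sqlite
*.db
EOF
}
```

Typical use would be `ensure_graphifyignore . && uvx --from graphifyy graphify update .`. ISO/build repos would still need the additional entries from the second template.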

### DeepSeek / OpenAI-compatible backends need the `openai` package (semantic extraction only)

When using `graphify extract` with `--backend deepseek` (or any OpenAI-compatible backend), the semantic phase requires the `openai` Python package. `uvx` sandboxes don't include it:

```text
Gemini/Kimi/Ollama/OpenAI-compatible extraction requires the openai package. Run: pip install openai
```

**Fix:** `uvx` runs the tool in an isolated environment that does not see system or venv site-packages, so add the package to that environment with `--with`:

```sh
uvx --from graphifyy --with openai graphify extract . --out . --backend deepseek --model deepseek-chat
```

Or install both packages into a venv and run `graphify` from it:

```sh
python3 -m venv /tmp/graphify-venv
/tmp/graphify-venv/bin/pip install graphifyy openai
/tmp/graphify-venv/bin/graphify extract . --out . --backend deepseek --model deepseek-chat
```

AST-only (`graphify update`) does not need this. Only the LLM-powered semantic chunk labeling in `extract` does.

### Cross-repo AST-only graphs cannot trace data flow across repo boundaries

`graphify update` builds a graph from AST edges: imports, function calls, defines, contains. These edges only exist within a single codebase. Cross-repo data flow (e.g., TypeScript config values in one repo consumed by shell scripts in another) does **not** produce graph edges because the AST parser cannot see across repository boundaries.

`graphify path` will return "No path found" for cross-repo queries. The merged graph still has value (you can query both repos in one graph), but use `graphify explain` on individual nodes and trace the human-documented contracts (AGENTS.md, handoff docs) for cross-repo connections. The graph shows you _what exists where_; it does not replace cross-repo documentation.
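
A merged graph for this kind of side-by-side querying can be rebuilt with a short script. The sketch below uses only the `update` and `merge-graphs` invocations shown in "Setup and Commands"; the `run` and `build_and_merge` names and the `DRY_RUN` variable are hypothetical, and repo paths are assumed to contain no spaces or glob characters.

```sh
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild the AST-only graph in each repo, then merge the results.
# Usage: build_and_merge <repo-dir> <repo-dir> ...
build_and_merge() {
    graphs=""
    for repo in "$@"; do
        (cd "$repo" && run uvx --from graphifyy graphify update .) || return 1
        graphs="$graphs $repo/graphify-out/graph.json"
    done
    # $graphs is intentionally unquoted so it splits into one path per repo.
    run uvx --from graphifyy graphify merge-graphs $graphs \
        --out graphify-out/merged-graph.json
}
```

Running `DRY_RUN=1 build_and_merge ../repo-a ../repo-b` prints the commands for review first. Merging only places both graphs in one file; it does not create the missing cross-repo edges described above.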

## FreeBSD and Offline Caution

Graphify is Python-based and depends on packages such as `tree-sitter-*`, `networkx`, `rapidfuzz`, and related scientific/runtime dependencies. On FreeBSD or offline images:

- Start with Graphify as optional host/developer tooling, not a mandatory runtime dependency.
- Investigate target-platform package availability before proposing runtime/ISO integration.
- Test `uvx --from graphifyy graphify --help` and a small extraction on FreeBSD before adding it to an ISO.
- If offline use is required, cache/test wheels or ports on the target FreeBSD version first.
- Do not block image build, boot, USB flashing, or webroot publishing on graph generation.

For Clawdie/FreeBSD-specific package findings and the recommended smoke test, see `references/graphify-freebsd-clawdie.md`.
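
The "do not block" rule can be expressed as a wrapper that never propagates a failure from the graph step. This is a generic sketch, not the smoke test from the reference file; the `run_optional` name is hypothetical, and `false` stands in for a failing graph refresh.

```sh
# Hypothetical wrapper: run a command, but never propagate its failure.
# A failing step logs a warning to stderr and the caller continues.
run_optional() {
    if "$@"; then
        return 0
    fi
    echo "warning: optional step failed (ignored): $*" >&2
    return 0
}

# Example: `false` stands in for a failing graph refresh.
run_optional false
echo "build continues"
```

In a build script the call might look like `run_optional sh scripts/graphify-refresh.sh`, assuming that script exists in the repo.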

## Validation

```sh
sh -n scripts/graphify-refresh.sh
uvx --from graphifyy graphify query "what are the main deployment paths?" --graph graphify-out/graph.json
python3 -m json.tool graphify-out/graph.json >/dev/null
```

For generated docs, open or serve `graphify-out/graph.html` only after confirming it does not expose secrets.
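
A rough first pass at that confirmation is a pattern scan over the output directory. The sketch below is an assumption-heavy illustration: the function name `graph_output_looks_clean` is hypothetical, the pattern list is neither exhaustive nor tuned (identifiers such as `api_key` in source will match), and a clean result is not proof that nothing sensitive is present.

```sh
# Hypothetical check: return 1 if any file under $1 matches common secret
# markers, 2 if the directory is missing, 0 otherwise.
graph_output_looks_clean() {
    dir=${1:-graphify-out}
    [ -d "$dir" ] || return 2
    # Illustrative patterns only; extend for the project's own secret formats.
    pattern='BEGIN [A-Z ]*PRIVATE KEY|api[_-]?key|password|secret'
    if grep -rEiq "$pattern" "$dir"; then
        echo "possible secrets in $dir; review before opening or serving" >&2
        return 1
    fi
    return 0
}
```

A match is a prompt for manual review of the flagged files, not an automatic verdict in either direction.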

## References

- `references/clawdie-graphify-integration.md`: concrete Clawdie-AI/Clawdie-ISO integration plan and boundaries.
- `references/graphify-freebsd-clawdie.md`: FreeBSD package-availability findings, smoke-test commands, and Clawdie ISO/runtime boundary.
- `references/clawdie-multi-repo-agents-structure.md`: Clawdie's four-repo layout (AI, ISO, Colibri, herdr), AGENTS.md conventions per repo, platform split, and cross-repo update procedures.
- `references/clawdie-cross-repo-graph-sizes.md`: concrete node/edge counts from the 2026-05-27 Clawdie cross-repo graph build; merge command and cross-repo visibility limitations.