# Architecture Overview
**Last Updated:** 16.apr.2026

Clawdie is a self-hosted AI assistant platform running on FreeBSD. It uses Bastille jails for service isolation, PostgreSQL for all data, and a multi-agent control plane for task orchestration.
## High-Level Layout
```
FreeBSD Host (ZFS)
├── Agent Service (runs as AGENT_NAME user, port CONTROLPLANE_API_PORT, default 3100)
│   ├── Telegram bot (message intake)
│   ├── HTTP REST API (control plane + health/metrics)
│   ├── Unified scheduler (task routing, heartbeats, budgets)
│   ├── Control plane runner (spawns pi/aider per task)
│   └── Watchdog (health checks, concurrency control)
├── hostd daemon (root, Unix socket)
│   └── Privileged ops: bastille, zfs, pf
├── PostgreSQL 18 (on host by default; db jail is opt-in via DB_RUNTIME=jail)
│   ├── {agent}_ops — tasks, agents, activity, budgets, approvals
│   ├── {agent}_skills — built-in knowledge (read-only artifact)
│   └── {agent}_memory — user/agent dynamic memory, pgvector embeddings
└── Bastille Jails
    ├── db (.3) — Data Service: PostgreSQL (only when DB_RUNTIME=jail; host is default)
    ├── cms (.4) — Web Service: nginx + Astro static site
    ├── git (.6) — Code Service: bare repos + Forgejo (optional)
    ├── llama-cpp (.5) — Local LLM inference (optional)
    ├── worker (.101) — General worker jail (legacy)
    ├── db-worker (.211) — DB Admin agent jail (Phase 7)
    ├── git-worker (.212) — Git Admin agent jail (Phase 7)
    └── ctrl-worker (.213) — Coordinator agent jail (Phase 7)
```
## Agent System
One agent runs per installation. The agent has a name (`AGENT_NAME`, default: `clawdie`) and runs as a FreeBSD service under the user account of the same name.
### Roles
| Role | Budget | Heartbeat | Purpose |
| ------------ | ------ | --------- | -------------------------------------------- |
| Orchestrator | 80% | On-demand | Primary decision-maker, responds to Telegram |
| Sysadmin | 10% | Daily | System health checks, ZFS, PF, jails |
| DB Admin | 5% | On-demand | PostgreSQL maintenance, migrations |
| Git Admin | 5% | On-demand | Repository management, backups |

Each role has an identity file in `.agent/identities/` that is injected when the agent spawns for that role.
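
The role table can be read as a small registry. The sketch below is a hypothetical TypeScript rendering of it, assuming integer budget percentages and lowercase role slugs for identity file names (the real `{ROLE}` file-name casing may differ). It checks that the split adds up to 100% and derives one role's share of a total budget.

```typescript
// Hypothetical role registry mirroring the Roles table; slugs are assumptions.
type RoleName = "orchestrator" | "sysadmin" | "db-admin" | "git-admin";

interface RoleSpec {
  budgetPercent: number; // integer share of the installation's total budget
  heartbeat: "on-demand" | "daily";
}

const ROLES: Record<RoleName, RoleSpec> = {
  orchestrator: { budgetPercent: 80, heartbeat: "on-demand" },
  sysadmin: { budgetPercent: 10, heartbeat: "daily" },
  "db-admin": { budgetPercent: 5, heartbeat: "on-demand" },
  "git-admin": { budgetPercent: 5, heartbeat: "on-demand" },
};

// Integer percentages keep this check exact (no floating-point drift).
const totalPercent = Object.values(ROLES).reduce((sum, r) => sum + r.budgetPercent, 0);
if (totalPercent !== 100) {
  throw new Error(`role budgets must sum to 100%, got ${totalPercent}%`);
}

// Portion of a total budget (in whatever unit the scheduler tracks) allotted to a role.
function roleBudget(total: number, role: RoleName): number {
  return Math.floor((total * ROLES[role].budgetPercent) / 100);
}

// Assumed layout of `.agent/identities/{ROLE}.md`; real casing may differ.
function identityPath(role: RoleName): string {
  return `.agent/identities/${role}.md`;
}
```

For example, `roleBudget(1000, "sysadmin")` returns `100`. Integer percentages make the sum check exact, which summing fractions like `0.8 + 0.1 + 0.05 + 0.05` in floating point would not guarantee.
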
### Task Flow
```
Telegram message / API request
→ Control plane queues task
→ Scheduler assigns to specialist role
→ Runner spawns pi/aider with role identity + budget + `--no-skills`
→ Agent gets: identity file + skill index + (on FreeBSD) pi extension tools
→ Output captured, activity logged
→ Response routed back to channel
```
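
The "Runner spawns" step can be sketched as building an argument vector before anything is executed. This is a hypothetical illustration: the `Task` shape is invented, and while `--no-skills` comes from the flow above, passing the identity and skill index through `--append-system-prompt` and the task text as a positional argument are assumptions about the pi CLI, not confirmed usage.

```typescript
// Hypothetical task shape; the real schema lives in the {agent}_ops database.
type Role = "orchestrator" | "sysadmin" | "db-admin" | "git-admin";

interface Task {
  id: string;
  role: Role;
  prompt: string;
  budgetRemaining: number; // unit is whatever the scheduler tracks
}

interface SpawnPlan {
  command: string;
  args: string[];
}

// Builds (but does not execute) the runner invocation for one task.
function planRun(task: Task, identity: string, skillIndex: string): SpawnPlan {
  // Budget gate: do not spawn a run for a role whose budget is exhausted.
  if (task.budgetRemaining <= 0) {
    throw new Error(`task ${task.id}: role ${task.role} has no budget left`);
  }
  // --no-skills keeps full skill content out of the prompt; the identity file
  // and compact skill index are appended instead. The --append-system-prompt
  // flag and the positional prompt argument are assumptions.
  const systemPrompt = `${identity}\n\n${skillIndex}`;
  return {
    command: "pi",
    args: ["--no-skills", "--append-system-prompt", systemPrompt, task.prompt],
  };
}
```

The resulting plan could be handed to Node's `child_process.spawn(plan.command, plan.args)`. Passing an argument array rather than a shell string keeps message text from being interpreted by a shell.
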
### Prompt Assembly
| Context            | Source                                                   | Frequency         | Applies to                                     |
| ------------------ | -------------------------------------------------------- | ----------------- | ---------------------------------------------- |
| Identity           | `.agent/identities/{ROLE}.md` + SOUL/USER/IDENTITY files | Per-run           | Both (controlplane + telegram)                 |
| Runtime manifest   | `src/runtime-manifest.ts` (repo/skills/capabilities)     | Fresh per-message | Injected into main prompt                      |
| Skill index        | `agent/library.yaml` → one-line summaries                | Per-run           | Controlplane (pi)                              |
| Profile rules      | `src/pi-profile.ts`                                      | Per-run           | Telegram only                                  |
| System state       | `src/system-state.ts` (live hostd/ZFS/PF)                | Per-run           | Telegram only                                  |
| Pi extension tools | `.pi/extensions/clawdie-harness/`                        | Per-run           | Telegram only (needs loading for controlplane) |
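
One way to read the table is as a per-path filter over context sources. The sketch below is illustrative only: the source names are invented slugs, and putting the runtime manifest on both paths is an assumption, since its row says only "Injected into main prompt".

```typescript
type PromptPath = "controlplane" | "telegram";

interface ContextSource {
  name: string; // hypothetical slug for a table row
  paths: PromptPath[];
}

// Order follows the Prompt Assembly table. Runtime manifest on both paths is assumed.
const CONTEXT_SOURCES: ContextSource[] = [
  { name: "identity", paths: ["controlplane", "telegram"] },
  { name: "runtime-manifest", paths: ["controlplane", "telegram"] },
  { name: "skill-index", paths: ["controlplane"] },
  { name: "profile-rules", paths: ["telegram"] },
  { name: "system-state", paths: ["telegram"] },
  // Tools are registered with the agent rather than concatenated into text;
  // listed here only so contextFor() mirrors the table.
  { name: "pi-extension-tools", paths: ["telegram"] },
];

function contextFor(path: PromptPath): string[] {
  return CONTEXT_SOURCES.filter((s) => s.paths.includes(path)).map((s) => s.name);
}

// Joins whichever rendered text blocks exist for this path, skipping missing ones.
function assemblePrompt(path: PromptPath, blocks: Record<string, string>): string {
  return contextFor(path)
    .map((name) => blocks[name])
    .filter((b): b is string => typeof b === "string" && b.length > 0)
    .join("\n\n");
}
```
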

**Runtime manifest** (`<runtime-manifest>` block):
- Generated fresh from local sources: `.git` config, `agent/library.yaml`, built-in artifact metadata
- Answers: "What repo am I running from? What branch? What skills exist? What specialists can I coordinate?"
- Injected as a compact XML-like block (~50 tokens). It closes the coherence gap in which facts about the agent's own infrastructure were invisible to the model
- See `src/runtime-manifest.ts` for implementation
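
The manifest block could be rendered along these lines. The field layout inside `<runtime-manifest>` is a guess (the real format is defined in `src/runtime-manifest.ts`); the sketch only illustrates a compact, escaped block built from repo, branch, skill, and specialist facts.

```typescript
// Hypothetical shape of the facts the manifest reports.
interface RuntimeManifest {
  repo: string;
  branch: string;
  skills: string[];
  specialists: string[];
}

// Escapes characters that could otherwise close or corrupt the XML-like block.
function esc(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Renders one compact block; the one-fact-per-line layout is an assumption.
function renderManifest(m: RuntimeManifest): string {
  return [
    "<runtime-manifest>",
    `repo: ${esc(m.repo)}`,
    `branch: ${esc(m.branch)}`,
    `skills: ${m.skills.map(esc).join(", ")}`,
    `specialists: ${m.specialists.map(esc).join(", ")}`,
    "</runtime-manifest>",
  ].join("\n");
}
```
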

Skills are injected as a compact index (~200 tokens) instead of their full content (~15,000+ tokens). The full `SKILL.md` for any skill is available on demand through the `skills_search` extension tool.
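
A compact index of this kind can be built by reducing each skill to one line. This is a hypothetical sketch: it assumes entries from `agent/library.yaml` have already been parsed into `{ name, description }` objects, and it uses a rough four-characters-per-token heuristic, not a real tokenizer, to sanity-check the ~200-token target.

```typescript
// Hypothetical parsed entry from agent/library.yaml.
interface SkillEntry {
  name: string;
  description: string;
}

// One line per skill: first line of the description, clipped to maxSummaryChars.
function buildSkillIndex(skills: SkillEntry[], maxSummaryChars = 80): string {
  return skills
    .map((s) => {
      const firstLine = s.description.split("\n")[0].trim();
      const summary =
        firstLine.length > maxSummaryChars
          ? firstLine.slice(0, maxSummaryChars - 1) + "…"
          : firstLine;
      return `- ${s.name}: ${summary}`;
    })
    .join("\n");
}

// Rough heuristic (about 4 characters per token); only for budgeting checks.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```
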
### Jail Isolation (Phase 7)
When `CONTROLPLANE_JAIL_ISOLATION=YES`, specialist agents run inside dedicated thin jails. Each jail gets scoped secrets (DB credentials for db-worker, SSH keys for git-worker) and restricted network access via PF. The feature flag defaults to `NO`.

Jail agents reach hostd **through the controlplane API** (`POST /api/controlplane/hostd`), not via direct Unix socket. The API authenticates the request and proxies to the hostd daemon. This means no socket mount is needed inside jails — only network access to `CONTROLPLANE_HOST_IP:CONTROLPLANE_API_PORT`.
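
From inside a jail, a hostd call then becomes an ordinary HTTP request to the controlplane. The sketch below only builds the request. The `http://` scheme, the `Authorization: Bearer` header, and the `{ op, args }` body are assumptions (the real API message contracts are documented in `doc/CONTROLPLANE-MESSAGE-CONTRACT.md`), and 3100 is used as the fallback port.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical operation payload for the hostd proxy.
interface HostdOp {
  op: string;
  args: Record<string, string>;
}

interface HostdRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildHostdRequest(env: Env, op: HostdOp): HostdRequest {
  const host = env.CONTROLPLANE_HOST_IP;
  const port = env.CONTROLPLANE_API_PORT || "3100";
  const secret = env.CONTROLPLANE_SHARED_SECRET;
  if (!host) throw new Error("CONTROLPLANE_HOST_IP is not set");
  if (!secret) throw new Error("CONTROLPLANE_SHARED_SECRET is not set");
  return {
    url: `http://${host}:${port}/api/controlplane/hostd`,
    init: {
      method: "POST",
      // Header name and scheme are assumptions; check the message contract doc.
      headers: { "content-type": "application/json", authorization: `Bearer ${secret}` },
      body: JSON.stringify(op),
    },
  };
}
```

On Node 18 or later, the request could be sent with `await fetch(req.url, req.init)`. No Unix socket is involved, which matches the point that jails need only network access to the controlplane.
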
## Split-Brain Database
All three databases run on the same PostgreSQL 18 instance, each with its own user and permissions:
| Database | Contents | Write Pattern |
| ---------------- | -------------------------------------------------------- | ---------------------- |
| `{agent}_ops` | Tasks, agents, activity log, budgets, approvals, auth | Frequent writes |
| `{agent}_skills` | Preloaded knowledge chunks with pgvector embeddings | Read-only after import |
| `{agent}_memory` | User facts, agent memories, semantic search via pgvector | Moderate writes |

Multiple agents on the same host share the PostgreSQL instance, but each agent gets its own set of three databases (e.g., `clawdie_ops`, `clawdie_skills`, and `clawdie_memory` alongside `mevy_ops`, `mevy_skills`, and `mevy_memory`).
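
Per-agent database names follow the `{agent}_ops`, `{agent}_skills`, `{agent}_memory` pattern. A minimal sketch, assuming agent names are restricted to lowercase identifiers so they can be used unquoted as PostgreSQL database names (that validation rule is hypothetical, not taken from the project):

```typescript
type DbKind = "ops" | "skills" | "memory";

// Assumed naming rule: lowercase identifiers only, so names need no quoting in SQL.
const AGENT_NAME_PATTERN = /^[a-z_][a-z0-9_]*$/;

function databaseNames(agentName: string): Record<DbKind, string> {
  if (!AGENT_NAME_PATTERN.test(agentName)) {
    throw new Error(`agent name "${agentName}" is not a safe database identifier`);
  }
  return {
    ops: `${agentName}_ops`,
    skills: `${agentName}_skills`,
    memory: `${agentName}_memory`,
  };
}
```

Two agents on one host therefore map to six distinct databases on the same instance.
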
## Configuration
All runtime config comes from `.env` in the project root. Key variables:
| Variable | Purpose | Default |
| ----------------------------- | ------------------------------- | ---------- |
| `AGENT_NAME` | Agent identity | `clawdie` |
| `DB_RUNTIME` | PostgreSQL location | `host` |
| `CONTROLPLANE_JAIL_ISOLATION` | Enable per-specialist jails | `NO` |
| `WARDEN_SUBNET_BASE` | Jail IP subnet | `10.0.0` |
| `CONTROLPLANE_API_PORT`       | API port                        | `3100`     |
| `CONTROLPLANE_SHARED_SECRET`  | API auth for agent subprocesses | (empty)    |
| `CONTROLPLANE_BIND_HOST` | API listen address | `0.0.0.0` |
| `AGENT_MAX_INBOUND_CHARS` | Inbound message cap | `12000` |
| `AGENT_SESSION_MAX_BYTES` | Session rollover threshold | `2000000` |
| `PI_TUI_PROVIDER` | LLM provider | (required) |

Secrets (DB passwords, API keys) are generated by `setup/secrets.ts` and stored in `.env`.
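
A typed loader makes the defaults in the table explicit. This is a sketch, not the project's loader: it takes the environment as a plain record, treats `CONTROLPLANE_JAIL_ISOLATION` as true only for `YES`, rejects non-numeric limits, and reads the port from `CONTROLPLANE_API_PORT` (the name used in the Jail Isolation section).

```typescript
type Env = Record<string, string | undefined>;

interface Config {
  agentName: string;
  dbRuntime: "host" | "jail";
  jailIsolation: boolean;
  wardenSubnetBase: string;
  apiPort: number;
  sharedSecret: string;
  bindHost: string;
  maxInboundChars: number;
  sessionMaxBytes: number;
  piTuiProvider: string;
}

// Unset or empty falls back to the default; anything non-numeric is an error.
function intVar(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  if (!/^\d+$/.test(raw)) throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  return Number(raw);
}

function parseDbRuntime(raw: string | undefined): "host" | "jail" {
  if (raw === undefined || raw === "" || raw === "host") return "host";
  if (raw === "jail") return "jail";
  throw new Error(`DB_RUNTIME must be "host" or "jail", got "${raw}"`);
}

function loadConfig(env: Env): Config {
  const provider = env.PI_TUI_PROVIDER;
  if (!provider) throw new Error("PI_TUI_PROVIDER is required");
  return {
    agentName: env.AGENT_NAME || "clawdie",
    dbRuntime: parseDbRuntime(env.DB_RUNTIME),
    jailIsolation: (env.CONTROLPLANE_JAIL_ISOLATION ?? "NO").toUpperCase() === "YES",
    wardenSubnetBase: env.WARDEN_SUBNET_BASE || "10.0.0",
    apiPort: intVar(env, "CONTROLPLANE_API_PORT", 3100),
    sharedSecret: env.CONTROLPLANE_SHARED_SECRET ?? "",
    bindHost: env.CONTROLPLANE_BIND_HOST || "0.0.0.0",
    maxInboundChars: intVar(env, "AGENT_MAX_INBOUND_CHARS", 12000),
    sessionMaxBytes: intVar(env, "AGENT_SESSION_MAX_BYTES", 2000000),
    piTuiProvider: provider,
  };
}
```

In a Node process, this would typically be called as `loadConfig(process.env)` after `.env` has been loaded. Failing fast on a missing provider or malformed number surfaces configuration mistakes at startup instead of mid-task.
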
## Infrastructure as Code
- `infra/jails.yaml` — Single source of truth for all jail definitions (IPs, packages, services, mounts)
- `setup/bastille-helpers.ts` — Shared provisioner (create, start, install packages, configure services)
- `setup/install.ts` — 20-step install orchestrator with ZFS checkpoints
- `justfile` — CLI front door with 60+ recipes for common operations
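
Jail addresses in the layout are written as host octets (`.3`, `.211`, and so on) under `WARDEN_SUBNET_BASE`. The sketch below shows how a provisioner might expand them and catch duplicate octets. The `JailDef` fields are a hypothetical in-memory form of an `infra/jails.yaml` entry, not its actual schema.

```typescript
// Hypothetical in-memory form of an infra/jails.yaml entry; field names are assumptions.
interface JailDef {
  name: string;
  hostOctet: number; // last octet under WARDEN_SUBNET_BASE
}

// Octets taken from the High-Level Layout tree.
const JAILS: JailDef[] = [
  { name: "db", hostOctet: 3 },
  { name: "cms", hostOctet: 4 },
  { name: "llama-cpp", hostOctet: 5 },
  { name: "git", hostOctet: 6 },
  { name: "worker", hostOctet: 101 },
  { name: "db-worker", hostOctet: 211 },
  { name: "git-worker", hostOctet: 212 },
  { name: "ctrl-worker", hostOctet: 213 },
];

// Expands a three-octet base such as "10.0.0" plus a host octet into an IPv4 address.
function jailIp(subnetBase: string, hostOctet: number): string {
  const parts = subnetBase.split(".");
  if (parts.length !== 3 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`WARDEN_SUBNET_BASE must look like "10.0.0", got "${subnetBase}"`);
  }
  if (!Number.isInteger(hostOctet) || hostOctet < 1 || hostOctet > 254) {
    throw new Error(`host octet out of range: ${hostOctet}`);
  }
  return `${subnetBase}.${hostOctet}`;
}

// Fails fast if two jail definitions claim the same address.
function assertUniqueOctets(jails: JailDef[]): void {
  const seen = new Map<number, string>();
  for (const jail of jails) {
    const previous = seen.get(jail.hostOctet);
    if (previous !== undefined) {
      throw new Error(`jails "${previous}" and "${jail.name}" both use .${jail.hostOctet}`);
    }
    seen.set(jail.hostOctet, jail.name);
  }
}
```
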
## Channels
Messages arrive via Telegram (grammY bot) or the HTTP API. The router dispatches them to the control plane, which queues tasks and assigns them to specialist agents.
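
Both channels end up as the same kind of queued task. The sketch below normalizes an inbound message and enforces the `AGENT_MAX_INBOUND_CHARS` cap (default 12000). Field names are hypothetical, and rejecting oversize messages rather than truncating them is an assumption about how the guardrail behaves.

```typescript
type Channel = "telegram" | "http";

// Hypothetical normalized message; the real router types may differ.
interface InboundMessage {
  channel: Channel;
  replyTo: string; // e.g. Telegram chat id or HTTP request id
  text: string;
}

interface QueuedTask {
  source: Channel;
  replyTo: string;
  prompt: string;
}

// Rejects empty or oversize input before it reaches the control plane queue.
// Note: string length counts UTF-16 code units, not user-visible characters.
function toTask(msg: InboundMessage, maxInboundChars = 12000): QueuedTask {
  const prompt = msg.text.trim();
  if (prompt.length === 0) {
    throw new Error("empty message");
  }
  if (prompt.length > maxInboundChars) {
    throw new Error(`message is ${prompt.length} chars, over the ${maxInboundChars}-char cap`);
  }
  return { source: msg.channel, replyTo: msg.replyTo, prompt };
}
```
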
## Documentation Map
| Topic | File |
| ----------------------------- | ----------------------------------------------- |
| Agent development conventions | `AGENTS.md` |
| Contributing guide | `CONTRIBUTING.md` |
| Control plane architecture | `doc/CONTROLPLANE-ARCHITECTURE.md` |
| Agent roles and skills | `doc/CONTROLPLANE-AGENT-ROLES.md` |
| API message contracts | `doc/CONTROLPLANE-MESSAGE-CONTRACT.md` |
| Multi-LLM provider routing | `doc/MULTI-PROVIDER-ARCHITECTURE.md` |
| Docs localization pipeline | `doc/THREE-BIRD-ARCHITECTURE.md` |
| Harness evolution plan | `docs/internal/AGENT-HARNESS-V2.md` |
| Skills architecture | `docs/internal/nanoclaw-architecture-final.md` |
| Install guide | `docs/public/install/install.md` |
| Deployment models | `docs/public/architecture/deployment-models.md` |
| Disaster recovery | `docs/public/operate/db-disaster-recovery.md` |