Rewrite stale PostgreSQL specialist guidance

---
Build: pass | Tests: pass — 2162 passed (630 files)

Operator & Codex 2026-05-05 20:59:53 +02:00
parent 93d35ad95f
commit ae8a4b4e9e
3 changed files with 383 additions and 347 deletions

@ -1,283 +1,282 @@
---
name: debug
description: Debug Clawdie agent issues on FreeBSD. Use when the agent is not responding, service is crashing, Telegram messages go unanswered, pi subprocess fails, or to understand how the runtime works. Covers service management, logs, jail state, memory DB, and common failure modes.
description: Debug Clawdie agent issues on the current FreeBSD host runtime. Use when the agent is not responding, the service is crashing, Telegram messages go unanswered, pi subprocesses fail, or when verifying the live runtime shape. Covers host service management, logs, optional jail state, memory DB reachability, and common failure modes.
---
# Clawdie Agent Debugging (FreeBSD / Bastille)
# Clawdie Agent Debugging (FreeBSD host runtime)
All commands run on the **host (osa)** unless noted otherwise.
All commands run on the host unless explicitly noted otherwise.
## Architecture Recap
## Scope and source of truth
```
osa (host)
└── clawdie-controlplane jail (10.0.1.2)
├── rc.d/clawdie — daemon(8) supervises the agent
├── run-clawdie.sh — wrapper: creates tmux session, execs node
├── dist/index.js — main process (Telegram + scheduler + watchdog)
└── /opt/npm/bin/pi — spawned per-message for LLM responses
└── ZAI/GLM-5.1 via OpenRouter
Use this skill for live runtime diagnosis. Do not use it as the canonical
architecture explainer. For current runtime defaults, prefer:
db jail (10.0.1.3)
└── PostgreSQL: clawdie_brain — long-term memory (81 memories, 2862 chunks)
```
- `src/config.ts`
- `setup/db.ts`
- `docs/internal/POSTGRES-MEMORY.md`
- `just doctor`
## Log Locations
Do not assume any of the following unless current config proves it:
| Log | Path (host = jail nullfs share) | Content |
|-----|----------------------------------|---------|
| **Main app** | `logs/clawdie.log` | Startup, Telegram events, scheduler, watchdog |
| **Main errors** | `logs/clawdie.error.log` | Uncaught exceptions, pino fatal |
| **Per-run pi** | `groups/{folder}/logs/agent-{runId}.log` | Full pi subprocess output |
| **Heartbeat** | `logs/heartbeat.log` | LLM reachability probes |
| **Embed** | `logs/embed-docs.log` | Knowledge embedding runs |
- a `clawdie-controlplane` jail
- `10.0.1.2` / `10.0.1.3`
- `clawdie_brain`
- a hardcoded pi path like `/opt/npm/bin/pi`
## 1. Check Service Status
## Current runtime recap
Default runtime today:
- main service: host rc.d service `clawdie`
- privileged sidecar: host rc.d service `clawdie_hostd`
- main process: `dist/index.js`
- main logs: `logs/clawdie.log`, `logs/clawdie.error.log`
- per-run pi logs: `groups/{folder}/logs/agent-*.log`
- default database mode: `DB_RUNTIME=host`
- default DB host for jails: `${SUBNET_BASE}.1`
Optional / install-specific pieces:
- `DB_RUNTIME=jail` uses the `db` jail role from `infra/jails.yaml`
- current repo jail registry defaults to `10.0.1.0/24`, with `db` on `.5`
- `PI_TUI_BIN` may override the pi path; otherwise use `command -v pi`
## Log locations
| Log | Path | Content |
|-----|------|---------|
| Main app | `logs/clawdie.log` | Startup, Telegram events, scheduler, watchdog |
| Main errors | `logs/clawdie.error.log` | Uncaught exceptions and fatal errors |
| Per-run pi | `groups/{folder}/logs/agent-{runId}.log` | Full pi subprocess output |
| Heartbeat | `logs/heartbeat.log` | Controlplane checks and LLM reachability |
| Embed | `logs/embed-docs.log` | Knowledge embedding runs |
## 1. Check service status
```bash
sudo service clawdie status
sudo service clawdie_hostd status
pgrep -laf 'node.*/dist/index.js'
```
If the install explicitly uses `DB_RUNTIME=jail`, check that jail separately:
```bash
# Is the jail running?
sudo bastille list
# Is the service up inside the jail?
sudo bastille cmd clawdie-controlplane service clawdie status
# Quick process check
sudo bastille cmd clawdie-controlplane pgrep -la node
```
## 2. Read Logs
## 2. Read logs
```bash
# Live tail (works from host — nullfs share)
tail -f logs/clawdie.log
# Last 50 lines of errors
tail -50 logs/clawdie.error.log
# Find per-run pi logs for a specific group
ls groups/*/logs/agent-*.log | tail -5
tail -100 groups/Samo/logs/agent-$(ls -t groups/Samo/logs/ | head -1)
```
## 3. Restart / Stop / Start
When a specific run failed, read the newest per-run log directly:
```bash
# Restart inside jail
sudo bastille cmd clawdie-controlplane service clawdie restart
# Stop (daemon(8) will NOT restart while stopped via service)
sudo bastille cmd clawdie-controlplane service clawdie stop
# Start
sudo bastille cmd clawdie-controlplane service clawdie start
# One-shot start without changing autostart config
sudo bastille cmd clawdie-controlplane service clawdie onestart
ls -t groups/*/logs/agent-*.log | head -3
tail -100 groups/main/logs/agent-<runId>.log
```
## 4. Common Failure Modes
## 3. Restart / stop / start
```bash
sudo service clawdie restart
sudo service clawdie stop
sudo service clawdie start
sudo service clawdie onestart
```
If hostd itself is suspect:
```bash
sudo service clawdie_hostd restart
```
## 4. Common failure modes
### Agent not responding to Telegram
1. Check service is running (step 1)
2. Check Telegram bot token is valid:
```bash
grep TELEGRAM_BOT_TOKEN .env
```
3. Look for grammy errors in main log:
```bash
grep -i 'telegram\|grammy\|409\|403' logs/clawdie.log | tail -20
```
4. **409 Conflict** means two agent instances are running simultaneously — kill the older one:
```bash
sudo bastille cmd clawdie-controlplane pgrep -la node
sudo bastille cmd clawdie-controlplane pkill -f 'node.*index.js'
sudo bastille cmd clawdie-controlplane service clawdie start
```
1. Check the main service is running.
2. Check the bot token exists:
### pi subprocess fails / no response
Check the per-run log:
```bash
# Find most recent run
ls -t groups/*/logs/agent-*.log | head -3
grep '^TELEGRAM_BOT_TOKEN=' .env
```
# Read it
tail -100 groups/Samo/logs/agent-<runId>.log
3. Look for Telegram / Grammy errors:
```bash
grep -i 'telegram\|grammy\|409\|403' logs/clawdie.log | tail -20
```
`409 Conflict` usually means two instances are long-polling at once. Check for
duplicate Node processes before restarting.
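A minimal sketch of that duplicate check, assuming the main process still matches the `node.*/dist/index.js` pattern from step 1. The helper name `classify_instances` is hypothetical and not part of the repo.

```bash
#!/bin/sh
# Hypothetical helper: turn a process count into a verdict.
classify_instances() {
  count="$1"
  if [ "$count" -gt 1 ]; then
    echo "duplicate"      # more than one long-poller: likely 409 source
  elif [ "$count" -eq 1 ]; then
    echo "ok"
  else
    echo "not-running"
  fi
}
```

One way to feed it is `classify_instances "$(pgrep -f 'node.*/dist/index.js' | wc -l | tr -d ' ')"`. A `duplicate` verdict means the stale process should be stopped before the service is restarted.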
### pi subprocess fails or produces no answer
Check the newest per-run log first:
```bash
ls -t groups/*/logs/agent-*.log | head -3
tail -100 groups/main/logs/agent-<runId>.log
```
Then verify the configured pi path rather than assuming one:
```bash
grep '^PI_TUI_BIN=' .env
command -v pi
pi --version
```
Common causes:
- `ZAI_API_KEY` missing or expired — check `.env`
- `PI_TUI_BIN` wrong path — should be `/opt/npm/bin/pi`:
```bash
sudo bastille cmd clawdie-controlplane ls /opt/npm/bin/pi
sudo bastille cmd clawdie-controlplane /opt/npm/bin/pi --version
```
- pi is crashed / OOM — check:
```bash
sudo bastille cmd clawdie-controlplane dmesg | grep -i 'kill\|oom' | tail -5
```
- provider key missing or expired
- `PI_TUI_BIN` points at a dead path
- pi process died under memory pressure
For memory pressure signals:
```bash
dmesg | grep -i 'kill\|oom' | tail -10
```
### Memory DB unreachable
Check the live DB mode and target first:
```bash
# Can the jail reach the db jail?
sudo bastille cmd clawdie-controlplane pg_isready -h 10.0.1.3 -U clawdie_brain
# Quick row count
sudo bastille cmd db psql -U postgres -d clawdie_brain \
-c 'SELECT COUNT(*) FROM memories;'
# Check MEMORY_DB_URL in .env
grep MEMORY_DB_URL .env
grep -E '^(DB_RUNTIME|DB_HOST|MEMORY_DB_URL)=' .env
node -e "import('./dist/config.js').then((m) => console.log(JSON.stringify({ DB_RUNTIME: m.DB_RUNTIME, DB_HOST: m.DB_HOST, MEMORY_DB_NAME: m.MEMORY_DB_NAME }, null, 2)))"
```
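Because the default database mode is `DB_RUNTIME=host`, an empty `grep` result for that key is not necessarily an error. The sketch below reads a key with a fallback. `env_get` is a hypothetical helper, it assumes the last duplicate line wins, and it does not strip quotes, so the compiled config remains the authority for resolved values.

```bash
#!/bin/sh
# Hypothetical helper: print KEY's value from an env file, or a default.
# Usage: env_get KEY DEFAULT [FILE]
env_get() {
  key="$1"; default="$2"; file="${3:-.env}"
  # Take the last KEY=value line and keep everything after the first '='.
  value="$(grep "^${key}=" "$file" 2>/dev/null | tail -1 | cut -d= -f2-)"
  # Fall back when the key is missing or set to an empty string.
  echo "${value:-$default}"
}
```

For example, `env_get DB_RUNTIME host` prints `host` on an install that never set the key.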
### Service keeps crashing (restart loop)
Probe the resolved host:
```bash
pg_isready -h "$(node -e 'import("./dist/config.js").then((m) => process.stdout.write(m.DB_HOST))')" -p 5432
```
If `DB_RUNTIME=jail`, verify the optional db jail exists and use its actual
jail name from `bastille list`:
```bash
sudo bastille list | grep db
sudo bastille cmd <db-jail-name> service postgresql status
```
### Service keeps crashing
Check for rapid restart loops in the log:
`daemon(8)` with `-r` will restart immediately on exit. If you see rapid log entries:
```bash
# Check exit reason
grep -E 'fatal|error|exit|SIGTERM|SIGKILL' logs/clawdie.log | tail -20
# Check if another process holds the Telegram long-poll
grep '409\|duplicate\|conflict' logs/clawdie.log | tail -5
grep '409\|duplicate\|conflict' logs/clawdie.log | tail -10
```
To break the loop temporarily:
```bash
sudo bastille cmd clawdie-controlplane service clawdie stop
# fix the issue
sudo bastille cmd clawdie-controlplane service clawdie start
```
Temporarily stop the service to break the loop, fix the root cause, then
restart.
### Watchdog killing the agent
### Watchdog or controlplane resets the agent
Watchdog auto-restarts the agent if it becomes unresponsive. Check:
```bash
grep -i watchdog logs/clawdie.log | tail -10
grep -i controlplane logs/heartbeat.log | tail -20
```
Watchdog socket: `tmp/ipc/<agent>-watchdog.sock` (inside jail, from project root)
### Build broken after code change
### Build broken after a code change
```bash
cd /home/clawdie/clawdie-ai
npm run build 2>&1 | tail -20
# or typecheck only
npm run typecheck
npm run build
just doctor
```
If the service needs to pick up the new build:
Restart the service after a successful build if the running process needs to
pick up new code.
## 5. Quick diagnostic
Prefer the built-in check first:
```bash
sudo bastille cmd clawdie-controlplane service clawdie restart
just doctor
```
## 5. Quick Diagnostic
Run from `/home/clawdie/clawdie-ai`:
If you need a manual snapshot:
```bash
echo "=== Clawdie Diagnostic ==="
echo -e "\n1. Jail running?"
sudo bastille list | grep clawdie-controlplane
echo -e "\n2. Service status?"
sudo bastille cmd clawdie-controlplane service clawdie status 2>&1
echo -e "\n3. Bot token set?"
grep -q 'TELEGRAM_BOT_TOKEN=.' .env && echo "OK" || echo "MISSING"
echo -e "\n4. ZAI key set?"
grep -q 'ZAI_API_KEY=.' .env && echo "OK" || echo "MISSING"
echo -e "\n5. pi binary present?"
sudo bastille cmd clawdie-controlplane ls /opt/npm/bin/pi 2>/dev/null && echo "OK" || echo "MISSING"
echo -e "\n6. DB reachable?"
sudo bastille cmd clawdie-controlplane pg_isready -h 10.0.1.3 -U clawdie_brain 2>&1
echo -e "\n7. Recent errors?"
tail -5 logs/clawdie.error.log 2>/dev/null || echo "No error log"
echo -e "\n8. Last 5 log lines?"
tail -5 logs/clawdie.log 2>/dev/null || echo "No log yet"
sudo service clawdie status
sudo service clawdie_hostd status
grep -E '^(DB_RUNTIME|DB_HOST|TELEGRAM_BOT_TOKEN|PI_TUI_BIN)=' .env
tail -10 logs/clawdie.log
tail -10 logs/clawdie.error.log
```
## 6. Jail Shell Access
## 6. Optional jail shell access
For interactive debugging inside the jail:
Only for installs that actually use jails for the thing you are debugging:
```bash
# Open a shell inside the jail
sudo bastille console clawdie-controlplane
# Or run a single command
sudo bastille cmd clawdie-controlplane <command>
# Attach to the agent's tmux session (if running)
sudo bastille cmd clawdie-controlplane tmux attach -t clawdie
sudo bastille console db
sudo bastille cmd db service postgresql status
```
## 7. Enable Debug Logging
Do not assume a controlplane jail exists.
Set `LOG_LEVEL=debug` in `.env` then restart:
## 7. Enable debug logging
Set `LOG_LEVEL=debug` in `.env`, then restart:
```bash
grep -v '^LOG_LEVEL=' .env > .env.tmp && echo 'LOG_LEVEL=debug' >> .env.tmp && mv .env.tmp .env
sudo bastille cmd clawdie-controlplane service clawdie restart
sudo service clawdie restart
tail -f logs/clawdie.log
```
Debug level adds: full Telegram message routing, pi spawn args, IPC watcher events, scheduler ticks.
## 8. Metrics
The agent exposes a Prometheus metrics endpoint on port 9100 inside the jail:
The metrics endpoint is exposed by the main runtime:
```bash
curl -s http://10.0.1.2:9100/metrics | head -20
curl -s http://127.0.0.1:9100/metrics | head -20
curl -s http://127.0.0.1:9100/healthz
```
There is also a lightweight liveness probe:
Use `/healthz` only as a listener check. Use `just doctor` for actual runtime
diagnosis.
## 9. Autostart configuration
```bash
curl -s http://10.0.1.2:9100/healthz
sudo sysrc clawdie_enable
sudo sysrc clawdie_enable=AUTO
sudo sysrc clawdie_enable=YES
sudo sysrc clawdie_enable=NONE
```
Use `/healthz` only to confirm the metrics HTTP listener is alive. Use
`just doctor` for real runtime diagnosis.
## 9. Autostart Configuration
Hostd has its own rcvar:
```bash
# Check current setting
sudo bastille cmd clawdie-controlplane sysrc clawdie_enable
# Smart start on boot (recommended)
sudo bastille cmd clawdie-controlplane sysrc clawdie_enable=AUTO
# Always start at boot
sudo bastille cmd clawdie-controlplane sysrc clawdie_enable=YES
# Disable autostart
sudo bastille cmd clawdie-controlplane sysrc clawdie_enable=NONE
sudo sysrc clawdie_hostd_enable
```
## 10. Session State
## 10. Session state
Per-group agent state lives in `groups/{folder}/`:
Per-group state lives in `groups/{folder}/`:
```bash
ls groups/
ls groups/Samo/logs/ # pi run logs
ls groups/Samo/ipc/ # IPC files (if any)
ls groups/main/logs/
ls groups/main/ipc/ 2>/dev/null
```
To clear a group's run logs (harmless, they're just debug output):
Run logs are disposable debugging artifacts:
```bash
rm -f groups/Samo/logs/agent-*.log
rm -f groups/main/logs/agent-*.log
```
Sessions are managed by pi internally — do not delete `groups/` while the agent is running.
Do not delete active session files or the whole `groups/` tree while the agent
is running.

@ -1,78 +1,88 @@
---
name: postgres-memory
description: Install and validate a local PostgreSQL 18 memory database with pgvector for Clawdie in a dedicated FreeBSD jail. Use when creating the `db` jail, installing PostgreSQL 18, enabling pgvector and required contrib extensions, validating service startup, or planning local long-term memory storage on ZFS-backed FreeBSD infrastructure.
description: Plan or validate the optional PostgreSQL 18 data-service jail for Clawdie. Use only when an install explicitly chooses `DB_RUNTIME=jail`; the default runtime is host PostgreSQL.
---
# postgres-memory
Use this skill for the first local memory database bring-up.
Use this skill only for the optional `DB_RUNTIME=jail` path.
This skill is intentionally narrow:
Do not use it as the general explanation for "how memory works" or "how
PostgreSQL is installed" on a normal host. The current default is
`DB_RUNTIME=host`.
- jail name: `db`
- hostname: `db.<agent>.home.arpa`
- runtime: FreeBSD jail
- provisioning: `thick`
- networking: `vnet`
- PostgreSQL version: `18`
- purpose: split-brain PostgreSQL backend
- `pgvector`: part of the standard proven build
## Current truth
- default runtime: host PostgreSQL
- optional runtime: `DB_RUNTIME=jail`
- jail role label: `db`
- on-disk jail name: current service-prefixed db jail (for example `clawdie-db`)
- current jail registry default:
- subnet base: `10.0.1`
- db IP suffix: `.5`
- bridge: `warden0`
- required packages/extensions:
- PostgreSQL `18`
- `pgvector`
- `pgcrypto`
- `uuid-ossp`
Do not reuse the old `.3`, `10.0.0.x`, or `10.0.1.3` examples.
Resolve the live jail IP from current registry/config instead of hardcoding it.
## Scope
This skill covers:
- install planning for the `db` jail
- PostgreSQL-specific ZFS assumptions
- install-time profile choices
- restore-from-dump planning
- PostgreSQL 18 package install
- service initialization and startup
- first validation commands
- enabling `pgcrypto`, `uuid-ossp`, and `vector`
- optional `db` jail planning and bring-up
- PostgreSQL-specific jail assumptions
- package install and service initialization
- extension enablement
- validation commands for the optional jail path
- ZFS snapshot points before schema work
- the repeatable `db-memory-bootstrap.yaml` Ansible path
This skill does not cover:
- the default host PostgreSQL path
- general memory architecture explanation
- memory schema design
- embedding model selection
- Supabase compatibility layers
- migration of old v2 shell scripts
## Install-time choices
## Naming model
Treat these as explicit deployment questions:
The PostgreSQL instance may host both shared platform DBs and tenant-derived
DBs.
1. resource profile
- `minimal` = `1G RAM / 10G / 1 vCPU`
- `balanced` = `2G RAM / 15G / 1 vCPU`
Shared platform defaults:
2. restore mode
- `empty`
- `restore-from-dump`
- `system_brain`
- `system_skills`
- `system_ops`
- `system_git`
- `system_web`
- read-only skills role: `system_reader`
Recommended default for first install:
- resource profile: `minimal`
- restore mode: `empty`
Tenant names are derived from `dbSlug(tenantId)` in `src/db-identifiers.ts`.
Do not hardcode tenant DB names in this skill; use the helpers.
## Workflow
1. Ensure the `db` jail exists and starts cleanly.
2. Validate the ZFS assumptions before installing PostgreSQL.
3. Install PostgreSQL 18 inside the jail.
4. Initialize the database cluster.
5. Start the PostgreSQL service.
6. Validate local connectivity.
7. Install `postgresql18-contrib` and enable required extensions.
8. Snapshot the jail dataset before schema work.
1. Confirm the install explicitly chose `DB_RUNTIME=jail`.
2. Resolve the `db` jail IP from current registry/config.
3. Ensure the `db` jail exists and starts cleanly.
4. Validate ZFS assumptions before installing PostgreSQL.
5. Install PostgreSQL 18 inside the jail.
6. Initialize the database cluster.
7. Start the PostgreSQL service.
8. Validate local connectivity.
9. Install `postgresql18-contrib` and enable required extensions.
10. Snapshot the jail dataset before schema work.
## Restore path
Database restore should be designed into deployment from the start.
Target future flow:
Design restore into the optional jail deployment from the start:
1. create jail
2. install PostgreSQL
@ -81,6 +91,21 @@ Target future flow:
5. validate
6. snapshot
## Automation note
There is no current canonical Ansible playbook for this path.
The live automation source of truth is:
- `setup/db.ts`
- `setup/jail-provision.ts`
- `src/jail-schema.ts`
- `infra/jails.yaml`
If this optional jail path becomes important again, a future playbook should be
derived from those files rather than from older `db-memory-bootstrap.yaml`
references.
## Read next
- Install commands: `references/install.md`
@ -89,22 +114,6 @@ Target future flow:
- Failure signatures: `references/troubleshooting.md`
- Security and access control: `references/security.md`
## Ansible handoff
When the manual PostgreSQL 18 bring-up has been proven and should become
repeatable, hand off to:
- `infra/ansible/playbooks/db-memory-bootstrap.yaml`
That playbook should be treated as the canonical automation path for:
- PostgreSQL 18 package install
- `postgresql18-contrib` for `pgcrypto` and `uuid-ossp`
- `postgresql18-pgvector` for `vector`
- split-brain role creation from `SKILLS_DB_USER`, `MEMORY_DB_USER`, and `strapi_cms`
- `data` directory config updates
- first validation after restart
## Scripts
- `scripts/render_install_commands.sh`

@ -1,91 +1,80 @@
# PostgreSQL Memory Plan
This document defines the PostgreSQL memory database architecture for Clawdie.
This document defines the PostgreSQL-backed memory and platform database plan
for Clawdie.
## Decision
Default: dedicated FreeBSD jail named `${AGENT_NAME}-db`.
Optional: host-based PostgreSQL when `DB_RUNTIME=host` is set in `.env`.
Default: host PostgreSQL with `DB_RUNTIME=host`.
Optional: dedicated FreeBSD jail for the Data Service with `DB_RUNTIME=jail`.
Both paths run:
- PostgreSQL `18`
- `pgvector`
- `pgcrypto` and `uuid-ossp`
- `pgcrypto`
- `uuid-ossp`
The database is **mandatory**. Clawdie will not start without a healthy connection to the memory database.
The database is mandatory. Clawdie does not run without a healthy PostgreSQL
backend for memory and operational state.
Canonical jail identity (DB_RUNTIME=jail):
## Current runtime defaults
- jail: `${AGENT_NAME}-db` (e.g. `clawdie-db`)
- hostname: `db.${AGENT_INTERNAL_DOMAIN}` (e.g. `db.clawdie.home.arpa`)
- provisioning: `thick`
- networking: `vnet`
- IP: `${SUBNET_BASE}.3` (e.g. `10.0.0.3`)
Host runtime (`DB_RUNTIME=host`) is the current default:
This is the preferred local memory backend over trying to reproduce a full local Supabase stack immediately.
- jails reach PostgreSQL on `${SUBNET_BASE}.1`
- `DB_HOST` resolves to the host address unless explicitly overridden
- ZFS datasets:
- `zroot/${ZFS_PREFIX}/pgdata` → `/var/db/postgres/data`
- `zroot/${ZFS_PREFIX}/pgwal` → `/var/db/postgres/wal`
Host runtime (DB_RUNTIME=host) uses ZFS datasets:
Optional jail runtime (`DB_RUNTIME=jail`) uses the current jail registry:
- `zroot/${ZFS_PREFIX}/pgdata` → `/var/db/postgres/data`
- `zroot/${ZFS_PREFIX}/pgwal` → `/var/db/postgres/wal`
- role label: `db`
- on-disk jail name: current service-prefixed db jail (for example `clawdie-db`)
- hostname: `db.${AGENT_INTERNAL_DOMAIN}`
- current repo registry default:
- subnet base: `10.0.1`
- gateway: `10.0.1.1`
- db IP suffix: `.5`
- bridge: `warden0`
Use `DB_COMPRESSION=lz4` (default) or `DB_COMPRESSION=zstd` to tune dataset compression.
Do not hardcode old `.3` or `10.0.0.x` examples. Resolve the live jail address
from `infra/jails.yaml` plus env overrides.
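As a rough illustration of composing the address instead of hardcoding it, the sketch below joins a subnet base and a host suffix. `resolve_db_addr` is a hypothetical helper. How the two values are read from `infra/jails.yaml` and the environment is left to the real setup code.

```sh
#!/bin/sh
# Hypothetical helper: join a subnet base and host suffix into an address.
# Refuses to guess: both parts must be supplied by the caller.
resolve_db_addr() {
  base="$1"
  suffix="${2#.}"          # accept ".5" as well as "5"
  if [ -z "$base" ] || [ -z "$suffix" ]; then
    echo "resolve_db_addr: need subnet base and suffix" >&2
    return 1
  fi
  echo "${base}.${suffix}"
}
```

With the current registry default, `resolve_db_addr 10.0.1 .5` prints `10.0.1.5`.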
## Why
## Why host PostgreSQL is the default
- native FreeBSD packages
- good fit for ZFS-backed jails
- one database can hold relational memory data and vectors
- lower operational complexity than a Linux VM or multiple specialized services
- lower operational complexity than a separate db jail on every install
- simpler upgrade and restart path
- fewer moving parts during onboarding and recovery
- still compatible with ZFS-backed storage and PostgreSQL extensions
## Thick vs Thin
The optional jail path still exists for installs that explicitly want DB
isolation or a service-jail layout.
Use a thick jail for `db`.
## Thick vs thin
If the optional db jail is used, keep it thick.
Reason:
- database is a persistent service, not an ephemeral worker
- upgrades and rollback should be self-contained
- host coupling should be minimized
- ZFS snapshots are more useful when the jail shape is stable
- database state is persistent, not ephemeral
- rollback and snapshots should remain self-contained
- database upgrades should not depend on a thin base being in lockstep
## Initial scope
First installation milestone:
1. create or prepare the `db` jail
1. provision host PostgreSQL or the optional `db` jail
2. install PostgreSQL 18
3. enable `allow.sysvipc` for the jail
4. initialize and start the service
5. install `postgresql18-contrib`
6. enable `pgcrypto`, `uuid-ossp`, and `vector`
7. validate local access
8. snapshot
3. initialize and start the service
4. install `postgresql18-contrib`
5. enable `pgcrypto`, `uuid-ossp`, and `vector`
6. validate local access
7. snapshot before schema work
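The extension step can be sketched as a small generator of idempotent SQL. This assumes the three extension names listed above. The helper name `extension_sql` is hypothetical, and the target database and superuser are install-specific.

```sh
#!/bin/sh
# Print idempotent CREATE EXTENSION statements for the required extensions.
# Names are double-quoted because "uuid-ossp" contains a hyphen.
extension_sql() {
  for ext in pgcrypto uuid-ossp vector; do
    printf 'CREATE EXTENSION IF NOT EXISTS "%s";\n' "$ext"
  done
}
```

The output can be piped to `psql` against the database that needs the extensions. `IF NOT EXISTS` makes a rerun harmless.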
## Deployment path
Fixed defaults — no install-time questions. The db jail is auto-created by `npm run setup -- --step db`.
Current deployment path:
- network: `vnet`
- IP: `${SUBNET_BASE}.3` (e.g. `10.0.0.3`)
- bridge: `warden0`
- gateway: `${SUBNET_BASE}.1` (e.g. `10.0.0.1`)
Canonical create command for that path:
```sh
sudo bastille create -T -B -g 10.0.0.1 clawdie-db 15.0-RELEASE 10.0.0.3/24 warden0
```
If a VNET `db` jail comes up without a `default` route, treat that as a provisioning defect:
- the create path is missing the explicit `-g 10.0.0.1` flag
- fix the create command rather than adding the route manually and forgetting the root cause
Another required jail-side prerequisite discovered during real bring-up:
If the optional db jail path is used, also enable:
```sh
sudo bastille config db set allow.sysvipc 1
@ -94,20 +83,44 @@ sudo bastille restart db
Without that, `service postgresql initdb` can fail with shared-memory errors.
## Deployment path
Current canonical automation lives in:
- `setup/db.ts`
- `setup/jail-provision.ts`
- `src/jail-schema.ts`
- `infra/jails.yaml`
The default path is host PostgreSQL.
If `DB_RUNTIME=jail` is chosen, the current repo-default jail create shape is:
- network: `vnet`
- IP: `10.0.1.5`
- bridge: `warden0`
- gateway: `10.0.1.1`
Canonical example for the current registry default:
```sh
sudo bastille create -T -B -g 10.0.1.1 clawdie-db 15.0-RELEASE 10.0.1.5/24 warden0
```
Treat missing default route or bad addressing as a provisioning defect. Fix the
registry/create path rather than layering manual one-off network fixes.
## Restore path
The deployment design should support restore from the start.
The deployment design should support restore from the start:
Target future flow:
1. create `db`
1. provision host PostgreSQL or create `db`
2. install PostgreSQL 18
3. enable `allow.sysvipc`
4. initialize cluster
5. enable extensions
6. optionally restore from `.sql` or PostgreSQL custom dump
7. validate
8. snapshot
3. initialize cluster
4. enable extensions
5. optionally restore from `.sql` or PostgreSQL custom dump
6. validate
7. snapshot
## Baseline resources
@ -117,7 +130,7 @@ Target future flow:
## Snapshot points
- `@fresh`
- `@postgres17-ready`
- `@postgres18-ready`
- `@pre-schema`
- `@post-extensions`
@ -128,48 +141,63 @@ Two different decisions matter here:
1. `ashift`
2. dataset properties
`ashift` is a pool/vdev decision and cannot be changed later. For modern 4 KiB devices, the expected value is usually `12`.
`ashift` is a pool/vdev decision and cannot be changed later. For modern
4 KiB devices, the expected value is usually `12`.
Dataset-level settings can still be tuned later. Conservative starting settings for PostgreSQL data are:
Conservative starting settings for PostgreSQL data:
- `compression=lz4`
- `atime=off`
- `recordsize=16K`
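A dry-run sketch of applying those dataset settings, which prints the command rather than running it. `pgdata_zfs_cmd` is a hypothetical helper, and the dataset name passed in is an assumption to replace with the real pgdata dataset.

```sh
#!/bin/sh
# Print (do not run) the zfs command for the conservative pgdata settings.
# Usage: pgdata_zfs_cmd DATASET
pgdata_zfs_cmd() {
  dataset="$1"
  echo "zfs set compression=lz4 atime=off recordsize=16K ${dataset}"
}
```

Review the printed command, then run it with `sudo`. A changed `recordsize` applies only to blocks written afterwards, so it is best set before the cluster is initialized.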
## Split-brain architecture
## Database families and naming model
Three databases in one jail — all mandatory:
PostgreSQL hosts both shared platform DBs and tenant-derived DBs.
| Database | Role | Lifecycle |
| ------------------- | ------------------------------------------------------------ | ------------------------------------------------ |
| Agent System Skills | Preloaded read-only skills, install docs, operator workflows | Updated by pulling a new versioned release |
| User/Agent Memory | Dynamic conversation memory, user preferences, agent context | Grows with use; follows its own backup lifecycle |
| Operational State | Messages, tasks, sessions, routing, registered groups | High-frequency read/write from message router |
Shared platform DBs use the `system_*` prefix:
## Current role split
- `system_brain`
- `system_skills`
- `system_ops`
- `system_git`
- `system_web`
- PostgreSQL (Agent System Skills): preloaded read-only knowledge, chunked + embedded before release
- PostgreSQL (User/Agent Memory): long-term memory backend with hybrid search (full-text + vector)
- PostgreSQL (Operational State): real-time message routing, task scheduling, session tracking
Shared roles include:
## Databases
- `system_brain`
- `system_reader`
- `system_ops`
- `system_git`
- `system_web`
Agent System Skills:
Tenant DB names are derived from `dbSlug(tenantId)` in `src/db-identifiers.ts`.
Examples:
- Name: `${AGENT_NAME}_skills` (e.g. `clawdie_skills`)
- Role: `${AGENT_NAME}_reader`
- Access: read-only at runtime
- Retrieval: PostgreSQL full-text search by default, with optional vector use later
- `<slug>_brain`
- `<slug>_skills`
- `<slug>_ops`
- `<slug>_forgejo`
User/Agent Memory:
Do not derive DB names from `ASSISTANT_NAME`.
- Name: `${AGENT_NAME}_brain` (e.g. `clawdie_brain`)
- Role: `${AGENT_NAME}_brain`
- Retrieval: hybrid memory search over the memory backend
## Split-brain responsibilities
## Schema
Three core data families drive Clawdie's runtime behavior:
The schema consists of three layers:
| Family | Default shared DB | Purpose |
|--------|-------------------|---------|
| Agent System Skills | `system_skills` | Preloaded read-only skills, install docs, operator workflows |
| User / Agent Memory | `system_brain` | Dynamic conversation memory, preferences, compaction summaries |
| Operational State | `system_ops` | Messages, tasks, sessions, routing, registered groups |
Additional platform service DBs also live in PostgreSQL when enabled:
- `system_git`
- `system_web`
## Memory schema
The memory schema consists of three layers:
1. `memories` — base table (session summaries, metadata)
2. `memory_chunks` — chunked text with full-text search
@ -178,19 +206,19 @@ The schema consists of three layers:
See:
- [POSTGRES-HYBRID-MEMORY.md](POSTGRES-HYBRID-MEMORY.md)
- [clawdie-brain-hybrid-upgrade.sql](sql/clawdie-brain-hybrid-upgrade.sql)
- [pgvector-install-log.md](pgvector-install-log.md)
## Next snapshot policy step
Once `db` is stable, extend snapshot policy to the other persistent service
jails with lighter retention than the database:
Once PostgreSQL is stable, extend snapshot policy to other persistent services
with lighter retention than the database:
- `db`: strongest retention as `critical-data`
- `git`: moderate retention as `persistent-service`
- `cms`: moderate retention as `persistent-service`
- database: strongest retention as `critical_data`
- git service: moderate retention as `persistent_service`
- web service: moderate retention as `persistent_service`
## Validation Note
## Validation note
The PostgreSQL 18 + `pgvector` installation path documented here was validated on
`09.mar.2026` by the operator and Codex in the `db` jail on the current FreeBSD host.
The PostgreSQL 18 + `pgvector` path has been validated in both host-runtime and
service-jail-oriented development flows. Current repo defaults, however, are
host-first. Any doc or skill that treats the db jail as the default is stale.