Rewrite stale PostgreSQL specialist guidance

Build: pass | Tests: pass (2162 passed, 630 files)

parent 93d35ad95f
commit ae8a4b4e9e

3 changed files with 383 additions and 347 deletions

@ -1,283 +1,282 @@
---
name: debug
description: Debug Clawdie agent issues on the current FreeBSD host runtime. Use when the agent is not responding, the service is crashing, Telegram messages go unanswered, pi subprocesses fail, or when verifying the live runtime shape. Covers host service management, logs, optional jail state, memory DB reachability, and common failure modes.
---

# Clawdie Agent Debugging (FreeBSD host runtime)

All commands run on the host, from the project root, unless explicitly noted
otherwise.

## Scope and source of truth

Use this skill for live runtime diagnosis. Do not use it as the canonical
architecture explainer. For current runtime defaults, prefer:

- `src/config.ts`
- `setup/db.ts`
- `docs/internal/POSTGRES-MEMORY.md`
- `just doctor`

Do not assume any of the following unless current config proves it:

- a `clawdie-controlplane` jail
- `10.0.1.2` / `10.0.1.3`
- `clawdie_brain`
- a hardcoded pi path like `/opt/npm/bin/pi`

## Current runtime recap

Default runtime today:

- main service: host rc.d service `clawdie`
- privileged sidecar: host rc.d service `clawdie_hostd`
- main process: `dist/index.js`
- main logs: `logs/clawdie.log`, `logs/clawdie.error.log`
- per-run pi logs: `groups/{folder}/logs/agent-*.log`
- default database mode: `DB_RUNTIME=host`
- default DB host for jails: `${SUBNET_BASE}.1`

Optional / install-specific pieces:

- `DB_RUNTIME=jail` uses the `db` jail role from `infra/jails.yaml`
- the current repo jail registry defaults to `10.0.1.0/24`, with `db` on `.5`
- `PI_TUI_BIN` may override the pi path; otherwise use `command -v pi`

## Log locations

| Log | Path | Content |
|-----|------|---------|
| Main app | `logs/clawdie.log` | Startup, Telegram events, scheduler, watchdog |
| Main errors | `logs/clawdie.error.log` | Uncaught exceptions and fatal errors |
| Per-run pi | `groups/{folder}/logs/agent-{runId}.log` | Full pi subprocess output |
| Heartbeat | `logs/heartbeat.log` | Controlplane checks and LLM reachability |
| Embed | `logs/embed-docs.log` | Knowledge embedding runs |

## 1. Check service status

```bash
sudo service clawdie status
sudo service clawdie_hostd status
pgrep -laf 'node.*/dist/index.js'
```

If the install explicitly uses `DB_RUNTIME=jail`, check that jail separately:

```bash
sudo bastille list
```

## 2. Read logs

```bash
# Follow the main log live
tail -f logs/clawdie.log

# Last 50 lines of errors
tail -50 logs/clawdie.error.log
```

When a specific run failed, read the newest per-run log directly:

```bash
# Three most recently modified run logs across all groups
ls -t groups/*/logs/agent-*.log | head -3

# Replace the group folder and <runId> with the run you are investigating
tail -100 groups/main/logs/agent-<runId>.log
```
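
The newest run log can also be resolved without copying a `<runId>` by hand. A minimal POSIX `sh` sketch, assuming the `groups/*/logs/agent-*.log` layout listed above and that it runs from the project root; `newest_agent_log` is a hypothetical helper, not something the repo ships.

```sh
# Hypothetical helper: print the most recently modified per-run pi log under
# ROOT (default: current directory, i.e. the project root).
newest_agent_log() {
  root=${1:-.}
  # ls -t sorts newest first; if the glob matches nothing, ls fails on the
  # literal pattern, its error is discarded, and $newest stays empty.
  newest=$(ls -t "$root"/groups/*/logs/agent-*.log 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no per-run logs under $root/groups" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}

# Usage from the project root:
#   log=$(newest_agent_log) && tail -100 "$log"
```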

## 3. Restart / stop / start

```bash
sudo service clawdie restart
# daemon(8) does not respawn the agent while it is stopped via service(8)
sudo service clawdie stop
sudo service clawdie start
# Start once even if clawdie_enable is not set in rc.conf
sudo service clawdie onestart
```

If hostd itself is suspect:

```bash
sudo service clawdie_hostd restart
```

## 4. Common failure modes

### Agent not responding to Telegram

1. Check the main service is running (section 1).
2. Check the bot token is set, without printing its value:

   ```bash
   grep -q '^TELEGRAM_BOT_TOKEN=.' .env && echo set || echo missing
   ```

3. Look for Telegram / grammY errors:

   ```bash
   grep -Ei 'telegram|grammy|409|403' logs/clawdie.log | tail -20
   ```

`409 Conflict` usually means two instances are long-polling at once. Check for
duplicate Node processes before restarting.
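
To confirm the duplicate-process theory before restarting, count the matching processes. A minimal sketch, assuming `pgrep -laf` prints one line per process; `check_single_instance` is a hypothetical helper name.

```sh
# Hypothetical helper: read `pgrep -laf` output on stdin, count lines that
# mention dist/index.js, and fail if more than one instance is running.
check_single_instance() {
  # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps the count.
  count=$(grep -c 'dist/index\.js' || true)
  if [ "$count" -gt 1 ]; then
    echo "WARN: $count dist/index.js processes; likely cause of Telegram 409s" >&2
    return 1
  fi
  echo "instances: $count"
}

# Usage:
#   pgrep -laf 'node.*/dist/index.js' | check_single_instance
```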

### pi subprocess fails or produces no answer

Check the newest per-run log first:

```bash
ls -t groups/*/logs/agent-*.log | head -3
tail -100 groups/main/logs/agent-<runId>.log
```

Then verify the configured pi path rather than assuming one:

```bash
grep '^PI_TUI_BIN=' .env
command -v pi
pi --version
```

Common causes:

- provider API key missing or expired
- `PI_TUI_BIN` points at a dead path
- the pi process died under memory pressure

For memory pressure signals:

```bash
dmesg | grep -Ei 'kill|oom' | tail -10
```

### Memory DB unreachable

Check the live DB mode and target first:

```bash
# MEMORY_DB_URL may embed credentials; redact it before sharing output
grep -E '^(DB_RUNTIME|DB_HOST|MEMORY_DB_URL)=' .env
node -e "import('./dist/config.js').then((m) => console.log(JSON.stringify({ DB_RUNTIME: m.DB_RUNTIME, DB_HOST: m.DB_HOST, MEMORY_DB_NAME: m.MEMORY_DB_NAME }, null, 2)))"
```

Probe the resolved host:

```bash
pg_isready -h "$(node -e 'import("./dist/config.js").then((m) => process.stdout.write(m.DB_HOST))')" -p 5432
```
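
`pg_isready` reports its result through the exit status: PostgreSQL documents 0 (accepting connections), 1 (rejecting, for example during startup), 2 (no response) and 3 (no attempt, for example invalid parameters). A small sketch that turns that status into a one-line hint; `describe_pg_isready` is hypothetical, and the suggested causes are common ones, not an exhaustive diagnosis.

```sh
# Hypothetical helper: map pg_isready's exit status to a short hint.
describe_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (often still starting up)" ;;
    2) echo "no response (check host, port, routing, or whether PostgreSQL is running)" ;;
    3) echo "no attempt made (invalid parameters)" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Usage (DB_HOST resolved as shown above):
#   pg_isready -h "$DB_HOST" -p 5432 >/dev/null 2>&1; describe_pg_isready $?
```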

If `DB_RUNTIME=jail`, verify the optional db jail exists and use its actual
jail name from `bastille list`:

```bash
sudo bastille list | grep db
sudo bastille cmd <db-jail-name> service postgresql status
```

### Service keeps crashing

Check for rapid restart loops in the log:

```bash
# Exit reasons and fatal errors
grep -E 'fatal|error|exit|SIGTERM|SIGKILL' logs/clawdie.log | tail -20

# Another process holding the Telegram long-poll
grep -Ei '409|duplicate|conflict' logs/clawdie.log | tail -10
```

Temporarily stop the service to break the loop, fix the root cause, then
restart.

### Watchdog or controlplane resets the agent

```bash
grep -i watchdog logs/clawdie.log | tail -10
grep -i controlplane logs/heartbeat.log | tail -20
```

### Build broken after a code change

```bash
npm run typecheck
npm run build
just doctor
```

Restart the service after a successful build if the running process needs to
pick up new code.

## 5. Quick diagnostic

Prefer the built-in check first:

```bash
just doctor
```

If you need a manual snapshot:

```bash
sudo service clawdie status
sudo service clawdie_hostd status
# Runtime settings; the bot token value is masked so the output is safe to share
grep -E '^(DB_RUNTIME|DB_HOST|TELEGRAM_BOT_TOKEN|PI_TUI_BIN)=' .env | sed 's/^TELEGRAM_BOT_TOKEN=..*/TELEGRAM_BOT_TOKEN=<set>/'
tail -10 logs/clawdie.log
tail -10 logs/clawdie.error.log
```
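
The manual snapshot above prints raw `.env` lines. When you only need to know whether keys are present (for example before pasting output into a ticket), a sketch like this reports presence without values. `env_key_status` is a hypothetical helper; it treats an empty value as missing and assumes simple `KEY=value` lines in `.env`.

```sh
# Hypothetical helper: report whether each KEY has a non-empty value in an
# env file, printing "KEY: set" or "KEY: missing" but never the value.
env_key_status() {
  file=$1
  shift
  for key in "$@"; do
    # "^KEY=." requires at least one character after the equals sign
    if grep -q "^${key}=." "$file" 2>/dev/null; then
      echo "$key: set"
    else
      echo "$key: missing"
    fi
  done
}

# Usage:
#   env_key_status .env TELEGRAM_BOT_TOKEN DB_RUNTIME DB_HOST PI_TUI_BIN
```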

## 6. Optional jail shell access

Only for installs that actually use jails for the thing you are debugging. Use
the jail name reported by `bastille list`:

```bash
sudo bastille console <db-jail-name>
sudo bastille cmd <db-jail-name> service postgresql status
```

Do not assume a controlplane jail exists.

## 7. Enable debug logging

Set `LOG_LEVEL=debug` in `.env`, then restart:

```bash
# Drop any existing LOG_LEVEL line, append the new one, then copy back with
# cat so .env keeps its original owner and permissions
{ grep -v '^LOG_LEVEL=' .env; echo 'LOG_LEVEL=debug'; } > .env.tmp && cat .env.tmp > .env && rm .env.tmp
sudo service clawdie restart
tail -f logs/clawdie.log
```

Debug level adds: full Telegram message routing, pi spawn args, IPC watcher
events, scheduler ticks.

## 8. Metrics

The main runtime exposes a Prometheus metrics endpoint on port 9100:

```bash
curl -s http://127.0.0.1:9100/metrics | head -20
curl -s http://127.0.0.1:9100/healthz
```

Use `/healthz` only as a listener check. Use `just doctor` for actual runtime
diagnosis.
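
To pull a single number out of `/metrics` (for scripting or a quick before/after comparison), a small `awk` filter is enough. A sketch assuming the endpoint serves the standard Prometheus text exposition format; this page does not document metric names, so `<metric_name>` is a placeholder, and labelled series (`name{...}`) are deliberately not matched. `metric_value` is a hypothetical helper.

```sh
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text exposition read on stdin; exit non-zero if absent.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    $1 == name { print $2; found = 1 } # sample line: name value [timestamp]
    END { exit !found }
  '
}

# Usage (list real metric names with the curl command above first):
#   curl -s http://127.0.0.1:9100/metrics | metric_value <metric_name>
```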

## 9. Autostart configuration

```bash
# Check the current setting
sudo sysrc clawdie_enable
# Smart start on boot
sudo sysrc clawdie_enable=AUTO
# Always start at boot
sudo sysrc clawdie_enable=YES
# Disable autostart
sudo sysrc clawdie_enable=NONE
```

Hostd has its own rcvar:

```bash
sudo sysrc clawdie_hostd_enable
```

## 10. Session state

Per-group state lives in `groups/{folder}/`:

```bash
ls groups/
ls groups/main/logs/
ls groups/main/ipc/ 2>/dev/null
```

Run logs are disposable debugging artifacts:

```bash
rm -f groups/main/logs/agent-*.log
```

Do not delete active session files or the whole `groups/` tree while the agent
is running.

@ -1,78 +1,88 @@
---
name: postgres-memory
description: Plan or validate the optional PostgreSQL 18 data-service jail for Clawdie. Use only when an install explicitly chooses `DB_RUNTIME=jail`; the default runtime is host PostgreSQL.
---

# postgres-memory

Use this skill only for the optional `DB_RUNTIME=jail` path.

Do not use it as the general explanation for "how memory works" or "how
PostgreSQL is installed" on a normal host. The current default is
`DB_RUNTIME=host`.

## Current truth

- default runtime: host PostgreSQL
- optional runtime: `DB_RUNTIME=jail`
- jail role label: `db`
- on-disk jail name: current service-prefixed db jail (for example `clawdie-db`)
- current jail registry default:
  - subnet base: `10.0.1`
  - db IP suffix: `.5`
  - bridge: `warden0`
- required packages/extensions:
  - PostgreSQL `18`
  - `pgvector`
  - `pgcrypto`
  - `uuid-ossp`

Do not reuse the old `.3`, `10.0.0.x`, or `10.0.1.3` examples.

Resolve the live jail IP from current registry/config instead of hardcoding it.

## Scope

This skill covers:

- optional `db` jail planning and bring-up
- PostgreSQL-specific jail assumptions
- package install and service initialization
- extension enablement
- validation commands for the optional jail path
- ZFS snapshot points before schema work

This skill does not cover:

- the default host PostgreSQL path
- general memory architecture explanation
- memory schema design
- embedding model selection
- Supabase compatibility layers
- migration of old v2 shell scripts

## Naming model

The PostgreSQL instance may host both shared platform DBs and tenant-derived
DBs.

Shared platform defaults:

- `system_brain`
- `system_skills`
- `system_ops`
- `system_git`
- `system_web`
- read-only skills role: `system_reader`

Tenant names are derived from `dbSlug(tenantId)` in `src/db-identifiers.ts`.
Do not hardcode tenant DB names in this skill; use the helpers.

## Workflow

1. Confirm the install explicitly chose `DB_RUNTIME=jail`.
2. Resolve the `db` jail IP from current registry/config.
3. Ensure the `db` jail exists and starts cleanly.
4. Validate ZFS assumptions before installing PostgreSQL.
5. Install PostgreSQL 18 inside the jail.
6. Initialize the database cluster.
7. Start the PostgreSQL service.
8. Validate local connectivity.
9. Install `postgresql18-contrib` (provides `pgcrypto` and `uuid-ossp`) and the
   pgvector package (provides `vector`), then enable the required extensions.
10. Snapshot the jail dataset before schema work.
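
Step 9 comes down to a few SQL statements. A sketch that renders them for `psql`, assuming the three extensions listed under Current truth; `render_extension_sql` is hypothetical. Extensions are enabled per database, so the statements would be run against each database that needs them.

```sh
# Hypothetical helper: emit CREATE EXTENSION statements for the extensions
# this skill requires. Identifiers are double-quoted because uuid-ossp
# contains a hyphen; pgvector registers its extension as "vector".
render_extension_sql() {
  for ext in pgcrypto uuid-ossp vector; do
    printf 'CREATE EXTENSION IF NOT EXISTS "%s";\n' "$ext"
  done
}

# Usage inside the db jail (database name is a placeholder):
#   render_extension_sql | psql -U postgres -d <database> -v ON_ERROR_STOP=1
```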

## Restore path

Design restore into the optional jail deployment from the start:

1. create jail
2. install PostgreSQL

@ -81,6 +91,21 @@ Target future flow:
5. validate
6. snapshot

## Automation note

There is no current canonical Ansible playbook for this path.

The live automation source of truth is:

- `setup/db.ts`
- `setup/jail-provision.ts`
- `src/jail-schema.ts`
- `infra/jails.yaml`

If this optional jail path becomes important again, a future playbook should be
derived from those files rather than from older `db-memory-bootstrap.yaml`
references.

## Read next

- Install commands: `references/install.md`

@ -89,22 +114,6 @@ Target future flow:
- Failure signatures: `references/troubleshooting.md`
- Security and access control: `references/security.md`

## Scripts

- `scripts/render_install_commands.sh`

@ -1,91 +1,80 @@
# PostgreSQL Memory Plan

This document defines the PostgreSQL-backed memory and platform database plan
for Clawdie.

## Decision

Default: host PostgreSQL with `DB_RUNTIME=host`.
Optional: dedicated FreeBSD jail for the Data Service with `DB_RUNTIME=jail`.

Both paths run:

- PostgreSQL `18`
- `pgvector`
- `pgcrypto`
- `uuid-ossp`

The database is mandatory. Clawdie does not run without a healthy PostgreSQL
backend for memory and operational state.

## Current runtime defaults

Host runtime (`DB_RUNTIME=host`) is the current default:

- jails reach PostgreSQL on `${SUBNET_BASE}.1`
- `DB_HOST` resolves to the host address unless explicitly overridden
- ZFS datasets:
  - `zroot/${ZFS_PREFIX}/pgdata` → `/var/db/postgres/data`
  - `zroot/${ZFS_PREFIX}/pgwal` → `/var/db/postgres/wal`

Optional jail runtime (`DB_RUNTIME=jail`) uses the current jail registry:

- role label: `db`
- on-disk jail name: current service-prefixed db jail (for example `clawdie-db`)
- hostname: `db.${AGENT_INTERNAL_DOMAIN}`
- current repo registry default:
  - subnet base: `10.0.1`
  - gateway: `10.0.1.1`
  - db IP suffix: `.5`
  - bridge: `warden0`

Do not hardcode old `.3` or `10.0.0.x` examples. Resolve the live jail address
from `infra/jails.yaml` plus env overrides.

## Why host PostgreSQL is the default

- lower operational complexity than a separate db jail on every install
- simpler upgrade and restart path
- fewer moving parts during onboarding and recovery
- still compatible with ZFS-backed storage and PostgreSQL extensions

The optional jail path still exists for installs that explicitly want DB
isolation or a service-jail layout.

## Thick vs thin

If the optional db jail is used, keep it thick.

Reason:

- database state is persistent, not ephemeral
- rollback and snapshots should remain self-contained
- database upgrades should not depend on a thin base being in lockstep

## Initial scope

First installation milestone:

1. provision host PostgreSQL or the optional `db` jail
2. install PostgreSQL 18
3. initialize and start the service
4. install `postgresql18-contrib`
5. enable `pgcrypto`, `uuid-ossp`, and `vector`
6. validate local access
7. snapshot before schema work

If the optional db jail path is used, also enable:

```sh
sudo bastille config db set allow.sysvipc 1
@ -94,20 +83,44 @@ sudo bastille restart db

Without that, `service postgresql initdb` can fail with shared-memory errors.

## Deployment path

Current canonical automation lives in:

- `setup/db.ts`
- `setup/jail-provision.ts`
- `src/jail-schema.ts`
- `infra/jails.yaml`

The default path is host PostgreSQL.

If `DB_RUNTIME=jail` is chosen, the current repo-default jail create shape is:

- network: `vnet`
- IP: `10.0.1.5`
- bridge: `warden0`
- gateway: `10.0.1.1`

Canonical example for the current registry default:

```sh
sudo bastille create -T -B -g 10.0.1.1 clawdie-db 15.0-RELEASE 10.0.1.5/24 warden0
```

Treat a missing default route or bad addressing as a provisioning defect. Fix the
registry/create path rather than layering manual one-off network fixes.
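
Instead of hardcoding addresses, the create command can be derived from the subnet base. A sketch assuming the registry defaults described above (gateway on `.1`, db on `.5`, `/24`) and an optional `SUBNET_BASE` environment variable; it prints the command rather than running it and does not parse `infra/jails.yaml`, which remains the source of truth. `render_db_create_cmd` is hypothetical.

```sh
# Hypothetical helper: print (not run) the bastille create command for the
# optional db jail from a subnet base, using the registry defaults:
# gateway .1, db on .5, /24 prefix.
render_db_create_cmd() {
  subnet_base=${SUBNET_BASE:-10.0.1}
  jail_name=${1:-clawdie-db}
  release=${2:-15.0-RELEASE}
  bridge=${3:-warden0}
  printf 'sudo bastille create -T -B -g %s.1 %s %s %s.5/24 %s\n' \
    "$subnet_base" "$jail_name" "$release" "$subnet_base" "$bridge"
}
```

With no arguments and no `SUBNET_BASE` set, it reproduces the canonical example above, so a changed subnet only has to be set in one place.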

## Restore path

The deployment design should support restore from the start:

1. provision host PostgreSQL or create `db`
2. install PostgreSQL 18
3. initialize cluster
4. enable extensions
5. optionally restore from `.sql` or PostgreSQL custom dump
6. validate
7. snapshot
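
Step 5's two dump types need different tools: plain `.sql` files go through `psql`, custom-format archives through `pg_restore`. Custom-format archives start with the bytes `PGDMP`, which a sketch can check to pick the right command. `render_restore_cmd` is hypothetical, it only prints the command, it assumes paths without spaces, and it ignores directory- and tar-format dumps.

```sh
# Hypothetical helper: print the restore command for a dump file.
# PostgreSQL custom-format archives begin with the magic bytes "PGDMP" and
# need pg_restore; anything else is treated as a plain SQL script for psql.
render_restore_cmd() {
  dump=$1
  db=$2
  if [ "$(head -c 5 "$dump" 2>/dev/null)" = "PGDMP" ]; then
    printf 'pg_restore -d %s %s\n' "$db" "$dump"
  else
    printf 'psql -v ON_ERROR_STOP=1 -d %s -f %s\n' "$db" "$dump"
  fi
}
```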

## Baseline resources

@ -117,7 +130,7 @@ Target future flow:
## Snapshot points

- `@fresh`
- `@postgres18-ready`
- `@pre-schema`
- `@post-extensions`

@ -128,48 +141,63 @@ Two different decisions matter here:
1. `ashift`
2. dataset properties

`ashift` is a pool/vdev decision and cannot be changed later. For modern
4 KiB devices, the expected value is usually `12`.

Dataset properties can be tuned later, though `compression` and `recordsize`
changes apply only to newly written data. Conservative starting settings for
PostgreSQL data:

- `compression=lz4`
- `atime=off`
- `recordsize=16K`
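
The dataset properties above map directly onto `zfs create -o` options. A sketch that prints the commands for the data dataset and a pre-schema snapshot, assuming the `pgdata` → `/var/db/postgres/data` layout from Current runtime defaults; `render_pgdata_zfs_cmds` is hypothetical and the dataset name used in the check is only an example.

```sh
# Hypothetical helper: print (not run) the zfs commands that create a
# PostgreSQL data dataset with the conservative starting properties,
# mounted at MOUNTPOINT, plus a named snapshot to take before schema work.
render_pgdata_zfs_cmds() {
  dataset=$1      # e.g. zroot/<prefix>/pgdata
  mountpoint=$2   # e.g. /var/db/postgres/data
  snap=${3:-pre-schema}
  printf 'zfs create -o compression=lz4 -o atime=off -o recordsize=16K -o mountpoint=%s %s\n' \
    "$mountpoint" "$dataset"
  printf 'zfs snapshot %s@%s\n' "$dataset" "$snap"
}
```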

## Database families and naming model

PostgreSQL hosts both shared platform DBs and tenant-derived DBs.

Shared platform DBs use the `system_*` prefix:

- `system_brain`
- `system_skills`
- `system_ops`
- `system_git`
- `system_web`

Shared roles include:

- `system_brain`
- `system_reader`
- `system_ops`
- `system_git`
- `system_web`
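
`system_reader` suggests a read-only runtime role, which matches the read-only skills family. The sketch below prints the standard PostgreSQL grants for such a role. It assumes that `system_reader` is the runtime role for `system_skills` and that the tables live in the `public` schema. Neither assumption is confirmed here. `reader_grants_sql` is a hypothetical helper, so check the repo's provisioning code for the real grants.

```sh
#!/bin/sh
# Hypothetical helper: print SQL that gives ROLE read-only access to DB.
# Assumes the objects live in the `public` schema.
reader_grants_sql() {
    db=$1
    role=$2
    cat <<EOF
GRANT CONNECT ON DATABASE $db TO $role;
GRANT USAGE ON SCHEMA public TO $role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO $role;
-- Covers tables created later by the role that runs this script.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO $role;
EOF
}

# Example (run as the owner of the tables in system_skills):
#   reader_grants_sql system_skills system_reader \
#       | psql -v ON_ERROR_STOP=1 --dbname=system_skills
```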

Tenant DB names are derived from `dbSlug(tenantId)` in `src/db-identifiers.ts`.

Examples:

- `<slug>_brain`
- `<slug>_skills`
- `<slug>_ops`
- `<slug>_forgejo`

Do not derive DB names from `ASSISTANT_NAME`.
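
When auditing a cluster, you can apply the naming model to the output of `psql -Atc 'SELECT datname FROM pg_database'`. The sketch below does not reimplement `dbSlug`. `tenant_dbs` takes an already-derived slug, and `classify_db` only applies the `system_*` prefix and the suffixes listed above. Both helpers are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for the DB naming model. The slug is assumed to come
# from dbSlug(tenantId) in src/db-identifiers.ts; it is not recomputed here.

# Print the tenant-derived DB names for an already-derived slug.
tenant_dbs() {
    for suffix in brain skills ops forgejo; do
        echo "${1}_$suffix"
    done
}

# Classify one DB name as shared (system_*), tenant (<slug>_<family>), or other.
classify_db() {
    case $1 in
        system_*) echo shared ;;
        *_brain|*_skills|*_ops|*_forgejo) echo tenant ;;
        *) echo other ;;
    esac
}

# Example audit against a live cluster:
#   psql -Atc 'SELECT datname FROM pg_database' | while read -r db; do
#       printf '%s\t%s\n' "$(classify_db "$db")" "$db"
#   done
```

Name-based classification has a limit. A legacy DB named from the agent name (for example `clawdie_brain`) also matches the tenant pattern, so treat unexpected tenant hits as candidates for review.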

## Split-brain responsibilities

Three core data families drive Clawdie's runtime behavior:

| Family | Default shared DB | Purpose |
|--------|-------------------|---------|
| Agent System Skills | `system_skills` | Preloaded read-only skills, install docs, operator workflows |
| User / Agent Memory | `system_brain` | Dynamic conversation memory, preferences, compaction summaries |
| Operational State | `system_ops` | Messages, tasks, sessions, routing, registered groups |

Additional platform service DBs also live in PostgreSQL when enabled:

- `system_git`
- `system_web`

## Memory schema

The memory schema consists of three layers:

1. `memories` — base table (session summaries, metadata)
2. `memory_chunks` — chunked text with full-text search
|
||||
|
|
@ -178,19 +206,19 @@ The schema consists of three layers:
|
|||
See:
- [POSTGRES-HYBRID-MEMORY.md](POSTGRES-HYBRID-MEMORY.md)
- [clawdie-brain-hybrid-upgrade.sql](sql/clawdie-brain-hybrid-upgrade.sql)
- [pgvector-install-log.md](pgvector-install-log.md)
## Next snapshot policy step
Once PostgreSQL is stable, extend snapshot policy to other persistent services
with lighter retention than the database:

- database: strongest retention as `critical_data`
- git service: moderate retention as `persistent_service`
- web service: moderate retention as `persistent_service`
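
The retention classes above could drive a simple pruning pass. In this minimal sketch the numbers are placeholders. The keep counts in the hypothetical `retention_keep` helper are not specified anywhere in this document. `prune_plan` only prints the snapshot names it would destroy, oldest first, so the list can be reviewed before anything is deleted.

```sh
#!/bin/sh
# Placeholder keep counts per retention class (illustrative, not repo values).
retention_keep() {
    case $1 in
        critical_data)      echo 30 ;;
        persistent_service) echo 7 ;;
        *) echo "unknown retention class: $1" >&2; return 1 ;;
    esac
}

# Read snapshot names oldest-first on stdin; print all but the newest KEEP.
prune_plan() {
    awk -v keep="$1" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}

# Example (dataset name is an assumption; review the output before destroying):
#   zfs list -H -t snapshot -o name -s creation -d 1 zroot/var/db/postgres \
#       | prune_plan "$(retention_keep critical_data)"
```

Filter out named points such as `@fresh` or `@pre-schema` before pruning if they share a dataset with automated snapshots.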

## Validation note

The PostgreSQL 18 + `pgvector` path has been validated in both host-runtime and
service-jail-oriented development flows. It was first validated on `09.mar.2026`
by the operator and Codex in the `db` jail. Current repo defaults, however, are
host-first. Any doc or skill that treats the db jail as the default is stale.
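
A post-install check along these lines can confirm both halves of the validated path on whichever runtime is in use. `check_pg_stack` is a hypothetical helper. It takes two values as arguments so it can be tested without a live server:

- the server's `server_version_num`
- the installed `vector` extension version, which is empty when the extension is missing

```sh
#!/bin/sh
# Hypothetical check: succeed only for PostgreSQL 18+ with pgvector installed.
# $1: output of  psql -Atc 'SHOW server_version_num'   (for example 180001)
# $2: output of  psql -Atc "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
check_pg_stack() {
    version_num=$1
    vector_version=$2
    case $version_num in
        ''|*[!0-9]*) echo "unreadable server_version_num: '$version_num'" >&2; return 1 ;;
    esac
    if [ "$version_num" -lt 180000 ]; then
        echo "PostgreSQL 18 or newer expected, got $version_num" >&2
        return 1
    fi
    if [ -z "$vector_version" ]; then
        echo "pgvector extension (vector) is not installed in this database" >&2
        return 1
    fi
    echo "ok: server_version_num=$version_num vector=$vector_version"
}

# Example against the shared memory DB:
#   check_pg_stack "$(psql -Atc 'SHOW server_version_num' system_brain)" \
#       "$(psql -Atc "SELECT extversion FROM pg_extension WHERE extname = 'vector'" system_brain)"
```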