clawdie-ai/docs/internal/DEBUG_CHECKLIST.md
Clawdie AI a3e0d8cef3 docs: update SQLite references to PostgreSQL ops database (Sam & Claude)
Split-brain architecture now spans three PostgreSQL databases:
skills (read-only), memory (dynamic), and ops (operational state).
SQLite is fully removed from the runtime.

Updated: README, public docs site, install guide, internal docs,
agent memory, skill files, backup/restore procedure, debug checklist,
and 6 marketing page translations.

---
Build: pass | Tests: not run (Linux)
2026-04-11 14:56:24 +02:00


Clawdie Debug Checklist

Known Issues (08.Feb.2026)

1. [FIXED] Resume branches from stale tree position

When agent teams spawn subagent CLI processes, all of them write to the same session JSONL. On subsequent query() resumes, the CLI reads the JSONL but may pick a stale branch tip (from before the subagent activity), so the agent's response lands on a branch for which the host never receives a result. Fix: pass resumeSessionAt with the UUID of the last assistant message to explicitly anchor each resume.
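
To see which UUID a resume would need to be anchored to, the last assistant entry can be read straight from the session JSONL. This is a minimal sketch, assuming each transcript entry is a JSON object with `type` and `uuid` fields (the `uuid` field name is an assumption, and `last_assistant_uuid` is a hypothetical helper, not part of Clawdie):

```python
import json


def last_assistant_uuid(jsonl_text):
    """Return the uuid of the last assistant entry in a transcript, or None."""
    last = None
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        if (
            isinstance(entry, dict)
            and entry.get("type") == "assistant"
            and entry.get("uuid")
        ):
            last = entry["uuid"]
    return last
```

File order is used here as a stand-in for the host's own record of the last assistant message it received. With subagents writing to the same file, the two can differ, so treat the result as a debugging aid rather than the value the host actually passes as resumeSessionAt.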

2. IDLE_TIMEOUT == JAIL_TIMEOUT (both 30 min)

Both timers fire at the same time, so jails always exit via a hard SIGKILL (code 137) instead of a graceful shutdown through the _close sentinel. The idle timeout should be shorter (e.g., 5 min) so jails wind down between messages, while the jail timeout stays at 30 min as a safety net for stuck agents.
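
The interaction of the two timers can be modeled in a few lines. This is an illustrative sketch only: it assumes the idle timer restarts at the last activity and the hard timeout counts from jail start, and `exit_mode` is a hypothetical name, not a function in the codebase.

```python
def exit_mode(idle_timeout_s, jail_timeout_s, last_activity_s):
    """Return 'graceful' if the idle timer fires strictly before the hard timeout.

    last_activity_s is measured in seconds since jail start. On a tie the
    hard kill is treated as winning, matching the behavior described above.
    """
    idle_fires_at = last_activity_s + idle_timeout_s
    return "graceful" if idle_fires_at < jail_timeout_s else "sigkill"


# Current settings: both 30 min, so the idle timer can never win.
print(exit_mode(30 * 60, 30 * 60, last_activity_s=0))    # sigkill
# Proposed settings: 5 min idle, 30 min hard limit, last activity at 10 min.
print(exit_mode(5 * 60, 30 * 60, last_activity_s=600))   # graceful
```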

3. Cursor advanced before agent succeeds

processGroupMessages advances lastAgentTimestamp before the agent runs. If the jail times out, retries find no messages (cursor already past them). Messages are permanently lost on timeout.
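
One way to avoid the loss is to move the cursor only after the agent run has succeeded. The following is a language-neutral sketch in Python, not the actual processGroupMessages implementation; the `state` dict, the `(timestamp, text)` tuples, and the `run_agent` callback are all hypothetical stand-ins.

```python
def process_group_messages(state, group, messages, run_agent):
    """Advance the per-group cursor only after the agent run succeeds.

    state: dict mapping group -> last processed timestamp (hypothetical shape).
    messages: list of (timestamp, text) tuples, oldest first.
    run_agent: callable returning True on success, False on timeout or failure.
    """
    cursor = state.get(group, 0)
    pending = [m for m in messages if m[0] > cursor]
    if not pending:
        return False
    if not run_agent(pending):
        return False  # cursor untouched, so a retry sees the same messages
    state[group] = pending[-1][0]
    return True


state = {}
msgs = [(1, "hi"), (2, "status?")]
process_group_messages(state, "g", msgs, lambda batch: False)  # simulated timeout
# state is still empty here, so the retry below picks up both messages
process_group_messages(state, "g", msgs, lambda batch: True)
```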

Quick Status Check

# 1. Is the service running?
if [ -f clawdie.pid ] && kill -0 "$(cat clawdie.pid)" 2>/dev/null; then echo running; else echo stopped; fi

# 2. Any running jails?
jls -n | grep clawdie-cp

# 3. Recent errors in service log?
grep -E 'ERROR|WARN' logs/clawdie.log | tail -20

# 4. Is Telegram configured and the process stable?
grep -E 'TELEGRAM_BOT_TOKEN is not set|Connected|connection.*close' logs/clawdie.log | tail -5

# 5. Are groups loaded?
grep 'groupCount' logs/clawdie.log | tail -3

Session Transcript Branching

# Check for concurrent CLI processes in session debug logs
ls -la data/sessions/<group>/.agent/debug/

# Count unique SDK processes that handled messages
# Each .txt file = one CLI subprocess. Multiple = concurrent queries.
ls data/sessions/<group>/.agent/debug/*.txt | wc -l

# Check parentUuid branching in transcript
python3 -c "
import json
path = 'data/sessions/<group>/.agent/projects/-workspace-group/<session>.jsonl'
with open(path) as f:
  for i, line in enumerate(f, 1):
    try:
      d = json.loads(line)
    except json.JSONDecodeError:
      continue  # skip blank or partially written lines
    msg = d.get('message') if isinstance(d, dict) else None
    if d.get('type') == 'user' and isinstance(msg, dict):
      # parentUuid may be null (e.g. on the first message), so fall back to 'ROOT'
      parent = (d.get('parentUuid') or 'ROOT')[:8]
      content = str(msg.get('content', ''))[:60]
      print(f'L{i} parent={parent} {content}')
"

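The parentUuid listing above shows where each user message attaches. Fork points can also be found directly by grouping entries by parent. This is a rough sketch, assuming every transcript entry carries `uuid` and `parentUuid` fields (the `uuid` field name is an assumption, and `find_forks` is a hypothetical helper):

```python
import json
from collections import defaultdict


def find_forks(jsonl_text):
    """Map each parentUuid that has more than one child to its child uuids."""
    children = defaultdict(list)
    for line in jsonl_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        if not isinstance(entry, dict) or not entry.get("uuid"):
            continue
        parent = entry.get("parentUuid") or "ROOT"
        children[parent].append(entry["uuid"])
    return {p: kids for p, kids in children.items() if len(kids) > 1}
```

Any key in the result is a message with two or more direct children, which is where the transcript tree branches. An empty result means the transcript is a single linear chain.
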
Jail Timeout Investigation

# Check for recent timeouts
grep -E 'Jail timeout|timed out' logs/clawdie.log | tail -10

# Check jail log files for the timed-out run
ls -lt groups/*/logs/jail-*.log | head -10

# Read the most recent jail log (replace path)
cat groups/<group>/logs/jail-<timestamp>.log

# Check if retries were scheduled and what happened
grep -E 'Scheduling retry|retry|Max retries' logs/clawdie.log | tail -10

Agent Not Responding

# Check if messages are being received from the channel
grep 'New messages' logs/clawdie.log | tail -10

# Check if messages are being processed (jail spawned)
grep -E 'Processing messages|Spawning jail' logs/clawdie.log | tail -10

# Check if messages are being piped to active jail
grep -E 'Piped messages|sendMessage' logs/clawdie.log | tail -10

# Check the queue state — any active jails?
grep -E 'Starting jail|Jail active|concurrency limit' logs/clawdie.log | tail -10

# Latest message timestamp per chat (compare against lastAgentTimestamp)
psql "$OPS_DB_URL" -c "SELECT chat_jid, MAX(timestamp) as latest FROM messages GROUP BY chat_jid ORDER BY latest DESC LIMIT 5;"
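
Once the per-chat latest timestamps are in hand, comparing them against the agent cursor is mechanical. A small sketch, assuming both sets of values have been collected into dicts keyed by chat_jid. How lastAgentTimestamp is read out is not covered here, and `chats_behind` is a hypothetical helper.

```python
def chats_behind(latest_message, last_agent_timestamp):
    """Return chat_jids whose newest message is later than the agent cursor.

    Both arguments are dicts keyed by chat_jid. Values must be mutually
    comparable (for example ISO 8601 strings in the same timezone).
    A chat with no cursor entry is reported as behind.
    """
    behind = []
    for chat_jid, latest in sorted(latest_message.items()):
        cursor = last_agent_timestamp.get(chat_jid)
        if cursor is None or latest > cursor:
            behind.append(chat_jid)
    return behind
```

A chat listed here has messages the agent has not picked up yet. A chat that is not listed but never received a reply matches known issue 3, where the cursor moved past messages that were never successfully processed.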

Jail Mount Issues

# Check mount validation logs (shows on jail spawn)
grep -E 'Mount validated|Mount.*REJECTED|mount' logs/clawdie.log | tail -10

# Verify the mount allowlist is readable
cat ~/.config/clawdie-cp/mount-allowlist.json

# Check group's jail config in DB
psql "$OPS_DB_URL" -c "SELECT name, jail_config FROM registered_groups;"
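
Printing the allowlist with cat confirms it is readable but not that it parses. A syntax-only check might look like the sketch below. It makes no assumption about the allowlist's schema, and `check_json` is a hypothetical helper.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, error with position)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as exc:
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
```

To use it, pass in the contents of ~/.config/clawdie-cp/mount-allowlist.json. A trailing comma or an unquoted key is reported with its line and column.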

Telegram Auth Issues

# Verify token is set
grep '^TELEGRAM_BOT_TOKEN=' .env

# Re-run Telegram token verification
npm run auth

Service Management

# Restart the service
./stop-clawdie.sh && ./run-clawdie.sh

# View live logs
tail -f logs/clawdie.log

# Stop the service
./stop-clawdie.sh

# Start the service
./run-clawdie.sh

# Rebuild after code changes
npm run build && ./stop-clawdie.sh && ./run-clawdie.sh