clawdie-ai/docs/DEBUG_CHECKLIST.md

# Clawdie Debug Checklist

## Known Issues (2026-02-08)

### 1. [FIXED] Resume branches from stale tree position
When agent teams spawn subagent CLI processes, the subagents write to the same session JSONL. On subsequent `query()` resumes, the CLI reads the JSONL but may pick a stale branch tip (from before the subagent activity), so the agent's response lands on a branch for which the host never receives a `result`. **Fix**: pass `resumeSessionAt` with the last assistant message UUID to explicitly anchor each resume.
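
The anchoring idea can be sketched in a few lines. This is an illustration only, not the project's code: the `ResumeAnchor` class, the message shape (`type`, `uuid`), and the `resume` option key are assumptions. Only `resumeSessionAt` comes from the fix described above.

```python
from typing import Optional


class ResumeAnchor:
    """Tracks the last assistant message the host actually received (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_assistant_uuid: Optional[str] = None

    def observe(self, message: dict) -> None:
        # Assumed message shape: {"type": ..., "uuid": ...}
        if message.get("type") == "assistant" and message.get("uuid"):
            self.last_assistant_uuid = message["uuid"]

    def resume_options(self, session_id: str) -> dict:
        # "resume" is an assumed option name; "resumeSessionAt" is from the fix.
        options = {"resume": session_id}
        if self.last_assistant_uuid is not None:
            options["resumeSessionAt"] = self.last_assistant_uuid
        return options


anchor = ResumeAnchor()
anchor.observe({"type": "assistant", "uuid": "a1"})
anchor.observe({"type": "user", "uuid": "u2"})  # ignored: not an assistant message
print(anchor.resume_options("session-123"))
```

Tracking the UUID from messages the host has seen, rather than re-reading the JSONL, avoids anchoring on an entry written by a subagent.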

### 2. IDLE_TIMEOUT == JAIL_TIMEOUT (both 30 min)
Both timers fire at the same time, so jails always exit via a hard SIGKILL (code 137) instead of the graceful `_close` sentinel shutdown. The idle timeout should be shorter (e.g., 5 min) so jails wind down between messages, while the jail timeout stays at 30 min as a safety net for stuck agents.
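
One way to catch this misconfiguration early is a startup check. The sketch below is hypothetical: the function name, the use of seconds, and the `min_gap_s` margin are assumptions, not existing project settings.

```python
from typing import List


def check_timeouts(idle_timeout_s: float, jail_timeout_s: float,
                   min_gap_s: float = 60.0) -> List[str]:
    """Return a list of problems with the timeout configuration (empty if fine)."""
    problems = []
    if idle_timeout_s >= jail_timeout_s:
        problems.append(
            "idle timeout is not shorter than jail timeout: "
            "the hard kill can fire before graceful shutdown"
        )
    elif jail_timeout_s - idle_timeout_s < min_gap_s:
        # min_gap_s is an assumed safety margin for the graceful shutdown to finish
        problems.append("gap between idle and jail timeout is below the margin")
    return problems


print(check_timeouts(30 * 60, 30 * 60))  # the configuration described above
print(check_timeouts(5 * 60, 30 * 60))   # the suggested configuration
```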

### 3. Cursor advanced before agent succeeds
`processGroupMessages` advances `lastAgentTimestamp` before the agent runs. If the jail times out, retries find no messages because the cursor is already past them, so those messages are permanently lost.
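
A possible remedy is to commit the cursor only after the agent run succeeds. The sketch below illustrates that ordering under stated assumptions: the in-memory `cursors` dict, the `(timestamp, text)` message tuples, and the boolean `run_agent` callback are all hypothetical stand-ins for the real implementation.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[float, str]  # (timestamp, text), an assumed shape


def process_group_messages(
    cursors: Dict[str, float],
    group: str,
    messages: List[Message],
    run_agent: Callable[[List[Message]], bool],
) -> bool:
    """Run the agent on unseen messages; advance the cursor only on success."""
    cursor = cursors.get(group, 0.0)
    pending = [m for m in messages if m[0] > cursor]
    if not pending:
        return True
    if not run_agent(pending):
        return False  # cursor untouched, so a retry sees the same messages
    cursors[group] = max(ts for ts, _ in pending)
    return True


cursors: Dict[str, float] = {}
inbox = [(1.0, "hello"), (2.0, "are you there?")]
process_group_messages(cursors, "main", inbox, lambda msgs: False)  # simulated timeout
print(cursors)  # still empty: nothing was lost
process_group_messages(cursors, "main", inbox, lambda msgs: True)   # retry succeeds
print(cursors)
```

The trade-off is at-least-once delivery: if the agent sent partial output before failing, a retry may repeat it.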

## Quick Status Check

```bash
# 1. Is the service running?
# Reports "stopped" both when the PID file is missing and when the PID is stale
if [ -f clawdie.pid ] && kill -0 "$(cat clawdie.pid)" 2>/dev/null; then echo running; else echo stopped; fi

# 2. Any running jails?
jls -n | grep clawdie

# 3. Recent errors in service log?
grep -E 'ERROR|WARN' logs/clawdie.log | tail -20

# 4. Is Telegram configured and the process stable?
grep -E 'TELEGRAM_BOT_TOKEN is not set|Connected|connection.*close' logs/clawdie.log | tail -5

# 5. Are groups loaded?
grep 'groupCount' logs/clawdie.log | tail -3
```

## Session Transcript Branching

```bash
# Check for concurrent CLI processes in session debug logs
ls -la data/sessions/<group>/.agent/debug/

# Count unique SDK processes that handled messages
# Each .txt file = one CLI subprocess. Multiple = concurrent queries.
ls data/sessions/<group>/.agent/debug/*.txt | wc -l

# Check parentUuid branching in transcript
python3 -c "
import json
path = 'data/sessions/<group>/.agent/projects/-workspace-group/<session>.jsonl'
with open(path) as f:
  for i, line in enumerate(f, 1):
    try:
      d = json.loads(line)
    except json.JSONDecodeError:
      continue  # skip blank or truncated lines
    msg = d.get('message') if isinstance(d, dict) else None
    if d.get('type') == 'user' and isinstance(msg, dict):
      # parentUuid can be null (e.g. on a root message), so fall back to ROOT
      parent = (d.get('parentUuid') or 'ROOT')[:8]
      content = str(msg.get('content', ''))[:60]
      print(f'L{i} parent={parent} {content}')
"
```
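
To locate branch points directly rather than scanning the output by eye, a small script can group entries by parent and report any parent with more than one child. This is a sketch under assumptions: it presumes every transcript entry carries a `uuid` field alongside `parentUuid`, which the snippet above does not confirm.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def find_branch_points(jsonl_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each parent uuid with more than one child to its child uuids."""
    children = defaultdict(list)
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or truncated lines
        if not isinstance(entry, dict) or "uuid" not in entry:
            continue
        parent = entry.get("parentUuid") or "ROOT"
        children[parent].append(entry["uuid"])
    return {p: kids for p, kids in children.items() if len(kids) > 1}


# Hypothetical transcript: two user messages hang off the same assistant reply.
sample = [
    '{"type": "user", "uuid": "u1", "parentUuid": null}',
    '{"type": "assistant", "uuid": "a1", "parentUuid": "u1"}',
    '{"type": "user", "uuid": "u2", "parentUuid": "a1"}',
    '{"type": "user", "uuid": "u3", "parentUuid": "a1"}',
]
print(find_branch_points(sample))  # {'a1': ['u2', 'u3']}
```

To run it against a real transcript, pass an open file handle for the session `.jsonl` in place of `sample`.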

## Jail Timeout Investigation

```bash
# Check for recent timeouts
grep -E 'Jail timeout|timed out' logs/clawdie.log | tail -10

# Check jail log files for the timed-out run
ls -lt groups/*/logs/jail-*.log | head -10

# Read the most recent jail log (replace path)
cat groups/<group>/logs/jail-<timestamp>.log

# Check if retries were scheduled and what happened
grep -E 'Scheduling retry|retry|Max retries' logs/clawdie.log | tail -10
```

## Agent Not Responding

```bash
# Check if messages are being received from the channel
grep 'New messages' logs/clawdie.log | tail -10

# Check if messages are being processed (jail spawned)
grep -E 'Processing messages|Spawning jail' logs/clawdie.log | tail -10

# Check if messages are being piped to active jail
grep -E 'Piped messages|sendMessage' logs/clawdie.log | tail -10

# Check the queue state — any active jails?
grep -E 'Starting jail|Jail active|concurrency limit' logs/clawdie.log | tail -10

# Latest message timestamp per chat (compare against that group's lastAgentTimestamp)
sqlite3 store/messages.db "SELECT chat_jid, MAX(timestamp) as latest FROM messages GROUP BY chat_jid ORDER BY latest DESC LIMIT 5;"
```

## Jail Mount Issues

```bash
# Check mount validation logs (shows on jail spawn)
grep -E 'Mount validated|Mount.*REJECTED|mount' logs/clawdie.log | tail -10

# Verify the mount allowlist is readable
cat ~/.config/clawdie-cp/mount-allowlist.json

# Check group's jail config in DB
sqlite3 store/messages.db "SELECT name, jail_config FROM registered_groups;"
```

## Telegram Auth Issues

```bash
# Verify token is set
grep '^TELEGRAM_BOT_TOKEN=' .env

# Re-run Telegram token verification
npm run auth
```

## Service Management

```bash
# Restart the service
./stop-clawdie.sh && ./start-clawdie.sh

# View live logs
tail -f logs/clawdie.log

# Stop the service
./stop-clawdie.sh

# Start the service
./start-clawdie.sh

# Rebuild after code changes
npm run build && ./stop-clawdie.sh && ./start-clawdie.sh
```