---
name: freebsd-truss-debug
description: Debug FreeBSD process failures with truss — trace syscalls to find the exact kernel call that fails (EACCES, ENOENT, etc.).
---

# FreeBSD truss Debugging

`truss` traces the system calls a process makes. When a command works from a
shell but fails from a daemon/service, the trace shows which syscall fails, with
which arguments, and which errno it returns. That is usually enough to work out why.

## Quick reference

```sh
# Trace a NEW process (follow children)
sudo truss -f -o /tmp/trace.out command [args]

# Attach to a RUNNING process
sudo truss -f -o /tmp/trace.out -p PID

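# Common filters (the trailing space stops ERR#2 from also matching ERR#20-29)
grep 'ERR#' /tmp/trace.out                      # all errors
grep 'ERR#' /tmp/trace.out | grep -v 'ERR#2 '   # errors minus "No such file" noise
grep -E 'fork|execve' /tmp/trace.out            # process creation (also matches rfork, vfork)
grep -E 'ERR#(1|13) ' /tmp/trace.out            # permission errors: EPERM (1), EACCES (13)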
```

## When to use

Use `truss` when a command works in one context but not another. Common scenarios:

- Daemon (via `daemon(8)` or rc.d) fails an exec with ENOENT or EACCES but the same command works from a shell → usually a PATH difference
- Permission denied but `sudo -u <user>` works → staging directory ownership
- "Text file busy" on binary replacement → process still holding the file
- Silent failures with no error message → syscall trace reveals the hidden error

## Walkthrough: debugging a daemon spawn failure

### 1. Start daemon under truss

```sh
sudo service daemon_name stop
sleep 1; sudo rm -f /var/run/socket.sock /tmp/trace.out
sudo truss -f -o /tmp/trace.out \
  env COLIBRI_JAIL_PRIV_MODE=sudo \
  COLIBRI_DAEMON_SOCKET=/var/run/socket.sock \
  COLIBRI_DAEMON_DATA_DIR=/var/db/app \
  /usr/local/bin/daemon-binary &
sleep 3   # wait for socket ready
```

**Important:** pass the daemon's expected env vars explicitly so the trace
captures the real spawn path, not a misconfigured one.
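
The fixed `sleep 3` above can be replaced by polling for the socket. This is a minimal sketch: `wait_for_socket` is a hypothetical helper name, and it only checks that the socket file exists, not that the daemon is already accepting connections.

```sh
# wait_for_socket PATH [TIMEOUT_SECONDS]   (hypothetical helper, not part of truss)
# Polls once per second until PATH exists and is a socket.
# Returns 0 when the socket appears, 1 on timeout.
wait_for_socket() {
  sock=$1
  timeout=${2:-10}
  elapsed=0
  while [ ! -S "$sock" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $sock" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage (socket path taken from the walkthrough):
# wait_for_socket /var/run/socket.sock 10 || exit 1
```

If the timeout fires, the daemon probably failed during startup, and the trace collected so far is already worth reading.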

### 2. Trigger the failing operation

```sh
client-command --socket /var/run/socket.sock trigger-failure
sleep 2
```

### 3. Stop and analyze

```sh
sudo pkill daemon-binary; wait
wc -l /tmp/trace.out          # expect hundreds to thousands of lines

# Find the error
grep -E 'ERR#(1|13) ' /tmp/trace.out   # EPERM (1) and EACCES (13); the space excludes ERR#10-19

# Find process creation (fork + exec)
grep 'fork\|rfork\|execve' /tmp/trace.out
```
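
Before grepping for one specific errno, it can help to see which errors dominate the trace. The sketch below assumes that `truss` writes failures as `ERR#<n> '<message>'`. `truss_errsummary` is a hypothetical helper name.

```sh
# truss_errsummary FILE   (hypothetical helper)
# Counts failed syscalls in a truss log, grouped by errno number and message,
# most frequent first.
# Assumes error lines look like:  <pid>: name(args) ERR#13 'Permission denied'
truss_errsummary() {
  grep -o "ERR#[0-9][0-9]* '[^']*'" "$1" | sort | uniq -c | sort -rn
}

# Usage:
# truss_errsummary /tmp/trace.out
```

A long run of ERR#2 is normally harmless path probing. A handful of ERR#13 or ERR#1 lines is usually where to look first.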

### 4. Interpret

| Pattern | Meaning |
|---------|---------|
| `fork() ERR#35` | Can't create child process: EAGAIN, a process limit was hit (`RLIMIT_NPROC`, `kern.maxproc`) |
| `execve("/path/to/bin") ERR#13` | Binary exists but can't execute (permissions, MAC) |
| `execve("sudo") ERR#2`, or one `execve("<dir>/sudo") ERR#2` per search directory | Bare name not found: the search path doesn't include `/usr/local/bin` |
| `open("/path") ERR#13` | File exists but can't open (ownership, mode) |
| `mkdir("/path") ERR#13` | Parent directory not writable |
| No fork/exec at all | Error happens BEFORE spawn — staging/validation failure |
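
The last table row and the `execve` rows can be checked mechanically. This is a rough sketch with a hypothetical helper name, `spawn_report`. It assumes syscalls appear in the trace as `name(` and failures as `ERR#<n>`.

```sh
# spawn_report FILE   (hypothetical helper)
# Prints failed execve calls, or notes that the trace has no fork/exec at all.
spawn_report() {
  trace=$1
  # Matches fork(, vfork(, rfork(, pdfork(, execve( and fexecve(.
  if ! grep -Eq '(fork|execve)\(' "$trace"; then
    echo "no fork/exec in trace: failure happens before spawn"
    return 0
  fi
  # Failed exec attempts, e.g. a bare name or a path missing from PATH.
  grep -E 'execve\(.*ERR#[0-9]+' "$trace" || echo "fork/exec present, no failed execve"
}

# Usage:
# spawn_report /tmp/trace.out
```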

## Common daemon pitfalls caught by truss

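1. **Bare command names**: a daemon started through rc.d/`service(8)` (often wrapped in `daemon(8)`) typically runs with a minimal PATH such as `/sbin:/bin:/usr/sbin:/usr/bin`, so `execvp("sudo")` never tries `/usr/local/bin/sudo`. Fix: use absolute paths or a fixed search list.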

2. **Staging directory ownership**: daemon runs as unprivileged user but staging path was created by root. Fix: pre-create with correct ownership in bootstrap script.

3. **Orphaned processes holding socket**: `service stop` killed the supervisor but old background daemons still hold the socket. Fix: `ps aux | grep 'daemon: name'` to find all supervisors, kill them all before starting.

4. **Capsicum sandboxing**: if `cap_enter()` appears in the trace, the process entered capability mode; later `open()` calls on absolute paths and `execve()` typically fail with ERR#94 (ECAPMODE, "Not permitted in capability mode"). Fix: do all setup BEFORE `cap_enter()`.
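
For pitfall 1, a fixed search list can be as small as the sketch below. `resolve_cmd` is a hypothetical helper, and the directory list is an assumption to adjust for the target system.

```sh
# resolve_cmd NAME   (hypothetical helper)
# Prints the first executable NAME found in a fixed search list, ignoring $PATH.
# The directory list is an assumption; adjust it to the target system.
resolve_cmd() {
  for dir in /sbin /bin /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin; do
    if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
      printf '%s\n' "$dir/$1"
      return 0
    fi
  done
  echo "resolve_cmd: $1 not found in fixed search list" >&2
  return 1
}

# Usage: resolve once at startup, then run the absolute path.
# SUDO=$(resolve_cmd sudo) || exit 1
# "$SUDO" -n true
```

Resolving once at startup makes a missing tool fail early, with a clear message, instead of surfacing later as an ERR#2 deep inside a spawn.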

## ktrace / kdump (alternative)

For long-running processes where `truss` output would be too large:

```sh
# Record
sudo ktrace -f /tmp/ktrace.out -p PID
# ... trigger the bug ...
sudo ktrace -C   # stop tracing

# Read
kdump -f /tmp/ktrace.out | less
kdump -f /tmp/ktrace.out | grep -E 'fork|execve|errno'   # kdump prints failures as "errno N", not ERR#N
```

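`ktrace` logs in the kernel to a binary file instead of stopping the process at
every syscall, so it is lighter than `truss` for high-throughput processes. Use
`kdump` to decode. It records the same syscall information, but the output format
differs: failures appear as `RET <syscall> -1 errno <N> <message>`.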
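
Because `kdump` puts the call, the path lookup, and the return value on separate lines, a small filter helps to pair a failure with the path it refers to. This sketch assumes the default `kdump` column layout (`pid command TYPE ...`) with `CALL`, `NAMI` and `RET` records. `kdump_failures` is a hypothetical helper, and it will misparse command names or paths that contain spaces.

```sh
# kdump_failures   (hypothetical helper)
# Reads kdump text on stdin; for each failed syscall prints pid, syscall,
# errno number, and the last path (NAMI record) seen since that pid's CALL.
kdump_failures() {
  awk '
    $3 == "CALL" { delete path[$1] }       # new syscall: forget the old path
    $3 == "NAMI" { path[$1] = $4 }         # path looked up by this syscall
    $3 == "RET" && $5 == "-1" && $6 == "errno" {
      printf "%s %s errno=%s %s\n", $1, $4, $7, (($1 in path) ? path[$1] : "-")
    }
  '
}

# Usage:
# kdump -f /tmp/ktrace.out | kdump_failures
```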

feat(skills): add freebsd-truss-debug, syscall tracing for daemon failures

truss traces every kernel call a process makes. Quick reference, full walkthrough (start daemon → trigger → stop → analyze), common daemon pitfalls and their truss signatures, ktrace alternative. Proven debugging colibri-daemon jail-spawn Permission Denied: found bare command names unresolved under daemon(8) PATH and staging directory ownership issues.

2026-06-21 17:38:44 +02:00