layered-soul/skills/freebsd-truss-debug/SKILL.md
Sam & Claude 7888132d4a feat(skills): add freebsd-truss-debug — syscall tracing for daemon failures
truss traces every kernel call a process makes. Quick reference,
full walkthrough (start daemon→trigger→stop→analyze), common
daemon pitfalls and their truss signatures, ktrace alternative.

Proven debugging colibri-daemon jail-spawn Permission Denied:
found bare command names unresolved under daemon(8) PATH and
staging directory ownership issues.
2026-06-21 17:38:44 +02:00


name: freebsd-truss-debug
description: Debug FreeBSD process failures with truss by tracing syscalls to find the exact kernel call that fails (EACCES, ENOENT, etc.).

FreeBSD truss Debugging

truss traces every system call a process makes to the kernel. When a command works from a shell but fails from a daemon/service, truss shows exactly which syscall returns the error and why.

Quick reference

# Trace a NEW process (follow children)
sudo truss -f -o /tmp/trace.out command [args]

# Attach to a RUNNING process
sudo truss -f -o /tmp/trace.out -p PID

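# Common filters (truss prints failures as ERR#<errno> 'message')
grep 'ERR#' /tmp/trace.out                      # all errors
grep 'ERR#' /tmp/trace.out | grep -v 'ERR#2 '   # drop ENOENT ("No such file") noise
grep 'fork\|execve' /tmp/trace.out              # process creation (also matches rfork, vfork)
grep -E 'ERR#(1|13) ' /tmp/trace.out            # permission errors: EPERM (1), EACCES (13)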
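
When a trace runs to thousands of lines, a per-syscall error count is often a quicker first look than raw grep output. The following is a minimal sketch, not part of truss itself: truss_err_summary is a hypothetical helper name, and it assumes the default truss -f -o output format (an optional "PID: " prefix and failures written as ERR#<errno>).

```shell
# Hypothetical helper: count failing syscalls in a truss output file,
# grouped by syscall name and errno, most frequent first.
truss_err_summary() {
    # $1: path to a file written by truss -o
    awk '
        match($0, /ERR#[0-9]+/) {
            err = substr($0, RSTART, RLENGTH)
            line = $0
            sub(/^[ \t]*[0-9]+:[ \t]*/, "", line)   # drop the "PID: " prefix added by -f
            name = line
            sub(/\(.*/, "", name)                    # keep only the syscall name
            count[name " " err]++
        }
        END { for (k in count) printf "%d %s\n", count[k], k }
    ' "$1" | sort -rn
}
```

Usage: truss_err_summary /tmp/trace.out | head. A large ERR#2 count is usually normal library and config probing; a handful of ERR#13 or ERR#1 lines is where to look first.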

When to use

Use truss when a command works in one context but not another. Common scenarios:

  • Daemon (via daemon(8) or rc.d) gets EACCES but shell works fine → often a PATH issue
  • Permission denied but sudo -u <user> works → often staging directory ownership
  • "Text file busy" on binary replacement → process still holding the file
  • Silent failures with no error message → syscall trace reveals the hidden error

Walkthrough: debugging a daemon spawn failure

1. Start daemon under truss

sudo service daemon_name stop
sleep 1; sudo rm -f /var/run/socket.sock /tmp/trace.out
sudo truss -f -o /tmp/trace.out \
  env COLIBRI_JAIL_PRIV_MODE=sudo \
  COLIBRI_DAEMON_SOCKET=/var/run/socket.sock \
  COLIBRI_DAEMON_DATA_DIR=/var/db/app \
  /usr/local/bin/daemon-binary &
sleep 3   # wait for socket ready

Important: pass the daemon's expected env vars explicitly so the trace captures the real spawn path, not a misconfigured one.
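
The fixed sleep 3 in step 1 is a guess at startup time. A polling loop is one possible replacement, sketched here under the assumption that the daemon creates its socket file only once it is ready to accept connections. wait_for_socket is a hypothetical helper name, not something the daemon or truss provides.

```shell
# Hypothetical helper: wait until a Unix socket file exists, or give up.
wait_for_socket() {
    # $1: socket path, $2: timeout in seconds (default 10)
    sock=$1
    remaining=${2:-10}
    while [ "$remaining" -gt 0 ]; do
        [ -S "$sock" ] && return 0      # -S: path exists and is a socket
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -S "$sock" ]                      # final check; non-zero status on timeout
}
```

Usage: wait_for_socket /var/run/socket.sock 10 || echo "daemon did not come up". The check only proves the socket file exists, not that the daemon is already accepting on it.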

2. Trigger the failing operation

client-command --socket /var/run/socket.sock trigger-failure
sleep 2

3. Stop and analyze

sudo pkill daemon-binary; wait
wc -l /tmp/trace.out          # expect hundreds to thousands of lines

# Find permission errors: EPERM (ERR#1) and EACCES (ERR#13)
grep -E 'ERR#(1|13) ' /tmp/trace.out

# Find process creation (fork + exec)
grep 'fork\|rfork\|execve' /tmp/trace.out

4. Interpret

| Pattern | Meaning |
| --- | --- |
| fork() ERR#35 | Can't create child process (EAGAIN: process or resource limit reached) |
| execve("/path/to/bin") ERR#13 | Binary exists but can't execute (permissions, MAC) |
| execve("sudo") ERR#2 | Bare name: PATH doesn't include /usr/local/bin |
| open("/path") ERR#13 | File exists but can't open (ownership, mode) |
| mkdir("/path") ERR#13 | Parent directory not writable |
| No fork/exec at all | Error happens BEFORE spawn: staging/validation failure |
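
To check the execve rows of the table against a real trace, it helps to list every exec attempt with its outcome. This is a rough sketch with a hypothetical helper name (truss_exec_attempts). It assumes the path is the first quoted argument on each execve line and treats any line without ERR# as a success.

```shell
# Hypothetical helper: list each execve attempt in a truss output file
# as "<result> <path>", where result is "ok" or the ERR# code.
truss_exec_attempts() {
    # $1: path to a file written by truss -o
    awk '
        /execve\("/ {
            path = $0
            sub(/^.*execve\("/, "", path)   # text after the opening quote
            sub(/".*$/, "", path)            # cut at the closing quote
            result = "ok"
            if (match($0, /ERR#[0-9]+/)) result = substr($0, RSTART, RLENGTH)
            printf "%s %s\n", result, path
            seen = 1
        }
        END { if (!seen) print "no execve calls found: failure is before the spawn" }
    ' "$1"
}
```

A run of ERR#2 results that never tries /usr/local/bin points at the bare-name pattern. The "no execve calls found" message corresponds to the last row of the table.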

Common daemon pitfalls caught by truss

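  1. Bare command names: a daemon started via rc.d and daemon(8) inherits a minimal PATH (typically /sbin:/bin:/usr/sbin:/usr/bin), not your login shell's, so execvp("sudo") can't find /usr/local/bin/sudo. Fix: use absolute paths or a fixed search list.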

  2. Staging directory ownership: daemon runs as unprivileged user but staging path was created by root. Fix: pre-create with correct ownership in bootstrap script.

  3. Orphaned processes holding socket: service stop killed the supervisor but old background daemons still hold the socket. Fix: ps aux | grep 'daemon: name' to find all supervisors, kill them all before starting.

  4. Capsicum sandboxing: if cap_enter() appears in the trace, the process entered capability mode, and later calls that take a global path, such as open() or execve(), fail with ERR#94 (ECAPMODE, "Not permitted in capability mode"). Fix: do all setup BEFORE cap_enter().
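
The "fixed search list" fix from pitfall 1 can be illustrated in shell. This is only a sketch of the idea, not the daemon's actual code: resolve_cmd is a hypothetical name, and the directory list is an assumption that should be adjusted to wherever the needed tools are installed.

```shell
# Hypothetical helper: resolve a bare command name against a fixed
# directory list instead of trusting the inherited PATH.
resolve_cmd() {
    # $1: bare command name; prints the first executable match
    for dir in /sbin /bin /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    return 1    # not found in any listed directory
}
```

Usage: SUDO=$(resolve_cmd sudo) || exit 1, then invoke "$SUDO" instead of the bare name. Resolving once at startup also makes a missing tool fail early, before any spawn is attempted.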

ktrace / kdump (alternative)

For long-running processes where truss output would be too large:

# Record
sudo ktrace -f /tmp/ktrace.out -p PID
# ... trigger the bug ...
sudo ktrace -C   # stop tracing

# Read
kdump -f /tmp/ktrace.out | less
kdump -f /tmp/ktrace.out | grep 'fork\|execve\|errno'   # kdump reports failures as "errno N", not ERR#N

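ktrace writes compact binary records to a file, so it adds less overhead than truss for high-throughput processes. Use kdump to decode. It captures the same syscalls, but the output format differs: kdump prints CALL, NAMI, and RET records, and a failure appears as "RET open -1 errno 13 Permission denied" rather than ERR#13.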
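
Because kdump splits a syscall across several records, the failing path and the errno sit on different lines. The sketch below pairs them up. kdump_errors is a hypothetical helper name, and it assumes the default kdump layout (PID, command name, record type) with command names and paths that contain no spaces.

```shell
# Hypothetical helper: read kdump text on stdin and print each failing
# syscall as "<pid> <syscall> errno <n> <last path looked up>".
kdump_errors() {
    awk '
        $3 == "CALL" { path[$1] = "" }      # new syscall: forget the previous path
        $3 == "NAMI" { path[$1] = $4 }      # path name resolved during this syscall
        $3 == "RET" && $5 == "-1" && $6 == "errno" {
            printf "%s %s errno %s %s\n", $1, $4, $7, path[$1]
        }
    '
}
```

Usage: kdump -f /tmp/ktrace.out | kdump_errors. Syscalls that fail without a path lookup print an empty last column.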