Return an error from the socket server when another daemon owns the Unix
socket or bind setup fails, and broadcast shutdown so the daemon does not
stay alive without a control socket. Also format the PR docs.

Checks: cargo fmt --check; ./scripts/check-format.sh; git diff --check;
cargo test -p colibri-daemon clear_stale_socket -- --nocapture;
cargo test -p colibri-daemon --test sigterm_shutdown -- --nocapture.
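Here is a minimal std-only sketch of the bind-failure path described above. It assumes the control socket is a `std::os::unix::net::UnixListener`. The function `start_socket_server` is hypothetical. The shared `AtomicBool` stands in for the daemon's shutdown broadcast, whose real mechanism the source does not show.

```rust
use std::io;
use std::os::unix::net::UnixListener;
use std::path::Path;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the daemon's socket-server setup.
/// Any bind failure (for example, another daemon already owns the path)
/// raises the shared shutdown flag before the error is returned, so the
/// rest of the daemon stops instead of running without a control socket.
fn start_socket_server(path: &Path, shutdown: &AtomicBool) -> io::Result<UnixListener> {
    UnixListener::bind(path).map_err(|err| {
        // Stand-in for "broadcast shutdown": every subsystem polling this
        // flag observes the request.
        shutdown.store(true, Ordering::SeqCst);
        err
    })
}
```

A caller would return the error from main with `?`, so the process exits instead of idling without a control socket. In an async daemon, a channel such as `tokio::sync::broadcast` would typically replace the flag.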
The rc.d "rm stale socket on prestart" fix (07e4660) was a band-aid over
two daemon-side defects that surfaced on the live FreeBSD host:
1. colibri-daemon never handled SIGTERM. main.rs awaited only ctrl_c()
(SIGINT), so `service stop`/`restart`, which sends SIGTERM to the child
via daemon(8), killed it through the default signal disposition with no
cleanup. The graceful path (socket removal, agent reaping) never ran,
leaking the socket file and orphaning spawned agents across restarts.
Now wait_for_shutdown_signal() selects on SIGTERM or SIGINT, so the
same graceful path runs on a normal service stop. New integration test
(tests/sigterm_shutdown.rs) spawns the binary, sends SIGTERM, and
asserts the socket is removed.
2. Stale-socket cleanup had no liveness check — both the daemon
(socket.rs) and the rc prestart would unconditionally rm the socket
before bind, which could delete a *running* instance's socket if
rc.subr's pid detection misfires and starts a second daemon. Cleanup
now probes first (clear_stale_socket): connect succeeds -> refuse to
start; refused/dead -> remove and bind. Unit-tested for absent, stale,
and live cases.
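Item 1's "select on SIGTERM or SIGINT" idea can be reduced to a dependency-free sketch. Both signals are routed to one handler, and a single wait function returns whichever signal arrived, so one graceful path runs for both. The sketch reuses the name `wait_for_shutdown_signal` from the commit but is not the daemon's implementation. The raw libc `signal(3)`/`raise(3)` bindings and the polling loop are assumptions made only to avoid external crates.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;
use std::time::Duration;

// Signal numbers shared by Linux and FreeBSD.
const SIGINT: c_int = 2;
const SIGTERM: c_int = 15;

unsafe extern "C" {
    // libc signal(3); the previous handler it returns is ignored.
    fn signal(signum: c_int, handler: extern "C" fn(c_int)) -> usize;
    // libc raise(3); only used to exercise the handler in a test.
    fn raise(sig: c_int) -> c_int;
}

// Last shutdown signal received; 0 means none yet.
static RECEIVED: AtomicI32 = AtomicI32::new(0);

// The handler only does a lock-free atomic store, which keeps it
// async-signal-safe.
extern "C" fn record_shutdown_signal(sig: c_int) {
    RECEIVED.store(sig, Ordering::SeqCst);
}

/// Route SIGTERM (service stop/restart) and SIGINT (Ctrl-C) to one handler.
fn install_shutdown_handlers() {
    unsafe {
        signal(SIGTERM, record_shutdown_signal);
        signal(SIGINT, record_shutdown_signal);
    }
}

/// Block until either signal arrives and return its number. The caller then
/// runs a single graceful path (remove the socket, reap agents) for both.
fn wait_for_shutdown_signal() -> c_int {
    loop {
        let sig = RECEIVED.load(Ordering::SeqCst);
        if sig != 0 {
            return sig;
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

If the daemon runs on tokio (its ctrl_c() call suggests it may), the idiomatic form is `tokio::select!` over `tokio::signal::ctrl_c()` and a `tokio::signal::unix::signal(SignalKind::terminate())` stream. That form avoids both the unsafe FFI and the polling.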
With the daemon owning safe socket cleanup, the rc prestart no longer
removes the socket (only stale pidfiles), eliminating the restart-time
clobber hazard. This also makes the SIGTERM shutdown described in
ISO-SERVICE-LAYOUT.md (PR #75) actually true.
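The probe-then-remove order from item 2 is what makes it safe for the daemon to own cleanup. Below is a sketch using only the standard library. The name mirrors `clear_stale_socket`, but the signature, the error kinds returned, and the extra refusal to delete non-socket paths are assumptions of this sketch.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::os::unix::fs::FileTypeExt;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical signature: Ok(()) means the path is now free to bind.
fn clear_stale_socket(path: &Path) -> io::Result<()> {
    let meta = match fs::symlink_metadata(path) {
        Ok(meta) => meta,
        // Absent: nothing to clean up.
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(()),
        Err(e) => return Err(e),
    };
    // Added guard (not from the source): never delete a non-socket file.
    if !meta.file_type().is_socket() {
        return Err(io::Error::new(ErrorKind::AlreadyExists, "path exists and is not a socket"));
    }
    match UnixStream::connect(path) {
        // A listener accepted the connection: a live daemon owns the socket.
        Ok(_) => Err(io::Error::new(ErrorKind::AddrInUse, "another daemon is listening on this socket")),
        // Nobody is listening: the file is left over from a dead instance.
        Err(e) if e.kind() == ErrorKind::ConnectionRefused => fs::remove_file(path),
        Err(e) => Err(e),
    }
}
```

The probe and the following bind are not atomic, so two instances starting at the same instant can still race between them. What the probe removes is the hazard of deleting a socket that a running instance is already serving.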
Gates: cargo fmt --check, clippy -D warnings, cargo test --workspace all
green on Linux; sh -n on the rc script OK. FreeBSD runtime validation
still pending per FREEBSD-BUILD-LANE-HANDOFF.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Six bugs found in a deep-dive analysis of FreeBSD rc.d/rc.conf after the
live-copy-safe fix (7d23905):
1. colibri_cost_mode → colibri_daemon_cost_mode: the old name broke the
rc.subr ${name}_ convention, so an operator setting
colibri_daemon_cost_mode=fast in rc.conf was silently ignored. Fixed in
the rc.d script, the staging script, rc.conf.sample, and all docs.
2. Removed the chmod 660 on the socket in poststart: the Rust code already
sets 0770 with a documented rationale, and the poststart override to 0660
conflicted with it, was fragile, and had no comment.
3. Removed the unnecessary chmod 644 on the pidfile in poststart: the
pidfile lives in a 0750 directory that other users cannot traverse, so a
world-readable mode is pointless and security-negative.
4. Fixed ISO-SERVICE-LAYOUT.md: the socket perms were wrong (it said 750,
actual is 770), colibri-daemon.pid was labeled as the supervisor pidfile
(it is the child's), the supervisor pidfile was missing entirely, and the
shutdown description did not mention the custom stop_cmd that targets
the supervisor.
5. health_cmd now checks for a non-empty daemon response instead of just
connectivity: a hung daemon that accepted connections but returned no
data was reported healthy.
6. rc.conf.sample hostname path: $(hostname) → $(/bin/hostname) for
consistency with rc.d script and early-boot PATH safety.
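Item 2 leaves socket permissions entirely to the daemon. Below is a sketch of that step. It assumes the mode is applied with `std::fs::set_permissions` right after bind, since the source does not show how the Rust code sets 0770. `bind_control_socket` is a hypothetical name.

```rust
use std::fs::{self, Permissions};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Hypothetical helper: bind the control socket, then set its mode to 0770
/// so the owner and group can connect and everyone else is refused.
fn bind_control_socket(path: &Path) -> io::Result<UnixListener> {
    let listener = UnixListener::bind(path)?;
    fs::set_permissions(path, Permissions::from_mode(0o770))?;
    Ok(listener)
}
```

Until the chmod runs, the socket carries whatever mode the process umask produced. A daemon that cares about that brief window can tighten the umask before bind. Keeping the mode in this one place is also why a second chmod in poststart only added a conflicting source of truth.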
Checks: sh -n OK, cargo fmt --check OK, cargo clippy clean,
cargo test --workspace 207 passed.
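The health_cmd change in item 5 lives in the rc script, but the check it performs can be illustrated in Rust. The check connects, sends a probe, and reports healthy only if at least one byte comes back before a timeout. The `status` request line and the `daemon_is_healthy` name are hypothetical; the source does not describe the daemon's wire protocol.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::time::Duration;

/// Healthy only if the daemon accepts a connection *and* answers a probe
/// with a non-empty reply before `timeout` expires. Connectivity alone is
/// not enough, because a hung daemon can still accept connections.
fn daemon_is_healthy(path: &Path, timeout: Duration) -> bool {
    let probe = || -> io::Result<usize> {
        let mut stream = UnixStream::connect(path)?;
        stream.set_read_timeout(Some(timeout))?;
        stream.set_write_timeout(Some(timeout))?;
        stream.write_all(b"status\n")?; // hypothetical request line
        let mut reply = [0u8; 256];
        // Ok(0) means the peer closed without replying; a timeout is an Err.
        stream.read(&mut reply)
    };
    matches!(probe(), Ok(n) if n > 0)
}
```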