fix(rc): FreeBSD rc.d deep-audit — 6 bugs after live-host testing #75

Merged
clawdie merged 3 commits from fix/freebsd-rc-d-deep-audit into main 2026-06-15 09:24:20 +02:00

3 commits

Author SHA1 Message Date
4517e13935 fix(daemon): fail closed when socket ownership is unsafe (Sam & Codex)
Some checks failed
CI / rust (pull_request) Has been cancelled
CI / markdown (pull_request) Has been cancelled
Return an error from the socket server when another daemon owns the Unix socket or bind setup fails, and broadcast shutdown so the daemon does not stay alive without a control socket. Also format the PR docs.\n\nChecks: cargo fmt --check; ./scripts/check-format.sh; git diff --check; cargo test -p colibri-daemon clear_stale_socket -- --nocapture; cargo test -p colibri-daemon --test sigterm_shutdown -- --nocapture.
2026-06-15 09:08:56 +02:00
Sam & Claude
b32c3acaed fix(daemon): handle SIGTERM + liveness-aware socket cleanup (Sam & Claude)
Some checks failed
CI / rust (pull_request) Has been cancelled
CI / markdown (pull_request) Has been cancelled
The rc.d "rm stale socket on prestart" fix (07e4660) was a band-aid over
two daemon-side defects that surfaced on the live FreeBSD host:

1. colibri-daemon never handled SIGTERM. main.rs awaited only ctrl_c()
   (SIGINT), so `service stop`/`restart` — which sends SIGTERM via
   daemon(8) to the child — killed it on the default disposition with no
   cleanup. The graceful path (socket removal, agent reaping) never ran,
   leaking the socket file and orphaning spawned agents across restarts.
   Now wait_for_shutdown_signal() selects on SIGTERM or SIGINT, so the
   same graceful path runs on a normal service stop. New integration test
   (tests/sigterm_shutdown.rs) spawns the binary, sends SIGTERM, and
   asserts the socket is removed.

2. Stale-socket cleanup had no liveness check — both the daemon
   (socket.rs) and the rc prestart would unconditionally rm the socket
   before bind, which could delete a *running* instance's socket if
   rc.subr's pid detection misfires and starts a second daemon. Cleanup
   now probes first (clear_stale_socket): connect succeeds -> refuse to
   start; refused/dead -> remove and bind. Unit-tested for absent, stale,
   and live cases.

With the daemon owning safe socket cleanup, the rc prestart no longer
removes the socket (only stale pidfiles), eliminating the restart-time
clobber hazard. This also makes the SIGTERM shutdown described in
ISO-SERVICE-LAYOUT.md (PR #75) actually true.

Gates: cargo fmt --check, clippy -D warnings, cargo test --workspace all
green on Linux; sh -n on the rc script OK. FreeBSD runtime validation
still pending per FREEBSD-BUILD-LANE-HANDOFF.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 08:37:06 +02:00
Sam & Claude
df5fbab051 fix(rc): FreeBSD rc.d deep-audit — cost mode naming, chmod cleanup, health check, docs (Sam & Hermes)
Some checks failed
CI / rust (pull_request) Has been cancelled
CI / markdown (pull_request) Has been cancelled
Six bugs found in deep-dive analysis of FreeBSD rc.d/rc.conf after the
live-copy-safe fix (7d23905):

1. colibri_cost_mode → colibri_daemon_cost_mode: naming broke rc.subr
   ${name}_ convention — operator setting colibri_daemon_cost_mode=fast
   in rc.conf was silently ignored. Fixed in rc.d, staging script,
   rc.conf.sample, and all docs.

2. Removed redundant chmod 660 on socket in poststart: Rust code already
   sets 0770 with documented rationale. The poststart override to 0660
   was conflicting, fragile, and had no comment.

3. Removed unnecessary chmod 644 on pidfile in poststart: pidfile lives
   in a 0750 directory — world-readable permission is pointless and
   security-negative.

4. Fixed ISO-SERVICE-LAYOUT.md: socket perms were wrong (said 750, actual
   770), colibri-daemon.pid was labeled supervisor pidfile (it's the
   child), supervisor pidfile was missing entirely, shutdown behavior
   didn't mention custom stop_cmd targeting the supervisor.

5. health_cmd now checks for non-empty daemon response instead of just
   connectvity — a hung daemon accepting connections but returning
   garbage was reported healthy.

6. rc.conf.sample hostname path: $ (hostname) → $(/bin/hostname) for
   consistency with rc.d script and early-boot PATH safety.

Checks: sh -n OK, cargo fmt --check OK, cargo clippy clean,
cargo test --workspace 207 passed.
2026-06-15 08:28:20 +02:00