From 8199e238900dd47318eecce0ba652d02e29dfcbe Mon Sep 17 00:00:00 2001
From: Sam & Claude
Date: Wed, 27 May 2026 21:09:26 +0200
Subject: [PATCH] docs: record OSA intake scheduler re-smoke

---
 .../2026-05-27-osa-freebsd-intake-resmoke.md | 142 ++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 docs/internal/sessions/2026-05-27-osa-freebsd-intake-resmoke.md

diff --git a/docs/internal/sessions/2026-05-27-osa-freebsd-intake-resmoke.md b/docs/internal/sessions/2026-05-27-osa-freebsd-intake-resmoke.md
new file mode 100644
index 0000000..0da0af8
--- /dev/null
+++ b/docs/internal/sessions/2026-05-27-osa-freebsd-intake-resmoke.md
@@ -0,0 +1,142 @@
+# OSA FreeBSD Intake Scheduler Re-smoke
+
+**Date:** 27 May 2026
+**Host:** osa.smilepowered.org
+**OS:** FreeBSD 15.0-RELEASE-p9 amd64
+**Repo:** `Clawdie/Colibri`
+**Base pulled:** `af8c011` (daemon loop wiring landed)
+**Fix commit:** `d760536` (`fix: avoid scheduler store deadlock on intake drain`)
+**Status:** PASS after follow-up fix
+
+## What was checked
+
+Pulled the daemon-loop wiring (`9717ce7`, plus `af8c011`) and ran the full workspace gates:
+
+```sh
+cargo fmt --check
+cargo clippy --workspace --all-targets -- -D warnings
+cargo test --workspace
+cargo build --workspace --release
+```
+
+The initial gates were green, but a live `/tmp` FreeBSD re-smoke exposed one more runtime bug.
+
+## Finding during live re-smoke
+
+With `daemon::run_loop` now started, `intake-task` reached the scheduler tick and inserted a SQLite task, but a concurrent `list-tasks` socket request timed out.
+
+Root cause: `Scheduler::tick` used expressions like:
+
+```rust
+// The guard from this lock() is a temporary of the match scrutinee,
+// so it stays alive until the whole match expression ends.
+match state.store.lock().unwrap().create_task(...) {
+    Ok(task) => {
+        // Second lock() on the same mutex while the first guard is still held.
+        state.store.lock().unwrap().list_agents()
+    }
+}
+```
+
+Temporaries created in a `match` scrutinee are not dropped until the end of the whole `match`, so the `MutexGuard` from the first `lock()` is still held inside the arm. The mutex is not reentrant, so relocking `state.store` in the arm never returns, and the scheduler thread deadlocks while still holding the store lock.
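A minimal, std-only sketch of the hazard and of the scoped-lock fix. `Store`, `create_task`, `lock_held_inside_arm` and `lock_free_after_scoped_block` are hypothetical stand-ins, not Colibri's scheduler code. The sketch probes the mutex with `try_lock` instead of a second `lock()`, because a real second `lock()` would hang the example the same way it hung the daemon.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the scheduler's store (not Colibri's real type).
struct Store {
    tasks: Vec<String>,
}

impl Store {
    fn create_task(&mut self, title: &str) -> Result<usize, String> {
        self.tasks.push(title.to_string());
        Ok(self.tasks.len())
    }
}

/// Hazard: the guard returned by `store.lock()` in the scrutinee lives until
/// the end of the whole `match`, so the mutex is still locked inside the arm.
/// Returns true if the lock was observed as held inside the arm.
fn lock_held_inside_arm(store: &Mutex<Store>) -> bool {
    match store.lock().unwrap().create_task("intake") {
        // Probe with try_lock: a second lock() here would deadlock or panic
        // (std leaves the exact behavior unspecified).
        Ok(_) => store.try_lock().is_err(),
        Err(_) => false,
    }
}

/// Fix: scope the guard in a block so it is dropped before the next lock.
/// Returns true if the mutex is free again after the scoped block.
fn lock_free_after_scoped_block(store: &Mutex<Store>) -> bool {
    let create_result = {
        let mut guard = store.lock().unwrap();
        guard.create_task("intake")
    }; // `guard` is dropped here, releasing the mutex.
    match create_result {
        Ok(_) => store.try_lock().is_ok(),
        Err(_) => false,
    }
}

fn main() {
    let store = Mutex::new(Store { tasks: Vec::new() });
    println!("held inside arm (hazard): {}", lock_held_inside_arm(&store));
    println!("free after scoped block (fix): {}", lock_free_after_scoped_block(&store));
}
```

Run as-is, both lines print `true`. Inside the `match` arm the store is still locked. After the scoped block it is free again, which is the property the explicit scoping in `d760536` relies on.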
+
+On FreeBSD this manifested as:
+
+- the SQLite row was created
+- the `list-tasks` socket request blocked on the store lock
+- the daemon became unresponsive until the smoke harness timeout killed it
+
+## Fix
+
+`d760536` rewrites scheduler store access so every lock is scoped explicitly and dropped before the next lock:
+
+```rust
+let create_result = {
+    let store = state.store.lock().unwrap();
+    store.create_task(...)
+}; // the guard is dropped here, before any later lock of `state.store`
+```
+
+It also adds a regression test:
+
+```text
+scheduler::tests::test_scheduler_tick_drains_intake_without_deadlock
+```
+
+## Re-smoke result after fix
+
+Isolated `/tmp` environment:
+
+```text
+COLIBRI_DAEMON_DATA_DIR=/tmp/colibri-osa-resmoke-clawdie-1779908417/data
+COLIBRI_DAEMON_SOCKET=/tmp/colibri-osa-resmoke-clawdie-1779908417/colibri.sock
+COLIBRI_DB_PATH=/tmp/colibri-osa-resmoke-clawdie-1779908417/colibri.sqlite
+COLIBRI_HOST=osa-resmoke
+```
+
+`intake-task` response:
+
+```json
+{"ok":true,"data":{"status":"queued"}}
+```
+
+The scheduler drained intake on the 30s tick:
+
+```text
+FOUND_ON_POLL=30
+```
+
+`list-tasks` then returned the queued task:
+
+```json
+{
+  "ok": true,
+  "data": [
+    {
+      "agent_id": null,
+      "created_at": "2026-05-27T19:00:47.360062420+00:00",
+      "description": "prove scheduler loop drains intake",
+      "id": "c3dab9df-8a37-47b1-854b-d956fd796d41",
+      "status": "queued",
+      "title": "osa resmoke intake",
+      "updated_at": "2026-05-27T19:00:47.360062420+00:00"
+    }
+  ]
+}
+```
+
+SQLite verification:
+
+```text
+tasks 1
+journal_mode wal
+```
+
+Graceful shutdown:
+
+```text
+socket exists after stop? no
+process remains? no
+```
+
+The daemon log shows both task loops exiting cleanly:
+
+```text
+daemon background loop started ... scheduler_secs=30
+received interrupt signal, initiating graceful shutdown
+socket server received shutdown signal
+daemon loop received shutdown signal
+daemon background loop exited
+Herdr socket API shut down
+colibri-daemon shut down cleanly
+```
+
+## Final verdict
+
+The daemon loop wiring is valid after `d760536`.
+
+Validated on FreeBSD:
+
+- the daemon starts with `/tmp` data/socket/DB paths
+- the background daemon loop starts beside the socket server
+- `intake-task` over the Unix socket becomes a queued SQLite task on the scheduler tick
+- `list-tasks` remains responsive after the tick
+- SQLite WAL mode works
+- graceful shutdown removes the socket and stops both the socket server and the loop task
+
+The smoke directory was removed after recording this report.