colibri/docs/COLIBRI-DAEMON-GLASSPANE-INTEGRATION.md
Sam & Claude 6e78ea630d
docs: clarify Herdr as optional Linux display (Sam & Codex)
Cleans stale Herdr socket/API naming after the Colibri socket rename, preserves Herdr as an optional Linux/macOS display client, marks the clawdie mini-binary service as experimental rather than ISO/deployed-service contract, and removes old internal session logs.\n\nChecks: ./scripts/check-format.sh; cargo fmt --check; git diff --check; sh -n packaging/freebsd/colibri_daemon.in packaging/freebsd/clawdie.in
2026-06-13 12:29:11 +02:00

colibri-daemon ↔ colibri-glasspane integration contract

Attribution: Sam & Hermes

This is the binding contract between the two core Rust crates in the colibri workspace. It defines the socket API, pane-to-session identity mapping, state flow, unified vocabulary, the snapshot contract, and the boot sequence. Both crates MUST implement their side of this contract; changes here require both crates to be updated in lockstep.


Architecture summary

┌─────────────────────────────────────────────────────────────────┐
│  colibri-daemon (always-on service)                             │
│                                                                 │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Spawner   │  │ Sessions  │  │ Heartbeat │  │ Socket Server │  │
│  │ (agents)  │  │ (JSONL)   │  │ (30s)     │  │ (Unix domain) │──┼──► Colibri TUI / web
│  └─────┬─────┘  └───────────┘  └─────┬─────┘  └───────┬───────┘  │     Zed / optional Herdr Linux
│        │                             │                │          │
│        │  stdout JSONL               │ poll_exit()    │ query    │
│        ▼                             ▼                ▼          │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  DaemonState.glasspane: RwLock<PaneSupervisor>           │   │
│  │  (colibri-glasspane — owned, embedded in daemon)         │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
  • colibri-daemon owns the PaneSupervisor. Glasspane is NOT a separate process; it is a crate compiled into the daemon binary.
  • colibri-glasspane provides the state machine (apply_pi_event / fold_pi_events), the PiJsonlIngestor, SupervisedPane, PaneSupervisor, and the GlasspaneSnapshot wire type. It has no I/O dependencies beyond what the daemon gives it (lines of text with timestamps).

1. Socket API shape

The daemon opens a Unix domain socket (path from DaemonConfig.socket_path). All communication is newline-delimited JSON (one JSON object per line, matching the existing watchdog convention).

Wire types (defined in colibri-daemon/src/lib.rs)

Inbound: ColibriCommand (tagged by cmd field):

| cmd value | Parameters | Purpose |
| --- | --- | --- |
| status | none | Health check: agent count, session count |
| glasspane-snapshot | none | Full PaneSupervisor.snapshot_at(...) |
| list-sessions | none | Enumerate sessions (id, turn_count, bytes) |
| spawn-agent | provider, model, session_id?, system_prompt? | Spawn agent, attach pane to glasspane. For provider:"local", model is treated as the executable path and no API key is required. |
| kill-agent | agent_id | SIGKILL agent, ingest error event |
| get-session | session_id | Full session dump (turns + prompt) |
| compact-session | session_id | Compact oldest turns in a session |

Outbound: ColibriResponse:

{
  "ok": true,
  "error": null,
  "data": { ... }
}

error is null when ok is true; data is null when ok is false.

Glasspane-facing commands

Two commands produce or consume glasspane state:

glasspane-snapshot — reads the PaneSupervisor under its RwLock and returns a GlasspaneSnapshot:

Client: {"cmd":"glasspane-snapshot"}\n
Server: {"ok":true,"data":{
  "schema":"clawdie.glasspane.snapshot.v1",
  "host":"domedog",
  "observed_at":"2026-05-27T12:00:00.000Z",
  "panes":[
    {"id":"abc-123","agent":"pi","state":"working",
     "pi_session_id":"019e5e59-...","last_event_at":"...",
     "cwd":"/repo"}
  ]
}}\n

spawn-agent — creates a daemon session, spawns an agent subprocess, attaches a new pane to PaneSupervisor, and wires agent stdout into the glasspane ingestion pipeline:

Client: {"cmd":"spawn-agent","provider":"deepseek","model":"deepseek-chat",
         "session_id":"sess-1","system_prompt":"You are a helpful assistant."}\n
Server: {"ok":true,"data":{"agent_id":"a1b2-c3d4","status":"running"}}\n
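
The request/response exchange above can be sketched with the Rust standard library alone. This is a minimal illustration, not the colibri-client implementation: send_command and query_daemon are hypothetical names, and the response is returned as a raw JSON line rather than a parsed ColibriResponse.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Write one JSON command line, then read exactly one response line.
/// `command_json` must be a single-line JSON object without a trailing newline.
fn send_command<W: Write, R: BufRead>(
    writer: &mut W,
    reader: &mut R,
    command_json: &str,
) -> io::Result<String> {
    // The newline is the message delimiter on the wire.
    writer.write_all(command_json.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()?;

    let mut line = String::new();
    let bytes_read = reader.read_line(&mut line)?;
    if bytes_read == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "daemon closed the connection before replying",
        ));
    }
    // Strip the trailing newline; the payload is the JSON object itself.
    Ok(line.trim_end().to_string())
}

/// Connect to the daemon socket and run one request/response exchange.
/// Hypothetical helper: the path would come from DaemonConfig.socket_path.
fn query_daemon(socket_path: &Path, command_json: &str) -> io::Result<String> {
    let stream = UnixStream::connect(socket_path)?;
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    send_command(&mut writer, &mut reader, command_json)
}
```

Splitting the reader and writer keeps send_command testable against in-memory buffers; only query_daemon touches a real socket.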

Operator CLI smoke helpers

colibri-client also ships small binaries for manual display-client and SSH smoke tests without hand-writing socket JSON:

# Inspect daemon state
colibri --socket "$COLIBRI_DAEMON_SOCKET" status
colibri --socket "$COLIBRI_DAEMON_SOCKET" snapshot

# Spawn a deterministic no-network Pi JSONL emitter through the daemon
colibri --socket "$COLIBRI_DAEMON_SOCKET" \
  spawn-local target/release/colibri-smoke-agent

# Stop the spawned local agent
colibri --socket "$COLIBRI_DAEMON_SOCKET" kill <agent_id>

colibri-smoke-agent emits session, turn_start, queue_update, turn_start, and turn_end, so colibri-tui should show:

Idle → Working → Blocked → Working → Done

What the daemon sends TO glasspane

The daemon is the sole writer into the PaneSupervisor. It calls:

| Daemon action | Glasspane method called |
| --- | --- |
| cmd_spawn_agent: attach new pane | attach_pane_at(agent_id, binary_name, SystemTime::now()) |
| stream_agent_stdout_to_glasspane: each JSONL line | ingest_line_at(agent_id, line, SystemTime::now()) |
| heartbeat: agent exited successfully | ingest_line_at(agent_id, '{"type":"agent_end"}', now) |
| heartbeat: agent exited with error | ingest_line_at(agent_id, '{"type":"error"}', now) |
| cmd_kill_agent: forced kill | ingest_line_at(agent_id, '{"type":"error"}', now) |

What glasspane exposes TO the daemon

The daemon is the sole reader of the PaneSupervisor:

| Daemon action | Glasspane method called |
| --- | --- |
| cmd_glasspane_snapshot | snapshot_at(host, now, DEFAULT_STALL_AFTER) |
| cmd_status (agent count lookup) | not glasspane; reads state.agents directly |

Glasspane does NOT push events to the daemon. It is a passive state accumulator. The daemon feeds it events; the daemon reads snapshots. Glasspane has no threads, no channels, no timers of its own.


2. Pane-to-session mapping

Two distinct identity spaces exist, daemon-managed IDs and the Pi-assigned session ID, and they MUST NOT be conflated:

| Concept | Owner | ID namespace | Example |
| --- | --- | --- | --- |
| Agent ID | colibri-daemon | UUIDv4 (Uuid::new_v4()) | "a1b2c3d4-e5f6-..." |
| Pane ID | colibri-daemon | same as Agent ID | "a1b2c3d4-e5f6-..." |
| Session ID | colibri-daemon | caller-supplied or UUIDv4 | "sess-1" or "019e5e59-..." |
| Pi session ID | Pi agent (JSONL) | Pi --mode json header id field | "019e5e59-6645-7e21-aca2-b57ccf0f8578" |

The mapping chain

Agent ID  ==  Pane ID  ──►  Session ID  ──►  Pi session ID
 (daemon)        (daemon)    (daemon)         (glasspane, discovered)
   │               │             │                   │
   │  1:1          │   n:1       │   1:1             │  discovered from
   │               │ (multiple   │ (one Pi agent     │  Pi JSONL header
   │               │  agents     │  per session;     │  `{"type":"session",
   │               │  can share  │  but sessions     │   "id":"..."}`
   │               │  a session) │  can be reused)
   │               │             │
   ▼               ▼             ▼
AgentHandle     SupervisedPane  Session
(DashMap)       (PaneSupervisor) (DashMap)

How the mapping is created

  1. spawn-agent command arrives → daemon generates agent_id (UUIDv4).
  2. Daemon creates or resolves session_id.
  3. Spawner::spawn() returns AgentHandle { id: agent_id }.
  4. Daemon inserts handle into state.agents keyed by agent_id.
  5. Daemon calls state.glasspane.attach_pane_at(agent_id, agent_binary, now) — this creates a SupervisedPane with pane.id == agent_id.
  6. Daemon spawns stream_agent_stdout_to_glasspane(state, agent_id, stdout).
  7. When glasspane ingests a {"type":"session","id":"pi-xxx","cwd":"/repo"} line, it captures pi_session_id and cwd on the SupervisedPane.
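
Steps 5 and 7 can be illustrated with a trimmed-down supervisor. This is a sketch under stated assumptions, not the colibri-glasspane code: PaneSketch, SupervisorSketch, and string_field are hypothetical names, and the naive string scan stands in for a real JSON parser.

```rust
use std::collections::BTreeMap;
use std::time::SystemTime;

/// Hypothetical, trimmed-down pane record; not the real SupervisedPane.
#[derive(Debug)]
struct PaneSketch {
    agent: String,
    started_at: SystemTime,
    last_event_at: Option<SystemTime>,
    pi_session_id: Option<String>,
    cwd: Option<String>,
}

/// Hypothetical stand-in for PaneSupervisor, keyed by the daemon-assigned pane ID.
#[derive(Debug, Default)]
struct SupervisorSketch {
    panes: BTreeMap<String, PaneSketch>,
}

/// Naive extraction of a string field from a flat JSON object.
/// Assumption: values contain no escaped quotes; real code would parse JSON.
fn string_field(line: &str, key: &str) -> Option<String> {
    let marker = format!("\"{}\":\"", key);
    let start = line.find(&marker)? + marker.len();
    let end = start + line[start..].find('"')?;
    Some(line[start..end].to_string())
}

impl SupervisorSketch {
    /// Step 5: register a pane whose ID equals the agent ID.
    fn attach_pane_at(&mut self, agent_id: &str, agent_binary: &str, now: SystemTime) {
        self.panes.insert(
            agent_id.to_string(),
            PaneSketch {
                agent: agent_binary.to_string(),
                started_at: now,
                last_event_at: None,
                pi_session_id: None,
                cwd: None,
            },
        );
    }

    /// Step 7: ingest one JSONL line. A session header reveals the Pi session
    /// ID and cwd. Returns false when the pane ID is unknown.
    fn ingest_line_at(&mut self, pane_id: &str, line: &str, now: SystemTime) -> bool {
        if let Some(pane) = self.panes.get_mut(pane_id) {
            pane.last_event_at = Some(now);
            if string_field(line, "type").as_deref() == Some("session") {
                pane.pi_session_id = string_field(line, "id");
                pane.cwd = string_field(line, "cwd");
            }
            true
        } else {
            false
        }
    }
}
```

The pane key exists from the moment of attachment, while pi_session_id stays None until the agent emits its header, which is why the two IDs are kept in separate fields.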

Why separate Pane ID from Pi session ID?

  • The daemon controls agent lifecycle (spawn, kill, restart). It needs a stable ID that it assigns before the Pi agent emits its first JSONL line.
  • The Pi agent's session ID is internal to the agent — it cannot be known until the JSONL stream begins.
  • Glasspane tracks both: pane.id is the daemon-assigned key; pane.pi_session_id is the discovered Pi header field. Tests enforce pane.id != pane.pi_session_id when both are present.

Agent-to-pane lifetime

  • Agent spawn → pane attached (attach_pane_at).
  • Agent stdout lines → pane ingests (ingest_line_at).
  • Agent exit (natural or killed) → daemon ingests a final lifecycle event (agent_end or error) into the pane.
  • Panes are currently never removed from PaneSupervisor. In Phase 4, the daemon may prune panes after a configurable retention window.

3. State flow

Daemon lifecycle events → Glasspane AgentState

| Daemon lifecycle event | Ingested Pi event type | Resulting AgentState | Notes |
| --- | --- | --- | --- |
| Agent subprocess spawned | (pane attached, state = Idle) | Idle | SupervisedPane::new defaults to Idle |
| Agent emits session header | session / session_started | Idle | Also captures pi_session_id and cwd |
| Agent emits turn/message/tool | turn_start, message_start, tool_execution_*, etc. | Working | Any of the 14 Working event types (the compaction and retry rows below count toward the 14) |
| Agent emits compaction events | auto_compaction_*, compaction_* | Working | Compaction is active work |
| Agent emits retry events | auto_retry_* | Working | Retry is active work |
| Agent awaits steering/approval | queue_update | Blocked | Operator attention needed (dashboard headline) |
| Turn/task complete | turn_end / agent_end | Done | Agent reached a completion point |
| Agent emits explicit error | error | Error | Terminal failure state |
| Agent subprocess exits (0) | daemon injects agent_end | Done | Heartbeat detected normal exit |
| Agent subprocess exits (!=0) | daemon injects error | Error | Heartbeat detected crash/error exit |
| Agent killed externally | daemon injects error | Error | kill-agent command |

State transition diagram

                 ┌──────────┐
  attach ───────►│   Idle   │◄──── session / session_started
                 └────┬─────┘
                      │ turn_start, message_*, tool_execution_*,
                      │ auto_compaction_*, auto_retry_*
                      ▼
                 ┌──────────┐    queue_update     ┌──────────┐
                 │ Working  │────────────────────►│ Blocked  │
                 │          │◄────────────────────│          │
                 └────┬─────┘  any working event  └──────────┘
                      │ turn_end / agent_end
                      ▼
                 ┌──────────┐
                 │   Done   │
                 └──────────┘

                 ┌──────────┐
                 │  Error   │◄──── error (from agent or daemon)
                 └──────────┘
  • Blocked is entered from Working, or from any other non-terminal state, when queue_update arrives. It transitions back to Working on any working-type event (the agent resumed after receiving steering input).
  • Done and Error are terminal for the current session: subsequent events do not reset them unless a new session header appears, which restarts the pane at Idle.
  • Unknown event types preserve the current state — forward-compatible with future Pi event taxonomy additions.
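
The transition rules above can be written as a pure function. The sketch below mirrors the apply_pi_event and fold_pi_events signatures named in this contract, but the bodies are an illustration, not the colibri-glasspane source. In particular, treating Done and Error as sticky against every event except a session header is an assumption drawn from the bullets above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Pure transition: current state plus a Pi event type gives the next state.
fn apply_pi_event(current: AgentState, event_type: &str) -> AgentState {
    // A session header always restarts the pane at Idle.
    if matches!(event_type, "session" | "session_started") {
        return AgentState::Idle;
    }
    // Assumption: Done and Error stay put until the next session header.
    if matches!(current, AgentState::Done | AgentState::Error) {
        return current;
    }
    match event_type {
        "agent_start" | "turn_start" | "message_start" | "message_update"
        | "message_end" | "tool_execution_start" | "tool_execution_update"
        | "tool_execution_end" | "auto_compaction_start" | "auto_compaction_end"
        | "compaction_start" | "compaction_end" | "auto_retry_start"
        | "auto_retry_end" => AgentState::Working,
        "queue_update" => AgentState::Blocked,
        "turn_end" | "agent_end" => AgentState::Done,
        "error" => AgentState::Error,
        // Unknown event types preserve the current state.
        _ => current,
    }
}

/// Fold a sequence of event types, starting from Idle (the attach default).
fn fold_pi_events<'a>(events: impl IntoIterator<Item = &'a str>) -> AgentState {
    events.into_iter().fold(AgentState::Idle, apply_pi_event)
}
```

Under these assumptions, folding session, turn_start, queue_update, turn_start, turn_end passes through Idle, Working, Blocked, and Working before ending at Done.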

Daemon background loop ↔ Glasspane

| Loop tick | Interval | Glasspane interaction |
| --- | --- | --- |
| Heartbeat | 30s | Polls AgentHandle::poll_exit(). On exit, injects agent_end or error event into glasspane via ingest_line_at. |
| Session rotation | 60s | Checks session byte/turn thresholds. Triggers compaction. No direct glasspane interaction, but agent compaction emits auto_compaction_* events that flow through stdout → glasspane. |
| Memory handoff | 120s | Currently a stub. Future: produce shared context summaries. No glasspane interaction yet. |
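
The heartbeat's exit handling can be sketched as follows. This is an assumption-laden illustration, not the AgentHandle::poll_exit implementation: lifecycle_event_for and poll_exit_event are hypothetical names built on the standard library's non-blocking Child::try_wait.

```rust
use std::io;
use std::process::{Child, ExitStatus};

/// Map a finished subprocess to the synthetic Pi event line to inject.
/// A non-zero exit code or death by signal both count as failure.
fn lifecycle_event_for(status: ExitStatus) -> &'static str {
    if status.success() {
        r#"{"type":"agent_end"}"#
    } else {
        r#"{"type":"error"}"#
    }
}

/// Non-blocking exit check: returns the line to inject into glasspane,
/// or None while the agent is still running.
fn poll_exit_event(child: &mut Child) -> io::Result<Option<&'static str>> {
    Ok(child.try_wait()?.map(lifecycle_event_for))
}
```

On each 30s tick the daemon would call something like poll_exit_event per agent and pass any returned line to ingest_line_at with the agent's pane ID.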

Stalled detection

Stalled is derived in the snapshot layer, not stored as mutable state:

pub fn is_stalled_at(&self, now: SystemTime, stall_after: Duration) -> bool {
    if !matches!(self.state(), AgentState::Working | AgentState::Blocked) {
        return false;
    }
    let silence_since = self.last_event_at().unwrap_or(self.started_at);
    now.duration_since(silence_since)
        .is_ok_and(|silent_for| silent_for >= stall_after)
}
  • Only Working and Blocked panes can be stalled.
  • DEFAULT_STALL_AFTER is 4 hours.
  • Done and Error panes are never stalled (they've already reached a terminal state).
  • The daemon heartbeat's agent_stall_timeout (300s default) is a separate concept: it detects dead subprocesses, not semantic stalling. The heartbeat timeout triggers event injection; glasspane stalled is a display concern.
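
A standalone version of the stall check makes the rules above easy to exercise with fixed timestamps. It restates the method as a free function. The parameter list is an assumption made for illustration; the logic follows the is_stalled_at body shown earlier.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// 4 hours, matching the DEFAULT_STALL_AFTER value stated in this contract.
const DEFAULT_STALL_AFTER: Duration = Duration::from_secs(4 * 60 * 60);

/// Stalled is derived on demand from timestamps; nothing is stored.
fn is_stalled_at(
    state: AgentState,
    last_event_at: Option<SystemTime>,
    started_at: SystemTime,
    now: SystemTime,
    stall_after: Duration,
) -> bool {
    // Only active states can stall.
    if !matches!(state, AgentState::Working | AgentState::Blocked) {
        return false;
    }
    // Before the first event, silence is measured from pane start.
    let silence_since = last_event_at.unwrap_or(started_at);
    // duration_since fails if the clock moved backwards; treat that as not stalled.
    now.duration_since(silence_since)
        .is_ok_and(|silent_for| silent_for >= stall_after)
}
```

Because the comparison is >=, a pane that has been silent for exactly stall_after already counts as stalled.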

4. Unified API vocabulary

Consistent naming across colibri-daemon and colibri-glasspane. Every term below means exactly one thing.

Glasspane / supervision namespace

| Term | Rust type / fn | Owner | Meaning |
| --- | --- | --- | --- |
| pane | SupervisedPane, Pane | glasspane | One agent occupying one supervision slot |
| pane.id | PaneId (type alias String) | glasspane | Daemon-assigned unique ID (= agent ID) |
| attach_pane_at | PaneSupervisor::attach_pane_at | glasspane | Register a new pane in the supervisor |
| ingest_line_at | PaneSupervisor::ingest_line_at | glasspane | Feed one JSONL line at a wall-clock time |
| ingest_jsonl_reader_at | PaneSupervisor::ingest_jsonl_reader_at | glasspane | Feed a BufRead to the supervisor |
| snapshot_at | PaneSupervisor::snapshot_at | glasspane | Produce GlasspaneSnapshot for all panes |
| state | AgentState enum | glasspane | Semantic agent state (5 variants) |
| stalled | Pane::stalled (derived bool) | glasspane | Event silence meets or exceeds the stall_after threshold |
| pi_session_id | Option<String> on Pane | glasspane | Pi session ID captured from JSONL header |
| last_event_at | Option<SystemTime> | glasspane | Wall-clock time of last accepted Pi event |
| cwd | Option<String> on Pane | glasspane | Working directory from Pi session header |
| apply_pi_event | fn(AgentState, &str) -> AgentState | glasspane | Pure state transition function |
| fold_pi_events | fn(Iterator<&str>) -> AgentState | glasspane | Fold a sequence of event types |
| DEFAULT_STALL_AFTER | Duration (4 hours) | glasspane | Default stall silence threshold |
| GLASSPANE_SNAPSHOT_SCHEMA | &str = "clawdie.glasspane.snapshot.v1" | glasspane | Schema constant for all snapshots |

Daemon / lifecycle namespace

| Term | Rust type / fn | Owner | Meaning |
| --- | --- | --- | --- |
| agent | AgentHandle | daemon | Running agent subprocess handle |
| agent.id | String (UUIDv4) | daemon | Same value as pane.id |
| session | Session | daemon | JSONL-backed conversation store |
| session.id | String | daemon | Caller-supplied or generated session key |
| spawn | Spawner::spawn | daemon | Launch agent subprocess with retry/backoff |
| kill | AgentHandle::kill | daemon | SIGKILL agent, update status |
| poll_exit | AgentHandle::poll_exit | daemon | Non-blocking exit check (heartbeat) |
| compact | Session::compact_oldest_turns | daemon | Compaction triggered by byte/turn thresholds |
| prune | Session::prune_to | daemon | Aggressive pruning after compaction |
| heartbeat | fn heartbeat in daemon loop | daemon | 30s tick: check exits, detect stalls |
| session_rotation | fn session_rotation in daemon loop | daemon | 60s tick: compact/prune sessions |
| memory_handoff | fn memory_handoff in daemon loop | daemon | 120s tick: cross-agent context sharing |

Provider namespace

| Term | Rust type / fn | Owner | Meaning |
| --- | --- | --- | --- |
| provider | Provider enum | daemon | LLM backend (DeepSeek, OpenRouter, Anthropic) |
| provider (socket) | "deepseek", "openrouter", "anthropic", "local" | daemon | String form in spawn-agent command (local is no-network/fake-agent smoke only; model = executable path) |

Socket command namespace

| Term | Wire cmd value | Owner | Meaning |
| --- | --- | --- | --- |
| status | "status" | daemon | Health check |
| glasspane-snapshot | "glasspane-snapshot" | daemon/glasspane | Read full supervision snapshot |
| list-sessions | "list-sessions" | daemon | Enumerate sessions |
| spawn-agent | "spawn-agent" | daemon | Spawn agent + attach pane |
| kill-agent | "kill-agent" | daemon | Kill agent + ingest error event |
| get-session | "get-session" | daemon | Dump full session |
| compact-session | "compact-session" | daemon | Manual compaction trigger |

Event taxonomy (colibri-pi-events, shared)

These are the Pi --mode json type field values recognized by apply_pi_event:

| Event type | Maps to state | Notes |
| --- | --- | --- |
| session | Idle | Captures pi_session_id and cwd |
| session_started | Idle | Alternative header form |
| agent_start | Working | Agent lifecycle begin |
| turn_start | Working | Turn/task begin |
| message_start | Working | LLM message streaming begin |
| message_update | Working | LLM message streaming chunk |
| message_end | Working | LLM message streaming complete |
| tool_execution_start | Working | Tool invocation begin |
| tool_execution_update | Working | Tool invocation progress |
| tool_execution_end | Working | Tool invocation complete |
| auto_compaction_start | Working | Automatic context compaction begin |
| auto_compaction_end | Working | Automatic context compaction complete |
| compaction_start | Working | Legacy compaction begin |
| compaction_end | Working | Legacy compaction complete |
| auto_retry_start | Working | Automatic retry begin |
| auto_retry_end | Working | Automatic retry complete |
| queue_update | Blocked | Steering/approval/input required |
| turn_end | Done | Turn/task complete |
| agent_end | Done | Agent lifecycle complete |
| error | Error | Terminal failure |

5. Contract: clawdie.glasspane.snapshot.v1

Where it is defined

Defined in colibri-glasspane/src/lib.rs:

pub const GLASSPANE_SNAPSHOT_SCHEMA: &str = "clawdie.glasspane.snapshot.v1";

Rust type

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct GlasspaneSnapshot {
    pub schema: String,          // "clawdie.glasspane.snapshot.v1"
    pub host: String,            // Hostname from DaemonConfig.host
    pub observed_at: String,     // RFC 3339 with milliseconds (e.g. "2026-05-27T12:00:00.000Z")
    pub panes: Vec<Pane>,        // All supervised panes
}

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct Pane {
    pub id: String,                          // Daemon-assigned pane ID (= agent ID)
    pub agent: String,                       // Agent binary name (e.g. "pi", "hermes-agent")
    pub state: AgentState,                   // "idle" | "working" | "blocked" | "done" | "error"
    pub pi_session_id: Option<String>,       // Pi session ID from JSONL header
    pub last_event_at: Option<String>,       // RFC 3339 of last accepted event
    pub cwd: Option<String>,                 // Working directory from Pi session header
    pub stalled: bool,                       // Derived: event silence >= DEFAULT_STALL_AFTER
}

Where it is produced

Exactly one place: PaneSupervisor::snapshot_at(host, observed_at, stall_after) in colibri-glasspane. Called by cmd_glasspane_snapshot in colibri-daemon/src/socket.rs.

// In cmd_glasspane_snapshot:
let snapshot = state.glasspane.read().await.snapshot_at(
    state.config.host.clone(),
    SystemTime::now(),
    DEFAULT_STALL_AFTER,
);

Where it is consumed

| Consumer | Transport | Phase | Purpose |
| --- | --- | --- | --- |
| colibri CLI / colibri-tui | Unix socket | 4 | Native operator dashboard and smoke surface |
| Herdr (Linux/macOS optional) | Unix socket/bridge | 4 | Optional external display client, not source |
| Zed / web board | HTTP / SSE | 4 | Web-based supervision view |
| colibri-orchestrator | In-memory / socket | 5 | Route/dispatch work across panes |

Wire shape (JSON)

{
  "schema": "clawdie.glasspane.snapshot.v1",
  "host": "domedog",
  "observed_at": "2026-05-27T12:00:00.123Z",
  "panes": [
    {
      "id": "a1b2c3d4-e5f6-...",
      "agent": "pi",
      "state": "working",
      "pi_session_id": "019e5e59-6645-7e21-aca2-b57ccf0f8578",
      "last_event_at": "2026-05-27T11:59:58.456Z",
      "cwd": "/home/clawdija/clawdie-ai"
    }
  ]
}

Serialization rules:

  • pi_session_id, last_event_at, and cwd are omitted when None (#[serde(skip_serializing_if = "Option::is_none")]).
  • stalled is omitted when false (#[serde(skip_serializing_if = "skip_false")]), so consumers must treat a missing stalled field as false.
  • AgentState serializes as lowercase: "idle", "working", "blocked", "done", "error".
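
The omission rules can be demonstrated without serde by building the pane object by hand. This is only a sketch of the resulting wire shape: the real types derive Serialize as shown earlier. pane_to_json and as_wire are hypothetical helpers, and the code assumes field values need no JSON escaping.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

impl AgentState {
    /// Lowercase wire form of each variant.
    fn as_wire(self) -> &'static str {
        match self {
            AgentState::Idle => "idle",
            AgentState::Working => "working",
            AgentState::Blocked => "blocked",
            AgentState::Done => "done",
            AgentState::Error => "error",
        }
    }
}

struct Pane {
    id: String,
    agent: String,
    state: AgentState,
    pi_session_id: Option<String>,
    last_event_at: Option<String>,
    cwd: Option<String>,
    stalled: bool,
}

/// Hand-rolled serializer that applies the three rules above.
/// Assumption: no value contains characters that need JSON escaping.
fn pane_to_json(pane: &Pane) -> String {
    let mut fields = vec![
        format!("\"id\":\"{}\"", pane.id),
        format!("\"agent\":\"{}\"", pane.agent),
        format!("\"state\":\"{}\"", pane.state.as_wire()),
    ];
    // Optional fields are skipped entirely when None.
    if let Some(value) = &pane.pi_session_id {
        fields.push(format!("\"pi_session_id\":\"{}\"", value));
    }
    if let Some(value) = &pane.last_event_at {
        fields.push(format!("\"last_event_at\":\"{}\"", value));
    }
    if let Some(value) = &pane.cwd {
        fields.push(format!("\"cwd\":\"{}\"", value));
    }
    // stalled appears only when true.
    if pane.stalled {
        fields.push("\"stalled\":true".to_string());
    }
    format!("{{{}}}", fields.join(","))
}
```

A freshly attached pane therefore serializes to just id, agent, and state, which is the smallest object a consumer has to accept.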

Promotion path

The schema constant and types currently live in colibri-glasspane. Once a second consumer (a display client binary separate from the daemon) needs to deserialize GlasspaneSnapshot, the types should be promoted to a shared colibri-contracts crate. Until then, the crate boundary is sufficient — the daemon depends on colibri-glasspane and links it directly.


6. Boot sequence

Which starts first?

colibri-daemon starts first and starts alone. colibri-glasspane is a library crate, not a process — it is compiled into the daemon binary.

Startup order

1. CLI parses args, loads DaemonConfig (env or toml)
          │
2. DaemonState::new(config) ──► PaneSupervisor::new() (empty BTreeMap)
          │
3. Daemon background loop spawned (tokio::spawn)
   ├── heartbeat tick (30s)
   ├── session_rotation tick (60s)
   └── memory_handoff tick (120s)
          │
4. socket::serve(state, shutdown_rx) ← BLOCKING
   ├── Binds Unix socket at config.socket_path
   ├── Accepts connections
   └── Dispatches ColibriCommand variants
          │
5. External clients (colibri CLI/TUI, optional Herdr Linux/macOS, web) connect
   and send commands

Clean boot checklist (in sequence)

  1. Remove stale socket file if it exists.
  2. Create parent directory for socket if needed.
  3. Bind UnixListener.
  4. Spawn daemon loop task.
  5. Enter accept loop — the daemon is now ready.
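
The first three checklist steps can be sketched with the synchronous standard-library listener. The daemon itself is described as tokio-based, so this is an illustration of the sequence only, and bind_clean is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Steps 1 to 3 of the clean boot checklist.
fn bind_clean(socket_path: &Path) -> io::Result<UnixListener> {
    // 1. Remove a stale socket file left by a previous run, if any.
    match fs::remove_file(socket_path) {
        Ok(()) => {}
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }
    // 2. Create the parent directory for the socket if needed.
    if let Some(parent) = socket_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // 3. Bind the listener. Spawning the daemon loop and entering the
    //    accept loop (steps 4 and 5) are left to the caller.
    UnixListener::bind(socket_path)
}
```

Removing the stale file first matters because binding to a path that already exists fails with an address-in-use error, and dropping a listener does not delete its socket file.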

Shutdown sequence

  1. The daemon receives the shutdown signal on shutdown_rx (triggered by SIGINT/SIGTERM or an explicit shutdown command).
  2. Socket server breaks its accept loop.
  3. Daemon loop task breaks its select loop.
  4. Socket file is removed.
  5. All agent subprocesses are killed via AgentHandle::kill().
  6. Process exits.

No discovery needed

There is no service discovery between daemon and glasspane because glasspane is embedded. External clients discover the daemon by connecting to the well-known Unix socket path (DaemonConfig.socket_path).


Cross-reference

| Document | Relationship |
| --- | --- |
| docs/COLIBRI-GLASSPANE-DESIGN.md | Glasspane capability design, phase plan, non-goals |
| crates/colibri-client/src/lib.rs | Phase-4 typed Unix-socket client for display/UI consumers |
| docs/HERDR-VS-COLIBRI-GRAPH.md | Hybrid boundary: Herdr as Linux display client |
| crates/colibri-daemon/src/socket.rs | Socket server implementation |
| crates/colibri-daemon/src/daemon.rs | Daemon background loop + heartbeat |
| crates/colibri-daemon/src/lib.rs | Wire types: ColibriCommand, ColibriResponse |
| crates/colibri-daemon/src/spawner.rs | Agent subprocess spawner |
| crates/colibri-glasspane/src/lib.rs | State machine, supervisor, snapshot contract |