Operator & Codex c5820dec84 Document Colibri Pi control plan
Park the Pi-only control simplification and cross-host run contract so other agents can review before implementation starts.

---
2026-05-24 19:26:19 +02:00


Control Plane Architecture

Overview

Starting with v0.10.0, Clawdie has a built-in control plane for multi-agent orchestration. The control plane is integrated directly into the Clawdie service running on the host.

Single clawdie service:
  ├── Telegram intake (existing)
  ├── HTTP REST API (control plane routes — NEW)
  ├── Unified scheduler (coordinates all agent work)
  ├── Agentic harness (TUI + extensions, planned)
  └── Shared hostd access (privileged operations)

Single-tenant: one control plane per host. Multi-tenant support with jailed companies is deferred to Phase 8.


Service Layout

HOST (osa)
│
├── clawdie service (unified)
│   ├── Telegram bot (grammy)
│   │   └── Messages → PostgreSQL ops DB → Unified scheduler
│   │
│   ├── HTTP API (port 3100)
│   │   ├── Control plane REST routes (agents, tasks, activity, approvals)
│   │   └── Health/metrics endpoints
│   │
│   ├── Unified scheduler (ticks every 30s)
│   │   ├── Process Telegram tasks
│   │   ├── Process HTTP API tasks
│   │   ├── Coordinate agent heartbeats
│   │   └── Enforce budgets + approvals
│   │
│   ├── Control plane runner
│   │   ├── spawn("pi", ...) on host with CONTROLPLANE_* env
│   │   └── jailPi/jailAider via bastille exec (when CONTROLPLANE_JAIL_ISOLATION=YES)
│   │
│   ├── IPC watcher (existing)
│   │   └── Reads JSON files from agents → dispatches
│   │
│   └── Watchdog (existing)
│       └── Health checks, concurrency control, mode switching
│
├── hostd daemon (existing)
│   └── Privileged ops (bastille, zfs, pf) via Unix socket
│       └── API proxy: POST /api/controlplane/hostd (for jail agents)
│
└── PostgreSQL Data Service (host by default, optional db jail)
    ├── system_* platform databases (existing)
    └── control plane tables (agents, tasks, agent_activity, agent_budgets, approvals, operators)
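
The unified scheduler's tick can be pictured as a selection step over the shared tasks table. Telegram and HTTP API tasks enter one queue, and a task is dispatched only if its approval, when required, has been granted and its agent is within budget. The sketch below is a minimal illustration, not the service's code. The field names (`source`, `needsApproval`, `approved`, `agentId`) and the `withinBudget` callback are hypothetical.

```typescript
// Hypothetical shape of a row in the shared `tasks` table.
type TaskSource = "telegram" | "http";

interface Task {
  id: string;
  source: TaskSource;     // Telegram intake or HTTP API
  agentId: string;        // agent the task is routed to
  createdAt: number;      // epoch milliseconds
  needsApproval: boolean; // expensive operation requiring operator sign-off
  approved: boolean;      // assumed to be set once an approval is recorded
}

const TICK_INTERVAL_MS = 30_000; // the scheduler ticks every 30s

// One scheduler tick: merge both intake sources oldest-first, then keep only
// tasks that pass the approval gate and the per-agent budget check.
function selectRunnable(
  tasks: readonly Task[],
  withinBudget: (agentId: string) => boolean,
): Task[] {
  return [...tasks] // copy so the caller's array is not reordered
    .sort((a, b) => a.createdAt - b.createdAt)
    .filter((t) => !t.needsApproval || t.approved)
    .filter((t) => withinBudget(t.agentId));
}

// Wiring sketch with hypothetical helpers:
// setInterval(() => dispatch(selectRunnable(loadTasks(), checkBudget)), TICK_INTERVAL_MS);
```

Keeping the selection pure makes the approval and budget gates testable without a database or timer.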

Network & Domains

| Service | IP/Port | Domain / Path | Purpose |
|---|---|---|---|
| HTTP API | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | ai.clawdie.home.arpa (internal) | REST API for agents + UI |
| HTTP API | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | https://ai.clawdie.si (public via nginx) | Operator API for harness |
| Auth | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | /api/auth/* | Better Auth (email+password sessions) |
| Telegram bot | N/A | Telegram API | Incoming messages |
| PostgreSQL | host warden0:5432 by default; db jail when selected | Internal (warden0) | Shared data store |

Harness access (planned):

  • Operator works from a terminal UI (TUI) and extensions, not a browser dashboard.
  • For Tailscale access, host nginx terminates TLS for the MagicDNS hostname and proxies to http://127.0.0.1:{CONTROLPLANE_API_PORT} (default 3100).
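
The listener address and the nginx upstream both come from CONTROLPLANE_API_PORT, which defaults to 3100. Below is a minimal sketch of resolving that value. The function names are hypothetical. Rejecting anything that is not an integer in 1..65535 is an assumption, not documented behaviour.

```typescript
const DEFAULT_CONTROLPLANE_API_PORT = 3100;

// Resolve the control plane API port from the environment, falling back to 3100.
function resolveApiPort(env: Record<string, string | undefined>): number {
  const raw = env.CONTROLPLANE_API_PORT?.trim();
  if (!raw) return DEFAULT_CONTROLPLANE_API_PORT;
  const port = Number(raw);
  // Assumed validation: must be a whole number in the TCP port range.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid CONTROLPLANE_API_PORT: ${raw}`);
  }
  return port;
}

// The service listens on all interfaces.
function listenAddress(port: number): string {
  return `0.0.0.0:${port}`;
}

// nginx (public and Tailscale) proxies to loopback on the same port.
function proxyUpstream(port: number): string {
  return `http://127.0.0.1:${port}`;
}
```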

Database

PostgreSQL uses the shared Data Service: host PostgreSQL by default, or the db jail when DB_RUNTIME=jail is explicitly selected.

agents             — agent registry (Orchestrator, Sysadmin, DBA, Git Admin)
tasks              — work items (shared by Telegram + HTTP API)
agent_activity     — immutable audit trail
agent_budgets      — token spend limits per agent
approvals          — operator sign-offs for expensive operations
operators          — human admin accounts
cp_users           — Better Auth user accounts
cp_sessions        — Better Auth sessions
cp_accounts        — Better Auth credential accounts
cp_verifications   — Better Auth verification tokens

There is no systems table because the control plane is single-tenant.
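
The agent_activity table is described as an immutable audit trail. One way to mirror that in application code is an append-only log whose entries are frozen once written. The row shape below is a hypothetical sketch, not the actual table schema.

```typescript
// Hypothetical row shape for the `agent_activity` audit trail.
interface ActivityEntry {
  readonly id: number;
  readonly agentId: string;
  readonly taskId: string | null;
  readonly action: string;
  readonly at: string; // ISO-8601 timestamp
}

// Append-only log: entries can be added and read, never updated or deleted.
class ActivityLog {
  private readonly entries: ActivityEntry[] = [];

  append(agentId: string, action: string, taskId: string | null = null): ActivityEntry {
    const entry: ActivityEntry = Object.freeze({
      id: this.entries.length + 1,
      agentId,
      taskId,
      action,
      at: new Date().toISOString(),
    });
    this.entries.push(entry);
    return entry;
  }

  // Return a copy so callers cannot mutate the underlying array.
  list(): readonly ActivityEntry[] {
    return [...this.entries];
  }
}
```

In PostgreSQL, a comparable guarantee would typically come from granting the service role only INSERT and SELECT on the table.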


Security Model

  • Telegram: Existing auth (chat ID → group registration)
  • HTTP API: Session cookies for the operator, Bearer API keys for agents
  • Operator account: Auto-created during setup, password hashed with bcrypt/argon2
  • Agent API keys: Hashed bearer tokens, injected via CONTROLPLANE_* env vars at spawn time
  • Agent-to-API auth: Requires Bearer token matching CONTROLPLANE_SHARED_SECRET
  • Secrets: Encrypted at rest (AES-256-GCM), master key at ~/.clawdie/secrets/master.key
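
The bearer check and the encryption at rest can both be sketched with Node's built-in crypto module. For the bearer check, hash the presented token and compare it in constant time with the stored hash, so the plaintext key is never stored. The document does not say which hash is used: SHA-256, the `Authorization: Bearer` parsing, and the helper names are assumptions. The sealing functions use AES-256-GCM with a caller-supplied 32-byte key. Loading it from ~/.clawdie/secrets/master.key is left out.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Assumed: API keys are stored as hex-encoded SHA-256 digests.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Validate an `Authorization: Bearer <token>` header against a stored hash.
function verifyBearer(header: string | undefined, storedHashHex: string): boolean {
  const match = /^Bearer\s+(\S+)$/.exec(header ?? "");
  if (!match) return false;
  const presented = Buffer.from(hashToken(match[1]), "hex");
  const stored = Buffer.from(storedHashHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return presented.length === stored.length && timingSafeEqual(presented, stored);
}

// AES-256-GCM sealing: output is iv (12 bytes) || auth tag (16 bytes) || ciphertext.
function sealSecret(masterKey: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

function openSecret(masterKey: Buffer, sealed: Buffer): string {
  const iv = sealed.subarray(0, 12);
  const tag = sealed.subarray(12, 28);
  const ciphertext = sealed.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag); // final() throws if the data was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```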

Heartbeat Policy

| Role | Heartbeat | Policy |
|---|---|---|
| Orchestrator | disabled | On-demand only (cost-conscious) |
| Sysadmin | enabled, 86400s | Daily health check + on-demand |
| DBA | disabled | On-demand only |
| Git Admin | disabled | On-demand only |
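
The heartbeat policy reduces to a per-role setting plus a due check on each scheduler tick. Below is a sketch that assumes the scheduler records when each role last ran a heartbeat. The type and function names are hypothetical.

```typescript
type Role = "Orchestrator" | "Sysadmin" | "DBA" | "Git Admin";

interface HeartbeatPolicy {
  enabled: boolean;
  intervalSec?: number; // only meaningful when enabled
}

// Values from the Heartbeat Policy table.
const HEARTBEATS: Record<Role, HeartbeatPolicy> = {
  Orchestrator: { enabled: false },
  Sysadmin: { enabled: true, intervalSec: 86_400 },
  DBA: { enabled: false },
  "Git Admin": { enabled: false },
};

// True when a scheduled heartbeat should run on this tick.
// Disabled roles never run on a timer; they only run on demand.
function heartbeatDue(role: Role, lastRunMs: number | null, nowMs: number): boolean {
  const policy = HEARTBEATS[role];
  if (!policy.enabled || policy.intervalSec === undefined) return false;
  if (lastRunMs === null) return true; // never ran: run now
  return nowMs - lastRunMs >= policy.intervalSec * 1000;
}
```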

Token Budgets

Daily total: 100,000 tokens (configurable via CONTROLPLANE_DAILY_TOKENS)
├── Orchestrator: 80,000 (80%)
├── Sysadmin:   10,000 (10%)
├── DBA:         5,000 (5%)
└── Git Admin:   5,000 (5%)

Each agent has a hard limit. Budgets reset at UTC midnight, and the budget is checked before every agent spawn.
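
The budget rules have three parts. There is a fixed percentage split of the daily total (80% of 100,000 is 80,000 for the Orchestrator), a hard per-agent limit, and a reset keyed to the UTC date. Below is a sketch under those rules. `dailyTotal` stands in for CONTROLPLANE_DAILY_TOKENS. The in-memory usage map replaces the agent_budgets table. Rounding down for totals that do not divide evenly is an assumption. The names are hypothetical.

```typescript
// Percent of CONTROLPLANE_DAILY_TOKENS per role (integers keep the math exact).
const SHARE_PERCENT = {
  Orchestrator: 80,
  Sysadmin: 10,
  DBA: 5,
  "Git Admin": 5,
} as const;

type BudgetRole = keyof typeof SHARE_PERCENT;

// Per-agent hard limits for a given daily total (default 100,000 tokens).
function allocate(dailyTotal = 100_000): Record<BudgetRole, number> {
  const out = {} as Record<BudgetRole, number>;
  for (const role of Object.keys(SHARE_PERCENT) as BudgetRole[]) {
    out[role] = Math.floor((dailyTotal * SHARE_PERCENT[role]) / 100);
  }
  return out;
}

// Budgets reset at UTC midnight, so usage is keyed by the UTC calendar date.
function utcDayKey(now: Date): string {
  return now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
}

// Pre-spawn check: refuse once today's usage has reached the hard limit.
function canSpawn(
  role: BudgetRole,
  usage: Map<string, number>, // key: `${day}:${role}`
  now: Date,
  dailyTotal = 100_000,
): boolean {
  const used = usage.get(`${utcDayKey(now)}:${role}`) ?? 0;
  return used < allocate(dailyTotal)[role];
}
```

Keying usage by UTC day means no reset job is needed. A new day simply starts with no recorded usage.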


Jail Isolation (Phase 7)

When CONTROLPLANE_JAIL_ISOLATION=YES in .env, specialist agents run inside dedicated FreeBSD jails instead of on the host:

| Agent ID | Jail |
|---|---|
| sysadmin | host (no jail; privileged ops) |
| db-admin | db-worker |
| git-admin | git-worker |
| coordinator | ctrl-worker |

CONTROLPLANE_JAIL_ISOLATION defaults to NO. The agent-to-jail mapping is AGENT_JAIL_MAP in src/controlplane-runner.ts.
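
With the flag enabled, the runner chooses per agent between a host process and `bastille exec <jail> <command>`. Below is a sketch that builds the command line without running it. The map literal mirrors the table above. It is not the contents of src/controlplane-runner.ts, and several choices are assumptions: the function names, keeping only CONTROLPLANE_* variables, and passing them into the jail through env(1).

```typescript
// Mirrors the Jail Isolation table; null means "run on the host".
const AGENT_JAIL_MAP: Record<string, string | null> = {
  sysadmin: null, // privileged ops stay on the host
  "db-admin": "db-worker",
  "git-admin": "git-worker",
  coordinator: "ctrl-worker",
};

interface SpawnPlan {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Keep only CONTROLPLANE_* variables for the agent process.
function controlplaneEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(env)) {
    if (k.startsWith("CONTROLPLANE_") && v !== undefined) out[k] = v;
  }
  return out;
}

function planAgentSpawn(
  agentId: string,
  piArgs: string[],
  env: Record<string, string | undefined>,
): SpawnPlan {
  if (!Object.prototype.hasOwnProperty.call(AGENT_JAIL_MAP, agentId)) {
    throw new Error(`unknown agent: ${agentId}`);
  }
  const agentEnv = controlplaneEnv(env);
  const jail = env.CONTROLPLANE_JAIL_ISOLATION === "YES" ? AGENT_JAIL_MAP[agentId] : null;
  if (jail === null) {
    // Host path: equivalent to spawn("pi", args, { env }) on the host.
    return { command: "pi", args: piArgs, env: agentEnv };
  }
  // Jail path (assumed): pass variables explicitly via env(1) inside the jail.
  const assignments = Object.entries(agentEnv).map(([k, v]) => `${k}=${v}`);
  return { command: "bastille", args: ["exec", jail, "env", ...assignments, "pi", ...piArgs], env: {} };
}
```

Assignments passed through env(1) are visible in the jail's process list. A real runner may prefer another way to hand secrets to the jailed process.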


Prompt Guardrails

Configurable limits prevent runaway context growth:

| Variable | Purpose |
|---|---|
| AGENT_MAX_INBOUND_CHARS | Max characters per incoming message |
| AGENT_MAX_BACKLOG_MESSAGES | Max messages included from session backlog |
| AGENT_MAX_BACKLOG_CHARS | Max total characters from backlog messages |
| AGENT_MAX_PROMPT_CHARS | Max total prompt size sent to model |
| AGENT_SESSION_MAX_BYTES | Max bytes per session before rollover |
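
One way the prompt limits could compose: truncate the incoming message, keep the newest backlog messages that fit both backlog limits, then cap the assembled prompt. The variable names come from the table above. The ordering, the newest-first selection, and trimming from the front of the prompt are assumptions.

```typescript
interface Guardrails {
  maxInboundChars: number;    // AGENT_MAX_INBOUND_CHARS
  maxBacklogMessages: number; // AGENT_MAX_BACKLOG_MESSAGES
  maxBacklogChars: number;    // AGENT_MAX_BACKLOG_CHARS
  maxPromptChars: number;     // AGENT_MAX_PROMPT_CHARS
}

function buildPrompt(inbound: string, backlog: readonly string[], g: Guardrails): string {
  // 1. Truncate the incoming message.
  const message = inbound.slice(0, g.maxInboundChars);

  // 2. Walk the backlog from newest to oldest, stopping at either limit.
  const kept: string[] = [];
  let backlogChars = 0;
  for (let i = backlog.length - 1; i >= 0 && kept.length < g.maxBacklogMessages; i--) {
    const entry = backlog[i];
    if (backlogChars + entry.length > g.maxBacklogChars) break;
    kept.unshift(entry); // preserve chronological order
    backlogChars += entry.length;
  }

  // 3. Assemble and cap the total, keeping the end (the newest content).
  const prompt = [...kept, message].join("\n");
  return prompt.length > g.maxPromptChars ? prompt.slice(prompt.length - g.maxPromptChars) : prompt;
}
```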

Context-Exceeded Handling

When the model returns a context_window_exceeded error, the agent replies with a short guidance message instead of retrying indefinitely. The message cursor is not rolled back, so the same message is not processed again.
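
Below is a sketch of that handling. The error is recognised by its code, the user gets a fixed guidance reply, and the cursor still advances past the message. The error shape (`{ code: "context_window_exceeded" }`), the guidance text, and the cursor type are assumptions.

```typescript
interface Cursor {
  lastProcessedId: number;
}

interface TurnResult {
  reply: string;
  cursor: Cursor;
}

const CONTEXT_GUIDANCE =
  "That request is too large for the model's context window. Please shorten it or split it into smaller steps.";

function isContextExceeded(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "context_window_exceeded";
}

// Run one model turn for `messageId`. On context overflow, reply with
// guidance instead of retrying, and advance the cursor anyway.
async function runTurn(
  messageId: number,
  cursor: Cursor,
  callModel: () => Promise<string>,
): Promise<TurnResult> {
  const advanced: Cursor = { lastProcessedId: Math.max(cursor.lastProcessedId, messageId) };
  try {
    return { reply: await callModel(), cursor: advanced };
  } catch (err) {
    if (isContextExceeded(err)) {
      return { reply: CONTEXT_GUIDANCE, cursor: advanced }; // no retry, no rollback
    }
    throw err; // other errors are left to the caller's retry policy
  }
}
```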

Session Rollover

When a session exceeds AGENT_SESSION_MAX_BYTES, it is rolled over: a fresh session is started and the old one is archived.
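
Below is a sketch of the rollover check that measures the serialised session in UTF-8 bytes. Several details are assumptions: the `Session` shape, the archive array, and checking before the next entry is appended. The document only states that a fresh session starts and the old one is archived.

```typescript
interface Session {
  id: number;
  entries: string[];
}

// Size of a session as it would be serialised, in UTF-8 bytes.
function sessionBytes(s: Session): number {
  return new TextEncoder().encode(JSON.stringify(s.entries)).length;
}

// Append an entry; if the session would exceed maxBytes, archive it and
// continue in a fresh session. An empty session is never archived, so a
// single oversized entry stays in the current session.
function appendWithRollover(
  current: Session,
  entry: string,
  maxBytes: number, // AGENT_SESSION_MAX_BYTES
  archive: Session[],
): Session {
  const candidate: Session = { ...current, entries: [...current.entries, entry] };
  if (sessionBytes(candidate) <= maxBytes || current.entries.length === 0) {
    return candidate;
  }
  archive.push(current);
  return { id: current.id + 1, entries: [entry] };
}
```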


References

  • doc/CONTROLPLANE-AGENT-ROLES.md — role definitions, skill mapping
  • doc/CONTROLPLANE-MESSAGE-CONTRACT.md — API contracts
  • doc/COLIBRI-PI-CONTROL-PLAN.md — planned Pi-only simplification and Colibri event fabric
  • doc/INTERAGENT-RUN-CONTRACT.md — cross-host run manifest and artifact exchange contract