Control Plane Architecture
Overview
Starting with v0.10.0, Clawdie has a built-in control plane for multi-agent orchestration. The control plane is integrated directly into the Clawdie service running on the host.
Single clawdie service:
├── Telegram intake (existing)
├── HTTP REST API (control plane routes — NEW)
├── Unified scheduler (coordinates all agent work)
├── Agentic harness (TUI + extensions, planned)
└── Shared hostd access (privileged operations)
Single-tenant: one control plane per host. Multi-tenant support (jailed companies) is deferred to Phase 8.
Service Layout
HOST (osa)
│
├── clawdie service (unified)
│ ├── Telegram bot (grammy)
│ │ └── Messages → PostgreSQL ops DB → Unified scheduler
│ │
│ ├── HTTP API (port 3100)
│ │ ├── Control plane REST routes (agents, tasks, activity, approvals)
│ │ └── Health/metrics endpoints
│ │
│ ├── Unified scheduler (ticks every 30s)
│ │ ├── Process Telegram tasks
│ │ ├── Process HTTP API tasks
│ │ ├── Coordinate agent heartbeats
│ │ └── Enforce budgets + approvals
│ │
│ ├── Control plane runner
│ │ ├── spawn("pi", ...) on host with CONTROLPLANE_* env
│ │ └── jailPi/jailAider via bastille exec (when CONTROLPLANE_JAIL_ISOLATION=YES)
│ │
│ ├── IPC watcher (existing)
│ │ └── Reads JSON files from agents → dispatches
│ │
│ └── Watchdog (existing)
│ └── Health checks, concurrency control, mode switching
│
├── hostd daemon (existing)
│ ├── Privileged ops (bastille, zfs, pf) via Unix socket
│ └── API proxy: POST /api/controlplane/hostd (for jail agents)
│
└── PostgreSQL Data Service (host by default, optional db jail)
├── system_* platform databases (existing)
└── control plane tables (agents, tasks, agent_activity, agent_budgets, approvals, operators)
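On the host path, the runner launches pi with CONTROLPLANE_* variables in its environment. The sketch below is a minimal illustration of that step using Node's child_process.spawn. CONTROLPLANE_API_PORT and its default of 3100 come from this document. The AgentRun shape and the variable names CONTROLPLANE_API_URL, CONTROLPLANE_AGENT_ID, CONTROLPLANE_TASK_ID and CONTROLPLANE_API_KEY are hypothetical and are not the names the real runner sets. The jailed path (bastille exec) is not shown.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Hypothetical shape of one agent run; field names are illustrative.
interface AgentRun {
  agentId: string;
  taskId: string;
  apiKey: string; // plaintext key for this run; the control plane stores only a hash
}

// Inherit the parent environment, then layer CONTROLPLANE_* settings on top.
// Only CONTROLPLANE_API_PORT is a documented name; the others are hypothetical.
function buildAgentEnv(run: AgentRun, parentEnv: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  const port = parentEnv.CONTROLPLANE_API_PORT ?? "3100";
  return {
    ...parentEnv,
    CONTROLPLANE_API_URL: `http://127.0.0.1:${port}`,
    CONTROLPLANE_AGENT_ID: run.agentId,
    CONTROLPLANE_TASK_ID: run.taskId,
    CONTROLPLANE_API_KEY: run.apiKey,
  };
}

// Host (non-jailed) path: run `pi` directly with the prepared environment.
function spawnPiOnHost(run: AgentRun, args: string[]): ChildProcess {
  return spawn("pi", args, { env: buildAgentEnv(run), stdio: "inherit" });
}
```

The environment is built in a separate pure function so that it can be checked without starting a process. The same function could also feed a jailed launcher.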
Network & Domains
| Service | IP/Port | Domain | Purpose |
|---|---|---|---|
| HTTP API | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | ai.clawdie.home.arpa (internal) | REST API for agents + UI |
| HTTP API | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | https://ai.clawdie.si (public via nginx) | Operator API for harness |
| Auth | 0.0.0.0:{CONTROLPLANE_API_PORT} (default 3100) | /api/auth/* | Better Auth (email+password sessions) |
| Telegram bot | N/A | Telegram API | Incoming messages |
| PostgreSQL | host warden0:5432 by default; db jail when selected | Internal (warden0) | Shared data store |
Harness access (planned):
- Operator works from a terminal UI (TUI) and extensions, not a browser dashboard.
- Tailscale exposure uses host nginx TLS for the MagicDNS hostname, proxying to http://127.0.0.1:{CONTROLPLANE_API_PORT} (default 3100).
Database
PostgreSQL uses the shared Data Service: host PostgreSQL by default, or the db jail when DB_RUNTIME=jail is explicitly selected.
| Table | Purpose |
|---|---|
| agents | Agent registry (Orchestrator, Sysadmin, DBA, Git Admin) |
| tasks | Work items (shared by Telegram + HTTP API) |
| agent_activity | Immutable audit trail |
| agent_budgets | Token spend limits per agent |
| approvals | Operator sign-offs for expensive operations |
| operators | Human admin accounts |
| cp_users | Better Auth user accounts |
| cp_sessions | Better Auth sessions |
| cp_accounts | Better Auth credential accounts |
| cp_verifications | Better Auth verification tokens |
There is no systems table because the control plane is single-tenant.
Security Model
- Telegram: Existing auth (chat ID → group registration)
- HTTP API: Session cookies for the operator, Bearer API keys for agents
- Operator account: Auto-created during setup, password hashed with bcrypt/argon2
- Agent API keys: Hashed bearer tokens, injected via CONTROLPLANE_* env vars at spawn time
- Agent-to-API auth: Requires a Bearer token matching CONTROLPLANE_SHARED_SECRET
- Secrets: Encrypted at rest (AES-256-GCM), master key at ~/.clawdie/secrets/master.key
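Two of the mechanisms above can be sketched with Node's built-in crypto module. The first is comparing an agent's Bearer token against CONTROLPLANE_SHARED_SECRET. The second is sealing a secret with AES-256-GCM. This is a minimal sketch under several assumptions. The master key is taken to be 32 raw bytes that were loaded elsewhere, since the on-disk format of master.key is not specified here. The sealed layout (12-byte IV, 16-byte auth tag and ciphertext, base64-encoded) is a hypothetical choice. The helper names are illustrative, and the sketch does not describe how Clawdie itself stores secrets.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Check "Authorization: Bearer <token>" against the shared secret.
// Both sides are hashed to 32-byte digests so timingSafeEqual never sees a length mismatch.
function isAuthorizedAgent(authHeader: string | undefined, sharedSecret: string): boolean {
  const prefix = "Bearer ";
  if (!authHeader || !authHeader.startsWith(prefix)) return false;
  const presented = createHash("sha256").update(authHeader.slice(prefix.length)).digest();
  const expected = createHash("sha256").update(sharedSecret).digest();
  return timingSafeEqual(presented, expected);
}

// Seal a secret with AES-256-GCM. Hypothetical layout: iv (12) | tag (16) | ciphertext, base64.
function encryptSecret(plaintext: string, masterKey: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverse of encryptSecret. final() throws if the tag does not verify (tampering or wrong key).
function decryptSecret(sealed: string, masterKey: Buffer): string {
  const raw = Buffer.from(sealed, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM authenticates as well as encrypts. A secret that has been corrupted or decrypted with the wrong key therefore fails loudly instead of returning garbage.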
Heartbeat Policy
| Role | Heartbeat | Policy |
|---|---|---|
| Orchestrator | disabled | On-demand only (cost-conscious) |
| Sysadmin | enabled, 86400s | Daily health check + on-demand |
| DBA | disabled | On-demand only |
| Git Admin | disabled | On-demand only |
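The policy table can be read as a per-role interval that the scheduler checks on each 30-second tick. The sketch below assumes hypothetical role keys and a caller-supplied "last run" timestamp. It covers only scheduled heartbeats, because on-demand work reaches the agents through tasks, not through this check.

```typescript
type Role = "orchestrator" | "sysadmin" | "dba" | "git-admin";

// Mirrors the heartbeat table; null means heartbeats are disabled (on-demand only).
const HEARTBEAT_INTERVAL_SEC: Record<Role, number | null> = {
  orchestrator: null,
  sysadmin: 86_400,
  dba: null,
  "git-admin": null,
};

// Evaluated on every scheduler tick; returns true when a scheduled heartbeat should fire.
function heartbeatDue(role: Role, lastRunMs: number | null, nowMs: number): boolean {
  const interval = HEARTBEAT_INTERVAL_SEC[role];
  if (interval === null) return false; // disabled: only explicit tasks wake this agent
  if (lastRunMs === null) return true; // never run before
  return nowMs - lastRunMs >= interval * 1000;
}
```

With a 30 s tick, a due heartbeat fires at most about 30 s late. That delay is negligible against a daily interval.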
Token Budgets
Daily total: 100,000 tokens (configurable via CONTROLPLANE_DAILY_TOKENS)
├── Orchestrator: 80,000 (80%)
├── Sysadmin: 10,000 (10%)
├── DBA: 5,000 (5%)
└── Git Admin: 5,000 (5%)
Hard limits per agent. Reset at UTC midnight. Budget check before every agent spawn.
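The budget rules combine three pieces: a percentage split of CONTROLPLANE_DAILY_TOKENS, a day boundary at UTC midnight, and a check before each spawn. The sketch below shows one way to express them. The role keys and helper names are hypothetical. It assumes that spend is tracked per agent per UTC date, so the key changes at midnight and the counter effectively resets.

```typescript
// Percent of the daily total per agent, from the split above.
const BUDGET_SHARE_PCT = { orchestrator: 80, sysadmin: 10, dba: 5, "git-admin": 5 } as const;
type Agent = keyof typeof BUDGET_SHARE_PCT;

// CONTROLPLANE_DAILY_TOKENS, falling back to the documented default of 100,000.
function readDailyTotal(env: Record<string, string | undefined>): number {
  const parsed = Number(env.CONTROLPLANE_DAILY_TOKENS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 100_000;
}

// Hard per-agent limit in tokens.
function dailyLimit(agent: Agent, dailyTotal: number): number {
  return Math.floor((dailyTotal * BUDGET_SHARE_PCT[agent]) / 100);
}

// Budget window key: the UTC calendar date, so usage resets at UTC midnight.
function budgetDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Pre-spawn check: refuse once today's spend has reached the agent's limit.
function canSpawn(agent: Agent, spentToday: number, dailyTotal: number): boolean {
  return spentToday < dailyLimit(agent, dailyTotal);
}
```

This check only gates the start of a run. Stopping a run that overshoots its remaining budget mid-flight would need separate enforcement.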
Jail Isolation (Phase 7)
When CONTROLPLANE_JAIL_ISOLATION=YES in .env, specialist agents run inside
dedicated FreeBSD jails instead of on the host:
| Agent ID | Jail |
|---|---|
| sysadmin | host (no jail; privileged ops) |
| db-admin | db-worker |
| git-admin | git-worker |
| coordinator | ctrl-worker |
Feature flag defaults to NO. See src/controlplane-runner.ts (AGENT_JAIL_MAP).
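The table and the feature flag reduce to a lookup gated by CONTROLPLANE_JAIL_ISOLATION. The sketch below is an illustrative copy of the table, not the actual AGENT_JAIL_MAP in src/controlplane-runner.ts. Its choice to run unknown agent IDs on the host is an assumption, and a stricter runner might reject them instead.

```typescript
// Illustrative copy of the jail table; null means "run on the host".
const AGENT_JAIL_MAP: Record<string, string | null> = {
  sysadmin: null, // stays on the host for privileged ops
  "db-admin": "db-worker",
  "git-admin": "git-worker",
  coordinator: "ctrl-worker",
};

// Returns the target jail, or null when the agent should run on the host.
function resolveJail(agentId: string, env: Record<string, string | undefined>): string | null {
  if (env.CONTROLPLANE_JAIL_ISOLATION !== "YES") return null; // flag defaults to NO
  return AGENT_JAIL_MAP[agentId] ?? null; // assumption: unknown agents fall back to the host
}
```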
Prompt Guardrails
Configurable limits prevent runaway context growth:
| Variable | Purpose |
|---|---|
| AGENT_MAX_INBOUND_CHARS | Max characters per incoming message |
| AGENT_MAX_BACKLOG_MESSAGES | Max messages included from session backlog |
| AGENT_MAX_BACKLOG_CHARS | Max total characters from backlog messages |
| AGENT_MAX_PROMPT_CHARS | Max total prompt size sent to model |
| AGENT_SESSION_MAX_BYTES | Max bytes per session before rollover |
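A plausible way to apply the first four limits when assembling a prompt is sketched below. The sketch truncates the new message, fills the backlog newest-first until either backlog limit is hit, and then caps the joined prompt. The Limits interface and buildPrompt are hypothetical names. Dropping the oldest text when the overall cap is exceeded is an assumed policy, and it can cut an old message partway through.

```typescript
// One field per guardrail variable (names in comments); values are supplied by the caller.
interface Limits {
  maxInboundChars: number; // AGENT_MAX_INBOUND_CHARS
  maxBacklogMessages: number; // AGENT_MAX_BACKLOG_MESSAGES
  maxBacklogChars: number; // AGENT_MAX_BACKLOG_CHARS
  maxPromptChars: number; // AGENT_MAX_PROMPT_CHARS
}

function buildPrompt(inbound: string, backlog: string[], limits: Limits): string {
  // 1. Truncate the new message.
  const message = inbound.slice(0, limits.maxInboundChars);

  // 2. Take backlog messages newest-first until a count or character limit is reached.
  const kept: string[] = [];
  let backlogChars = 0;
  for (const msg of [...backlog].reverse()) {
    if (kept.length >= limits.maxBacklogMessages) break;
    if (backlogChars + msg.length > limits.maxBacklogChars) break;
    backlogChars += msg.length;
    kept.unshift(msg); // keep chronological order
  }

  // 3. Join and enforce the overall cap, dropping the oldest text first.
  const prompt = [...kept, message].join("\n");
  return prompt.length <= limits.maxPromptChars ? prompt : prompt.slice(prompt.length - limits.maxPromptChars);
}
```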
Context-Exceeded Handling
When the model hits context_window_exceeded, the agent replies with a short guidance message instead of retrying indefinitely. The cursor is not rolled back.
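The sketch below shows the no-retry behaviour in isolation. It assumes, hypothetically, that the model client reports the condition as an error object whose code field is "context_window_exceeded". The guidance text is placeholder wording, not the message Clawdie actually sends. All function names are illustrative.

```typescript
// Placeholder wording, not the actual guidance message.
const CONTEXT_GUIDANCE =
  "This conversation no longer fits in the model's context window. Please send a shorter request or start a new session.";

// Hypothetical error shape: assumes the client surfaces the condition as err.code.
function isContextExceeded(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "context_window_exceeded";
}

// Map a failed model call to a reply; any other error is rethrown unchanged.
function replyForModelError(err: unknown): string {
  if (isContextExceeded(err)) return CONTEXT_GUIDANCE;
  throw err;
}

// One attempt only: on overflow, answer with guidance instead of retrying.
// This function never touches the message cursor, so nothing is rolled back.
async function answer(callModel: (prompt: string) => Promise<string>, prompt: string): Promise<string> {
  try {
    return await callModel(prompt);
  } catch (err) {
    return replyForModelError(err);
  }
}
```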
Session Rollover
When AGENT_SESSION_MAX_BYTES is exceeded, the session is rolled over — a fresh session is started and the old one is archived.
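As a minimal sketch, rollover can be implemented with one file per session. Once the file grows past AGENT_SESSION_MAX_BYTES, it is renamed to a timestamped archive name and an empty file takes its place. The file layout, the ".archived" naming scheme and both function names are hypothetical, and the real session store may work differently.

```typescript
import { existsSync, renameSync, statSync, writeFileSync } from "node:fs";

// Archive name: original path plus a filesystem-safe UTC timestamp (hypothetical scheme).
function archivePathFor(sessionPath: string, now: Date): string {
  return `${sessionPath}.${now.toISOString().replace(/[:.]/g, "-")}.archived`;
}

// If the session file is larger than maxBytes (AGENT_SESSION_MAX_BYTES), move it aside
// and start an empty one. Returns the archive path, or null when no rollover happened.
function rolloverIfNeeded(sessionPath: string, maxBytes: number, now: Date = new Date()): string | null {
  if (!existsSync(sessionPath)) return null;
  if (statSync(sessionPath).size <= maxBytes) return null;
  const archivePath = archivePathFor(sessionPath, now);
  renameSync(sessionPath, archivePath); // old session is kept, not deleted
  writeFileSync(sessionPath, ""); // fresh, empty session
  return archivePath;
}
```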
References
- doc/CONTROLPLANE-AGENT-ROLES.md: role definitions, skill mapping
- doc/CONTROLPLANE-MESSAGE-CONTRACT.md: API contracts
- doc/COLIBRI-PI-CONTROL-PLAN.md: planned Pi-only simplification and Colibri event fabric
- doc/INTERAGENT-RUN-CONTRACT.md: cross-host run manifest and artifact exchange contract