colibri/docs/guide/architecture/controlplane.md at main

Sam & Claude 95c487546d

docs(guide): port 39 procedural docs from clawdie-ai to colibri

New docs/guide/ tree — canonical home for operator-facing procedural docs.
Starlight frontmatter added to all files. 0.12 alignment fixes applied:

- v0.11.0 → v0.12.0 throughout
- PI_TUI_PROVIDER/MODEL → DEEPSEEK_API_KEY
- Headless Codex login → Agent runtime setup (zot + RPC mode)
- /login and auth.json references removed
- pi → zot in provider-fallback spawn reference
- colibri-provider-verify (was pi-provider-smoke)
- Language cleanup: smoke test → verification, fake → test,
  can't self-fix → requires operator intervention,
  broken → unresponsive, Fix anything broken → Verify all checks pass

Two-tree model: docs/wiki/ (decisions) + docs/guide/ (procedural).
Single source of truth in colibri. clawdie-ai docs/public/ to be retired.

2026-06-26 09:16:43 +02:00

title: Control Plane

Starting with v0.10.0, Clawdie has a built-in multi-agent control plane. The agent named after your install (e.g. "Clawdie" or "Atlas") becomes the orchestrator of her own computer — with a Sysadmin, DBA, and Git Admin working under her.

This is not a separate service or jail. It runs inside the existing clawdie service on the host.

What It Is

A lightweight orchestration layer baked into Clawdie that gives her:

Org chart — Orchestrator + Sysadmin + DBA + Git Admin, each with a defined scope
Task queue — work items assigned to agents, created by Telegram or by the orchestrator herself
Token budgets — daily limits per agent, hard stops, operator approval for expensive ops
Activity log — immutable audit trail of every decision and skill execution
Heartbeat scheduling — Sysadmin wakes daily for health checks; others wake on demand
Agentic harness — a terminal-first operator UI with extensions, safety gates, and live status

You are the human operator. You approve expensive operations, review the activity log, and can create tasks directly via the HTTP API or Telegram.

Architecture

Single clawdie service (host):
  ├── Telegram intake          — existing
  ├── HTTP API (port 3100)     — new: /api/controlplane/...
  ├── Unified scheduler        — 30s ticks, Telegram + heartbeats
  ├── Agent executor           — spawn("pi", ...) with CONTROLPLANE_* env
  ├── Agentic harness (TUI)    — extensions, safety gates, live status
  └── Shared hostd access      — privileged ops (bastille, zfs, pf)
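
The scheduler's 30-second tick and the heartbeat wake-ups can be pictured roughly as follows. This is a minimal sketch, not the code in src/task-scheduler.ts: the `AgentSchedule` type, the `dueAgents` function, and the per-agent `lastRunMs` bookkeeping are assumptions made for illustration.

```typescript
// Hypothetical shape; the real scheduler's types are not shown in this guide.
interface AgentSchedule {
  name: string;
  heartbeatMs: number | null; // null means the agent only wakes on demand
  lastRunMs: number;          // epoch milliseconds of the last heartbeat run
}

const TICK_MS = 30_000;             // tick interval named in the tree above
const DAY_MS = 24 * 60 * 60 * 1000; // a daily (24h) heartbeat

// Number of scheduler ticks that pass between two daily heartbeats.
const ticksPerDay = DAY_MS / TICK_MS;

// On each tick, return the agents whose heartbeat interval has elapsed.
function dueAgents(agents: AgentSchedule[], nowMs: number): string[] {
  return agents
    .filter((a) => a.heartbeatMs !== null && nowMs - a.lastRunMs >= a.heartbeatMs)
    .map((a) => a.name);
}

const agents: AgentSchedule[] = [
  { name: "sysadmin", heartbeatMs: DAY_MS, lastRunMs: 0 },
  { name: "dba", heartbeatMs: null, lastRunMs: 0 },
];
```

With this shape, an on-demand agent is never returned by `dueAgents`; it would only run when a task is assigned to it.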

Agents run on the host via the pi CLI. Each agent gets:

A system prompt from their identity file (SYSADMIN_AGENT.md, DB_ADMIN_AGENT.md, etc.)
A persistent session in data/sessions/{agent}.jsonl
Access to the skills catalog in data/skills/
CONTROLPLANE_* env vars pointing at the local HTTP API
All agent spawns use --no-skills to disable pi's built-in skill discovery; skills are injected via --append-system-prompt from the catalog
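
The per-agent inputs listed above could be gathered into a spawn plan along these lines. This is a sketch under stated assumptions: `planSpawn` and `SpawnPlan` are hypothetical names, `CONTROLPLANE_API_URL` is an invented variable name standing in for the unspecified `CONTROLPLANE_*` set, and passing the catalog text directly as the value of `--append-system-prompt` is a guess at how the flag is used. How the session file is handed to pi is not shown in this guide, so the plan only carries its path.

```typescript
// Hypothetical container for everything one agent spawn needs.
interface SpawnPlan {
  command: string;
  args: string[];
  env: Record<string, string>;
  sessionFile: string;
}

function planSpawn(agent: string, skillsPrompt: string, sharedSecret: string): SpawnPlan {
  return {
    command: "pi",
    // Both flags are named in this guide; the value format is an assumption.
    args: ["--no-skills", "--append-system-prompt", skillsPrompt],
    env: {
      CONTROLPLANE_SHARED_SECRET: sharedSecret,
      // Hypothetical variable name and host; the guide only says CONTROLPLANE_*.
      CONTROLPLANE_API_URL: "http://127.0.0.1:3100/api/controlplane",
    },
    // Session path pattern from the list above: data/sessions/{agent}.jsonl
    sessionFile: `data/sessions/${agent}.jsonl`,
  };
}
```

A caller would then pass `command`, `args`, and `env` to Node's `child_process.spawn`.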

API authentication requires CONTROLPLANE_SHARED_SECRET — a Bearer token that all agents and API clients must present.
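
A client call might be assembled as below. This is a minimal sketch: `buildApiRequest` is a hypothetical helper, the default host `127.0.0.1` is an assumption, and the standard `Authorization: Bearer` header is assumed to be how the token is presented. Concrete routes under `/api/controlplane/` are not listed in this guide, so the route is left as a parameter.

```typescript
interface ApiRequest {
  url: string;
  headers: Record<string, string>;
}

// `route` is whatever follows /api/controlplane/; no specific route is assumed.
function buildApiRequest(route: string, sharedSecret: string, host = "127.0.0.1"): ApiRequest {
  if (sharedSecret.length === 0) {
    throw new Error("CONTROLPLANE_SHARED_SECRET is required");
  }
  const trimmed = route.replace(/^\/+/, ""); // tolerate a leading slash
  return {
    url: `http://${host}:3100/api/controlplane/${trimmed}`,
    headers: { Authorization: `Bearer ${sharedSecret}` },
  };
}
```

The result can be handed to `fetch(req.url, { headers: req.headers })`. Failing early on an empty secret keeps an unauthenticated request from ever leaving the client.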

Default System

Setup auto-provisions a default system named after your AGENT_NAME:

| Agent | Role | Heartbeat | Budget |
| --- | --- | --- | --- |
| Orchestrator | Primary decision-maker, delegator | On-demand | 80% |
| Sysadmin | Jails, ZFS, PF, services | Daily (24h) | 10% |
| DBA | PostgreSQL ops | On-demand | 5% |
| Git Admin | Merges, releases, mirrors | On-demand | 5% |

Budget is token-based. Default: 100,000 tokens/day. Hard stops enforced before every spawn.
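
The pre-spawn hard stop can be sketched as follows. This is not the code in src/controlplane-budget.ts: it assumes the Budget percentages are shares of the 100,000-token daily default, and `dailyAllowance`, `canSpawn`, and the lowercase agent keys are hypothetical names.

```typescript
const DAILY_TOKENS = 100_000; // default daily budget stated above

// Assumption: each percentage is that agent's share of the daily total.
const SHARE_PERCENT = new Map<string, number>([
  ["orchestrator", 80],
  ["sysadmin", 10],
  ["dba", 5],
  ["git-admin", 5],
]);

// Tokens an agent may spend per day under the assumed share model.
function dailyAllowance(agent: string, dailyTokens: number = DAILY_TOKENS): number {
  const share = SHARE_PERCENT.get(agent);
  if (share === undefined) {
    throw new Error(`unknown agent: ${agent}`);
  }
  return Math.floor((dailyTokens * share) / 100);
}

// Hard stop: refuse the spawn if the estimated cost would exceed today's allowance.
function canSpawn(agent: string, usedToday: number, estimatedTokens: number): boolean {
  return usedToday + estimatedTokens <= dailyAllowance(agent);
}
```

Under these assumptions the DBA has 5,000 tokens per day, so a 600-token skill run is refused once 4,500 tokens have already been spent.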

Dual-Layer Decision Model

Every agent queries two systems before acting:

1. Control plane API  → "What's my task? What's my budget?"
2. Local session      → "What did I do last time? What skills do I have?"

Then: pattern-match task → skill → execute (deterministic, low cost)
      no match          → escalate to orchestrator or request operator approval

Most work is skill execution — 300–1,200 tokens. Reasoning is reserved for genuinely ambiguous situations.
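
The match-or-escalate step can be sketched like this. The keyword table and the `decide` function are illustrative assumptions; how the real runner matches a task to a skill is not described in this guide.

```typescript
type Decision =
  | { kind: "execute"; skill: string }
  | { kind: "escalate"; reason: string };

// Hypothetical trigger table; keywords are not taken from agent/library.yaml.
const TRIGGERS: Array<{ skill: string; keywords: string[] }> = [
  { skill: "disk-usage", keywords: ["disk"] },
  { skill: "jail-status", keywords: ["jail"] },
  { skill: "backup-db", keywords: ["back up", "backup"] },
];

// Deterministic path first: a keyword hit maps straight to a skill.
// Anything else is escalated instead of being reasoned about locally.
function decide(task: string): Decision {
  const text = task.toLowerCase();
  const hit = TRIGGERS.find((t) => t.keywords.some((k) => text.indexOf(k) !== -1));
  if (hit !== undefined) {
    return { kind: "execute", skill: hit.skill };
  }
  return { kind: "escalate", reason: "no matching skill" };
}
```

Keeping the match deterministic is what keeps the common path cheap: no model call is needed to pick a skill, only to handle the escalated remainder.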

Skills

Agents use a catalog of operational skills sourced from agent/library.yaml.

Skills are discoverable via tags and the skills_search extension tool. The control plane can route tasks to the right specialist without depending on the LLM to “remember” what exists.

| Skill | Agent | Trigger example |
| --- | --- | --- |
| `jail-status` | Sysadmin | "Check if db jail is running" |
| `disk-usage` | Sysadmin | "How much free disk?" |
| `system-stats` | Sysadmin | "CPU and memory load?" |
| `service-restart` | Sysadmin | "Restart nginx service" |
| `backup-db` | DBA | "Back up the database" |
| `db-vacuum` | DBA | "Run vacuum on system_brain" |
| `db-migrate` | DBA | "Apply pending migrations" |
| `git-merge` | Git Admin | "Merge PR #42 into main" |
| `git-release-tag` | Git Admin | "Tag version v0.12.0" |

The catalog evolves over time; for the authoritative current list, run `/skills` in Telegram or `just skill-list` on the host.

Agents can call the skills_search extension tool at runtime to query the catalog for relevant skills without consuming session tokens.
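
A tag-based lookup of this kind could look roughly like the sketch below. The `Skill` shape, the tag values, and the scoring rule are assumptions; the actual schema of agent/library.yaml and the behaviour of skills_search are not shown in this guide.

```typescript
// Hypothetical catalog entry; skill names come from the table above, tags are invented.
interface Skill {
  name: string;
  agent: string;
  tags: string[];
}

const CATALOG: Skill[] = [
  { name: "jail-status", agent: "Sysadmin", tags: ["jail", "status"] },
  { name: "db-vacuum", agent: "DBA", tags: ["database", "vacuum"] },
  { name: "backup-db", agent: "DBA", tags: ["database", "backup"] },
  { name: "git-release-tag", agent: "Git Admin", tags: ["git", "release"] },
];

// Rank skills by how many query words match a tag; drop skills with no match.
function searchSkills(query: string, catalog: Skill[] = CATALOG): Skill[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  return catalog
    .map((skill) => ({
      skill,
      score: words.filter((w) => skill.tags.indexOf(w) !== -1).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.skill);
}
```

Because each result carries its owning agent, a router built on this could send the task to the right specialist without asking the model which skills exist.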

Implementation Progress

Built in 7 phases. Each phase adds one module and turns its test todos green.

| Phase | Module | Status |
| --- | --- | --- |
| 1 | DB schema + provisioning (`setup/controlplane.ts`) | ✅ |
| 2 | HTTP API routes (`src/controlplane-api.ts`) | ✅ |
| 3 | Control plane runner (`src/controlplane-runner.ts`) | ✅ |
| 4 | Budget enforcement (`src/controlplane-budget.ts`) | ✅ |
| 5 | Session persistence (`src/agent-session.ts`) | ✅ |
| 6 | Skills discovery (`src/skills-discovery.ts`) | ✅ |
| 7 | Scheduler integration (`src/task-scheduler.ts`) | ✅ |

Setup

just setup-controlplane

# Output:
# ✓ Creating control plane tables...
# ✓ Hiring orchestrator agent...
# ✓ Hiring Sysadmin agent (heartbeat: 24h)...
# ✓ Hiring DBA agent...
# ✓ Hiring Git Admin agent...
# ✓ Copying 15 operational skills to data/skills/...
# ✓ Operator account created: clawdie
# ✓ Harness: run in terminal (no browser dashboard)

Runtime Observability

Every agent run (orchestrator main chat or specialist heartbeat) records three provider/model values in agent_activity.payload:

| Field | Meaning |
| --- | --- |
| `configured_*` | What `provider.env` says (`DEEPSEEK_API_KEY`) |
| `effective_*` | What was actually passed to pi (after fallback swap) |
| `actual_*` | What pi reports having used (parsed from session JSONL) |

configured_* and effective_* differ when provider fallback is active: a cooldown is live and the runtime is using the operator's chosen fallback. actual_* should match effective_* for a successful run; a divergence suggests pi rewrote the model selection internally.
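
The comparison rules above can be written down as a small classifier. This is a sketch: the nested `RunRecord` shape is a stand-in, because the guide gives the `configured_*`, `effective_*`, and `actual_*` prefixes but not the exact payload keys, and the status labels are invented for illustration.

```typescript
// Hypothetical shapes; real keys in agent_activity.payload are not listed here.
interface ProviderModel {
  provider: string;
  model: string;
}

interface RunRecord {
  configured: ProviderModel; // what provider.env says
  effective: ProviderModel;  // what was passed to pi
  actual: ProviderModel;     // what pi reports having used
}

type RunStatus = "normal" | "fallback-active" | "model-rewritten";

const same = (a: ProviderModel, b: ProviderModel): boolean =>
  a.provider === b.provider && a.model === b.model;

function classifyRun(run: RunRecord): RunStatus {
  // effective vs actual is checked first: a mismatch there is the unexpected case.
  if (!same(run.effective, run.actual)) return "model-rewritten";
  if (!same(run.configured, run.effective)) return "fallback-active";
  return "normal";
}
```

Checking effective against actual first means a run that is both on fallback and internally rewritten is reported as rewritten, which is the case an operator most needs to see.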

/budgetreport and /tokens surface these values; /policy shows the fallback cooldown line when one is active.

References

SOUL.md, SYSADMIN_AGENT.md, DB_ADMIN_AGENT.md, GIT_ADMIN_AGENT.md — agent identity files
Provider Fallback — automatic provider switching when the primary hits a usage cap
Structured Reports — operator-facing report family + free-text routing
Colibri Architecture — the Rust control plane replacing this TypeScript implementation
