# Control Plane Message Contract

## Overview

Agents query the control plane HTTP API for governance (tasks, budgets, approvals) and read local resources for operations (sessions, skills). This is the dual-layer decision model.

```
Agent Heartbeat:
  1. GET /api/controlplane/state → "What's my budget? Am I active?"
  2. GET /api/controlplane/tasks?role=X → "What's assigned to me?"
  3. Read local data/sessions/{name}.jsonl → "What did I do last time?"
  4. Execute skill or escalate
  5. POST /api/controlplane/activity → "Here's what I did"
```
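
The governance half of the heartbeat (steps 1, 2, and the decision in step 4) can be sketched as a pure function over the two API responses. This is a minimal sketch, not the project's implementation: the `decideHeartbeat` name, the `HeartbeatDecision` type, and the rule that `assigned_to` is matched against the agent's role are all assumptions made for illustration.

```typescript
// Hypothetical shapes: only the fields this sketch reads from the
// /state and /tasks responses described in this contract.
interface AgentState { id: string; role: string; heartbeat_enabled: boolean }
interface ControlPlaneState {
  agents: AgentState[];
  budget: { remaining: number; hard_limit_exceeded: boolean };
}
interface Task { task_id: string; assigned_to: string; status: string }

type HeartbeatDecision =
  | { action: "skip"; reason: string }
  | { action: "idle" }
  | { action: "execute"; task: Task };

// Governance checks first, then pick the first pending task for this agent.
// Assumption: `assigned_to` holds the agent's role, as in the tasks query.
function decideHeartbeat(
  state: ControlPlaneState,
  tasks: Task[],
  agentId: string,
): HeartbeatDecision {
  const self = state.agents.find((a) => a.id === agentId);
  if (!self) return { action: "skip", reason: "agent not registered" };
  if (!self.heartbeat_enabled) return { action: "skip", reason: "heartbeat disabled" };
  if (state.budget.hard_limit_exceeded || state.budget.remaining <= 0) {
    return { action: "skip", reason: "budget exhausted" };
  }
  const next = tasks.find(
    (t) => t.status === "pending" && t.assigned_to === self.role,
  );
  return next ? { action: "execute", task: next } : { action: "idle" };
}
```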

---

## Control Plane API Queries

### 1. Get State

```http
GET /api/controlplane/state
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}
```

**Response:**

```json
{
  "agents": [
    { "id": "clawdie", "role": "orchestrator", "heartbeat_enabled": false },
    {
      "id": "sysadmin",
      "role": "sysadmin",
      "heartbeat_enabled": true
    },
    {
      "id": "db-admin",
      "role": "db-admin",
      "heartbeat_enabled": false
    },
    {
      "id": "git-admin",
      "role": "git-admin",
      "heartbeat_enabled": false
    }
  ],
  "budget": {
    "daily_tokens": 100000,
    "spent_today": 25000,
    "remaining": 75000,
    "hard_limit_exceeded": false,
    "allocation": {
      "orchestrator": 80000,
      "sysadmin": 10000,
      "db-admin": 5000,
      "git-admin": 5000
    }
  }
}
```
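
A client may want to sanity-check the `budget` object before trusting it. The sketch below encodes two relationships that hold in the example response (`remaining = daily_tokens - spent_today`, and the per-role allocations do not exceed `daily_tokens`). The contract does not state these as guarantees, so treat both rules, and the `checkBudget` name, as assumptions.

```typescript
interface Budget {
  daily_tokens: number;
  spent_today: number;
  remaining: number;
  hard_limit_exceeded: boolean;
  allocation: Record<string, number>;
}

// Returns a list of human-readable problems; an empty list means the
// budget is internally consistent under the assumed rules.
function checkBudget(b: Budget): string[] {
  const problems: string[] = [];
  if (b.remaining !== b.daily_tokens - b.spent_today) {
    problems.push("remaining does not equal daily_tokens - spent_today");
  }
  const allocated = Object.values(b.allocation).reduce((sum, n) => sum + n, 0);
  if (allocated > b.daily_tokens) {
    problems.push("allocations exceed daily_tokens");
  }
  return problems;
}
```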

---

### 2. Get Task Queue

```http
GET /api/controlplane/tasks?role={agent_role}
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}
```

**Response:**

```json
{
  "tasks": [
    {
      "task_id": "TASK-001",
      "title": "Check if db jail is running",
      "description": "Verify clawdie-db is up and healthy",
      "assigned_to": "sysadmin",
      "priority": "medium",
      "status": "pending",
      "created_at": "2026-04-07T10:30:00Z",
      "context": { "jail_name": "clawdie-db" }
    }
  ]
}
```
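
One way an agent might build the query URL and choose what to work on next is sketched here. Only the `medium` priority appears in this contract; the `high` and `low` values, the ordering rule (priority first, then oldest `created_at`), and the helper names `tasksUrl` and `nextPendingTask` are assumptions.

```typescript
interface QueuedTask {
  task_id: string;
  priority: string;
  status: string;
  created_at: string; // ISO 8601 timestamp, as in the example response
}

// Assumption: "high" and "low" exist alongside the documented "medium".
const PRIORITY_RANK: Record<string, number> = { high: 0, medium: 1, low: 2 };

function tasksUrl(baseUrl: string, role: string): string {
  return `${baseUrl}/api/controlplane/tasks?role=${encodeURIComponent(role)}`;
}

// Picks the pending task with the best priority; ties go to the oldest.
// Unknown priority strings are treated like "medium".
function nextPendingTask(tasks: QueuedTask[]): QueuedTask | undefined {
  const rank = (t: QueuedTask): number => PRIORITY_RANK[t.priority] ?? 1;
  return tasks
    .filter((t) => t.status === "pending") // filter() copies, so sort() is safe
    .sort(
      (a, b) =>
        rank(a) - rank(b) ||
        Date.parse(a.created_at) - Date.parse(b.created_at),
    )[0];
}
```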

---

### 3. Get Approvals

```http
GET /api/controlplane/approvals?agent_id={agent_id}
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}
```

**Response:**

```json
{
  "pending": [
    {
      "approval_id": "APPR-042",
      "task_id": "TASK-002",
      "operation": "merge PR with conflict resolution",
      "estimated_tokens": 8500,
      "operator_approved": false
    }
  ],
  "approved": [
    {
      "approval_id": "APPR-041",
      "operation": "backup database",
      "operator_approved": true,
      "approved_at": "2026-04-07T10:45:00Z"
    }
  ]
}
```
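
Before running a gated operation, an agent could look its approval up in this response. The following is a small sketch under the assumption that an approval counts only when it sits in `approved` with `operator_approved: true`; the `approvalStatus` helper and its return values are hypothetical.

```typescript
interface Approval {
  approval_id: string;
  task_id?: string;
  operation: string;
  estimated_tokens?: number;
  operator_approved: boolean;
  approved_at?: string;
}
interface ApprovalsResponse { pending: Approval[]; approved: Approval[] }

type ApprovalStatus = "approved" | "pending" | "unknown";

// Classifies one approval ID against the two lists in the response.
function approvalStatus(res: ApprovalsResponse, approvalId: string): ApprovalStatus {
  const isMatch = (a: Approval): boolean => a.approval_id === approvalId;
  if (res.approved.some((a) => isMatch(a) && a.operator_approved)) return "approved";
  if (res.pending.some(isMatch)) return "pending";
  return "unknown";
}
```

An agent would proceed only on `"approved"` and treat both other values as "do not run yet".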

### 6. Proxy hostd Operation (jail agents)

Jail agents use this endpoint to execute privileged host operations (bastille, zfs, pf) through the controlplane API instead of direct Unix socket access.

```http
POST /api/controlplane/hostd
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}

{
  "op": "bastille-list",
  "params": {}
}
```

**Response:**

```json
{
  "ok": true,
  "output": "JID IP Address Hostname Path",
  "exitCode": 0
}
```

The API proxies the request to the hostd daemon on the host. Available ops match those in `src/hostd/privileged-commands.ts` (bastille-list, bastille-cmd, zfs-snapshot, etc.).
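
A jail-side client might assemble the proxy request as sketched below. The op list here contains only the three names mentioned in this document, so it is deliberately incomplete; the `Content-Type` header and the `buildHostdRequest` and `hostdSucceeded` helpers are likewise assumptions, not part of the contract.

```typescript
// Ops named in this document. The authoritative list lives in
// src/hostd/privileged-commands.ts and contains more entries.
const KNOWN_OPS = ["bastille-list", "bastille-cmd", "zfs-snapshot"] as const;
type HostdOp = (typeof KNOWN_OPS)[number];

interface HostdResponse { ok: boolean; output: string; exitCode: number }

function isKnownOp(op: string): op is HostdOp {
  return (KNOWN_OPS as readonly string[]).includes(op);
}

// Builds a transport-agnostic request description; pass it to fetch or
// any other HTTP client. Rejects ops outside the local allowlist early.
function buildHostdRequest(
  op: string,
  params: Record<string, unknown>,
  secret: string,
) {
  if (!isKnownOp(op)) throw new Error(`unknown hostd op: ${op}`);
  return {
    method: "POST" as const,
    path: "/api/controlplane/hostd",
    headers: {
      Authorization: `Bearer ${secret}`,
      "Content-Type": "application/json", // assumed; not shown in the contract
    },
    body: JSON.stringify({ op, params }),
  };
}

// Treats a response as successful only if both flags agree.
function hostdSucceeded(res: HostdResponse): boolean {
  return res.ok && res.exitCode === 0;
}
```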

---

## Agent Posts

### 1. Task Completion

```http
POST /api/controlplane/activity
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}

{
  "event_type": "task_completed",
  "task_id": "TASK-001",
  "agent_id": "sysadmin",
  "skill_executed": "jail-status",
  "result": {
    "status": "success",
    "output": "Jail clawdie-db running, uptime 5d 3h, CPU 2.1%",
    "tokens_used": 420
  }
}
```

### 2. Approval Request

```http
POST /api/controlplane/activity
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}

{
  "event_type": "approval_request",
  "agent_id": "git-admin",
  "operation": "Merge PR #42 with conflict resolution",
  "reasoning": "Conflict detected in src/index.ts",
  "estimated_tokens": 8500
}
```

### 3. Error / Escalation

```http
POST /api/controlplane/activity
Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}

{
  "event_type": "error",
  "agent_id": "db-admin",
  "error_message": "Vacuum failed: database locked",
  "action_taken": "Escalated to orchestrator",
  "tokens_used": 1200
}
```
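
The three payloads share an endpoint and differ by `event_type`, which maps naturally onto a discriminated union. The types below mirror the example bodies field for field; the `ActivityEvent` and `tokensSpent` names are hypothetical, and counting an approval request as zero spent tokens is an assumption.

```typescript
type ActivityEvent =
  | {
      event_type: "task_completed";
      task_id: string;
      agent_id: string;
      skill_executed: string;
      result: { status: string; output: string; tokens_used: number };
    }
  | {
      event_type: "approval_request";
      agent_id: string;
      operation: string;
      reasoning: string;
      estimated_tokens: number;
    }
  | {
      event_type: "error";
      agent_id: string;
      error_message: string;
      action_taken: string;
      tokens_used: number;
    };

// Tokens actually consumed by an event. The compiler checks that every
// event_type is handled, so adding a variant forces an update here.
function tokensSpent(e: ActivityEvent): number {
  switch (e.event_type) {
    case "task_completed":
      return e.result.tokens_used;
    case "approval_request":
      return 0; // only an estimate so far; nothing has been spent
    case "error":
      return e.tokens_used;
  }
}
```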

---

## Local Resources (No API)

### Session History

```typescript
const sessionPath = `${process.env.CONTROLPLANE_SESSION_CWD}/${agentName}.jsonl`;
```

JSONL format, one entry per line. The entry below is pretty-printed for readability; on disk it occupies a single line:

```json
{
  "timestamp": "2026-04-06T10:00:00Z",
  "task": "Check jail status",
  "skill": "jail-status",
  "outcome": "running",
  "tokens_used": 420
}
```
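
Reading the session file back could look like the sketch below, which parses text that has already been loaded from disk. The `parseSessionLog` helper is hypothetical, and silently skipping blank or malformed lines (for example a line cut short by a crash mid-write) is a design assumption rather than documented behaviour. The sketch does not validate field types.

```typescript
interface SessionEntry {
  timestamp: string;
  task: string;
  skill: string;
  outcome: string;
  tokens_used: number;
}

// Parses JSONL text: one JSON object per line, blank lines ignored.
function parseSessionLog(text: string): SessionEntry[] {
  const entries: SessionEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      entries.push(JSON.parse(trimmed) as SessionEntry);
    } catch {
      // Malformed line: skip it and keep the rest of the history usable.
    }
  }
  return entries;
}
```

The last element of the returned array answers the heartbeat question "What did I do last time?".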

### Skills Catalog

Skills are not scanned from a directory at runtime. Instead:

1. `agent/library.yaml` defines all skills with invoke patterns and compact summaries.
2. The control plane injects the compact skill index via `--append-system-prompt` when spawning pi (with `--no-skills` to disable pi's built-in discovery).
3. Full skill content is served on-demand through the `skills_search` extension tool.

```typescript
import { getAgentSkillIndex } from './skill-library';
const index = getAgentSkillIndex(agentId);
```

---

## Telegram Images (Vision Fallback)

Telegram photo messages are persisted to disk under `TMP_DIR` and stored in the
conversation as a placeholder:

`[Photo saved: /home/clawdie/clawdie-ai/tmp/telegram/photos/<chat>/<id>.jpg]`

For screenshot/meme analysis, the runtime can optionally run a **vision helper
model** that OCRs the saved image and injects a bounded block into the prompt:

`[Vision OCR] ... [/Vision OCR]`

The main chat model must answer using the OCR/summary in that block and should
not claim it “can’t see the image”.

**Config:**

- `OPENROUTER_API_KEY`: required when `VISION_PROVIDER=openrouter`
- `VISION_PROVIDER=openrouter`
- `VISION_MODEL=nvidia/nemotron-nano-12b-v2-vl:free` (default)
- `VISION_MAX_IMAGES` (default `1`)
- `VISION_MAX_CHARS_PER_IMAGE` (default `4000`)
- `VISION_MAX_TOTAL_CHARS` (default `6000`)
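
How the three `VISION_MAX_*` limits might combine into the "bounded block" is sketched below. The contract gives the defaults but not the exact truncation rules, so the interpretation here (take the first N images, clip each one, then stop once the total budget is used up) is an assumption, as are the `buildVisionBlock` name and the newline separator.

```typescript
interface VisionLimits {
  maxImages: number;        // VISION_MAX_IMAGES
  maxCharsPerImage: number; // VISION_MAX_CHARS_PER_IMAGE
  maxTotalChars: number;    // VISION_MAX_TOTAL_CHARS
}

const DEFAULT_LIMITS: VisionLimits = {
  maxImages: 1,
  maxCharsPerImage: 4000,
  maxTotalChars: 6000,
};

// Wraps OCR text for up to maxImages images in a [Vision OCR] block,
// clipping each text and the running total to the configured limits.
function buildVisionBlock(
  ocrTexts: string[],
  limits: VisionLimits = DEFAULT_LIMITS,
): string {
  let budget = limits.maxTotalChars;
  const parts: string[] = [];
  for (const text of ocrTexts.slice(0, limits.maxImages)) {
    if (budget <= 0) break;
    const clipped = text.slice(0, Math.min(limits.maxCharsPerImage, budget));
    parts.push(clipped);
    budget -= clipped.length;
  }
  if (parts.length === 0) return "";
  return `[Vision OCR]\n${parts.join("\n")}\n[/Vision OCR]`;
}
```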

## The Loop

```
[CONTROL PLANE API]                    [LOCAL RESOURCES]   [AGENT]
  |
  |<-- GET /api/controlplane/state -----|
  |<-- GET /api/controlplane/tasks ------|
  |                                      |-- Read session JSONL
  |                                      |-- Load skills catalog
  |                                      |-- Pattern match → execute skill
  |
  |<-- POST /api/controlplane/activity --|
  |                                      [Done]
```

Most work (skill execution) happens locally; the API handles coordination and audit.

---

## Implementation Mapping

### src/index.ts

```typescript
// Handler bodies are elided; every route is guarded by requireAuth.
app.get('/api/controlplane/state', requireAuth, async (req, res) => { /* ... */ });
app.get('/api/controlplane/tasks', requireAuth, async (req, res) => { /* ... */ });
app.post('/api/controlplane/activity', requireAuth, async (req, res) => { /* ... */ });
```
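
The core check behind a `requireAuth` guard could be as small as the sketch below, which validates the `Authorization: Bearer {CONTROLPLANE_SHARED_SECRET}` header used throughout this contract. The real middleware is not shown in this document, so `extractBearerToken` and `isAuthorized` are hypothetical helpers, kept framework-free so they can be wrapped by any HTTP layer.

```typescript
// Pulls the token out of an "Authorization: Bearer <token>" header value.
// The scheme name is matched case-insensitively, as HTTP auth schemes are.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// True only when a token is present and equals the non-empty shared secret.
// Note: plain === is not constant-time. A production check would more
// likely compare equal-length buffers with crypto.timingSafeEqual.
function isAuthorized(header: string | undefined, sharedSecret: string): boolean {
  const token = extractBearerToken(header);
  return token !== null && sharedSecret.length > 0 && token === sharedSecret;
}
```

A guard built on this would respond with 401 when `isAuthorized` returns `false` and call the next handler otherwise.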

### src/controlplane-runner.ts

```typescript
const agentEnv = {
  CONTROLPLANE_AGENT_ID: agent.id,
  CONTROLPLANE_API_URL: `http://localhost:${process.env.CONTROLPLANE_API_PORT || 3100}`,
  CONTROLPLANE_API_KEY: agent.apiKey,
  CONTROLPLANE_TASK_ID: task.id,
  CONTROLPLANE_WORKSPACE_CWD: '/home/clawdie/clawdie-ai',
  CONTROLPLANE_SESSION_CWD: '/home/clawdie/clawdie-ai/data/sessions',
};
```

---

## Runner Modes (pi vs aider)

Control plane tasks are executed by a runner. Default is `pi`, but an Aider
runner can be enabled for multi-agent orchestration with tmux glass-pane
visibility.

### Environment switches

- `CONTROLPLANE_RUNNER=pi` (default)
- `CONTROLPLANE_RUNNER=aider`
- `CONTROLPLANE_AIDER_BIN=aider`
- `CONTROLPLANE_AIDER_FLAGS="--no-check-update --no-gitignore --no-auto-commits --no-dirty-commits"`
- `CONTROLPLANE_AIDER_TMUX_SESSION=clawdie-controlplane`
- `CONTROLPLANE_AIDER_LOG_DIR=/home/clawdie/clawdie-ai/tmp/controlplane/aider`

### tmux glass-pane

When `CONTROLPLANE_RUNNER=aider`, each agent streams output to:
`CONTROLPLANE_AIDER_LOG_DIR/{agentId}.log`.
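
Resolving the environment switches and the per-agent log path might look like the sketch below. Only `pi` is documented as a default; treating the other listed values as fallbacks when a variable is unset is an assumption, and `resolveRunnerConfig` and `agentLogPath` are hypothetical names.

```typescript
type Runner = "pi" | "aider";

interface RunnerConfig {
  runner: Runner;
  aiderBin: string;
  aiderFlags: string[];
  tmuxSession: string;
  logDir: string;
}

// Reads the CONTROLPLANE_* switches from an env-like object.
// Anything other than "aider" falls back to the default "pi" runner.
function resolveRunnerConfig(env: Record<string, string | undefined>): RunnerConfig {
  const flags =
    env.CONTROLPLANE_AIDER_FLAGS ??
    "--no-check-update --no-gitignore --no-auto-commits --no-dirty-commits";
  return {
    runner: env.CONTROLPLANE_RUNNER === "aider" ? "aider" : "pi",
    aiderBin: env.CONTROLPLANE_AIDER_BIN ?? "aider",
    aiderFlags: flags.split(/\s+/).filter((f) => f.length > 0),
    tmuxSession: env.CONTROLPLANE_AIDER_TMUX_SESSION ?? "clawdie-controlplane",
    logDir:
      env.CONTROLPLANE_AIDER_LOG_DIR ??
      "/home/clawdie/clawdie-ai/tmp/controlplane/aider",
  };
}

// Log file an agent streams to when the aider runner is active.
function agentLogPath(cfg: RunnerConfig, agentId: string): string {
  return `${cfg.logDir}/${agentId}.log`;
}
```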

If a tmux session named `CONTROLPLANE_AIDER_TMUX_SESSION` already exists and a
custom config constrains its window indices, tmux may reject `new-window` with
an “index in use” error. To avoid this edge case, point
`CONTROLPLANE_AIDER_TMUX_SESSION` at a session name that is not already in use,
or delete stale windows before starting the controlplane.

Attach:

```
tmux attach -t clawdie-controlplane
```

---

## References

- `doc/CONTROLPLANE-ARCHITECTURE.md`: service architecture
- `doc/CONTROLPLANE-AGENT-ROLES.md`: role definitions
- `doc/COLIBRI-PI-CONTROL-PLAN.md`: planned Pi-only simplification and Colibri event fabric
- `doc/INTERAGENT-RUN-CONTRACT.md`: cross-host run manifest and artifact exchange contract
- `SOUL.md`, `.agent/identities/SYSADMIN.md`, `.agent/identities/DB_ADMIN.md`, `.agent/identities/GIT_ADMIN.md`: agent identities