
Operator & Codex c136418c5a Drop remaining config compatibility shims (Codex)

Remove old Telegram/TTS env fallbacks, clean compatibility wording, and make the privilege-model doc describe mac_do without claiming it is already bootstrapped.

---
Build: pass | Tests: pass — 2372 passed (704 files)

2026-05-10 21:07:03 +02:00


Clawdie Security Model

In Plain Language

Clawdie is designed around a simple idea:

ordinary agent work should happen inside a sandbox
sensitive host files should stay outside that sandbox
persistent system changes should only happen when a trusted operator explicitly asks for them

This document explains where Clawdie relies on real technical protection, where it relies on operator trust, and where it uses observability to help detect problems.

Trust Model

Entity	Trust Level	Why
Main group	Trusted operator context	Private self-chat, admin control
Non-main groups	Untrusted	Other users may be malicious or careless
Jail agents	Sandboxed	Isolated execution environment
Channel messages	User input	May contain prompt injection or unsafe instructions
Host system	Trusted boundary	Runs the orchestrator and enforces the rules

The Three Security Domains

Not everything in Clawdie runs with the same power. There are three different domains:

1. Sandboxed Runtime

This is the normal agent execution path.

Runs inside a FreeBSD jail
Can only see explicitly mounted paths
Cannot freely modify the host application or host operating system
Handles routine chat, task execution, and file work

This is the primary protection boundary.

2. Project Maintenance

This is the controlled path for changing the Clawdie codebase itself.

Used for applying skills, updates, and tracked code changes
More powerful than ordinary jailed runtime
Intended for deliberate project changes, not routine chat execution

This is not the same as a sandbox escape. It is a separate, operator-directed maintenance path.

3. Host Administration

This is the path for changing the machine itself.

Used for actions like sysrc, service, nginx config, routing, PF, and other host-level settings
Intended only for trusted operator workflows
Must be treated as explicit administration, not normal agent behavior

At runtime, privileged host operations from the agent go through hostd — a root daemon on /var/run/<agent>-hostd.sock with whitelisted op handlers. The agent user calls hostd(op, params); the daemon validates params with Zod and runs the handler as root. Unknown ops and invalid params are rejected.
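The dispatch rule above can be sketched as follows: only whitelisted handlers exist, params are validated before a handler runs, and everything else is rejected. This is a minimal illustration, not Clawdie's hostd. The op name, the `Validator` type, and `dispatch` are hypothetical, and a plain validator function stands in for the Zod schemas the real daemon uses.

```typescript
// Hypothetical sketch of a whitelisted op dispatcher in the style of hostd.
// A plain validator function stands in for a Zod schema.
type Params = Record<string, unknown>;
type Validator = (p: Params) => string | null; // error message, or null if valid

interface OpHandler {
  validate: Validator;
  run: (p: Params) => string; // the real daemon would run a privileged command here
}

type DispatchResult = { ok: true; output: string } | { ok: false; error: string };

// Only ops registered in this map can ever run.
const handlers = new Map<string, OpHandler>([
  [
    "service-restart", // hypothetical op name
    {
      validate: (p) =>
        typeof p.serviceName === "string" ? null : "serviceName must be a string",
      run: (p) => `restarted ${String(p.serviceName)}`,
    },
  ],
]);

function dispatch(op: string, params: Params): DispatchResult {
  const handler = handlers.get(op);
  if (handler === undefined) {
    return { ok: false, error: `unknown op: ${op}` }; // not whitelisted
  }
  const problem = handler.validate(params);
  if (problem !== null) {
    return { ok: false, error: `invalid params: ${problem}` };
  }
  return { ok: true, output: handler.run(params) };
}
```

Keeping the whitelist as data rather than a chain of conditionals makes it easy to audit: the set of things root can be asked to do is exactly the keys of the map.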

Jail agents call hostd through the controlplane API (POST /api/controlplane/hostd), not through a direct socket connection. The extension's hostd-bridge.ts sends an HTTP request to the API, which authenticates it via CONTROLPLANE_SHARED_SECRET and proxies it to the hostd daemon. This removes the need to mount the Unix socket into jails; only network access to the host IP is required.
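A minimal sketch of the request a bridge call might build. Only the endpoint path and the Bearer secret come from this document; the `{ op, params }` JSON body, the `buildHostdRequest` name, and the example base URL are assumptions.

```typescript
// Hypothetical sketch of the request a hostd bridge could send from a jail.
// Only the endpoint path and Bearer authentication come from the docs;
// the body shape and function name are assumptions.
interface HostdRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildHostdRequest(
  baseUrl: string, // host IP (and port) reachable from the jail
  sharedSecret: string, // CONTROLPLANE_SHARED_SECRET
  op: string,
  params: Record<string, unknown>,
): HostdRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/controlplane/hostd`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${sharedSecret}`,
      },
      body: JSON.stringify({ op, params }),
    },
  };
}

// The bridge would then send it with fetch(req.url, req.init).
```

Because the jail only ever talks HTTP to the host, the socket's filesystem permissions never have to be exposed inside a jail.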

Shell injection prevention: hostd applies Zod regex validators, defined in src/privileged-commands.ts, to all string parameters. jailName, datasetName, snapshotTag, and serviceName must each match a strict pattern before any command is constructed.
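The point of these validators is that a value like `web; rm -rf /` never reaches a command line. The sketch below shows the idea with plain regular expressions instead of Zod, and builds an argv array rather than a shell string. The patterns are illustrative assumptions; the real ones in src/privileged-commands.ts may differ.

```typescript
// Illustrative patterns only; not copied from src/privileged-commands.ts.
const patterns: Record<string, RegExp> = {
  jailName: /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$/,
  datasetName: /^[a-zA-Z0-9][a-zA-Z0-9_.-]*(\/[a-zA-Z0-9_.-]+)*$/,
  snapshotTag: /^[a-zA-Z0-9_.:-]{1,64}$/,
  serviceName: /^[a-z0-9][a-z0-9_-]{0,63}$/,
};

// Throws unless the value is a string matching the pattern for `name`.
function checkParam(name: string, value: unknown): string {
  const re = patterns[name];
  if (re === undefined) throw new Error(`no validator for ${name}`);
  if (typeof value !== "string" || !re.test(value)) {
    throw new Error(`invalid ${name}: ${JSON.stringify(value)}`);
  }
  return value;
}

// Build an argv array (program plus arguments); no shell ever parses it.
function zfsSnapshotArgv(datasetName: unknown, snapshotTag: unknown): string[] {
  const ds = checkParam("datasetName", datasetName);
  const tag = checkParam("snapshotTag", snapshotTag);
  return ["zfs", "snapshot", `${ds}@${tag}`];
}
```

An argv array like this would be passed to something like Node's `execFile`, so even a pattern that is too loose cannot smuggle in shell metacharacters.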

This satisfies the "host-admin actions should run through a separate executor" rule without requiring the agent to have sudo access.

Privilege Escalation Paths

Three separate paths exist for actions that need more than ordinary user privileges. Each has a different audit story and a different intended caller.

Path	Caller	When it runs	What's logged
`hostd`	Agent runtime	Every privileged op the agent performs	`hostd` writes a structured entry per op with timestamp, op name, params, and result
`mac_do`	Kernel UID boundary	Narrowly-scoped UID transitions inside jails	No native logging — wrap through `hostd` for audited agent workflows
`sudo`	Operator at the CLI	Interactive admin work, setup-time installs	Standard syslog `sudo` audit

Agent code never uses sudo. All runtime privileged operations route through hostd so that the audit trail is uniform and the privilege boundary is one socket, not a setuid binary. Setup-time scripts (setup/install.ts, setup/environment.ts) still use sudo because they run interactively as the operator, not as the agent runtime.
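The per-op hostd record in the table above could look something like the following. The field names and the one-JSON-object-per-line format are assumptions for illustration; the document only says each entry carries a timestamp, op name, params, and result.

```typescript
// Hypothetical shape of one hostd audit entry (field names are assumptions).
interface AuditEntry {
  timestamp: string; // ISO 8601
  op: string;
  params: Record<string, unknown>;
  result: "ok" | "rejected" | "failed";
  error?: string;
}

function auditLine(
  op: string,
  params: Record<string, unknown>,
  result: AuditEntry["result"],
  error?: string,
  now: Date = new Date(),
): string {
  const entry: AuditEntry = { timestamp: now.toISOString(), op, params, result };
  if (error !== undefined) entry.error = error;
  // One JSON object per line keeps the log easy to grep and parse.
  return JSON.stringify(entry);
}
```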

mac_do is a policy module for the FreeBSD MAC (Mandatory Access Control) framework that allows kernel-enforced credential transitions without sudoers parsing; mdo is its userland command. On systems where it is enabled, agents do not call mdo directly. When a jail-internal UID transition is needed (for example, to run a postgres-owned command), hostd acts as the orchestrator and decides whether to exec the command directly or to use mdo for the transition. This keeps the audit trail flowing through hostd's log even when the underlying mechanism is kernel-level.
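The decision described above might look like this inside hostd. This is a sketch under assumptions: `planExec` is a hypothetical name, the `mdoEnabled` flag models whether mac_do is actually configured on the host (it should not be assumed), and the `mdo -u <user>` form should be checked against mdo(1) on the target FreeBSD release.

```typescript
// Hypothetical sketch: hostd decides how to run a command as a target user.
interface ExecPlan {
  argv: string[];
  via: "direct" | "mdo";
}

function planExec(
  currentUser: string,
  targetUser: string,
  command: string[],
  mdoEnabled: boolean, // whether mac_do is loaded and configured on this host
): ExecPlan {
  if (command.length === 0) throw new Error("empty command");
  if (currentUser === targetUser) {
    // No UID transition needed: exec as-is.
    return { argv: [...command], via: "direct" };
  }
  if (!mdoEnabled) {
    // This sketch refuses rather than guessing at another transition mechanism.
    throw new Error(`cannot switch ${currentUser} -> ${targetUser}: mac_do not available`);
  }
  // Assumed mdo(1) syntax: mdo -u <user> <command...>
  return { argv: ["mdo", "-u", targetUser, ...command], via: "mdo" };
}
```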

Safety Harness Gates

The agent's safety harness (.agent/harness/safety.yaml) inspects every shell command before it runs, and also gates selected hostd ops. Privileged-looking actions trigger a confirmation prompt to the operator:

sudo in a bash command → confirm-sudo
mdo in a bash command → confirm-mdo
hostd('bastille-destroy', …) op → confirm-bastille-destroy

These gates are belt-and-suspenders protection on top of hostd's own validation: if the agent ever tries to bypass hostd and shell out to sudo directly, the harness still requires explicit operator approval before execution.
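The matching behind these gates can be sketched as below. The real rules live in .agent/harness/safety.yaml, whose format is not shown here; the `Action` type, `gateFor`, and the regexes are assumptions. The sketch matches `sudo` and `mdo` as whole words, so `/usr/local/bin/sudo id` is gated while `ls sudoku` is not.

```typescript
// Hypothetical sketch of the harness gate lookup; not the safety.yaml format.
type Action =
  | { kind: "bash"; command: string }
  | { kind: "hostd"; op: string };

// A word not glued to letters, digits, "." or "-" on either side.
const bashGates: Array<[RegExp, string]> = [
  [/(?:^|[^\w.-])sudo(?![\w.-])/, "confirm-sudo"],
  [/(?:^|[^\w.-])mdo(?![\w.-])/, "confirm-mdo"],
];

const hostdGates = new Map<string, string>([["bastille-destroy", "confirm-bastille-destroy"]]);

// Returns the confirmation gate to raise, or null if the action may proceed.
function gateFor(action: Action): string | null {
  if (action.kind === "hostd") {
    return hostdGates.get(action.op) ?? null;
  }
  for (const [pattern, gate] of bashGates) {
    if (pattern.test(action.command)) return gate;
  }
  return null;
}
```

String matching like this is a heuristic that a determined command can evade, which is why it is a second layer and not the privilege boundary itself.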

A Skill Is Not a Permission

Skills are instruction packages. They describe how to do work. They do not grant privileges by themselves.

The real permission comes from the execution context:

where the code runs
what directories are mounted
whether the host process allows the action
whether the operator explicitly requested it

This distinction matters:

If a jailed agent finds a way to modify host code without approval, that is a security failure.
If the trusted operator explicitly invokes a host-admin workflow to change nginx, that is an authorized admin action.

Skill Privilege Tiers

To keep the model clear, skills should be thought of in three tiers:

`sandbox-safe`

Safe for ordinary jailed execution.

Read mounted project files
Write only to approved writable mounts
No host OS changes
No persistent modification of the host app outside approved writable paths

`project-write`

Allowed to change the Clawdie checkout deliberately.

Apply code changes through the skills engine
Modify tracked project files
Should still not change host machine state such as nginx, PF, routing, or sysrc settings

`host-admin`

Allowed to change machine-wide host state.

Host services
nginx config and webroots
routing and forwarding
persistent operating system settings

This tier should be limited to trusted operator flows only.

SSH and Automation Boundaries

Clawdie should keep the SSH trust model simple.

Default rule:

automation such as Ansible SSHes into the FreeBSD host
Bastille jails are managed from the host
jails are not treated as separate SSH-managed servers by default

That means host automation should prefer:

bastille cmd <jail> <cmd>
explicit file placement into jail roots
host-side validation of jail state

This approach has security advantages:

fewer SSH keys to manage
less secret sprawl
no need to enable sshd inside every persistent jail
clearer distinction between host trust and jail workload trust

For the current architecture, this means:

one SSH path for the operator or automation
cms and db remain host-managed service jails
worker jails should not gain SSH access at all
Strapi admin UI stays disabled by default; enable with CMS_ADMIN_UI=YES only when explicitly needed

Direct SSH into a jail should be treated as an exception, not the default. It is only justified when a jail becomes a deliberately independent operational boundary.

Current rule:

keep SSH on the FreeBSD host
keep cms, db, git, and worker jails host-managed
do not introduce a separate operator jail into the active trust model

Service Identity Consistency

Host-mounted jail paths depend on numeric ownership, not just usernames.

For the current model, keep host-managed service identities explicit and stable whenever a mounted path crosses the host/jail boundary.

Why this matters:

mounted datasets become easier to reason about
ownership does not silently drift because host and jail assigned different IDs
shared tooling stays separate from the interactive operator account

For the current layout, see Host operator model.
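One way to catch ownership drift is to compare the numeric IDs the host and a jail assign to the same service accounts. The sketch below parses passwd(5)-format text (`name:password:uid:gid:...`) and reports mismatches. The function names, and the idea of feeding it the host and jail /etc/passwd contents, are assumptions rather than an existing Clawdie tool.

```typescript
// Hypothetical sketch: find service accounts whose UID/GID differ between host and jail.
interface Ident {
  uid: number;
  gid: number;
}

function parsePasswd(text: string): Map<string, Ident> {
  const users = new Map<string, Ident>();
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const fields = line.split(":");
    if (fields.length < 4) continue; // not a passwd entry
    users.set(fields[0], { uid: Number(fields[2]), gid: Number(fields[3]) });
  }
  return users;
}

function idMismatches(hostPasswd: string, jailPasswd: string, accounts: string[]): string[] {
  const host = parsePasswd(hostPasswd);
  const jail = parsePasswd(jailPasswd);
  const problems: string[] = [];
  for (const name of accounts) {
    const h = host.get(name);
    const j = jail.get(name);
    if (h === undefined || j === undefined) {
      problems.push(`${name}: missing on ${h === undefined ? "host" : "jail"}`);
    } else if (h.uid !== j.uid || h.gid !== j.gid) {
      problems.push(`${name}: host ${h.uid}:${h.gid} vs jail ${j.uid}:${j.gid}`);
    }
  }
  return problems;
}
```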

Security Boundaries

1. Jail Isolation (Primary Boundary)

Agents execute in FreeBSD jails, providing:

Process isolation: jail processes cannot directly affect the host
Filesystem isolation: only explicitly mounted directories are visible
Non-root execution: runs as the unprivileged node user (uid 1000)
Ephemeral execution: a fresh jail-backed worker per invocation

Rather than relying mainly on application-level permission checks, Clawdie reduces risk by limiting what is mounted into the jail in the first place.

2. Mount Security

External allowlist

Mount permissions are stored at ~/.config/clawdie-cp/mount-allowlist.json, which is:

outside project root
never mounted into jails
not modifiable by jailed agents

Default blocked patterns

.ssh, .gnupg, .aws, .azure, .gcloud, .kube, .docker,
credentials, .env, .netrc, .npmrc, id_rsa, id_ed25519,
private_key, .secret

Protections

symlink resolution before validation
jail path validation that rejects .. and absolute paths
the nonMainReadOnly option forces read-only mounts for non-main groups
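The protections above can be combined roughly as follows. This is a sketch, not Clawdie's validator: the `validateMount` signature, the injected `resolve` function (standing in for something like Node's `fs.realpathSync`), and the `allowedRoots` list are assumptions, since the allowlist file's format is not described here. It also treats any path component that starts with a blocked name (for example `.env.local`) as blocked, a deliberately conservative assumption.

```typescript
// Hypothetical mount validator combining the rules listed above.
const BLOCKED = [
  ".ssh", ".gnupg", ".aws", ".azure", ".gcloud", ".kube", ".docker",
  "credentials", ".env", ".netrc", ".npmrc", "id_rsa", "id_ed25519",
  "private_key", ".secret",
];

interface MountRequest {
  hostPath: string;
  jailPath: string; // must be relative inside the jail workspace
  readOnly: boolean;
}

type MountDecision = { ok: true; readOnly: boolean } | { ok: false; reason: string };

function validateMount(
  req: MountRequest,
  allowedRoots: string[],
  isMainGroup: boolean,
  nonMainReadOnly: boolean,
  resolve: (p: string) => string, // follows symlinks, e.g. fs.realpathSync
): MountDecision {
  const real = resolve(req.hostPath); // resolve symlinks before any check
  const parts = real.split("/");
  const hit = BLOCKED.find((name) => parts.some((part) => part.startsWith(name)));
  if (hit !== undefined) return { ok: false, reason: `blocked: ${hit}` };
  if (!allowedRoots.some((root) => real === root || real.startsWith(root + "/"))) {
    return { ok: false, reason: "outside allowlist" };
  }
  if (req.jailPath.startsWith("/") || req.jailPath.split("/").includes("..")) {
    return { ok: false, reason: "bad jail path" };
  }
  // Non-main groups are forced read-only when nonMainReadOnly is set.
  return { ok: true, readOnly: req.readOnly || (!isMainGroup && nonMainReadOnly) };
}
```

Resolving symlinks first matters: a harmless-looking path inside an allowed root can point at `~/.ssh`, and only the resolved path reveals that.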

3. Read-Only Project Root

The main group's project root is mounted read-only during normal jailed execution.

Writable paths the agent needs, such as the group folder, IPC directory, and .agent/, are mounted separately.

This matters because otherwise the jailed agent could modify host application code such as:

src/
dist/
package.json
startup scripts
security checks

If that happened, the current jail might still be isolated, but the next host restart could run the modified code outside the jail. In other words, the attack would persist into the trusted host application.

So:

read-only project root protects the host application from routine agent execution
separate writable mounts still allow useful work
deliberate project changes should use a separate maintenance path, not ordinary chat runtime

4. Session Isolation

Each group has isolated Claude sessions at data/sessions/{group}/.agent/.

groups cannot see each other's conversation history
session data includes message history and file contents read during the session
this helps prevent cross-group information disclosure
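A sketch of deriving the per-group session path so that no group name can point into another group's directory. The `sessionDir` helper and the allowed group-id characters are assumptions; only the data/sessions/{group}/.agent/ layout comes from this section.

```typescript
// Hypothetical helper: map a group id to its isolated session directory.
const GROUP_ID = /^[A-Za-z0-9_-]{1,128}$/; // assumed naming rule: no "/" and no "."

function sessionDir(group: string): string {
  // Disallowing "/" and "." means ids like "../main" can never escape the group's folder.
  if (!GROUP_ID.test(group)) throw new Error(`invalid group id: ${group}`);
  return `data/sessions/${group}/.agent/`;
}
```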

5. IPC Authorization

Messages and task operations are checked against group identity.

Operation	Main Group	Non-Main Group
Send message to own chat	Yes	Yes
Send message to other chats	Yes	No
Schedule task for self	Yes	Yes
Schedule task for others	Yes	No
View all tasks	Yes	Own only
Manage other groups	Yes	No
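The table above reduces to a small authorization function. This sketch is illustrative: the `IpcOp` names and `authorizeIpc` are assumptions, and "View all tasks" is modelled as a filter. The table only lists managing other groups; this sketch conservatively denies all group management to non-main groups.

```typescript
// Hypothetical encoding of the IPC authorization table.
type IpcOp = "send-message" | "schedule-task" | "manage-group";

interface Caller {
  group: string;
  isMain: boolean;
}

function authorizeIpc(caller: Caller, op: IpcOp, targetGroup: string): boolean {
  if (caller.isMain) return true; // main group: every row is "Yes"
  if (op === "manage-group") return false; // sketch choice: no group management for non-main
  return targetGroup === caller.group; // messages and tasks: own chat only
}

interface Task {
  id: string;
  group: string;
}

// Main sees every task; other groups see only their own.
function visibleTasks(caller: Caller, tasks: Task[]): Task[] {
  return caller.isMain ? tasks : tasks.filter((t) => t.group === caller.group);
}
```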

6. API Authentication

The control plane HTTP API requires a Bearer token matching CONTROLPLANE_SHARED_SECRET. All agent-to-API communication authenticates with this secret. Requests without a valid token are rejected.
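A minimal sketch of the server-side check, assuming the Authorization header arrives as a plain string. In Node, `crypto.timingSafeEqual` is the usual tool for the comparison; the hand-written loop below does the same job without imports so the example stays self-contained. `isAuthorized` is a hypothetical name.

```typescript
// Hypothetical Bearer-token check for the control plane API.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false; // leaks only the length, a common trade-off
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i); // no early exit on the first mismatch
  }
  return diff === 0;
}

function isAuthorized(authHeader: string | undefined, sharedSecret: string): boolean {
  if (!sharedSecret) return false; // sketch choice: reject everything if no secret is configured
  if (authHeader === undefined || !authHeader.startsWith("Bearer ")) return false;
  return constantTimeEqual(authHeader.slice("Bearer ".length), sharedSecret);
}
```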

7. Credential Handling

Mounted credentials

Claude auth tokens filtered from .env and mounted read-only

Not mounted

channel credentials, or any other variable outside the allowlisted provider variables
the mount allowlist itself
any credentials matching blocked patterns

Credential filtering

Only these environment variables are exposed to jails:

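```typescript
// Provider API keys passed through to jail workers; other environment variables are not exposed.
const allowedVars = [
  'OPENROUTER_API_KEY',
  'ANTHROPIC_API_KEY',
  'OPENAI_API_KEY',
];
```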
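Applying that allowlist is a pick-by-name over the host environment. The sketch below is an illustration, not Clawdie's code: `filterEnv` is a hypothetical name, and copying only variables that are set and non-empty is an assumption.

```typescript
// Hypothetical sketch: build a jail worker's environment from an allowlist.
function filterEnv(
  hostEnv: Record<string, string | undefined>,
  allowedVars: readonly string[],
): Record<string, string> {
  const jailEnv: Record<string, string> = {};
  for (const name of allowedVars) {
    const value = hostEnv[name];
    if (value !== undefined && value !== "") jailEnv[name] = value;
  }
  return jailEnv; // everything not named in allowedVars is dropped
}
```

In the host process this would be called as `filterEnv(process.env, allowedVars)`. Filtering limits which secrets enter the jail; it does not hide the ones that do.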

Important limitation:

Provider credentials passed into a jail worker may still be discoverable by the agent through Bash or file operations inside that jail. The long-term goal should be to reduce credential exposure further.

Security, Obscurity, and Observability

These are not the same thing.

Security

Security means there is a real technical control that blocks or limits harmful behavior.

Examples:

jail isolation
read-only mounts
IPC authorization
credential filtering

Obscurity

Obscurity means hiding details and hoping that makes attacks harder.

Examples:

unusual file names
undocumented paths
hidden implementation details

Obscurity can sometimes reduce casual misuse, but it is not treated as a primary defense in Clawdie.

Observability

Observability means being able to see what happened.

Examples:

logs
task history
IPC traces
operator diagnostics
screenshots of terminal state

Observability does not stop an attack, but it helps detect mistakes, investigate failures, and understand incidents.

Where `tmux-screenshot` Fits

The tmux-screenshot skill is best understood as an observability and diagnostics tool, not a security boundary.

It can help operators:

inspect a terminal session visually
capture evidence during debugging
confirm what the agent or operator saw at a specific moment
support incident review when plain text logs are not enough

It does not:

enforce isolation
block unsafe actions
replace logs or authorization

So the right mental model is:

security controls prevent damage
observability tools help explain damage or confirm normal behavior

Why a Small Codebase Helps Security

Clawdie's small codebase is a real advantage, but it is a supporting advantage, not the primary security boundary.

A smaller, more reviewable system makes it easier to:

understand what the software actually does
inspect important code paths without getting lost in layers of abstraction
notice risky changes sooner
keep hidden complexity and accidental privilege paths from accumulating

This improves auditability and lowers the chance of security problems hiding in unused or overly complex code.

But it is important to be precise:

a small codebase does not replace jail isolation
a small codebase does not replace authorization checks
a small codebase does not make unsafe mounts safe

So the right way to think about it is:

small code helps humans review the system
hard boundaries like jails, mount rules, and authorization still do the actual enforcement

Sensitive Artifact Handling

Observability artifacts can themselves be sensitive.

A screenshot may contain:

API keys
private file paths
customer data
chat history
terminal history
internal hostnames or network details

Because of that, screenshots and similar artifacts should be treated as sensitive operational data:

store them in a controlled location
avoid exposing them to unrelated groups
keep retention limited
redact when sharing externally

Privilege Comparison

Capability	Main Group	Non-Main Group
Project root access	`/workspace/project` (ro)	None
Group folder	`/workspace/group` (rw)	`/workspace/group` (rw)
Global memory	Implicit via project	`/workspace/global` (ro)
Additional mounts	Configurable	Read-only unless allowed
Network access	Unrestricted	Unrestricted
MCP tools	All	All
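The first rows of this table translate into a default mount plan per group. This is a sketch: the `MountSpec` shape and `defaultMounts` are hypothetical, host-side paths are supplied by the caller, and neither additional mounts nor the separate IPC and .agent/ mounts are modelled.

```typescript
// Hypothetical default mount plan derived from the privilege comparison table.
interface MountSpec {
  host: string;
  jail: string;
  mode: "ro" | "rw";
}

interface GroupPaths {
  projectRoot: string;
  groupFolder: string;
  globalMemory: string;
}

function defaultMounts(isMain: boolean, paths: GroupPaths): MountSpec[] {
  const mounts: MountSpec[] = [{ host: paths.groupFolder, jail: "/workspace/group", mode: "rw" }];
  if (isMain) {
    // Read-only so the jailed agent cannot alter code the host will run later.
    mounts.push({ host: paths.projectRoot, jail: "/workspace/project", mode: "ro" });
  } else {
    // Non-main groups get no project root, only read-only global memory.
    mounts.push({ host: paths.globalMemory, jail: "/workspace/global", mode: "ro" });
  }
  return mounts;
}
```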

Recommended Enforcement Rules

To keep the security model coherent, the host process should enforce rules like these:

non-main groups may only use sandbox-safe workflows
main group uses sandboxed execution by default
project-write actions require explicit operator intent
host-admin actions require explicit operator intent and clear confirmation
skills cannot self-elevate their privileges
host-admin actions should run through a separate executor, not the normal jailed worker path
important admin actions should be logged with who requested them, what changed, and whether they succeeded
automation should authenticate to the host, not to service jails, unless a separate SSH boundary is explicitly intended
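Several of these rules can be expressed as one decision function in the host process. A sketch under assumptions: the `Invocation` fields (`operatorIntent`, `confirmed`), the executor names, and the host-owned tier registry are hypothetical; the tier names come from the Skill Privilege Tiers section. Because the tier is looked up on the host side rather than declared by the skill, a skill cannot raise its own tier.

```typescript
// Hypothetical host-side decision for which path a skill invocation may take.
type Tier = "sandbox-safe" | "project-write" | "host-admin";

interface Invocation {
  skill: string;
  isMainGroup: boolean;
  operatorIntent: boolean; // operator explicitly asked for this action
  confirmed: boolean; // operator confirmed a host-admin action
}

type Decision =
  | { allow: true; executor: "jail-worker" | "maintenance" | "host-admin-executor" }
  | { allow: false; reason: string };

function decide(inv: Invocation, tierRegistry: Map<string, Tier>): Decision {
  // The host owns the registry; unknown skills get the lowest tier.
  const tier = tierRegistry.get(inv.skill) ?? "sandbox-safe";
  if (tier === "sandbox-safe") return { allow: true, executor: "jail-worker" };
  if (!inv.isMainGroup) return { allow: false, reason: "non-main groups are sandbox-safe only" };
  if (!inv.operatorIntent) return { allow: false, reason: `${tier} requires explicit operator intent` };
  if (tier === "project-write") return { allow: true, executor: "maintenance" };
  if (!inv.confirmed) return { allow: false, reason: "host-admin requires confirmation" };
  return { allow: true, executor: "host-admin-executor" };
}
```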

Prevent, Detect, Recover

Another simple way to understand the model:

Prevent

Controls that try to stop bad outcomes:

jail isolation
read-only project root
limited mounts
IPC authorization
credential filtering
prompt guardrails (AGENT_MAX_INBOUND_CHARS, AGENT_MAX_PROMPT_CHARS) — input validation that rejects or truncates oversized prompts before they reach the model
context-exceeded handling — when a prompt exceeds the model's context window, the runtime returns a structured error instead of crashing or retrying indefinitely, preventing DoS via oversized inputs
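The two prompt limits might be applied as sketched below. The split shown (truncate a single inbound message at AGENT_MAX_INBOUND_CHARS, reject an assembled prompt over AGENT_MAX_PROMPT_CHARS with a structured error) is an assumption for illustration; the document only says oversized prompts are rejected or truncated. The function names, error code, and fallback handling are hypothetical.

```typescript
// Hypothetical guardrail sketch; only the two env var names come from the docs.
interface Limits {
  maxInboundChars: number;
  maxPromptChars: number;
}

type GuardResult =
  | { ok: true; prompt: string; truncated: boolean }
  | { ok: false; error: { code: "prompt_too_large"; chars: number; limit: number } };

// Read limits from the environment, falling back when a value is missing or invalid.
function readLimits(env: Record<string, string | undefined>, fallback: Limits): Limits {
  const num = (v: string | undefined, d: number): number => {
    const n = Number(v);
    return Number.isInteger(n) && n > 0 ? n : d;
  };
  return {
    maxInboundChars: num(env.AGENT_MAX_INBOUND_CHARS, fallback.maxInboundChars),
    maxPromptChars: num(env.AGENT_MAX_PROMPT_CHARS, fallback.maxPromptChars),
  };
}

function guardPrompt(inbound: string, context: string, limits: Limits): GuardResult {
  // Clip one oversized inbound message instead of dropping it entirely.
  const truncated = inbound.length > limits.maxInboundChars;
  const message = truncated ? inbound.slice(0, limits.maxInboundChars) : inbound;
  const prompt = context + message;
  if (prompt.length > limits.maxPromptChars) {
    // Structured error: the caller reports it instead of retrying in a loop.
    return {
      ok: false,
      error: { code: "prompt_too_large", chars: prompt.length, limit: limits.maxPromptChars },
    };
  }
  return { ok: true, prompt, truncated };
}
```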

Detect

Controls that help spot problems:

logs
task records
audit trails
tmux-screenshot

Recover

Controls and workflows that help restore a safe state:

restart clean worker environments
revert or repair bad config changes
inspect logs and screenshots
rotate credentials if exposure is suspected

Security Architecture Diagram

┌──────────────────────────────────────────────────────────────────┐
│                      UNTRUSTED INPUT ZONE                        │
│  Channel messages, prompts, pasted commands, external content    │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼ Validation, routing, auth
┌──────────────────────────────────────────────────────────────────┐
│                     HOST PROCESS (TRUSTED, user)                 │
│  • Message routing                                               │
│  • IPC authorization                                             │
│  • Mount validation                                              │
│  • Jail lifecycle                                                │
│  • Credential filtering                                          │
│  • Skill privilege decisions                                     │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌──────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│ JAIL RUNTIME     │  │ ADMIN / MAINTENANCE  │  │ hostd (root)         │
│ (SANDBOXED)      │  │ PATHS                │  │ Unix socket          │
│ • agent execution│  │ • project-write      │  │ • whitelisted ops    │
│ • limited mounts │  │ • host-admin (manual)│  │ • bastille/zfs/pf    │
│ • no host changes│  │ • operator intent    │  │ • Zod-validated      │
└──────────────────┘  └──────────────────────┘  └──────────────────────┘
