# Browser Jail

**Date:** 11 May 2026
**Status:** DESIGN — Phase 0.6 injection validation passed; ready for Phase 1 implementation
**Phase:** 0.5 viability PASS → 0.6 injection validation PASS → 1 implementation

Clawdie's browser-computer-use backend is a FreeBSD/Bastille browser execution
template plus future ephemeral task clones. The jail layer executes browser
operations only; the controlplane owns MCP, auth, audit, credentials, task
state, and the UI-TARS-compatible model loop.

This document is the current implementation target. The prior long-lived shared
Chromium-context model is superseded by the fixed thick `browser` template and
per-task clone direction, validated by the redefined Phase 0.6 injection run.

---

## Related docs

| Doc | Role |
|---|---|
| `docs/internal/BROWSER-JAIL.md` (this) | Current design + implementation target |
| `doc/BROWSER-JAIL-HANDOFF.md` | Next tasks: Phase 1 implementation sequence |
| `docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md` | Phase 0.5 record (PASS) |
| `docs/internal/BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md` | Phase 0.6 record (PASS for CDP injection; profile-byte inheritance dropped) |
| `docs/internal/UI-TARS-ADOPTION.md` | Direction: UI-TARS as the agent-loop reference; clawdie owns substrate |
| `docs/internal/VISION-GROUNDING-FINDINGS.md` | Vision-grounding validation — input for model selection |
| `doc/BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md` | Historical — pivot reasoning only; do not update |

---

## Goals

- Server-side headless browser automation on FreeBSD.
- One fixed, credential-free thick browser template jail at a stable registry
  slot.
- Future per-task jail clones for stronger isolation and clean teardown.
- Controlplane-owned credentials store with explicit CDP cookie injection.
- UI-TARS-compatible operator surface: screenshot → prediction → execute.
- pi integration as one compact `browser_run_task` result, not screenshot spam
  in JSONL history.

## Non-goals (MVP)

- Model logic inside the jail.
- Storing operator credentials inside jail profiles or templates.
- Browser profile-byte authentication cloning.
- Downloads/uploads.
- localStorage / IndexedDB credential injection.
- Passkeys / WebAuthn portability.
- Electron/nut-js desktop control on the FreeBSD server path.

---

## Fixed jail registry slot

The canonical browser substrate is:

```text
name:        browser
ip:          WARDEN_BROWSER_IP, default <subnet>.6
live IP:     192.168.72.6 on the current validation host
shape:       thick Bastille VNET jail
boot:        off by default
packages:    chromium, node22, npm-node22
workspace:   /opt/browser-validation for current validation scripts
credentials: none
```

Registry source of truth:

```text
infra/jails.yaml
infra/packages/browser-jail.txt
```

Why thick:

- The browser template is a golden execution image, not a small ordinary
  service jail.
- Thick snapshots/clones are self-contained and avoid the thin-jail nullfs/fstab
  clone failures observed in Phase 0.6.
- ZFS clones remain cheap because unchanged blocks are shared.
- Chromium's package payload dominates the size; the copied base userland is
  acceptable for reproducibility.

The old thin `browserop` validation jail is retired. Operator mode is not a
separate authenticated template; it is a **session credential mode** authorized
by controlplane policy.

---

## Architecture

```text
External clients / tasks
  ├── Claude Desktop MCP
  ├── UI-TARS-compatible runner
  ├── pi browser_run_task
  └── Clawdie controlplane tasks
          │
          ▼
controlplane (host, clawdie service)
  ├── better-auth / operator auth
  ├── tenant + grant-token policy
  ├── credentials store + CDP cookie injection
  ├── audit log
  ├── screenshot retention store
  ├── UI-TARS-compatible GUIAgent runner
  ├── hostd calls for Bastille/ZFS/PF lifecycle
  └── per-session route table: session_id → browsertaskNNN/IP
          │
          ▼
browsertaskNNN clone (future Phase 1 runtime)
  ├── cloned from thick browser template
  ├── plain HTTP API only, reachable from controlplane
  ├── Chromium from FreeBSD pkg
  ├── Node 22 + puppeteer-core/CDP bridge
  ├── no credentials at rest before injection
  └── destroyed at session end
```

The fixed `browser` jail is the template/source image. Runtime task work should
happen in `browsertaskNNN` clones. During validation, the same HTTP/CDP code may
be exercised directly inside `browser` to avoid clone noise, but production
semantics are clone-backed.

---

## Controlplane responsibilities

The controlplane owns every security-sensitive and product-facing concern:

- MCP/HTTP tool surface.
- better-auth session validation.
- tenant and operator identity resolution.
- `operator_grant_token` validation.
- credentials storage, decryption, and domain filtering.
- CDP cookie injection at session open.
- audit writes before forwarding actions to the jail.
- screenshot recording/retention outside clone datasets.
- clone lifecycle through hostd.
- orphan reaper and forced-unmount fallback.
- UI-TARS-compatible browser task loop.

The jail owns only:

- starting Chromium,
- accepting plain HTTP requests from controlplane,
- executing browser actions via CDP,
- returning screenshots/DOM/action results.

No MCP server, long-lived auth secret, model loop, or audit DB access belongs in
the jail.

---

## Credentials store + injection

### MVP scope

- Cookies only.
- Domain-filtered.
- Tenant-scoped.
- Grant-token-gated for operator mode.
- Encrypted at rest.
- No persistence back from task sessions.

Out of MVP:

- localStorage,
- IndexedDB,
- passkeys/WebAuthn,
- automatic credential refresh when a site expires a session.

### Session credential modes

```json
{
  "credential_mode": "clean | operator",
  "domains": ["github.com"],
  "record": "off | transient | audit"
}
```

Defaults:

```text
credential_mode = clean
record          = transient
domains         = []
```

`credential_mode: "clean"` never injects cookies.

`credential_mode: "operator"` requires a valid `operator_grant_token` whose
scope matches the tenant and requested domains. The token authorizes injection
only; it does not grant shell access, jail access, or blanket cookie export.
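
The defaults and the operator-mode gate above can be sketched as a small parser. This is a minimal illustration, not the controlplane's code: `SessionOptions`, `parse_session_options`, and the `has_grant_token` flag are hypothetical names, and real token validation is more than a boolean.

```python
from dataclasses import dataclass

CREDENTIAL_MODES = {"clean", "operator"}
RECORD_MODES = {"off", "transient", "audit"}


@dataclass(frozen=True)
class SessionOptions:
    # Field defaults mirror the documented defaults.
    credential_mode: str = "clean"
    record: str = "transient"
    domains: tuple = ()


def parse_session_options(raw: dict, has_grant_token: bool = False) -> SessionOptions:
    """Apply defaults, reject unknown values, and gate operator mode."""
    opts = SessionOptions(
        credential_mode=raw.get("credential_mode", "clean"),
        record=raw.get("record", "transient"),
        domains=tuple(raw.get("domains", [])),
    )
    if opts.credential_mode not in CREDENTIAL_MODES:
        raise ValueError(f"unknown credential_mode: {opts.credential_mode!r}")
    if opts.record not in RECORD_MODES:
        raise ValueError(f"unknown record mode: {opts.record!r}")
    # Assumption: the caller has already checked that a token was presented;
    # scope validation against tenant and domains happens elsewhere.
    if opts.credential_mode == "operator" and not has_grant_token:
        raise PermissionError("operator mode requires an operator_grant_token")
    return opts
```

An empty request resolves to `clean` / `transient` / no domains, so a caller that forgets to pass options never triggers injection.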

### `operator_grant_token` schema

Concrete shape (resolved 11 May 2026):

```text
operator_grant_token = {
  jti:                 uuid           # unique token id, audit key
  iss:                 "clawdie-cp"   # issuer, fixed
  tenant_id:           uuid
  origin_session_id:   uuid           # the operator-authorized session that issued this token
  operator_id:         uuid
  allowed_domains:     ["github.com", "stripe.com"]   # exact host match; no wildcards
  issued_at:           iso8601
  expires_at:          iso8601        # short-lived: 15 min default, configurable per tenant
  single_use:          true | false   # true = consumed by the first injection call
}
```

- **Format:** clawdie-internal opaque token (random `jti`) → controlplane
  looks up the rest in its own table. Not a JWT — no signature verification
  needed because issuance and validation both happen inside the controlplane.
- **Storage:** Postgres `auth.operator_grant_tokens` table; rows expire and
  are GC'd by the existing cron pattern. The cleartext token lives only in the
  originating session's response and in the env of the pi task it spawns.
- **Issuance:** controlplane issues during operator-authorized task creation
  (Telegram approval, dashboard action). Issued tokens are audit-logged.
- **Validation:** at `browser_run_task` / `open_session` entry, controlplane
  verifies that the `jti` exists and is not expired, that `tenant_id` matches
  the caller's tenant, and that `allowed_domains` is a superset of the
  request's `domains`. With `single_use: true`, an already-consumed row is
  rejected and a fresh row is marked consumed atomically.
- **Revocation:** delete the row by `jti`. Pending validations against the
  same `jti` fail.

The `single_use` default for MVP is `true`: each grant authorizes one task.
Long-running approvals are explicit: the operator requests a grant with
`single_use: false` and a longer `expires_at` to chain multiple tasks under
one grant.
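
The validation rules above can be sketched against an in-memory table. This is only an illustration under stated assumptions: the real store is Postgres and the consume step must be an atomic update there; `GrantRow`, `validate_grant`, and the `consumed` field are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class GrantRow:
    jti: str
    tenant_id: str
    allowed_domains: frozenset
    expires_at: datetime
    single_use: bool = True
    consumed: bool = False  # hypothetical column backing "marked consumed"


def validate_grant(table: dict, jti: str, tenant_id: str,
                   domains: list, now: datetime) -> GrantRow:
    """Return the grant row if every check passes, else raise PermissionError."""
    row = table.get(jti)
    if row is None:
        raise PermissionError("unknown or revoked grant")
    if now >= row.expires_at:
        raise PermissionError("grant expired")
    if row.tenant_id != tenant_id:
        raise PermissionError("tenant mismatch")
    # Exact host match, no wildcards: requested domains must be a subset.
    if not set(domains) <= row.allowed_domains:
        raise PermissionError("requested domains exceed grant scope")
    if row.single_use:
        if row.consumed:
            raise PermissionError("grant already consumed")
        row.consumed = True  # in Postgres this would be one atomic UPDATE
    return row
```

Revocation falls out of the first check: deleting the row by `jti` makes every later validation fail as "unknown or revoked".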

### Store sketch

Preferred backend: Postgres, because it keeps credentials transactional,
backupable, tenant-scoped, and close to Clawdie's existing secret story.
Filesystem storage is acceptable only for early validation runs.

```text
credentials.cookies
  id
  tenant_id
  domain
  cookies_encrypted      # encrypted JSON array of CDP cookie shapes
  created_at
  last_refreshed_at
  last_injected_at
  grant_scope
```

### Refresh workflow

Credential refresh is an operator-driven workflow, separate from task sessions:

```text
operator requests refresh
  → controlplane starts a refresh browser session/clone
  → operator logs in interactively
  → controlplane exports cookies via CDP Network.getAllCookies
  → controlplane filters selected domains
  → encrypted cookies are written to credentials store
  → refresh clone/session is destroyed
```

The refresh UX is resolved as a controlplane-streamed clone (see Decided).
Candidate shapes that were considered:

- controlplane-streamed browser view (chosen),
- VNC/X-forwarding/SPICE-style access (discarded),
- future Lumina/Firefox cookie export bridge.

The storage/injection contract is independent of that UX choice.
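
The "filters selected domains" step in the refresh workflow might look like the sketch below. It assumes cookies arrive as dicts in the shape returned by CDP `Network.getAllCookies` (each with a `domain` field that may carry a leading dot), and it assumes an exact-host comparison after stripping that dot; how subdomain cookies should be treated is not specified in this document. `filter_cookies` is a hypothetical helper name.

```python
def cookie_host(cookie: dict) -> str:
    """Normalize a CDP cookie domain: drop the leading dot, lowercase."""
    return cookie["domain"].lstrip(".").lower()


def filter_cookies(cookies: list, selected_domains: list) -> list:
    """Keep only cookies whose normalized host is exactly a selected domain."""
    selected = {d.lower() for d in selected_domains}
    return [c for c in cookies if cookie_host(c) in selected]
```

Under this assumption a cookie scoped to `gist.github.com` is dropped when only `github.com` is selected, which errs on the side of storing less.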

### Injection workflow

```text
open browser task with credential_mode=operator
  → validate operator_grant_token
  → select cookies for (tenant, allowed domains)
  → decrypt in controlplane
  → clone browser → browsertaskNNN
  → start Chromium with empty profile
  → inject cookies via CDP Network.setCookie
  → verify via smoke probe where possible
  → hand session to UI-TARS/operator loop
```

Task sessions do not write updated cookies back to the store. If cookies expire,
the task should fail clearly and ask for a refresh workflow.
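
As a rough sketch of the injection step, stored cookies can be turned into CDP `Network.setCookie` messages as follows. The field subset kept here and the handling of session cookies (no `expires` sent) are assumptions for illustration; `to_set_cookie_params` and `injection_messages` are hypothetical names, and sending the messages over the CDP connection is left to the bridge.

```python
# Fields copied from an exported cookie into Network.setCookie params.
# Export-only fields such as "size" and "session" are intentionally dropped.
COOKIE_PARAM_KEYS = ("name", "value", "domain", "path",
                     "secure", "httpOnly", "sameSite")


def to_set_cookie_params(cookie: dict) -> dict:
    """Build Network.setCookie params from one exported cookie."""
    params = {k: cookie[k] for k in COOKIE_PARAM_KEYS if k in cookie}
    # Session cookies carry no meaningful expiry; only persistent ones get it.
    if not cookie.get("session", False) and cookie.get("expires", -1) > 0:
        params["expires"] = cookie["expires"]
    return params


def injection_messages(cookies: list, first_id: int = 1) -> list:
    """One CDP JSON message per cookie, with sequential message ids."""
    return [
        {"id": first_id + i, "method": "Network.setCookie",
         "params": to_set_cookie_params(c)}
        for i, c in enumerate(cookies)
    ]
```

Cookie values pass through these dicts in cleartext, so the messages themselves must never be logged.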

---

## Tool surface

Primitive browser tools remain useful for debugging and MCP compatibility:

| Tool | Purpose | Returns |
|---|---|---|
| `browser.open_session` | Start a browser session/clone with `{credential_mode, domains, record}`. | `{ session_id, credential_mode, record, started_at }` |
| `browser.navigate` | Load a URL. | `{ status, final_url, title }` |
| `browser.screenshot` | Capture viewport by default; `full_page: true` explicit. | `{ image_base64, width, height, captured_at, persisted_path? }` |
| `browser.click` | Click `(x,y)` or CSS selector. | `{ success, after_screenshot? }` |
| `browser.type` | Type text into focused element or selector. | `{ success, after_screenshot? }` |
| `browser.scroll` | Scroll page/selector. | `{ success, after_screenshot? }` |
| `browser.read_dom` | Return truncated DOM for grounding fallback. | `{ html, truncated_to }` |
| `browser.close_session` | Stop browser service, destroy clone/session resources. | `{ closed_at }` |

Normal product integration should prefer one high-level call:

```text
browser.run_task({ instruction, credential_mode, domains, record, max_steps })
```

pi exposes this as `browser_run_task` and receives one compact result:

```json
{
  "status": "finished | max_steps | error | aborted",
  "summary": "final answer or useful task summary",
  "result_data": {},
  "trace_id": "controlplane trace/session id",
  "step_count": 8,
  "final_screenshot_path": "/var/db/browser-jail/sessions/.../final.png"
}
```

Screenshots stay in the UI-TARS loop and Clawdie recording store. They are not
appended turn-by-turn into pi JSONL history.

---

## UI-TARS-compatible operator

Clawdie should adapt UI-TARS' mature GUI-agent loop shape rather than inventing
a separate parser/loop:

```text
instruction
  → screenshot()
  → model prediction
  → execute(prediction)
  → repeat until finished/max_steps/error
```

`ClawdieBrowserOperator` should expose:

- `screenshot()` backed by `browser.screenshot`,
- `execute(prediction)` translating parsed UI-TARS actions to browser tools,
- `finished()` / close semantics backed by controlplane session cleanup.

The model loop runs in controlplane or an external UI-TARS-compatible client,
never inside the jail.
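
A minimal sketch of the loop shape, assuming a simplified prediction dict with an `action` key; the real UI-TARS prediction format and the `ClawdieBrowserOperator` interface will differ, and `run_task`, `predict`, and the default `max_steps` are hypothetical here.

```python
from typing import Callable, Protocol


class Operator(Protocol):
    def screenshot(self) -> bytes: ...
    def execute(self, prediction: dict) -> None: ...


def run_task(instruction: str, operator: Operator,
             predict: Callable[[str, bytes], dict], max_steps: int = 20) -> dict:
    """screenshot -> prediction -> execute, until finished/max_steps/error."""
    steps = 0
    try:
        while steps < max_steps:
            shot = operator.screenshot()
            prediction = predict(instruction, shot)
            steps += 1
            if prediction.get("action") == "finished":
                return {"status": "finished",
                        "summary": prediction.get("summary", ""),
                        "step_count": steps}
            operator.execute(prediction)
        return {"status": "max_steps", "summary": "", "step_count": steps}
    except Exception as exc:
        # Surface the failure as a compact result instead of a traceback.
        return {"status": "error", "summary": str(exc), "step_count": steps}
```

Screenshots never leave the loop in this shape: the caller only sees the compact status dict, which matches the intent of keeping them out of pi JSONL history.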

---

## Network policy

Ingress:

- Browser clone HTTP API reachable only from controlplane on the internal jail
  network.
- No public ingress.
- No MCP/auth endpoint inside the jail.

Egress:

- Public web egress allowed for browser tasks.
- PF denies internal targets by default:
  - RFC1918,
  - loopback,
  - link-local,
  - IPv6 ULA/link-local,
  - cloud metadata (`169.254.169.254`).

Future PF shape for clones:

- static ruleset,
- table `browser_tasks`,
- add/delete clone IPs with `pfctl -t browser_tasks -T add/delete`,
- no full PF reload per clone.
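
The egress deny list and the table-based clone membership can be sketched as below. PF remains the enforcement point; this is only a controlplane-side illustration, for example for pre-checking a navigation target. The exact CIDR list is an assumption derived from the categories above, and `is_denied_target` and `pf_table_cmd` are hypothetical helper names.

```python
import ipaddress

# Assumed CIDRs for: RFC1918, loopback, link-local (covers the cloud
# metadata address 169.254.169.254), IPv6 loopback, ULA, and link-local.
DENIED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "127.0.0.0/8", "169.254.0.0/16",
    "::1/128", "fc00::/7", "fe80::/10",
)]


def is_denied_target(addr: str) -> bool:
    """True if the address falls in a range the egress policy denies."""
    ip = ipaddress.ip_address(addr)
    return any(ip.version == net.version and ip in net
               for net in DENIED_NETWORKS)


def pf_table_cmd(op: str, clone_ip: str, table: str = "browser_tasks") -> list:
    """Build the pfctl argv that adds or deletes one clone IP in the table."""
    if op not in ("add", "delete"):
        raise ValueError(f"unsupported table operation: {op!r}")
    ipaddress.ip_address(clone_ip)  # raises ValueError on a malformed address
    return ["pfctl", "-t", table, "-T", op, clone_ip]
```

Because only table membership changes, the static ruleset is never reloaded when a clone starts or stops.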

---

## Screenshot recording

Recording is session-level and immutable for the session:

| Mode | Disk | Retention | Use |
|---|---|---|---|
| `off` | none | — | disposable tests or sensitive sessions the operator chooses not to record |
| `transient` | N=50 FIFO ring buffer | until close/eviction | default action-loop debugging |
| `audit` | every screenshot | 7d default | sensitive operations needing forensic trace |

Path:

```text
/var/db/browser-jail/sessions/<tenant_id>/<session_id>/<seq>.png
```

Screenshots live on the host/controlplane side, outside clone datasets. Clone
destroy must not delete audit material.
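
One way to picture `transient` mode is a bounded FIFO over screenshot paths. The sketch below assumes unpadded sequence numbers in file names and leaves actual file deletion to the caller; `screenshot_path` and `TransientRing` are hypothetical names.

```python
from collections import deque
from pathlib import PurePosixPath

SESSIONS_ROOT = PurePosixPath("/var/db/browser-jail/sessions")


def screenshot_path(tenant_id: str, session_id: str, seq: int) -> PurePosixPath:
    """Host-side path for one screenshot, outside any clone dataset."""
    return SESSIONS_ROOT / tenant_id / session_id / f"{seq}.png"


class TransientRing:
    """FIFO ring of screenshot paths; the oldest is evicted past capacity."""

    def __init__(self, tenant_id: str, session_id: str, capacity: int = 50):
        self.tenant_id = tenant_id
        self.session_id = session_id
        self.capacity = capacity
        self.paths = deque()

    def record(self, seq: int) -> list:
        """Track a new screenshot; return evicted paths the caller should delete."""
        self.paths.append(screenshot_path(self.tenant_id, self.session_id, seq))
        evicted = []
        while len(self.paths) > self.capacity:
            evicted.append(self.paths.popleft())
        return evicted
```

`audit` mode would skip the ring entirely and rely on the retention window instead.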

---

## Clone lifecycle

Task clone names are alphanumeric for Bastille VNET compatibility:

```text
browsertask001
browsertask002
...
```
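
A small sketch of name allocation under this scheme. The lowest-free-number policy and the upper bound of 999 are assumptions for illustration, and `next_clone_name` is a hypothetical helper; a real allocator also has to account for stale epairs before reusing a name.

```python
def next_clone_name(in_use: set, limit: int = 999) -> str:
    """Return the lowest free browsertaskNNN name (alphanumeric only)."""
    for n in range(1, limit + 1):
        name = f"browsertask{n:03d}"
        if name not in in_use:
            return name
    raise RuntimeError("no free browsertask slot")
```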

Expected create path:

1. ensure `browser` template is stopped/quiescent,
2. snapshot or clone from a known-good template state,
3. call hostd `browser-clone-create` to clone datasets, patch jail name/IP/VNET config, clear stale epairs, and add the clone IP to PF `browser_tasks`,
4. start jail,
5. start browser HTTP service / Chromium,
6. inject cookies if `credential_mode=operator`,
7. run task loop.

Destroy/reaper order:

1. call hostd `browser-clone-destroy` / `browser-clone-reap`,
2. stop in-jail browser HTTP service through rc.d,
3. TERM Chromium by PID file,
4. KILL by PID only as fallback,
5. unmount nullfs/pkg-cache/session mounts,
6. `bastille stop <clone>`,
7. remove clone IP from PF table,
8. release IP,
9. `zfs destroy -r <clone_dataset>`,
10. on busy dataset: `browser-clone-force-unmount` / `zfs unmount -f` and retry destroy,
11. remove stale epairs before retrying a clone name.

Hostd operations added for Phase 1A:

- `browser-clone-create` — clone the thick `browser` datasets from a named snapshot, patch Bastille/VNET config, clear stale `btNNN` epairs, and add the task IP to PF table `browser_tasks`.
- `browser-clone-destroy` — stop the in-jail browser service, perform PID-file-targeted shutdown, stop the jail, remove PF membership, clear epairs, and destroy clone datasets with forced-unmount retry.
- `browser-clone-reap` — idempotent destroy path for orphaned clones.
- `browser-clone-force-unmount` — narrow reaper fallback for busy clone datasets.

Broad `pkill chrome` is not production behavior. Use rc.d service stop or
PID-file-targeted shutdown.
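
The dataset end of the destroy path (destroy, then forced unmount and one retry on failure) might be sketched like this. It calls `zfs` directly for illustration, whereas the design routes this through hostd; the command runner is injected so the logic can be exercised without ZFS. `destroy_clone_dataset` and `subprocess_runner` are hypothetical names, and treating any `zfs list` failure as "already gone" is a simplifying assumption.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def destroy_clone_dataset(dataset: str, run: Runner = subprocess_runner) -> str:
    """Idempotently destroy a clone dataset, force-unmounting once if needed."""
    # A dataset that is already gone counts as success, so the reaper can
    # be re-run safely against orphaned or half-destroyed clones.
    if run(["zfs", "list", "-H", "-o", "name", dataset]) != 0:
        return "absent"
    if run(["zfs", "destroy", "-r", dataset]) == 0:
        return "destroyed"
    # Assume the failure was a busy dataset: force-unmount and retry once.
    run(["zfs", "unmount", "-f", dataset])
    if run(["zfs", "destroy", "-r", dataset]) == 0:
        return "destroyed-after-force-unmount"
    raise RuntimeError(f"could not destroy {dataset}")
```

Returning `"absent"` rather than raising is what makes a second reaper pass a no-op.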

---

## Resource quotas and limits

Starting points:

```text
zroot/<runtime>/jails/browser              quota=10G
zroot/<runtime>/browser-screenshots        quota=20G
```

Per-clone runtime limits remain required:

- memory RCTL,
- openfiles RCTL,
- per-session deadline,
- max concurrent browser tasks per tenant/host.

Exact values should be tuned after Phase 0.6/Phase 1 measurements.

---

## Threat model highlights

### Credential exfiltration

Credentials are not stored in templates or clone profiles before injection.
Injection is domain-filtered and grant-token-gated. Cleartext cookies are never
logged. Screenshot recording mode is explicit, and typed text is redacted in
audit logs.

### SSRF/internal network probing

PF denies internal address ranges and metadata endpoints. The browser has public
egress but should not reach private service jails or the host controlplane.

### Jail compromise

The jail has no audit DB credentials, no MCP auth secret, and no credentials
store. A compromised clone can affect its own task session but not the
controlplane-owned store or audit trail.

### Resource exhaustion

ZFS quotas, RCTL, deadlines, max sessions, and reaper cleanup bound disk,
memory, CPU, and process leaks.

### Profile bleed

The rejected profile-byte clone model is not used. Cookie injection starts from
an empty profile per task clone. Cross-session cookie leakage remains an
integration test requirement.

---

## Validation state

### Phase 0.5 — FreeBSD Chromium/CDP viability

Passed. See `BROWSER-JAIL-FREEBSD-VIABILITY.md`.

Confirmed:

- FreeBSD 15 jail can run pkg Chromium headless.
- `puppeteer-core` works against system Chromium over CDP.
- screenshots and DOM reads work.

### Phase 0.6 — clone lifecycle / credential injection

Passed. See `BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md`.

Learned:

- ZFS clone mechanics are viable.
- Bastille cloned jail start is viable after config patching.
- Thin-jail fstab/nullfs details were painful; fixed template is now thick.
- Stale epairs and busy datasets require explicit cleanup/reaper logic.
- Chromium encrypted profile-byte credential inheritance is not viable.
- CDP cookie injection works inside `browser` and across the clone boundary.

Passing run:

- cookie export/import round-trip in `browser`: pass,
- cookie injection across `browser` → `browsertask001`: pass,
- 3-cycle clone/start/inject/smoke/destroy: pass,
- idempotent orphan reaper: pass.

Median timings from the passing run:

- clone: 113 ms,
- Bastille jail start: 1006 ms,
- Chromium CDP readiness: 1502 ms,
- CDP injection + smoke: 593 ms,
- clone + jail start + injection smoke: 1718 ms,
- clone + jail start + Chromium ready + injection smoke: 3211 ms.

---

## Phases

| Phase | Status | Output |
|---|---|---|
| 0 — Design | COMPLETE | Browser jail docs + UI-TARS direction |
| 0.5 — FreeBSD viability | PASS | Chromium + CDP confirmed |
| 0.6 — Injection clone validation | PASS | cookie injection + 3-cycle clone lifecycle |
| 1 — Browser backend implementation | READY | setup/browser-jail, hostd clone ops, jail HTTP API |
| 2 — Controlplane integration | PENDING | credentials store, grant token policy, browser_run_task |
| 3 — Refresh UX | PENDING | operator credential refresh flow |
| 4 — Downloads/uploads | DEFERRED | explicit security design needed |

---

## Decided

Resolved 11 May 2026 after Phase 0.6 pivot:

- **Credentials store backend: Postgres.** Reuses clawdie's existing
  encryption/transaction story; tenant-scoped via row constraints; single
  backup path. Filesystem store retained only as a validation-time scaffold,
  removed before Phase 1 implementation lands.
- **Refresh UX: controlplane-streamed clone.** The operator interacts with a
  refresh-mode `browsertaskNNN` clone via a web-streamed channel served by
  the controlplane. This is the only option that works regardless of the
  operator's machine; the VNC / X-forwarding alternatives were discarded
  because they assume a configured operator workstation.

Blocking before adoption (gates remain; the first two passed in the Phase 0.6
run recorded under Validation state and must keep holding in Phase 1):

- CDP cookie injection must work reliably across the clone boundary.
- Clone cleanup/reaper must be idempotent.
- PF table strategy must be validated with task clone IPs.
- hostd must expose narrow clone/destroy/force-clean operations.