# Browser Jail

**Date:** 11 May 2026

**Status:** DESIGN. Phase 0.6 injection validation passed; ready for Phase 1 implementation

**Phase:** 0.5 viability PASS → 0.6 injection validation PASS → 1 implementation

Clawdie's browser-computer-use backend is a FreeBSD/Bastille browser execution template plus future ephemeral task clones. The jail layer executes browser operations only; the controlplane owns MCP, auth, audit, credentials, task state, and the UI-TARS-compatible model loop.

This document is the current implementation target. The prior long-lived shared Chromium-context model is superseded by the fixed thick `browser` template and per-task clone direction, validated by the redefined Phase 0.6 injection run.

---

## Related docs

| Doc | Role |
|---|---|
| `docs/internal/BROWSER-JAIL.md` (this) | Current design + implementation target |
| `doc/BROWSER-JAIL-HANDOFF.md` | Next tasks: Phase 1 implementation sequence |
| `docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md` | Phase 0.5 record (PASS) |
| `docs/internal/BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md` | Phase 0.6 record (PASS for CDP injection; profile-byte inheritance dropped) |
| `docs/internal/UI-TARS-ADOPTION.md` | Direction: UI-TARS as the agent-loop reference; clawdie owns substrate |
| `docs/internal/VISION-GROUNDING-FINDINGS.md` | Vision-grounding validation; input for model selection |
| `doc/BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md` | Historical (pivot reasoning only); do not update |

---

## Goals

- Server-side headless browser automation on FreeBSD.
- One fixed, credential-free thick browser template jail at a stable registry slot.
- Future per-task jail clones for stronger isolation and clean teardown.
- Controlplane-owned credentials store with explicit CDP cookie injection.
- UI-TARS-compatible operator surface: screenshot → prediction → execute.
- pi integration as one compact `browser_run_task` result, not screenshot spam in JSONL history.

## Non-goals (MVP)

- Model logic inside the jail.
- Storing operator credentials inside jail profiles or templates.
- Browser profile-byte authentication cloning.
- Downloads/uploads.
- localStorage / IndexedDB credential injection.
- Passkeys / WebAuthn portability.
- Electron/nut-js desktop control on the FreeBSD server path.

---

## Fixed jail registry slot

The canonical browser substrate is:

```text
name:        browser
ip:          WARDEN_BROWSER_IP, default .6
live IP:     192.168.72.6 on the current validation host
shape:       thick Bastille VNET jail
boot:        off by default
packages:    chromium, node22, npm-node22
workspace:   /opt/browser-validation for current validation scripts
credentials: none
```

Registry source of truth:

```text
infra/jails.yaml
infra/packages/browser-jail.txt
```

Why thick:

- The browser template is a golden execution image, not a small ordinary service jail.
- Thick snapshots/clones are self-contained and avoid the thin-jail nullfs/fstab clone failures observed in Phase 0.6.
- ZFS clones remain cheap because unchanged blocks are shared.
- Chromium's package payload dominates the size; the copied base userland is acceptable for reproducibility.

The old thin `browserop` validation jail is retired. Operator mode is not a separate authenticated template; it is a **session credential mode** authorized by controlplane policy.

---
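The `ip:` entry of the registry slot (`WARDEN_BROWSER_IP`, default `.6`) can be read as "use the override if present, otherwise host 6 of the jail subnet". The sketch below illustrates that reading only. It assumes the override holds a full address and that the jail subnet is `192.168.72.0/24` (inferred from the live IP, not stated in the registry); `resolve_browser_ip` is a hypothetical helper, not part of the existing tooling. The final subnet check is an extra guard added for the sketch.

```python
import ipaddress
import os
from typing import Mapping, Optional

DEFAULT_HOST_OFFSET = 6  # the ".6" default from the registry entry


def resolve_browser_ip(subnet: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return WARDEN_BROWSER_IP if set, else host .6 of the jail subnet.

    Assumption: the override is a full IP address, not a ".N" suffix.
    """
    env = os.environ if env is None else env
    network = ipaddress.ip_network(subnet)
    override = env.get("WARDEN_BROWSER_IP")
    if override:
        address = ipaddress.ip_address(override)
    else:
        address = network.network_address + DEFAULT_HOST_OFFSET
    # Guard added for the sketch: the template must live on the jail subnet.
    if address not in network:
        raise ValueError(f"{address} is outside jail subnet {network}")
    return str(address)


# Subnet is an assumption derived from the live IP on the validation host.
print(resolve_browser_ip("192.168.72.0/24", env={}))
```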
## Architecture

```text
External clients / tasks
  ├── Claude Desktop MCP
  ├── UI-TARS-compatible runner
  ├── pi browser_run_task
  └── Clawdie controlplane tasks
          │
          ▼
controlplane (host, clawdie service)
  ├── better-auth / operator auth
  ├── tenant + grant-token policy
  ├── credentials store + CDP cookie injection
  ├── audit log
  ├── screenshot retention store
  ├── UI-TARS-compatible GUIAgent runner
  ├── hostd calls for Bastille/ZFS/PF lifecycle
  └── per-session route table: session_id → browsertaskNNN/IP
          │
          ▼
browsertaskNNN clone (future Phase 1 runtime)
  ├── cloned from thick browser template
  ├── plain HTTP API only, reachable from controlplane
  ├── Chromium from FreeBSD pkg
  ├── Node 22 + puppeteer-core/CDP bridge
  ├── no credentials at rest before injection
  └── destroyed at session end
```

The fixed `browser` jail is the template/source image. Runtime task work should happen in `browsertaskNNN` clones. During validation, the same HTTP/CDP code may be exercised directly inside `browser` to avoid clone noise, but production semantics are clone-backed.

---

## Controlplane responsibilities

The controlplane owns every security-sensitive and product-facing concern:

- MCP/HTTP tool surface.
- better-auth session validation.
- tenant and operator identity resolution.
- `operator_grant_token` validation.
- credentials storage, decryption, and domain filtering.
- CDP cookie injection at session open.
- audit writes before forwarding actions to the jail.
- screenshot recording/retention outside clone datasets.
- clone lifecycle through hostd.
- orphan reaper and forced-unmount fallback.
- UI-TARS-compatible browser task loop.

The jail owns only:

- starting Chromium,
- accepting plain HTTP requests from controlplane,
- executing browser actions via CDP,
- returning screenshots/DOM/action results.

No MCP server, long-lived auth secret, model loop, or audit DB access belongs in the jail.

---

## Credentials store + injection

### MVP scope

- Cookies only.
- Domain-filtered.
- Tenant-scoped.
- Grant-token-gated for operator mode.
- Encrypted at rest.
- No persistence back from task sessions.

Out of MVP:

- localStorage,
- IndexedDB,
- passkeys/WebAuthn,
- automatic credential refresh when a site expires a session.

### Session credential modes

```json
{
  "credential_mode": "clean | operator",
  "domains": ["github.com"],
  "record": "off | transient | audit"
}
```

Defaults:

```text
credential_mode = clean
record = transient
domains = []
```

`credential_mode: "clean"` never injects cookies.

`credential_mode: "operator"` requires a valid `operator_grant_token` whose scope matches the tenant and requested domains. The token authorizes injection only; it does not grant shell access, jail access, or blanket cookie export.

### `operator_grant_token` schema

Concrete shape (resolved 11 May 2026):

```text
operator_grant_token = {
  jti: uuid                  # unique token id, audit key
  iss: "clawdie-cp"          # issuer, fixed
  tenant_id: uuid
  origin_session_id: uuid    # the operator-authorized session that issued this token
  operator_id: uuid
  allowed_domains: ["github.com", "stripe.com"]  # exact host match; no wildcards
  issued_at: iso8601
  expires_at: iso8601        # short-lived: 15 min default, configurable per tenant
  single_use: true | false   # true = consumed by the first injection call
}
```

- **Format:** clawdie-internal opaque token (random `jti`); the controlplane looks up the rest in its own table. It is not a JWT: no signature verification is needed because issuance and validation both happen inside the controlplane.
- **Storage:** Postgres `auth.operator_grant_tokens` table; rows expire and are garbage-collected by the existing cron pattern. The cleartext token lives only in the originating session's response and in the env of the pi task it spawns.
- **Issuance:** the controlplane issues the token during operator-authorized task creation (Telegram approval, dashboard action). Issued tokens are audit-logged.
- **Validation:** at `browser_run_task` / `open_session` entry, the controlplane verifies that the `jti` exists, the token has not expired, `tenant_id` matches the caller's tenant, and `allowed_domains` is a superset of the request's `domains`. When `single_use: true`, the row is marked consumed atomically.
- **Revocation:** delete the row by `jti`. Pending validations against the same `jti` fail.

The `single_use` default for MVP is `true`: each grant authorizes one task. Long-running approvals are explicit: the operator sets `single_use: false` and a longer `expires_at` on the grant to chain multiple tasks under it.

### Store sketch

Preferred backend: Postgres, because it keeps credentials transactional, backupable, tenant-scoped, and close to Clawdie's existing secret story. Filesystem storage is acceptable only for early validation runs.

```text
credentials.cookies
  id
  tenant_id
  domain
  cookies_encrypted   # encrypted JSON array of CDP cookie shapes
  created_at
  last_refreshed_at
  last_injected_at
  grant_scope
```

### Refresh workflow

Credential refresh is an operator-driven workflow, separate from task sessions:

```text
operator requests refresh
  → controlplane starts a refresh browser session/clone
  → operator logs in interactively
  → controlplane exports cookies via CDP Network.getAllCookies
  → controlplane filters selected domains
  → encrypted cookies are written to credentials store
  → refresh clone/session is destroyed
```

The refresh UX is resolved as a controlplane-streamed browser view (see Decided). The candidate shapes considered were:

- controlplane-streamed browser view,
- VNC/X-forwarding/SPICE-style access,
- future Lumina/Firefox cookie export bridge.

The storage/injection contract is independent of that UX choice.
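The "filters selected domains" step of the refresh workflow could look like the sketch below. It assumes exported cookies are dictionaries with a `domain` field, as in CDP cookie shapes, and that a leading dot on a cookie domain is ignored before applying the same exact-host rule the grant token uses. `filter_cookies` and `normalize_host` are hypothetical names; the cookie values are placeholders.

```python
from typing import Dict, Iterable, List


def normalize_host(domain: str) -> str:
    """Lower-case a domain and drop a leading dot (".github.com" -> "github.com")."""
    return domain.strip().lower().lstrip(".")


def filter_cookies(cookies: Iterable[Dict], selected_domains: Iterable[str]) -> List[Dict]:
    """Keep only cookies whose domain exactly matches a selected host.

    Assumption: no wildcard or subdomain matching, mirroring the
    "exact host match; no wildcards" rule on allowed_domains.
    """
    wanted = {normalize_host(d) for d in selected_domains}
    return [c for c in cookies if normalize_host(c.get("domain", "")) in wanted]


exported = [
    {"name": "user_session", "value": "redacted", "domain": ".github.com"},
    {"name": "gist_pref", "value": "redacted", "domain": "gist.github.com"},
    {"name": "sid", "value": "redacted", "domain": "example.org"},
]
kept = filter_cookies(exported, ["github.com"])
print([c["name"] for c in kept])
```

Under these assumptions only `user_session` survives: the subdomain cookie and the unrelated domain are both dropped before encryption and storage.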
### Injection workflow

```text
open browser task with credential_mode=operator
  → validate operator_grant_token
  → select cookies for (tenant, allowed domains)
  → decrypt in controlplane
  → clone browser → browsertaskNNN
  → start Chromium with empty profile
  → inject cookies via CDP Network.setCookie
  → verify via smoke probe where possible
  → hand session to UI-TARS/operator loop
```

Task sessions do not write updated cookies back to the store. If cookies expire, the task should fail clearly and ask for a refresh workflow.

---

## Tool surface

Primitive browser tools remain useful for debugging and MCP compatibility:

| Tool | Purpose | Returns |
|---|---|---|
| `browser.open_session` | Start a browser session/clone with `{credential_mode, domains, record}`. | `{ session_id, credential_mode, record, started_at }` |
| `browser.navigate` | Load a URL. | `{ status, final_url, title }` |
| `browser.screenshot` | Capture viewport by default; `full_page: true` explicit. | `{ image_base64, width, height, captured_at, persisted_path? }` |
| `browser.click` | Click `(x,y)` or CSS selector. | `{ success, after_screenshot? }` |
| `browser.type` | Type text into focused element or selector. | `{ success, after_screenshot? }` |
| `browser.scroll` | Scroll page/selector. | `{ success, after_screenshot? }` |
| `browser.read_dom` | Return truncated DOM for grounding fallback. | `{ html, truncated_to }` |
| `browser.close_session` | Stop browser service, destroy clone/session resources. | `{ closed_at }` |

Normal product integration should prefer one high-level call:

```text
browser.run_task({
  instruction,
  credential_mode,
  domains,
  record,
  max_steps
})
```

pi exposes this as `browser_run_task` and receives one compact result:

```json
{
  "status": "finished | max_steps | error | aborted",
  "summary": "final answer or useful task summary",
  "result_data": {},
  "trace_id": "controlplane trace/session id",
  "step_count": 8,
  "final_screenshot_path": "/var/db/browser-jail/sessions/.../final.png"
}
```

Screenshots stay in the UI-TARS loop and Clawdie recording store. They are not appended turn-by-turn into pi JSONL history.

---

## UI-TARS-compatible operator

Clawdie should adapt UI-TARS' mature GUI-agent loop shape rather than inventing a separate parser/loop:

```text
instruction
  → screenshot()
  → model prediction
  → execute(prediction)
  → repeat until finished/max_steps/error
```

`ClawdieBrowserOperator` should expose:

- `screenshot()` backed by `browser.screenshot`,
- `execute(prediction)` translating parsed UI-TARS actions to browser tools,
- `finished()` / close semantics backed by controlplane session cleanup.

The model loop runs in controlplane or an external UI-TARS-compatible client, never inside the jail.

---

## Network policy

Ingress:

- Browser clone HTTP API reachable only from controlplane on the internal jail network.
- No public ingress.
- No MCP/auth endpoint inside the jail.

Egress:

- Public web egress allowed for browser tasks.
- PF denies internal targets by default:
  - RFC1918,
  - loopback,
  - link-local,
  - IPv6 ULA/link-local,
  - cloud metadata (`169.254.169.254`).

Future PF shape for clones:

- static ruleset,
- table `browser_tasks`,
- add/delete clone IPs with `pfctl -t browser_tasks -T add/delete`,
- no full PF reload per clone.
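The two halves of this policy can be mirrored in a small controlplane-side sketch: a classifier for the denied egress ranges, useful for tests or pre-flight checks, and a builder for the per-clone table command. PF remains the enforcement point; this code does not replace the ruleset. The helper names and the example clone IP are hypothetical. The command is only constructed, never executed, and the CIDR list is one reading of the deny list above, with the metadata address covered by the link-local range.

```python
import ipaddress
from typing import List

PF_TABLE = "browser_tasks"

# One reading of the deny list; 169.254.169.254 falls inside 169.254.0.0/16.
DENIED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16", "fe80::/10",                     # link-local
        "fc00::/7",                                        # IPv6 ULA
    )
]


def is_denied_target(address: str) -> bool:
    """Return True if the address falls in a range the egress policy denies."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in DENIED_NETWORKS)


def pf_table_command(action: str, clone_ip: str) -> List[str]:
    """Build the pfctl table command for one clone IP without running it."""
    if action not in ("add", "delete"):
        raise ValueError(f"unsupported table action: {action}")
    ipaddress.ip_address(clone_ip)  # raises ValueError on malformed input
    return ["pfctl", "-t", PF_TABLE, "-T", action, clone_ip]


# 192.168.72.21 is a made-up clone IP for illustration.
print(pf_table_command("add", "192.168.72.21"))
print(is_denied_target("169.254.169.254"), is_denied_target("8.8.8.8"))
```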
---

## Screenshot recording

Recording is session-level and immutable for the session:

| Mode | Disk | Retention | Use |
|---|---|---|---|
| `off` | none | n/a | disposable tests or sensitive sessions the operator chooses not to record |
| `transient` | N=50 FIFO ring buffer | until close/eviction | default action-loop debugging |
| `audit` | every screenshot | 7d default | sensitive operations needing forensic trace |

Path:

```text
/var/db/browser-jail/sessions///.png
```

Screenshots live on the host/controlplane side, outside clone datasets. Clone destroy must not delete audit material.

---

## Clone lifecycle

Task clone names are alphanumeric for Bastille VNET compatibility:

```text
browsertask001
browsertask002
...
```

Expected create path:

1. ensure `browser` template is stopped/quiescent,
2. snapshot or clone from a known-good template state,
3. call hostd `browser-clone-create` to clone datasets, patch jail name/IP/VNET config, clear stale epairs, and add the clone IP to PF `browser_tasks`,
4. start jail,
5. start browser HTTP service / Chromium,
6. inject cookies if `credential_mode=operator`,
7. run task loop.

Destroy/reaper order:

1. call hostd `browser-clone-destroy` / `browser-clone-reap`,
2. stop in-jail browser HTTP service through rc.d,
3. TERM Chromium by PID file,
4. KILL by PID only as fallback,
5. unmount nullfs/pkg-cache/session mounts,
6. `bastille stop `,
7. remove clone IP from PF table,
8. release IP,
9. `zfs destroy -r `,
10. on busy dataset: `browser-clone-force-unmount` / `zfs unmount -f` and retry destroy,
11. remove stale epairs before retrying a clone name.

Hostd operations added for Phase 1A:

- `browser-clone-create`: clone the thick `browser` datasets from a named snapshot, patch Bastille/VNET config, clear stale `btNNN` epairs, and add the task IP to PF table `browser_tasks`.
- `browser-clone-destroy`: stop the in-jail browser service, perform PID-file-targeted shutdown, stop the jail, remove PF membership, clear epairs, and destroy clone datasets with forced-unmount retry.
- `browser-clone-reap`: idempotent destroy path for orphaned clones.
- `browser-clone-force-unmount`: narrow reaper fallback for busy clone datasets.

Broad `pkill chrome` is not production behavior. Use rc.d service stop or PID-file-targeted shutdown.

---

## Resource quotas and limits

Starting points:

```text
zroot//jails/browser            quota=10G
zroot//browser-screenshots      quota=20G
```

Per-clone runtime limits remain required:

- memory RCTL,
- openfiles RCTL,
- per-session deadline,
- max concurrent browser tasks per tenant/host.

Exact values should be tuned after Phase 0.6/Phase 1 measurements.

---

## Threat model highlights

### Credential exfiltration

Credentials are not stored in templates or clone profiles before injection. Injection is domain-filtered and grant-token-gated. Cleartext cookies are never logged. Screenshot recording mode is explicit, and typed text is redacted in audit logs.

### SSRF/internal network probing

PF denies internal address ranges and metadata endpoints. The browser has public egress but should not reach private service jails or the host controlplane.

### Jail compromise

The jail has no audit DB credentials, no MCP auth secret, and no credentials store. A compromised clone can affect its own task session but not the controlplane-owned store or audit trail.

### Resource exhaustion

ZFS quotas, RCTL, deadlines, max sessions, and reaper cleanup bound disk, memory, CPU, and process leaks.

### Profile bleed

The rejected profile-byte clone model is not used. Cookie injection starts from an empty profile per task clone. Cross-session cookie leakage remains an integration test requirement.

---

## Validation state

### Phase 0.5: FreeBSD Chromium/CDP viability

Passed. See `BROWSER-JAIL-FREEBSD-VIABILITY.md`.

Confirmed:

- FreeBSD 15 jail can run pkg Chromium headless.
- `puppeteer-core` works against system Chromium over CDP.
- screenshots and DOM reads work.

### Phase 0.6: clone lifecycle / credential injection

Passed. See `BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md`.

Learned:

- ZFS clone mechanics are viable.
- Bastille cloned jail start is viable after config patching.
- Thin-jail fstab/nullfs details were painful; fixed template is now thick.
- Stale epairs and busy datasets require explicit cleanup/reaper logic.
- Chromium encrypted profile-byte credential inheritance is not viable.
- CDP cookie injection works inside `browser` and across the clone boundary.

Passing run:

- cookie export/import round-trip in `browser`: pass,
- cookie injection across `browser` → `browsertask001`: pass,
- 3-cycle clone/start/inject/smoke/destroy: pass,
- idempotent orphan reaper: pass.

Median timings from the passing run:

- clone: 113 ms,
- Bastille jail start: 1006 ms,
- Chromium CDP readiness: 1502 ms,
- CDP injection + smoke: 593 ms,
- clone + jail start + injection smoke: 1718 ms,
- clone + jail start + Chromium ready + injection smoke: 3211 ms.

---

## Phases

| Phase | Status | Output |
|---|---|---|
| 0: Design | COMPLETE | Browser jail docs + UI-TARS direction |
| 0.5: FreeBSD viability | PASS | Chromium + CDP confirmed |
| 0.6: Injection clone validation | PASS | cookie injection + 3-cycle clone lifecycle |
| 1: Browser backend implementation | READY | setup/browser-jail, hostd clone ops, jail HTTP API |
| 2: Controlplane integration | PENDING | credentials store, grant token policy, browser_run_task |
| 3: Refresh UX | PENDING | operator credential refresh flow |
| 4: Downloads/uploads | DEFERRED | explicit security design needed |

---

## Decided

Resolved 11 May 2026 after Phase 0.6 pivot:

- **Credentials store backend: Postgres.** Reuses clawdie's existing encryption/transaction story; tenant-scoped via row constraints; single backup path.
  The filesystem store is retained only as a validation-time scaffold and is removed before the Phase 1 implementation lands.
- **Refresh UX: controlplane-streamed clone.** The operator interacts with a refresh-mode `browsertaskNNN` clone via a web-streamed channel served by the controlplane. This is the only option that works regardless of the operator's machine; the VNC / X-forwarding alternatives were discarded because they assume a configured operator workstation.

Blocking before adoption (gates remain):

- CDP cookie injection must work reliably across clone boundary.
- Clone cleanup/reaper must be idempotent.
- PF table strategy must be validated with task clone IPs.
- hostd must expose narrow clone/destroy/force-clean operations.
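
The idempotency gate on clone cleanup can be stated as a testable property: running the reaper twice must leave the same end state, and the second run must find nothing left to do. The sketch below shows that property with in-memory stand-ins. `reap_clone`, `AlreadyGone`, and the fake steps are hypothetical and do not call Bastille, ZFS, or PF; the step names are a simplified subset of the destroy order.

```python
from typing import Callable, List, Sequence, Set, Tuple


class AlreadyGone(Exception):
    """Raised by a teardown step whose target no longer exists."""


Step = Tuple[str, Callable[[str], None]]


def reap_clone(name: str, steps: Sequence[Step]) -> List[str]:
    """Run every teardown step in order; a missing resource counts as done."""
    performed = []
    for label, step in steps:
        try:
            step(name)
        except AlreadyGone:
            continue  # nothing left to clean for this step
        performed.append(label)
    return performed


def make_fake_steps(resources: Set[str]) -> List[Step]:
    """In-memory stand-ins for hostd operations (illustration only)."""

    def remover(kind: str) -> Callable[[str], None]:
        def step(name: str) -> None:
            key = f"{kind}:{name}"
            if key not in resources:
                raise AlreadyGone(key)
            resources.remove(key)

        return step

    return [(kind, remover(kind)) for kind in ("jail", "pf-entry", "dataset")]


resources = {
    "jail:browsertask001",
    "pf-entry:browsertask001",
    "dataset:browsertask001",
}
steps = make_fake_steps(resources)
first = reap_clone("browsertask001", steps)
second = reap_clone("browsertask001", steps)
print(first, second, resources)
```

In this sketch the first call performs all three steps and the second performs none, with an empty resource set after both. A real reaper would also need to distinguish "already gone" from genuine failures such as a busy dataset, which the forced-unmount fallback covers.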