Browser Jail

Date: 11 May 2026
Status: DESIGN; Phase 0.6 injection validation passed, ready for Phase 1 implementation
Phase: 0.5 viability PASS → 0.6 injection validation PASS → 1 implementation

Clawdie's browser-computer-use backend is a FreeBSD/Bastille browser execution template plus future ephemeral task clones. The jail layer executes browser operations only; the controlplane owns MCP, auth, audit, credentials, task state, and the UI-TARS-compatible model loop.

This document is the current implementation target. The prior long-lived shared Chromium-context model is superseded by the fixed thick browser template and per-task clone direction, validated by the redefined Phase 0.6 injection run.


| Doc | Role |
|---|---|
| docs/internal/BROWSER-JAIL.md (this) | Current design + implementation target |
| doc/BROWSER-JAIL-HANDOFF.md | Next tasks: Phase 1 implementation sequence |
| docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md | Phase 0.5 record (PASS) |
| docs/internal/BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md | Phase 0.6 record (PASS for CDP injection; profile-byte inheritance dropped) |
| docs/internal/UI-TARS-ADOPTION.md | Direction: UI-TARS as the agent-loop reference; clawdie owns substrate |
| docs/internal/VISION-GROUNDING-FINDINGS.md | Vision-grounding validation; input for model selection |
| doc/BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md | Historical (pivot reasoning only); do not update |

Goals

  • Server-side headless browser automation on FreeBSD.
  • One fixed, credential-free thick browser template jail at a stable registry slot.
  • Future per-task jail clones for stronger isolation and clean teardown.
  • Controlplane-owned credentials store with explicit CDP cookie injection.
  • UI-TARS-compatible operator surface: screenshot → prediction → execute.
  • pi integration as one compact browser_run_task result, not screenshot spam in JSONL history.

Non-goals (MVP)

  • Model logic inside the jail.
  • Storing operator credentials inside jail profiles or templates.
  • Browser profile-byte authentication cloning.
  • Downloads/uploads.
  • localStorage / IndexedDB credential injection.
  • Passkeys / WebAuthn portability.
  • Electron/nut-js desktop control on the FreeBSD server path.

Fixed jail registry slot

The canonical browser substrate is:

name:        browser
ip:          WARDEN_BROWSER_IP, default <subnet>.6
live IP:     192.168.72.6 on the current validation host
shape:       thick Bastille VNET jail
boot:        off by default
packages:    chromium, node22, npm-node22
workspace:   /opt/browser-validation for current validation scripts
credentials: none

Registry source of truth:

infra/jails.yaml
infra/packages/browser-jail.txt

Why thick:

  • The browser template is a golden execution image, not a small ordinary service jail.
  • Thick snapshots/clones are self-contained and avoid the thin-jail nullfs/fstab clone failures observed in Phase 0.6.
  • ZFS clones remain cheap because unchanged blocks are shared.
  • Chromium's package payload dominates the size; the copied base userland is acceptable for reproducibility.

The old thin browserop validation jail is retired. Operator mode is not a separate authenticated template; it is a session credential mode authorized by controlplane policy.


Architecture

External clients / tasks
  ├── Claude Desktop MCP
  ├── UI-TARS-compatible runner
  ├── pi browser_run_task
  └── Clawdie controlplane tasks
          │
          ▼
controlplane (host, clawdie service)
  ├── better-auth / operator auth
  ├── tenant + grant-token policy
  ├── credentials store + CDP cookie injection
  ├── audit log
  ├── screenshot retention store
  ├── UI-TARS-compatible GUIAgent runner
  ├── hostd calls for Bastille/ZFS/PF lifecycle
  └── per-session route table: session_id → browsertaskNNN/IP
          │
          ▼
browsertaskNNN clone (future Phase 1 runtime)
  ├── cloned from thick browser template
  ├── plain HTTP API only, reachable from controlplane
  ├── Chromium from FreeBSD pkg
  ├── Node 22 + puppeteer-core/CDP bridge
  ├── no credentials at rest before injection
  └── destroyed at session end

The fixed browser jail is the template/source image. Runtime task work should happen in browsertaskNNN clones. During validation, the same HTTP/CDP code may be exercised directly inside browser to avoid clone noise, but production semantics are clone-backed.
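
The per-session route table in the diagram can be pictured as a small map from session_id to a clone name and IP. The following is a minimal sketch, not the controlplane's actual code: the class name, method names, and the default port 8080 are assumptions introduced for illustration only.

```typescript
// Hypothetical route entry: which clone serves a session, and where.
interface CloneRoute {
  jail: string; // e.g. "browsertask001"
  ip: string;   // clone address on the internal jail network
}

class SessionRouteTable {
  private routes = new Map<string, CloneRoute>();

  // Register a route when a clone is created for a session.
  register(sessionId: string, route: CloneRoute): void {
    if (this.routes.has(sessionId)) {
      throw new Error(`session ${sessionId} already has a route`);
    }
    this.routes.set(sessionId, route);
  }

  // Resolve the base URL of the jail HTTP API for a session.
  // The port is an assumption; the design does not fix one.
  baseUrl(sessionId: string, port = 8080): string {
    const route = this.routes.get(sessionId);
    if (!route) throw new Error(`no route for session ${sessionId}`);
    return `http://${route.ip}:${port}`;
  }

  // Drop the route at session end; returns false if it was already gone.
  release(sessionId: string): boolean {
    return this.routes.delete(sessionId);
  }
}
```

Keeping the route in the controlplane, rather than in the clone, means a destroyed clone leaves nothing behind that still points at its IP.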


Controlplane responsibilities

The controlplane owns every security-sensitive and product-facing concern:

  • MCP/HTTP tool surface.
  • better-auth session validation.
  • tenant and operator identity resolution.
  • operator_grant_token validation.
  • credentials storage, decryption, and domain filtering.
  • CDP cookie injection at session open.
  • audit writes before forwarding actions to the jail.
  • screenshot recording/retention outside clone datasets.
  • clone lifecycle through hostd.
  • orphan reaper and forced-unmount fallback.
  • UI-TARS-compatible browser task loop.

The jail owns only:

  • starting Chromium,
  • accepting plain HTTP requests from controlplane,
  • executing browser actions via CDP,
  • returning screenshots/DOM/action results.

No MCP server, long-lived auth secret, model loop, or audit DB access belongs in the jail.
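
The "audit writes before forwarding" rule can be sketched as a wrapper that refuses to forward when the audit write fails. This is an illustrative sketch only: the action shape, the sink and forwarder signatures, and the redaction marker format are all assumptions. The redaction step follows the threat-model note that typed text is redacted in audit logs.

```typescript
// Hypothetical action shape passed from the controlplane to a clone.
interface BrowserAction {
  sessionId: string;
  tool: "navigate" | "click" | "type" | "scroll";
  args: Record<string, string | number>;
}

type AuditSink = (entry: BrowserAction) => void;         // throws if the write fails
type JailForwarder = (action: BrowserAction) => string;  // returns the jail's reply

// Replace typed text with a length marker so audit rows never hold it.
function redactForAudit(action: BrowserAction): BrowserAction {
  const text = action.args.text;
  if (action.tool !== "type" || typeof text !== "string") return action;
  return { ...action, args: { ...action.args, text: `[redacted:${text.length}]` } };
}

// Write the audit entry first; forward only if the write did not throw.
function auditThenForward(
  action: BrowserAction,
  audit: AuditSink,
  forward: JailForwarder,
): string {
  audit(redactForAudit(action));
  return forward(action);
}
```

The jail still receives the unredacted action; only the audit copy is altered.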


Credentials store + injection

MVP scope

  • Cookies only.
  • Domain-filtered.
  • Tenant-scoped.
  • Grant-token-gated for operator mode.
  • Encrypted at rest.
  • No persistence back from task sessions.

Out of MVP:

  • localStorage,
  • IndexedDB,
  • passkeys/WebAuthn,
  • automatic credential refresh when a site expires a session.

Session credential modes

{
  "credential_mode": "clean | operator",
  "domains": ["github.com"],
  "record": "off | transient | audit"
}

Defaults:

credential_mode = clean
record          = transient
domains         = []

credential_mode: "clean" never injects cookies.

credential_mode: "operator" requires a valid operator_grant_token whose scope matches the tenant and requested domains. The token authorizes injection only; it does not grant shell access, jail access, or blanket cookie export.
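
A minimal sketch of how the defaults and the operator-mode precondition might be applied at session open. The function name and the plain-string token parameter are assumptions; token validation itself is out of scope for this snippet.

```typescript
type CredentialMode = "clean" | "operator";
type RecordMode = "off" | "transient" | "audit";

interface SessionOptions {
  credential_mode: CredentialMode;
  domains: string[];
  record: RecordMode;
}

// Apply the documented defaults, then enforce that operator mode
// is never entered without a grant token being presented.
function resolveSessionOptions(
  input: Partial<SessionOptions>,
  operatorGrantToken?: string,
): SessionOptions {
  const opts: SessionOptions = {
    credential_mode: input.credential_mode ?? "clean",
    domains: input.domains ?? [],
    record: input.record ?? "transient",
  };
  if (opts.credential_mode === "operator" && !operatorGrantToken) {
    throw new Error("credential_mode=operator requires an operator_grant_token");
  }
  return opts;
}
```

Calling `resolveSessionOptions({})` yields a clean, transient session with no domains, which matches the defaults listed above.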

operator_grant_token schema

Concrete shape (resolved 11 May 2026):

operator_grant_token = {
  jti:                 uuid           # unique token id, audit key
  iss:                 "clawdie-cp"   # issuer, fixed
  tenant_id:           uuid
  origin_session_id:   uuid           # the operator-authorized session that issued this token
  operator_id:         uuid
  allowed_domains:     ["github.com", "stripe.com"]   # exact host match; no wildcards
  issued_at:           iso8601
  expires_at:          iso8601        # short-lived: 15 min default, configurable per tenant
  single_use:          true | false   # true = consumed by the first injection call
}

  • Format: clawdie-internal opaque token (random jti); the controlplane looks up the remaining fields in its own table. It is not a JWT: no signature verification is needed because issuance and validation both happen inside the controlplane.
  • Storage: Postgres auth.operator_grant_tokens table; rows expire and are garbage-collected by the existing cron pattern. The cleartext token lives only in the originating session's response and in the env of the pi task it spawns.
  • Issuance: the controlplane issues tokens during operator-authorized task creation (Telegram approval, dashboard action). Issued tokens are audit-logged.
  • Validation: at browser_run_task / open_session entry, the controlplane verifies that the jti exists and is not expired, that tenant_id matches the caller's tenant, and that allowed_domains is a superset of the request's domains. With single_use: true, the row is marked consumed atomically.
  • Revocation: delete the row by jti. Pending validations against the same jti fail.

The single_use default for MVP is true: each grant authorizes one task. Long-running approvals are explicit: the operator sets single_use: false and a longer expires_at to chain multiple tasks under one grant.
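
The validation rules can be sketched against an in-memory stand-in for the token table. This is an assumption-laden illustration: the Map replaces Postgres, the `consumed` field and the reason strings are hypothetical, and a real store would mark consumption atomically in a transaction rather than by mutating a row object.

```typescript
interface GrantTokenRow {
  jti: string;
  tenant_id: string;
  allowed_domains: string[];
  expires_at: string; // ISO 8601
  single_use: boolean;
  consumed: boolean;  // hypothetical column marking single-use consumption
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Check existence, consumption, expiry, tenant, and domain scope, in that order.
function validateGrant(
  table: Map<string, GrantTokenRow>,
  jti: string,
  callerTenant: string,
  requestedDomains: string[],
  now: Date,
): Verdict {
  const row = table.get(jti);
  if (!row) return { ok: false, reason: "unknown_jti" };
  if (row.consumed) return { ok: false, reason: "consumed" };
  if (Date.parse(row.expires_at) <= now.getTime()) return { ok: false, reason: "expired" };
  if (row.tenant_id !== callerTenant) return { ok: false, reason: "tenant_mismatch" };
  // allowed_domains must be a superset of the request; exact host match only.
  const allowed = new Set(row.allowed_domains);
  if (!requestedDomains.every((d) => allowed.has(d))) {
    return { ok: false, reason: "domain_not_allowed" };
  }
  if (row.single_use) row.consumed = true; // a real store would do this atomically
  return { ok: true };
}
```

A failed check leaves a single-use token unconsumed in this sketch, so a request rejected for scope reasons does not burn the grant; whether that is the desired policy is a design choice the text above does not settle.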

Store sketch

Preferred backend: Postgres, because it keeps credentials transactional, backupable, tenant-scoped, and close to Clawdie's existing secret story. Filesystem storage is acceptable only for early validation runs.

credentials.cookies
  id
  tenant_id
  domain
  cookies_encrypted      # encrypted JSON array of CDP cookie shapes
  created_at
  last_refreshed_at
  last_injected_at
  grant_scope

Refresh workflow

Credential refresh is an operator-driven workflow, separate from task sessions:

operator requests refresh
  → controlplane starts a refresh browser session/clone
  → operator logs in interactively
  → controlplane exports cookies via CDP Network.getAllCookies
  → controlplane filters selected domains
  → encrypted cookies are written to credentials store
  → refresh clone/session is destroyed

The refresh UX is resolved as a controlplane-streamed clone (see Decided). Candidate shapes that were considered:

  • controlplane-streamed browser view,
  • VNC/X-forwarding/SPICE-style access,
  • future Lumina/Firefox cookie export bridge.

The storage/injection contract is independent of that UX choice.

Injection workflow

open browser task with credential_mode=operator
  → validate operator_grant_token
  → select cookies for (tenant, allowed domains)
  → decrypt in controlplane
  → clone browser → browsertaskNNN
  → start Chromium with empty profile
  → inject cookies via CDP Network.setCookie
  → verify via smoke probe where possible
  → hand session to UI-TARS/operator loop

Task sessions do not write updated cookies back to the store. If cookies expire, the task should fail clearly and ask for a refresh workflow.
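
The "select cookies for (tenant, allowed domains)" step might look like the filter below, applied to an already-decrypted, tenant-scoped cookie array. It is a sketch under assumptions: the interface is a subset of the CDP cookie shape, the function name is hypothetical, and applying exact host matching to the cookie's own domain (after dropping a leading dot) is one possible reading of "exact host match; no wildcards".

```typescript
// Subset of the CDP cookie shape as returned by Network.getAllCookies.
interface StoredCookie {
  name: string;
  value: string;
  domain: string; // may carry a leading dot, e.g. ".github.com"
  path: string;
  secure: boolean;
  httpOnly: boolean;
}

// Keep only cookies whose domain exactly matches an allowed host.
// Subdomain cookies (e.g. api.github.com) are dropped unless listed.
function selectCookiesForInjection(
  cookies: StoredCookie[],
  allowedDomains: string[],
): StoredCookie[] {
  const allowed = new Set(allowedDomains.map((d) => d.toLowerCase()));
  return cookies.filter((c) =>
    allowed.has(c.domain.replace(/^\./, "").toLowerCase()),
  );
}
```

Each surviving cookie would then be passed to CDP Network.setCookie before the first navigation.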


Tool surface

Primitive browser tools remain useful for debugging and MCP compatibility:

| Tool | Purpose | Returns |
|---|---|---|
| browser.open_session | Start a browser session/clone with {credential_mode, domains, record}. | { session_id, credential_mode, record, started_at } |
| browser.navigate | Load a URL. | { status, final_url, title } |
| browser.screenshot | Capture viewport by default; full_page: true explicit. | { image_base64, width, height, captured_at, persisted_path? } |
| browser.click | Click (x,y) or CSS selector. | { success, after_screenshot? } |
| browser.type | Type text into focused element or selector. | { success, after_screenshot? } |
| browser.scroll | Scroll page/selector. | { success, after_screenshot? } |
| browser.read_dom | Return truncated DOM for grounding fallback. | { html, truncated_to } |
| browser.close_session | Stop browser service, destroy clone/session resources. | { closed_at } |

Normal product integration should prefer one high-level call:

browser.run_task({ instruction, credential_mode, domains, record, max_steps })

pi exposes this as browser_run_task and receives one compact result:

{
  "status": "finished | max_steps | error | aborted",
  "summary": "final answer or useful task summary",
  "result_data": {},
  "trace_id": "controlplane trace/session id",
  "step_count": 8,
  "final_screenshot_path": "/var/db/browser-jail/sessions/.../final.png"
}

Screenshots stay in the UI-TARS loop and Clawdie recording store. They are not appended turn-by-turn into pi JSONL history.


UI-TARS-compatible operator

Clawdie should adapt UI-TARS' mature GUI-agent loop shape rather than inventing a separate parser/loop:

instruction
  → screenshot()
  → model prediction
  → execute(prediction)
  → repeat until finished/max_steps/error

ClawdieBrowserOperator should expose:

  • screenshot() backed by browser.screenshot,
  • execute(prediction) translating parsed UI-TARS actions to browser tools,
  • finished() / close semantics backed by controlplane session cleanup.

The model loop runs in controlplane or an external UI-TARS-compatible client, never inside the jail.
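
The loop shape above can be sketched as follows. This is not UI-TARS' actual API: the Prediction shape, the Predict callback, and the reduced result type are assumptions chosen to keep the example self-contained. It only illustrates how screenshot → prediction → execute terminates in one of the documented statuses.

```typescript
interface Prediction {
  action: string;    // parsed UI-TARS action, opaque to this sketch
  finished: boolean; // model signals the task is complete
  summary?: string;
}

interface BrowserOperator {
  screenshot(): Promise<string>;          // base64 image
  execute(p: Prediction): Promise<void>;  // translate to browser tools
}

type Predict = (instruction: string, screenshotB64: string) => Promise<Prediction>;

interface TaskResult {
  status: "finished" | "max_steps" | "error";
  summary: string;
  step_count: number; // number of executed actions
}

async function runTask(
  instruction: string,
  op: BrowserOperator,
  predict: Predict,
  maxSteps: number,
): Promise<TaskResult> {
  let steps = 0;
  try {
    while (steps < maxSteps) {
      const shot = await op.screenshot();
      const prediction = await predict(instruction, shot);
      if (prediction.finished) {
        return { status: "finished", summary: prediction.summary ?? "", step_count: steps };
      }
      await op.execute(prediction);
      steps += 1;
    }
    return { status: "max_steps", summary: "", step_count: steps };
  } catch (err) {
    // Any operator or model failure ends the task with a compact error result.
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", summary: message, step_count: steps };
  }
}
```

Screenshots never leave the loop in this sketch; the caller sees only the compact result.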


Network policy

Ingress:

  • Browser clone HTTP API reachable only from controlplane on the internal jail network.
  • No public ingress.
  • No MCP/auth endpoint inside the jail.

Egress:

  • Public web egress allowed for browser tasks.
  • PF denies internal targets by default:
    • RFC1918,
    • loopback,
    • link-local,
    • IPv6 ULA/link-local,
    • cloud metadata (169.254.169.254).

Future PF shape for clones:

  • static ruleset,
  • table browser_tasks,
  • add/delete clone IPs with pfctl -t browser_tasks -T add/delete,
  • no full PF reload per clone.
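
A small sketch of how hostd might build the pfctl table command for a clone IP. The function name and the IPv4-only validation are assumptions; the argv follows the `pfctl -t <table> -T add|delete <address>` form named in the list above.

```typescript
const PF_TABLE = "browser_tasks";

// Build argv for: pfctl -t browser_tasks -T add|delete <ip>
// Returned as an array so it can be passed to a spawn-style API without a shell.
function pfTableArgs(action: "add" | "delete", ip: string): string[] {
  // Accept only dotted-quad IPv4 so nothing else reaches the command line.
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) throw new Error(`refusing non-IPv4 address: ${ip}`);
  return ["pfctl", "-t", PF_TABLE, "-T", action, ip];
}
```

Because only the table membership changes, the static ruleset is never reloaded when a clone comes or goes.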

Screenshot recording

Recording is session-level and immutable for the session:

| Mode | Disk | Retention | Use |
|---|---|---|---|
| off | none | n/a | disposable tests or sensitive sessions the operator chooses not to record |
| transient | N=50 FIFO ring buffer | until close/eviction | default action-loop debugging |
| audit | every screenshot | 7d default | sensitive operations needing forensic trace |

Path:

/var/db/browser-jail/sessions/<tenant_id>/<session_id>/<seq>.png

Screenshots live on the host/controlplane side, outside clone datasets. Clone destroy must not delete audit material.
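
The transient mode's FIFO behaviour and the path layout can be sketched like this. The class and function names are hypothetical, and the sketch tracks only sequence numbers; deleting the evicted file from disk is left to the caller.

```typescript
const SESSIONS_ROOT = "/var/db/browser-jail/sessions";

// Build the host-side path for one screenshot of a session.
function screenshotPath(tenantId: string, sessionId: string, seq: number): string {
  return `${SESSIONS_ROOT}/${tenantId}/${sessionId}/${seq}.png`;
}

// Tracks which sequence numbers are retained in transient mode.
class TransientRing {
  private readonly capacity: number;
  private seqs: number[] = [];
  private next = 0;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  // Record a new screenshot; returns its seq and the seq evicted, if any.
  push(): { seq: number; evicted: number | undefined } {
    const seq = this.next++;
    this.seqs.push(seq);
    const evicted = this.seqs.length > this.capacity ? this.seqs.shift() : undefined;
    return { seq, evicted };
  }

  // Sequence numbers currently retained, oldest first.
  retained(): number[] {
    return [...this.seqs];
  }
}
```

With the default capacity of 50, the 51st screenshot evicts sequence 0, so disk use per transient session stays bounded.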


Clone lifecycle

Task clone names are alphanumeric for Bastille VNET compatibility:

browsertask001
browsertask002
...
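
A minimal sketch of generating and recognising clone names in the browsertaskNNN form. The helper names are hypothetical, and the 1 to 999 range is an assumption that follows from the three-digit suffix shown above.

```typescript
// Format a clone index as browsertaskNNN (three digits, zero-padded).
function cloneName(index: number): string {
  if (!Number.isInteger(index) || index < 1 || index > 999) {
    throw new Error(`clone index out of range: ${index}`);
  }
  return `browsertask${String(index).padStart(3, "0")}`;
}

// Parse a clone name back to its index, or null if it is not a task clone.
function cloneIndex(name: string): number | null {
  const m = /^browsertask(\d{3})$/.exec(name);
  return m ? Number(m[1]) : null;
}
```

A strict parser like `cloneIndex` is useful for the reaper, which should only ever touch jails that match the task-clone pattern and never the `browser` template itself.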

Expected create path:

  1. ensure browser template is stopped/quiescent,
  2. snapshot or clone from a known-good template state,
  3. call hostd browser-clone-create to clone datasets, patch jail name/IP/VNET config, clear stale epairs, and add the clone IP to PF browser_tasks,
  4. start jail,
  5. start browser HTTP service / Chromium,
  6. inject cookies if credential_mode=operator,
  7. run task loop.

Destroy/reaper order:

  1. call hostd browser-clone-destroy / browser-clone-reap,
  2. stop in-jail browser HTTP service through rc.d,
  3. TERM Chromium by PID file,
  4. KILL by PID only as fallback,
  5. unmount nullfs/pkg-cache/session mounts,
  6. bastille stop <clone>,
  7. remove clone IP from PF table,
  8. release IP,
  9. zfs destroy -r <clone_dataset>,
  10. on busy dataset: browser-clone-force-unmount / zfs unmount -f and retry destroy,
  11. remove stale epairs before retrying a clone name.
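
Steps 9 and 10 amount to "destroy, and on failure force-unmount once and retry". The sketch below shows that control flow with injected callbacks standing in for hostd operations; the interface and function names are hypothetical, and for simplicity it treats any destroy failure as a busy dataset.

```typescript
// Hypothetical callbacks standing in for hostd operations.
interface DestroyOps {
  zfsDestroy(dataset: string): Promise<void>;   // rejects when the dataset is busy
  forceUnmount(dataset: string): Promise<void>; // narrow forced-unmount fallback
}

// Try destroy; on failure force-unmount once and retry.
// Returns the number of destroy attempts used (1 or 2).
async function destroyDataset(dataset: string, ops: DestroyOps): Promise<number> {
  try {
    await ops.zfsDestroy(dataset);
    return 1;
  } catch {
    // This sketch does not inspect the error; a real implementation
    // would likely retry only on a "dataset is busy" failure.
    await ops.forceUnmount(dataset);
    await ops.zfsDestroy(dataset); // a second failure propagates to the caller
    return 2;
  }
}
```

Bounding the retry to a single forced unmount keeps the reaper from looping on a dataset that is broken for some other reason.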

Hostd operations added for Phase 1A:

  • browser-clone-create — clone the thick browser datasets from a named snapshot, patch Bastille/VNET config, clear stale btNNN epairs, and add the task IP to PF table browser_tasks.
  • browser-clone-destroy — stop the in-jail browser service, perform PID-file-targeted shutdown, stop the jail, remove PF membership, clear epairs, and destroy clone datasets with forced-unmount retry.
  • browser-clone-reap — idempotent destroy path for orphaned clones.
  • browser-clone-force-unmount — narrow reaper fallback for busy clone datasets.

Broad pkill chrome is not production behavior. Use rc.d service stop or PID-file-targeted shutdown.


Resource quotas and limits

Starting points:

zroot/<runtime>/jails/browser              quota=10G
zroot/<runtime>/browser-screenshots        quota=20G

Per-clone runtime limits remain required:

  • memory RCTL,
  • openfiles RCTL,
  • per-session deadline,
  • max concurrent browser tasks per tenant/host.

Exact values should be tuned after Phase 0.6/Phase 1 measurements.


Threat model highlights

Credential exfiltration

Credentials are not stored in templates or clone profiles before injection. Injection is domain-filtered and grant-token-gated. Cleartext cookies are never logged. Screenshot recording mode is explicit, and typed text is redacted in audit logs.

SSRF/internal network probing

PF denies internal address ranges and metadata endpoints. The browser has public egress but should not reach private service jails or the host controlplane.

Jail compromise

The jail has no audit DB credentials, no MCP auth secret, and no credentials store. A compromised clone can affect its own task session but not the controlplane-owned store or audit trail.

Resource exhaustion

ZFS quotas, RCTL, deadlines, max sessions, and reaper cleanup bound disk, memory, CPU, and process leaks.

Profile bleed

The rejected profile-byte clone model is not used. Cookie injection starts from an empty profile per task clone. Cross-session cookie leakage remains an integration test requirement.


Validation state

Phase 0.5 — FreeBSD Chromium/CDP viability

Passed. See BROWSER-JAIL-FREEBSD-VIABILITY.md.

Confirmed:

  • FreeBSD 15 jail can run pkg Chromium headless.
  • puppeteer-core works against system Chromium over CDP.
  • screenshots and DOM reads work.

Phase 0.6 — clone lifecycle / credential injection

Passed. See BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md.

Learned:

  • ZFS clone mechanics are viable.
  • Bastille cloned jail start is viable after config patching.
  • Thin-jail fstab/nullfs details were painful; fixed template is now thick.
  • Stale epairs and busy datasets require explicit cleanup/reaper logic.
  • Chromium encrypted profile-byte credential inheritance is not viable.
  • CDP cookie injection works inside browser and across the clone boundary.

Passing run:

  • cookie export/import round-trip in browser: pass,
  • cookie injection across browser → browsertask001: pass,
  • 3-cycle clone/start/inject/smoke/destroy: pass,
  • idempotent orphan reaper: pass.

Median timings from the passing run:

  • clone: 113 ms,
  • Bastille jail start: 1006 ms,
  • Chromium CDP readiness: 1502 ms,
  • CDP injection + smoke: 593 ms,
  • clone + jail start + injection smoke: 1718 ms,
  • clone + jail start + Chromium ready + injection smoke: 3211 ms.

Phases

| Phase | Status | Output |
|---|---|---|
| 0: Design | COMPLETE | Browser jail docs + UI-TARS direction |
| 0.5: FreeBSD viability | PASS | Chromium + CDP confirmed |
| 0.6: Injection clone validation | PASS | cookie injection + 3-cycle clone lifecycle |
| 1: Browser backend implementation | READY | setup/browser-jail, hostd clone ops, jail HTTP API |
| 2: Controlplane integration | PENDING | credentials store, grant token policy, browser_run_task |
| 3: Refresh UX | PENDING | operator credential refresh flow |
| 4: Downloads/uploads | DEFERRED | explicit security design needed |

Decided

Resolved 11 May 2026 after the Phase 0.6 pivot:

  • Credentials store backend: Postgres. It reuses clawdie's existing encryption/transaction story, is tenant-scoped via row constraints, and has a single backup path. The filesystem store is retained only as a validation-time scaffold and is removed before the Phase 1 implementation lands.
  • Refresh UX: controlplane-streamed clone. The operator interacts with a refresh-mode browsertaskNNN clone via a web-streamed channel served by the controlplane. This is the only option that works regardless of the operator's machine; the VNC / X-forwarding alternatives were discarded because they assume a configured operator workstation.

Blocking before adoption (gates remain):

  • CDP cookie injection must work reliably across clone boundary.
  • Clone cleanup/reaper must be idempotent.
  • PF table strategy must be validated with task clone IPs.
  • hostd must expose narrow clone/destroy/force-clean operations.