Browser Jail Handoff
From: Claude (design) → Codex (implementation)
Date: 11 May 2026
Status: PHASE 0.5 COMPLETE, READY FOR PHASE 1 REVIEW
Design reference: docs/internal/BROWSER-JAIL.md
Vision spike: docs/internal/VISION-GROUNDING-FINDINGS.md (in progress)
UI-TARS direction: docs/internal/UI-TARS-ADOPTION.md
Scope
- Repo: `codeberg.org:Clawdie/Clawdie-AI.git`
- Branch: new feature branch off `main` (suggested: `browser-jail`)
- Runtime: Node v22+ on FreeBSD; no Bun anywhere in this work
- Build/test stamping: new trailer hook from `17746bb` already lives on `main`; full-suite test footer on each commit
- Coordination: Codex's controlplane-heartbeat refactor runs in parallel on a separate branch; no expected conflicts (different file surfaces)
- UI-TARS alignment: do not implement a parallel Clawdie-specific GUI-agent loop, action grammar, or prediction parser unless the UI-TARS-compatible operator path is proven unworkable. The browser jail remains the execution backend; the controlplane or external client owns the model loop.
Pre-flight
Before starting Phase 0.5:
- Read `docs/internal/BROWSER-JAIL.md` end-to-end, especially the threat model and the implementation notes on FreeBSD Chromium / CDP choice.
- Skim `setup/ollama.ts` and `infra/packages/ollama-jail.txt` as reference for the jail-bootstrap pattern. The browser jail follows the same shape.
- Confirm the host has the recent post-`17746bb` trailer hook so commit footers reflect single, accurate full-suite runs.
Phase 0.5 — FreeBSD viability spike
Status: COMPLETE by Codex on 11 May 2026. See `docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md`.
Goal: before any clawdie-side code, prove the FreeBSD substrate supports headless Chromium + CDP.
Steps:
- On the FreeBSD host, create a throwaway bastille jail:

      sudo bastille create browser-spike 15.0-RELEASE 10.0.0.X
      sudo bastille pkg browser-spike install -y chromium

- From inside the jail, run Chromium headless with the debugging port:

      chromium \
        --headless=new \
        --no-sandbox \
        --remote-debugging-port=9222 \
        --user-data-dir=/tmp/spike-profile \
        about:blank

  Note: `--no-sandbox` is for the spike only. Production runs with Chromium sandbox enabled inside a bastille jail; bastille is the outer sandbox.

- From a separate shell in the jail, install Node 22 and a CDP client:

      pkg install -y node22 npm-node22
      npm init -y && npm install puppeteer-core@latest

- Write a 20-line Node script that:
  - Connects to `http://127.0.0.1:9222` via `puppeteer-core.connect`.
  - Opens a new page, navigates to `data:text/html,<h1>hello</h1>`.
  - Takes a PNG screenshot.
  - Reads the `<h1>` text via `page.evaluate(() => document.querySelector('h1').textContent)`.
  - Prints `OK` and exits.
- If `puppeteer-core` has FreeBSD-specific issues, retry with `chrome-remote-interface` as the fallback CDP client.
Output: a viability note at `docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md` covering:
- Which `pkg` Chromium version was used.
- Which CDP client worked (`puppeteer-core` vs `chrome-remote-interface`).
- Any FreeBSD-specific flags or environment that were needed.
- A copy of the working spike script.
Gate: passed. puppeteer-core connected to system-pkg Chromium over CDP,
read DOM text, wrote a PNG screenshot, and re-ran the deterministic renderer.
Commit: one commit, the viability note + the spike script under
scripts/browser-jail-spike/. Trailer must show full-suite test pass.
Phase 1 — MVP browser jail + MCP proxy
Only proceed after Phase 0.5 passes.
Phase 1A — Jail-side HTTP service
Slice into small commits, each with full-suite tests passing.
- Jail scaffolding.
  - `infra/packages/browser-jail.txt` listing packages (chromium, node22).
  - `setup/browser-jail.ts` mirroring `setup/ollama.ts` shape.
  - Mount the host pkg cache with `mountPkgCacheInJail(jailName)` before package installation.
  - Set initial ZFS quotas: 10 GiB for the jail dataset, 20 GiB for the screenshot/session dataset (tune later with telemetry).
  - rc.d service file for Chromium with the right flags.
  - Document the bastille jail config in `docs/internal/BROWSER-JAIL-OPS.md` (mirrors `LOCAL-LLM.md` shape).
- HTTP server skeleton.
  - Node v22+ HTTP server on `:8090`, binds to the internal jail IP.
  - `GET /health` returns `{status: ok}`.
  - Routes scaffolded for `/sessions`, `/screenshot`, `/click`, `/type`, `/scroll`, `/navigate`, `/read_dom`; each returns `501` for now.
  - Unit tests for routing and the health endpoint.
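The routing part of the skeleton can be sketched as a pure function, kept apart from the socket so it is easy to unit-test. This is a minimal sketch under assumptions, not the shipped server: `route`, `RouteResult`, and `SCAFFOLDED` are hypothetical names, the `404` for unknown paths is assumed, and only the paths, the health body, and the `501` placeholder come from the list above.

```typescript
// Hypothetical names throughout; only the paths and the 200/501 codes come from the plan.
type RouteResult = { status: number; body: Record<string, unknown> };

// Routes that exist in the skeleton but are not implemented yet.
const SCAFFOLDED: readonly string[] = [
  "/sessions", "/screenshot", "/click", "/type",
  "/scroll", "/navigate", "/read_dom",
];

function route(method: string, path: string): RouteResult {
  if (method === "GET" && path === "/health") {
    return { status: 200, body: { status: "ok" } };
  }
  // "/sessions/:id" shares the "/sessions" scaffold.
  const known = SCAFFOLDED.includes(path) || path.startsWith("/sessions/");
  if (known) {
    return { status: 501, body: { error: "not implemented" } };
  }
  // Assumption: anything else is an unknown route.
  return { status: 404, body: { error: "not found" } };
}
```

A thin adapter around Node's HTTP server would call `route` with the request method and URL pathname and serialize the result; that wiring is left out here.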
- Session lifecycle.
  - `POST /sessions` accepts `{record: "off" | "transient" | "audit"}` (immutable for session life; default `"transient"`). Spawn fresh `BrowserContext`, return `{session_id, record}`.
  - `DELETE /sessions/:id` → close context, delete profile dir.
  - In-memory map of `session_id → {BrowserContext, record, ring_buffer}`.
  - Test: open + close, double-close returns 404, max sessions per jail enforced, `record` echoed in response, attempting to change `record` mid-session returns 400.
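The lifecycle rules above (immutable `record`, 404 on double-close, a per-jail session cap) can be sketched as a small in-memory registry. This is an illustration under assumptions: `SessionRegistry` and its method names are hypothetical, the `BrowserContext` and ring buffer are left out, the `201` and `204` success codes are assumed, and `429` for over-limit is taken from the resource-limit tests later in this phase.

```typescript
// Hypothetical sketch: the real map also holds the BrowserContext and ring buffer.
type RecordMode = "off" | "transient" | "audit";

interface Session {
  readonly id: string;
  readonly record: RecordMode; // fixed at creation, never reassigned
}

class SessionRegistry {
  private readonly sessions = new Map<string, Session>();
  private readonly maxSessions: number;
  private nextId = 1;

  constructor(maxSessions: number = 10) {
    this.maxSessions = maxSessions;
  }

  // Mirrors POST /sessions: the default mode is "transient".
  open(record: RecordMode = "transient"): { status: number; session?: Session } {
    if (this.sessions.size >= this.maxSessions) {
      return { status: 429 };
    }
    const session: Session = { id: `s${this.nextId++}`, record };
    this.sessions.set(session.id, session);
    return { status: 201, session }; // 201 is an assumed success code
  }

  // Any attempt to change the mode of a live session is rejected with 400.
  changeRecord(id: string, _record: RecordMode): number {
    return this.sessions.has(id) ? 400 : 404;
  }

  // Mirrors DELETE /sessions/:id: closing the same id twice yields 404.
  close(id: string): number {
    return this.sessions.delete(id) ? 204 : 404; // 204 is an assumed success code
  }
}
```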
- `navigate`.
  - `POST /navigate` `{session_id, url}` → `page.goto`, return `{final_url, title}`.
  - Test: 200 OK page, 404, redirect, network error.
- `screenshot`.
  - `POST /screenshot` `{session_id, full_page?}` → PNG, return base64 + dimensions + timestamp + `persisted_path?`.
  - Default viewport-only; `full_page: true` is explicit.
  - Persistence is governed by the session's `record` mode (not per-call): `off` writes nothing; `transient` writes to the ring buffer and FIFO-evicts at N=50; `audit` writes all with 7d retention.
  - Test: full page vs viewport, multiple sequential screenshots, ring buffer eviction at N+1, `record:"off"` returns no `persisted_path`.
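The `transient` ring buffer behavior (keep the newest N, evict the oldest at N+1) can be sketched as follows. The class name `ScreenshotRing` and its methods are hypothetical; returning the evicted path is an assumed convenience so a caller could delete the file on disk.

```typescript
// Hypothetical sketch: a fixed-capacity FIFO of persisted screenshot paths.
class ScreenshotRing {
  private readonly paths: string[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 50) {
    this.capacity = capacity;
  }

  // Adds a path; returns the evicted (oldest) path once capacity is exceeded.
  push(path: string): string | undefined {
    this.paths.push(path);
    return this.paths.length > this.capacity ? this.paths.shift() : undefined;
  }

  // Oldest first, newest last.
  list(): readonly string[] {
    return this.paths;
  }
}
```

With the default capacity, the 51st `push` returns the path of the first screenshot, which matches the "eviction at N+1" test case.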
- `click` and `type`.
  - `POST /click` accepts either `{x,y}` or `{selector}`.
  - `POST /type` accepts `{text, selector?}` (uses focused element if no selector).
  - `after_screenshot: true` bundles a fresh screenshot into the response.
  - Test: click by coords, click by selector, type into focused vs selector, after_screenshot behavior.
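The "either `{x,y}` or `{selector}`" rule for `/click` maps naturally onto a discriminated union. The sketch below is an assumption about how the request body might be validated: `ClickTarget` and `parseClickTarget` are hypothetical names, and rejecting a body that supplies both forms is an assumed reading of "either".

```typescript
// Hypothetical sketch: normalize a /click body into exactly one target form.
type ClickTarget =
  | { kind: "coords"; x: number; y: number }
  | { kind: "selector"; selector: string };

function parseClickTarget(body: Record<string, unknown>): ClickTarget | undefined {
  const { x, y, selector } = body;
  const hasCoords = typeof x === "number" && typeof y === "number";
  const hasSelector = typeof selector === "string" && selector.length > 0;
  // Neither form or both forms: treat as invalid (assumption).
  if (hasCoords === hasSelector) return undefined;
  if (typeof x === "number" && typeof y === "number") {
    return { kind: "coords", x, y };
  }
  if (typeof selector === "string") {
    return { kind: "selector", selector };
  }
  return undefined;
}
```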
- `scroll` and `read_dom`.
  - `POST /scroll` `{session_id, delta_y, selector?}`.
  - `POST /read_dom` `{session_id, selector?, max_chars?}` → outerHTML truncated.
  - Tests as above.
- Resource limits and retention.
  - Per-session deadline (default 300s).
  - Max concurrent sessions per jail (default 10).
  - `record:"transient"` ring buffer: N=50 per session, FIFO-evicted as new screenshots arrive (no time component).
  - `record:"audit"` retention: 7d, GC'd by controlplane cron (configurable per-tenant in Phase 2).
  - Dataset-level FIFO eviction under screenshot quota pressure (80% high-water alert, 100% evicts transient sessions first, then audit). Eviction never crosses tenant boundaries.
  - Tests: deadline kills a stuck session; over-limit returns 429; transient ring buffer evicts at N+1; audit GC removes screenshots past 7d; quota-pressure eviction prefers transient over audit data.
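The quota-pressure rule can be sketched as two small functions: one that maps usage to an action, and one that orders eviction candidates. This is a sketch under assumptions: all names are hypothetical, and "never crosses tenant boundaries" is read here as "only the given tenant's sessions are candidates", which the design doc may define differently.

```typescript
// Hypothetical sketch of quota-pressure handling for the screenshot dataset.
type StoredSession = {
  id: string;
  tenantId: string;
  record: "transient" | "audit";
  createdAt: number; // epoch milliseconds
};

type PressureAction = "none" | "alert" | "evict";

// 80% is the high-water alert; 100% triggers eviction.
function pressureAction(usedBytes: number, quotaBytes: number): PressureAction {
  const ratio = usedBytes / quotaBytes;
  if (ratio >= 1.0) return "evict";
  if (ratio >= 0.8) return "alert";
  return "none";
}

// Transient sessions go first, then audit; oldest first within each group (FIFO).
// Only the named tenant's sessions are ever candidates (assumed interpretation).
function evictionOrder(sessions: StoredSession[], tenantId: string): string[] {
  const rank = (s: StoredSession): number => (s.record === "transient" ? 0 : 1);
  return sessions
    .filter((s) => s.tenantId === tenantId)
    .sort((a, b) => rank(a) - rank(b) || a.createdAt - b.createdAt)
    .map((s) => s.id);
}
```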
Phase 1B — Controlplane MCP proxy
- MCP server scaffolding.
  - New `src/browser-jail-mcp.ts`.
  - HTTP/SSE transport (one canonical transport in MVP).
  - Tool definitions matching the `browser.*` surface in `BROWSER-JAIL.md`.
  - All tools return `501` initially.
  - Unit tests: tool list, schema validation.
- Auth wiring.
  - Reuse better-auth session check at the MCP boundary.
  - Resolve `tenant_id` + `operator_id` from session.
  - Test: unauthenticated request → 401, valid session → forwarded.
- Audit log table + write path.
  - Migration: `audit.browser_session_events` per the schema in `BROWSER-JAIL.md`.
  - Proxy writes a row before forwarding to the jail.
  - `browser.type` params redacted (length + redacted flag only).
  - Tests: row written, redaction applied, retries don't duplicate.
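The `browser.type` redaction rule can be illustrated with a small pure function. The field names `text_length` and `redacted` are assumptions for illustration; the actual column layout is whatever the `audit.browser_session_events` schema in `BROWSER-JAIL.md` specifies.

```typescript
// Hypothetical sketch: what the proxy might store for a browser.type call.
type TypeParams = { session_id: string; text: string; selector?: string };
type RedactedTypeParams = { text_length: number; redacted: true };

// Keeps only the length of the typed text plus a flag; the text itself is dropped.
function redactTypeParams(params: TypeParams): RedactedTypeParams {
  return {
    text_length: params.text.length, // UTF-16 code units, not bytes
    redacted: true,
  };
}
```

A test for "redaction applied" could then assert that the serialized audit row never contains the original text.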
- Per-tool forwarding.
  - One commit per tool, each: route → audit → HTTP call to jail → return MCP response.
  - Tests: happy path, jail-down, tenant mismatch.
- Claude Desktop integration smoke test.
  - `docs/internal/BROWSER-JAIL-CLAUDE-DESKTOP-SETUP.md` showing the `claude_desktop_config.json` snippet that points at the proxy.
  - Manual smoke: open session, navigate to a test page, click, screenshot, close session. Document outcome in the same doc.
Phase 1C — Network policy
- PF rules on the browser jail.
- Block egress to RFC1918, loopback, link-local, ULA, fe80::/10, 169.254.169.254.
- Allow public egress.
- Ingress on :8090 only from controlplane IP.
- Tests: PF rule lint + an integration test that asserts a navigate to an RFC1918 URL fails as expected.
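To make the integration test's expectations concrete, the IPv4 part of the egress block list can be written down as a predicate. This is only a test-side sketch: PF remains the enforcement point, the function names are hypothetical, the IPv6 ranges (ULA, `fe80::/10`) are not covered, and failing closed on unparseable input is an assumption.

```typescript
// Hypothetical test helper: does an IPv4 literal fall in a range the PF rules block?
function parseIPv4(addr: string): number | undefined {
  const parts = addr.split(".");
  if (parts.length !== 4) return undefined;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return undefined;
    const octet = Number(part);
    if (octet > 255) return undefined;
    value = value * 256 + octet;
  }
  return value;
}

const BLOCKED_V4: ReadonlyArray<[string, number]> = [
  ["10.0.0.0", 8], ["172.16.0.0", 12], ["192.168.0.0", 16], // RFC1918
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes 169.254.169.254
];

// Plain arithmetic instead of bit shifts, which are signed 32-bit in JavaScript.
function inCidr(ip: number, base: number, prefix: number): boolean {
  const size = 2 ** (32 - prefix);
  return Math.floor(ip / size) === Math.floor(base / size);
}

function isBlockedIPv4(addr: string): boolean {
  const ip = parseIPv4(addr);
  if (ip === undefined) return true; // fail closed (assumption)
  return BLOCKED_V4.some(([base, prefix]) => {
    const baseValue = parseIPv4(base);
    return baseValue !== undefined && inCidr(ip, baseValue, prefix);
  });
}
```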
Phase 1 stop point
- Browser jail is reachable only from controlplane.
- Controlplane MCP proxy presents the `browser.*` tools to Claude Desktop.
- A manual end-to-end smoke (Claude Desktop → MCP → jail → page action) works.
- All commits green on full-suite tests.
Do not proceed to controlplane integration (Phase 2) until Phase 1 is demoed and reviewed by Claude/Sam.
Phase 2 — Controlplane integration
Spec only — implement after Phase 1 review.
- New controlplane task type: `browser-session`.
- ZFS snapshot before session start; destroy on clean exit; retain on error. Hook reuses the existing hostd-side snapshot helpers from the sudo-elimination work.
- PF rule hardening: confirm rules survive jail restart, integrate with the existing PF management surface.
- Audit retention: 7d default, configurable per-tenant.
- Screenshot GC cron: hourly sweep of `/var/db/browser-jail` for sessions past their retention window.
Validation per commit
- Run the full test suite (not just browser-jail / controlplane tests). Reference: `7dca928`'s "2369 passed (703 files)" footer is the canonical shape.
- Commit footer must show single, accurate stamping (the `17746bb`-era hook).
- No commit may introduce imports from `src/controlplane-api.ts` into the browser-jail modules; the DAG must stay acyclic.
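The import rule could be backed by a cheap textual check run over each browser-jail source file. This is a rough sketch, not an existing repo tool: the function name is hypothetical, and a regex scan can produce false positives on matching text inside comments or strings, so a real check might prefer a parser-based lint rule.

```typescript
// Hypothetical sketch: flag any static, dynamic, or require-style import of controlplane-api.
const FORBIDDEN_IMPORT =
  /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["'][^"']*\/controlplane-api(?:\.[jt]s)?["']/;

// Returns true when the given source text pulls in src/controlplane-api.ts.
function importsControlplaneApi(source: string): boolean {
  return FORBIDDEN_IMPORT.test(source);
}
```

A test could read every file under the browser-jail module directory and assert that `importsControlplaneApi` is false for each.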
Open questions / escalation triggers
Escalate to Claude/Sam before deciding, don't guess:
- Phase 0.5 viability spike fails with both `puppeteer-core` and `chrome-remote-interface`.
- A clean better-auth session token isn't available for MCP clients (Claude Desktop in particular; check its auth-passing model before Phase 1B-2).
- Chromium's BrowserContext isolation turns out leakier than documented during Phase 1A-3 testing.
- ZFS snapshot lifecycle decisions (retention, restore semantics) for Phase 2.
Out of scope
- Electron + nut-js desktop shell.
- Vision/grounding inside the browser jail. The jail ships pixels; the client agent grounds them.
- Multi-jail scale-out beyond one shared Chromium per jail.
- File downloads/uploads (Phase 3+).