Browser Jail Handoff

From: Claude (design) → Codex (implementation)
Date: 11 May 2026
Status: PHASE 0.5 COMPLETE, READY FOR PHASE 1 REVIEW

Design reference: docs/internal/BROWSER-JAIL.md
Vision spike: docs/internal/VISION-GROUNDING-FINDINGS.md (in progress)
UI-TARS direction: docs/internal/UI-TARS-ADOPTION.md

Scope

  • Repo: codeberg.org:Clawdie/Clawdie-AI.git
  • Branch: new feature branch off main (suggested: browser-jail)
  • Runtime: Node v22+ on FreeBSD — no Bun anywhere in this work
  • Build/test stamping: new trailer hook from 17746bb already lives on main; full-suite test footer on each commit
  • Coordination: Codex's controlplane-heartbeat refactor runs in parallel on a separate branch; no expected conflicts (different file surfaces)
  • UI-TARS alignment: do not implement a parallel Clawdie-specific GUI-agent loop, action grammar, or prediction parser unless the UI-TARS-compatible operator path is proven unworkable. The browser jail remains the execution backend; the controlplane or external client owns the model loop.

Pre-flight

Before starting Phase 0.5:

  1. Read docs/internal/BROWSER-JAIL.md end-to-end, especially the threat model and the implementation notes on the FreeBSD Chromium / CDP choice.
  2. Skim setup/ollama.ts and infra/packages/ollama-jail.txt as reference for the jail-bootstrap pattern. The browser jail follows the same shape.
  3. Confirm the host has the recent post-17746bb trailer hook so commit footers reflect single, accurate full-suite runs.

Phase 0.5 — FreeBSD viability spike

Status: COMPLETE by Codex on 11 May 2026. See docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md.

Goal: before any clawdie-side code, prove the FreeBSD substrate supports headless Chromium + CDP.

Steps:

  1. On the FreeBSD host, create a throwaway bastille jail:
    sudo bastille create browser-spike 15.0-RELEASE 10.0.0.X
    sudo bastille pkg browser-spike install -y chromium
    
  2. From inside the jail, run Chromium headless with the debugging port:
    chromium \
      --headless=new \
      --no-sandbox \
      --remote-debugging-port=9222 \
      --user-data-dir=/tmp/spike-profile \
      about:blank
    
    Note: --no-sandbox is for the spike only. Production runs with Chromium sandbox enabled inside a bastille jail; bastille is the outer sandbox.
  3. From a separate shell in the jail, install Node 22 and a CDP client:
    pkg install -y node22 npm-node22
    npm init -y && npm install puppeteer-core@latest
    
  4. Write a 20-line Node script that:
    • Connects to http://127.0.0.1:9222 via puppeteer.connect({ browserURL }) from puppeteer-core.
    • Opens a new page, navigates to data:text/html,<h1>hello</h1>.
    • Takes a PNG screenshot.
    • Reads the <h1> text via page.evaluate(() => document.querySelector('h1').textContent).
    • Prints OK and exits.
  5. If puppeteer-core has FreeBSD-specific issues, retry with chrome-remote-interface as the fallback CDP client.

Output: a viability note at docs/internal/BROWSER-JAIL-FREEBSD-VIABILITY.md covering:

  • Which pkg Chromium version was used.
  • Which CDP client worked (puppeteer-core vs chrome-remote-interface).
  • Any FreeBSD-specific flags or environment that were needed.
  • A copy of the working spike script.

Gate: passed. puppeteer-core connected to system-pkg Chromium over CDP, read DOM text, wrote a PNG screenshot, and re-ran the deterministic renderer.

Commit: a single commit containing the viability note and the spike script under scripts/browser-jail-spike/. The trailer must show a full-suite test pass.


Phase 1 — MVP browser jail + MCP proxy

Only proceed after Phase 0.5 passes.

Phase 1A — Jail-side HTTP service

Slice into small commits, each with full-suite tests passing.

  1. Jail scaffolding.

    • infra/packages/browser-jail.txt listing packages (chromium, node22).
    • setup/browser-jail.ts mirroring setup/ollama.ts shape.
    • Mount the host pkg cache with mountPkgCacheInJail(jailName) before package installation.
    • Set initial ZFS quotas: 10 GiB for the jail dataset, 20 GiB for the screenshot/session dataset (tune later with telemetry).
    • rc.d service file for Chromium with the right flags.
    • Document the bastille jail config in docs/internal/BROWSER-JAIL-OPS.md (mirrors LOCAL-LLM.md shape).
  2. HTTP server skeleton.

    • Node v22+ HTTP server on :8090, binds to the internal jail IP.
    • GET /health returns {"status": "ok"}.
    • Routes scaffolded for /sessions, /screenshot, /click, /type, /scroll, /navigate, /read_dom — return 501 for now.
    • Unit tests for routing and the health endpoint.
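
The routing behavior in step 2 can be sketched as a pure function, which keeps the unit tests free of sockets. This is a minimal sketch, not the implementation: the names route, RouteResult, and STUB_ROUTES are hypothetical, and the 405 and 404 responses are assumptions the handoff does not specify.

```typescript
// Hypothetical routing core for the jail-side HTTP skeleton.
type RouteResult = { status: number; body: Record<string, unknown> };

// Action routes named in the handoff; all stay stubbed until their slice lands.
const STUB_ROUTES = new Set<string>([
  "/sessions",
  "/screenshot",
  "/click",
  "/type",
  "/scroll",
  "/navigate",
  "/read_dom",
]);

function route(method: string, path: string): RouteResult {
  if (path === "/health") {
    // Assumption: non-GET methods on /health get a 405.
    return method === "GET"
      ? { status: 200, body: { status: "ok" } }
      : { status: 405, body: { error: "method_not_allowed" } };
  }
  // DELETE /sessions/:id shares the /sessions prefix.
  const base = path.startsWith("/sessions/") ? "/sessions" : path;
  if (STUB_ROUTES.has(base)) {
    return { status: 501, body: { error: "not_implemented" } };
  }
  // Assumption: unknown paths get a 404.
  return { status: 404, body: { error: "not_found" } };
}
```

A node:http server would call route(req.method, req.url) and serialize the result; binding to the internal jail IP on :8090 stays in that thin wrapper.
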
  3. Session lifecycle.

    • POST /sessions accepts {record: "off" | "transient" | "audit"} (immutable for session life; default "transient"). Spawn fresh BrowserContext, return {session_id, record}.
    • DELETE /sessions/:id → close context, delete profile dir.
    • In-memory map of session_id → {BrowserContext, record, ring_buffer}.
    • Test: open + close, double-close returns 404, max sessions per jail enforced, record echoed in response, attempting to change record mid-session returns 400.
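
The status codes in step 3 can be pinned down with an in-memory registry before any Chromium wiring exists. A minimal sketch under stated assumptions: SessionRegistry and its method names are hypothetical, the 201 and 204 success codes are not specified by the handoff, and the real create and close paths would also spawn and tear down the BrowserContext and profile dir.

```typescript
type RecordMode = "off" | "transient" | "audit";

interface Session {
  readonly id: string;
  readonly record: RecordMode; // immutable for the session's life
}

// Hypothetical in-memory registry; BrowserContext handling is omitted.
class SessionRegistry {
  private readonly sessions = new Map<string, Session>();
  private readonly maxSessions: number;
  private counter = 0;

  constructor(maxSessions = 10) {
    this.maxSessions = maxSessions;
  }

  // POST /sessions: the record mode defaults to "transient".
  create(record: RecordMode = "transient"): { status: number; session?: Session } {
    if (this.sessions.size >= this.maxSessions) {
      return { status: 429 }; // over the per-jail session limit
    }
    this.counter += 1;
    const session: Session = { id: `s${this.counter}`, record };
    this.sessions.set(session.id, session);
    return { status: 201, session }; // 201 is an assumption
  }

  // Changing the record mode mid-session is always rejected.
  changeRecord(id: string, _record: RecordMode): number {
    return this.sessions.has(id) ? 400 : 404;
  }

  // DELETE /sessions/:id: a second close of the same id returns 404.
  close(id: string): number {
    return this.sessions.delete(id) ? 204 : 404; // 204 is an assumption
  }
}
```

Keeping record as a readonly field means the 400 path needs no comparison logic: there is simply no code path that mutates it.
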
  4. navigate.

    • POST /navigate {session_id, url} → page.goto, return {final_url, title}.
    • Test: 200 OK page, 404, redirect, network error.
  5. screenshot.

    • POST /screenshot {session_id, full_page?} → PNG; returns base64 + dimensions + timestamp + persisted_path?.
    • Default viewport-only; full_page: true is explicit.
    • Persistence is governed by the session's record mode (not per-call): off writes nothing; transient writes to the ring buffer and FIFO-evicts at N=50; audit writes all with 7d retention.
    • Test: full page vs viewport, multiple sequential screenshots, ring buffer eviction at N+1, record:"off" returns no persisted_path.
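
The transient ring buffer from step 5 reduces to a bounded FIFO of persisted paths. A minimal sketch, assuming the caller deletes whatever files the buffer reports as evicted; ScreenshotRing and its methods are hypothetical names.

```typescript
// Hypothetical bounded FIFO of persisted screenshot paths for one session.
class ScreenshotRing {
  private readonly capacity: number;
  private readonly paths: string[] = [];

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  // Records a new screenshot and returns the paths evicted to make room.
  // The caller is responsible for deleting the evicted files.
  push(path: string): string[] {
    this.paths.push(path);
    const evicted: string[] = [];
    while (this.paths.length > this.capacity) {
      const oldest = this.paths.shift();
      if (oldest !== undefined) evicted.push(oldest);
    }
    return evicted;
  }

  // Current contents, oldest first.
  list(): readonly string[] {
    return this.paths;
  }
}
```

Returning the evicted paths keeps filesystem effects out of the data structure, so the "eviction at N+1" test can run on strings alone.
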
  6. click and type.

    • POST /click accepts either {x,y} or {selector}.
    • POST /type accepts {text, selector?} (uses focused element if no selector).
    • after_screenshot: true bundles a fresh screenshot into the response.
    • Test: click by coords, click by selector, type into focused vs selector, after_screenshot behavior.
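
The "either {x,y} or {selector}" contract for /click can be checked before anything reaches Chromium. A minimal sketch: parseClickTarget and ClickTarget are hypothetical names, and rejecting a body that carries both forms is an assumption, since the handoff only says the endpoint accepts either.

```typescript
type ClickTarget =
  | { kind: "coords"; x: number; y: number }
  | { kind: "selector"; selector: string };

// Hypothetical validator for the target part of a POST /click body.
function parseClickTarget(body: unknown): ClickTarget | { error: string } {
  if (typeof body !== "object" || body === null) {
    return { error: "body must be a JSON object" };
  }
  const b = body as Record<string, unknown>;
  const x = b["x"];
  const y = b["y"];
  const selector = b["selector"];

  const hasCoords =
    typeof x === "number" && typeof y === "number" &&
    Number.isFinite(x) && Number.isFinite(y);
  const hasSelector = typeof selector === "string" && selector.length > 0;

  // Assumption: exactly one form must be present; both or neither is an error.
  if (hasCoords === hasSelector) {
    return { error: "provide exactly one of {x,y} or {selector}" };
  }
  if (hasCoords) {
    return { kind: "coords", x: x as number, y: y as number };
  }
  return { kind: "selector", selector: selector as string };
}
```
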
  7. scroll and read_dom.

    • POST /scroll {session_id, delta_y, selector?}.
    • POST /read_dom {session_id, selector?, max_chars?} → outerHTML truncated.
    • Tests as above.
  8. Resource limits and retention.

    • Per-session deadline (default 300s).
    • Max concurrent sessions per jail (default 10).
    • record:"transient" ring buffer: N=50 per session, FIFO-evicted as new screenshots arrive (no time component).
    • record:"audit" retention: 7d, GC'd by controlplane cron (configurable per-tenant in Phase 2).
    • Dataset-level FIFO eviction under screenshot quota pressure (80% high-water alert, 100% evicts transient sessions first, then audit). Eviction never crosses tenant boundaries.
    • Tests: deadline kills a stuck session; over-limit returns 429; transient ring buffer evicts at N+1; audit GC removes screenshots past 7d; quota-pressure eviction prefers transient over audit data.
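
The quota-pressure ordering in step 8 (transient before audit, never across tenants) can be expressed as a pure planning function. A minimal sketch, assuming eviction is planned per tenant and that oldest sessions go first within each record mode; StoredSession and planEviction are hypothetical names.

```typescript
interface StoredSession {
  id: string;
  tenantId: string;
  record: "transient" | "audit";
  bytes: number;
  createdAtMs: number;
}

// Returns the session ids to evict, in order, to free at least bytesToFree
// for one tenant. Sessions belonging to other tenants are never considered.
function planEviction(
  sessions: StoredSession[],
  tenantId: string,
  bytesToFree: number,
): string[] {
  const candidates = sessions
    .filter((s) => s.tenantId === tenantId)
    .sort((a, b) => {
      // Transient data goes before audit data.
      if (a.record !== b.record) return a.record === "transient" ? -1 : 1;
      // Assumption: within a mode, oldest first.
      return a.createdAtMs - b.createdAtMs;
    });

  const plan: string[] = [];
  let freed = 0;
  for (const s of candidates) {
    if (freed >= bytesToFree) break;
    plan.push(s.id);
    freed += s.bytes;
  }
  return plan;
}
```

The plan may free less than requested if the tenant has too little data; the caller would need to treat that as an alert condition rather than reach into another tenant's sessions.
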

Phase 1B — Controlplane MCP proxy

  1. MCP server scaffolding.

    • New src/browser-jail-mcp.ts.
    • HTTP/SSE transport (one canonical transport in MVP).
    • Tool definitions matching the browser.* surface in BROWSER-JAIL.md.
    • All tools return 501 initially.
    • Unit tests: tool list, schema validation.
  2. Auth wiring.

    • Reuse better-auth session check at the MCP boundary.
    • Resolve tenant_id + operator_id from session.
    • Test: unauthenticated request → 401, valid session → forwarded.
  3. Audit log table + write path.

    • Migration: audit.browser_session_events per the schema in BROWSER-JAIL.md.
    • Proxy writes a row before forwarding to the jail.
    • browser.type params redacted (length + redacted flag only).
    • Tests: row written, redaction applied, retries don't duplicate.
  4. Per-tool forwarding.

    • One commit per tool, each: route → audit → HTTP call to jail → return MCP response.
    • Tests: happy path, jail-down, tenant mismatch.
  5. Claude Desktop integration smoke test.

    • docs/internal/BROWSER-JAIL-CLAUDE-DESKTOP-SETUP.md showing the claude_desktop_config.json snippet that points at the proxy.
    • Manual smoke: open session, navigate to a test page, click, screenshot, close session. Document outcome in the same doc.
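
For the audit write path in step 3, the redaction and no-duplicate-on-retry requirements can be sketched together. This is a minimal sketch under stated assumptions: the real table is audit.browser_session_events with the schema from BROWSER-JAIL.md, which is not reproduced here; the in-memory Map stands in for a uniqueness constraint on a hypothetical request id, and the field names length and redacted are illustrative.

```typescript
type AuditRow = {
  requestId: string;
  tool: string;
  params: Record<string, unknown>;
};

// browser.type params are reduced to a length and a redacted flag only.
function redactParams(
  tool: string,
  params: Record<string, unknown>,
): Record<string, unknown> {
  if (tool !== "browser.type") return { ...params };
  const text = params["text"];
  return { length: typeof text === "string" ? text.length : 0, redacted: true };
}

// Hypothetical stand-in for the audit table's write path.
class AuditLog {
  private readonly rows = new Map<string, AuditRow>();

  // Returns true when a row is written, false when requestId was already
  // logged, so a retried request does not produce a second row.
  write(requestId: string, tool: string, params: Record<string, unknown>): boolean {
    if (this.rows.has(requestId)) return false;
    this.rows.set(requestId, { requestId, tool, params: redactParams(tool, params) });
    return true;
  }

  get(requestId: string): AuditRow | undefined {
    return this.rows.get(requestId);
  }

  get size(): number {
    return this.rows.size;
  }
}
```

The proxy would call write before forwarding to the jail, matching the "row before forward" ordering above.
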

Phase 1C — Network policy

  1. PF rules on the browser jail.
    • Block egress to RFC1918, loopback, link-local (169.254.0.0/16 and fe80::/10, which includes the 169.254.169.254 metadata address), and ULA (fc00::/7).
    • Allow public egress.
    • Ingress on :8090 only from controlplane IP.
    • Tests: PF rule lint + an integration test that asserts a navigate to an RFC1918 URL fails as expected.
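
PF remains the enforcement point, but the integration test needs to know which destinations are expected to fail. A minimal sketch of that classification, assuming literal IP inputs (no DNS resolution) and failing closed on anything unparseable; isBlockedDestination is a hypothetical helper, not part of the PF tooling.

```typescript
// Hypothetical helper: true when a literal IP falls in a range the
// browser jail's PF rules are expected to block.
function isBlockedDestination(ip: string): boolean {
  const addr = ip.trim().toLowerCase();
  if (addr.includes(":")) return isBlockedV6(addr);

  const parts = addr.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p))) {
    return true; // fail closed on unparseable input
  }
  const octets = parts.map((p) => Number(p));
  if (octets.some((n) => n > 255)) return true;
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;

  if (a === 10) return true;                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // 192.168.0.0/16
  if (a === 127) return true;                       // loopback
  if (a === 169 && b === 254) return true;          // link-local, incl. 169.254.169.254
  return false;
}

function isBlockedV6(addr: string): boolean {
  if (addr === "::1") return true; // loopback
  // IPv4-mapped or embedded forms: classify the trailing dotted quad.
  const tail = addr.slice(addr.lastIndexOf(":") + 1);
  if (tail.includes(".")) return isBlockedDestination(tail);

  const first = addr.startsWith("::") ? 0 : parseInt(addr.split(":")[0] ?? "", 16);
  if (Number.isNaN(first)) return true; // fail closed
  if ((first & 0xffc0) === 0xfe80) return true; // fe80::/10 link-local
  if ((first & 0xfe00) === 0xfc00) return true; // fc00::/7 ULA
  return false;
}
```

The integration test can iterate over a small table of addresses and assert that navigate fails exactly where this function returns true.
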

Phase 1 stop point

  • Browser jail is reachable only from controlplane.
  • Controlplane MCP proxy presents the browser.* tools to Claude Desktop.
  • A manual end-to-end smoke (Claude Desktop → MCP → jail → page action) works.
  • All commits green on full-suite tests.

Do not proceed to controlplane integration (Phase 2) until Phase 1 is demoed and reviewed by Claude/Sam.


Phase 2 — Controlplane integration

Spec only — implement after Phase 1 review.

  1. New controlplane task type: browser-session.
  2. ZFS snapshot before session start; destroy on clean exit; retain on error. Hook reuses the existing hostd-side snapshot helpers from the sudo-elimination work.
  3. PF rule hardening: confirm rules survive jail restart, integrate with the existing PF management surface.
  4. Audit retention: 7d default, configurable per-tenant.
  5. Screenshot GC cron: hourly sweep of /var/db/browser-jail for sessions past their retention window.
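
The hourly sweep in item 5 comes down to comparing each ended session's age against its retention window. A minimal sketch, assuming retention is measured from session end and that a per-tenant override arrives as an optional field; EndedSession, retentionMs, and sessionsToSweep are hypothetical names.

```typescript
const DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7d default

interface EndedSession {
  id: string;
  endedAtMs: number;
  retentionMs?: number; // hypothetical per-tenant override
}

// Returns the ids of sessions whose retention window has elapsed at nowMs.
function sessionsToSweep(sessions: EndedSession[], nowMs: number): string[] {
  return sessions
    .filter((s) => nowMs - s.endedAtMs > (s.retentionMs ?? DEFAULT_RETENTION_MS))
    .map((s) => s.id);
}
```

Passing nowMs in rather than calling Date.now() inside keeps the "GC removes screenshots past 7d" test deterministic.
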

Validation per commit

  • Run the full test suite (not just browser-jail / controlplane tests). Reference: 7dca928's "2369 passed (703 files)" footer is the canonical shape.
  • Commit footer must show single, accurate stamping (the 17746bb-era hook).
  • No commit may introduce imports from src/controlplane-api.ts into the browser-jail modules — DAG must stay acyclic.
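
The import rule above can be approximated with a text scan that runs alongside the test suite. A minimal sketch, assuming browser-jail modules are identifiable by "browser-jail" in their path; it only catches direct imports, so a transitive dependency check would need a real module-graph tool. findForbiddenImports and the file paths used with it are hypothetical.

```typescript
// Matches static imports, side-effect imports, dynamic import(), and
// require() that target a module named controlplane-api.
const FORBIDDEN_IMPORT =
  /(?:from|import|require)\s*\(?\s*["'][^"']*controlplane-api(?:\.[jt]s)?["']/;

// files maps a repo-relative path to its source text. Returns the
// browser-jail paths that import controlplane-api directly.
function findForbiddenImports(files: Record<string, string>): string[] {
  return Object.entries(files)
    .filter(([path]) => path.includes("browser-jail"))
    .filter(([, source]) => FORBIDDEN_IMPORT.test(source))
    .map(([path]) => path)
    .sort();
}
```
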

Open questions / escalation triggers

Escalate to Claude/Sam before deciding; don't guess:

  • Phase 0.5 viability spike fails with both puppeteer-core and chrome-remote-interface.
  • A clean better-auth session token isn't available for MCP clients (Claude Desktop in particular — check its auth-passing model before Phase 1B-2).
  • Chromium's BrowserContext isolation turns out leakier than documented during Phase 1A-3 testing.
  • ZFS snapshot lifecycle decisions (retention, restore semantics) for Phase 2.

Out of scope

  • Electron + nut-js desktop shell.
  • Vision/grounding inside the browser jail. The jail ships pixels; the client agent grounds them.
  • Multi-jail scale-out beyond one shared Chromium per jail.
  • File downloads/uploads (Phase 3+).