Phase 0 done: Clawdie/Colibri created, Phase-1 colibri-probe scaffold pushed (cf7d25e), builds with --release on Linux. Lane table + Decisions updated with the clone URL so Codex (code) and the ISO builder (FreeBSD build) can clone and start.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — 18 failed
Defines the implementation path for a combined, FreeBSD-native Rust control plane: Multica's coordination model + Reasonix's cache-first cost discipline. AionUI (Electron/Bun) is dropped on FreeBSD and deferred to an optional Linux client. New dedicated cross-platform repo (greenfield); build Linux-first (domedog), then FreeBSD (osa).
Adds an Active-lanes table so agents can work in parallel via the hub-and-spoke run-manifest contract: Codex=Rust code (osa), ISO builder=FreeBSD build (osa), Claude=Linux build + live DeepSeek cache smoke (domedog), shared schema steward. Repo creation (Phase 0) is the single gating handoff before lanes fan out.
Includes gated supersession/drop candidates (non-Pi runners, per-backend heartbeats, terminal-scrape glue) pending proof gates + per-file caller inventory, with archive/multitenant-claude-pre-divergence as the rollback snapshot. Nothing deleted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — 18 failed
Full-page SL render of the rebuilt landing (final state, including the reinstated concrete install CTA). Captured via headless Chromium, optimized losslessly with optipng (~460 KB). Reference asset under doc/, not part of the Astro build.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — 18 failed
Confirm the current Pi/openai-codex gpt-5.5 lane in the active Colibri operational matrix alongside the Codex app, Claude, Z.ai, DeepSeek, host-status, and version-sync lanes.
---
Build: pass | Tests: pass — 2490 passed (187 files)
Add a structured runtime inventory schema, drift summary tests, and skills for Pi provider smoke tests plus Node/Pi/npm version synchronization across hosts and ISO build inputs.
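A minimal sketch of what a drift summary over such an inventory could compute, assuming each host reports a flat tool-to-version map. The HostInventory/Drift shapes and summarizeDrift are hypothetical names, not the committed schema.

```typescript
// Hypothetical inventory shape: one record per host, tool name -> version string.
interface HostInventory {
  host: string;
  versions: Record<string, string>;
}

// For one drifting tool: version -> hosts reporting it ("absent" if the host lacks the tool).
interface Drift {
  tool: string;
  byVersion: Record<string, string[]>;
}

// Report every tool whose version is not identical on all hosts.
function summarizeDrift(inventories: HostInventory[]): Drift[] {
  const tools = new Set<string>();
  for (const inv of inventories) {
    for (const tool of Object.keys(inv.versions)) tools.add(tool);
  }
  const drifts: Drift[] = [];
  for (const tool of Array.from(tools).sort()) {
    const byVersion: Record<string, string[]> = {};
    for (const inv of inventories) {
      const version = inv.versions[tool] ?? "absent";
      if (!byVersion[version]) byVersion[version] = [];
      byVersion[version].push(inv.host);
    }
    // More than one distinct version (or presence vs absence) means drift.
    if (Object.keys(byVersion).length > 1) drifts.push({ tool, byVersion });
  }
  return drifts;
}
```

With illustrative versions, a matching node on domedog and osa produces no entry, while differing npm versions produce one npm entry listing which host has which version.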
---
Build: pass | Tests: pass — 2485 passed (186 files)
Introduce pure Pi JSON event parsing and cross-host run manifest validation as the first code on the Colibri control branch.
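A sketch of cross-host run manifest validation as a pure function, assuming a manifest with runId, host, lane, and startedAt string fields. These field names and validateManifest are hypothetical; the real manifest schema is not described here.

```typescript
// Hypothetical manifest shape.
interface RunManifest {
  runId: string;
  host: string; // e.g. "domedog" or "osa"
  lane: string;
  startedAt: string; // ISO-8601 timestamp
}

type ValidationResult =
  | { ok: true; manifest: RunManifest }
  | { ok: false; errors: string[] };

// Validate an untrusted JSON value, collecting every problem instead of stopping at the first.
function validateManifest(value: unknown): ValidationResult {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, errors: ["manifest must be a JSON object"] };
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["runId", "host", "lane", "startedAt"]) {
    const field = obj[key];
    if (typeof field !== "string" || field.length === 0) {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  const startedAt = obj["startedAt"];
  if (typeof startedAt === "string" && startedAt.length > 0 && isNaN(Date.parse(startedAt))) {
    errors.push("startedAt must be a parseable timestamp");
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, manifest: obj as unknown as RunManifest };
}
```

Keeping the validator pure (no I/O) lets each host run the same check on manifests it receives from the others.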
---
Build: pass | Tests: pass — 2467 passed (184 files)
Park the Pi-only control simplification and cross-host run contract so other agents can review before implementation starts.
---
Build: pass | Tests: pass — 2456 passed (182 files)
Codex explicitly reverted to system /tmp for root pcap staging in c9a8aa1
and documented the exception. No further action needed.
---
Build: pass | Tests: pass — 2456 passed (182 files)
Small follow-ups after the design alignment sweep, surfaced as
improvements worth landing before Phase 0.6 starts.
- BROWSER-JAIL.md: add a "Related docs" index at the top so a new reader
can find current design vs phase records vs direction vs history
without grepping. Resolve the two non-blocking choices: credentials
store backend = Postgres, refresh UX = controlplane-streamed clone.
Add a concrete operator_grant_token schema (jti/iss/tenant_id/
origin_session_id/operator_id/allowed_domains/issued_at/expires_at/
single_use) with clawdie-internal opaque-token storage in postgres,
validation rules, and revocation semantics.
- BROWSER-JAIL-HANDOFF.md: add an "Order of work" preface to the
implementation section so the component-organized checklists are read
with the dependency chain in mind. Setup → hostd → backend →
credentials/grants → operator/injection → run_task endpoint → pi
extension → smoke. pi-side test cases require the full stack and
should not be the first integration target.
- VISION-GROUNDING-FINDINGS.md: add a "Role under UI-TARS adoption"
footer reframing the doc as Phase 1 model-selection input (which
vision model to pair with the UI-TARS-compatible runner), not "should
we build vision grounding."
- BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md: mark HISTORICAL. Retained
for pivot reasoning (why profile-byte cloning was dropped). New
decisions land in BROWSER-JAIL.md, not here.
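A minimal sketch of the operator_grant_token checks listed above. The field names come from the schema list; the types (epoch-second timestamps), the subdomain matching rule, the in-memory sets standing in for the Postgres-backed revocation state, and the checkGrant/GrantState names are assumptions. It also assumes the opaque token string has already been looked up to its stored record.

```typescript
// Field names from the operator_grant_token schema; the types are assumptions.
interface OperatorGrantToken {
  jti: string;
  iss: string;
  tenant_id: string;
  origin_session_id: string;
  operator_id: string;
  allowed_domains: string[];
  issued_at: number; // assumed: epoch seconds
  expires_at: number; // assumed: epoch seconds
  single_use: boolean;
}

// Hypothetical in-memory stand-in for revocation/consumption rows in Postgres.
interface GrantState {
  revoked: Set<string>; // jti values revoked by an operator
  consumed: Set<string>; // single-use jti values already redeemed
}

// Assumed rule: exact domain match, or any subdomain of an allowed domain.
function domainAllowed(domain: string, allowed: string[]): boolean {
  const d = domain.toLowerCase();
  return allowed.some((a) => {
    const base = a.toLowerCase();
    return d === base || d.endsWith("." + base);
  });
}

// Returns null when the grant authorizes injection, else a rejection reason.
// A single-use token is marked consumed only on success.
function checkGrant(
  token: OperatorGrantToken,
  req: { tenantId: string; domain: string; now: number },
  state: GrantState,
): string | null {
  if (state.revoked.has(token.jti)) return "revoked";
  if (token.single_use && state.consumed.has(token.jti)) return "already used";
  if (token.tenant_id !== req.tenantId) return "tenant mismatch";
  if (req.now < token.issued_at || req.now >= token.expires_at) return "outside validity window";
  if (!domainAllowed(req.domain, token.allowed_domains)) return "domain not allowed";
  if (token.single_use) state.consumed.add(token.jti);
  return null;
}
```

Checking revocation before anything else means a revoked token is refused even if it would otherwise still be in its validity window.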
Nothing in this commit changes the architecture or blocks any track.
All five items are doc hygiene that will make future implementation
work easier.
Phase 0.6 validation paused: the ZFS clone substrate works; Chromium
encrypted profile inheritance does not. Cookies survive as SQLite rows
in the cloned profile, but they are not decrypted and do not appear in
the clone's Chromium. Documented in BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md.
Pivot: decouple substrate from identity.
- Substrate (template + clone) keeps install amortization, jail-level
isolation, fast spawn, clean teardown. All validated.
- Identity moves out of the profile. New: clawdie-owned credentials store
per (tenant, domain), encrypted at rest, populated by an operator-driven
refresh workflow.
- Cookies injected per-session via CDP Network.setCookie at clone start.
- Templates become credential-free. browserop and browserclean are
identical at rest; the difference is policy (browserop is eligible for
operator-mode injection when paired with a valid operator_grant_token).
operator_grant_token from earlier commits now authorizes credential
injection scoped by domain, not access to a magic pre-authenticated
template. Stronger model: domain-filtered, audited, revocable.
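A sketch of the domain-filtered injection step, assuming the credentials store yields decrypted cookie records and the grant supplies allowed_domains. Network.setCookie and its name/value/domain/path/secure/httpOnly/expires parameters are part of the Chrome DevTools Protocol; StoredCookie, inScope, buildInjection, and the subdomain matching rule are hypothetical. Each message would be JSON-encoded and sent over the clone's DevTools WebSocket.

```typescript
// Cookie record as a hypothetical credentials store might return it after decryption.
interface StoredCookie {
  domain: string; // e.g. ".example.com"
  name: string;
  value: string;
  path: string;
  secure: boolean;
  httpOnly: boolean;
  expires?: number; // seconds since epoch; omitted for session cookies
}

// Shape of a CDP command message: { id, method, params }.
interface CdpMessage {
  id: number;
  method: "Network.setCookie";
  params: Record<string, unknown>;
}

// Assumed rule: cookie domain (leading dot ignored) equals or is a subdomain of an allowed domain.
function inScope(cookieDomain: string, allowed: string[]): boolean {
  const d = cookieDomain.replace(/^\./, "").toLowerCase();
  return allowed.some((a) => {
    const base = a.toLowerCase();
    return d === base || d.endsWith("." + base);
  });
}

// One Network.setCookie command per in-scope cookie; out-of-scope cookies are never sent.
function buildInjection(cookies: StoredCookie[], allowedDomains: string[], firstId = 1): CdpMessage[] {
  return cookies
    .filter((c) => inScope(c.domain, allowedDomains))
    .map((c, i) => ({
      id: firstId + i,
      method: "Network.setCookie" as const,
      params: {
        name: c.name,
        value: c.value,
        domain: c.domain,
        path: c.path,
        secure: c.secure,
        httpOnly: c.httpOnly,
        ...(c.expires !== undefined ? { expires: c.expires } : {}),
      },
    }));
}
```

Filtering before the messages are built keeps the audit trail simple: whatever is sent is exactly what the grant allowed.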
Refines per Codex review:
- MVP scope: cookies only, domain-filtered, grant-token required;
no localStorage, IndexedDB, passkeys, or persistence back from tasks.
- Sealed-snapshot mechanics retired — no encryption-survival concern,
no SingletonLock survival concern.
- Phase 0.6 redefined as cookie-injection round-trip → cross-clone
injection → 3-cycle clone/inject/smoke/destroy with the original
latency / orphaned-state acceptance criteria.
- Cleanup order updated to record the forced-unmount fallback when a
dataset is busy; Phase 0.6 confirmed this is a normal case, not an alert.
- Operator credential refresh workflow described as a new section;
refresh UX (web stream vs VNC vs Lumina-local) explicitly deferred.
No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md. Those reshape
only after Phase 0.6 (redefined) passes.
Fold in the agreed refinements from Codex's review. Proposal is now
explicit about what's decided and what gates on Phase 0.6.
Status: PROPOSAL — pending Phase 0.6 Bastille/ZFS clone lifecycle
validation. Hard gate before any BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md
edits.
Key decisions baked in:
- VNET-safe naming throughout (browserop, browserclean, browsertaskNNN);
the hyphenated names in the original proposal would have been rejected
by bastille (confirmed in Phase 0.5 viability).
- Two templates, default "clean"; "operator" requires explicit
authorization, not a silent grant via a model tool call.
- Sealed-snapshot clones: Chromium is stopped and SQLite quiesced
before taking the snapshot used as the clone source. No cloning from a
live mutable profile.
- No external ingress to template jail; controlplane-mediated refresh
only.
- Defer Firefox sync. Manual refresh in MVP.
- PF approach: a static ruleset that references the pf table
"browser_tasks"; per-clone membership is managed with
pfctl -t browser_tasks -T add/delete. No full firewall reload per session.
- Watchdog rule: templates are infrastructure (must stay up); task
clones are session-owned ephemeral resources (disappearance is normal).
- Screenshots/audit stored outside clone datasets so they survive
clone destruction.
- 7-step cleanup order codified (service stop, chrome TERM/KILL,
unmount, bastille stop, PF delete, IP release, zfs destroy).
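The 7-step cleanup order could be encoded as an ordered runner like the sketch below, assuming each step is supplied by the caller as an async function reporting done, already_gone, or failed. The step identifiers, the Outcome type, and the continue-after-failure policy are assumptions; only the PF argv uses real pfctl syntax (-t table -T delete address).

```typescript
// The seven steps, in the codified order.
const CLEANUP_ORDER = [
  "service_stop",
  "chrome_term_kill",
  "unmount",
  "bastille_stop",
  "pf_delete",
  "ip_release",
  "zfs_destroy",
] as const;
type Step = (typeof CLEANUP_ORDER)[number];

// Outcome reported by a (hypothetical) step implementation.
type Outcome = "done" | "already_gone" | "failed";

// argv for the PF step: drop the clone's IP from the static table without reloading rules.
function pfDeleteArgv(ip: string): string[] {
  return ["pfctl", "-t", "browser_tasks", "-T", "delete", ip];
}

// Run every step in order. "already_gone" counts as success, so a reaper can re-run
// cleanup on a half-destroyed clone. Failures are recorded, but later steps still run
// so that PF membership and the IP are not leaked by an earlier failure.
async function runCleanup(
  impl: Record<Step, () => Promise<Outcome>>,
): Promise<{ ran: Step[]; failed: Step[] }> {
  const ran: Step[] = [];
  const failed: Step[] = [];
  for (const step of CLEANUP_ORDER) {
    ran.push(step);
    let outcome: Outcome;
    try {
      outcome = await impl[step]();
    } catch {
      outcome = "failed";
    }
    if (outcome === "failed") failed.push(step);
  }
  return { ran, failed };
}
```

Whether to continue or stop after a failed step is a design choice; continuing favors releasing network resources, at the cost of a zfs destroy that will likely fail if unmount did.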
New sections:
- Seven additional open questions from Codex review (VNET naming/IP
pool, sealed snapshot mechanics, profile clone correctness, RCTL
limits, screenshot lifetime, orphan reaper, operator-template
auth UX).
- Phase 0.6 spike (GATING) with hard acceptance criteria:
10 sequential cycles with zero orphans, clone+start latency median <2s
and p95 <5s, sealed-template cookie visible in clone, idempotent
reaper.
- Failure modes that change the verdict.
- Status summary table.
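The Phase 0.6 gate can be checked mechanically once the harness records each cycle. A sketch assuming per-cycle clone+start latency, an orphan count, and cookie visibility; CycleResult and gatePasses are hypothetical names, and p95 uses the nearest-rank definition, which the spike may not.

```typescript
// Hypothetical per-cycle record from the spike harness.
interface CycleResult {
  latencyMs: number; // clone + start latency
  orphans: number; // leftover datasets/jails/PF entries after teardown
  cookieVisible: boolean; // sealed-template cookie seen in the clone
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Gate: at least 10 cycles, zero orphans and cookie visible every cycle,
// median < 2 s and p95 < 5 s.
function gatePasses(results: CycleResult[]): { pass: boolean; medianMs: number; p95Ms: number } {
  const sorted = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  const medianMs = median(sorted);
  const p95Ms = percentile(sorted, 95);
  const pass =
    results.length >= 10 &&
    results.every((r) => r.orphans === 0 && r.cookieVisible) &&
    medianMs < 2000 &&
    p95Ms < 5000;
  return { pass, medianMs, p95Ms };
}
```

With only 10 cycles, nearest-rank p95 is simply the slowest cycle, so a single slow run can fail the gate.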
No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md.
Significant architectural change vs the current BROWSER-JAIL.md design.
Replaces "one long-lived jail + per-task BrowserContext" with persistent
template jails (operator-browser, clean-browser) + ephemeral per-task
ZFS clones.
Motivation: the current design has no story for persistent operator
logins. Every task starts blank — 2FA on every run, no usable workflow
for authenticated services. Cloning a thick template via ZFS is
~constant-time, plays to clawdie's existing platform strengths
(bastille, hostd, ZFS), and gives per-task jail-level isolation rather
than BrowserContext-level.
Status: PROPOSAL — six open questions documented for Codex review
before any BROWSER-JAIL.md edits or implementation reshape. Specifically
seeks Codex's read on bastille clone operational smoothness at per-task
rate, watchdog tolerance for fluctuating jail count, and PF rule
generation cost per clone.
No changes to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md in this commit —
those land after the proposal is accepted or amended.
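A sketch of the per-task clone lifecycle as zfs argv lists. zfs clone and zfs destroy are real commands; JAILS_ROOT, the dataset paths, and the helper names are hypothetical, and the naming uses the VNET-safe browsertaskNNN form from the follow-up proposal, since hyphenated names would be rejected by bastille.

```typescript
// Hypothetical parent dataset for task clones; the real layout depends on the host's Bastille config.
const JAILS_ROOT = "zroot/bastille/jails";

// VNET-safe task name: lowercase letters and digits only, no hyphens (e.g. browsertask007).
function taskJailName(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > 999) {
    throw new RangeError(`task number out of range: ${n}`);
  }
  return `browsertask${String(n).padStart(3, "0")}`;
}

// argv for creating and destroying one per-task clone from a template snapshot.
// zfs clone is copy-on-write, so creation cost does not grow with the template's size.
// It is not recursive: a template spread over child datasets needs one clone per child.
function cloneLifecycle(templateDataset: string, snapshot: string, n: number) {
  const name = taskJailName(n);
  const target = `${JAILS_ROOT}/${name}`;
  return {
    name,
    create: ["zfs", "clone", `${templateDataset}@${snapshot}`, target],
    destroy: ["zfs", "destroy", target],
  };
}
```

The copy-on-write behavior is what makes cloning a thick template roughly constant-time; the per-task cost shows up later as divergent blocks written by the task.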
Replace per-call persist:false with a session-level record mode set at
open_session, immutable for the session's life. Three modes:
- off: nothing written to disk; model still sees screenshots in
context.
- transient: last N=50 screenshots in a FIFO ring buffer per session.
Default. Enough for post-hoc debugging without unbounded
growth.
- audit: persist all with 7d retention. Explicit opt-in for
sensitive operations.
Default resolution: explicit param → tenant default → system default
("transient"). MVP hardcodes the system default; tenant overrides are
Phase 2.
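A sketch of the record-mode resolution and the transient FIFO ring buffer described above. The resolution order, the three modes, and N=50 come from the commit message; resolveRecordMode, RingBuffer, ScreenshotSink, and the string screenshot references are hypothetical, and 7d audit retention is assumed to be enforced by a separate sweeper.

```typescript
type RecordMode = "off" | "transient" | "audit";

const SYSTEM_DEFAULT: RecordMode = "transient";
const TRANSIENT_CAPACITY = 50; // N=50 screenshots per session

// Resolution order: explicit param -> tenant default -> system default.
// Called once at open_session; the result is fixed for that session.
function resolveRecordMode(explicit?: RecordMode, tenantDefault?: RecordMode): RecordMode {
  return explicit ?? tenantDefault ?? SYSTEM_DEFAULT;
}

// Fixed-capacity circular buffer: once full, each push overwrites the oldest slot.
class RingBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private start = 0; // index of the oldest element
  private size = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    if (this.size < this.capacity) {
      this.slots[(this.start + this.size) % this.capacity] = item;
      this.size++;
    } else {
      this.slots[this.start] = item;
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Oldest-first copy of the current contents.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.slots[(this.start + i) % this.capacity] as T);
    }
    return out;
  }
}

// Per-session sink; `ref` stands in for a stored screenshot reference.
class ScreenshotSink {
  readonly mode: RecordMode;
  private readonly ring = new RingBuffer<string>(TRANSIENT_CAPACITY);
  private readonly auditLog: string[] = [];

  constructor(mode: RecordMode) {
    this.mode = mode;
  }

  record(ref: string): void {
    if (this.mode === "off") return; // the model saw it in context; nothing is written
    if (this.mode === "audit") {
      this.auditLog.push(ref); // retention enforced elsewhere
      return;
    }
    this.ring.push(ref); // transient: keep only the newest N
  }

  persisted(): string[] {
    if (this.mode === "audit") return this.auditLog.slice();
    if (this.mode === "transient") return this.ring.toArray();
    return [];
  }
}
```

Because mode is readonly and set only in the constructor, a session cannot switch modes mid-run, matching the immutability rule.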
Rationale: screenshots serve three different jobs (agent's eyes,
debugging trace, forensic audit), and a single retention policy can't
serve all three without either drowning in disk or losing audit value.
The dashcam analogy in the doc covers this directly. Per-call
persistence flags are messy, and per-tenant audit flagging was the
wrong granularity; the session is the right level.
Also:
- Credential-exfiltration mitigation in the threat model now describes
the off/audit levers an operator has.
- Future enhancement noted: browser.freeze_session to promote a
transient ring buffer to audit retention without restarting.
- Phase 1A handoff updated: POST /sessions accepts record, response
echoes it; /screenshot persistence behavior tied to session record
mode with explicit test points.
Move the spike workspace from the gitignored tmp/ scratch dir into
scripts/browser-jail-spike/ so Codex (or anyone) can re-run it on
FreeBSD with the keys already configured on the host. Self-contained:
fixtures, CDP renderer, OpenAI-compat harness, scorer, plus the
committed screenshots and ground-truth JSON so the experiment is
reproducible without re-rendering.
Claude Opus 4.7 baseline included in results/ (17/17 PASS at a 30 px
threshold, mean error 1 px). Pending columns:
- GPT-4o via OPENAI_API_KEY
- GLM-4V via ZAI_API_KEY (pi's existing provider)
- UI-TARS-7B via vLLM if/when an endpoint exists
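A sketch of how a scorer could arrive at figures like "PASS at 30 px, mean 1 px", assuming Euclidean distance between the model's predicted click point and a ground-truth point. The scorer in scripts/browser-jail-spike/ may measure differently (for example, bounding-box containment); Trial and score are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

// One fixture: the model's predicted click (null if unparseable) vs. ground truth.
interface Trial {
  id: string;
  predicted: Point | null;
  truth: Point;
}

const PASS_THRESHOLD_PX = 30;

function score(trials: Trial[]): { passed: number; total: number; meanErrorPx: number } {
  let passed = 0;
  let errSum = 0;
  let errCount = 0;
  for (const t of trials) {
    if (t.predicted === null) continue; // unparseable output counts as a fail, not an error value
    const err = Math.hypot(t.predicted.x - t.truth.x, t.predicted.y - t.truth.y);
    errSum += err;
    errCount++;
    if (err <= PASS_THRESHOLD_PX) passed++;
  }
  return { passed, total: trials.length, meanErrorPx: errCount > 0 ? errSum / errCount : NaN };
}
```

Reporting pass count and mean error separately keeps a model that is "close but consistently off" distinguishable from one that is exact.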
Path references in VISION-GROUNDING-FINDINGS.md and
BROWSER-JAIL-HANDOFF.md updated to match the new location.
Make the docs renderer name match its purpose, add CMS_DOCS_SITE_PATH with ASTRO_SITE_PATH compatibility, and update docs publishing paths.
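A sketch of the compatibility lookup, assuming the new CMS_DOCS_SITE_PATH takes precedence and ASTRO_SITE_PATH is only a fallback. resolveDocsSitePath is a hypothetical name, and treating blank values as unset is an assumption.

```typescript
// Prefer the new variable; fall back to the legacy ASTRO_SITE_PATH so existing
// deployments keep working. Blank values are treated as unset (an assumption).
function resolveDocsSitePath(env: Record<string, string | undefined>): string | undefined {
  const current = env["CMS_DOCS_SITE_PATH"]?.trim();
  if (current) return current;
  const legacy = env["ASTRO_SITE_PATH"]?.trim();
  return legacy ? legacy : undefined;
}
```

In Node this would be called as resolveDocsSitePath(process.env); taking the environment as a parameter keeps the function testable.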
---
Build: pass | Tests: pass — 2372 passed (704 files)
Every file under docs/internal/ ends up in the bootstrap/skills-memory
artifact (per metadata.json: "Full project docs, internal docs, identity
files, and skill definitions"). Stale handoffs, dated build reports,
single-commit reviews, and superseded design notes were polluting the
embedding index with low-signal chunks.
Removed:
- TLS-CERT-LIFECYCLE-HANDOFF.md, GLASSPANE-FREEBSD-HANDOFF.md,
CMS-ASTRO-SOURCE-OF-TRUTH-HANDOFF.md (handoffs whose work has landed)
- HOST-DB-READINESS-REVIEW.md, HOST-DB-REBOOT-REVIEW.md,
HOST-DB-RECOVERY-PLAN.md, SYSTEM-NAMESPACE-BRANCH-REVIEW.md
(commit/branch reviews self-marked as historical)
- BUILD-TEST-REPORT-06.APR.2026.md, test-results.md (dated snapshots)
- DEBUG_CHECKLIST.md (Feb 2026 known-issues list, top item already fixed)
- BOOTABLE-ISO-PLAN-V1.md (V1 plan; ISO-FIRST-BOOT-IMPLEMENTATION.md is now
the source of truth)
- STRAPI-FREEBSD-SETUP.md, PI-SKILLS-INTEGRATION.md, CODEX-FREEBSD.md
(workarounds and one-off design notes for resolved/superseded paths)
- REFACTOR-PLAN.md, nanoclaw-architecture-final.md, AGENT-HARNESS-V2.md,
AGENT-SKILLS-VS-REALITY.md (older planning/architecture docs whose
decisions are now in code or ARCHITECTURE.md)
- BUILTIN-KNOWLEDGE-SPEC.md, LOCAL-KNOWLEDGE-BOOTSTRAP.md (early specs
superseded by SKILLS-ARTIFACT-V1-PLAN.md)
- HEARTBEAT.md (design doc; implementation lives in scripts/heartbeat.sh
and src/controlplane-heartbeat.ts)
- POSTGRES-PERMISSIONS.md (one-off fix recipe)
- RUNTIME-MANIFEST-DESIGN.md (status: Implemented; design is in code now)
Updates to remaining files patch broken cross-links:
- ARCHITECTURE.md drops the two table rows pointing at deleted docs
- doc/THREE-BIRD-ARCHITECTURE.md drops Strapi-setup link references
- docs/internal/SKILLS-ARTIFACT-V1-PLAN.md drops the "Depends on" line
- docs/internal/SUDO_REPLACEMENT.md trims its list of internal docs that
reference sudo
- .agent/skills/setup and .agent/skills/docs-deployment drop pointers to
REFACTOR-PLAN and DEBUG_CHECKLIST
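A sketch of one way to find the broken cross-links after a deletion like this: scan inline markdown links and report any whose target file name is in the deleted set. findBrokenLinks is a hypothetical helper; it ignores reference-style links and plain-text mentions (such as the skill pointers above), which still need a grep.

```typescript
// Report inline [text](target) links whose file name is in the deleted set.
// Any #anchor is dropped before comparing; directory components are ignored.
function findBrokenLinks(markdown: string, deleted: Set<string>): string[] {
  const linkRe = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(markdown)) !== null) {
    const target = m[1].split("#")[0];
    const fileName = target.split("/").pop() ?? "";
    if (deleted.has(fileName)) hits.push(target);
  }
  return hits;
}
```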
Net: 23 files deleted, 7566 lines removed. docs/internal/ goes from 41 to
18 markdown files. The artifact's next refresh will see proportionally
less noise in retrieval.
---
Build: FAIL | Tests: FAIL — 16 failed