Commit graph

162 commits

Author SHA1 Message Date
e1d4fd4441 chore(freebsd): align host baseline with Python 3.12 (Sam & Pi)
Some checks failed
CI / ci (pull_request) Has been cancelled
---
Build: FAIL | Tests: FAIL
2026-06-17 14:57:19 +02:00
Operator & Claude Code
29d2fe0ce2 Note Colibri repo is live + clone URL in lane table (Sam & Claude)
Phase 0 done: Clawdie/Colibri created, Phase-1 colibri-probe scaffold pushed (cf7d25e), builds --release on Linux. Lane table + Decisions updated with the clone URL so Codex (code) and the ISO builder (FreeBSD build) can clone and start.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — 18 failed
2026-05-26 10:25:47 +02:00
Operator & Claude Code
8d40c08ec2 Add Colibri control plane plan with parallel agent lanes (Sam & Claude)
Defines the implementation path for a combined, FreeBSD-native Rust control plane: Multica's coordination model + Reasonix's cache-first cost discipline; AionUI dropped on FreeBSD (Electron/Bun) and deferred to an optional Linux client. New dedicated cross-platform repo (greenfield); build Linux-first (domedog) then FreeBSD (osa).

Adds an Active-lanes table so agents can work in parallel via the hub-and-spoke run-manifest contract: Codex=Rust code (osa), ISO builder=FreeBSD build (osa), Claude=Linux build + live DeepSeek cache smoke (domedog), shared schema steward. Repo creation (Phase 0) is the single gating handoff before lanes fan out.

Includes gated supersession/drop candidates (non-Pi runners, per-backend heartbeats, terminal-scrape glue) pending proof gates + per-file caller inventory, with archive/multitenant-claude-pre-divergence as the rollback snapshot. Nothing deleted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — 18 failed
2026-05-26 09:40:20 +02:00
Clawdie
e89e2c1d07 Two Rust-based projects that can be used for "Clawdie Controlplane"
Side-by-side comparison of two popular solutions for multi-agent orchestration. A Rust-based Clawdie Controlplane is the next goal.
2026-05-26 08:38:40 +02:00
Clawdie
9661e620cf Reasonix - check if we can build with Rust on FreeBSD
Implementation idea for our smoke test with the DeepSeek API
2026-05-26 08:08:29 +02:00
Clawdie
52b5eb310c Colibri design
To complement marketing update for Astro deploy
2026-05-25 15:51:43 +02:00
Operator & Claude Code
786cc9593f Add clawdie.si landing preview screenshot (Sam & Claude)
Full-page SL render of the rebuilt landing (final state, including the reinstated concrete install CTA). Captured via headless Chromium, optimized losslessly with optipng (~460 KB). Reference asset under doc/, not part of the Astro build.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — 18 failed
2026-05-25 15:47:15 +02:00
74a0f6e1db Defining target audience - Slovenian
Solopreneur to Sovereignpreneur talk by Jordan Urbs
2026-05-25 15:06:59 +02:00
028f3d0d08 Add Pi GPT-5.5 lane to Colibri matrix
Confirm the current Pi/openai-codex gpt-5.5 lane in the active Colibri operational matrix alongside the Codex app, Claude, Z.ai, DeepSeek, host-status, and version-sync lanes.

---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:15:15 +02:00
fa2ecb0c7c Avoid brittle Codex model pins in docs (Sam & Codex)
---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:09:18 +02:00
51ba35bd33 Document active Colibri lanes and Codex identity (Sam & Codex)
---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:04:41 +02:00
b26e4da118 Add Colibri runtime version inventory
Add a structured runtime inventory schema, drift summary tests, and skills for Pi provider smoke tests plus Node/Pi/npm version synchronization across hosts and ISO build inputs.

---
Build: pass | Tests: pass — 2485 passed (186 files)
2026-05-24 20:58:37 +02:00
477605ad13 Document Pi DeepSeek smoke lane
Capture Linux Pi credential layout and JSON-mode DeepSeek smoke checks for the Colibri provider gate.

---
Build: pass | Tests: pass — 2467 passed (184 files)
2026-05-24 19:48:50 +02:00
e5f05abd51 Add Colibri event parsing foundation
Introduce pure Pi JSON event parsing and cross-host run manifest validation as the first code on the Colibri control branch.

---
Build: pass | Tests: pass — 2467 passed (184 files)
2026-05-24 19:33:25 +02:00
c5820dec84 Document Colibri Pi control plan
Park the Pi-only control simplification and cross-host run contract so other agents can review before implementation starts.

---
Build: pass | Tests: pass — 2456 passed (182 files)
2026-05-24 19:26:19 +02:00
Operator & Claude Code
f5f60e7838 Mark network skill H3 /tmp usage as operator-accepted WONTFIX (Claude)
Codex explicitly reverted to system /tmp for root pcap staging in c9a8aa1
and documented the exception. No further action needed.

---
Build: pass | Tests: pass — 2456 passed (182 files)
2026-05-18 12:19:07 +02:00
Operator & Claude Code
68e715424a Add session stabilization and network skill review handoffs (Claude)
Two review handoffs from architecture analysis of the xfce-operator-usb
branch and the network throughput skill. Session stabilization covers 15
actionable items (2 HIGH, 5 MEDIUM, 5 LOW). Network skill covers 13 items
(3 HIGH, 6 MEDIUM, 4 LOW).

---
Build: pass | Tests: unknown — Linux agent cannot run tests
2026-05-18 10:15:00 +02:00
a2ab899b2d Wire browser credential injection and operator
---
Build: pass | Tests: pass — 2439 passed (181 files)
2026-05-11 21:38:05 +02:00
30663792ab Pin browser credential injection contract
---
Build: pass | Tests: pass — 2430 passed (179 files)
2026-05-11 21:20:48 +02:00
cd04e9da3f Add browser credentials and grant stores
---
Build: pass | Tests: pass — 2430 passed (179 files)
2026-05-11 20:26:44 +02:00
7cdfb8bf8b Add browser session orchestration internals
---
Build: pass | Tests: pass — 2419 passed (177 files)
2026-05-11 20:11:15 +02:00
e7a1315b30 Document browser session orchestration rules
---
Build: pass | Tests: pass — 2400 passed (176 files)
2026-05-11 19:37:09 +02:00
fcd1172939 Add browser jail HTTP backend
---
Build: pass | Tests: pass — 2400 passed (176 files)
2026-05-11 19:16:46 +02:00
6d662d5d3b Add browser clone hostd lifecycle ops
---
Build: pass | Tests: pass — 2395 passed (175 files)
2026-05-11 18:18:44 +02:00
6c549e7ad0 Rename browser validation assets
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 17:32:22 +02:00
5eeb51d68b Provision browser template jail
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 16:38:49 +02:00
3ea26f231d Validate browser clone cookie injection
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 16:19:12 +02:00
Operator & Claude Code
35ddc4afb2 Polish browser-jail doc constellation
Small follow-ups after the design alignment sweep, surfaced as
improvements worth landing before Phase 0.6 starts.

- BROWSER-JAIL.md: add a "Related docs" index at the top so a new reader
  can find current design vs phase records vs direction vs history
  without grepping. Resolve the two non-blocking choices: credentials
  store backend = Postgres, refresh UX = controlplane-streamed clone.
  Add a concrete operator_grant_token schema (jti/iss/tenant_id/
  origin_session_id/operator_id/allowed_domains/issued_at/expires_at/
  single_use) with clawdie-internal opaque-token storage in postgres,
  validation rules, and revocation semantics.

- BROWSER-JAIL-HANDOFF.md: add an "Order of work" preface to the
  implementation section so the component-organized checklists are read
  with the dependency chain in mind. Setup → hostd → backend →
  credentials/grants → operator/injection → run_task endpoint → pi
  extension → smoke. pi-side test cases require the full stack and
  should not be the first integration target.

- VISION-GROUNDING-FINDINGS.md: add a "Role under UI-TARS adoption"
  footer reframing the doc as Phase 1 model-selection input (which
  vision model to pair with the UI-TARS-compatible runner), not "should
  we build vision grounding."

- BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md: mark HISTORICAL. Retained
  for pivot reasoning (why profile-byte cloning was dropped). New
  decisions land in BROWSER-JAIL.md, not here.

Nothing in this commit changes the architecture or blocks any track.
All five items are doc hygiene that future implementation work will
appreciate.
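
The operator_grant_token fields listed above for BROWSER-JAIL.md could be modelled roughly as follows. This is a minimal sketch, not the project's implementation: the field names come from the commit message, but the types, the `validate_grant` helper, and the specific validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class OperatorGrantToken:
    # Field names follow the schema in the commit message; types are assumed.
    jti: str
    iss: str
    tenant_id: str
    origin_session_id: str
    operator_id: str
    allowed_domains: tuple
    issued_at: datetime
    expires_at: datetime
    single_use: bool


def validate_grant(token, tenant_id, domain, now, used_jtis, revoked_jtis):
    """Return (ok, reason). Hypothetical rules, checked in a fixed order."""
    if token.jti in revoked_jtis:
        return False, "revoked"
    if token.tenant_id != tenant_id:
        return False, "tenant mismatch"
    if not (token.issued_at <= now < token.expires_at):
        return False, "outside validity window"
    if domain not in token.allowed_domains:
        return False, "domain not allowed"
    if token.single_use and token.jti in used_jtis:
        return False, "already used"
    return True, "ok"


now = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
token = OperatorGrantToken(
    jti="j1", iss="clawdie", tenant_id="t1", origin_session_id="s1",
    operator_id="op1", allowed_domains=("example.com",),
    issued_at=now - timedelta(minutes=5),
    expires_at=now + timedelta(minutes=5),
    single_use=True,
)
print(validate_grant(token, "t1", "example.com", now, set(), set()))
```

Checking revocation first means a revoked token is rejected with the same reason regardless of what else is wrong with it, which keeps audit entries unambiguous.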
2026-05-11 15:12:26 +02:00
55a6dee215 Align browser jail design docs
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 15:05:31 +02:00
54f612edf2 Fix browser jail registry slot
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 14:53:12 +02:00
Operator & Claude Code
8855543ef0 Pivot template+clone proposal to credentials store + CDP injection
Phase 0.6 validation paused: ZFS clone substrate works; Chromium encrypted
profile inheritance does not. Cookies survive as SQLite rows in the
cloned profile but do not decrypt or present in the clone's Chromium.
Documented in BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md.

Pivot: decouple substrate from identity.

- Substrate (template + clone) keeps install amortization, jail-level
  isolation, fast spawn, clean teardown. All validated.
- Identity moves out of the profile. New: clawdie-owned credentials store
  per (tenant, domain), encrypted at rest, populated by an operator-driven
  refresh workflow.
- Cookies injected per-session via CDP Network.setCookie at clone start.
- Templates become credential-free. browserop and browserclean are
  identical at rest; difference is policy (browserop eligible for
  operator-mode injection when paired with valid operator_grant_token).

operator_grant_token from earlier commits now authorizes credential
injection scoped by domain, not access to a magic pre-authenticated
template. Stronger model: domain-filtered, audited, revocable.

Refines per Codex review:

- MVP scope: cookies only, domain-filtered, grant-token required;
  no localStorage, IndexedDB, passkeys, persistence-back from tasks.
- Sealed-snapshot mechanics retired — no encryption-survival concern,
  no SingletonLock survival concern.
- Phase 0.6 redefined as cookie-injection round-trip → cross-clone
  injection → 3-cycle clone/inject/smoke/destroy with the original
  latency / orphaned-state acceptance criteria.
- Cleanup order updated to record forced-unmount fallback after busy
  dataset; Phase 0.6 confirmed this is a normal case, not an alert.
- Operator credential refresh workflow described as a new section;
  refresh UX (web stream vs VNC vs Lumina-local) explicitly deferred.

No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md. Those reshape
only after Phase 0.6 (redefined) passes.
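
The domain-filtered cookie injection described above could be sketched as below. `Network.setCookie` and the parameter names used here are part of the Chrome DevTools Protocol; everything else is an assumption for illustration: the stored-cookie dict shape, the helper names, the subdomain-matching rule, and the secure/httpOnly defaults. The sketch only builds the method/params pairs and does not talk to a browser.

```python
def domain_allowed(cookie_domain, allowed_domains):
    """True if cookie_domain equals an allowed domain or is a subdomain of one.

    Subdomain matching is an assumption; the commit only says "domain-filtered".
    """
    host = cookie_domain.lstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def build_set_cookie_calls(stored_cookies, allowed_domains):
    """Build one CDP Network.setCookie call per cookie the grant permits."""
    calls = []
    for cookie in stored_cookies:
        if not domain_allowed(cookie["domain"], allowed_domains):
            continue  # outside the grant's allowed_domains: never injected
        calls.append({
            "method": "Network.setCookie",
            "params": {
                "name": cookie["name"],
                "value": cookie["value"],
                "domain": cookie["domain"],
                "path": cookie.get("path", "/"),
                "secure": cookie.get("secure", True),
                "httpOnly": cookie.get("httpOnly", True),
            },
        })
    return calls


# Hypothetical credentials-store rows for one tenant.
stored = [
    {"name": "sid", "value": "abc", "domain": ".example.com"},
    {"name": "trk", "value": "1", "domain": "tracker.test"},
    {"name": "x", "value": "2", "domain": "notexample.com"},
]
calls = build_set_cookie_calls(stored, ("example.com",))
print([c["params"]["name"] for c in calls])
```

A CDP client such as puppeteer-core would add the message id and send each call over the session's WebSocket at clone start.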
2026-05-11 13:59:50 +02:00
1095a86592 Reduce browser clone validation cycles
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 13:01:07 +02:00
fec724273c Document pi browser task integration
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 12:44:23 +02:00
ba33a349cc Document UI-TARS adoption direction
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 12:33:13 +02:00
70a8a12d36 Rename browser clone spike to validation
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:41:55 +02:00
Operator & Claude Code
28a8242f47 Revise template+clone proposal per Codex review
Fold in the agreed refinements from Codex's review. Proposal is now
explicit about what's decided and what gates on Phase 0.6.

Status: PROPOSAL — pending Phase 0.6 Bastille/ZFS clone lifecycle
validation. Hard gate before any BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md
edits.

Key decisions baked in:

- VNET-safe naming throughout (browserop, browserclean, browsertaskNNN);
  the hyphenated names in the original proposal would have been rejected
  by bastille (confirmed in Phase 0.5 viability).
- Two templates, default "clean"; "operator" requires explicit
  authorization, not silent model tool-call grant.
- Sealed-snapshot clones — Chromium stopped + SQLite quiesced before
  the snapshot used as the clone source. No cloning from a live
  mutable profile.
- No external ingress to template jail; controlplane-mediated refresh
  only.
- Defer Firefox sync. Manual refresh in MVP.
- PF approach: static ruleset matching pfctl table "browser_tasks";
  per-clone membership via pfctl -t -T add/delete. No full firewall
  reload per session.
- Watchdog rule: templates are infrastructure (must stay up); task
  clones are session-owned ephemeral resources (disappearance is normal).
- Screenshots/audit stored outside clone datasets so they survive
  clone destruction.
- 7-step cleanup order codified (service stop, chrome TERM/KILL,
  unmount, bastille stop, PF delete, IP release, zfs destroy).
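
The per-clone PF table membership mentioned in the list above could be driven by a small command builder like this one. It is a sketch under assumptions: `pfctl -t <table> -T add|delete <address>` is standard pfctl usage and the table name comes from the commit message, but the helper name and the idea of returning an argv list (rather than running the command) are illustrative choices.

```python
import ipaddress

PF_TABLE = "browser_tasks"  # table name taken from the commit message


def pf_table_command(action, clone_ip, table=PF_TABLE):
    """Return the pfctl argv that adds or removes one clone IP from the table."""
    if action not in ("add", "delete"):
        raise ValueError(f"unsupported action: {action!r}")
    # Raises ValueError on a malformed address before anything reaches pfctl.
    ipaddress.ip_address(clone_ip)
    return ["pfctl", "-t", table, "-T", action, clone_ip]


print(pf_table_command("add", "10.0.0.5"))
print(pf_table_command("delete", "10.0.0.5"))
```

Because only table membership changes, the static ruleset is never reloaded; the argv list can be handed to `subprocess.run` by whatever privileged helper owns PF.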

New sections:

- Seven additional open questions from Codex review (VNET naming/IP
  pool, sealed snapshot mechanics, profile clone correctness, RCTL
  limits, screenshot lifetime, orphan reaper, operator-template
  auth UX).
- Phase 0.6 spike (GATING) with hard acceptance criteria:
  10 sequential cycles with zero orphans, median <2s and p95 <5s
  clone+start latency, sealed-template cookie visible in clone, idempotent
  reaper.
- Failure modes that change the verdict.
- Status summary table.
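
The Phase 0.6 latency and orphan criteria listed above could be checked with something like the following. The thresholds (10 cycles, zero orphans, median under 2 s, p95 under 5 s) come from the commit message; the function names and the nearest-rank percentile method are assumptions, since the commit does not say how p95 is computed.

```python
import math
import statistics


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def phase_0_6_passes(latencies_s, orphan_count, required_cycles=10):
    """Apply the gating criteria to clone+start latencies measured in seconds."""
    if len(latencies_s) < required_cycles or orphan_count != 0:
        return False
    return (
        statistics.median(latencies_s) < 2.0
        and percentile_nearest_rank(latencies_s, 95) < 5.0
    )


# Hypothetical measurements: nine fast cycles and one slow outlier.
good_run = [1.2] * 9 + [4.0]
bad_run = [1.2] * 9 + [6.0]
print(phase_0_6_passes(good_run, orphan_count=0))
print(phase_0_6_passes(bad_run, orphan_count=0))
```

With only 10 samples the nearest-rank p95 is simply the slowest cycle, so a single outlier above 5 s fails the gate.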

No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md.
2026-05-11 11:37:53 +02:00
Operator & Claude Code
23db494f57 Propose template + clone browser-jail architecture
Significant architectural change vs the current BROWSER-JAIL.md design.
Replaces "one long-lived jail + per-task BrowserContext" with persistent
template jails (operator-browser, clean-browser) + ephemeral per-task
ZFS clones.

Motivation: the current design has no story for persistent operator
logins. Every task starts blank — 2FA on every run, no usable workflow
for authenticated services. Cloning a thick template via ZFS is
~constant-time, plays to clawdie's existing platform strengths
(bastille, hostd, ZFS), and gives per-task jail-level isolation rather
than BrowserContext-level.

Status: PROPOSAL — six open questions documented for Codex review
before any BROWSER-JAIL.md edits or implementation reshape. Specifically
seeks Codex's read on bastille clone operational smoothness at per-task
rate, watchdog tolerance for fluctuating jail count, and PF rule
generation cost per clone.

No changes to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md in this commit —
those land after the proposal is accepted or amended.
2026-05-11 11:30:31 +02:00
Operator & Claude Code
f2a5c59273 Session-level screenshot recording modes (off/transient/audit)
Replace per-call persist:false with a session-level record mode set at
open_session, immutable for the session's life. Three modes:

- off:       nothing written to disk; model still sees screenshots in
             context.
- transient: last N=50 screenshots in a FIFO ring buffer per session.
             Default. Enough for post-hoc debugging without unbounded
             growth.
- audit:     persist all with 7d retention. Explicit opt-in for
             sensitive operations.

Default resolution: explicit param → tenant default → system default
("transient"). MVP hardcodes the system default; tenant overrides are
Phase 2.
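
The mode resolution order and the transient ring buffer described above could look roughly like this. The mode names, the N=50 capacity, and the "explicit, then tenant, then system default" order come from the commit message; the function and class names are hypothetical, and the real service is not necessarily structured this way.

```python
from collections import deque

RECORD_MODES = ("off", "transient", "audit")
SYSTEM_DEFAULT = "transient"
TRANSIENT_CAPACITY = 50  # "last N=50 screenshots" per session


def resolve_record_mode(explicit=None, tenant_default=None):
    """Pick the first value that is set: explicit, tenant default, system default."""
    for candidate in (explicit, tenant_default, SYSTEM_DEFAULT):
        if candidate is not None:
            if candidate not in RECORD_MODES:
                raise ValueError(f"unknown record mode: {candidate!r}")
            return candidate


class TransientBuffer:
    """FIFO ring buffer: once full, each new screenshot evicts the oldest."""

    def __init__(self, capacity=TRANSIENT_CAPACITY):
        self._frames = deque(maxlen=capacity)

    def add(self, frame):
        self._frames.append(frame)

    def frames(self):
        return list(self._frames)


buffer = TransientBuffer()
for index in range(60):
    buffer.add(f"shot-{index}")

print(resolve_record_mode())
print(len(buffer.frames()), buffer.frames()[0])
```

Resolving the mode once at open_session and storing the result is what makes it immutable for the session's life; nothing downstream needs to repeat the lookup.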

Rationale: screenshots serve three different jobs (agent's eyes,
debugging trace, forensic audit), and a single retention policy can't
serve all three without either drowning in disk or losing audit value.
The dashcam analogy in the doc covers this directly. Per-call
persistence flags are messy and per-tenant audit-flagging at session
level was the wrong granularity.

Also:

- Credential-exfiltration mitigation in the threat model now describes
  the off/audit levers an operator has.
- Future enhancement noted: browser.freeze_session to promote a
  transient ring buffer to audit retention without restarting.
- Phase 1A handoff updated: POST /sessions accepts record, response
  echoes it; /screenshot persistence behavior tied to session record
  mode with explicit test points.
2026-05-11 11:19:23 +02:00
6d6d3a1373 Document browser jail handoff storage policy
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:04:19 +02:00
466ad73cee Document browser jail FreeBSD viability
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 10:44:42 +02:00
Operator & Claude Code
e55edbbf0c Promote browser-jail vision-grounding spike to scripts/
Move the spike workspace from the gitignored tmp/ scratch dir into
scripts/browser-jail-spike/ so Codex (or anyone) can re-run it on
FreeBSD with the keys already configured on the host. Self-contained:
fixtures, CDP renderer, OpenAI-compat harness, scorer, plus the
committed screenshots and ground-truth JSON so the experiment is
reproducible without re-rendering.

Claude Opus 4.7 baseline included in results/ (17/17 PASS at 30 px,
mean 1 px). Pending columns:

- GPT-4o via OPENAI_API_KEY
- GLM-4V via ZAI_API_KEY (pi's existing provider)
- UI-TARS-7B via vLLM if/when an endpoint exists

Path references in VISION-GROUNDING-FINDINGS.md and
BROWSER-JAIL-HANDOFF.md updated to match the new location.
2026-05-11 10:03:15 +02:00
Operator & Claude Code
3070fa323f Add browser-jail design, threat model, and Phase 0 spike artifacts
Three coordinated docs that anchor the FreeBSD-hosted headless browser
work:

- docs/internal/BROWSER-JAIL.md — full design (architecture, MCP tool
  surface, isolation model, auth via better-auth, PF egress policy,
  screenshot retention, audit logging) and a threat-model section
  covering SSRF, credential leakage, cross-session bleed, audit
  poisoning, and resource exhaustion.
- docs/internal/VISION-GROUNDING-FINDINGS.md — spike methodology
  (3 deterministic HTML fixtures, DOM-extracted ground truth,
  30 px tolerance, identical prompt across models). Claude Opus 4.7
  column complete: 17/17 PASS, mean 1 px, max 8 px. GPT-4o, GLM-4V,
  and UI-TARS columns pending — harness ready under
  tmp/browser-jail-spike/.
- doc/BROWSER-JAIL-HANDOFF.md — Codex handoff for Phase 0.5 (FreeBSD
  viability spike) and Phase 1 (jail HTTP service + controlplane MCP
  proxy + PF rules) with per-commit validation requirements.
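
The spike's pass/fail scoring against DOM-extracted ground truth could be sketched as below. The 30 px tolerance comes from the commit message; measuring Euclidean distance from the predicted point to a single ground-truth point is an assumption, as are the function names and the sample coordinates, which are not spike results.

```python
import math

TOLERANCE_PX = 30  # tolerance stated in the commit message


def score_prediction(predicted, truth, tolerance_px=TOLERANCE_PX):
    """Compare a predicted (x, y) against the ground-truth (x, y)."""
    error = math.dist(predicted, truth)
    return {"error_px": error, "passed": error <= tolerance_px}


def summarize(results):
    """Aggregate per-target scores into pass count, mean error, and max error."""
    errors = [r["error_px"] for r in results]
    return {
        "passed": sum(r["passed"] for r in results),
        "total": len(results),
        "mean_error_px": sum(errors) / len(errors),
        "max_error_px": max(errors),
    }


# Hypothetical coordinates for illustration only.
results = [
    score_prediction((100, 100), (103, 104)),
    score_prediction((0, 0), (0, 31)),
]
print(summarize(results))
```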

Runtime constraint baked in: Node v22+ everywhere on the FreeBSD path,
no Bun. CDP client is puppeteer-core against system-pkg Chromium —
full Playwright avoided due to FreeBSD bundling gaps.
2026-05-11 09:58:14 +02:00
8777f0f583 Remove Qodo repo surfaces and embeddings
---
Build: pass | Tests: pass — 2376 passed (712 files)
2026-05-11 00:58:54 +02:00
777a9a5235 Remove completed mac_do reboot handoff
---
Build: pass | Tests: pass — 2378 passed (704 files)
2026-05-11 00:04:54 +02:00
96edcd4f1d Record pre-reboot mac_do validation tests
---
Build: pass | Tests: pass — 2375 passed (704 files)
2026-05-10 23:09:39 +02:00
538bc951b4 Document mac_do reboot handoff and reboot intent plan (Codex)
---
Build: pass | Tests: pass — 2373 passed (704 files)
2026-05-10 22:31:03 +02:00
50a915c414 Drop Astro docs path compatibility noise (Codex)
Remove the ASTRO_SITE_PATH alias and stale STRIPPED/refactor comments now that CMS_DOCS_SITE_PATH is the canonical docs project path.

---
Build: pass | Tests: pass — 2372 passed (704 files)

2026-05-10 20:47:10 +02:00
e3ad322d3b Rename Astro docs project to clawdie-docs (Sam & Claude)
Make the docs renderer name match its purpose, add CMS_DOCS_SITE_PATH with ASTRO_SITE_PATH compatibility, and update docs publishing paths.

---
Build: pass | Tests: pass — 2372 passed (704 files)
2026-05-10 19:49:39 +02:00
Operator & Claude Code
398bdd5f5f Prune stale docs/internal handoffs, reviews, and superseded plans
Every file under docs/internal/ ends up in the bootstrap/skills-memory
artifact (per metadata.json: "Full project docs, internal docs, identity
files, and skill definitions"). Stale handoffs, dated build reports,
single-commit reviews, and superseded design notes were polluting the
embedding index with low-signal chunks.

Removed:
- TLS-CERT-LIFECYCLE-HANDOFF.md, GLASSPANE-FREEBSD-HANDOFF.md,
  CMS-ASTRO-SOURCE-OF-TRUTH-HANDOFF.md (handoffs whose work has landed)
- HOST-DB-READINESS-REVIEW.md, HOST-DB-REBOOT-REVIEW.md,
  HOST-DB-RECOVERY-PLAN.md, SYSTEM-NAMESPACE-BRANCH-REVIEW.md
  (commit/branch reviews self-marked as historical)
- BUILD-TEST-REPORT-06.APR.2026.md, test-results.md (dated snapshots)
- DEBUG_CHECKLIST.md (Feb 2026 known-issues list, top item already fixed)
- BOOTABLE-ISO-PLAN-V1.md (V1 plan; ISO-FIRST-BOOT-IMPLEMENTATION.md is now
  the source of truth)
- STRAPI-FREEBSD-SETUP.md, PI-SKILLS-INTEGRATION.md, CODEX-FREEBSD.md
  (workarounds and one-off design notes for resolved/superseded paths)
- REFACTOR-PLAN.md, nanoclaw-architecture-final.md, AGENT-HARNESS-V2.md,
  AGENT-SKILLS-VS-REALITY.md (older planning/architecture docs whose
  decisions are now in code or ARCHITECTURE.md)
- BUILTIN-KNOWLEDGE-SPEC.md, LOCAL-KNOWLEDGE-BOOTSTRAP.md (early specs
  superseded by SKILLS-ARTIFACT-V1-PLAN.md)
- HEARTBEAT.md (design doc; implementation lives in scripts/heartbeat.sh
  and src/controlplane-heartbeat.ts)
- POSTGRES-PERMISSIONS.md (one-off fix recipe)
- RUNTIME-MANIFEST-DESIGN.md (status: Implemented; design is in code now)

Updates to remaining files patch broken cross-links:
- ARCHITECTURE.md drops the two table rows pointing at deleted docs
- doc/THREE-BIRD-ARCHITECTURE.md drops Strapi-setup link references
- docs/internal/SKILLS-ARTIFACT-V1-PLAN.md drops the "Depends on" line
- docs/internal/SUDO_REPLACEMENT.md trims its list of internal docs that
  reference sudo
- .agent/skills/setup and .agent/skills/docs-deployment drop pointers to
  REFACTOR-PLAN and DEBUG_CHECKLIST

Net: 23 files deleted, 7566 lines removed. docs/internal/ goes from 41 to
18 markdown files. The artifact's next refresh will see proportionally
less noise in retrieval.

---
Build: FAIL | Tests: FAIL — 16 failed
2026-05-10 13:34:27 +02:00
f6acf8e256 Prune stale first-boot docs and scripts (Sam & Codex)
Make the first-boot implementation spec self-contained, remove the superseded secrets handoff and obsolete manual jail setup scripts, and align hostname defaulting with the assistant-name separation rule. Update PostgreSQL permission notes and sync the public first-boot page into Astro docs.

---
Build: pass | Tests: pass — 2197 passed (164 files)

---
Build: pass | Tests: pass — 2197 passed (650 files)
2026-05-07 12:40:47 +02:00