Commit graph

157 commits

Author SHA1 Message Date
Clawdie
52b5eb310c Colibri design
To complement marketing update for Astro deploy
2026-05-25 15:51:43 +02:00
Operator & Claude Code
786cc9593f Add clawdie.si landing preview screenshot (Sam & Claude)
Full-page SL render of the rebuilt landing (final state, including the reinstated concrete install CTA). Captured via headless Chromium, optimized losslessly with optipng (~460 KB). Reference asset under doc/, not part of the Astro build.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — 18 failed
2026-05-25 15:47:15 +02:00
74a0f6e1db Defining target audience - Slovenian
Solopreneur to Sovereignpreneur talk by Jordan Urbs
2026-05-25 15:06:59 +02:00
028f3d0d08 Add Pi GPT-5.5 lane to Colibri matrix
Confirm the current Pi/openai-codex gpt-5.5 lane in the active Colibri operational matrix alongside the Codex app, Claude, Z.ai, DeepSeek, host-status, and version-sync lanes.

---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:15:15 +02:00
fa2ecb0c7c Avoid brittle Codex model pins in docs (Sam & Codex)
---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:09:18 +02:00
51ba35bd33 Document active Colibri lanes and Codex identity (Sam & Codex)
---
Build: pass | Tests: pass — 2490 passed (187 files)
2026-05-25 00:04:41 +02:00
b26e4da118 Add Colibri runtime version inventory
Add a structured runtime inventory schema, drift summary tests, and skills for Pi provider smoke tests plus Node/Pi/npm version synchronization across hosts and ISO build inputs.
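
A drift summary of this kind could be sketched as below. The inventory shape (`HostInventory`, a flat tool-to-version map per host) and the function name are assumptions made for illustration; the commit does not show the actual schema.

```typescript
// Hypothetical inventory shape: one record per host, mapping tool name to version.
interface HostInventory {
  host: string;
  versions: Record<string, string>;
}

// For each tool, group hosts by the version they report and keep only the
// tools where more than one distinct version is present (that is, drift).
function summarizeDrift(
  inventories: readonly HostInventory[],
): Map<string, Map<string, string[]>> {
  const byTool = new Map<string, Map<string, string[]>>();
  for (const inv of inventories) {
    for (const [tool, version] of Object.entries(inv.versions)) {
      const versions = byTool.get(tool) ?? new Map<string, string[]>();
      versions.set(version, [...(versions.get(version) ?? []), inv.host]);
      byTool.set(tool, versions);
    }
  }
  // Drop tools that are in sync everywhere.
  for (const [tool, versions] of byTool) {
    if (versions.size < 2) byTool.delete(tool);
  }
  return byTool;
}
```

An empty result would mean every host reports the same version for every tool.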

---
Build: pass | Tests: pass — 2485 passed (186 files)
2026-05-24 20:58:37 +02:00
477605ad13 Document Pi DeepSeek smoke lane
Capture Linux Pi credential layout and JSON-mode DeepSeek smoke checks for the Colibri provider gate.

---
Build: pass | Tests: pass — 2467 passed (184 files)
2026-05-24 19:48:50 +02:00
e5f05abd51 Add Colibri event parsing foundation
Introduce pure Pi JSON event parsing and cross-host run manifest validation as the first code on the Colibri control branch.

---
Build: pass | Tests: pass — 2467 passed (184 files)
2026-05-24 19:33:25 +02:00
c5820dec84 Document Colibri Pi control plan
Park the Pi-only control simplification and cross-host run contract so other agents can review before implementation starts.

---
Build: pass | Tests: pass — 2456 passed (182 files)
2026-05-24 19:26:19 +02:00
Operator & Claude Code
f5f60e7838 Mark network skill H3 /tmp usage as operator-accepted WONTFIX (Claude)
Codex explicitly reverted to system /tmp for root pcap staging in c9a8aa1
and documented the exception. No further action needed.

---
Build: pass | Tests: pass — 2456 passed (182 files)
2026-05-18 12:19:07 +02:00
Operator & Claude Code
68e715424a Add session stabilization and network skill review handoffs (Claude)
Two review handoffs from architecture analysis of the xfce-operator-usb
branch and the network throughput skill. Session stabilization covers 15
actionable items (2 HIGH, 5 MEDIUM, 5 LOW). Network skill covers 13 items
(3 HIGH, 6 MEDIUM, 4 LOW).

---
Build: pass | Tests: unknown — Linux agent cannot run tests
2026-05-18 10:15:00 +02:00
a2ab899b2d Wire browser credential injection and operator
---
Build: pass | Tests: pass — 2439 passed (181 files)
2026-05-11 21:38:05 +02:00
30663792ab Pin browser credential injection contract
---
Build: pass | Tests: pass — 2430 passed (179 files)
2026-05-11 21:20:48 +02:00
cd04e9da3f Add browser credentials and grant stores
---
Build: pass | Tests: pass — 2430 passed (179 files)
2026-05-11 20:26:44 +02:00
7cdfb8bf8b Add browser session orchestration internals
---
Build: pass | Tests: pass — 2419 passed (177 files)
2026-05-11 20:11:15 +02:00
e7a1315b30 Document browser session orchestration rules
---
Build: pass | Tests: pass — 2400 passed (176 files)
2026-05-11 19:37:09 +02:00
fcd1172939 Add browser jail HTTP backend
---
Build: pass | Tests: pass — 2400 passed (176 files)
2026-05-11 19:16:46 +02:00
6d662d5d3b Add browser clone hostd lifecycle ops
---
Build: pass | Tests: pass — 2395 passed (175 files)
2026-05-11 18:18:44 +02:00
6c549e7ad0 Rename browser validation assets
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 17:32:22 +02:00
5eeb51d68b Provision browser template jail
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 16:38:49 +02:00
3ea26f231d Validate browser clone cookie injection
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 16:19:12 +02:00
Operator & Claude Code
35ddc4afb2 Polish browser-jail doc constellation
Small follow-ups after the design alignment sweep, surfaced as
improvements worth landing before Phase 0.6 starts.

- BROWSER-JAIL.md: add a "Related docs" index at the top so a new reader
  can find current design vs phase records vs direction vs history
  without grepping. Resolve the two non-blocking choices: credentials
  store backend = Postgres, refresh UX = controlplane-streamed clone.
  Add a concrete operator_grant_token schema (jti/iss/tenant_id/
  origin_session_id/operator_id/allowed_domains/issued_at/expires_at/
  single_use) with clawdie-internal opaque-token storage in postgres,
  validation rules, and revocation semantics.

- BROWSER-JAIL-HANDOFF.md: add an "Order of work" preface to the
  implementation section so the component-organized checklists are read
  with the dependency chain in mind. Setup → hostd → backend →
  credentials/grants → operator/injection → run_task endpoint → pi
  extension → smoke. pi-side test cases require the full stack and
  should not be the first integration target.

- VISION-GROUNDING-FINDINGS.md: add a "Role under UI-TARS adoption"
  footer reframing the doc as Phase 1 model-selection input (which
  vision model to pair with the UI-TARS-compatible runner), not "should
  we build vision grounding."

- BROWSER-JAIL-TEMPLATE-CLONE-PROPOSAL.md: mark HISTORICAL. Retained
  for pivot reasoning (why profile-byte cloning was dropped). New
  decisions land in BROWSER-JAIL.md, not here.
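
The operator_grant_token fields listed above might map onto a type and a check like the following sketch. The field names come from the commit message; the field types, the `checkGrant` helper, and every rule it applies are assumptions, since the validation rules themselves live in BROWSER-JAIL.md and are not reproduced here.

```typescript
// Field names are from the commit message; types are assumed.
interface OperatorGrantToken {
  jti: string;
  iss: string;
  tenant_id: string;
  origin_session_id: string;
  operator_id: string;
  allowed_domains: string[];
  issued_at: number; // assumed: Unix epoch seconds
  expires_at: number; // assumed: Unix epoch seconds
  single_use: boolean;
}

type GrantCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical validation: tenant match, validity window, domain scope,
// and replay protection for single-use tokens.
function checkGrant(
  token: OperatorGrantToken,
  request: { tenantId: string; domain: string; nowSeconds: number },
  consumedJtis: ReadonlySet<string>,
): GrantCheck {
  if (token.tenant_id !== request.tenantId) return { ok: false, reason: "tenant mismatch" };
  if (request.nowSeconds < token.issued_at) return { ok: false, reason: "not yet valid" };
  if (request.nowSeconds >= token.expires_at) return { ok: false, reason: "expired" };
  if (!token.allowed_domains.includes(request.domain)) {
    return { ok: false, reason: "domain not allowed" };
  }
  if (token.single_use && consumedJtis.has(token.jti)) {
    return { ok: false, reason: "already used" };
  }
  return { ok: true };
}
```

Returning a reason string rather than a bare boolean is a design choice of the sketch, intended to make denied grants easy to audit.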

Nothing in this commit changes the architecture or blocks any track.
All five items are doc hygiene that future implementation work will
appreciate.
2026-05-11 15:12:26 +02:00
55a6dee215 Align browser jail design docs
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 15:05:31 +02:00
54f612edf2 Fix browser jail registry slot
---
Build: pass | Tests: pass — 2383 passed (175 files)
2026-05-11 14:53:12 +02:00
Operator & Claude Code
8855543ef0 Pivot template+clone proposal to credentials store + CDP injection
Phase 0.6 validation paused: ZFS clone substrate works; Chromium encrypted
profile inheritance does not. Cookies survive as SQLite rows in the
cloned profile but do not decrypt or present in the clone's Chromium.
Documented in BROWSER-JAIL-CLONE-LIFECYCLE-VALIDATION.md.

Pivot: decouple substrate from identity.

- Substrate (template + clone) keeps install amortization, jail-level
  isolation, fast spawn, clean teardown. All validated.
- Identity moves out of the profile. New: clawdie-owned credentials store
  per (tenant, domain), encrypted at rest, populated by an operator-driven
  refresh workflow.
- Cookies injected per-session via CDP Network.setCookie at clone start.
- Templates become credential-free. browserop and browserclean are
  identical at rest; difference is policy (browserop eligible for
  operator-mode injection when paired with valid operator_grant_token).

operator_grant_token from earlier commits now authorizes credential
injection scoped by domain, not access to a magic pre-authenticated
template. Stronger model: domain-filtered, audited, revocable.

Refines per Codex review:

- MVP scope: cookies only, domain-filtered, grant-token required;
  no localStorage, IndexedDB, passkeys, persistence-back from tasks.
- Sealed-snapshot mechanics retired — no encryption-survival concern,
  no SingletonLock survival concern.
- Phase 0.6 redefined as cookie-injection round-trip → cross-clone
  injection → 3-cycle clone/inject/smoke/destroy with the original
  latency / orphaned-state acceptance criteria.
- Cleanup order updated to record forced-unmount fallback after busy
  dataset; Phase 0.6 confirmed this is a normal case, not an alert.
- Operator credential refresh workflow described as a new section;
  refresh UX (web stream vs VNC vs Lumina-local) explicitly deferred.
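
The domain-filtered cookie injection described above can be sketched as a pure filtering step that produces parameters for CDP `Network.setCookie`. The `StoredCookie` shape, the helper names, and the subdomain-matching rule are assumptions; only the CDP parameter names (`name`, `value`, `domain`, `path`, `secure`, `httpOnly`) are taken from the protocol.

```typescript
// Hypothetical shape of one row in the credentials store.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  secure: boolean;
  httpOnly: boolean;
}

// Subset of the parameters accepted by CDP Network.setCookie.
interface SetCookieParams {
  name: string;
  value: string;
  domain: string;
  path: string;
  secure: boolean;
  httpOnly: boolean;
}

// Assumed rule: a cookie domain matches when it equals an allowed domain
// or is a subdomain of one. A leading dot on the cookie domain is ignored.
function domainAllowed(cookieDomain: string, allowed: readonly string[]): boolean {
  const host = cookieDomain.replace(/^\./, "").toLowerCase();
  return allowed.some((d) => {
    const a = d.toLowerCase();
    return host === a || host.endsWith("." + a);
  });
}

// Keep only cookies inside the grant's allowed domains and convert them
// to Network.setCookie parameter objects.
function cookiesToInject(
  stored: readonly StoredCookie[],
  allowedDomains: readonly string[],
): SetCookieParams[] {
  return stored
    .filter((c) => domainAllowed(c.domain, allowedDomains))
    .map((c) => ({
      name: c.name,
      value: c.value,
      domain: c.domain,
      path: c.path,
      secure: c.secure,
      httpOnly: c.httpOnly,
    }));
}
```

With puppeteer-core, each resulting object could then be passed to a CDP session via `send("Network.setCookie", params)` at clone start; that wiring is omitted to keep the sketch free of dependencies.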

No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md. Those reshape
only after Phase 0.6 (redefined) passes.
2026-05-11 13:59:50 +02:00
1095a86592 Reduce browser clone validation cycles
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 13:01:07 +02:00
fec724273c Document pi browser task integration
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 12:44:23 +02:00
ba33a349cc Document UI-TARS adoption direction
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 12:33:13 +02:00
70a8a12d36 Rename browser clone spike to validation
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:41:55 +02:00
Operator & Claude Code
28a8242f47 Revise template+clone proposal per Codex review
Fold in the agreed refinements from Codex's review. Proposal is now
explicit about what's decided and what gates on Phase 0.6.

Status: PROPOSAL — pending Phase 0.6 Bastille/ZFS clone lifecycle
validation. Hard gate before any BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md
edits.

Key decisions baked in:

- VNET-safe naming throughout (browserop, browserclean, browsertaskNNN);
  the hyphenated names in the original proposal would have been rejected
  by bastille (confirmed in Phase 0.5 viability).
- Two templates, default "clean"; "operator" requires explicit
  authorization, not silent model tool-call grant.
- Sealed-snapshot clones — Chromium stopped + SQLite quiesced before
  the snapshot used as the clone source. No cloning from a live
  mutable profile.
- No external ingress to template jail; controlplane-mediated refresh
  only.
- Defer Firefox sync. Manual refresh in MVP.
- PF approach: static ruleset matching pfctl table "browser_tasks";
  per-clone membership via pfctl -t -T add/delete. No full firewall
  reload per session.
- Watchdog rule: templates are infrastructure (must stay up); task
  clones are session-owned ephemeral resources (disappearance is normal).
- Screenshots/audit stored outside clone datasets so they survive
  clone destruction.
- 7-step cleanup order codified (service stop, chrome TERM/KILL,
  unmount, bastille stop, PF delete, IP release, zfs destroy).
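
Two of the decisions above, per-clone PF table membership and the 7-step cleanup order, could be sketched as follows. The `pfctl -t <table> -T add|delete <address>` form is standard pfctl usage; the helper names, the example address, and the choice to attempt every cleanup step even after a failure are assumptions of the sketch.

```typescript
const PF_TABLE = "browser_tasks";

// Build the argv for adding or removing one clone address from the PF table,
// so no full firewall reload is needed per session.
function pfTableArgs(op: "add" | "delete", ip: string): string[] {
  return ["pfctl", "-t", PF_TABLE, "-T", op, ip];
}

// The cleanup order as listed in the commit message.
const CLEANUP_ORDER = [
  "service stop",
  "chrome TERM/KILL",
  "unmount",
  "bastille stop",
  "PF delete",
  "IP release",
  "zfs destroy",
] as const;

type CleanupStep = (typeof CLEANUP_ORDER)[number];

// Run every step in order and record the outcome. Assumption: a failed step
// does not abort the remaining ones, so teardown never stops halfway.
function runCleanup(
  run: (step: CleanupStep) => boolean,
): { step: CleanupStep; ok: boolean }[] {
  return CLEANUP_ORDER.map((step) => ({ step, ok: run(step) }));
}
```

For example, `pfTableArgs("add", "10.0.0.5")` yields the arguments for `pfctl -t browser_tasks -T add 10.0.0.5`, where the address is purely illustrative.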

New sections:

- Seven additional open questions from Codex review (VNET naming/IP
  pool, sealed snapshot mechanics, profile clone correctness, RCTL
  limits, screenshot lifetime, orphan reaper, operator-template
  auth UX).
- Phase 0.6 spike (GATING) with hard acceptance criteria:
  10 sequential cycles with zero orphans, median <2s and p95 <5s
  clone+start latency, sealed-template cookie visible in clone,
  idempotent reaper.
- Failure modes that change the verdict.
- Status summary table.
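
The numeric part of the Phase 0.6 acceptance criteria (10 cycles, zero orphans, median under 2 s, p95 under 5 s) could be checked with a sketch like this. The `CycleResult` shape and the nearest-rank percentile method are assumptions; the commit does not say how the percentiles are computed.

```typescript
// Hypothetical per-cycle measurement.
interface CycleResult {
  latencyMs: number; // clone+start latency in milliseconds
  orphans: number; // leftover datasets, jails, or PF entries after destroy
}

// Nearest-rank percentile (assumed method): the value at rank ceil(p/100 * n).
function percentile(samples: readonly number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("percentile out of range or no samples");
  return value;
}

// True only when every threshold from the commit message is met.
function meetsGate(cycles: readonly CycleResult[]): boolean {
  if (cycles.length < 10) return false;
  const latencies = cycles.map((c) => c.latencyMs);
  return (
    cycles.every((c) => c.orphans === 0) &&
    percentile(latencies, 50) < 2000 &&
    percentile(latencies, 95) < 5000
  );
}
```

With only 10 samples, nearest-rank p95 equals the slowest cycle, so a single slow run fails the gate under this method.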

No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md.
2026-05-11 11:37:53 +02:00
Operator & Claude Code
23db494f57 Propose template + clone browser-jail architecture
Significant architectural change vs the current BROWSER-JAIL.md design.
Replaces "one long-lived jail + per-task BrowserContext" with persistent
template jails (operator-browser, clean-browser) + ephemeral per-task
ZFS clones.

Motivation: the current design has no story for persistent operator
logins. Every task starts blank — 2FA on every run, no usable workflow
for authenticated services. Cloning a thick template via ZFS is
~constant-time, plays to clawdie's existing platform strengths
(bastille, hostd, ZFS), and gives per-task jail-level isolation rather
than BrowserContext-level.

Status: PROPOSAL — six open questions documented for Codex review
before any BROWSER-JAIL.md edits or implementation reshape. Specifically
seeks Codex's read on bastille clone operational smoothness at per-task
rate, watchdog tolerance for fluctuating jail count, and PF rule
generation cost per clone.

No changes to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md in this commit —
those land after the proposal is accepted or amended.
2026-05-11 11:30:31 +02:00
Operator & Claude Code
f2a5c59273 Session-level screenshot recording modes (off/transient/audit)
Replace per-call persist:false with a session-level record mode set at
open_session, immutable for the session's life. Three modes:

- off:       nothing written to disk; model still sees screenshots in
             context.
- transient: last N=50 screenshots in a FIFO ring buffer per session.
             Default. Enough for post-hoc debugging without unbounded
             growth.
- audit:     persist all with 7d retention. Explicit opt-in for
             sensitive operations.

Default resolution: explicit param → tenant default → system default
("transient"). MVP hardcodes the system default; tenant overrides are
Phase 2.
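
The default resolution chain and the transient ring buffer might look like the sketch below. The mode names, the N=50 capacity, and the resolution order come from the text above; the function and class names are hypothetical, and screenshots are represented by string identifiers only.

```typescript
type RecordMode = "off" | "transient" | "audit";

const SYSTEM_DEFAULT: RecordMode = "transient";

// Explicit param wins, then the tenant default, then the system default.
function resolveRecordMode(explicit?: RecordMode, tenantDefault?: RecordMode): RecordMode {
  return explicit ?? tenantDefault ?? SYSTEM_DEFAULT;
}

// FIFO ring buffer for "transient" mode: keeps only the last N screenshots.
class ScreenshotRing {
  private readonly items: string[] = [];
  private readonly capacity: number;

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  push(screenshotId: string): void {
    this.items.push(screenshotId);
    // Evict the oldest entry once the buffer is over capacity.
    if (this.items.length > this.capacity) this.items.shift();
  }

  list(): string[] {
    return [...this.items];
  }
}
```

Since the mode is immutable for the session's life, a caller would resolve it once at open_session and store the result rather than re-resolving per screenshot.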

Rationale: screenshots serve three different jobs (agent's eyes,
debugging trace, forensic audit), and a single retention policy can't
serve all three without either drowning in disk or losing audit value.
The dashcam analogy in the doc covers this directly. Per-call
persistence flags are messy and per-tenant audit-flagging at session
level was the wrong granularity.

Also:

- Credential-exfiltration mitigation in the threat model now describes
  the off/audit levers an operator has.
- Future enhancement noted: browser.freeze_session to promote a
  transient ring buffer to audit retention without restarting.
- Phase 1A handoff updated: POST /sessions accepts record, response
  echoes it; /screenshot persistence behavior tied to session record
  mode with explicit test points.
2026-05-11 11:19:23 +02:00
6d6d3a1373 Document browser jail handoff storage policy
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:04:19 +02:00
466ad73cee Document browser jail FreeBSD viability
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 10:44:42 +02:00
Operator & Claude Code
e55edbbf0c Promote browser-jail vision-grounding spike to scripts/
Move the spike workspace from the gitignored tmp/ scratch dir into
scripts/browser-jail-spike/ so Codex (or anyone) can re-run it on
FreeBSD with the keys already configured on the host. Self-contained:
fixtures, CDP renderer, OpenAI-compat harness, scorer, plus the
committed screenshots and ground-truth JSON so the experiment is
reproducible without re-rendering.

Claude Opus 4.7 baseline included in results/ (17/17 PASS at 30 px,
mean 1 px). Pending columns:

- GPT-4o via OPENAI_API_KEY
- GLM-4V via ZAI_API_KEY (pi's existing provider)
- UI-TARS-7B via vLLM if/when an endpoint exists

Path references in VISION-GROUNDING-FINDINGS.md and
BROWSER-JAIL-HANDOFF.md updated to match the new location.
2026-05-11 10:03:15 +02:00
Operator & Claude Code
3070fa323f Add browser-jail design, threat model, and Phase 0 spike artifacts
Three coordinated docs that anchor the FreeBSD-hosted headless browser
work:

- docs/internal/BROWSER-JAIL.md — full design (architecture, MCP tool
  surface, isolation model, auth via better-auth, PF egress policy,
  screenshot retention, audit logging) and a threat-model section
  covering SSRF, credential leakage, cross-session bleed, audit
  poisoning, and resource exhaustion.
- docs/internal/VISION-GROUNDING-FINDINGS.md — spike methodology
  (3 deterministic HTML fixtures, DOM-extracted ground truth,
  30 px tolerance, identical prompt across models). Claude Opus 4.7
  column complete: 17/17 PASS, mean 1 px, max 8 px. GPT-4o, GLM-4V,
  and UI-TARS columns pending — harness ready under
  tmp/browser-jail-spike/.
- doc/BROWSER-JAIL-HANDOFF.md — Codex handoff for Phase 0.5 (FreeBSD
  viability spike) and Phase 1 (jail HTTP service + controlplane MCP
  proxy + PF rules) with per-commit validation requirements.
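
The scoring described for the vision-grounding spike (ground truth versus predicted coordinates, 30 px tolerance, pass count plus mean and max error) could be sketched as follows. Euclidean distance, the inclusive tolerance comparison, and all names are assumptions; the actual scorer is not shown in this log.

```typescript
interface Point {
  x: number;
  y: number;
}

// One fixture target: the DOM-extracted ground truth and the model's prediction.
interface Sample {
  truth: Point;
  predicted: Point;
}

interface Score {
  passed: number;
  total: number;
  meanPx: number;
  maxPx: number;
}

// Assumed metric: Euclidean pixel distance, passing when within the tolerance.
function scoreSamples(samples: readonly Sample[], tolerancePx = 30): Score {
  if (samples.length === 0) throw new Error("no samples to score");
  const errors = samples.map((s) =>
    Math.hypot(s.predicted.x - s.truth.x, s.predicted.y - s.truth.y),
  );
  return {
    passed: errors.filter((e) => e <= tolerancePx).length,
    total: errors.length,
    meanPx: errors.reduce((sum, e) => sum + e, 0) / errors.length,
    maxPx: Math.max(...errors),
  };
}
```

A result line such as "17/17 PASS, mean 1 px, max 8 px" would correspond to `passed`/`total`, `meanPx`, and `maxPx` under this reading.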

Runtime constraint baked in: Node v22+ everywhere on the FreeBSD path,
no Bun. CDP client is puppeteer-core against system-pkg Chromium —
full Playwright avoided due to FreeBSD bundling gaps.
2026-05-11 09:58:14 +02:00
8777f0f583 Remove Qodo repo surfaces and embeddings
---
Build: pass | Tests: pass — 2376 passed (712 files)
2026-05-11 00:58:54 +02:00
777a9a5235 Remove completed mac_do reboot handoff
---
Build: pass | Tests: pass — 2378 passed (704 files)
2026-05-11 00:04:54 +02:00
96edcd4f1d Record pre-reboot mac_do validation tests
---
Build: pass | Tests: pass — 2375 passed (704 files)
2026-05-10 23:09:39 +02:00
538bc951b4 Document mac_do reboot handoff and reboot intent plan (Codex)
---
Build: pass | Tests: pass — 2373 passed (704 files)
2026-05-10 22:31:03 +02:00
50a915c414 Drop Astro docs path compatibility noise (Codex)
Remove the ASTRO_SITE_PATH alias and stale STRIPPED/refactor comments now that CMS_DOCS_SITE_PATH is the canonical docs project path.

---
Build: pass | Tests: pass — 2372 passed (704 files)

2026-05-10 20:47:10 +02:00
e3ad322d3b Rename Astro docs project to clawdie-docs (Sam & Claude)
Make the docs renderer name match its purpose, add CMS_DOCS_SITE_PATH with ASTRO_SITE_PATH compatibility, and update docs publishing paths.

---
Build: pass | Tests: pass — 2372 passed (704 files)
2026-05-10 19:49:39 +02:00
Operator & Claude Code
398bdd5f5f Prune stale docs/internal handoffs, reviews, and superseded plans
Every file under docs/internal/ ends up in the bootstrap/skills-memory
artifact (per metadata.json: "Full project docs, internal docs, identity
files, and skill definitions"). Stale handoffs, dated build reports,
single-commit reviews, and superseded design notes were polluting the
embedding index with low-signal chunks.

Removed:
- TLS-CERT-LIFECYCLE-HANDOFF.md, GLASSPANE-FREEBSD-HANDOFF.md,
  CMS-ASTRO-SOURCE-OF-TRUTH-HANDOFF.md (handoffs whose work has landed)
- HOST-DB-READINESS-REVIEW.md, HOST-DB-REBOOT-REVIEW.md,
  HOST-DB-RECOVERY-PLAN.md, SYSTEM-NAMESPACE-BRANCH-REVIEW.md
  (commit/branch reviews self-marked as historical)
- BUILD-TEST-REPORT-06.APR.2026.md, test-results.md (dated snapshots)
- DEBUG_CHECKLIST.md (Feb 2026 known-issues list, top item already fixed)
- BOOTABLE-ISO-PLAN-V1.md (V1 plan; ISO-FIRST-BOOT-IMPLEMENTATION.md is now
  the source of truth)
- STRAPI-FREEBSD-SETUP.md, PI-SKILLS-INTEGRATION.md, CODEX-FREEBSD.md
  (workarounds and one-off design notes for resolved/superseded paths)
- REFACTOR-PLAN.md, nanoclaw-architecture-final.md, AGENT-HARNESS-V2.md,
  AGENT-SKILLS-VS-REALITY.md (older planning/architecture docs whose
  decisions are now in code or ARCHITECTURE.md)
- BUILTIN-KNOWLEDGE-SPEC.md, LOCAL-KNOWLEDGE-BOOTSTRAP.md (early specs
  superseded by SKILLS-ARTIFACT-V1-PLAN.md)
- HEARTBEAT.md (design doc; implementation lives in scripts/heartbeat.sh
  and src/controlplane-heartbeat.ts)
- POSTGRES-PERMISSIONS.md (one-off fix recipe)
- RUNTIME-MANIFEST-DESIGN.md (status: Implemented; design is in code now)

Updates to remaining files patch broken cross-links:
- ARCHITECTURE.md drops the two table rows pointing at deleted docs
- doc/THREE-BIRD-ARCHITECTURE.md drops Strapi-setup link references
- docs/internal/SKILLS-ARTIFACT-V1-PLAN.md drops the "Depends on" line
- docs/internal/SUDO_REPLACEMENT.md trims its list of internal docs that
  reference sudo
- .agent/skills/setup and .agent/skills/docs-deployment drop pointers to
  REFACTOR-PLAN and DEBUG_CHECKLIST

Net: 23 files deleted, 7566 lines removed. docs/internal/ goes from 41 to
18 markdown files. The artifact's next refresh will see proportionally
less noise in retrieval.

---
Build: FAIL | Tests: FAIL — 16 failed
2026-05-10 13:34:27 +02:00
f6acf8e256 Prune stale first-boot docs and scripts (Sam & Codex)
Make the first-boot implementation spec self-contained, remove the superseded secrets handoff and obsolete manual jail setup scripts, and align hostname defaulting with the assistant-name separation rule. Update PostgreSQL permission notes and sync the public first-boot page into Astro docs.

---

Build: pass

Tests: pass — 2197 passed (164 files)

---
Build: pass | Tests: pass — 2197 passed (650 files)
2026-05-07 12:40:47 +02:00
6de0ed87ab Remove legacy Mevy references (Sam & Codex)
Sweep active code, tests, identity files, public docs, CMS seed content, and stale handoffs so old assistant-name fixtures no longer leak into current Clawdie/system-namespace behavior. Keep the skills-memory SQL artifact unchanged per regeneration policy.

---

Build: pass

Tests: pass — 2197 passed (164 files)

---
Build: pass | Tests: pass — 2197 passed (650 files)
2026-05-07 11:16:40 +02:00
fcab9aa475 docs(handoff): streamline — 17/20 done, 3 items remain (Sam & Claude)
Phase 7e jail isolation, coordinator validation, and daily sync check
are the only remaining deletion criteria.

---
Build: pass | Tests: untested — doc only
2026-04-22 10:17:59 +02:00
b3fc431bb1 docs(handoff): mark /model and skill-list done
Update harness validation checklist based on confirmed Telegram /model flow; clarify PF isolation checks (use TCP probes since ping is blocked in jails).
2026-04-22 10:07:45 +02:00
abe6236da5 docs(handoff): update harness validation progress
Mark /status override and several harness cleanup items complete; update ZAI models endpoint reference and branch hash.
2026-04-22 09:50:48 +02:00
104435bcd7 docs(harness): update handoff with latest fixes
- Bump branch hash
- Note scheduler 42P08 cast fix and ZAI endpoint correction

---
Build: pass | Tests: pass — unchanged
2026-04-22 08:56:03 +02:00