Commit graph

129 commits

Author SHA1 Message Date
ba33a349cc Document UI-TARS adoption direction
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 12:33:13 +02:00
70a8a12d36 Rename browser clone spike to validation
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:41:55 +02:00
Operator & Claude Code
28a8242f47 Revise template+clone proposal per Codex review
Fold in the agreed refinements from Codex's review. Proposal is now
explicit about what's decided and what gates on Phase 0.6.

Status: PROPOSAL — pending Phase 0.6 Bastille/ZFS clone lifecycle
validation. Hard gate before any BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md
edits.

Key decisions baked in:

- VNET-safe naming throughout (browserop, browserclean, browsertaskNNN);
  the hyphenated names in the original proposal would have been rejected
  by bastille (confirmed in Phase 0.5 viability).
- Two templates, default "clean"; "operator" requires explicit
  authorization, not silent model tool-call grant.
- Sealed-snapshot clones — Chromium stopped + SQLite quiesced before
  the snapshot used as the clone source. No cloning from a live
  mutable profile.
- No external ingress to template jail; controlplane-mediated refresh
  only.
- Defer Firefox sync. Manual refresh in MVP.
- PF approach: static ruleset matching pfctl table "browser_tasks";
  per-clone membership via pfctl -t -T add/delete. No full firewall
  reload per session.
- Watchdog rule: templates are infrastructure (must stay up); task
  clones are session-owned ephemeral resources (disappearance is normal).
- Screenshots/audit stored outside clone datasets so they survive
  clone destruction.
- 7-step cleanup order codified (service stop, chrome TERM/KILL,
  unmount, bastille stop, PF delete, IP release, zfs destroy).
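
The cleanup order above could be encoded as data rather than left implicit in control flow. The following is a minimal sketch under assumptions: the step identifiers, `runCleanup`, and `pfDeleteArgs` are hypothetical names, and stopping at the first failed step is an illustrative policy, not something the proposal states. Only the step order and the `browser_tasks` table name come from the commit message.

```typescript
// Hypothetical identifiers; only the ordering mirrors the commit message.
type CleanupStep =
  | "service-stop"
  | "chrome-term-kill"
  | "unmount"
  | "bastille-stop"
  | "pf-delete"
  | "ip-release"
  | "zfs-destroy";

const CLEANUP_ORDER: readonly CleanupStep[] = [
  "service-stop",
  "chrome-term-kill",
  "unmount",
  "bastille-stop",
  "pf-delete",
  "ip-release",
  "zfs-destroy",
];

// Builds the pfctl argument list that removes one clone address from the
// "browser_tasks" table (pfctl -t <table> -T delete <address>).
function pfDeleteArgs(ip: string): string[] {
  return ["-t", "browser_tasks", "-T", "delete", ip];
}

interface CleanupResult {
  completed: CleanupStep[];
  failed: CleanupStep | null;
}

// Runs the steps strictly in order and stops at the first failure, so a later
// step such as zfs destroy never runs while an earlier resource is still held.
// (Stop-on-failure is an assumption made for this sketch.)
function runCleanup(run: (step: CleanupStep) => boolean): CleanupResult {
  const completed: CleanupStep[] = [];
  for (const step of CLEANUP_ORDER) {
    if (!run(step)) return { completed, failed: step };
    completed.push(step);
  }
  return { completed, failed: null };
}
```

A reaper could call `runCleanup` again for the same clone after a failure, provided each step is written to succeed when its resource is already gone.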

New sections:

- Seven additional open questions from Codex review (VNET naming/IP
  pool, sealed snapshot mechanics, profile clone correctness, RCTL
  limits, screenshot lifetime, orphan reaper, operator-template
  auth UX).
- Phase 0.6 spike (GATING) with hard acceptance criteria:
  10 sequential cycles with zero orphans, clone+start latency with
  median <2s and p95 <5s, sealed-template cookie visible in clone,
  idempotent reaper.
- Failure modes that change the verdict.
- Status summary table.
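
The latency and orphan criteria for the Phase 0.6 spike can be checked mechanically. This is a sketch only: `SpikeRun`, `percentile`, and `meetsLatencyGate` are hypothetical names, and the nearest-rank percentile method is an assumption, since the proposal does not say how median and p95 are computed.

```typescript
// Hypothetical shape for one spike run: one latency sample per clone+start cycle.
interface SpikeRun {
  latenciesMs: number[];
  orphans: number;
}

// Nearest-rank percentile on a sorted copy; p is in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// True when the run has at least 10 cycles, zero orphans,
// median under 2 s, and p95 under 5 s.
function meetsLatencyGate(run: SpikeRun): boolean {
  return (
    run.latenciesMs.length >= 10 &&
    run.orphans === 0 &&
    percentile(run.latenciesMs, 50) < 2000 &&
    percentile(run.latenciesMs, 95) < 5000
  );
}
```

With only 10 samples the nearest-rank p95 is simply the slowest cycle, so a single slow clone fails the gate.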

No edits to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md.
2026-05-11 11:37:53 +02:00
Operator & Claude Code
23db494f57 Propose template + clone browser-jail architecture
Significant architectural change vs the current BROWSER-JAIL.md design.
Replaces "one long-lived jail + per-task BrowserContext" with persistent
template jails (operator-browser, clean-browser) + ephemeral per-task
ZFS clones.

Motivation: the current design has no story for persistent operator
logins. Every task starts blank — 2FA on every run, no usable workflow
for authenticated services. Cloning a thick template via ZFS is
~constant-time, plays to clawdie's existing platform strengths
(bastille, hostd, ZFS), and gives per-task jail-level isolation rather
than BrowserContext-level.

Status: PROPOSAL — six open questions documented for Codex review
before any BROWSER-JAIL.md edits or implementation reshape. Specifically
seeks Codex's read on bastille clone operational smoothness at per-task
rate, watchdog tolerance for fluctuating jail count, and PF rule
generation cost per clone.

No changes to BROWSER-JAIL.md or BROWSER-JAIL-HANDOFF.md in this commit —
those land after the proposal is accepted or amended.
2026-05-11 11:30:31 +02:00
Operator & Claude Code
f2a5c59273 Session-level screenshot recording modes (off/transient/audit)
Replace per-call persist:false with a session-level record mode set at
open_session, immutable for the session's life. Three modes:

- off:       nothing written to disk; model still sees screenshots in
             context.
- transient: last N=50 screenshots in a FIFO ring buffer per session.
             Default. Enough for post-hoc debugging without unbounded
             growth.
- audit:     persist all with 7d retention. Explicit opt-in for
             sensitive operations.
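
The transient mode's bounded FIFO can be illustrated with a small ring structure. This is a sketch under assumptions: `ScreenshotRing` is a hypothetical name, and returning the evicted entry so the caller can delete its file is an illustrative choice, not the documented design.

```typescript
// Keeps at most `capacity` entries per session, oldest first.
class ScreenshotRing<T> {
  private readonly capacity: number;
  private items: T[] = [];

  constructor(capacity = 50) {
    if (capacity < 1) throw new Error("capacity must be >= 1");
    this.capacity = capacity;
  }

  // Appends an entry and returns the evicted one, if any,
  // so the caller can remove the corresponding file from disk.
  push(item: T): T | undefined {
    this.items.push(item);
    return this.items.length > this.capacity ? this.items.shift() : undefined;
  }

  // Returns a copy of the retained entries, oldest first.
  snapshot(): T[] {
    return [...this.items];
  }
}
```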

Default resolution: explicit param → tenant default → system default
("transient"). MVP hardcodes the system default; tenant overrides are
Phase 2.
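
The default resolution chain reads naturally as a short fallback expression. A minimal sketch, assuming hypothetical names (`RecordMode`, `resolveRecordMode`, `openSession`); freezing the session object is one possible way to model "immutable for the session's life", not necessarily how the service does it.

```typescript
type RecordMode = "off" | "transient" | "audit";

const SYSTEM_DEFAULT: RecordMode = "transient";

// explicit param -> tenant default -> system default
function resolveRecordMode(explicit?: RecordMode, tenantDefault?: RecordMode): RecordMode {
  return explicit ?? tenantDefault ?? SYSTEM_DEFAULT;
}

interface Session {
  readonly id: string;
  readonly record: RecordMode;
}

// The mode is resolved once at open time and never changed afterwards.
function openSession(id: string, explicit?: RecordMode, tenantDefault?: RecordMode): Session {
  return Object.freeze({ id, record: resolveRecordMode(explicit, tenantDefault) });
}
```

In the MVP described above, `tenantDefault` would always be undefined, so the result is either the explicit parameter or "transient".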

Rationale: screenshots serve three different jobs (agent's eyes,
debugging trace, forensic audit), and a single retention policy can't
serve all three without either drowning in disk or losing audit value.
The dashcam analogy in the doc covers this directly. Per-call
persistence flags are messy and per-tenant audit-flagging at session
level was the wrong granularity.

Also:

- Credential-exfiltration mitigation in the threat model now describes
  the off/audit levers an operator has.
- Future enhancement noted: browser.freeze_session to promote a
  transient ring buffer to audit retention without restarting.
- Phase 1A handoff updated: POST /sessions accepts record, response
  echoes it; /screenshot persistence behavior tied to session record
  mode with explicit test points.
2026-05-11 11:19:23 +02:00
6d6d3a1373 Document browser jail handoff storage policy
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 11:04:19 +02:00
466ad73cee Document browser jail FreeBSD viability
---
Build: pass | Tests: pass — 2382 passed (175 files)
2026-05-11 10:44:42 +02:00
Operator & Claude Code
e55edbbf0c Promote browser-jail vision-grounding spike to scripts/
Move the spike workspace from the gitignored tmp/ scratch dir into
scripts/browser-jail-spike/ so Codex (or anyone) can re-run it on
FreeBSD with the keys already configured on the host. Self-contained:
fixtures, CDP renderer, OpenAI-compat harness, scorer, plus the
committed screenshots and ground-truth JSON so the experiment is
reproducible without re-rendering.

Claude Opus 4.7 baseline included in results/ (17/17 PASS at 30 px,
mean 1 px). Pending columns:

- GPT-4o via OPENAI_API_KEY
- GLM-4V via ZAI_API_KEY (pi's existing provider)
- UI-TARS-7B via vLLM if/when an endpoint exists

Path references in VISION-GROUNDING-FINDINGS.md and
BROWSER-JAIL-HANDOFF.md updated to match the new location.
2026-05-11 10:03:15 +02:00
Operator & Claude Code
3070fa323f Add browser-jail design, threat model, and Phase 0 spike artifacts
Three coordinated docs that anchor the FreeBSD-hosted headless browser
work:

- docs/internal/BROWSER-JAIL.md — full design (architecture, MCP tool
  surface, isolation model, auth via better-auth, PF egress policy,
  screenshot retention, audit logging) and a threat-model section
  covering SSRF, credential leakage, cross-session bleed, audit
  poisoning, and resource exhaustion.
- docs/internal/VISION-GROUNDING-FINDINGS.md — spike methodology
  (3 deterministic HTML fixtures, DOM-extracted ground truth,
  30 px tolerance, identical prompt across models). Claude Opus 4.7
  column complete: 17/17 PASS, mean 1 px, max 8 px. GPT-4o, GLM-4V,
  and UI-TARS columns pending — harness ready under
  tmp/browser-jail-spike/.
- doc/BROWSER-JAIL-HANDOFF.md — Codex handoff for Phase 0.5 (FreeBSD
  viability spike) and Phase 1 (jail HTTP service + controlplane MCP
  proxy + PF rules) with per-commit validation requirements.
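
To make the reported figures (pass count at a 30 px tolerance, mean and max error) concrete, here is a sketch of how such a scorer might work. It assumes error is the Euclidean distance between a predicted point and a ground-truth point; the findings doc may define it differently, and `Case`, `errorPx`, and `score` are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Case {
  id: string;
  truth: Point;      // DOM-extracted ground truth
  predicted: Point;  // model output
}

// Euclidean pixel distance between prediction and ground truth (assumed metric).
function errorPx(c: Case): number {
  const dx = c.predicted.x - c.truth.x;
  const dy = c.predicted.y - c.truth.y;
  return Math.sqrt(dx * dx + dy * dy);
}

function score(cases: Case[], tolerancePx = 30) {
  if (cases.length === 0) throw new Error("no cases to score");
  const errors = cases.map(errorPx);
  const passed = errors.filter((e) => e <= tolerancePx).length;
  const meanPx = errors.reduce((a, b) => a + b, 0) / errors.length;
  return { passed, total: cases.length, meanPx, maxPx: Math.max(...errors) };
}
```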

Runtime constraint baked in: Node v22+ everywhere on the FreeBSD path,
no Bun. CDP client is puppeteer-core against system-pkg Chromium —
full Playwright avoided due to FreeBSD bundling gaps.
2026-05-11 09:58:14 +02:00
8777f0f583 Remove Qodo repo surfaces and embeddings
---
Build: pass | Tests: pass — 2376 passed (712 files)
2026-05-11 00:58:54 +02:00
777a9a5235 Remove completed mac_do reboot handoff
---
Build: pass | Tests: pass — 2378 passed (704 files)
2026-05-11 00:04:54 +02:00
96edcd4f1d Record pre-reboot mac_do validation tests
---
Build: pass | Tests: pass — 2375 passed (704 files)
2026-05-10 23:09:39 +02:00
538bc951b4 Document mac_do reboot handoff and reboot intent plan (Codex)
---
Build: pass | Tests: pass — 2373 passed (704 files)
2026-05-10 22:31:03 +02:00
50a915c414 Drop Astro docs path compatibility noise (Codex)
Remove the ASTRO_SITE_PATH alias and stale STRIPPED/refactor comments now that CMS_DOCS_SITE_PATH is the canonical docs project path.

---
Build: pass | Tests: pass — 2372 passed (704 files)

2026-05-10 20:47:10 +02:00
e3ad322d3b Rename Astro docs project to clawdie-docs (Sam & Claude)
Make the docs renderer name match its purpose, add CMS_DOCS_SITE_PATH with ASTRO_SITE_PATH compatibility, and update docs publishing paths.

---
Build: pass | Tests: pass — 2372 passed (704 files)
2026-05-10 19:49:39 +02:00
Operator & Claude Code
398bdd5f5f Prune stale docs/internal handoffs, reviews, and superseded plans
Every file under docs/internal/ ends up in the bootstrap/skills-memory
artifact (per metadata.json: "Full project docs, internal docs, identity
files, and skill definitions"). Stale handoffs, dated build reports,
single-commit reviews, and superseded design notes were polluting the
embedding index with low-signal chunks.

Removed:
- TLS-CERT-LIFECYCLE-HANDOFF.md, GLASSPANE-FREEBSD-HANDOFF.md,
  CMS-ASTRO-SOURCE-OF-TRUTH-HANDOFF.md (handoffs whose work has landed)
- HOST-DB-READINESS-REVIEW.md, HOST-DB-REBOOT-REVIEW.md,
  HOST-DB-RECOVERY-PLAN.md, SYSTEM-NAMESPACE-BRANCH-REVIEW.md
  (commit/branch reviews self-marked as historical)
- BUILD-TEST-REPORT-06.APR.2026.md, test-results.md (dated snapshots)
- DEBUG_CHECKLIST.md (Feb 2026 known-issues list, top item already fixed)
- BOOTABLE-ISO-PLAN-V1.md (V1 plan; ISO-FIRST-BOOT-IMPLEMENTATION.md is now
  the source of truth)
- STRAPI-FREEBSD-SETUP.md, PI-SKILLS-INTEGRATION.md, CODEX-FREEBSD.md
  (workarounds and one-off design notes for resolved/superseded paths)
- REFACTOR-PLAN.md, nanoclaw-architecture-final.md, AGENT-HARNESS-V2.md,
  AGENT-SKILLS-VS-REALITY.md (older planning/architecture docs whose
  decisions are now in code or ARCHITECTURE.md)
- BUILTIN-KNOWLEDGE-SPEC.md, LOCAL-KNOWLEDGE-BOOTSTRAP.md (early specs
  superseded by SKILLS-ARTIFACT-V1-PLAN.md)
- HEARTBEAT.md (design doc; implementation lives in scripts/heartbeat.sh
  and src/controlplane-heartbeat.ts)
- POSTGRES-PERMISSIONS.md (one-off fix recipe)
- RUNTIME-MANIFEST-DESIGN.md (status: Implemented; design is in code now)

Updates to remaining files patch broken cross-links:
- ARCHITECTURE.md drops the two table rows pointing at deleted docs
- doc/THREE-BIRD-ARCHITECTURE.md drops Strapi-setup link references
- docs/internal/SKILLS-ARTIFACT-V1-PLAN.md drops the "Depends on" line
- docs/internal/SUDO_REPLACEMENT.md trims its list of internal docs that
  reference sudo
- .agent/skills/setup and .agent/skills/docs-deployment drop pointers to
  REFACTOR-PLAN and DEBUG_CHECKLIST

Net: 23 files deleted, 7566 lines removed. docs/internal/ goes from 41 to
18 markdown files. The artifact's next refresh will see proportionally
less noise in retrieval.

---
Build: FAIL | Tests: FAIL — 16 failed
2026-05-10 13:34:27 +02:00
f6acf8e256 Prune stale first-boot docs and scripts (Sam & Codex)
Make the first-boot implementation spec self-contained, remove the superseded secrets handoff and obsolete manual jail setup scripts, and align hostname defaulting with the assistant-name separation rule. Update PostgreSQL permission notes and sync the public first-boot page into Astro docs.

---

Build: pass

Tests: pass — 2197 passed (164 files)

---
Build: pass | Tests: pass — 2197 passed (650 files)
2026-05-07 12:40:47 +02:00
6de0ed87ab Remove legacy Mevy references (Sam & Codex)
Sweep active code, tests, identity files, public docs, CMS seed content, and stale handoffs so old assistant-name fixtures no longer leak into current Clawdie/system-namespace behavior. Keep the skills-memory SQL artifact unchanged per regeneration policy.

---

Build: pass

Tests: pass — 2197 passed (164 files)

---
Build: pass | Tests: pass — 2197 passed (650 files)
2026-05-07 11:16:40 +02:00
fcab9aa475 docs(handoff): streamline — 17/20 done, 3 items remain (Sam & Claude)
Phase 7e jail isolation, coordinator validation, and daily sync check
are the only remaining deletion criteria.

---
Build: pass | Tests: untested — doc only
2026-04-22 10:17:59 +02:00
b3fc431bb1 docs(handoff): mark /model and skill-list done
Update harness validation checklist based on confirmed Telegram /model flow; clarify PF isolation checks (use TCP probes since ping is blocked in jails).
2026-04-22 10:07:45 +02:00
abe6236da5 docs(handoff): update harness validation progress
Mark /status override and several harness cleanup items complete; update ZAI models endpoint reference and branch hash.
2026-04-22 09:50:48 +02:00
104435bcd7 docs(harness): update handoff with latest fixes
- Bump branch hash
- Note scheduler 42P08 cast fix and ZAI endpoint correction

---
Build: pass | Tests: pass — unchanged
2026-04-22 08:56:03 +02:00
cf9c852795 docs(handoff): update with current state, fixed bugs, remaining validation (Sam & Claude)
- All 6 original bugs fixed, 3 new found in fixes (2 fixed, 2 open trivial/low)
- Remaining: /model e2e, Phase 7e jail isolation, coordinator scenarios
- Full commit history table (20 commits this session)
- Architecture notes updated with advisory lock flow and budget breakdown

---
Build: pass | Tests: untested — doc only
2026-04-22 08:36:50 +02:00
295178c173 docs(harness): mark checks complete (Sam & Codex)
- Update HARNESS-VALIDATION-HANDOFF with latest branch and completed items
- Document default budget allocation split
- Remove extra blank line in .gitignore

---
Build: pass | Tests: pass — 1683 passed (104 files)
2026-04-21 23:48:34 +02:00
f981df260f docs(handoff): add bugs found by Linux code review for FreeBSD agent (Sam & Claude)
6 bugs found in FreeBSD agent's 6 commits:
- HIGH: dead code ensureSymlinkOnlyWhenMissing() in setup/cms.ts
- HIGH: three competing bastille list parsers need unification
- MEDIUM: ensureSitemapStub() may be unnecessary after zod removal
- MEDIUM: dashboard column parsing may be wrong
- LOW: budget allocation magic numbers undocumented
- TRIVIAL: double blank line in .gitignore

---
Build: pass | Tests: untested — doc only
2026-04-21 22:43:05 +02:00
5a052718f5 fix(controlplane): repair agent-task scripts
- Add required Authorization header (CONTROLPLANE_SHARED_SECRET)
- Support selecting assigned role via `just agent-task "..." db-admin`
- Update agent-task-status to understand `task_id` and list recent tasks
- Update harness handoff Phase 7e example

---
Build: pass | Tests: pass — 103 files, 1680 tests
2026-04-21 22:23:02 +02:00
33e54cdf01 test: add model catalog coverage (Sam & Codex)
- Add src/model-catalog.test.ts (mock fetch + mock pg.Pool query)
- Covers schema init, sync diff logic, provider lists, and formatting
- Update harness validation handoff checklist

---
Build: pass | Tests: pass — 1680 passed (103 files)
2026-04-21 21:46:06 +02:00
f3b2c0189a fix: restore harness validation commands (Sam & Codex)
- Fix `just dashboard`: correct hostd socket default, default output dir `html/dashboard`, mkdir output dir
- Fix `just system-health`: parse `bastille list` correctly + call hostd `service-status` with `name`
- Update harness validation handoff checkboxes + results

---
Build: pass | Tests: pass — 1674 passed (102 files)
2026-04-21 21:07:06 +02:00
918824ffba docs: add harness validation handoff for FreeBSD agent (Sam & Claude)
---
Build: pass | Tests: untested — new doc only
2026-04-21 20:47:03 +02:00
d565984d28 chore: remove compaction handoff doc; add /budget alias
2026-04-20 12:53:36 +00:00
ef0baf9e95 doc: add session compaction handoff
2026-04-20 11:08:52 +00:00
72497d4f14 Docs/tests: lock in vision OCR pipeline
---

Build: pass | Tests: pass — 1537 passed (93 files)
2026-04-19 11:22:08 +00:00
ec3cc40679 docs(handoff): update ISO handoff with env var propagation progress
Mark 5/7 deletion criteria complete. Document shell-env.sh changes
applied to clawdie-iso. Resolve Forgejo default question (bare repo).

---
Build: pass | Tests: not run (Linux)
2026-04-19 12:22:54 +02:00
c633fdcc49 Remove legacy agent IDs + tighten task API
- Canonicalize controlplane agent IDs/roles to: sysadmin, db-admin, git-admin (drop *_agent variants).

- Add DB migration to rewrite existing *_agent rows and references to canonical IDs.

- Tighten POST /api/controlplane/tasks contract: require assigned_to (remove agent_id alias).

- Update tests and docs to match canonical IDs.
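
The tightened task contract can be pictured as a small validator. This is a sketch under assumptions: `validateTaskBody` and the error strings are hypothetical, and the real handler may validate additional fields. Only the three canonical IDs and the `assigned_to` requirement come from the commit message.

```typescript
const CANONICAL_AGENTS = ["sysadmin", "db-admin", "git-admin"] as const;
type AgentId = (typeof CANONICAL_AGENTS)[number];

function isAgentId(value: unknown): value is AgentId {
  return typeof value === "string" && (CANONICAL_AGENTS as readonly string[]).indexOf(value) !== -1;
}

type Validation =
  | { ok: true; assignedTo: AgentId }
  | { ok: false; error: string };

// Only assigned_to is consulted; the old agent_id alias is not read at all,
// so a body that carries only agent_id is rejected as missing assigned_to.
function validateTaskBody(body: Record<string, unknown>): Validation {
  const assigned = body["assigned_to"];
  if (assigned === undefined) {
    return { ok: false, error: "assigned_to is required" };
  }
  if (!isAgentId(assigned)) {
    return { ok: false, error: "assigned_to must be one of: " + CANONICAL_AGENTS.join(", ") };
  }
  return { ok: true, assignedTo: assigned };
}
```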

---

Build: pass (just typecheck)

Tests: pass — 1536 passed (92 files) (just test)
2026-04-19 06:54:28 +00:00
befd8bb3f4 Fix jail map canonical IDs
- Restore canonical keys in AGENT_JAIL_MAP so Phase 7 jail routing matches docs/tests.

- Normalize doc/HANDOFF-ISO-AGENT.md to repo handoff format + DD.mmm.YYYY date convention.

---

Build: pass (just typecheck)

Tests: pass — 1536 passed (92 files) (just test)
2026-04-19 06:36:29 +00:00
8010ab3b3c docs: document hostd API proxy architecture for jail agents
The hostd-bridge now routes through the controlplane API instead of
direct Unix socket. 6 files updated:

- ARCHITECTURE.md: jail isolation section — hostd via API, no socket mount
- doc/CONTROLPLANE-ARCHITECTURE.md: hostd tree shows API proxy route
- doc/CONTROLPLANE-MESSAGE-CONTRACT.md: add POST /api/controlplane/hostd
  endpoint with request/response examples
- docs/public/operate/security.md: hostd section describes HTTP proxy
  model with CONTROLPLANE_SHARED_SECRET auth
- .env.example: document CONTROLPLANE_HOST_IP (default 10.0.1.1)
- doc/HANDOFF-ISO-AGENT.md: add sections 4 (hostd API proxy) and 5
  (legacy agent ID removal) to breaking changes

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-19 08:32:29 +02:00
bef6ec779d refactor: remove legacy agent ID mappings, close jail extension handoff
- controlplane-runner.ts: remove CANONICAL_AGENT_MAP and 3 legacy entries
  from AGENT_JAIL_MAP (sysadmin, db-admin, git-admin). Legacy IDs were
  removed from the DB schema in 0f7fbc4 — these mappings are dead code.
  resolveCanonicalAgentId now returns input unchanged.
- Delete doc/HANDOFF-JAIL-EXTENSIONS.md — resolved by re-running
  'sudo just setup-agent-jails' which writes the PI_EXTENSIONS_DIR mount.

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-19 07:32:49 +02:00
263ae89404 Docs: add jail extension mount handoff for next agent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 23:16:46 +00:00
a6582f2506 fix: task double-execution race, dead code cleanup, doc updates
Correctness:
- controlplane-db.ts: add claimTask() — atomically updates task status
  from 'pending' to 'in_progress' via conditional UPDATE. Returns false
  if already claimed, preventing double-execution between onTaskCreated
  callback and heartbeat loop.
- controlplane-heartbeat.ts: claim task before running agent in both
  the on-demand task loop and the per-agent heartbeat task pickup.
  Skip with 'task_already_claimed' if race lost.
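
The claim step is a compare-and-set on the task status. The sketch below models it with an in-memory store so it runs without a database; the SQL in the comment shows the general shape of a conditional UPDATE, with a hypothetical table and column layout. `TaskStore` and `pickUp` are illustrative names, not the repository's API.

```typescript
type TaskStatus = "pending" | "in_progress" | "done";

// In-memory stand-in for the tasks table.
class TaskStore {
  private rows: Record<string, TaskStatus> = {};

  insert(id: string): void {
    this.rows[id] = "pending";
  }

  // Same idea as a conditional UPDATE such as (hypothetical schema):
  //   UPDATE tasks SET status = 'in_progress'
  //   WHERE id = $1 AND status = 'pending'
  // returning true only when exactly one row changed.
  claimTask(id: string): boolean {
    if (this.rows[id] !== "pending") return false;
    this.rows[id] = "in_progress";
    return true;
  }
}

// Both pickup paths (on-demand callback and heartbeat loop) would go through
// the same claim before running the agent.
function pickUp(store: TaskStore, id: string, runAgent: (id: string) => void): string {
  if (!store.claimTask(id)) return "task_already_claimed";
  runAgent(id);
  return "started";
}
```

In a real database the status check and the write happen in one statement, which is what makes the claim safe across concurrent callers; the in-memory version only mirrors the outcome.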

Dead code removed:
- config.ts: remove ENCRYPTED_DIR, ENCRYPTED_SCREENSHOTS_DIR,
  TMP_SKILLS_DIR, SCREENSHOTS_DIR, AGENT_LOG_FILE, AGENT_ERROR_LOG
  (never imported outside config.ts)
- controlplane-db.ts: remove getAgentById (never imported)

Docs:
- GIT-JAIL-FORGEJO-HANDOFF.md: note SSH automation in setup/git.ts,
  GIT_LOCAL_URL injection, local-first skills update
- MULTI-PROVIDER-ARCHITECTURE.md: note current default zai/glm-5-turbo
  and just pi-config for runtime changes

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-19 00:48:25 +02:00
3505206814 docs: mark GIT-JAIL-PLAN Phase 4 complete
Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-19 00:22:38 +02:00
7a72864549 docs: close handoffs, update harness status, add install security notes
- Delete doc/HANDOFF-PHASE7.md — Phase 7 validation complete (1530 tests
  pass, service running, jail isolation deployed)
- Delete doc/HANDOFF-SECURITY-RELIABILITY.md — security fixes validated
  on FreeBSD (API auth, shell injection prevention, pool timeouts)
- Update AGENT-HARNESS-V2.md: status from DRAFT to Phase 1-6 COMPLETE,
  Phase 7 near-complete. Replace 'What's missing' with phase completion
  table and remaining work items.
- Update install.md: add CONTROLPLANE_SHARED_SECRET setup instructions
  and note that run-*.sh wrappers are generated (not in git).

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-18 22:52:34 +02:00
a521ec77ff docs: comprehensive doc audit — update 16 files for consistency with codebase
Systematic review of all doc/, docs/internal/, docs/public/, ARCHITECTURE.md,
and README.md against recent codebase changes. 16 files updated:

Cross-cutting fixes (multiple files):
- Model references: anthropic/claude-3-5-sonnet → zai/glm-5-turbo (4 files)
- Port references: hardcoded 3100 → CONTROLPLANE_API_PORT (3 files)
- Skills mechanism: --no-skills + --append-system-prompt + skills_search (6 files)
- CONTROLPLANE_SHARED_SECRET: documented in security, architecture, install (5 files)
- Prompt guardrails: AGENT_MAX_INBOUND_CHARS etc. added to 3 files
- controlplane is NOT a jail — runs on host (3 files corrected)
- git jail added to layouts and IP tables (3 files)
- npm run → just (2 files)

Specific fixes:
- .env.example: AGENT_SESSION_MAX_BYTES session rollover hint
- README.md: fix IP layout (git=.6 not .4), add run-*.sh generation note
- ARCHITECTURE.md: add config vars, recipe count update, --no-skills
- doc/CONTROLPLANE-AGENT-ROLES.md: fix model, remove deleted file ref
- doc/CONTROLPLANE-ARCHITECTURE.md: port params, security, guardrails section
- doc/CONTROLPLANE-MESSAGE-CONTRACT.md: auth header, skills catalog rewrite
- doc/SESSION-HANDOFF-2026-04-18.md: fix Telegram (plain text not Markdown)
- doc/THREE-BIRD-ARCHITECTURE.md: fix 5 broken STRAPI-FREEBSD-GOTCHA refs
- doc/HANDOFF-PHASE7.md: mark sysprompt cleanup as done
- docs/internal/DOCUMENTATION.md: just CLI, tracked hooks, parameterized paths
- docs/internal/HEARTBEAT.md: add controlplane heartbeat reference, fix setup step
- docs/public/architecture/controlplane.md: phases 2-7 all DONE
- docs/public/architecture/freebsd-jail-implementation.md: git jail, Forgejo
- docs/public/architecture/warden.md: controlplane=host, git jail added
- docs/public/operate/monitoring.md: just doctor, all guardrail vars
- docs/public/operate/security.md: API auth, shell injection, guardrails

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-18 22:15:59 +02:00
ffbed868f5 Fix telegram non-response on context limit
When the model hits context window exceeded, send a short guidance reply and avoid infinite cursor rollback retries.

Also record the incident and local wrapper regeneration note in session handoff.
2026-04-18 19:10:39 +00:00
4a02e2934c fix: reliability — shared secret plumbing, heartbeat resilience, pool timeouts, startup validation
Critical:
- config.ts: add CONTROLPLANE_SHARED_SECRET to readEnvFile allowlist so it
  actually gets read from .env (was dead code — agent auth always rejected).
  Added envConfig fallback (was only reading process.env).
- controlplane-api.ts: use config export instead of raw process.env.
- .env.example: document CONTROLPLANE_SHARED_SECRET, fix bind host default
  to 127.0.0.1.

Reliability:
- controlplane-heartbeat.ts: wrap writeSessionEntry in try/catch — disk full
  no longer prevents task status updates or agent reply delivery.
- controlplane-heartbeat.ts: per-agent try/catch in loop — one throwing
  agent (DB error) no longer starves remaining agents for that tick.
- db.ts, memory-pg.ts, skills-pg.ts: add connectionTimeoutMillis: 30000 to
  all pg.Pool instances. Prevents indefinite blocking on pool exhaustion.
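
The per-agent isolation in the heartbeat loop can be sketched as follows. It is synchronous for brevity (the real loop is presumably async), and `runHeartbeatTick` and `TickReport` are hypothetical names.

```typescript
interface TickReport {
  ok: string[];
  failed: { agent: string; error: string }[];
}

// Each agent gets its own try/catch, so one throwing agent is recorded
// and the loop continues with the remaining agents in the same tick.
function runHeartbeatTick(agents: string[], tick: (agent: string) => void): TickReport {
  const report: TickReport = { ok: [], failed: [] };
  for (const agent of agents) {
    try {
      tick(agent);
      report.ok.push(agent);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      report.failed.push({ agent, error: message });
    }
  }
  return report;
}
```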

Observability:
- channels/telegram.ts: warn when TELEGRAM_BOT_TOKEN is missing instead of
  silently disabling the channel.
- index.ts: startup config validation warns for missing TELEGRAM_BOT_TOKEN,
  CONTROLPLANE_SHARED_SECRET, and OPENAI_API_KEY.

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-18 20:57:16 +02:00
f523c16a1f fix(telegram): drop Markdown parse_mode to avoid double-send on code replies
Agent replies frequently contain unbalanced backticks, _, * which break
Telegram's strict Markdown parser. Every code message triggered a failed
Markdown attempt followed by a plain-text retry — wasting an API call and
adding latency. Send plain text directly; it's reliable and fast.

Also:
- Delete stale doc/HANDOFF-GIT-JAIL-OPENSSH.md (SUPERSEDED)
- Update doc/GIT-JAIL-PLAN.md Phase 2 as automated by setup/git.ts

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-18 14:36:31 +02:00
9da3a7f4a6 fix(git-jail): remove openssh-portable — base sshd works in thin jails
Thin jails share the host's /usr and /bin, so /usr/sbin/sshd is
already available. The openssh-portable ports package is unnecessary.

The Linux agent was right in 29439b2. My 4cc2f7d adding
openssh-portable was based on a wrong assumption. This corrects it.

Updated handoff doc to reflect the correction.

---
Build: pass | Tests: pass — 1530 passed (91 files)
2026-04-18 12:18:31 +00:00
93ce1f782d doc: handoff for Linux agent — openssh-portable in thin jails
The Linux agent removed openssh from git-jail packages thinking sshd
is in FreeBSD base. This is wrong for thin jails — they don't inherit
base services. openssh-portable fix already merged on main.

---
Build: pass | Tests: pass — 1530 passed (91 files)
2026-04-18 12:13:23 +00:00
9e2fce2311 feat(git-jail): wire git-admin agent to local repo + per-agent branches
Local-first git model:
- Add GIT_LOCAL_URL config (git jail as primary push target)
- Inject GIT_LOCAL_URL into git-admin agent jail env
- Update git-push-mirror skill for local-first architecture
- Add GIT_LOCAL_URL to .env.example
- Fix git-jail package: openssh-portable (not openssh)
- Update GIT-JAIL-PLAN.md with current status

Branch model:
- main: shared base, syncs with Codeberg upstream
- mevy: Mevy's working branch (agents commit here)
- Future agents get their own branch from main

Agent flow: work on {agent} branch → push to local git jail →
optional explicit push to Codeberg via git-push-mirror skill

---
Build: pass | Tests: pass — 1530 passed (91 files)
2026-04-18 12:10:07 +00:00
0ec476df40 feat(setup): automate git jail SSH + remove run-clawdie.sh symlink
- setup/git.ts: add setupGitJailSsh() — generates host keys, writes
  sshd_config (keys-only), creates git user with git-shell restricted
  shell, deploys operator SSH public key to authorized_keys, enables
  sshd. Fully idempotent, key sourced from SSH_PUBLIC_KEY or
  GIT_SSH_KEY_PATH.pub or ~/.ssh/id_ed25519.pub.
- infra/packages/git-jail.txt: add openssh package
- Delete run-clawdie.sh and run-mevy.sh symlink from git; add run-*.sh
  to .gitignore. setup/service.ts already generates run-${AGENT_NAME}.sh
  at install time — the checked-in template was redundant.
- .env.example: document GIT_SSH_KEY_PATH for git-admin agent
- Update HANDOFF-ISO-AGENT.md with run-*.sh change
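
The key-source precedence for setupGitJailSsh() could look like the sketch below. File access is injected so the example stays self-contained. `resolveOperatorKey` is a hypothetical name, and falling through to the next source when a file is missing is an assumption, not confirmed behavior.

```typescript
// Returns the file's contents, or undefined when the file does not exist.
type ReadFile = (path: string) => string | undefined;

// Precedence: SSH_PUBLIC_KEY, then GIT_SSH_KEY_PATH + ".pub",
// then ~/.ssh/id_ed25519.pub.
function resolveOperatorKey(
  env: Record<string, string | undefined>,
  home: string,
  readFile: ReadFile,
): string | undefined {
  const inline = env["SSH_PUBLIC_KEY"];
  if (inline && inline.trim() !== "") return inline.trim();

  const keyPath = env["GIT_SSH_KEY_PATH"];
  if (keyPath) {
    const fromPath = readFile(keyPath + ".pub");
    if (fromPath) return fromPath.trim();
  }

  const fallback = readFile(home + "/.ssh/id_ed25519.pub");
  return fallback ? fallback.trim() : undefined;
}
```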

Build: pass | Tests: not run (Linux) (Sam & Claude)
2026-04-18 11:54:28 +02:00
4a9415ecc3 doc: add ISO/Linux agent handoff for recent breaking changes
Covers skill system unification, PG 17→18 refs, and startup script fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 09:16:28 +00:00