zot/internal/swarm/socketpath.go
patriceckhart b11e6ed4e4 swarm: introduce /swarm dashboard, /btw-style transcript view, and per-session scope
A /swarm subsystem for long-running parallel subagents. Each agent runs
in its own subprocess against a fresh git worktree (branch swarm/<id>)
with its own persistent session file and unix-socket inbox; the parent
zot stays in the main session and pokes / observes them via the
dashboard.

Highlights:

- New internal/swarm package: Agent, Spawn/Resume/Kill/Remove, event log
  (events.jsonl), inbox protocol (listen/dial), worktree manager, exec
  runner that spawns "zot --swarm-agent ...".
- New internal/agent/swarm_agent.go: daemon-mode child entry point.
  Reuses the standard agent loop but persists turns to the supervisor-
  chosen session.json and streams events as JSONL on stdout. Mirror to
  events.jsonl is dormant while the supervisor's stdout pipe is alive so
  events do not get double-written.
- Resume reattaches in place: reuses the same worktree, session, branch
  and inbox path; carries forward the prior transcript replayed from
  events.jsonl. Resume no longer re-fires the original Task as a fresh
  user turn -- that was producing "agent busy; send cancel first" races.
- core.NewSessionAtPath plus an openOrCreateSession fallback so the
  child actually persists its session.json at the supervisor-chosen path
  on first spawn instead of running with sess==nil.
- Dashboard in internal/agent/modes/swarm_dialog.go + swarm_slash.go:
  list / new / kill / remove / resume / logs / send subcommands plus an
  interactive picker. Transcript view is /btw-style: an always-on
  inline editor at the bottom, streaming auto-follow, inline busy
  spinner with the agent's current activity such as "thinking" or
  "tool: edit". /model inside the spawn editor pops the global model
  picker.
- Per-session scope: each spawn is stamped with the host session's id
  and only shows in that session's /swarm dashboard. Pre-upgrade agents
  -- empty session_id -- remain visible everywhere as a safety net. The
  active scope is re-applied whenever loadSession swaps sessions.
- Resolve falls back to the provider's default model when the persisted
  cfg.Model is no longer in the catalogue, warns on stderr, and rewrites
  config.json so the next launch is silent.
- ReadEventLog folds back-to-back same-type identical-payload events
  within 250ms so events.jsonl files polluted by the old supervisor +
  mirror double-write read back cleanly.
- DrawLog gains an idle no-op fast path: identical buffer plus identical
  cursor = emit nothing, so the terminal's cursor blink keeps ticking in
  dialogs whose underlying agent is idle.

Slash UX:

- New /swarm command with subcommands; the suggester picks it up.
- README.md documents the full dashboard, CLI, and persistence story,
  and explicitly notes that /session export does NOT bundle subagents
  -- their worktree and unix-socket inbox cannot round-trip through a
  .zotsession.

Tests cover: SpawnReq + Resume lifecycle, session-id scoping + persistence,
default-child-args spawn vs resume contract, NewSessionAtPath at a fixed
path, model fallback when the configured model is gone, swarm dialog
behaviour -- auto-open editor, /model in spawn editor, transcript grows
without internal scroll, busy spinner, multi-message send -- event-log
dedup, swarm emitter dormant-until-orphan, and the DrawLog idle no-op +
change-breaks-fast-path invariants.
2026-05-16 11:53:20 +02:00

package swarm

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// maxUnixSocketPath is the conservative platform-portable path limit
// for unix sockets. macOS allows 104, linux 108 (including the NUL
// terminator). We pick 100 so the full socket path stays under both
// caps with a safety margin.
const maxUnixSocketPath = 100

// inboxSocketPath returns a per-agent unix-socket path that's short
// enough to actually work (see maxUnixSocketPath) and unique per
// swarm root so two zot instances on the same machine don't collide.
//
// Strategy:
//
//  1. Try <root>/agents/<id>/in.sock. This is the obvious place and
//     puts everything next to the durable state; on most setups it
//     fits.
//  2. If that's too long, fall back to <tmp>/zot-swarm-<roothash>/<id>.sock.
//     We hash root rather than embedding it so the tmp directory name
//     stays short. The first 8 hex chars of SHA-1 are plenty: collisions
//     only matter within a single user's tmp dir and we already
//     create a dedicated subdir.
//  3. If the id itself makes that too long, replace it with its short
//     hash: <tmp>/zot-swarm-<roothash>/<idhash>.sock.
//  4. If even /tmp is somehow too long (chroots, containers), give
//     up with a clear error so the caller surfaces it instead of
//     leaving the user wondering why follow-ups don't work.
func inboxSocketPath(root, agentID string) (string, error) {
	primary := filepath.Join(root, "agents", agentID, "in.sock")
	if len(primary) <= maxUnixSocketPath {
		return primary, nil
	}

	tmp := os.TempDir()
	dir := filepath.Join(tmp, "zot-swarm-"+rootTag(root))
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", fmt.Errorf("socket tmp dir: %w", err)
	}
	candidate := filepath.Join(dir, agentID+".sock")
	if len(candidate) <= maxUnixSocketPath {
		return candidate, nil
	}

	// Last-resort: use just the short hash of the id so even very long
	// task slugs fit. We surface the original id in the meta.json /
	// events log; the socket path is purely transport.
	short := shortHash(agentID)
	candidate = filepath.Join(dir, short+".sock")
	if len(candidate) <= maxUnixSocketPath {
		return candidate, nil
	}
	return "", fmt.Errorf("unix socket path too long even after shortening (%s, %d > %d, GOOS=%s)",
		candidate, len(candidate), maxUnixSocketPath, runtime.GOOS)
}

// rootTag returns a stable 8-hex-char tag for the swarm root. Used
// in the tmp-dir name so two parallel zot instances with different
// roots don't share sockets.
func rootTag(root string) string { return shortHash(root) }

// shortHash returns the first 4 bytes of the SHA-1 of s as 8 hex
// chars. It is used only to keep names short, not for security.
func shortHash(s string) string {
	sum := sha1.Sum([]byte(s))
	return hex.EncodeToString(sum[:4])
}
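
The fallback order in inboxSocketPath can be exercised without touching the filesystem. The sketch below is a hypothetical, side-effect-free variant written for illustration only: `pickSocketPath`, `shortTag`, and `maxLen` are names invented here, the tmp directory is passed in rather than read from os.TempDir, and no directory is created. It assumes a unix-style path separator.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// maxLen mirrors maxUnixSocketPath from the file above.
const maxLen = 100

// shortTag mirrors shortHash: first 4 bytes of SHA-1 as 8 hex chars.
func shortTag(s string) string {
	sum := sha1.Sum([]byte(s))
	return hex.EncodeToString(sum[:4])
}

// pickSocketPath (hypothetical) tries the same three candidates in the
// same order as inboxSocketPath and returns the first one that fits.
func pickSocketPath(root, tmp, agentID string) (string, error) {
	dir := filepath.Join(tmp, "zot-swarm-"+shortTag(root))
	candidates := []string{
		filepath.Join(root, "agents", agentID, "in.sock"), // next to durable state
		filepath.Join(dir, agentID+".sock"),               // hashed root, plain id
		filepath.Join(dir, shortTag(agentID)+".sock"),     // hashed root, hashed id
	}
	for _, c := range candidates {
		if len(c) <= maxLen {
			return c, nil
		}
	}
	last := candidates[len(candidates)-1]
	return "", fmt.Errorf("unix socket path too long (%d > %d)", len(last), maxLen)
}

func main() {
	deep := "/home/user/" + strings.Repeat("nested/", 15) + ".zot/swarm"
	for _, root := range []string{"/home/user/.zot/swarm", deep} {
		p, err := pickSocketPath(root, "/tmp", "fix-login-bug")
		fmt.Println(len(p), p, err)
	}
}
```

With a short root the first candidate is returned unchanged; with the deeply nested root the result moves under the tmp directory, and only the 8-character root tag identifies which swarm root it belongs to.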