swarm: drop git-worktree / isolation; agents share the host cwd

Each swarm subagent now runs with cwd == the parent zot's RepoRoot, just
like the main agent. No per-agent git worktree, no swarm/<id> branch, no
SetIsolation toggle, no 'i' dashboard shortcut, no --isolated flag. The
previous worktree flow was confusing (toggling 'i' on a running agent
couldn't reseat its cwd, so edits kept landing in the host repo anyway)
and shipped without a real use case.

Concretely:

- delete internal/swarm/worktree.go and the WorktreeManager interface.
- Config loses Worktree; SpawnReq loses Isolated; Agent loses Branch and
  Isolated; AgentSnapshot loses Branch and Isolated; agentMeta loses
  branch and isolated. Older meta.json files still decode because
  unknown JSON keys are ignored, and buildDetachedAgent coerces any
  stale per-worktree Dir back to the live RepoRoot so detached agents
  resume in the right place.
- Swarm.Remove no longer calls into any worktree manager, so it can't
  accidentally git-worktree-remove the user's actual source tree; it
  only clears <swarm-root>/agents/<id>/.
- runner.go drops the <Dir>/.zot/session.json fallback (every plausible
  Dir is now the user's repo, where a stray .zot/ would litter the
  source tree); SessionPath is required and Spawn always populates it
  under <swarm-root>/agents/<id>/session.json.
- swarm dialog: remove isolate/SetIsolateFunc, the 'i' key handler, the
  MODE column, the mode/branch lines in the transcript header. Fix the
  transcript-view cursor row math (row += 4 was counting a now-removed
  branch row, leaving the caret one row above the editor accent bar).
- swarm slash command: drop /swarm isolate, /swarm unisolate, and the
  --isolated flag on /swarm new; trim the spawn-flag parser and tests.
- README and slash-suggest description updated; site copy updated in a
  separate commit.
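
The meta.json compatibility described above can be illustrated with a
minimal, standalone sketch. It is not the code from this commit: the
trimmed struct below keeps only three fields, and decodeLegacyMeta is a
hypothetical helper that folds the decode step and the Dir coercion
into one function. It relies only on encoding/json's default behaviour
of ignoring keys that have no matching struct field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentMeta is a reduced stand-in for the real struct: the historical
// `branch` and `isolated` fields are simply absent.
type agentMeta struct {
	ID   string `json:"id"`
	Task string `json:"task"`
	Dir  string `json:"dir"`
}

// decodeLegacyMeta (hypothetical name) decodes a meta.json payload and,
// when repoRoot is non-empty, replaces whatever Dir was recorded with
// the live repo root, mirroring the coercion in buildDetachedAgent.
func decodeLegacyMeta(raw []byte, repoRoot string) (agentMeta, error) {
	var m agentMeta
	// json.Unmarshal ignores unknown keys by default, so older files
	// that still carry "branch" or "isolated" decode without error.
	if err := json.Unmarshal(raw, &m); err != nil {
		return agentMeta{}, err
	}
	if repoRoot != "" {
		m.Dir = repoRoot
	}
	return m, nil
}

func main() {
	// An illustrative pre-upgrade payload; the paths are made up.
	old := []byte(`{"id":"alpha-9","task":"do thing","branch":"swarm/alpha-9","isolated":true,"dir":"/old/worktrees/alpha-9"}`)
	m, err := decodeLegacyMeta(old, "/home/u/repo")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```

If repoRoot is empty the recorded Dir is kept as-is, which matches the
`if f.cfg.RepoRoot != ""` guard in the diff below.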

Tests adjusted accordingly; full suite green.
patriceckhart 2026-05-17 00:01:29 +02:00
parent 63e28ad156
commit 1aea23e419
14 changed files with 166 additions and 247 deletions

View file

@ -187,7 +187,7 @@ Type `/` in the TUI to open the autocomplete popup. Available commands:
| `/session` | Four ops on the current session: `export` to a portable `.zotsession` file, `import` one back in, `fork` from a past user message into a new branch, `tree` to switch between branches. Opens a picker without an argument; direct forms: `/session export [path]`, `/session import <path>`, `/session fork`, `/session tree`. Default export destination is `~/Downloads`. |
| `/jump` | Scroll the chat to a previous turn (or `/jump <text>` to filter). |
| `/btw` | Side chat with full context that doesn't add to the main thread. |
| `/swarm` | Spawn, monitor, and chat with background subagents. Each gets its own git worktree and runs in parallel with your main session. |
| `/swarm` | Spawn, monitor, and chat with background subagents. Each runs in parallel with your main session and shares its working directory. |
| `/skills` | List discovered skills (SKILL.md files) and preview their bodies. |
| `/compact` | Summarize the transcript into one message to free up context. |
| `/study` | Run the canned prompt "Read and understand everything in the current directory." so the agent has full project context before you start asking targeted questions. |
@ -210,7 +210,7 @@ Four ops on the current session. `/session` alone opens a picker; each is also r
- **`/session export [path]`**. Writes the running transcript to a portable `.zotsession` file. Default destination is `~/Downloads/<timestamp>-<session-id>-<prompt-slug>.zotsession`. Pass a path to override; a directory is fine (a dated name is built inside), a bare name gets `.zotsession` appended. The meta's cwd is stripped on the way out so the recipient doesn't see your filesystem layout.
**What's included.** Only the main chat thread of the running session — messages, tool calls, tool results, compactions, and usage. **`/swarm` subagents are NOT included.** Their transcripts, worktrees (which are real git checkouts under `$ZOT_HOME/swarm/worktrees/<id>` on branch `swarm/<id>`), unix-socket inboxes, and per-agent session files are all machine-local; a `.zotsession` is just a chat transcript and has no way to bundle a git worktree or revive a unix socket on another box. If you want to share a subagent's work, use normal git tooling on its branch (`git push`, `git format-patch`, etc.); if you want the conversation, copy it out of the dashboard manually.
**What's included.** Only the main chat thread of the running session — messages, tool calls, tool results, compactions, and usage. **`/swarm` subagents are NOT included.** Their transcripts, unix-socket inboxes, and per-agent session files are all machine-local; a `.zotsession` is just a chat transcript and has no way to revive a unix socket on another box. If you want the conversation, copy it out of the dashboard manually.
- **`/session import <path>`**. Copies a `.zotsession` file into `$ZOT_HOME/sessions/<cwd-hash>/` with a fresh id and the current cwd, then switches the running agent onto it. Imported sessions are first-class: they show up in `/sessions`, `/jump`, and the tree. Drag-drop paths in the editor are accepted (zot strips the surrounding quotes automatically).
- **`/session fork`**. Opens a turn picker (same shape as `/jump`). Pick any past user message; zot copies every message up to and including that turn into a new session, records `parent` + `fork_point` in the new meta, and switches onto the branch. The parent session stays on disk. Use it to try a different question without polluting the original transcript, or to rewind after the agent went down the wrong path.
- **`/session tree`**. Shows every session in the current cwd arranged by parent/child relationships, depth-first with indent per level. The current session is tagged `[current]`. Pick any entry to switch into it. Parentless sessions are roots; branches created via `/session fork` nest under whichever session they were forked from. Orphaned children (whose parent file was deleted) still show as roots so they stay discoverable.
@ -236,45 +236,47 @@ Inside the overlay: `enter` sends, `esc` cancels an in-flight call (or closes th
### `/swarm`
Long-running parallel subagents. Each one gets its own git worktree off your current repo (branch `swarm/<id>`), its own persistent session file, and its own background subprocess driving the model. You stay in the main session and check in on them whenever you want — the dashboard is a `/btw`-style chat per agent.
Background subagents that run alongside your main session. Each one is a separate `zot` subprocess with its own model loop, its own persistent session file, and its own chat in the dashboard — but they all run in **the same working directory as the host**, so they see and edit the same files you do. Spawn one for a side task (“draft the migration”, “investigate this stack trace”, “write the test harness for module X”), keep going in the main thread, check in on it whenever you want.
Why: “go write the test harness for module X” / “investigate this stack trace” / “draft the migration” are perfect side tasks. Spawn an agent, keep working in your main session, come back to its results later — or chat with it interactively while it works.
> **Agents edit the same files you do.** They use the same `read` / `write` / `edit` / `bash` tools as the main agent against the host's working directory. There's no per-agent worktree or branch. If you need parallel edits on isolated checkouts, set that up yourself with `git worktree` outside zot.
```
/swarm # open the dashboard
/swarm new <task> # spawn an agent on a fresh worktree
/swarm new --model gpt-5 <task> # pin the new agent to a specific model
/swarm new <task> # spawn an agent
/swarm new --model gpt-5 <task> # pin the new agent to a specific model
/swarm logs <id> # jump straight into one agent's transcript
/swarm send <id> <text> # send a follow-up without opening the dashboard
/swarm resume # pick a stopped agent to bring back
/swarm resume <id> # bring a specific agent back
/swarm kill <id> # stop a running agent (worktree stays)
/swarm remove <id> # delete the worktree + session for an agent
/swarm kill <id> # stop a running agent (its state stays)
/swarm remove <id> # delete the agent's session and state
/swarm list # alias for opening the dashboard
```
**Dashboard (`/swarm` with no arg)** — a list of every agent for the current session with status, age, and current activity. Keys:
**Dashboard (`/swarm` with no arg)** — a list of every agent for the current session, with status, age, and current activity. Keys:
- `↑` / `↓` move the cursor between rows.
- `enter` opens the highlighted agent's transcript view.
- `n` spawns a new agent. Type the task, `enter` to confirm; the new agent inherits the model your main session is currently on (see `/model` and the in-editor `/model` command below).
- `p` opens a one-off prompt editor for the selected row (alternative to entering the transcript).
- `R` resumes a stopped agent in place.
- `k` kills the selected running agent (its worktree and session stay so you can resume later).
- `r` removes the selected agent entirely (worktree + session + meta gone).
- `esc` closes the dashboard.
| Key | Action |
|---|---|
| `↑` / `↓` | Move cursor between rows. |
| `enter` | Open the highlighted agent's transcript view. |
| `n` | Spawn a new agent (opens an inline task editor; inherits the host's current model). |
| `p` | One-off prompt editor for the selected row (without entering the transcript). |
| `R` | Resume a stopped agent in place. |
| `k` | Kill the selected running agent. Its session and state stay so you can resume it later. |
| `r` | Remove the selected agent entirely (session + meta gone). |
| `esc` | Close the dashboard. |
**Inside an agent's transcript** — a chat overlay just like `/btw`. The agent's conversation flows above an always-on inline composer; type and hit `enter` to send a follow-up. The view auto-follows streaming output and shows an inline spinner with the agent's current activity (`thinking`, `tool: edit_file`, etc.) while it's busy. `esc` returns to the dashboard.
**Inside an agent's transcript** — a chat overlay with an always-on inline composer at the bottom. The conversation flows above it; type and `enter` to send a follow-up. The view auto-follows streaming output and shows an inline spinner with the agent's current activity (`thinking`, `tool: edit_file`, etc.) while it's busy. `esc` returns to the dashboard.
**Switching the spawn model from inside the editor** — while composing a task in the `n`-prompt, type `/model` on its own line and `enter`. The same model picker the global `/model` uses pops up; pick a model, the picker closes, and the editor reopens with your typed task intact and the new model pinned for the spawn.
**Switching the spawn model from inside the editor** — while composing a task in the `n`-prompt, type `/model` on its own line and `enter`. The standard `/model` picker pops up; pick a model, the picker closes, and the editor reopens with your typed task intact and the new model pinned for the spawn.
**Session scoping** — each agent is stamped with the session that spawned it and only shows up in that session's `/swarm` dashboard. Swap sessions with `/sessions` and the dashboard re-narrows accordingly. Agents you spawned in another session keep running and reappear when you switch back. Pre-upgrade agents (no session stamp) are visible from every session as a safety net.
**Session scoping** — each agent is stamped with the host session that spawned it and only shows up in that session's dashboard. Swap sessions with `/sessions` and the dashboard re-narrows accordingly. Agents from other sessions keep running in the background and reappear when you switch back.
**Persistence across zot restarts** — every spawn writes a `meta.json` next to its event log and session file under `$ZOT_HOME/swarm/agents/<id>/`. On the next `zot` launch they show up in the dashboard as **detached**; press `R` (or `/swarm resume <id>`) to bring one back. Resumed agents reattach to the same worktree, session, branch, and inbox socket, so the conversation continues from where it left off.
**Persistence across zot restarts** — every spawn writes a `meta.json` next to its event log and session file under `$ZOT_HOME/swarm/agents/<id>/`. On the next `zot` launch they show up in the dashboard as **detached**; press `R` (or `/swarm resume <id>`) to bring one back. Resumed agents reattach to the same session and inbox socket, so the conversation continues from where it left off.
**Where their work lives** — worktrees go under `$ZOT_HOME/swarm/worktrees/<id>` on the branch `swarm/<id>`. Use the normal git tooling to inspect, merge, or rebase (`git worktree list`, `git log swarm/<id>`, etc.). `/swarm remove` deletes both the worktree and the swarm bookkeeping.
**Where state lives** — everything per-agent (session file, events log, inbox socket, meta) lives under `$ZOT_HOME/swarm/agents/<id>/`. The agent's actual code edits land directly in your repo; track them with normal `git status` / `git diff`.
**`/session export` does NOT bundle subagents.** A `.zotsession` is just the main chat transcript; swarm worktrees and per-agent state are machine-local (a real git worktree on disk and a unix-socket inbox, neither of which round-trips through a JSONL file). To share a subagent's actual work, push its `swarm/<id>` branch with the normal git tooling. To share what it said, copy it out of the transcript view manually.
**`/session export` does NOT bundle subagents.** A `.zotsession` is just the main chat transcript; per-agent state (session file, unix-socket inbox) is machine-local and doesn't round-trip through a JSONL file. To share what an agent said, copy it out of the transcript view manually.
### `/skills`

View file

@ -45,7 +45,7 @@ var slashCatalog = []slashCommand{
{Name: "/jail", Desc: "confine tools to the current directory"},
{Name: "/unjail", Desc: "allow tools to touch paths outside this directory"},
{Name: "/skills", Desc: "list discovered skills (SKILL.md files)"},
{Name: "/swarm", Desc: "supervise background agents working in their own worktrees"},
{Name: "/swarm", Desc: "supervise background agents that share this working directory"},
{Name: "/reload-ext", Desc: "hot-reload all extensions (re-read manifests and respawn)"},
{Name: "/telegram", Desc: "connect, disconnect, or show status of the telegram bridge"},
{Name: "/clear", Desc: "clear the chat transcript"},

View file

@ -23,7 +23,7 @@ import (
// ↑/↓ move cursor
// enter show transcript tail for the selected agent
// k kill (Stop) the selected running agent
// r remove a terminated agent (deletes worktree)
// r remove a terminated agent (clears its state)
// esc / q close
//
// Keys (transcript view):
@ -44,8 +44,8 @@ type swarmDialog struct {
// Wired by Open(); when nil the inline 'p' shortcut is disabled.
send func(id, text string) error
// resume restarts a detached or terminated agent on its existing
// worktree/session. Wired by Open(); when nil the inline 'R'
// shortcut is disabled.
// session. Wired by Open(); when nil the inline 'R' shortcut is
// disabled.
resume func(id string) error
rows []swarm.AgentSnapshot
@ -240,7 +240,7 @@ func (d *swarmDialog) transcriptEditorCursorRow(width, popupRows, editorRowOffse
return -1
}
row := 1 // frame header
row += 4 // task / branch / dir / status
row += 3 // task / dir / status (mirrors renderTranscript's fixed header rows)
if a.Model != "" {
row++
}
@ -400,7 +400,7 @@ func promptDisabledHint(s swarm.Status) string {
// killDisabledHint mirrors promptDisabledHint for the 'k' shortcut.
// Kill only makes sense on running / pending agents; on detached and
// terminal ones it's a no-op and the user usually wants 'r' (remove)
// to clean up the worktree instead.
// to clear out the agent's state instead.
func killDisabledHint(s swarm.Status) string {
return "kill: agent is " + string(s) + "; nothing to stop (press r to remove)"
}
@ -1033,7 +1033,6 @@ func (d *swarmDialog) renderTranscript(th tui.Theme, width int) []string {
header := []string{
frameHeader(th, "swarm: "+a.ID+" (type to send, esc back)", width),
" " + th.FG256(th.Muted, "task: "+a.Task),
" " + th.FG256(th.Muted, "branch: "+a.Branch),
" " + th.FG256(th.Muted, "dir: "+a.Dir),
" " + th.FG256(th.Muted, fmt.Sprintf("status: %s, %s", a.Status, a.Activity)),
}
@ -1321,6 +1320,12 @@ func (d *swarmDialog) renderPromptEditor(th tui.Theme, width int, out []string)
}
// formatSwarmRow is the one-line summary shown per agent.
//
// Layout (fixed-width columns, then free-form activity):
//
// STATUS ID AGE ACTIVITY
// ● run fix-login-12345 3m editing main.go
// ✓ done write-tests-67890 1h done
func formatSwarmRow(r swarm.AgentSnapshot, maxWidth int) string {
status := statusLabel(r.Status)
age := formatAge(r.Started)

View file

@ -100,7 +100,7 @@ func TestSwarmDialogEnterShowsTranscript(t *testing.T) {
_ = d.Render(tui.Theme{}, 80)
d.HandleKey(tui.Key{Kind: tui.KeyEnter})
out := strings.Join(d.Render(tui.Theme{}, 80), "\n")
for _, want := range []string{"task:", "branch:", "line a", "line b"} {
for _, want := range []string{"task:", "dir:", "line a", "line b"} {
if !strings.Contains(out, want) {
t.Fatalf("transcript view missing %q:\n%s", want, out)
}

View file

@ -110,9 +110,9 @@ func (i *Interactive) runSwarm(ctx context.Context, args []string) {
return
}
if model != "" {
i.swarmStatus("spawned "+a.ID+" on "+a.Branch+" (model "+model+")", "")
i.swarmStatus("spawned "+a.ID+" (model "+model+")", "")
} else {
i.swarmStatus("spawned "+a.ID+" on "+a.Branch, "")
i.swarmStatus("spawned "+a.ID, "")
}
case "kill", "stop":
if rest == "" {

View file

@ -2,7 +2,6 @@ package modes
import (
"context"
"path/filepath"
"testing"
"time"
@ -19,7 +18,6 @@ func newInteractiveForSwarmTest(t *testing.T) (*Interactive, *swarm.Swarm) {
f := swarm.New(swarm.Config{
Root: root,
RepoRoot: root,
Worktree: swarm.MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *swarm.Agent) swarm.Runner {
return swarm.RunnerFunc(func(ctx context.Context, sink swarm.Sink) error {
<-ctx.Done()
@ -107,7 +105,6 @@ func TestRunSwarmSendDeliversToAgentInbox(t *testing.T) {
f := swarm.New(swarm.Config{
Root: root,
RepoRoot: root,
Worktree: swarm.MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *swarm.Agent) swarm.Runner {
return swarm.RunnerFunc(func(ctx context.Context, sink swarm.Sink) error {
// Stand up a real Listener on the agent's inbox path so

View file

@ -14,8 +14,7 @@ import (
type Agent struct {
ID string
Task string
Branch string
Dir string
Dir string // always the host's RepoRoot; agents share its cwd.
Started time.Time
// Model and Provider, when non-empty, override the child

View file

@ -33,10 +33,13 @@ import (
// the supervisor needs to rebuild an Agent after a restart live here.
// Adding a field is backwards-compatible (older meta.json files just
// leave it zero); removing or renaming one is not.
//
// Historical fields like `branch` and `isolated` are silently dropped
// by encoding/json's permissive decoder when an older meta.json is
// loaded; we don't need to keep them in the struct.
type agentMeta struct {
ID string `json:"id"`
Task string `json:"task"`
Branch string `json:"branch"`
Dir string `json:"dir"`
Started time.Time `json:"started"`
Model string `json:"model,omitempty"`
@ -51,8 +54,7 @@ type agentMeta struct {
// (and agents spawned outside of any session, e.g. by tests or
// scripted callers that didn't call SetActiveSession) have an
// empty SessionID and are visible from every session as a
// backward-compat fallback. Added in 2026 — a fresh field on a
// json struct is backwards-compatible by design.
// backward-compat fallback.
SessionID string `json:"session_id,omitempty"`
}
@ -65,7 +67,6 @@ func writeAgentMeta(stateDir string, a *Agent) error {
m := agentMeta{
ID: a.ID,
Task: a.Task,
Branch: a.Branch,
Dir: a.Dir,
Started: a.Started,
Model: a.Model,
@ -187,11 +188,20 @@ func (f *Swarm) Reload() (loaded int, errs []error) {
// The returned Agent has a closed `done` channel because Wait should
// return instantly: there is nothing to wait for.
func (f *Swarm) buildDetachedAgent(m agentMeta) *Agent {
// Older meta.json files may still record a per-agent worktree
// path under Dir. They predate the decision to run every agent
// in the host's repo and shouldn't continue editing that stale
// checkout, which most likely no longer matches HEAD. Coerce
// the dir back to the live RepoRoot so resume picks up where
// the host is now.
dir := m.Dir
if f.cfg.RepoRoot != "" {
dir = f.cfg.RepoRoot
}
a := &Agent{
ID: m.ID,
Task: m.Task,
Branch: m.Branch,
Dir: m.Dir,
Dir: dir,
Started: m.Started,
Model: m.Model,
Provider: m.Provider,
@ -334,7 +344,7 @@ func (f *Swarm) Resume(ctx context.Context, id string) (*Agent, error) {
// (e.g. tests that hand-built an Agent) don't accidentally route
// the new runner at the wrong paths.
m := agentMeta{
ID: existing.ID, Task: existing.Task, Branch: existing.Branch,
ID: existing.ID, Task: existing.Task,
Dir: existing.Dir, Started: existing.Started,
Model: existing.Model, Provider: existing.Provider,
InboxPath: existing.InboxPath, EventLogPath: existing.EventLogPath,
@ -344,7 +354,6 @@ func (f *Swarm) Resume(ctx context.Context, id string) (*Agent, error) {
a := &Agent{
ID: m.ID,
Task: m.Task,
Branch: m.Branch,
Dir: m.Dir,
Started: m.Started,
Model: m.Model,

View file

@ -21,7 +21,6 @@ func TestSpawnWritesMetaJSON(t *testing.T) {
f := New(Config{
Root: root,
RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error {
<-ctx.Done()
@ -51,7 +50,7 @@ func TestSpawnWritesMetaJSON(t *testing.T) {
if got.Task != "investigate widget" {
t.Errorf("meta.Task = %q", got.Task)
}
if got.Branch != a.Branch || got.Dir != a.Dir {
if got.Dir != a.Dir {
t.Errorf("meta paths drifted: %+v vs agent %+v", got, a)
}
if got.InboxPath == "" || got.EventLogPath == "" || got.SessionPath == "" {
@ -74,7 +73,6 @@ func TestReloadRebuildsDetachedAgents(t *testing.T) {
first := New(Config{
Root: root,
RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error {
<-ctx.Done()
@ -101,7 +99,6 @@ func TestReloadRebuildsDetachedAgents(t *testing.T) {
second := New(Config{
Root: root,
RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
})
loaded, errs := second.Reload()
if len(errs) > 0 {
@ -140,7 +137,6 @@ func TestReloadIsIdempotent(t *testing.T) {
root := t.TempDir()
first := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -150,7 +146,7 @@ func TestReloadIsIdempotent(t *testing.T) {
}
first.StopAll()
second := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
second := New(Config{Root: root, RepoRoot: root})
loaded1, _ := second.Reload()
loaded2, errs := second.Reload()
if len(errs) > 0 {
@ -177,8 +173,8 @@ func TestReloadReplaysTranscriptFromEventLog(t *testing.T) {
}
// meta.json
m := agentMeta{
ID: id, Task: "do thing", Branch: "swarm/alpha-9",
Dir: filepath.Join(root, "wt", id), Started: time.Now().Add(-time.Hour),
ID: id, Task: "do thing",
Dir: root, Started: time.Now().Add(-time.Hour),
InboxPath: filepath.Join(stateDir, "in.sock"),
EventLogPath: filepath.Join(stateDir, "events.jsonl"),
SessionPath: filepath.Join(stateDir, "session.json"),
@ -198,7 +194,7 @@ func TestReloadReplaysTranscriptFromEventLog(t *testing.T) {
_ = log.Append(NewEvent("agent_stopped", map[string]any{"reason": "shutdown"}))
_ = log.Close()
f := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
f := New(Config{Root: root, RepoRoot: root})
loaded, errs := f.Reload()
if len(errs) > 0 || loaded != 1 {
t.Fatalf("reload loaded=%d errs=%v", loaded, errs)
@ -246,11 +242,11 @@ func TestReloadSkipsBareDirsAndCorruptMeta(t *testing.T) {
good := "good-1"
stateDir := filepath.Join(agentsDir, good)
_ = os.MkdirAll(stateDir, 0o755)
m := agentMeta{ID: good, Task: "x", Branch: "swarm/" + good, Dir: "/tmp/x", Started: time.Now()}
m := agentMeta{ID: good, Task: "x", Dir: "/tmp/x", Started: time.Now()}
mb, _ := json.MarshalIndent(m, "", " ")
_ = os.WriteFile(filepath.Join(stateDir, "meta.json"), mb, 0o644)
f := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
f := New(Config{Root: root, RepoRoot: root})
loaded, errs := f.Reload()
if loaded != 1 {
t.Errorf("loaded = %d; want 1", loaded)
@ -276,7 +272,6 @@ func TestResumeRestartsRunnerOnSameSession(t *testing.T) {
)
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, sink Sink) error {
mu.Lock()
@ -358,7 +353,6 @@ func TestResumeSetsResumingFlag(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -392,7 +386,6 @@ func TestResumeRejectsRunningAgent(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -430,7 +423,6 @@ func TestResumeAfterReload(t *testing.T) {
// Process A
a := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(ag *Agent) Runner {
return RunnerFunc(func(ctx context.Context, sink Sink) error {
sink.Transcript("first run for " + ag.ID)
@ -456,7 +448,6 @@ func TestResumeAfterReload(t *testing.T) {
resumed := make(chan struct{}, 1)
b := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(ag *Agent) Runner {
return RunnerFunc(func(ctx context.Context, sink Sink) error {
sink.Transcript("second run for " + ag.ID)
@ -507,7 +498,6 @@ func TestSpawnReqPersistsModel(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -547,7 +537,7 @@ func TestSpawnReqPersistsModel(t *testing.T) {
// Reload in a fresh Swarm and confirm the detached agent still
// carries the model/provider so Resume can route the child
// subprocess back to the same model.
g := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
g := New(Config{Root: root, RepoRoot: root})
if loaded, errs := g.Reload(); loaded != 1 || len(errs) > 0 {
t.Fatalf("reload loaded=%d errs=%v", loaded, errs)
}
@ -605,8 +595,8 @@ func TestStopOnDetachedAgentIsNoopAndDoesNotPanic(t *testing.T) {
t.Fatal(err)
}
m := agentMeta{
ID: id, Task: "t", Branch: "swarm/" + id,
Dir: filepath.Join(root, "wt", id),
ID: id, Task: "t",
Dir: root,
Started: time.Now().Add(-time.Hour),
InboxPath: filepath.Join(stateDir, "in.sock"),
EventLogPath: filepath.Join(stateDir, "events.jsonl"),
@ -617,7 +607,7 @@ func TestStopOnDetachedAgentIsNoopAndDoesNotPanic(t *testing.T) {
t.Fatal(err)
}
f := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
f := New(Config{Root: root, RepoRoot: root})
if loaded, errs := f.Reload(); loaded != 1 || len(errs) > 0 {
t.Fatalf("reload loaded=%d errs=%v", loaded, errs)
}
@ -654,7 +644,6 @@ func TestRemoveAlsoCleansStateDir(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -679,7 +668,7 @@ func TestRemoveAlsoCleansStateDir(t *testing.T) {
}
// A fresh Swarm + Reload should find nothing.
g := New(Config{Root: root, RepoRoot: root, Worktree: MemWorktree(filepath.Join(root, "wt"))})
g := New(Config{Root: root, RepoRoot: root})
if loaded, _ := g.Reload(); loaded != 0 {
t.Fatalf("reload after remove loaded=%d; want 0", loaded)
}
@ -694,7 +683,6 @@ func TestActiveSessionScopesSnapshotAll(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -759,7 +747,6 @@ func TestSessionIDPersistsAcrossReload(t *testing.T) {
mkSwarm := func() *Swarm {
return New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},
@ -803,7 +790,6 @@ func TestEmptySessionIDIsVisibleFromAnyScope(t *testing.T) {
root := t.TempDir()
f := New(Config{
Root: root, RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, _ Sink) error { <-ctx.Done(); return ctx.Err() })
},

View file

@ -14,7 +14,8 @@ import (
)
// execRunner spawns `zot --swarm-agent <inbox> --session <path>` in
// the agent's worktree and consumes its JSONL event stream on stdout.
// the host's working directory (Agent.Dir, which is always the parent
// zot's RepoRoot) and consumes its JSONL event stream on stdout.
//
// Why a long-lived daemon and not `zot --print`: the supervisor and
// the user expect agents to keep accepting follow-up prompts. A
@ -40,9 +41,14 @@ type execRunner struct {
// tested without a real child. Production code leaves it nil.
Command []string
// SessionPath is the agent's session file. When empty the
// runner derives it as <Dir>/.zot/session.json so each agent
// owns its own session inside its worktree.
// SessionPath is the agent's session file. Empty means "defer
// to r.agent.SessionPath", which Swarm.Spawn always populates
// with <swarm-root>/agents/<id>/session.json. Tests that
// hand-build an Agent without going through Spawn must set
// one of the two; the runner refuses to invent a fallback
// because the only plausible one (<Dir>/.zot/session.json)
// would litter the user's repo — every agent's Dir points
// at it directly.
SessionPath string
}
@ -116,9 +122,27 @@ func swarmAgentArgs(opts swarmAgentArgsOpts) []string {
}
func (r *execRunner) Run(ctx context.Context, sink Sink) error {
// SessionPath resolution order:
// 1. explicit r.SessionPath set by the test / caller
// 2. r.agent.SessionPath baked in by Swarm.Spawn — the
// production path. Always lives under
// <swarm-root>/agents/<id>/session.json so the per-
// agent state is entirely outside the working tree.
// Crucial because Agent.Dir points at the user's repo;
// any .zot/ scratch directory under Dir would litter
// their source tree.
//
// There is no third fallback. If neither path is set we
// refuse to start instead of inventing a directory; that
// way a misconfigured caller fails loudly the first time
// instead of silently dumping session data into someone's
// repo.
sessionPath := r.SessionPath
if sessionPath == "" {
sessionPath = filepath.Join(r.agent.Dir, ".zot", "session.json")
sessionPath = r.agent.SessionPath
}
if sessionPath == "" {
return fmt.Errorf("swarm: agent missing session path (set SpawnRequest via Swarm.SpawnReq, or hand-build Agent with SessionPath populated)")
}
if err := os.MkdirAll(filepath.Dir(sessionPath), 0o755); err != nil {
return fmt.Errorf("session dir: %w", err)
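
The resolution order in the Run hunk above can be sketched in isolation. This is a minimal illustration, not the runner's actual code: `resolveSessionPath` is a hypothetical helper name, and the error text is shortened. It assumes only what the diff comments state: an explicit runner path wins, the agent's path is next, and there is no third fallback.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// resolveSessionPath is a hypothetical helper (not part of the commit) that
// mirrors the two-step order described in the diff:
//  1. the explicit path set on the runner, if any
//  2. the path stored on the agent by Spawn
//
// If both are empty it returns an error instead of inventing a directory.
func resolveSessionPath(explicit, agentPath string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	if agentPath != "" {
		return agentPath, nil
	}
	return "", errors.New("swarm: agent missing session path")
}

func main() {
	// Assumed layout from the diff: <swarm-root>/agents/<id>/session.json.
	spawned := filepath.Join("swarm-root", "agents", "demo-id", "session.json")

	p, err := resolveSessionPath("", spawned)
	fmt.Println(p, err)

	// A hand-built agent with neither path set fails loudly.
	_, err = resolveSessionPath("", "")
	fmt.Println(err)
}
```

Returning an error here, rather than defaulting to a path under the agent's Dir, is what keeps a misconfigured caller from writing session data into the user's repo.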

@ -46,7 +46,6 @@ func TestRunnerEndToEndWithStubChild(t *testing.T) {
f := New(Config{
Root: root,
RepoRoot: repo,
Worktree: MemWorktree(filepath.Join(root, "wt")),
NewRunner: func(a *Agent) Runner {
return &execRunner{
agent: a,

@ -1,20 +1,26 @@
// Package swarm implements zot's multi-agent supervisor.
//
// A Swarm manages a set of headless zot subprocesses ("agents")
// working in their own git worktrees. The interactive TUI exposes
// the supervisor through the /swarm slash command and a dashboard
// dialog; non-TUI code can drive it directly through this package.
// that share the host's working directory. The interactive TUI
// exposes the supervisor through the /swarm slash command and a
// dashboard dialog; non-TUI code can drive it directly through
// this package.
//
// Every agent runs with cwd == the parent zot's RepoRoot — the
// same files the user sees, the same files the main agent edits.
// There is no git worktree, no per-agent branch, no isolation. If
// you want parallel edits on a separate branch, use normal git
// tooling (a real worktree, a different terminal) yourself.
//
// Each Agent has:
// - a unique id (short slug + nanoseconds)
// - a branch name on a fresh git worktree under <root>/swarm/<id>
// - a Runner (the thing that actually executes the task)
// - a Status string + Activity string that the dashboard reads
//
// The Runner abstraction means tests can swap a fake in instead of
// really spawning a subprocess; the production Runner shells out to
// `zot --print` so we reuse zot's own model resolution and tooling
// without re-implementing the agent loop.
// `zot --swarm-agent ...` so we reuse zot's own model resolution
// and tooling without re-implementing the agent loop.
package swarm
import (
@ -43,32 +49,24 @@ const (
// Config configures a Swarm.
type Config struct {
// Root is the directory under which worktrees + state files live.
// Root is the directory under which per-agent state files live.
// Typically <ZotHome>/swarm, but tests pass a tempdir.
Root string
// RepoRoot is the path to the user's git repo (CWD of the parent
// zot). New worktrees branch off this repo.
// RepoRoot is the working directory every spawned agent runs
// in — the same cwd the parent zot is using. There is no
// per-agent isolation: agents edit the host's files directly.
RepoRoot string
// Worktree creates the per-agent working directory. If nil, the
// default git-worktree implementation is used.
Worktree WorktreeManager
// NewRunner produces the Runner for an Agent. If nil, the default
// `zot --print` exec runner is used. Tests inject a fake here.
// `zot --swarm-agent ...` exec runner is used. Tests inject a fake
// here.
NewRunner func(a *Agent) Runner
// Now is a clock seam for tests; defaults to time.Now.
Now func() time.Time
}
// WorktreeManager creates and removes per-agent working directories.
type WorktreeManager interface {
Create(id, branch, base string) (dir string, err error)
Remove(id, dir string) error
}
// Runner executes one agent task. Run blocks until the task finishes,
// is cancelled via ctx, or hits an unrecoverable error.
//
@ -118,9 +116,6 @@ func New(cfg Config) *Swarm {
if cfg.Now == nil {
cfg.Now = time.Now
}
if cfg.Worktree == nil {
cfg.Worktree = &gitWorktree{root: filepath.Join(cfg.Root, "worktrees"), repo: cfg.RepoRoot}
}
if cfg.NewRunner == nil {
cfg.NewRunner = func(a *Agent) Runner { return &execRunner{agent: a} }
}
@ -158,7 +153,7 @@ func (f *Swarm) ActiveSession() string {
// events.jsonl durable event log (runner-owned)
// in.sock unix socket inbox (child-owned)
// session.json persistent agent session (child-owned)
// meta.json static metadata (id, task, branch, dir)
// meta.json static metadata (id, task)
func (f *Swarm) agentStateDir(id string) string {
return filepath.Join(f.cfg.Root, "agents", id)
}
@ -173,13 +168,13 @@ type SpawnRequest struct {
Provider string // optional override; usually paired with Model
}
// Spawn creates a new Agent for the given task, sets up its worktree,
// allocates the on-disk state directory (events log, inbox socket
// path, session file path), and starts the Runner on a background
// goroutine. The returned Agent is already in StatusRunning (or
// StatusFailed if worktree setup failed before the goroutine
// started). This is the historical signature; callers that want to
// override the child's model use SpawnReq instead.
// Spawn creates a new Agent for the given task, allocates its
// on-disk state directory (events log, inbox socket path, session
// file path), and starts the Runner on a background goroutine. The
// returned Agent is already in StatusRunning (or StatusFailed if
// state setup failed before the goroutine started). This is the
// historical signature; callers that want to override the child's
// model use SpawnReq instead.
func (f *Swarm) Spawn(ctx context.Context, task string) (*Agent, error) {
return f.SpawnReq(ctx, SpawnRequest{Task: task})
}
@ -187,17 +182,17 @@ func (f *Swarm) Spawn(ctx context.Context, task string) (*Agent, error) {
// SpawnReq is the full-fat variant of Spawn that accepts a
// SpawnRequest. Existing callers can keep using Spawn; new code that
// wants to pin the child's model uses this.
//
// Every spawned agent runs with cwd == cfg.RepoRoot — the same
// working directory as the host. No per-agent worktree, no branch,
// no isolation. The user explicitly opted out of the worktree flow.
func (f *Swarm) SpawnReq(ctx context.Context, req SpawnRequest) (*Agent, error) {
task := strings.TrimSpace(req.Task)
if task == "" {
return nil, errors.New("swarm: empty task")
}
id := newAgentID(task, f.cfg.Now())
branch := "swarm/" + id
dir, err := f.cfg.Worktree.Create(id, branch, f.cfg.RepoRoot)
if err != nil {
return nil, fmt.Errorf("worktree create: %w", err)
}
dir := f.cfg.RepoRoot
stateDir := f.agentStateDir(id)
if err := os.MkdirAll(stateDir, 0o755); err != nil {
@ -226,7 +221,6 @@ func (f *Swarm) SpawnReq(ctx context.Context, req SpawnRequest) (*Agent, error)
a := &Agent{
ID: id,
Task: task,
Branch: branch,
Dir: dir,
Started: f.cfg.Now(),
Model: strings.TrimSpace(req.Model),
@ -385,11 +379,15 @@ func (f *Swarm) StopAll() {
}
}
// Remove tears down the worktree for a terminated agent. It is an
// error to remove an agent that's still running; call Stop first and
// wait for the status to settle. Detached agents (reloaded from
// disk) remove cleanly because they have no live runner racing for
// the same files.
// Remove tears down the per-agent state for a terminated agent. It
// is an error to remove an agent that's still running; call Stop
// first and wait for the status to settle. Detached agents
// (reloaded from disk) remove cleanly because they have no live
// runner racing for the same files.
//
// Agents share the host's working tree, so Remove never touches
// any source file — it only deletes the agent's state directory
// under <root>/agents/<id>/.
func (f *Swarm) Remove(id string) error {
a := f.Get(id)
if a == nil {
@ -397,18 +395,14 @@ func (f *Swarm) Remove(id string) error {
}
a.mu.Lock()
st := a.status
dir := a.Dir
a.mu.Unlock()
if st == StatusRunning || st == StatusPending {
return fmt.Errorf("agent %s still %s", a.ID, st)
}
if err := f.cfg.Worktree.Remove(a.ID, dir); err != nil {
return err
}
// Best-effort cleanup of the per-agent state directory (meta.json,
// events.jsonl, session.json, in.sock if it's local). Worktree
// removal already succeeded; failing here would leave the user
// with no recourse, so swallow the error.
// Best-effort cleanup of the per-agent state directory
// (meta.json, events.jsonl, session.json, in.sock if it's
// local). Failing here would leave the user with no recourse,
// so swallow the error.
_ = os.RemoveAll(f.agentStateDir(a.ID))
f.mu.Lock()
delete(f.agents, a.ID)
@ -427,7 +421,6 @@ func (f *Swarm) Remove(id string) error {
type AgentSnapshot struct {
ID string
Task string
Branch string
Dir string
Status Status
Activity string
@ -465,7 +458,7 @@ func (a *Agent) Snapshot() AgentSnapshot {
errStr = a.lastErr.Error()
}
return AgentSnapshot{
ID: a.ID, Task: a.Task, Branch: a.Branch, Dir: a.Dir,
ID: a.ID, Task: a.Task, Dir: a.Dir,
Status: a.status, Activity: a.activity,
Started: a.Started, Finished: a.finished,
Err: errStr, Tail: tail, Lines: lines,
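
The claim that Remove never touches source files follows from where the per-agent state lives. The sketch below is an illustration under assumptions, not the package's code: `agentStateDir` here is a free function taking the root as a parameter (the real one is a method on Swarm), and `removeAgentState` is a hypothetical stand-in for the cleanup step inside Remove.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// agentStateDir mirrors the layout in the diff: <root>/agents/<id>.
// Free-function form is an assumption made for this sketch.
func agentStateDir(root, id string) string {
	return filepath.Join(root, "agents", id)
}

// removeAgentState is a hypothetical stand-in for the cleanup in Remove:
// it deletes only the agent's state directory, never anything in the repo.
func removeAgentState(root, id string) error {
	return os.RemoveAll(agentStateDir(root, id))
}

func main() {
	root, err := os.MkdirTemp("", "swarm-root")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	repo, err := os.MkdirTemp("", "repo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(repo)

	// A source file in the shared working tree.
	src := filepath.Join(repo, "main.go")
	if err := os.WriteFile(src, []byte("package main\n"), 0o644); err != nil {
		panic(err)
	}

	// Per-agent state lives under the swarm root, outside the repo.
	state := agentStateDir(root, "demo-id")
	if err := os.MkdirAll(state, 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(state, "meta.json"), []byte("{}"), 0o644); err != nil {
		panic(err)
	}

	if err := removeAgentState(root, "demo-id"); err != nil {
		panic(err)
	}

	_, stateErr := os.Stat(state)
	_, srcErr := os.Stat(src)
	fmt.Println("state removed:", os.IsNotExist(stateErr))
	fmt.Println("source kept:", srcErr == nil)
}
```

Because the state directory is derived only from the swarm root and the agent id, the agent's Dir (the user's repo) never enters the removal path.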

@ -3,7 +3,6 @@ package swarm
import (
"context"
"errors"
"path/filepath"
"strings"
"sync"
"testing"
@ -11,16 +10,15 @@ import (
)
// newTestSwarm builds a Swarm rooted in t.TempDir with the in-memory
// worktree and a Runner factory the test controls. Returns the swarm
// plus a slice of runners keyed by spawn order so tests can assert
// they were actually invoked.
// and a Runner factory the test controls. Returns the swarm plus a
// slice of runners keyed by spawn order so tests can assert they
// were actually invoked.
func newTestSwarm(t *testing.T, mk func(a *Agent) Runner) *Swarm {
t.Helper()
root := t.TempDir()
return New(Config{
Root: root,
RepoRoot: root,
Worktree: MemWorktree(filepath.Join(root, "worktrees")),
NewRunner: mk,
})
}
@ -58,8 +56,25 @@ func TestSpawnRunsAndCompletes(t *testing.T) {
if !strings.Contains(a.ID, "do-a-thing") {
t.Fatalf("id %q missing slug", a.ID)
}
if a.Branch != "swarm/"+a.ID {
t.Fatalf("branch = %q", a.Branch)
// Every agent shares the host's RepoRoot.
if a.Dir != f.cfg.RepoRoot {
t.Fatalf("dir = %q; want repo root %q", a.Dir, f.cfg.RepoRoot)
}
}
// TestSpawnAgentSharesRepoRoot verifies the only-mode-we-support:
// every spawned agent points its cwd at the parent zot's RepoRoot.
func TestSpawnAgentSharesRepoRoot(t *testing.T) {
f := newTestSwarm(t, func(a *Agent) Runner {
return RunnerFunc(func(ctx context.Context, sink Sink) error { return nil })
})
a, err := f.Spawn(context.Background(), "share me")
if err != nil {
t.Fatal(err)
}
a.Wait()
if a.Dir != f.cfg.RepoRoot {
t.Fatalf("Dir = %q; want RepoRoot %q", a.Dir, f.cfg.RepoRoot)
}
}

@ -1,110 +0,0 @@
package swarm
import (
"errors"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
)
// gitWorktree creates per-agent working directories via `git worktree
// add`. If the repo path isn't a git repository it falls back to a
// plain mkdir under root/<id>, which is still useful for tests and
// for running agents in non-git directories.
type gitWorktree struct {
root string // <swarm-root>/worktrees
repo string // user CWD; git operations resolve from here
}
// Create makes a new worktree on branch off of HEAD. The branch is
// created if it doesn't already exist. Returns the absolute path.
func (g *gitWorktree) Create(id, branch, base string) (string, error) {
dir := filepath.Join(g.root, id)
if err := os.MkdirAll(g.root, 0o755); err != nil {
return "", err
}
if !isGitRepo(g.repo) {
// Non-git fallback: just a fresh directory. The agent works
// here; nothing is staged for merge.
if err := os.MkdirAll(dir, 0o755); err != nil {
return "", err
}
return dir, nil
}
cmd := exec.Command("git", "worktree", "add", "-b", branch, dir)
cmd.Dir = g.repo
out, err := cmd.CombinedOutput()
if err != nil {
// If the branch already exists (e.g. a leftover from a
// previous run), retry without -b so the user doesn't need
// to clean up by hand.
if strings.Contains(string(out), "already exists") {
cmd = exec.Command("git", "worktree", "add", dir, branch)
cmd.Dir = g.repo
out, err = cmd.CombinedOutput()
}
}
if err != nil {
return "", fmt.Errorf("git worktree add: %s: %w", strings.TrimSpace(string(out)), err)
}
return dir, nil
}
// Remove deletes the worktree. Uses `git worktree remove --force` so
// dirty trees don't block cleanup; the user is the one running this
// command explicitly.
func (g *gitWorktree) Remove(id, dir string) error {
if dir == "" {
return errors.New("empty worktree dir")
}
if !isGitRepo(g.repo) {
return os.RemoveAll(dir)
}
cmd := exec.Command("git", "worktree", "remove", "--force", dir)
cmd.Dir = g.repo
out, err := cmd.CombinedOutput()
if err != nil {
// `git worktree remove` refuses to operate on an unknown path;
// in that case fall back to a plain rmdir so the user can
// always clean up.
if strings.Contains(string(out), "not a working tree") {
return os.RemoveAll(dir)
}
return fmt.Errorf("git worktree remove: %s: %w", strings.TrimSpace(string(out)), err)
}
return nil
}
func isGitRepo(dir string) bool {
if dir == "" {
return false
}
cmd := exec.Command("git", "-C", dir, "rev-parse", "--git-dir")
return cmd.Run() == nil
}
// memWorktree is a WorktreeManager used by tests. It does no git
// operations; it just makes a fresh subdirectory under root.
type memWorktree struct{ root string }
// MemWorktree returns a WorktreeManager that creates plain
// subdirectories. Exposed for tests and for callers running zot
// outside a git repository.
func MemWorktree(root string) WorktreeManager { return &memWorktree{root: root} }
func (m *memWorktree) Create(id, branch, base string) (string, error) {
dir := filepath.Join(m.root, id)
if err := os.MkdirAll(dir, 0o755); err != nil {
return "", err
}
return dir, nil
}
func (m *memWorktree) Remove(id, dir string) error {
if dir == "" {
return errors.New("empty worktree dir")
}
return os.RemoveAll(dir)
}