The status-bar was showing 2x the real cost. Anthropic's SSE stream
sends the full cumulative usage payload on both message_start AND
message_delta, and our code was summing them with += on each. Cache
tokens, the biggest cost component on multi-turn sessions, were
therefore counted twice on every single API call.
Fix: assign instead of accumulate within one Stream() invocation.
Cross-call accumulation still happens correctly in
core.CostTracker.Add(). Verified end-to-end: a truly fresh "read
sample.ts on desktop" session that used to report $0.15 now reports
$0.07 with the same cache-hit rate.
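A minimal sketch of the assign-versus-accumulate distinction described above. The Usage, event, and Tracker types are hypothetical stand-ins, not the real code in internal/provider or core.CostTracker; the sketch assumes, as stated above, that every event carries the cumulative usage for the call.

```go
package main

import "fmt"

// Usage holds token counters for one API call.
// Field names are illustrative, not zot's actual types.
type Usage struct {
	Input, Output, CacheRead, CacheWrite int
}

// event is a hypothetical, already-decoded SSE event.
type event struct {
	Type  string // "message_start" or "message_delta"
	Usage Usage  // cumulative totals for this call so far
}

// streamUsage returns the usage for one call. Because each event carries
// cumulative totals, the latest value replaces the previous one.
func streamUsage(events []event) Usage {
	var u Usage
	for _, ev := range events {
		u = ev.Usage // assign, not +=
	}
	return u
}

// Tracker sums usage across calls, which is where accumulation belongs.
type Tracker struct{ Total Usage }

// Add folds the final usage of one call into the running total.
func (t *Tracker) Add(u Usage) {
	t.Total.Input += u.Input
	t.Total.Output += u.Output
	t.Total.CacheRead += u.CacheRead
	t.Total.CacheWrite += u.CacheWrite
}

func main() {
	call := []event{
		{Type: "message_start", Usage: Usage{Input: 12, CacheRead: 9000, Output: 1}},
		{Type: "message_delta", Usage: Usage{Input: 12, CacheRead: 9000, Output: 80}},
	}
	var tracker Tracker
	tracker.Add(streamUsage(call))
	fmt.Println("cache-read tokens billed once:", tracker.Total.CacheRead)
}
```

With += inside streamUsage, the cache-read count for this call would come out at 18000 instead of 9000, which is the doubling the status bar showed.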
While chasing that, audited and corrected the rest of the request
pipeline so the cache actually hits cleanly.
Provider layer (internal/provider/anthropic.go):
- cache_control on the Claude Code identity line (was uncached),
giving Anthropic a first stable checkpoint independent of the
user system prompt. Turns a cold start from R=0 into R>0 for
any subsequent fresh session within the cache TTL.
- tool_result blocks go in their OWN new user message instead of
merging into the preceding user message. Merging was mutating
the prior user message's content array between turns, busting
byte-identical prefix match in Anthropic's cache.
- tagLastUserCache: exactly one cache_control on the last user
message (was two), so identity + sysprompt + last-tool +
last-user fits Anthropic's 4-breakpoint budget exactly.
- user-agent dropped its "(external, cli)" suffix to match the
canonical Claude Code string exactly.
- ZOT_DEBUG_ANTHROPIC=<path> env hook appends each outgoing
request body (one JSON object per line) to that file. Off by
default; for debugging cache / cost issues in the field.
- Usage field handling now correctly assigns the latest value
from each SSE event instead of summing.
Core (internal/core/tool.go):
- Registry.Specs() now sorts tools alphabetically. Go map
iteration order is randomized per call; randomized tool arrays
were breaking Anthropic's byte-level prefix match on every
single call within a session.
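A rough sketch of the sorting fix, assuming a registry backed by a map. The Spec and Registry shapes here are hypothetical and much smaller than whatever internal/core/tool.go actually defines.

```go
package main

import (
	"fmt"
	"sort"
)

// Spec is a hypothetical tool description sent to the provider.
type Spec struct {
	Name        string
	Description string
}

// Registry keeps tools in a map; Go leaves map iteration order unspecified.
type Registry struct {
	tools map[string]Spec
}

// Specs returns the tools sorted by name so the serialized tool array is
// byte-identical from one request to the next.
func (r *Registry) Specs() []Spec {
	out := make([]Spec, 0, len(r.tools))
	for _, s := range r.tools {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	r := &Registry{tools: map[string]Spec{
		"write": {Name: "write"},
		"read":  {Name: "read"},
		"edit":  {Name: "edit"},
		"bash":  {Name: "bash"},
	}}
	for _, s := range r.Specs() {
		fmt.Println(s.Name)
	}
}
```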
System prompt (internal/agent/systemprompt.go):
- Restored a substantial default prompt with structured tools +
operating guidelines sections. The earlier aggressive trim
dropped us under Anthropic's 1024-token minimum cacheable
prefix floor: prefixes below 1024 tokens are silently NOT
cached by Anthropic, so every fresh session started cold with
R=0 no matter what else we did.
- Current default ~1040 tokens on its own; with identity and
tools it's ~1400, comfortably above the 1024 floor.
- --system-prompt, --append-system-prompt, and
$ZOT_HOME/SYSTEM.md escape hatches all still work and take
precedence.
Model catalog (internal/provider/models.go):
- claude-opus-4-5: 1M ctx / 128k max -> 200k ctx / 64k max. I had
over-extrapolated; 1M context is a 4.6+ feature.
- gpt-5.4: 400k -> 272k. Canonical value on both the OpenAI
direct API and the ChatGPT Codex OAuth backend.
- gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4-mini: pinned to 272k.
OpenAI advertises 400k on direct and Codex caps at 272k. zot
serves both from one catalog row per id, so we pin to the
smaller number to keep the context-usage meter honest under
subscription auth. Direct-API users see a conservative estimate
instead of an inflated one.
README:
- Tiny capitalization touch-up on the opening line.
zot
Yet another coding agent harness, lightweight and written (vibe-slopped) in go.
- one static binary.
- two providers atm (anthropic, openai/codex).
- four tools (read, write, edit, bash).
- three run modes (interactive tui, print, json).
- built-in telegram bot.
- extensions in any language via subprocess + json-rpc. None installed by default; opt in with zot ext install or zot --ext. See docs/extensions.md.
- reusable instructions via SKILL.md files; see docs/skills.md.
- no community atm.
Install
One-liner (macOS, Linux)
curl -fsSL https://raw.githubusercontent.com/patriceckhart/zot/main/install.sh | bash
Detects your OS and architecture, downloads the latest release from GitHub, verifies the SHA-256 against the release's checksums.txt, extracts the binary, and drops it in /usr/local/bin, ~/.local/bin, or ~/bin, whichever is writable first. Pass a version or prefix to pin:
curl -fsSL https://raw.githubusercontent.com/patriceckhart/zot/main/install.sh | bash -s -- v0.0.1 ~/bin
One-liner (Windows, PowerShell)
iwr -useb https://raw.githubusercontent.com/patriceckhart/zot/main/install.ps1 | iex
Drops zot.exe into $HOME\bin and adds it to the user PATH if missing. Open a fresh terminal afterwards.
Homebrew (macOS, Linux)
brew install patriceckhart/tap/zot
The tap lives at patriceckhart/homebrew-tap.
go install
go install github.com/patriceckhart/zot/cmd/zot@latest
From source
git clone https://github.com/patriceckhart/zot
cd zot
make build # produces ./bin/zot
make install # into $GOPATH/bin
Prebuilt binaries
Every release on the releases page ships archives for Linux, macOS, and Windows on amd64 and arm64 (except windows/arm64), plus a checksums.txt file. Download, verify, chmod +x, and drop on your $PATH.
Authenticate
The easiest way is to just run zot and type /login. The TUI opens even without credentials and walks you through a browser-based login flow.
Credential lookup order
1. --api-key flag
2. ANTHROPIC_API_KEY or OPENAI_API_KEY env var
3. $ZOT_HOME/auth.json (API key or OAuth token; mode 0600)
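The lookup order above can be sketched as a small resolver. This is an illustration only: resolveKey, its parameters, and the per-provider choice of env var are assumptions, not zot's internal API.

```go
package main

import (
	"fmt"
	"os"
)

// resolveKey walks the documented lookup order and reports which source won.
// flagKey is the --api-key value and stored is whatever auth.json held;
// both names are illustrative. Picking the env var by provider is an
// assumption of this sketch.
func resolveKey(provider, flagKey, stored string, getenv func(string) string) (key, source string) {
	if flagKey != "" {
		return flagKey, "flag"
	}
	envName := "ANTHROPIC_API_KEY"
	if provider == "openai" {
		envName = "OPENAI_API_KEY"
	}
	if v := getenv(envName); v != "" {
		return v, "env:" + envName
	}
	if stored != "" {
		return stored, "auth.json"
	}
	return "", "none"
}

func main() {
	_, source := resolveKey("anthropic", "", "stored-credential", os.Getenv)
	fmt.Println("credential source:", source)
}
```

Passing getenv as a function keeps the resolver testable without touching the real environment.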
$ZOT_HOME defaults to:
- macOS: ~/Library/Application Support/zot
- Linux: $XDG_STATE_HOME/zot or ~/.local/state/zot
- Windows: %LOCALAPPDATA%\zot
/login flow
Run zot and type /login. Pick one of two methods:
- API key: a small local web server starts on 127.0.0.1:<free-port>, your browser opens a form, you paste your sk-ant-... or sk-... key. zot probes the provider once and saves it to auth.json if accepted.
- Subscription: use your Claude Pro/Max or ChatGPT Plus/Pro subscription. The OAuth flow pins the callback to a fixed port per provider (localhost:53692 for Anthropic, localhost:1455 for OpenAI) because those are the only ports their auth servers will redirect to.
  - Anthropic uses the Claude Code OAuth flow. Messages go to api.anthropic.com with a bearer token and the Claude Code identity headers.
  - OpenAI uses the Codex CLI OAuth flow. Messages go to chatgpt.com/backend-api/codex/responses with the chatgpt-account-id extracted from the returned id_token.
Note on subscription login. The OAuth client IDs used are the ones published in Anthropic's Claude Code CLI and OpenAI's Codex CLI. Reusing them from a third-party tool is against their terms of service, and access may be revoked at any time. Use it at your own risk; the API-key flow is the safe default.
Token refresh
OAuth access tokens are short-lived (Anthropic ~8h, OpenAI ~30d). zot refreshes them automatically:
- At every credential lookup, zot checks the stored expiry and, if past it (with a 60s safety margin), hits the provider's oauth/token endpoint with the stored refresh_token, persists the new access_token, refresh_token, and expiry back to auth.json, and hands the fresh token to the client.
- The telegram bridge additionally refreshes once per turn so a bot that runs for days keeps working without manual intervention.
- If the refresh itself fails (the refresh_token was revoked, or the account was logged out everywhere), the error bubbles up to the caller: the TUI shows it in the status line, the bot replies with it in your DM. Run /login to get a fresh token pair.
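The expiry check in the first bullet could look roughly like this. It assumes the expiry is stored as Unix seconds and reads the 60s margin as "refresh slightly early"; both are assumptions of this sketch, and needsRefresh is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// refreshMarginSeconds is the safety margin from the text above.
const refreshMarginSeconds = 60

// needsRefresh reports whether a token expiring at expiryUnix should be
// refreshed at nowUnix. A token counts as expired once it is within the
// margin of its expiry, so a request never leaves with a token that is
// about to lapse.
func needsRefresh(expiryUnix, nowUnix int64) bool {
	return nowUnix >= expiryUnix-refreshMarginSeconds
}

func main() {
	now := time.Now().Unix()
	fmt.Println("expires in 30s:", needsRefresh(now+30, now))
	fmt.Println("expires in 8h: ", needsRefresh(now+8*3600, now))
}
```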
All data lives under $ZOT_HOME:
$ZOT_HOME/
├── config.json # last-used provider/model/theme, saved automatically
├── auth.json # api keys and oauth tokens (mode 0600)
├── sessions/ # jsonl transcripts, one dir per cwd
├── models-cache.json # live /v1/models discovery cache (6h ttl)
├── SYSTEM.md # optional: replaces the default system prompt
├── skills/ # optional: user SKILL.md files (opt in with --with-skills)
├── extensions/ # installed extensions, one dir per extension
└── logs/ # app log files
Drop a SYSTEM.md in $ZOT_HOME to replace the built-in identity and guidelines for every run. --system-prompt still wins per-invocation. Delete the file to revert to the default.
Changelog on update
The first time you launch a newer zot binary, the TUI shows the GitHub release notes once in a dismissible overlay. Press any key to close. The version is recorded in config.json's last_changelog_shown so the same release notes never reappear. Fresh installs don't see a changelog (no upgrade has happened yet). The fetch is best-effort: a network failure or a missing release page silently skips, with another attempt on the next launch.
Usage
zot # interactive tui
zot "fix the failing test" # tui, pre-filled prompt
zot -p "list all go files" # print final text, exit
zot --json "refactor main.go" # newline-delimited json events, exit
zot --continue # resume the most recent session for this cwd
zot --resume # pick a session to resume
zot --list-models # show supported models
zot --help
Flags
| Flag | Description |
|---|---|
| --provider anthropic\|openai | Pick the provider. |
| --model <id> | Pick the model (see --list-models). |
| --api-key <key> | Override the API key. |
| --base-url <url> | Override the provider base URL (tests, self-hosted). |
| --system-prompt <text> | Replace the default system prompt for this run (also overrides $ZOT_HOME/SYSTEM.md). |
| --append-system-prompt <text> | Append text to the system prompt (repeatable). |
| --reasoning low\|medium\|high | Enable reasoning on supported models. |
| -c, --continue | Resume the latest session for this cwd. |
| -r, --resume | Pick a session to resume. |
| --session <path> | Resume a specific session file. |
| --no-session | Don't read or write session files. |
| --cwd <path> | Use <path> as the working directory. |
| --no-tools | Disable all tools. |
| --tools <csv> | Only enable the listed tools. |
| --max-steps <n> | Cap agent loop iterations (default 50). |
| -e, --ext <path> | Load an extension from <path> for this run (repeatable; wins against installed extensions of the same name). |
| --no-ext | Skip extension discovery for this run. --ext still works on top, so --no-ext --ext ./x runs only x. |
| --with-skills | Also load user-installed skills. Without this, only the built-in skills shipped in the binary are loaded. |
| --no-skill | Disable all skills, including built-ins. No skill tool is registered and the system prompt has no skill manifest. |
Tools
- read: read text files, or inline images (PNG, JPEG, GIF, WebP).
- write: create or overwrite files, making parent directories as needed.
- edit: one or more exact-match replacements in an existing file.
- bash: run a shell command in the session cwd, with merged stdout/stderr and a timeout.
When the sandbox is on (see /lock), all four tools refuse paths outside the session cwd.
Modes
- Interactive (default): chat TUI with streaming output, spinner, cost meter, slash commands.
- Print: zot -p "prompt" runs the agent to completion and writes only the final assistant text to stdout.
- JSON: zot --json "prompt" emits one JSON object per agent event to stdout, newline-delimited. The schema is documented in docs/rpc.md.
- RPC: zot rpc runs as a long-lived child process; commands in on stdin, events and responses out on stdout, both as NDJSON. Designed for embedding zot in third-party apps written in any language. See docs/rpc.md for the wire schema and examples/rpc/{python,node,shell,go} for working clients.
Embedding
Two ways to drive zot from another program:
- Go in-process: import github.com/patriceckhart/zot/pkg/zotcore. One Runtime per project; Prompt(ctx, text, images) returns a channel of Event. Small example in examples/sdk/.
- Any language, out-of-process: spawn zot rpc as a subprocess and exchange newline-delimited JSON over its stdin/stdout. Wire format and event schema in docs/rpc.md. Reference clients live under examples/rpc/.
Both interfaces share the same event schema, so transcripts captured by one can be replayed through the other.
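For the out-of-process route, the consuming side boils down to reading one JSON object per line. The sketch below decodes NDJSON into generic maps and deliberately avoids guessing the event schema; the field names in the sample are made up for the demo, and docs/rpc.md remains the source of truth. In a real client the reader would be the stdout pipe of the zot subprocess.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeEvents reads newline-delimited JSON and returns one generic map per
// non-empty line. It stops at the first malformed line.
func decodeEvents(r io.Reader) ([]map[string]any, error) {
	var events []map[string]any
	sc := bufio.NewScanner(r)
	// Raise the per-line limit; tool output can exceed the 64 KiB default.
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev map[string]any
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return events, fmt.Errorf("bad event line: %w", err)
		}
		events = append(events, ev)
	}
	return events, sc.Err()
}

// decodeEventsString is a convenience wrapper for in-memory text.
func decodeEventsString(s string) ([]map[string]any, error) {
	return decodeEvents(strings.NewReader(s))
}

func main() {
	// Hypothetical event shapes, not zot's actual schema.
	sample := `{"type":"text","text":"hi"}
{"type":"done"}
`
	events, err := decodeEventsString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ev := range events {
		fmt.Println(ev["type"])
	}
}
```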
Slash commands
Type / in the TUI to open the autocomplete popup. Available commands:
| Command | Description |
|---|---|
| /help | Show key bindings and commands. |
| /login | Log in via API key or subscription (opens a dialog). |
| /logout [provider] | Clear credentials for anthropic, openai, or all when omitted. |
| /model | Pick a model from a list (or /model <id> to set directly). |
| /sessions | Resume a previous session for this directory. |
| /jump | Scroll the chat to a previous turn (or /jump <text> to filter). |
| /btw | Side chat with full context that doesn't add to the main thread. |
| /skills | List discovered skills (SKILL.md files) and preview their bodies. |
| /compact | Summarize the transcript into one message to free up context. |
| /lock | Confine tools to the current directory. |
| /unlock | Allow tools to touch paths outside again. |
| /reload-ext | Hot-reload all extensions (re-read manifests, respawn subprocesses, rebuild tool registry). |
| /clear | Clear the chat transcript. |
| /exit | Exit zot. |
Extension-registered commands appear under a divider at the bottom of the popup, sorted by name.
/sessions
Shows previous sessions for the current working directory, newest first, with timestamp, model, message count, cost, and the first user prompt. Pick one with up/down, enter to resume, esc to cancel. zot swaps the current session file for the selected one and replays the full transcript (including tool calls) into the agent. Sessions remember the model they ended on, so resuming picks up on that exact model even if your global default changed.
/jump
Opens a turn picker for the current session, one row per user prompt, each showing the turn number, how many tools that turn invoked, and the first line of the prompt. up/down to pick, enter to jump, esc to cancel. Any printable rune while the picker is open extends a filter; backspace narrows it back. /jump <text> pre-applies the filter; if exactly one turn matches, zot jumps straight there without showing the picker.
Jumping is non-destructive. The transcript is untouched, the viewport just scrolls so the chosen turn is at the top. A muted line at the top of the chat reads viewing turn N of M, pgdn to catch up. Scroll back to the bottom with pgdn (or keep scrolling with the arrow keys) and the indicator goes away.
/btw
Opens a side-chat overlay with the full main session as frozen context, so you can ask quick clarifying questions ("does asyncio.gather() catch exceptions?", "btw the bundle budget is 10MB", "what's the default fetch timeout?") without bloating the main thread.
Each question fires a one-off model call against system + main transcript + side-chat history so far. Responses render in the overlay and stay there. When you press esc to close, nothing has been added to the main session and subsequent main-thread turns don't re-read any of the side-chat exchanges, keeping the running context window lean.
/btw # open the overlay, type questions interactively
/btw does PUT replace the whole resource?
Inside the overlay: enter sends, esc cancels an in-flight call (or closes the overlay if idle), ctrl+c closes immediately. Side-chat exchanges never touch the transcript and aren't persisted to the session file.
/skills
Opens a picker listing every discovered SKILL.md file, built-ins hidden. Each row shows the skill name, source, and description. enter opens the body inline (scrollable with up/down/pgup/pgdn); esc goes back. Re-runs discovery each time it opens, so edits to a SKILL.md during a session are reflected immediately.
/compact
Sends the current transcript through the model with a structured summarization prompt. The returned summary replaces the transcript as one synthetic user message, with the last few exchanges kept verbatim for continuity. The status bar's context meter resets. Use it when the context meter creeps past ~80%.
zot also auto-compacts in the background: after any turn that leaves context usage at or above 85% of the model's window, the agent kicks off a condense pass on its own. You'll see condensing history, esc to cancel above the status bar and an (auto) tag next to the context percentage; esc aborts it without touching the transcript.
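The 85% trigger is a plain ratio check. A minimal sketch, with shouldAutoCompact as a hypothetical name and integer math to avoid float rounding at the boundary:

```go
package main

import "fmt"

// autoCompactPercent is the threshold from the text above.
const autoCompactPercent = 85

// shouldAutoCompact reports whether context usage after a turn is at or
// above the threshold for a model with the given context window.
func shouldAutoCompact(usedTokens, windowTokens int) bool {
	if windowTokens <= 0 {
		return false // unknown window: never trigger
	}
	return usedTokens*100 >= windowTokens*autoCompactPercent
}

func main() {
	fmt.Println(shouldAutoCompact(150_000, 200_000)) // 75% of the window
	fmt.Println(shouldAutoCompact(170_000, 200_000)) // exactly 85%
}
```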
/lock
Enforces a sandbox rooted at the cwd shown in the status bar. read, write, and edit resolve their target path (including through symlinks) and refuse anything outside the sandbox. bash refuses obvious escape patterns: sudo, rm -rf /, leading cd /, cd .., cd ~, chmod -R, dd of=/, and similar. The status bar shows locked, ~/your/cwd while active.
This is a guardrail against accidents, not a hard security boundary. If you need real isolation, run zot under docker or a proper sandbox.
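A minimal sketch of the containment check behind the path-based tools. It is purely lexical and assumes Unix-style paths; the symlink resolution mentioned above (for example via filepath.EvalSymlinks) is left out, and insideRoot is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideRoot reports whether target stays within root. Relative targets are
// resolved against root. The check is lexical only: it normalizes ".."
// segments but does not follow symlinks.
func insideRoot(root, target string) bool {
	if !filepath.IsAbs(target) {
		target = filepath.Join(root, target)
	}
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(target))
	if err != nil {
		return false
	}
	// A relative path that is ".." or starts with "../" escapes the root.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/work/app"
	for _, p := range []string{"src/main.go", "../secret", "/etc/passwd", "/work/app2/x"} {
		fmt.Printf("%-14s inside=%v\n", p, insideRoot(root, p))
	}
}
```

Comparing via filepath.Rel rather than a string prefix avoids treating /work/app2 as a child of /work/app.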
Sessions
Every interactive or print/json run (unless --no-session) writes a JSONL transcript under $ZOT_HOME/sessions/<cwd-hash>/. Resume any of them with --continue, --resume, --session <path>, or interactively via /sessions inside the TUI. Empty sessions (the user exited without prompting) are deleted on close so the list stays tidy.
Models
--list-models or the /model picker shows the full catalog. Three sources:
- Catalog: models baked into zot, always available.
- Live: IDs discovered from GET /v1/models using your stored API key (cached for 6h in $ZOT_HOME/models-cache.json, refreshed in the background on startup).
- Speculative: IDs that appear in the upstream generator but aren't live on the public API yet. They'll 404 today and start working the moment the provider ships them.
The context meter in the status line uses the model's advertised context window to show how much of it your last turn consumed.
Inline images
When a tool returns an image (for example read on a PNG), zot renders it inline on terminals that support it: iTerm2, WezTerm, Kitty, Ghostty. On other terminals you see a text placeholder with MIME type, pixel dimensions, and byte size. Control with the ZOT_INLINE_IMAGES env var:
| Value | Effect |
|---|---|
| unset (default) | Auto-detect based on TERM_PROGRAM. |
| iterm, iterm2 | Force the iTerm2 OSC 1337 protocol. |
| kitty | Force the Kitty graphics protocol. |
| off, none | Always use the text placeholder. |
Frames containing images are full-repainted (no differential diff) to prevent stale image pixels from lingering through scroll. That costs one terminal flash per image-containing frame; set ZOT_INLINE_IMAGES=off if that bothers you.
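The value table for ZOT_INLINE_IMAGES maps onto a small parser. In this sketch the imageMode type and its constants are hypothetical, and treating unrecognized or differently-cased values leniently is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// imageMode is a hypothetical enum for the rendering strategy.
type imageMode string

const (
	modeAuto  imageMode = "auto"  // detect from TERM_PROGRAM
	modeITerm imageMode = "iterm" // iTerm2 OSC 1337 protocol
	modeKitty imageMode = "kitty" // Kitty graphics protocol
	modeOff   imageMode = "off"   // text placeholder only
)

// parseInlineImages maps a ZOT_INLINE_IMAGES value to a mode. Unset and
// unrecognized values fall back to auto-detection (an assumption).
func parseInlineImages(v string) imageMode {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "iterm", "iterm2":
		return modeITerm
	case "kitty":
		return modeKitty
	case "off", "none":
		return modeOff
	default:
		return modeAuto
	}
}

func main() {
	fmt.Println(parseInlineImages(os.Getenv("ZOT_INLINE_IMAGES")))
}
```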
Queued messages
You can keep typing while the agent is working. Pressing enter during a turn queues the message instead of interrupting: it shows up above the status bar as sliding in: <text> and is delivered as the next user turn the moment the current one finishes. Queue as many as you want; they run in order. esc or ctrl+c cancels the active turn and drops the queue so a runaway turn doesn't flood you with stale follow-ups.
Slash commands also work while the agent is busy. Read-only ones (/help, /jump, /btw, /sessions, /skills, /lock, /unlock, /exit) take effect immediately. Destructive ones (/clear, /compact, /login, /logout, /model, /reload-ext) cancel the active turn first and then run.
Keys (interactive mode)
Input
| Key | Action |
|---|---|
| enter | Submit (queued if the agent is busy). |
| alt+enter | Newline. |
| tab | Complete the selected slash command. |
| esc | Cancel the current turn (while busy); clear input (while idle). |
| ctrl+c | Clear the input and queue (or cancel the current turn). Press again within 2s to exit. |
| ctrl+d | Exit on empty input. |
| ctrl+l | Redraw the screen. |
| ctrl+o | Expand or collapse long tool results (read, write, edit, bash outputs over ~12 lines). |
Editor line navigation
| Key | Action |
|---|---|
| ctrl+a, ctrl+e | Jump to start or end of line. |
| alt+left, alt+right | Jump one word back or forward. |
| ctrl+u, ctrl+k | Delete to start or end of line. |
| ctrl+w, alt+backspace | Delete the previous word. |
| up, down (editor non-empty) | Cycle through prompt history. |
Chat scroll
| Key | Action |
|---|---|
| pgup, pgdn | Scroll one page up or down. |
| up, down (editor empty) | Scroll three lines up or down. This is how the mouse wheel reaches the scroll logic on most terminals. |
Extensions
zot can be extended in any language via a subprocess + JSON-RPC protocol. Extensions can register slash commands, expose tools to the model, intercept tool calls (block or rewrite args), gate whole turns before the model is called, and rewrite the assistant's visible text before it reaches the user. None are installed by default; opt in explicitly. Hot-reload any time with /reload-ext.
Install and manage
zot ext install <path|git-url> # copy / clone into $ZOT_HOME/extensions/
zot ext list # show installed extensions
zot ext logs <name> [-f] # cat or tail the extension's stderr log
zot ext enable <name> # re-enable a disabled extension
zot ext disable <name> # disable without removing
zot ext remove <name> # delete an extension directory
For development, point zot --ext <path> at a working directory and skip the install step entirely. Repeatable; takes precedence over installed extensions of the same name.
Reference
examples/extensions/ ships reference implementations in Go, TypeScript, Node, and shell. See docs/extensions.md for the full protocol, the SDK API (pkg/zotext), and the phase roadmap.
Skills
A skill is a per-folder SKILL.md file with a YAML frontmatter header. zot discovers skills at startup, surfaces their names in the system prompt, and exposes a built-in skill tool the model uses to load the body on demand.
By default only the built-in skills shipped with the zot binary are loaded. Pass --with-skills to also load user-installed skills from:
- ./.zot/skills/<name>/SKILL.md (project)
- $ZOT_HOME/skills/<name>/SKILL.md (global)
- ./.claude/skills/<name>/SKILL.md, ~/.claude/skills/<name>/SKILL.md (Claude-compatible layout)
- ./.agents/skills/<name>/SKILL.md, ~/.agents/skills/<name>/SKILL.md (agent-compatible layout)
See docs/skills.md for the frontmatter fields, authoring tips, and example skills under examples/skills/.
Telegram bot (bridge)
zot can run as a telegram bot so you can DM it from your phone. It's a built-in subcommand, not a plugin:
zot telegram-bot setup # paste a BotFather token, verify, save
zot telegram-bot run # foreground: long-poll in this terminal (ctrl+c to stop)
zot telegram-bot start # background: detach and return immediately
zot telegram-bot stop # SIGTERM the background bot (SIGKILL after 5s)
zot telegram-bot logs -f # tail $ZOT_HOME/logs/bot.log (omit -f to just cat)
zot telegram-bot status # config (token masked) + running/stopped
zot telegram-bot reset # forget the token and paired user
# short alias: `zot tg ...` is accepted for every subcommand
The background flavor writes the child's PID to $ZOT_HOME/bot.pid and redirects stdout and stderr to $ZOT_HOME/logs/bot.log. zot telegram-bot stop reads that PID, sends SIGTERM, waits up to five seconds, then escalates to SIGKILL if the child is still alive. Running two instances at once is refused at startup.
Use the installed binary for start. go run ./cmd/zot telegram-bot start won't work. go run builds a binary in a temp directory and deletes it when it exits, which kills the detached child. Run make install (or go build) first and invoke the installed binary.
Setup flow:
- Talk to @BotFather on telegram, run /newbot, copy the token it gives you.
- Run zot telegram-bot setup and paste the token when prompted.
- Run zot telegram-bot run in the directory you want the agent to operate in.
- Open your bot on telegram, send /start. The first user to do this claims the bridge (stored as allowed_user_id); every other user is rejected.
From then on, any DM you send is forwarded to the agent as a user prompt. Attached photos or image/* documents are downloaded and passed to vision-capable models. In-bot telegram commands: /help, /status, /stop (cancel the current turn). Config lives in $ZOT_HOME/bot.json (mode 0600).
Bot mode respects the usual zot flags: --provider, --model, --cwd, --reasoning, --continue, --no-session, --no-tools, and so on. Run zot tg run -c --model claude-opus-4-1 to resume the latest session on Opus, for example.
Development
make build # build ./bin/zot
make test # go test -race ./...
make lint # go vet + gofmt check
make fmt # gofmt -w .
make release # cross-compile linux/darwin/windows on amd64 and arm64
Source layout:
cmd/zot/ main()
internal/agent/ cli wiring, arg parsing, system prompt, config
internal/agent/extensions/ extension subprocess manager
internal/agent/modes/ interactive tui, print, json, dialogs
internal/agent/tools/ read, write, edit, bash, sandbox
internal/auth/ credential store, api-key probe, oauth, login server
internal/core/ agent loop, sessions, cost tracking
internal/extproto/ extension wire-format types
internal/provider/ anthropic + openai streaming clients, model catalog
internal/skills/ skill discovery, frontmatter parser, skill tool
internal/tui/ terminal raw-mode, input parser, editor, renderer, markdown, view
pkg/zotcore/ public Go SDK for embedding zot in-process
pkg/zotext/ public Go SDK for writing extensions
License
MIT