The runtime escape hatch was redundant: --no-yolo's per-call dialog
already exposes 'yes-always-this-session' which flips ConfirmGate
into allow-all mode without a separate command. Once a session
starts with --no-yolo, the only way to disable confirmations is now
to either pick the always-this-session option in the dialog or exit
and relaunch.
- slash_suggest.go: drop the /yolo entry from the autocomplete list.
- interactive.go: remove the case "/yolo" dispatch (falls through
to 'unknown command') and delete the orphaned runYoloOn method.
- README: drop the /yolo row from the slash-command table and the
trailing reference in the --no-yolo flag description.
- internal/provider/gemini.go: REST client against
generativelanguage.googleapis.com/v1beta/models/{id}:streamGenerateContent
?alt=sse, mapping our message/tool format to Gemini's Content/Part schema
and translating SSE chunks into the existing assistant-message event
stream. Handles text, tool calls, thought-summary parts, and per-model
thinking config (thinkingBudget for 2.5, thinkingLevel for 3.x with
Gemini-3-Pro pinned to LOW minimum).
- internal/provider/discover.go: DiscoverGoogle pages /v1beta/models and
filters to chat-capable ids (skips embeddings, AQA).
- internal/provider/models.go: catalog entries for gemini-2.5-pro,
2.5-flash, 2.5-flash-lite, 2.0-flash, 2.0-flash-lite.
- internal/auth: 'google' is a recognized provider; API-key probe hits
/v1beta/models with x-goog-api-key. OAuth flows reject google with a
clear 'API-key only' error since Gemini Advanced subscriptions don't
issue API tokens.
- internal/agent: env lookup for GEMINI_API_KEY / GOOGLE_API_KEY,
default model gemini-2.5-pro, NewClient wires provider.NewGemini,
background model discovery, /login + /logout + rescue dialog all
include google.
- README: new ### Google Gemini section with auth model, free-tier
limits, and reasoning-config notes.
Pressing Option+Up while the agent is busy now pops the most recently
queued ('sliding in') message back into the editor so the user can
edit and resend it. Repeated presses keep peeling messages off the
tail of the queue, newest first; each press replaces the editor
contents rather than appending. When the queue is empty the keypress
falls through to the existing scroll-up behavior.
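A minimal sketch of the pop-from-tail behaviour described above. The
queue type and the popLast name are hypothetical and are not taken
from zot's source; only the behaviour (newest first, replace rather
than append, fall through when empty) comes from the text.

```go
package main

import "fmt"

// queue holds messages typed while the agent is busy, oldest first.
// The type and method names are illustrative, not zot's real ones.
type queue struct{ msgs []string }

// popLast removes and returns the newest queued message. ok is false when
// the queue is empty, which is the caller's cue to fall through to scroll-up.
func (q *queue) popLast() (msg string, ok bool) {
	if len(q.msgs) == 0 {
		return "", false
	}
	last := len(q.msgs) - 1
	msg = q.msgs[last]
	q.msgs = q.msgs[:last]
	return msg, true
}

func main() {
	q := &queue{msgs: []string{"first", "second"}}
	editor := "draft"
	for {
		msg, ok := q.popLast()
		if !ok {
			fmt.Println("queue empty: scroll up instead")
			break
		}
		editor = msg // replace the editor contents, do not append
		fmt.Println("editor:", editor)
	}
}
```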
A muted hint row underneath the chips advertises the shortcut, using
the same color as the model info on the status bar so it reads as
ambient metadata.
DrawLog now saves/restores the cursor at the top of the bottom band
instead of relying on relative up-N math that drifted when the
terminal naturally scrolled between frames. This fixes duplicated
transcript blocks with empty gaps (previously only ctrl+l recovered).
Also strip literal carriage returns from pasted/typed editor text
before rendering. A bare \r moves the terminal cursor to column 0
and overwrites the left side of the input row, which looked like
missing highlight segments on continuation lines.
- Pre-turn auto-compact: when the previous turn already pushed
context past the threshold, condense before sending. The user's
prompt is re-queued and fired automatically once compaction
succeeds.
- HTTP 413 handling: a 'payload too large' from the provider is no
longer surfaced as a status_err. Instead the request is retried
after a transparent auto-compact pass.
- Both inline auto-compacts surface a yellow chat note above the
status bar so the user sees the spinner *and* the reason; on
success a status_ok like 'context auto-compacted; sending your
last message' confirms the retry.
- Resume picker (/sessions and startup) now scrolls to the bottom
of the loaded transcript instead of parking at the last user
turn, so the most recent reply is always fully visible.
- Drop the VS Code mouse-capture path: native click-drag selection
beats the wheel-speed boost there.
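The HTTP 413 handling in the list above can be sketched roughly as
follows. This is an illustration under assumptions: sendWithCompact,
errPayloadTooLarge, and the two callbacks are hypothetical names, and
the real code also surfaces the chat note and status line.

```go
package main

import (
	"errors"
	"fmt"
)

// errPayloadTooLarge stands in for an HTTP 413 from the provider.
var errPayloadTooLarge = errors.New("413 payload too large")

// sendWithCompact tries send once; on a 413 it runs compact and retries a
// single time. Both callbacks are hypothetical stand-ins for the real calls.
func sendWithCompact(send func() error, compact func() error) error {
	err := send()
	if !errors.Is(err, errPayloadTooLarge) {
		return err // success or an unrelated error: surface as-is
	}
	if cerr := compact(); cerr != nil {
		return fmt.Errorf("auto-compact failed: %w", cerr)
	}
	return send()
}

func main() {
	size := 10 // pretend context size; the limit of 5 below is arbitrary
	send := func() error {
		if size > 5 {
			return errPayloadTooLarge
		}
		return nil
	}
	compact := func() error { size = 3; return nil }
	fmt.Println("result:", sendWithCompact(send, compact))
}
```

Retrying only once keeps a provider that rejects even the compacted
payload from looping forever; the second error is returned unchanged.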
Loading or exporting a session containing very large JSONL rows
(image blocks, big tool outputs, compacted history) failed with
'bufio.Scanner: token too long' — Scanner caps each token to its
buffer size, even when bumped to 20 MiB. A single oversized row
blocked OpenSession entirely so an existing long session could
not be resumed.
Switch session readers (OpenSession, SessionUsage, describeSession,
sessionHasNoMessages, ExportSession, ImportSession, BranchSession,
firstUserPrompt) to a shared bufio.Reader.ReadBytes-based JSONL
helper that handles arbitrarily long lines. Add a regression test
that opens and exports a session containing a >20 MiB row.
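A sketch of what a ReadBytes-based JSONL helper might look like. The
names readJSONL and countRows and the callback shape are assumptions;
the shared helper in the codebase may differ.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readJSONL calls fn once per non-empty line of r. Unlike bufio.Scanner,
// ReadBytes grows its result as needed, so there is no per-line size cap.
func readJSONL(r io.Reader, fn func(line []byte) error) error {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadBytes('\n')
		if err != nil && !errors.Is(err, io.EOF) {
			return err
		}
		// At io.EOF, ReadBytes still returns a final line that lacks a
		// trailing newline, so process the data before stopping.
		line = bytes.TrimRight(line, "\r\n")
		if len(line) > 0 {
			if ferr := fn(line); ferr != nil {
				return ferr
			}
		}
		if err != nil {
			return nil // io.EOF
		}
	}
}

// countRows is a small demo wrapper that counts the rows in s.
func countRows(s string) (int, error) {
	n := 0
	err := readJSONL(strings.NewReader(s), func([]byte) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	big := strings.Repeat("x", 1<<20) // 1 MiB row, past Scanner's 64 KiB default
	input := `{"a":1}` + "\n" + `{"b":"` + big + `"}` + "\n" + `{"c":3}`
	n, err := countRows(input)
	fmt.Println(n, err)
}
```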
The redraw path rebuilt the full transcript on every key event:
filtered the agent's full message slice, refreshed tool path maps,
walked every message through the per-message render cache, and
re-assembled the entire chat line buffer. With a long session, the
O(N) work per keystroke made typing visibly lag.
Add an idle render cache: the previously built chat is reused when
nothing relevant changed (terminal width, transcript revision,
status notes, help/update banners, expand-all). The agent now
exposes a cheap monotonically increasing Revision() that ticks
whenever messages are appended or replaced, so the cache key stays
trivial. Live turns (busy/streaming/tool-call mutations) keep the
old rebuild path.
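The idle cache can be pictured as a single-entry memo keyed on the
inputs listed above. This is a simplified sketch: cacheKey,
renderCache, and get are hypothetical names, and the key carries only
a subset of the real inputs.

```go
package main

import "fmt"

// cacheKey lists inputs that can change the idle chat rendering.
// The fields are a subset chosen for illustration.
type cacheKey struct {
	width    int
	revision uint64
	expanded bool
}

// renderCache reuses the last built lines while the key is unchanged.
type renderCache struct {
	valid bool
	key   cacheKey
	lines []string
}

// get returns the cached lines on a key match and otherwise calls build.
func (c *renderCache) get(k cacheKey, build func() []string) []string {
	if c.valid && c.key == k {
		return c.lines
	}
	c.lines, c.key, c.valid = build(), k, true
	return c.lines
}

func main() {
	builds := 0
	build := func() []string { builds++; return []string{"hello"} }
	var c renderCache
	c.get(cacheKey{width: 80, revision: 1}, build)
	c.get(cacheKey{width: 80, revision: 1}, build) // cache hit, no rebuild
	c.get(cacheKey{width: 80, revision: 2}, build) // transcript changed
	fmt.Println("builds:", builds)
}
```

Because the key is a small comparable struct, the hit check is one
equality test per keystroke instead of a walk over every message.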
iTerm's OSC 1337 and kitty/ghostty's APC G escapes both advance the
cursor by the image's actual rendered cell count, which depends on
the terminal's font cell aspect ratio. Without knowing that ratio,
zot's box renderer couldn't reliably place the closing │ at the box's
right column: spaces-based padding either overshot (cutting the bar
off the screen) or undershot (drawing the bar mid-image).
Use the protocols' "don't move cursor" flags to keep the terminal
cursor at the start of the image after rendering:
- iTerm2: doNotMoveCursor=1
- kitty/ghostty: C=1
With cursor pinned, the row's logical advance is just the leading
indent spaces, and toolBoxSideWithImage can pad to the right edge
column without caring how the image was scaled. The image's graphics
layer paints over the padding spaces visually.
Re-enable inline images in iTerm.app since the new approach removes
the alignment hack that previously misbehaved there.
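For illustration, the two escapes with the cursor-pinning flags might
be built like this. It is a sketch under assumptions: the function
names are hypothetical, kitty payload chunking (m=1) is omitted, and
imageRow is only a guess at the shape of toolBoxSideWithImage.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// kittyImage wraps PNG bytes in a kitty graphics APC escape. a=T transmits
// and displays, f=100 marks the payload as PNG, and C=1 asks the terminal
// not to move the cursor. Large payloads would need chunking, omitted here.
func kittyImage(png []byte) string {
	return "\x1b_Ga=T,f=100,C=1;" + base64.StdEncoding.EncodeToString(png) + "\x1b\\"
}

// itermImage wraps image bytes in an iTerm2 OSC 1337 File escape with
// doNotMoveCursor=1, the flag named in the text above.
func itermImage(img []byte) string {
	return "\x1b]1337;File=inline=1;doNotMoveCursor=1:" +
		base64.StdEncoding.EncodeToString(img) + "\a"
}

// imageRow pads to the right edge using only the indent as the logical
// advance, since the escape itself no longer moves the cursor.
func imageRow(escape string, indent, innerWidth int) string {
	pad := innerWidth - indent
	if pad < 0 {
		pad = 0
	}
	return "│" + strings.Repeat(" ", indent) + escape + strings.Repeat(" ", pad) + "│"
}

func main() {
	row := imageRow(kittyImage([]byte("not a real png")), 2, 40)
	fmt.Printf("%q\n", row)
}
```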
User message bubble redesign:
- No more "▌ you" / "▌ zot" speaker labels; turns are
delimited by a tinted bubble panel for the user side and plain
prose for the assistant side.
- The bubble is a full-width tinted row prefixed by an accent "▌ "
bar (BG behind the bar matches the panel so bar and panel read as
one continuous coloured strip). One blank tinted row above and
below the message gives the bubble vertical breathing room, the
closest a terminal can get to css padding-top / padding-bottom.
- Theme.UserBubble + Theme.UserBubbleBG/FG carry the panel colours.
- /btw side-chat reuses the same bubble helper so it matches the
main chat layout.
- One blank line above the slash popup, the dialog block, and the
sliding-in chips so they don't sit flush against the chat above.
- One trailing blank inside the /btw frame after the editor.
Tool box outer/inner geometry:
- toolBoxOuterMargin pulls the box frame in from the terminal
edges so user bubble, assistant prose, and box frames all share
the same left/right column.
- Image-footprint rows (escape + reservation blanks + caption gap)
are now tagged with imageFootprintSentinel by renderImageBlock
and the three box-side wrappers strip the tag and still wrap the
rows in │ … │ so the box edges stay continuous around the
image. The escape is indented inside the frame instead of
kissing the │.
- toolBoxSide bypasses width measurement / truncation when the row
carries an iTerm OSC 1337 or Kitty APC G escape: visibleWidth
doesn't recognise OSC payloads and was destroying the image
bytes when wrapping turned them into thousand-cell-wide rows.
Markdown / code fences:
- RenderMarkdown emits a FlushLeftSentinel byte at the start of
every fenced code-block line so a future caller can opt those
rows out of prose indent. The current consumer doesn't strip
the sentinel, but stale callers in skills / changelog / btw /
compaction summaries do, so leftover sentinels never leak into
rendered output.
Theme detection:
- New tui.DetectThemeFromBackground(timeout) probes the terminal
via OSC 11, parses rgb:RRRR/GGGG/BBBB, and returns Light if
Rec. 709 luma >= 0.5 else Dark. ZOT_THEME=dark|light overrides
the probe. Falls back to Dark when the terminal doesn't
respond within the timeout (Linux console, certain VS Code
configs, tmux without pass-through).
- cli.go now picks the theme via DetectThemeFromBackground
instead of hard-coding tui.Dark.
- Light theme bubble palette tuned (UserBubbleBG 254 / FG 240) so
user rows stay legible on a light terminal.
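The parse-and-classify step of the theme probe might look like the
sketch below. It assumes the caller has already sent the OSC 11 query
and stripped the reply down to its "rgb:..." body; parseRGB and
isLightBackground are hypothetical names, not the real tui API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRGB parses an X11-style "rgb:RRRR/GGGG/BBBB" colour into channels
// scaled to [0, 1]. Each channel may be 1 to 4 hex digits.
func parseRGB(s string) (r, g, b float64, err error) {
	if !strings.HasPrefix(s, "rgb:") {
		return 0, 0, 0, fmt.Errorf("not an rgb: colour: %q", s)
	}
	parts := strings.Split(s[len("rgb:"):], "/")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("want 3 channels, got %d", len(parts))
	}
	var ch [3]float64
	for i, p := range parts {
		if len(p) < 1 || len(p) > 4 {
			return 0, 0, 0, fmt.Errorf("bad channel %q", p)
		}
		v, perr := strconv.ParseUint(p, 16, 16)
		if perr != nil {
			return 0, 0, 0, perr
		}
		full := float64(uint64(1)<<(4*uint(len(p))) - 1) // e.g. 0xffff for 4 digits
		ch[i] = float64(v) / full
	}
	return ch[0], ch[1], ch[2], nil
}

// isLightBackground applies the Rec. 709 luma weights and the 0.5 cut-off.
func isLightBackground(spec string) (bool, error) {
	r, g, b, err := parseRGB(spec)
	if err != nil {
		return false, err
	}
	luma := 0.2126*r + 0.7152*g + 0.0722*b
	return luma >= 0.5, nil
}

func main() {
	for _, spec := range []string{"rgb:ffff/ffff/ffff", "rgb:1e1e/1e1e/2e2e"} {
		light, err := isLightBackground(spec)
		fmt.Println(spec, light, err)
	}
}
```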
Editor / spinner polish:
- Editor prompt remains the AccentBar in Theme.Accent.
- One "funny working line" entry softened from "go generics" to
"the code".
Slash command popup and dialogs (login / model / sessions / jump /
btw / changelog / etc.) used to sit flush against the chat or
welcome content above. Add one blank row before the suggest popup
when it is non-empty and one above the dialog block when a dialog
is showing.
Cursor-row math gains a dialogLead offset so popup-side cursor
branches (btwDialog, login dialog, sessionDialog) land in the
right cell now that the dialog starts one row lower.
ListSessions used to sort by filename descending. Filenames embed the
creation timestamp, so /sessions, --continue, and the resume
picker were effectively ordered by when each session was first
created, not when the user last worked in it.
Sort by filesystem ModTime descending instead, with filename desc
as a stable tie-break. The session you most recently touched now
sits at the top of every picker and is what --continue resumes,
even when newer-but-idle sessions exist.
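A sketch of the new ordering, assuming a hypothetical sessionInfo
struct; the real ListSessions return type is not shown in this
message. The at helper exists only to keep the demo short.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// sessionInfo is an illustrative stand-in for a session listing entry.
type sessionInfo struct {
	Name    string
	ModTime time.Time
}

// at builds a timestamp from Unix seconds for the demo below.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// sortSessions orders by ModTime descending, then filename descending.
func sortSessions(s []sessionInfo) {
	sort.SliceStable(s, func(i, j int) bool {
		if !s[i].ModTime.Equal(s[j].ModTime) {
			return s[i].ModTime.After(s[j].ModTime)
		}
		return s[i].Name > s[j].Name
	})
}

func main() {
	s := []sessionInfo{
		{Name: "2024-03-01.jsonl", ModTime: at(100)}, // newest file, idle
		{Name: "2024-01-01.jsonl", ModTime: at(300)}, // oldest file, just used
		{Name: "2024-02-01.jsonl", ModTime: at(100)},
	}
	sortSessions(s)
	for _, e := range s {
		fmt.Println(e.Name)
	}
}
```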
When the user types while a turn is in flight the queued messages
render as "sliding in:" chips above the status bar. Previously
they sat flush against the streaming chat content above. Prepend
a single blank row so the chips have air on top.
The trailer blank inside the queue slice is removed because the
bottom-region assembly already inserts a blank above statusLines;
keeping it doubled the gap below the chips.
A drag-drop into VS Code's terminal arrives as a stream of single
KeyRune events (no bracketed paste). The main loop redrew between
every rune, so on a long session the path appeared to type in over
several seconds instead of landing instantly.
Bump the input channel buffer to 256 and, after handling one key,
non-blockingly drain every other key already queued before
issuing one invalidate. The whole burst processes in a single
main-loop pass and the screen repaints once at the end.
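The drain step can be sketched with a select/default loop. drainBurst
and its signature are assumptions made for this example; the real
main loop handles richer key events than bare runes.

```go
package main

import "fmt"

// drainBurst handles first, then every key already buffered on keys,
// without blocking. It returns the number of keys handled so the caller
// can issue a single redraw for the whole burst.
func drainBurst(first rune, keys <-chan rune, handle func(rune)) int {
	handle(first)
	n := 1
	for {
		select {
		case k, ok := <-keys:
			if !ok {
				return n // channel closed
			}
			handle(k)
			n++
		default:
			return n // nothing else queued right now
		}
	}
}

func main() {
	keys := make(chan rune, 256)
	for _, r := range "/tmp/file.png" {
		keys <- r
	}
	var editor []rune
	first := <-keys
	n := drainBurst(first, keys, func(r rune) { editor = append(editor, r) })
	fmt.Println(n, string(editor)) // a single repaint would follow here
}
```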
Anthropic returns 'At least one of the image dimensions exceed max
allowed size for many-image requests' when a request contains more
than one image and any of them is larger than 2000 px on the
longest side. Single-image requests get a more lenient 8000 px
cap. OpenAI has no such limit, so a session that worked fine on
gpt-* breaks the moment /model swaps to a Claude-family model while the
transcript still holds high-res screenshots.
Add anthShrinkImageBytesIfTooBig that decodes any outbound image
exceeding 2000 px on the longest side, resamples it with Catmull-
Rom from golang.org/x/image/draw, and re-encodes in the same
format. JPEG stays JPEG, PNG stays PNG, GIF gets re-encoded as
PNG (we'd otherwise need to compose a single-frame gif by hand).
Decode/encode failures fall back to the original bytes so the
caller's flow is untouched and Anthropic's own error message
surfaces if the image is genuinely unusable.
Both Anthropic encode sites (the top-level message converter
and the tool-result inner converter) run every image through
the helper, so screenshots from earlier turns also get resized
on subsequent requests instead of failing forever once the
transcript captured them at full size.
Adds golang.org/x/image v0.18.0 (compatible with go 1.22).
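The size computation behind the shrink step can be sketched with the
standard library alone. fitWithin is a hypothetical helper and covers
only the target dimensions; the resampling itself is left to
golang.org/x/image/draw as described above. It assumes positive
width and height.

```go
package main

import "fmt"

// fitWithin returns dimensions scaled so the longest side is at most
// maxSide, preserving aspect ratio. Images already within the limit are
// returned unchanged. Integer division truncates toward zero.
func fitWithin(w, h, maxSide int) (int, int) {
	if w <= maxSide && h <= maxSide {
		return w, h
	}
	if w >= h {
		nh := h * maxSide / w
		if nh < 1 {
			nh = 1 // keep extremely wide images at least one pixel tall
		}
		return maxSide, nh
	}
	nw := w * maxSide / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxSide
}

func main() {
	w, h := fitWithin(3840, 2160, 2000)
	fmt.Println(w, h)
}
```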
Build() emits a trailing blank after every message, so the final
assistant reply (and any optional addendum like /help, the OK
line, or extension notes) ended its block with one blank row.
After we added a leading blank above the status block, that gave
a doubled gap between the last conversation row and the status.
Strip trailing blank rows from the assembled chat slice once
every optional addition has been appended, so the conversation
sits flush against the single leading blank of the status block.
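A minimal sketch of the trailing-blank strip. trimTrailingBlank is a
hypothetical name, and this version treats whitespace-only rows as
blank; rows that carry only colour escapes are not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailingBlank drops blank rows from the end of lines and leaves
// interior blank rows alone.
func trimTrailingBlank(lines []string) []string {
	end := len(lines)
	for end > 0 && strings.TrimSpace(lines[end-1]) == "" {
		end--
	}
	return lines[:end]
}

func main() {
	chat := []string{"you: hi", "", "zot: hello", "", "  "}
	chat = trimTrailingBlank(chat)
	chat = append(chat, "") // the single leading blank of the status block
	fmt.Printf("%q\n", chat)
}
```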
Insert one blank row between the chat area and the status bar
(spinner / model / cwd). Previously the status block sat flush
against the chat content above; the blank gives it air whether
the agent is busy (spinner first) or idle (model line first).
Cursor row math bumped to skip both the new top blank and the
existing blank between statusLines and edLines so the input
cursor still lands on the right cell.
The bridge already mirrored the assistant's text reply into the
paired Telegram chat but had no way to push real attachments. A
turn that came in over Telegram could only ever produce a textual
description of an image, never the image itself.
Add two model-facing tools, registered on the running agent only
while the bridge is connected:
- telegram_send_image(path, caption?) uploads a local image
(png/jpg/gif/webp) as an inline Telegram photo. Telegram
compresses for preview, which is what you usually want for a
screenshot or chart.
- telegram_send_file(path, caption?) uploads any local file as a
document attachment with no compression. Use for non-images or
when the recipient needs the original bytes.
Plumbing:
- Client.SendPhoto multipart upload mirrors SendDocument, hitting
sendPhoto so Telegram renders the image inline.
- Bridge.SendImage / SendDocument resolve the paired chat id and
return a clear error when the bridge is not running or no user
has paired yet.
- A small TelegramSender interface in package tools keeps the
tools package free of any telegram dependency; an adapter in
interactive.go forwards to the live *telegram.Bridge.
- applyTelegramTools mutates the running agent's tool registry on
/telegram connect / disconnect, on /model swaps, and on login
rebuilds. Walks the live registry rather than restoring from a
snapshot so extension or /reload-ext additions survive a later
disconnect; we only add or strip the two telegram entries.
Both tools respect the sandbox, refuse non-image inputs in
send_image, and reject directories. They return a one-line text
result the model can use to confirm the upload ("sent /path/foo.png
to telegram (1.2 MB)").
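The decoupling via TelegramSender can be illustrated as below. The
method signatures are inferred from the description and may not match
the real interface; fakeBridge, errNotPaired, and sendImageTool are
invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// TelegramSender is the narrow interface the tools depend on, so the tools
// package never imports the telegram package. Signatures are assumed.
type TelegramSender interface {
	SendImage(path, caption string) error
	SendDocument(path, caption string) error
}

var errNotPaired = errors.New("telegram: no paired chat")

// fakeBridge stands in for the adapter around the live bridge.
type fakeBridge struct {
	chatID int64 // zero means no user has paired yet
	sent   []string
}

func (b *fakeBridge) SendImage(path, caption string) error {
	if b.chatID == 0 {
		return errNotPaired
	}
	b.sent = append(b.sent, "photo:"+path)
	return nil
}

func (b *fakeBridge) SendDocument(path, caption string) error {
	if b.chatID == 0 {
		return errNotPaired
	}
	b.sent = append(b.sent, "doc:"+path)
	return nil
}

// sendImageTool is what a tool body might do: it sees only the interface
// and returns a one-line result for the model.
func sendImageTool(s TelegramSender, path, caption string) (string, error) {
	if err := s.SendImage(path, caption); err != nil {
		return "", err
	}
	return fmt.Sprintf("sent %s to telegram", path), nil
}

func main() {
	fmt.Println(sendImageTool(&fakeBridge{}, "/tmp/a.png", ""))
	fmt.Println(sendImageTool(&fakeBridge{chatID: 42}, "/tmp/a.png", "chart"))
}
```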
Two fixes for image-block rendering after the box refactor.
clipBottomClippedImages decides whether to suppress an image escape
this frame by scanning forward for the matching "image - ..." info
line, treating any non-blank intervening row as a blocker. The image
block layout is: escape row + N blank reservation rows + info row.
With tool boxes, the reservation rows are wrapped as "│ … │"
and the naive whitespace check now sees them as non-blank, so the scan
broke before reaching the info line, decided the image was clipped,
and zeroed out the escape sequence. Result: a tall blank gap inside
the box where the image should be, with only the metadata caption
visible at the bottom (visible on iTerm2 / Kitty / WezTerm).
Add isBoxBlankLine() that strips ANSI escapes, surrounding whitespace,
and │ box edges before checking for emptiness. Use it in both
clipBottomClippedImages and snapViewportStartToImageBlock so the
viewport snap-back also recognises wrapped reservation rows.
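One way isBoxBlankLine could be written is sketched below. The
function name comes from the text, but the body is an assumption: in
particular the regular expression covers CSI sequences (such as
colour codes) only, not every escape a terminal can emit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ansiCSI matches CSI sequences such as colour codes: ESC [ params final.
var ansiCSI = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)

// isBoxBlankLine reports whether a row is empty once colour escapes,
// surrounding whitespace, and the │ box edges are removed.
func isBoxBlankLine(line string) bool {
	s := ansiCSI.ReplaceAllString(line, "")
	s = strings.TrimSpace(s)
	s = strings.Trim(s, "│") // drop the left and right box edges
	return strings.TrimSpace(s) == ""
}

func main() {
	rows := []string{
		"",
		"\x1b[90m│\x1b[0m        \x1b[90m│\x1b[0m",
		"│ image - image/png - 640x480 │",
	}
	for _, r := range rows {
		fmt.Printf("%q blank=%v\n", r, isBoxBlankLine(r))
	}
}
```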
Also add one extra blank reservation row between the image footprint
and the muted "image - mime - WxH - sizeKB" caption, so the caption
doesn't sit flush against the last pixel row of the image.
Until now an interactive session's turns lived only in the running
agent's in-memory transcript; the session file on disk got nothing
new until iv.Run returned and the deferred WriteNewTranscript at
the bottom of runInteractive fired. That meant any process death
that bypassed the deferred flush, such as a closed terminal window
(SIGHUP), system shutdown (SIGTERM), kill -9, or an OS / power
crash, took the entire session with it. The window of data-at-risk
was "everything since the TUI started or last switched session."
Two changes that close the window:
- Add OnMessageAppended / OnUsage hooks on core.Agent, fired right
after each transcript message is appended (user prompt, finalised
assistant turn, tool-results message, the OpenAI image mirror)
and after each usage event arrives. The interactive runtime in
cli.go binds them to sess.AppendMessage / sess.AppendUsage so
every finished turn lands in the session JSONL the moment it's
durable in memory. A persistMu mutex coordinates the agent
goroutine's per-message writes with the TUI goroutine's session
swap (/sessions) and explicit flushes (/session export).
sessBaselineMsgs advances in lock-step so the exit-time
WriteNewTranscript no longer double-writes rows already on disk.
- Install a SIGTERM / SIGHUP handler in runInteractive that flushes
any not-yet-persisted in-flight turn before exiting. SIGINT is
intentionally NOT handled: the TUI consumes Ctrl+C as a regular
key event for cancel/clear semantics, so installing a SIGINT
notifier here would swallow it. The handler exits with os.Exit(0)
rather than re-raising, because re-raising would skip the chance
to flush and the only at-risk state we care about (the session
file) is already flushed by the time we get there.
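The per-message hook wiring from the first bullet can be sketched as
follows. Every type here is a simplified stand-in: the real
core.Agent, session type, and row schema differ, and the mutex plays
the role the text assigns to persistMu. The signal handler is not
shown.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// message is an illustrative transcript row, not the real schema.
type message struct {
	Role string `json:"role"`
	Text string `json:"text"`
}

// agent fires OnMessageAppended right after each in-memory append.
type agent struct {
	msgs              []message
	OnMessageAppended func(message)
}

func (a *agent) appendMessage(m message) {
	a.msgs = append(a.msgs, m)
	if a.OnMessageAppended != nil {
		a.OnMessageAppended(m)
	}
}

// session appends one JSON line per message under a mutex so writes from
// the agent goroutine cannot interleave with other session operations.
type session struct {
	mu      sync.Mutex
	w       io.Writer
	written int // rows already persisted, akin to a baseline counter
}

func (s *session) AppendMessage(m message) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	b, err := json.Marshal(m)
	if err != nil {
		return err
	}
	if _, err := s.w.Write(append(b, '\n')); err != nil {
		return err
	}
	s.written++
	return nil
}

// newDemo wires an agent to an in-memory session for the example.
func newDemo() (*agent, *session, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	sess := &session{w: buf}
	a := &agent{}
	a.OnMessageAppended = func(m message) { _ = sess.AppendMessage(m) }
	return a, sess, buf
}

func main() {
	a, sess, buf := newDemo()
	a.appendMessage(message{Role: "user", Text: "hi"})
	a.appendMessage(message{Role: "assistant", Text: "hello"})
	fmt.Print(buf.String())
	fmt.Println("rows persisted:", sess.written)
}
```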
Build closures (BuildAgent / BuildAgentFor) are re-wrapped after
the persistence wiring exists so any agent the TUI constructs on
login or /model swap also gets the hooks; otherwise switching
provider would silently revert to the old in-memory-only behaviour.
Print and JSON modes are unaffected: they run a single turn and
already flush via WriteNewTranscript at the end of their handler.
UI polish:
- Add Theme.AccentBar(c) helper that returns the half-block leader
("▌ ") in colour c. Use it everywhere a speaker / prompt bar is
drawn: main editor prompt, /btw editor + speaker labels, login
code editor, welcome banner, --help headline, and the chat side
speaker headers (you / zot, including the streaming overlay).
Single source of truth for the bar style across the UI.
- Insert one blank row between the status bar and the editor and
one trailing blank below the editor so the input has breathing
room from the surrounding chrome instead of sitting flush against
the status line and the terminal edge. Cursor row math is bumped
+1 to account for the inserted row.
- Status bar narrow split: when the idle status line would exceed
the terminal width, split it into provider/model on one row,
token+cost+context stats on the next, then cwd, instead of
letting the terminal hard-wrap mid-line. Mirrors the existing
busy-prefix split.
Session cost restoration:
- Add core.SessionUsage(path) that scans a session file for the
latest "usage" row and returns its cumulative usage (the running
session total). Old sessions with no usage rows return zero.
- Seed the agent with that cumulative usage on every load path:
/sessions picker, --continue, --resume, --session. Previously
loading a session restored the messages but not the cost, so the
status bar showed $0.000 until the next turn produced a fresh
EvUsage event.
- Mirror the seeded cost into i.cumUsage on NewInteractive (CLI
startup loads) and applySessionSelection (in-tui /sessions load)
so the status bar reflects the historical total immediately.