zot/internal
patriceckhart b188801c32 perf(prompt): slim system prompt from 1064 tokens to 126 tokens
Earlier I bloated the default system prompt on purpose to cross
Anthropic's 1024-token cacheable-prefix floor, on the theory that
small prompts lose every fresh session to R=0. That theory turned
out to be wrong: the real reason fresh sessions looked expensive
was the double-counting bug in Stream() (message_start and
message_delta both ship cumulative usage, we were summing both).
Once that was fixed, the padding stopped earning its bytes.

New default:

  You are zot, a lightweight terminal coding agent. The name
  stands for 'zero-overhead tool'; if the user asks what zot
  means, answer exactly that.

  Your output renders in a TUI that understands markdown for
  prose and plain text for tool output. Use markdown freely,
  keep answers concise, and let tool calls speak for themselves
  rather than narrating them in prose before you invoke them.
  Act first, then summarise what you did.

Removed the tool-listing section (the provider already advertises
tools in the request's tools[] array, so listing them in prose was
pure duplication) and the full operating-guidelines block
(frontier models already internalise "prefer edit over write",
"read before editing", "don't run sudo", etc.).

Benchmarked head-to-head on a fresh 2-turn session with the same
scenario:

  heavy prompt (1064 tokens total):
    turn 1: in=7 R=0 W=7550 out=117    $0.0501
    turn 2: in=6 R=2714 W=2176 out=61  $0.0165
    total:                              $0.0666

  slim prompt (126 tokens total):
    turn 1: in=1522 R=0 W=3636 out=111 $0.0334
    turn 2: in=6 R=3642 W=60 out=71    $0.0040
    total:                              $0.0374

Slim is 44% cheaper across two turns. Turn 2 is where it really
pays off (W=60 vs W=2176): every extra token in the base prompt
gets re-written on every turn because the trailing-user cache
checkpoint keeps advancing.

--system-prompt, --append-system-prompt, and $ZOT_HOME/SYSTEM.md
still work and take precedence for users who want more biasing.

'what does zot mean?' still returns exactly 'zero-overhead tool'.
2026-04-19 19:23:16 +02:00
..
agent perf(prompt): slim system prompt from 1064 tokens to 126 tokens 2026-04-19 19:23:16 +02:00
assets add logo to callback page 2026-04-18 10:15:53 +02:00
auth add auto compaction 2026-04-18 10:34:08 +02:00
core fix(no-yolo): don't auto-refuse tool calls in non-interactive modes 2026-04-19 19:17:05 +02:00
extproto feat(ext): phase 4 - full-event interception, arg rewrites, /reload-ext 2026-04-19 17:02:04 +02:00
provider perf(anthropic): fix cost double-count, tighten caching, correct catalog 2026-04-19 18:57:18 +02:00
skills perf(prompt): cut system prompt to the bone (410 -> 54 tokens) 2026-04-19 17:39:38 +02:00
tui perf(read): drop line numbers from model-facing output 2026-04-19 17:33:05 +02:00