mirror of
https://github.com/patriceckhart/zot.git
synced 2026-06-26 21:36:31 +02:00
The status-bar was showing 2x the real cost. Anthropic's SSE stream
sends the full cumulative usage payload on both message_start AND
message_delta, and our code was summing them with += on each. Cache
tokens, the biggest cost component on multi-turn sessions, were
therefore counted twice on every single API call.
Fix: assign instead of accumulate within one Stream() invocation.
Cross-call accumulation still happens correctly in
core.CostTracker.Add(). Verified end-to-end: a truly fresh "read
sample.ts on desktop" session that used to report $0.15 now reports
$0.07 with the same cache-hit rate.
While chasing that, audited and corrected the rest of the request
pipeline so the cache actually hits cleanly.
Provider layer (internal/provider/anthropic.go):
- cache_control on the Claude Code identity line (was uncached),
giving Anthropic a first stable checkpoint independent of the
user system prompt. Turns a cold start from R=0 into R>0 for
any subsequent fresh session within the cache TTL.
- tool_result blocks go in their OWN new user message instead of
merging into the preceding user message. Merging was mutating
the prior user message's content array between turns, busting
byte-identical prefix match in Anthropic's cache.
- tagLastUserCache: exactly one cache_control on the last user
message (was two), so identity + sysprompt + last-tool +
last-user fits Anthropic's 4-breakpoint budget exactly.
- user-agent dropped its "(external, cli)" suffix to match the
canonical Claude Code string exactly.
- ZOT_DEBUG_ANTHROPIC=<path> env hook appends each outgoing
request body (one JSON object per line) to that file. Off by
default; for debugging cache / cost issues in the field.
- Usage field handling now correctly assigns the latest value
from each SSE event instead of summing.
Core (internal/core/tool.go):
- Registry.Specs() now sorts tools alphabetically. Go map
iteration order is randomized per call; randomized tool arrays
were breaking Anthropic's byte-level prefix match on every
single call within a session.
System prompt (internal/agent/systemprompt.go):
- Restored a substantial default prompt with structured tools +
operating guidelines sections. The earlier aggressive trim
dropped us under Anthropic's 1024-token minimum cacheable
prefix floor: prefixes below 1024 tokens are silently NOT
cached by Anthropic, so every fresh session started cold with
R=0 no matter what else we did.
- Current default ~1040 tokens on its own; with identity and
tools it's ~1400, comfortably above the 1024 floor.
- --system-prompt, --append-system-prompt, and
$ZOT_HOME/SYSTEM.md escape hatches all still work and take
precedence.
Model catalog (internal/provider/models.go):
- claude-opus-4-5: 1M ctx / 128k max -> 200k ctx / 64k max. I had
over-extrapolated; 1M context is a 4.6+ feature.
- gpt-5.4: 400k -> 272k. Canonical value on both the OpenAI
direct API and the ChatGPT Codex OAuth backend.
- gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4-mini: pinned to 272k.
OpenAI advertises 400k on direct and Codex caps at 272k. zot
serves both from one catalog row per id, so we pin to the
smaller number to keep the context-usage meter honest under
subscription auth. Direct-API users see a conservative estimate
instead of an inflated one.
README:
- Tiny capitalization touch-up on the opening line.
|
||
|---|---|---|
| .. | ||
| agent | ||
| assets | ||
| auth | ||
| core | ||
| extproto | ||
| provider | ||
| skills | ||
| tui | ||