zot/internal at v0.0.41 - clawdie/zot - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/patriceckhart/zot.git synced 2026-06-26 21:36:31 +02:00

History

patriceckhart f371687654 perf(anthropic): fix cost double-count, tighten caching, correct catalog The status-bar was showing 2x the real cost. Anthropic's SSE stream sends the full cumulative usage payload on both message_start AND message_delta, and our code was summing them with += on each. Cache tokens, the biggest cost component on multi-turn sessions, were therefore counted twice on every single API call. Fix: assign instead of accumulate within one Stream() invocation. Cross-call accumulation still happens correctly in core.CostTracker.Add(). Verified end-to-end: a truly fresh "read sample.ts on desktop" session that used to report $0.15 now reports $0.07 with the same cache-hit rate. While chasing that, audited and corrected the rest of the request pipeline so the cache actually hits cleanly. Provider layer (internal/provider/anthropic.go): - cache_control on the Claude Code identity line (was uncached), giving Anthropic a first stable checkpoint independent of the user system prompt. Turns a cold start from R=0 into R>0 for any subsequent fresh session within the cache TTL. - tool_result blocks go in their OWN new user message instead of merging into the preceding user message. Merging was mutating the prior user message's content array between turns, busting byte-identical prefix match in Anthropic's cache. - tagLastUserCache: exactly one cache_control on the last user message (was two), so identity + sysprompt + last-tool + last-user fits Anthropic's 4-breakpoint budget exactly. - user-agent dropped its "(external, cli)" suffix to match the canonical Claude Code string exactly. - ZOT_DEBUG_ANTHROPIC=<path> env hook appends each outgoing request body (one JSON object per line) to that file. Off by default; for debugging cache / cost issues in the field. - Usage field handling now correctly assigns the latest value from each SSE event instead of summing. Core (internal/core/tool.go): - Registry.Specs() now sorts tools alphabetically. Go map iteration order is randomized per call; randomized tool arrays were breaking Anthropic's byte-level prefix match on every single call within a session. System prompt (internal/agent/systemprompt.go): - Restored a substantial default prompt with structured tools + operating guidelines sections. The earlier aggressive trim dropped us under Anthropic's 1024-token minimum cacheable prefix floor: prefixes below 1024 tokens are silently NOT cached by Anthropic, so every fresh session started cold with R=0 no matter what else we did. - Current default ~1040 tokens on its own; with identity and tools it's ~1400, comfortably above the 1024 floor. - --system-prompt, --append-system-prompt, and $ZOT_HOME/SYSTEM.md escape hatches all still work and take precedence. Model catalog (internal/provider/models.go): - claude-opus-4-5: 1M ctx / 128k max -> 200k ctx / 64k max. I had over-extrapolated; 1M context is a 4.6+ feature. - gpt-5.4: 400k -> 272k. Canonical value on both the OpenAI direct API and the ChatGPT Codex OAuth backend. - gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4-mini: pinned to 272k. OpenAI advertises 400k on direct and Codex caps at 272k. zot serves both from one catalog row per id, so we pin to the smaller number to keep the context-usage meter honest under subscription auth. Direct-API users see a conservative estimate instead of an inflated one. README: - Tiny capitalization touch-up on the opening line.		2026-04-19 18:57:18 +02:00
..
agent	perf(anthropic): fix cost double-count, tighten caching, correct catalog	2026-04-19 18:57:18 +02:00
assets	add logo to callback page	2026-04-18 10:15:53 +02:00
auth	add auto compaction	2026-04-18 10:34:08 +02:00
core	perf(anthropic): fix cost double-count, tighten caching, correct catalog	2026-04-19 18:57:18 +02:00
extproto	feat(ext): phase 4 - full-event interception, arg rewrites, /reload-ext	2026-04-19 17:02:04 +02:00
provider	perf(anthropic): fix cost double-count, tighten caching, correct catalog	2026-04-19 18:57:18 +02:00
skills	perf(prompt): cut system prompt to the bone (410 -> 54 tokens)	2026-04-19 17:39:38 +02:00
tui	perf(read): drop line numbers from model-facing output	2026-04-19 17:33:05 +02:00