mirror of
https://github.com/patriceckhart/zot.git
synced 2026-06-27 05:46:34 +02:00
The status-bar was showing 2x the real cost. Anthropic's SSE stream
sends the full cumulative usage payload on both message_start AND
message_delta, and our code was summing them with += on each. Cache
tokens, the biggest cost component on multi-turn sessions, were
therefore counted twice on every single API call.

Fix: assign instead of accumulate within one Stream() invocation.
Cross-call accumulation still happens correctly in
core.CostTracker.Add(). Verified end-to-end: a truly fresh "read
sample.ts on desktop" session that used to report $0.15 now reports
$0.07 with the same cache-hit rate.

While chasing that, audited and corrected the rest of the request
pipeline so the cache actually hits cleanly.

Provider layer (internal/provider/anthropic.go):
  - cache_control on the Claude Code identity line (was uncached),
    giving Anthropic a first stable checkpoint independent of the
    user system prompt. Turns a cold start from R=0 into R>0 for
    any subsequent fresh session within the cache TTL.
  - tool_result blocks go in their OWN new user message instead of
    merging into the preceding user message. Merging was mutating
    the prior user message's content array between turns, busting
    byte-identical prefix match in Anthropic's cache.
  - tagLastUserCache: exactly one cache_control on the last user
    message (was two), so identity + sysprompt + last-tool +
    last-user fits Anthropic's 4-breakpoint budget exactly.
  - user-agent dropped its "(external, cli)" suffix to match the
    canonical Claude Code string exactly.
  - ZOT_DEBUG_ANTHROPIC=<path> env hook appends each outgoing
    request body (one JSON object per line) to that file. Off by
    default; for debugging cache / cost issues in the field.
  - Usage field handling now correctly assigns the latest value
    from each SSE event instead of summing.
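
The tool_result change above can be illustrated with a small sketch. The block and message types are simplified, hypothetical shapes modeled on the Anthropic Messages API, not zot's internal types, and appendToolResult and encodeEach are helper names invented for this example. The point is only that appending a new user message leaves the bytes of every earlier message untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// block and message are simplified, hypothetical request shapes.
type block struct {
	Type      string          `json:"type"`
	Text      string          `json:"text,omitempty"`
	ID        string          `json:"id,omitempty"`
	Name      string          `json:"name,omitempty"`
	Input     json.RawMessage `json:"input,omitempty"`
	ToolUseID string          `json:"tool_use_id,omitempty"`
	Content   string          `json:"content,omitempty"`
}

type message struct {
	Role    string  `json:"role"`
	Content []block `json:"content"`
}

// appendToolResult adds the result as a NEW user message. Messages that
// were already sent are not modified, so their serialized form is stable.
func appendToolResult(history []message, toolUseID, output string) []message {
	res := block{Type: "tool_result", ToolUseID: toolUseID, Content: output}
	return append(history, message{Role: "user", Content: []block{res}})
}

// encodeEach serializes every message separately so earlier turns can be
// compared byte for byte.
func encodeEach(msgs []message) []string {
	out := make([]string, len(msgs))
	for i, m := range msgs {
		b, err := json.Marshal(m)
		if err != nil {
			panic(err) // not expected for these plain structs
		}
		out[i] = string(b)
	}
	return out
}

func main() {
	history := []message{
		{Role: "user", Content: []block{{Type: "text", Text: "read sample.ts"}}},
		{Role: "assistant", Content: []block{{
			Type: "tool_use", ID: "toolu_1", Name: "read",
			Input: json.RawMessage(`{"path":"sample.ts"}`),
		}}},
	}
	before := encodeEach(history)
	history = appendToolResult(history, "toolu_1", "file contents")
	after := encodeEach(history)

	for i := range before {
		fmt.Printf("message %d unchanged: %v\n", i, before[i] == after[i])
	}
	fmt.Println("new message:", after[len(after)-1])
}
```

Merging the result into an existing user message would instead change that message's content array, so the prefix sent on the previous turn would no longer match.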

Core (internal/core/tool.go):
  - Registry.Specs() now sorts tools alphabetically. Go map
    iteration order is randomized per call; randomized tool arrays
    were breaking Anthropic's byte-level prefix match on every
    single call within a session.
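
A quick way to see why the sort matters is to serialize the tool list repeatedly and compare bytes. The sketch below uses a trimmed-down, hypothetical spec type and made-up tool names rather than the real provider.Tool; it only demonstrates that sorting the map keys first yields identical JSON on every call.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// spec is a hypothetical, trimmed-down tool definition.
type spec struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

// sortedSpecs returns the specs ordered by name, so the serialized tools
// array does not depend on Go's map iteration order.
func sortedSpecs(tools map[string]string) []spec {
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)
	out := make([]spec, 0, len(names))
	for _, name := range names {
		out = append(out, spec{Name: name, Description: tools[name]})
	}
	return out
}

func main() {
	// Tool names and descriptions are placeholders for illustration.
	tools := map[string]string{
		"write": "write a file",
		"bash":  "run a shell command",
		"read":  "read a file",
		"edit":  "edit a file",
	}
	first, err := json.Marshal(sortedSpecs(tools))
	if err != nil {
		panic(err)
	}
	for i := 0; i < 100; i++ {
		next, err := json.Marshal(sortedSpecs(tools))
		if err != nil {
			panic(err)
		}
		if !bytes.Equal(first, next) {
			panic("tool order changed between calls")
		}
	}
	fmt.Println(string(first))
}
```

Ranging over the map directly, without the sort, gives no such guarantee from one call to the next.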

System prompt (internal/agent/systemprompt.go):
  - Restored a substantial default prompt with structured tools +
    operating guidelines sections. The earlier aggressive trim
    dropped us under Anthropic's 1024-token minimum cacheable
    prefix floor: prefixes below 1024 tokens are silently NOT
    cached by Anthropic, so every fresh session started cold with
    R=0 no matter what else we did.
  - Current default ~1040 tokens on its own; with identity and
    tools it's ~1400, comfortably above the 1024 floor.
  - --system-prompt, --append-system-prompt, and
    $ZOT_HOME/SYSTEM.md escape hatches all still work and take
    precedence.

Model catalog (internal/provider/models.go):
  - claude-opus-4-5: 1M ctx / 128k max -> 200k ctx / 64k max. I had
    over-extrapolated; 1M context is a 4.6+ feature.
  - gpt-5.4: 400k -> 272k. Canonical value on both the OpenAI
    direct API and the ChatGPT Codex OAuth backend.
  - gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4-mini: pinned to 272k.
    OpenAI advertises 400k on direct and Codex caps at 272k. zot
    serves both from one catalog row per id, so we pin to the
    smaller number to keep the context-usage meter honest under
    subscription auth. Direct-API users see a conservative estimate
    instead of an inflated one.
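
The pin-to-the-smaller-window rule amounts to taking a minimum, and its effect on the usage meter is easy to work through. In this sketch the limits struct, the helper names, and the reading of 400k and 272k as 400,000 and 272,000 tokens are assumptions made for illustration; the actual catalog layout in models.go may differ.

```go
package main

import "fmt"

// limits holds the context window one model id gets on each backend.
// Hypothetical type; the catalog itself stores a single row per id.
type limits struct {
	Direct int // window advertised on the direct API
	Codex  int // cap on the Codex OAuth backend
}

// pinnedContext picks the single value to store for the id: the smaller
// of the two windows, so the meter never overstates the remaining room.
func pinnedContext(l limits) int {
	if l.Codex < l.Direct {
		return l.Codex
	}
	return l.Direct
}

// usagePercent is what a context-usage meter would show for `used`
// tokens against a given window.
func usagePercent(used, window int) float64 {
	return 100 * float64(used) / float64(window)
}

func main() {
	l := limits{Direct: 400_000, Codex: 272_000} // assumed exact values
	window := pinnedContext(l)
	used := 136_000

	fmt.Printf("pinned window: %d\n", window)
	fmt.Printf("meter with pinned window: %.0f%%\n", usagePercent(used, window))
	fmt.Printf("meter with direct window: %.0f%%\n", usagePercent(used, l.Direct))
}
```

With the pinned value, a subscription user sees the meter fill at the rate their backend actually enforces, while a direct-API user sees a reading that is higher than strictly necessary but never too low.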

README:
  - Tiny capitalization touch-up on the opening line.

81 lines
2.2 KiB
Go
// Package core implements the agent loop, tool runtime, and session
// persistence. It is provider-agnostic: it talks to an LLM only through
// the provider.Client interface.
package core

import (
	"context"
	"encoding/json"
	"fmt"
	"sort"

	"github.com/patriceckhart/zot/internal/provider"
)

// Tool is a capability the agent can invoke.
type Tool interface {
	// Name is the unique tool id shown to the LLM.
	Name() string
	// Description is a one-line summary shown to the LLM.
	Description() string
	// Schema is a JSON Schema object for Execute's args.
	Schema() json.RawMessage
	// Execute runs the tool. progress may be called any number of times
	// with partial textual output (for UIs); it is not sent to the LLM.
	Execute(ctx context.Context, args json.RawMessage, progress func(string)) (ToolResult, error)
}

// ToolResult is the outcome of Tool.Execute.
type ToolResult struct {
	// Content is sent back to the LLM (text and/or images).
	Content []provider.Content
	// IsError marks this result as an error to the LLM.
	IsError bool
	// Details is arbitrary data for UIs and logs; not sent to the LLM.
	Details any
}

// Registry is a name->Tool map.
type Registry map[string]Tool

// NewRegistry builds a Registry from a list of tools.
func NewRegistry(tools ...Tool) Registry {
	r := Registry{}
	for _, t := range tools {
		r[t.Name()] = t
	}
	return r
}

// Specs returns the tool definitions to advertise to the LLM.
// Sorted by tool name so the order is stable across requests. This
// is load-bearing for provider-side prompt caching: providers
// prefix-match tool definitions, and Go's map iteration order is
// randomized per call, which would otherwise bust the cache every
// single turn.
func (r Registry) Specs() []provider.Tool {
	names := make([]string, 0, len(r))
	for name := range r {
		names = append(names, name)
	}
	sort.Strings(names)
	out := make([]provider.Tool, 0, len(r))
	for _, name := range names {
		t := r[name]
		out = append(out, provider.Tool{
			Name:        t.Name(),
			Description: t.Description(),
			Schema:      t.Schema(),
		})
	}
	return out
}

// Get looks up a tool by name.
func (r Registry) Get(name string) (Tool, error) {
	t, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return t, nil
}