Commit graph

5 commits

Author SHA1 Message Date
patriceckhart
f371687654 perf(anthropic): fix cost double-count, tighten caching, correct catalog
The status-bar was showing 2x the real cost. Anthropic's SSE stream
sends the full cumulative usage payload on both message_start AND
message_delta, and our code was summing them with += on each. Cache
tokens, the biggest cost component on multi-turn sessions, were
therefore counted twice on every single API call.

Fix: assign instead of accumulate within one Stream() invocation.
Cross-call accumulation still happens correctly in
core.CostTracker.Add(). Verified end-to-end: a truly fresh "read
sample.ts on desktop" session that used to report $0.15 now reports
$0.07 with the same cache-hit rate.

While chasing that, audited and corrected the rest of the request
pipeline so the cache actually hits cleanly.

Provider layer (internal/provider/anthropic.go):
  - cache_control on the Claude Code identity line (was uncached),
    giving Anthropic a first stable checkpoint independent of the
    user system prompt. Turns a cold start from R=0 into R>0 for
    any subsequent fresh session within the cache TTL.
  - tool_result blocks go in their OWN new user message instead of
    merging into the preceding user message. Merging was mutating
    the prior user message's content array between turns, busting
    byte-identical prefix match in Anthropic's cache.
  - tagLastUserCache: exactly one cache_control on the last user
    message (was two), so identity + sysprompt + last-tool +
    last-user fits Anthropic's 4-breakpoint budget exactly.
  - user-agent dropped its "(external, cli)" suffix to match the
    canonical Claude Code string exactly.
  - ZOT_DEBUG_ANTHROPIC=<path> env hook appends each outgoing
    request body (one JSON object per line) to that file. Off by
    default; for debugging cache / cost issues in the field.
  - Usage field handling now correctly assigns the latest value
    from each SSE event instead of summing.

Core (internal/core/tool.go):
  - Registry.Specs() now sorts tools alphabetically. Go map
    iteration order is randomized per call; randomized tool arrays
    were breaking Anthropic's byte-level prefix match on every
    single call within a session.

System prompt (internal/agent/systemprompt.go):
  - Restored a substantial default prompt with structured tools +
    operating guidelines sections. The earlier aggressive trim
    dropped us under Anthropic's 1024-token minimum cacheable
    prefix floor: prefixes below 1024 tokens are silently NOT
    cached by Anthropic, so every fresh session started cold with
    R=0 no matter what else we did.
  - Current default ~1040 tokens on its own; with identity and
    tools it's ~1400, comfortably above the 1024 floor.
  - --system-prompt, --append-system-prompt, and
    $ZOT_HOME/SYSTEM.md escape hatches all still work and take
    precedence.

Model catalog (internal/provider/models.go):
  - claude-opus-4-5: 1M ctx / 128k max -> 200k ctx / 64k max. I had
    over-extrapolated; 1M context is a 4.6+ feature.
  - gpt-5.4: 400k -> 272k. Canonical value on both the OpenAI
    direct API and the ChatGPT Codex OAuth backend.
  - gpt-5.1, gpt-5.2, gpt-5.3, gpt-5.4-mini: pinned to 272k.
    OpenAI advertises 400k on direct and Codex caps at 272k. zot
    serves both from one catalog row per id, so we pin to the
    smaller number to keep the context-usage meter honest under
    subscription auth. Direct-API users see a conservative estimate
    instead of an inflated one.

README:
  - Tiny capitalization touch-up on the opening line.
2026-04-19 18:57:18 +02:00
patriceckhart
3ff6d9e6b7 fix(anthropic): drop claude-code identity cache marker
oauth requests now exceed anthropic's 4-breakpoint cache_control
limit when the conversation has 2+ user messages. previous layout
emitted 5 markers: identity + system + tools + 2 user messages.

drop the marker on the small claude-code identity line. it's a few
tokens and gets folded into the cached prefix implicitly when the
request matches turn-over-turn anyway. budget now: system + tools +
last 2 user messages = 4. fits.

reproduces the user-reported error:
  anthropic: http 400 ... A maximum of 4 blocks with cache_control
  may be provided. Found 5.

verified by sending two consecutive prompts through zot rpc on an
oauth credential -- first turn returns the assistant message
cleanly, second turn does too instead of 400ing.
2026-04-19 12:39:33 +02:00
patriceckhart
a670442c9e feat: zotcore SDK + zot rpc subprocess protocol
two new ways to embed the zot agent runtime in third-party apps:

1. pkg/zotcore - public Go SDK
   - Runtime type: New(Config), Prompt(ctx,text,imgs)->chan Event,
     Cancel, Compact, SetModel, State, Messages, Cost, ListModels,
     Close. Concurrent-safe; one prompt at a time per Runtime,
     ErrBusy if you try to overlap. Spawn multiple Runtimes for
     multiple projects.
   - Public types mirror the JSON-RPC wire schema 1:1 so consumers
     can share parsing code with the out-of-process clients.
   - Internal core/agent/provider stay internal; SDK is a thin
     facade that exposes only what's stable.

2. zot rpc subcommand - newline-delimited JSON on stdin/stdout
   - 'zot rpc' (or 'zot --rpc') turns the agent runtime into a
     subprocess that any language can drive via pipes.
   - Commands: hello, prompt, abort, compact, get_state,
     get_messages, clear, set_model, get_models, ping. Each
     optionally carries an id; the matching response echoes it.
   - Stream notifications: turn_start, user_message,
     assistant_start, text_delta, tool_call, tool_progress,
     tool_result, assistant_message, usage, turn_end, done,
     error, compact_done. Same shape as the existing --json mode
     events (modes.EventToJSON / ContentToJSON were exported
     for reuse).
   - Auth: optional ZOTCORE_RPC_TOKEN env var; first command
     must be hello {token: ...} when set. Without the env var
     the spawning process is implicitly trusted.
   - Concurrency: one prompt or compact at a time per process,
     enforced by a turnMu mutex. abort fires immediately
     regardless. Stdin close exits the process.

3. docs/rpc.md - full schema reference
4. examples/rpc/{python,node,shell,go} - reference clients
5. examples/sdk - in-process Go embedding example
6. README updated with a new modes entry and an embedding section
2026-04-19 12:26:48 +02:00
patriceckhart
2158c272af fix ci: portable syscall.Select via x/sys/unix; gofmt pass
- rewrite resize_unix.go on top of golang.org/x/sys/unix so the
  peek-stdin helper compiles on linux (Select returns (int, error),
  Timeval.Usec is int64) as well as darwin (int32, error-only)
- promote golang.org/x/sys to a direct dep
- gofmt -w . (11 files of alignment drift from recent edits)
- install.sh / install.ps1: accept $GITHUB_TOKEN so the installers
  work against the repo while it's private; no-op on public repos
- README: document the private-repo install paths (PAT for curl|bash
  and powershell, GOPRIVATE for go install)
2026-04-18 10:55:42 +02:00
patriceckhart
6cced27476 initial commit 2026-04-17 20:36:38 +02:00