mirror of
https://github.com/patriceckhart/zot.git
synced 2026-06-26 21:36:31 +02:00
fix(anthropic): drop claude-code identity cache marker
oauth requests now exceed anthropic's 4-breakpoint cache_control limit when the conversation has 2+ user messages. previous layout emitted 5 markers: identity + system + tools + 2 user messages.

drop the marker on the small claude-code identity line. it's a few tokens and gets folded into the cached prefix implicitly when the request matches turn-over-turn anyway. budget now: system + tools + last 2 user messages = 4. fits.

reproduces the user-reported error:

    anthropic: http 400 ... A maximum of 4 blocks with cache_control may be provided. Found 5.

verified by sending two consecutive prompts through zot rpc on an oauth credential -- first turn returns the assistant message cleanly, second turn does too instead of 400ing.
parent
ebc5dad18c
commit
3ff6d9e6b7
1 changed files with 8 additions and 3 deletions
@@ -184,11 +184,16 @@ func (c *anthropicClient) buildRequest(req Request) (*anthRequest, error) {
 	// System prompt assembly differs between api-key and OAuth modes.
 	// OAuth requests MUST begin with the Claude Code identity line or
 	// Anthropic rejects them (429 rate_limit_error with zero tokens used).
+	//
+	// Cache budget: anthropic caps cache_control to 4 breakpoints per
+	// request. We spend them on (system prompt) + (tools tail) + (last
+	// two user messages). The claude-code identity line stays uncached
+	// because it's a few tokens and gets folded into the larger prefix
+	// implicitly anyway.
 	if c.oauthTok != "" {
 		out.System = []anthSystemBlock{{
-			Type:         "text",
-			Text:         claudeCodeIdentity,
-			CacheControl: &anthCacheCtrl{Type: "ephemeral"},
+			Type: "text",
+			Text: claudeCodeIdentity,
 		}}
 		if req.System != "" {
 			out.System = append(out.System, anthSystemBlock{
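
The budget arithmetic in the commit message can be checked with a small standalone sketch. This is not zot's code: `block`, `cacheCtrl`, `layout`, and `countBreakpoints` are hypothetical stand-ins invented for illustration, and the limit of 4 is taken from the 400 error quoted above. It only counts markers for the old layout (identity marked) and the new one (identity unmarked).

```go
package main

import "fmt"

// maxCacheBreakpoints is the per-request cap quoted in the 400 error.
const maxCacheBreakpoints = 4

// cacheCtrl and block are hypothetical stand-ins for the real request
// types; they carry just enough to count cache_control markers.
type cacheCtrl struct{ Type string }

type block struct {
	Label        string
	CacheControl *cacheCtrl
}

// countBreakpoints returns how many blocks carry a cache_control marker.
func countBreakpoints(blocks []block) int {
	n := 0
	for _, b := range blocks {
		if b.CacheControl != nil {
			n++
		}
	}
	return n
}

// layout models an OAuth request with userMsgs user messages.
// markIdentity=true mimics the old behavior (identity line marked),
// markIdentity=false mimics the behavior after this commit.
func layout(userMsgs int, markIdentity bool) []block {
	eph := func() *cacheCtrl { return &cacheCtrl{Type: "ephemeral"} }
	blocks := []block{
		{Label: "identity"},
		{Label: "system", CacheControl: eph()},
		{Label: "tools", CacheControl: eph()},
	}
	if markIdentity {
		blocks[0].CacheControl = eph()
	}
	for i := 0; i < userMsgs; i++ {
		b := block{Label: fmt.Sprintf("user-%d", i+1)}
		if i >= userMsgs-2 { // only the last two user messages are marked
			b.CacheControl = eph()
		}
		blocks = append(blocks, b)
	}
	return blocks
}

func main() {
	for _, markIdentity := range []bool{true, false} {
		n := countBreakpoints(layout(3, markIdentity))
		fmt.Printf("identity marked=%v: %d breakpoints, within limit=%v\n",
			markIdentity, n, n <= maxCacheBreakpoints)
	}
}
```

With three user messages the old layout counts 5 markers and the new one counts 4, which matches the "Found 5" error and the "= 4. fits." line in the commit message. With a single user message both layouts stay at or under the cap, consistent with the error only appearing once the conversation has 2+ user messages.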