fix(anthropic): drop claude-code identity cache marker

oauth requests currently exceed anthropic's 4-breakpoint cache_control limit once the conversation has 2+ user messages. the previous layout emitted 5 markers: identity + system + tools + last 2 user messages.

drop the marker on the small claude-code identity line. it's only a few tokens, and it still lands inside the prefix cached by the later breakpoints whenever the request prefix matches turn over turn. budget now: system + tools + last 2 user messages = 4, which fits.

fixes the user-reported error:

  anthropic: http 400 ... A maximum of 4 blocks with cache_control may be provided. Found 5.

verified by sending two consecutive prompts through zot rpc on an oauth credential: the first turn returns the assistant message cleanly, and the second turn now does too instead of failing with a 400.
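a minimal go sketch of the breakpoint arithmetic above. cacheBlock, oauthLayout, countBreakpoints and checkBudget are hypothetical helpers for illustration only, not zot's real types; the sketch just counts markers and never talks to the api. the limit of 4 is taken from the error message quoted above.

```go
package main

import "fmt"

// maxCacheBreakpoints mirrors the limit in the API error:
// "A maximum of 4 blocks with cache_control may be provided."
const maxCacheBreakpoints = 4

// cacheBlock is a hypothetical, simplified stand-in for any request block
// (tool, system, or message content) that may carry a cache_control marker.
type cacheBlock struct {
	Name   string
	Cached bool // true when the block carries cache_control: ephemeral
}

// countBreakpoints returns how many blocks carry a cache_control marker.
func countBreakpoints(blocks []cacheBlock) int {
	n := 0
	for _, b := range blocks {
		if b.Cached {
			n++
		}
	}
	return n
}

// checkBudget returns an error when the layout uses more breakpoints
// than the limit allows.
func checkBudget(blocks []cacheBlock) error {
	if n := countBreakpoints(blocks); n > maxCacheBreakpoints {
		return fmt.Errorf("found %d cache_control blocks, max is %d", n, maxCacheBreakpoints)
	}
	return nil
}

// oauthLayout builds the oauth-mode layout for a conversation with 2+ user
// messages, in cache-prefix order (tools, system, messages). markIdentity
// toggles the identity-line marker that this commit removes.
func oauthLayout(markIdentity bool) []cacheBlock {
	return []cacheBlock{
		{Name: "tools tail", Cached: true},
		{Name: "claude-code identity", Cached: markIdentity},
		{Name: "system prompt", Cached: true},
		{Name: "second-to-last user message", Cached: true},
		{Name: "last user message", Cached: true},
	}
}

func main() {
	before := oauthLayout(true)
	after := oauthLayout(false)
	fmt.Println(countBreakpoints(before), checkBudget(before) != nil) // 5 true
	fmt.Println(countBreakpoints(after), checkBudget(after) == nil)   // 4 true
}
```

with only one user message the old layout used identity + system + tools + 1 = 4 markers, which is why the 400 only showed up from the second turn onward.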
2026-06-26 21:36:31 +02:00 · 2026-04-19 12:39:33 +02:00 · 2026-04-19 12:39:33 +02:00 · 3ff6d9e6b7
commit 3ff6d9e6b7
parent ebc5dad18c
1 changed file with 8 additions and 3 deletions
--- a/internal/provider/anthropic.go
+++ b/internal/provider/anthropic.go
@@ -184,11 +184,16 @@ func (c *anthropicClient) buildRequest(req Request) (*anthRequest, error) {
 	// System prompt assembly differs between api-key and OAuth modes.
 	// OAuth requests MUST begin with the Claude Code identity line or
 	// Anthropic rejects them (429 rate_limit_error with zero tokens used).
+	//
+	// Cache budget: anthropic caps cache_control at 4 breakpoints per
+	// request. We spend them on (system prompt) + (tools tail) + (last
+	// two user messages). The claude-code identity line gets no marker
+	// of its own: it's only a few tokens, and the later breakpoints
+	// still cover it as part of the cached prefix.
 	if c.oauthTok != "" {
 		out.System = []anthSystemBlock{{
-			Type:         "text",
-			Text:         claudeCodeIdentity,
-			CacheControl: &anthCacheCtrl{Type: "ephemeral"},
+			Type: "text",
+			Text: claudeCodeIdentity,
 		}}
 		if req.System != "" {
 			out.System = append(out.System, anthSystemBlock{