diff --git a/docs/COLIBRI-TOKENOMICS-TRIFECTA.md b/docs/COLIBRI-TOKENOMICS-TRIFECTA.md
new file mode 100644
index 0000000..a31d4f9
--- /dev/null
+++ b/docs/COLIBRI-TOKENOMICS-TRIFECTA.md
@@ -0,0 +1,156 @@
+# Colibri Tokenomics — The Trifecta Framework
+
+**Source:** Indie Devdan, "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
+(https://www.youtube.com/watch?v=o4KZH_KSqYQ)
+**Date:** 2026-06-01
+**Status:** Strategic vision — maps to existing T1.4/T1.5 work
+
+> **Scope:** This applies to the full Colibri control plane. The simplified
+> `clawdie` operator lane intentionally ships none of this (no cost modes,
+> quotas, or metering) — see `docs/CLAWDIE-AGENT-WIKI.md`.
+
+## Core Thesis
+
+```
+More useful tokens > fewer useful tokens
+Cost per intelligence > cost per token
+If you don't measure, you can't improve
+```
+
+The video validates what Colibri is already building: a cache-first,
+measure-everything agent runtime. The "trifecta" is our north star.
+
+## The Trifecta
+
+| Axis        | What it means for agents                       | Colibri surface                          |
+| ----------- | ---------------------------------------------- | ---------------------------------------- |
+| Performance | Did the agent get it right? Task success rate  | Task outcomes, eval harness (T1.6)       |
+| Speed       | Tokens/second, cache-hit ratio, latency        | `colibri-deepseek` cache probe, T1.4     |
+| Cost        | Dollars per task. Not per token — per _result_ | `cost.rs` CostMode, escalation, metering |
+
+You cannot optimize one without understanding impact on the other two.
+A cheap model that needs 5 retries is more expensive than a capable model
+that gets it right in one shot.
+
+## Token Arbitrage (the "golden line")
+
+The video's key economic insight: **don't just spend tokens — arbitrage them.**
+
+Cache-hit tokens cost ~10% of fresh tokens (DeepSeek pricing). Every token
+in the stable prefix that hits cache is ~90% cheaper. The arbitrage
+strategy:
+
+1. **Maximize cache-hit surface**: byte-stable system prefix, skills,
+   tool definitions, agent identity — warm once, reuse thousands of times
+2. **Spend where it counts**: conversation turns, tool results, novel
+   context — these are unavoidable, so make them _useful_ (VSpecs,
+   rich context, HTML plans)
+3. **Trim where it doesn't**: auto-compaction, summarization, tool result
+   truncation — Colibri's 3-region model already does this
+
+### Existing Colibri arbitrage infrastructure
+
+```
+T1.4 Prompt Discipline (code present, integration in progress):
+  Region 1: STABLE_SYSTEM_PREFIX → cache-hit (90% cheaper)
+  Region 2: conversation log (compacted) → fresh tokens
+  Region 3: volatile scratch (empty) → zero cost
+
+CostMode escalation (Fast → Smart → Max):
+  Fast: 500K budget, compact tool results, 5 turns
+  Smart: 2M budget, keep tool results, 20 turns ← default
+  Max: 8M budget, full context, 100 turns
+
+Cache warming (T1.4 PR3b, merged):
+  Pre-warm STABLE_SYSTEM_PREFIX on daemon startup
+  Re-warm every N hours (configurable)
+  ~3,500 tokens per warm cycle → pays off in ~7 agent tasks
+```
+
+## What We Still Need (Trifecta Dashboard)
+
+The video's core message: observability isn't optional for production
+agents. Colibri already captures the raw data. What's missing is the
+trifecta view:
+
+### Per-task cost tracking
+
+```
+task_id: "abc123"
+model: "deepseek-v4-flash"
+tokens_in: 45,230 (12,100 cache-hit, 33,130 fresh)
+tokens_out: 2,847
+cost: $0.047 (cache savings: $0.012)
+latency: 8.3s
+success: true
+```
+
+### Trifecta balance sheet
+
+```
+Performance ████████░░ 82% task success (rolling 24h)
+Speed       ██████░░░░ 61% cache-hit ratio
+Cost        ████████░░ $0.047 avg/task (target: <$0.05)
+```
+
+### Model selection arbitrage
+
+Given a task, Colibri should be able to answer:
+
+- Can this task be handled by a cheap model (DeepSeek V3, Gemini Flash)?
+- Is the cache-hit ratio high enough that the premium model is actually cheaper?
+- What's the cost delta between models for this specific task type?
+
+## Visual Specs (VSpecs) — Future Input Modality
+
+The video introduces "VSpecs": plans with embedded images generated by
+GPT Image 2. Multimodal models (Gemini 3.5 Flash, GPT-5) read these
+images as "useful tokens" — a UI mockup is worth 1000 words of text
+description.
+
+For Colibri: this means the prompt assembly pipeline should eventually
+support image tokens in Region 2 (conversation log). NOT for T1.4 —
+this is T2.x territory. But the cost model should be ready for mixed
+text+image token budgets.
+
+## Golden Rules (from the video, adapted for Colibri)
+
+1. **Measure everything.** Every tool call, every token, every dollar.
+   Colibri's glasspane architecture already captures the event stream;
+   the trifecta dashboard makes it actionable.
+
+2. **Arbitrage cache vs spend.** The stable prefix is free money.
+   Maximize its size, minimize its churn.
+
+3. **Cost per intelligence, not per token.** Don't compare model prices
+   in a vacuum. Compare cost-per-successful-task. A $0.05 task that
+   works beats a $0.01 task that fails and has to be rerun.
+
+4. **Trade-offs are engineering.** There is no "best" model. There is
+   only the right model for THIS task, under THESE constraints.
+
+5. **Closed loop: measure → analyze → improve.** The trifecta dashboard
+   isn't a report — it's a feedback loop. Every task feeds back into
+   model selection, prompt design, and cache strategy.
+
+## Integration with Existing Work
+
+| Colibri component           | Trifecta role                         | Status  |
+| --------------------------- | ------------------------------------- | ------- |
+| `colibri-deepseek`          | Cache probe, hit metering             | ✅ done |
+| `colibri-daemon/cost.rs`    | CostMode, budget enforcement          | ✅ done |
+| `colibri-daemon/session.rs` | 3-region prompt, compaction           | ✅ done |
+| Cache warming (T1.4 PR3b)   | Pre-warm stable prefix                | ✅ done |
+| Prompt discipline (T1.4)    | Byte-stable assembly, cost-aware trim | 🔧 WIP  |
+| Trifecta dashboard (T1.5)   | Per-task cost/speed/perf metrics      | 📋 plan |
+| Eval harness (T1.6)         | Task success measurement              | 📋 plan |
+| Model selection (T2.x)      | Arbitrage engine, cost-aware routing  | 📋 plan |
+| VSpec support (T2.x)        | Image tokens in prompt assembly       | 📋 plan |
+
+## Reference
+
+- Video: "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
+  https://www.youtube.com/watch?v=o4KZH_KSqYQ
+- Colibri T1.4 Prompt Discipline: `docs/T1.4-PROMPT-DISCIPLINE-PLAN.md`
+- Colibri T1.4 Cache Warming: `docs/T1.4-CACHE-WARMING-DESIGN.md`
+- Colibri Glasspane Design: `docs/COLIBRI-GLASSPANE-DESIGN.md`
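
Review note: the "cost per intelligence" rule and the per-task cost record in this patch can be made concrete with a small sketch. The Rust below is illustrative only and is not taken from `cost.rs`. The names `TaskRecord`, `Pricing`, `task_cost`, `cache_hit_ratio`, and `cost_per_success` are hypothetical, and the prices are caller-supplied placeholders rather than any provider's real rates. It assumes cache-hit input tokens are billed at a fixed fraction of the fresh input price, as the document describes for DeepSeek.

```rust
/// Hypothetical per-task record, mirroring the fields of the
/// "Per-task cost tracking" example. Not Colibri's actual type.
#[derive(Debug, Clone, Copy)]
pub struct TaskRecord {
    pub cache_hit_tokens: u64,
    pub fresh_tokens: u64,
    pub output_tokens: u64,
    pub success: bool,
}

/// Illustrative pricing in dollars per million tokens (assumed values,
/// supplied by the caller; not real provider pricing).
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub fresh_input_per_m: f64,
    /// Fraction of the fresh input price charged for a cache hit,
    /// e.g. 0.10 when cache hits cost ~10% of fresh tokens.
    pub cache_hit_multiplier: f64,
    pub output_per_m: f64,
}

/// Dollar cost of one task: fresh input, discounted cached input, output.
pub fn task_cost(r: &TaskRecord, p: &Pricing) -> f64 {
    let per_input_token = p.fresh_input_per_m / 1_000_000.0;
    let per_output_token = p.output_per_m / 1_000_000.0;
    r.fresh_tokens as f64 * per_input_token
        + r.cache_hit_tokens as f64 * per_input_token * p.cache_hit_multiplier
        + r.output_tokens as f64 * per_output_token
}

/// Share of input tokens served from cache (0.0 when there is no input).
pub fn cache_hit_ratio(r: &TaskRecord) -> f64 {
    let total = r.cache_hit_tokens + r.fresh_tokens;
    if total == 0 {
        0.0
    } else {
        r.cache_hit_tokens as f64 / total as f64
    }
}

/// Total spend across all attempts divided by the number of successes.
/// Failed attempts still count toward spend; returns None if nothing succeeded.
pub fn cost_per_success(records: &[TaskRecord], p: &Pricing) -> Option<f64> {
    let total: f64 = records.iter().map(|r| task_cost(r, p)).sum();
    let wins = records.iter().filter(|r| r.success).count();
    if wins == 0 {
        None
    } else {
        Some(total / wins as f64)
    }
}
```

Under these assumptions, a model that needs five attempts to produce one success reports five times its single-attempt cost from `cost_per_success`, which is the comparison the trifecta asks for. Returning `None` when nothing succeeded avoids reporting a misleadingly low (or infinite) number for a model that never completes the task.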