docs: Colibri Tokenomics — trifecta framework (performance/speed/cost)

Strategic vision integrating Indie Devdan's agent trifecta concept into the Colibri roadmap. Maps 'More useful tokens > fewer useful tokens' onto the existing T1.4 cache-first architecture. Trifecta = Performance (task success) + Speed (cache-hit/latency) + Cost (dollars per result). Token arbitrage as the golden line: maximize cache-hit surface, spend on useful context, trim waste. Confirms that Colibri's 3-region prompt, CostMode, and cache warming are already trifecta-aligned. Adds T1.5 (dashboard) and T2.x (model selection arbitrage, VSpec support) to the roadmap.
2026-06-02 15:19:21 +02:00 · 2026-06-02 15:19:21 +02:00 · 7c82a89881
commit 7c82a89881
parent 35174b2f32
1 changed files with 151 additions and 0 deletions
--- a/docs/COLIBRI-TOKENOMICS-TRIFECTA.md
+++ b/docs/COLIBRI-TOKENOMICS-TRIFECTA.md
@@ -0,0 +1,151 @@
+# Colibri Tokenomics — The Trifecta Framework
+
+**Source:** Indie Devdan, "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
+(https://www.youtube.com/watch?v=o4KZH_KSqYQ)
+**Date:** 2026-06-01
+**Status:** Strategic vision — maps to existing T1.4/T1.5 work
+
+## Core Thesis
+
+```
+More useful tokens > fewer useful tokens
+Cost per intelligence > cost per token
+If you don't measure, you can't improve
+```
+
+The video validates what Colibri is already building: a cache-first,
+measure-everything agent runtime. The "trifecta" is our north star.
+
+## The Trifecta
+
+| Axis        | What it means for agents                          | Colibri surface                          |
+|-------------|---------------------------------------------------|------------------------------------------|
+| Performance | Did the agent get it right? (task success rate)   | Task outcomes, eval harness (T1.6)       |
+| Speed       | Tokens/second, cache-hit ratio, latency           | `colibri-deepseek` cache probe, T1.4     |
+| Cost        | Dollars per task: not per token, but per *result* | `cost.rs` CostMode, escalation, metering |
+
+You cannot optimize one axis without understanding its impact on the other two.
+A cheap model that needs 5 attempts per success can cost more per result than
+a capable model that gets it right in one shot.
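
The retry claim is easiest to check as expected cost per successful task. A minimal sketch, assuming attempts are independent with a fixed success probability `p` (so the expected number of attempts until the first success is `1 / p`) and a flat cost per attempt; the function names and the numbers in the usage note are hypothetical, not measured Colibri data.

```rust
/// Expected dollars per *successful* task, assuming independent attempts
/// with success probability `p_success` and a flat `cost_per_attempt`.
/// Attempts until the first success are geometric with mean 1 / p.
pub fn cost_per_success(cost_per_attempt: f64, p_success: f64) -> f64 {
    if p_success <= 0.0 {
        // A model that never succeeds has unbounded cost per result.
        return f64::INFINITY;
    }
    cost_per_attempt / p_success.min(1.0)
}

/// True when the cheap model ends up costing more per successful task.
pub fn cheap_model_loses(cheap_cost: f64, cheap_p: f64, capable_cost: f64, capable_p: f64) -> bool {
    cost_per_success(cheap_cost, cheap_p) > cost_per_success(capable_cost, capable_p)
}
```

For example, a $0.01 attempt that succeeds 20% of the time (about five attempts per success) costs ~$0.05 per result, while a $0.04 attempt that succeeds 95% of the time costs ~$0.042.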
+
+## Token Arbitrage (the "golden line")
+
+The video's key economic insight: **don't just spend tokens — arbitrage them.**
+
+Cache-hit tokens cost ~10% of fresh tokens (DeepSeek pricing), so every token
+of the stable prefix that hits the cache is ~90% cheaper. The arbitrage
+strategy:
+
+1. **Maximize cache-hit surface**: byte-stable system prefix, skills,
+   tool definitions, agent identity — warm once, reuse thousands of times
+2. **Spend where it counts**: conversation turns, tool results, novel
+   context — these are unavoidable, so make them *useful* (VSpecs,
+   rich context, HTML plans)
+3. **Trim where it doesn't**: auto-compaction, summarization, tool result
+   truncation — Colibri's 3-region model already does this
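
The ~10% cache-hit price turns into a blended input rate that depends on the hit ratio. A minimal sketch, assuming input is billed per million tokens and a cache hit costs `hit_fraction` times the fresh rate; `InputPricing` and its methods are hypothetical, and `hit_fraction = 0.1` simply mirrors the figure above rather than any specific price list.

```rust
/// Input-side pricing in dollars per million tokens (hypothetical type).
/// `hit_fraction` is the cache-hit price as a fraction of the fresh price.
#[derive(Clone, Copy, Debug)]
pub struct InputPricing {
    pub fresh_per_m: f64,
    pub hit_fraction: f64,
}

impl InputPricing {
    /// Dollars for a prompt split into cache-hit and fresh tokens.
    pub fn cost(&self, cache_hit_tokens: u64, fresh_tokens: u64) -> f64 {
        let per_token = self.fresh_per_m / 1_000_000.0;
        per_token * (fresh_tokens as f64 + self.hit_fraction * cache_hit_tokens as f64)
    }

    /// Dollars saved versus paying the fresh rate for the cache-hit tokens.
    pub fn savings(&self, cache_hit_tokens: u64) -> f64 {
        let per_token = self.fresh_per_m / 1_000_000.0;
        per_token * (1.0 - self.hit_fraction) * cache_hit_tokens as f64
    }

    /// Effective price per million input tokens at a given cache-hit ratio.
    pub fn blended_per_m(&self, hit_ratio: f64) -> f64 {
        let r = hit_ratio.clamp(0.0, 1.0);
        self.fresh_per_m * ((1.0 - r) + r * self.hit_fraction)
    }
}
```

With `fresh_per_m = 1.0` and `hit_fraction = 0.1`, a 60% hit ratio gives a blended rate of 0.46: the input bill drops to 46% of the all-fresh price, which is why widening the stable prefix matters more than shaving a few fresh tokens.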
+
+### Existing Colibri arbitrage infrastructure
+
+```
+T1.4 Prompt Discipline (code present, integration in progress):
+  Region 1: STABLE_SYSTEM_PREFIX          → cache-hit (90% cheaper)
+  Region 2: conversation log (compacted)  → fresh tokens
+  Region 3: volatile scratch (empty)      → zero cost
+
+CostMode escalation (Fast → Smart → Max):
+  Fast:    500K budget, compact tool results, 5 turns
+  Smart:   2M budget, keep tool results, 20 turns  ← default
+  Max:     8M budget, full context, 100 turns
+
+Cache warming (T1.4 PR3b, merged):
+  Pre-warm STABLE_SYSTEM_PREFIX on daemon startup
+  Re-warm every N hours (configurable)
+  ~3,500 tokens per warm cycle → pays off in ~7 agent tasks
+```
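
A minimal sketch of the CostMode table as a Rust type, assuming the budgets are token budgets and that escalation moves one tier up (Fast → Smart → Max) when a task exceeds its budget or turn cap. `CostMode`, `Limits`, `escalate`, and `escalate_until_fits` are illustrative names; this is not the contents of `cost.rs`, and the Smart/Max difference between "keep tool results" and "full context" is not modeled beyond the tool-result flag.

```rust
/// Hypothetical mirror of the CostMode tiers in the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CostMode {
    Fast,
    Smart,
    Max,
}

/// Per-tier limits (assumption: budgets are counted in tokens).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Limits {
    pub token_budget: u64,
    pub max_turns: u32,
    pub compact_tool_results: bool,
}

impl Default for CostMode {
    /// The table marks Smart as the default tier.
    fn default() -> Self {
        CostMode::Smart
    }
}

impl CostMode {
    pub fn limits(self) -> Limits {
        match self {
            CostMode::Fast => Limits { token_budget: 500_000, max_turns: 5, compact_tool_results: true },
            CostMode::Smart => Limits { token_budget: 2_000_000, max_turns: 20, compact_tool_results: false },
            CostMode::Max => Limits { token_budget: 8_000_000, max_turns: 100, compact_tool_results: false },
        }
    }

    /// Next tier up, or `None` when already at `Max`.
    pub fn escalate(self) -> Option<CostMode> {
        match self {
            CostMode::Fast => Some(CostMode::Smart),
            CostMode::Smart => Some(CostMode::Max),
            CostMode::Max => None,
        }
    }

    /// True while a task's usage still fits this tier.
    pub fn within_limits(self, tokens_used: u64, turns_used: u32) -> bool {
        let l = self.limits();
        tokens_used <= l.token_budget && turns_used <= l.max_turns
    }
}

/// Lowest tier, starting at `start`, whose limits fit the usage; `None` if even Max is exceeded.
pub fn escalate_until_fits(start: CostMode, tokens_used: u64, turns_used: u32) -> Option<CostMode> {
    let mut mode = start;
    loop {
        if mode.within_limits(tokens_used, turns_used) {
            return Some(mode);
        }
        mode = mode.escalate()?;
    }
}
```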
+
+## What We Still Need (Trifecta Dashboard)
+
+The video's core message: observability isn't optional for production
+agents. Colibri already captures the raw data. What's missing is the
+trifecta view:
+
+### Per-task cost tracking
+
+```
+task_id: "abc123"
+model: "deepseek-v4-pro"
+tokens_in: 45,230   (12,100 cache-hit, 33,130 fresh)
+tokens_out: 2,847
+cost: $0.047         (cache savings: $0.012)
+latency: 8.3s
+success: true
+```
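
A minimal sketch of the per-task record as a Rust struct, assuming cost and cache savings are derived from the token split and per-million-token prices. `TaskRecord` and `Prices` are hypothetical, and any prices fed into them here are placeholders, not published `deepseek-v4-pro` rates.

```rust
/// Hypothetical per-million-token prices for one model.
#[derive(Clone, Copy, Debug)]
pub struct Prices {
    pub input_fresh_per_m: f64,
    pub input_hit_per_m: f64,
    pub output_per_m: f64,
}

/// One row of per-task trifecta data, mirroring the example record.
#[derive(Clone, Debug)]
pub struct TaskRecord {
    pub task_id: String,
    pub model: String,
    pub tokens_in_hit: u64,
    pub tokens_in_fresh: u64,
    pub tokens_out: u64,
    pub latency_s: f64,
    pub success: bool,
}

impl TaskRecord {
    pub fn tokens_in(&self) -> u64 {
        self.tokens_in_hit + self.tokens_in_fresh
    }

    /// Fraction of input tokens served from cache.
    pub fn cache_hit_ratio(&self) -> f64 {
        let total = self.tokens_in();
        if total == 0 { 0.0 } else { self.tokens_in_hit as f64 / total as f64 }
    }

    /// Total dollars for the task (fresh input + cache-hit input + output).
    pub fn cost(&self, p: &Prices) -> f64 {
        (self.tokens_in_fresh as f64 * p.input_fresh_per_m
            + self.tokens_in_hit as f64 * p.input_hit_per_m
            + self.tokens_out as f64 * p.output_per_m)
            / 1_000_000.0
    }

    /// Dollars saved by cache hits versus paying the fresh input rate.
    pub fn cache_savings(&self, p: &Prices) -> f64 {
        self.tokens_in_hit as f64 * (p.input_fresh_per_m - p.input_hit_per_m) / 1_000_000.0
    }
}
```

Keeping the hit/fresh split in the record (rather than only `tokens_in`) is what lets the dashboard report both cost and cache savings from the same row.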
+
+### Trifecta balance sheet
+
+```
+Performance  ████████░░  82% task success (rolling 24h)
+Speed        ██████░░░░  61% cache-hit ratio
+Cost         ████████░░  $0.047 avg/task (target: <$0.05)
+```
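
Each balance-sheet row is an aggregate over recent tasks rendered as a 10-cell bar. A minimal sketch, assuming a simplified hypothetical `TaskSample` record, a token-weighted cache-hit ratio (large prompts count more than small ones), and a caller that has already filtered the rolling 24h window. The bar truncates partial cells, which reproduces the 82% and 61% rows; how the cost row maps to a bar is not specified above, so it is left out.

```rust
/// Hypothetical minimal per-task sample used for aggregation.
#[derive(Clone, Copy, Debug)]
pub struct TaskSample {
    pub success: bool,
    pub tokens_in_hit: u64,
    pub tokens_in_total: u64,
    pub cost_usd: f64,
}

/// The three balance-sheet numbers.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Trifecta {
    pub success_rate: f64,
    pub cache_hit_ratio: f64,
    pub avg_cost_usd: f64,
}

/// Aggregate a window of samples; `None` for an empty window.
pub fn aggregate(samples: &[TaskSample]) -> Option<Trifecta> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let successes = samples.iter().filter(|s| s.success).count() as f64;
    let hit: u64 = samples.iter().map(|s| s.tokens_in_hit).sum();
    let total: u64 = samples.iter().map(|s| s.tokens_in_total).sum();
    let cost: f64 = samples.iter().map(|s| s.cost_usd).sum();
    Some(Trifecta {
        success_rate: successes / n,
        // Token-weighted ratio across the whole window.
        cache_hit_ratio: if total == 0 { 0.0 } else { hit as f64 / total as f64 },
        avg_cost_usd: cost / n,
    })
}

/// Render a fraction in [0, 1] as a 10-cell bar, truncating partial cells.
pub fn bar(fraction: f64) -> String {
    let filled = (fraction.clamp(0.0, 1.0) * 10.0).floor() as usize;
    "█".repeat(filled) + &"░".repeat(10 - filled)
}
```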
+
+### Model selection arbitrage
+
+Given a task, Colibri should be able to answer:
+- Can this task be handled by a cheap model (DeepSeek V3, Gemini Flash)?
+- Is the cache-hit ratio high enough that the premium model is actually cheaper per successful task?
+- What's the cost delta between models for this specific task type?
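
All three questions reduce to comparing expected cost per successful task across candidate models. A minimal sketch, assuming each model's prices, cache-hit ratio, and success rate for the task type are already measured (for example by the dashboard and eval harness); `Candidate` and `cheapest` are hypothetical. The cache-hit ratio is a per-model field because a prefix cached for one model cannot serve another.

```rust
use std::cmp::Ordering;

/// Hypothetical per-model inputs for one task type.
#[derive(Clone, Debug)]
pub struct Candidate {
    pub name: &'static str,
    pub fresh_in_per_m: f64,  // $ per 1M fresh (cache-miss) input tokens
    pub hit_in_per_m: f64,    // $ per 1M cache-hit input tokens
    pub out_per_m: f64,       // $ per 1M output tokens
    pub cache_hit_ratio: f64, // observed for this model, 0..=1
    pub success_rate: f64,    // observed for this task type, 0..=1
}

impl Candidate {
    /// Dollars for one attempt with the given input/output token counts.
    pub fn attempt_cost(&self, tokens_in: u64, tokens_out: u64) -> f64 {
        let r = self.cache_hit_ratio.clamp(0.0, 1.0);
        let tin = tokens_in as f64;
        (tin * (1.0 - r) * self.fresh_in_per_m
            + tin * r * self.hit_in_per_m
            + tokens_out as f64 * self.out_per_m)
            / 1_000_000.0
    }

    /// Expected dollars per successful task, assuming independent retries.
    pub fn cost_per_success(&self, tokens_in: u64, tokens_out: u64) -> f64 {
        if self.success_rate <= 0.0 {
            return f64::INFINITY;
        }
        self.attempt_cost(tokens_in, tokens_out) / self.success_rate.min(1.0)
    }
}

/// Candidate with the lowest expected cost per successful task, if any.
pub fn cheapest(cands: &[Candidate], tokens_in: u64, tokens_out: u64) -> Option<&Candidate> {
    cands.iter().min_by(|a, b| {
        a.cost_per_success(tokens_in, tokens_out)
            .partial_cmp(&b.cost_per_success(tokens_in, tokens_out))
            .unwrap_or(Ordering::Equal)
    })
}
```

With made-up numbers (cheap: $0.30/M fresh input, cold cache, 30% success; premium: $1.00/M fresh input, 90% cache hits, 95% success), the premium model wins at 40K input / 2K output tokens. Give the cheap model the same warm cache and a 99% success rate and it wins instead: the routing decision depends on measured data, not list prices.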
+
+## Visual Specs (VSpecs) — Future Input Modality
+
+The video introduces "VSpecs": plans with embedded images generated by
+GPT Image 2. Multimodal models (Gemini 3.5 Flash, GPT-5) read these
+images as "useful tokens" — a UI mockup is worth 1000 words of text
+description.
+
+For Colibri: this means the prompt assembly pipeline should eventually
+support image tokens in Region 2 (conversation log). NOT for T1.4 —
+this is T2.x territory. But the cost model should be ready for mixed
+text+image token budgets.
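
One way to make the budget ready for mixed input is to count text and image tokens separately against a single limit. A minimal sketch, assuming images are billed as input tokens with a per-image estimate supplied by the provider or caller; `PromptPart` and `MixedBudget` are hypothetical names, not an existing Colibri type.

```rust
/// Hypothetical content part in Region 2 (conversation log).
#[derive(Clone, Debug)]
pub enum PromptPart {
    Text { tokens: u64 },
    /// Image with a caller-supplied token estimate (assumption: billed as input tokens).
    Image { est_tokens: u64 },
}

/// One shared limit, with text and image usage tracked separately.
#[derive(Debug, Default)]
pub struct MixedBudget {
    pub limit: u64,
    pub text_used: u64,
    pub image_used: u64,
}

impl MixedBudget {
    pub fn new(limit: u64) -> Self {
        MixedBudget { limit, ..Default::default() }
    }

    pub fn used(&self) -> u64 {
        self.text_used + self.image_used
    }

    /// Admit a part if it fits the remaining budget; returns false otherwise.
    pub fn try_add(&mut self, part: &PromptPart) -> bool {
        let cost = match part {
            PromptPart::Text { tokens } => *tokens,
            PromptPart::Image { est_tokens } => *est_tokens,
        };
        if self.used().saturating_add(cost) > self.limit {
            return false;
        }
        match part {
            PromptPart::Text { .. } => self.text_used += cost,
            PromptPart::Image { .. } => self.image_used += cost,
        }
        true
    }
}
```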
+
+## Golden Rules (from the video, adapted for Colibri)
+
+1. **Measure everything.** Every tool call, every token, every dollar.
+   Colibri's glasspane architecture already captures the event stream;
+   the trifecta dashboard makes it actionable.
+
+2. **Arbitrage cache vs spend.** The stable prefix is nearly free money:
+   cache hits cost ~10% of fresh input. Maximize its size, minimize its churn.
+
+3. **Cost per intelligence, not per token.** Don't compare model prices
+   in a vacuum. Compare cost-per-successful-task. A $0.05 task that
+   works is infinitely cheaper than a $0.01 task that fails.
+
+4. **Trade-offs are engineering.** There is no "best" model. There is
+   only the right model for THIS task, under THESE constraints.
+
+5. **Closed loop: measure → analyze → improve.** The trifecta dashboard
+   isn't a report — it's a feedback loop. Every task feeds back into
+   model selection, prompt design, and cache strategy.
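
Rule 2's "minimize its churn" can be measured directly: fingerprint the stable prefix on every assembly and count how often it changes. A minimal sketch, assuming prompts are assembled from a prefix string; `PrefixChurn` is hypothetical. `DefaultHasher` is fine for in-process comparison, but its output is not guaranteed stable across Rust releases, so the fingerprints should not be persisted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Tracks how often the stable prefix changes between prompt assemblies.
#[derive(Debug, Default)]
pub struct PrefixChurn {
    last: Option<u64>,
    pub assemblies: u64,
    pub changes: u64,
}

impl PrefixChurn {
    fn fingerprint(prefix: &str) -> u64 {
        let mut h = DefaultHasher::new();
        prefix.hash(&mut h);
        h.finish()
    }

    /// Record one assembly; returns true if the prefix differs from the previous one.
    pub fn observe(&mut self, prefix: &str) -> bool {
        let fp = Self::fingerprint(prefix);
        self.assemblies += 1;
        let changed = matches!(self.last, Some(prev) if prev != fp);
        if changed {
            self.changes += 1;
        }
        self.last = Some(fp);
        changed
    }

    /// Fraction of assemblies (after the first) that changed the prefix.
    pub fn churn_rate(&self) -> f64 {
        if self.assemblies <= 1 {
            0.0
        } else {
            self.changes as f64 / (self.assemblies - 1) as f64
        }
    }
}
```

Every change flagged here is a point where the cache-hit surface resets, so a rising churn rate should show up as a falling cache-hit ratio on the dashboard.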
+
+## Integration with Existing Work
+
+| Colibri component            | Trifecta role                           | Status  |
+|------------------------------|-----------------------------------------|---------|
+| `colibri-deepseek`           | Cache probe, hit metering               | ✅ done |
+| `colibri-daemon/cost.rs`     | CostMode, budget enforcement            | ✅ done |
+| `colibri-daemon/session.rs`  | 3-region prompt, compaction             | ✅ done |
+| Cache warming (T1.4 PR3b)    | Pre-warm stable prefix                  | ✅ done |
+| Prompt discipline (T1.4)     | Byte-stable assembly, cost-aware trim   | 🔧 WIP  |
+| Trifecta dashboard (T1.5)    | Per-task cost/speed/perf metrics        | 📋 plan |
+| Eval harness (T1.6)          | Task success measurement                | 📋 plan |
+| Model selection (T2.x)       | Arbitrage engine, cost-aware routing    | 📋 plan |
+| VSpec support (T2.x)         | Image tokens in prompt assembly         | 📋 plan |
+
+## Reference
+
+- Video: "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
+  https://www.youtube.com/watch?v=o4KZH_KSqYQ
+- Colibri T1.4 Prompt Discipline: `docs/T1.4-PROMPT-DISCIPLINE-PLAN.md`
+- Colibri T1.4 Cache Warming: `docs/T1.4-CACHE-WARMING-DESIGN.md`
+- Colibri Glasspane Design: `docs/COLIBRI-GLASSPANE-DESIGN.md`