# Colibri Tokenomics — The Trifecta Framework

**Source:** Indie Devdan, "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
(https://www.youtube.com/watch?v=o4KZH_KSqYQ)
**Date:** 01.jun.2026
**Status:** Strategic vision — maps to existing T1.4/T1.5 work

> **Scope:** This applies to the full Colibri control plane.

## Core Thesis

```
More useful tokens > fewer useful tokens
Cost per intelligence > cost per token
If you don't measure, you can't improve
```

The video validates what Colibri is already building: a cache-first,
measure-everything agent runtime. The "trifecta" is our north star.

## The Trifecta

| Axis        | What it means for agents                       | Colibri surface                          |
| ----------- | ---------------------------------------------- | ---------------------------------------- |
| Performance | Did the agent get it right? Task success rate  | Task outcomes, eval harness (T1.6)       |
| Speed       | Tokens/second, cache-hit ratio, latency        | `colibri-deepseek` cache probe, T1.4     |
| Cost        | Dollars per task. Not per token — per _result_ | `cost.rs` CostMode, escalation, metering |

Optimize each dimension with full awareness of its impact on the other two.
A cheap model that needs 5 retries can cost more per result than a capable
model that gets it right in one shot.

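The retry trade-off can be made concrete with a small worked calculation. This is a sketch under stated assumptions, not a measured result: each attempt costs a fixed amount $c$, succeeds independently with probability $p$, and the agent retries until it succeeds. The dollar figures and success rates are hypothetical.

$$
\mathbb{E}[N] = \sum_{k=1}^{\infty} k\,p\,(1-p)^{k-1} = \frac{1}{p},
\qquad
\mathbb{E}[\text{cost per success}] = c\,\mathbb{E}[N] = \frac{c}{p}
$$

$$
\text{cheap: } \frac{\$0.01}{0.20} = \$0.05
\qquad
\text{capable: } \frac{\$0.03}{0.95} \approx \$0.032
$$

Under these numbers the model that is three times pricier per attempt is still cheaper per result. The comparison flips whenever $c_\text{cheap}/p_\text{cheap} < c_\text{capable}/p_\text{capable}$, which is why both quantities need to be measured per task type.
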
## Token Arbitrage (the "golden line")

**Arbitrage tokens for maximum value.** Cache-hit tokens cost ~10% of fresh
tokens (DeepSeek pricing), so every token in the stable prefix that hits
cache is ~90% cheaper. Design prompts to maximize cache-hit prefixes. The
arbitrage strategy:

1. **Maximize cache-hit surface**: byte-stable system prefix, skills,
   tool definitions, agent identity — warm once, reuse thousands of times
2. **Spend where it counts**: conversation turns, tool results, novel
   context — these are unavoidable, so make them _useful_ (VSpecs,
   rich context, HTML plans)
3. **Trim where it doesn't**: auto-compaction, summarization, tool result
   truncation — Colibri's 3-region model already does this

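The arithmetic behind the arbitrage can be sketched as a pair of small functions. This is an illustration only: the function names, the price parameter, and the `cache_discount` argument are hypothetical and not part of Colibri's `cost.rs`; the 0.10 discount in the usage note mirrors the ~10% figure quoted above.

```rust
/// Cost in USD of the input side of one request when part of the prompt
/// is served from the provider's prefix cache.
///
/// `fresh_price_per_mtok`: price per million fresh input tokens (assumed input).
/// `cache_discount`: fraction of the fresh price billed for cache-hit tokens.
pub fn input_cost_usd(
    cache_hit_tokens: u64,
    fresh_tokens: u64,
    fresh_price_per_mtok: f64,
    cache_discount: f64,
) -> f64 {
    let per_token = fresh_price_per_mtok / 1_000_000.0;
    let hit_cost = cache_hit_tokens as f64 * per_token * cache_discount;
    let fresh_cost = fresh_tokens as f64 * per_token;
    hit_cost + fresh_cost
}

/// Savings in USD relative to paying the fresh price for every input token.
pub fn cache_savings_usd(
    cache_hit_tokens: u64,
    fresh_price_per_mtok: f64,
    cache_discount: f64,
) -> f64 {
    let per_token = fresh_price_per_mtok / 1_000_000.0;
    cache_hit_tokens as f64 * per_token * (1.0 - cache_discount)
}
```

For example, `input_cost_usd(12_100, 33_130, price, 0.10)` prices a request with 12,100 cached and 33,130 fresh input tokens, where `price` is whatever per-million rate the provider currently charges. Growing the first argument at the expense of the second is the whole arbitrage.
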
### Existing Colibri arbitrage infrastructure

```
T1.4 Prompt Discipline (code present, integration in progress):
  Region 1: STABLE_SYSTEM_PREFIX         → cache-hit (90% cheaper)
  Region 2: conversation log (compacted) → fresh tokens
  Region 3: volatile scratch (empty)     → zero cost

CostMode escalation (Fast → Smart → Max):
  Fast:  500K budget, compact tool results, 5 turns
  Smart: 2M budget, keep tool results, 20 turns    ← default
  Max:   8M budget, full context, 100 turns

Cache warming (T1.4 PR3b, merged):
  Pre-warm STABLE_SYSTEM_PREFIX on daemon startup
  Re-warm every N hours (configurable)
  ~3,500 tokens per warm cycle → pays off in ~7 agent tasks
```

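The CostMode tiers listed above could be modelled roughly as follows. This is a hypothetical mirror for illustration; the real definitions live in `cost.rs` and may differ. The type and function names are invented here, and the escalation trigger (budget or turn limit exhausted) is an assumption; only the budgets, turn limits, and the Smart default come from the listing.

```rust
/// Hypothetical mirror of the three tiers; not the actual `cost.rs` types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CostMode {
    Fast,
    Smart,
    Max,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Limits {
    pub token_budget: u64,
    pub max_turns: u32,
}

impl CostMode {
    /// Budgets and turn limits as given in the listing above.
    pub fn limits(self) -> Limits {
        match self {
            CostMode::Fast => Limits { token_budget: 500_000, max_turns: 5 },
            CostMode::Smart => Limits { token_budget: 2_000_000, max_turns: 20 },
            CostMode::Max => Limits { token_budget: 8_000_000, max_turns: 100 },
        }
    }

    /// Next tier up, or `None` when already at `Max`.
    pub fn escalate(self) -> Option<CostMode> {
        match self {
            CostMode::Fast => Some(CostMode::Smart),
            CostMode::Smart => Some(CostMode::Max),
            CostMode::Max => None,
        }
    }
}

impl Default for CostMode {
    /// Smart is marked as the default tier in the listing.
    fn default() -> Self {
        CostMode::Smart
    }
}

/// Assumed policy: stay in the current tier until its token budget or turn
/// limit is used up, then move one tier up. `None` means `Max` is exhausted.
pub fn next_mode(mode: CostMode, tokens_used: u64, turns_used: u32) -> Option<CostMode> {
    let limits = mode.limits();
    if tokens_used >= limits.token_budget || turns_used >= limits.max_turns {
        mode.escalate()
    } else {
        Some(mode)
    }
}
```

Returning `Option` makes the "nowhere left to escalate" case explicit, so a caller has to decide whether to abort the task or hand it to a human.
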
## What We Still Need (Trifecta Dashboard)

The video's core message: observability isn't optional for production
agents. Colibri already captures the raw data. What's missing is the
trifecta view:

### Per-task cost tracking

```
task_id:    "abc123"
model:      "deepseek-v4-flash"
tokens_in:  45,230 (12,100 cache-hit, 33,130 fresh)
tokens_out: 2,847
cost:       $0.047 (cache savings: $0.012)
latency:    8.3s
success:    true
```

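A record like the one above might be carried in a struct along these lines. This is a minimal sketch: the type, its field names, and the derived helpers are hypothetical and not Colibri's actual schema; the values in `example_record` are copied from the sample above.

```rust
/// Hypothetical per-task record; field names follow the sample above.
#[derive(Debug, Clone)]
pub struct TaskRecord {
    pub task_id: String,
    pub model: String,
    pub cache_hit_tokens: u64,
    pub fresh_tokens: u64,
    pub tokens_out: u64,
    pub cost_usd: f64,
    pub latency_secs: f64,
    pub success: bool,
}

impl TaskRecord {
    /// Total input tokens: cached plus fresh.
    pub fn tokens_in(&self) -> u64 {
        self.cache_hit_tokens + self.fresh_tokens
    }

    /// Share of input tokens served from cache, in [0, 1].
    pub fn cache_hit_ratio(&self) -> f64 {
        let total = self.tokens_in();
        if total == 0 {
            0.0
        } else {
            self.cache_hit_tokens as f64 / total as f64
        }
    }

    /// Output tokens divided by wall-clock latency (a coarse speed proxy).
    pub fn output_tokens_per_sec(&self) -> f64 {
        if self.latency_secs > 0.0 {
            self.tokens_out as f64 / self.latency_secs
        } else {
            0.0
        }
    }
}

/// The sample record from the text, for experimentation.
pub fn example_record() -> TaskRecord {
    TaskRecord {
        task_id: "abc123".to_string(),
        model: "deepseek-v4-flash".to_string(),
        cache_hit_tokens: 12_100,
        fresh_tokens: 33_130,
        tokens_out: 2_847,
        cost_usd: 0.047,
        latency_secs: 8.3,
        success: true,
    }
}
```

Storing the cached and fresh counts separately, and deriving the total, keeps the cache-hit ratio recomputable later without trusting a pre-aggregated field.
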
### Trifecta balance sheet

```
Performance  ████████░░  82% task success (rolling 24h)
Speed        ██████░░░░  61% cache-hit ratio
Cost         ████████░░  $0.047 avg/task (target: <$0.05)
```

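The three headline numbers could be rolled up from per-task outcomes roughly as below. This is a sketch with invented names (`Outcome`, `BalanceSheet`, `summarize`, `bar`); it assumes the caller has already filtered outcomes to the window of interest (for example the last 24 hours) and that the cache-hit ratio is token-weighted across tasks, which the mock-up does not specify.

```rust
/// Minimal outcome of one task, as needed for the rollup (hypothetical type).
pub struct Outcome {
    pub success: bool,
    pub cache_hit_tokens: u64,
    pub tokens_in: u64,
    pub cost_usd: f64,
}

/// The three trifecta headline numbers.
pub struct BalanceSheet {
    pub success_rate: f64,
    pub cache_hit_ratio: f64,
    pub avg_cost_usd: f64,
}

/// Aggregate a window of outcomes; returns `None` for an empty window.
pub fn summarize(outcomes: &[Outcome]) -> Option<BalanceSheet> {
    if outcomes.is_empty() {
        return None;
    }
    let n = outcomes.len() as f64;
    let successes = outcomes.iter().filter(|o| o.success).count() as f64;
    let hit: u64 = outcomes.iter().map(|o| o.cache_hit_tokens).sum();
    let total_in: u64 = outcomes.iter().map(|o| o.tokens_in).sum();
    let cost: f64 = outcomes.iter().map(|o| o.cost_usd).sum();
    Some(BalanceSheet {
        success_rate: successes / n,
        // Token-weighted: large prompts count for more than small ones.
        cache_hit_ratio: if total_in == 0 { 0.0 } else { hit as f64 / total_in as f64 },
        avg_cost_usd: cost / n,
    })
}

/// Ten-cell bar in the style of the mock-up; `fraction` is clamped to [0, 1].
pub fn bar(fraction: f64) -> String {
    let filled = (fraction.clamp(0.0, 1.0) * 10.0).round() as usize;
    format!("{}{}", "█".repeat(filled), "░".repeat(10 - filled))
}
```

`bar(0.82)` and `bar(0.61)` reproduce the Performance and Speed bars shown above. How the Cost bar should be scaled against its target is left open here, since the mock-up does not define it.
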
### Model selection arbitrage

Given a task, Colibri should be able to answer:

- Can this task be handled by a cheap model (DeepSeek V3, Gemini Flash)?
- Is the cache-hit ratio high enough that the premium model is actually cheaper?
- What's the cost delta between models for this specific task type?

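One way to answer these questions is to rank candidate models by expected cost per successful task rather than by list price. The sketch below is illustrative only: `ModelStats` and `cheapest_per_success` are hypothetical names, the statistics are assumed to come from past tasks of the same type, and the per-attempt cost is assumed to already reflect each model's observed cache-hit ratio.

```rust
/// Observed statistics for one model on one task type (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ModelStats {
    pub name: &'static str,
    /// Average spend per attempt, cache discounts already applied.
    pub avg_cost_per_attempt_usd: f64,
    /// Fraction of attempts that succeed, in [0, 1].
    pub success_rate: f64,
}

impl ModelStats {
    /// Expected spend per successful task, assuming independent retries.
    /// `None` when the model never succeeds on this task type.
    pub fn cost_per_success(&self) -> Option<f64> {
        if self.success_rate > 0.0 {
            Some(self.avg_cost_per_attempt_usd / self.success_rate)
        } else {
            None
        }
    }
}

/// Pick the candidate with the lowest expected cost per success.
pub fn cheapest_per_success(candidates: &[ModelStats]) -> Option<&ModelStats> {
    candidates
        .iter()
        .filter_map(|m| m.cost_per_success().map(|cost| (m, cost)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(m, _)| m)
}
```

With made-up inputs of $0.01 per attempt at a 20% success rate versus $0.03 per attempt at 95%, the second model wins at about $0.032 per success against $0.05. The cost delta in the third question is simply the difference between the two `cost_per_success` values.
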
## Visual Specs (VSpecs) — Future Input Modality

The video introduces "VSpecs": plans with embedded images generated by
GPT Image 2. Multimodal models (Gemini 3.5 Flash, GPT-5) read these
images as "useful tokens" — a UI mockup is worth 1000 words of text
description.

For Colibri: this means the prompt assembly pipeline should eventually
support image tokens in Region 2 (conversation log). NOT for T1.4 —
this is T2.x territory. But the cost model should be ready for mixed
text+image token budgets.

## Golden Rules (from the video, adapted for Colibri)

1. **Measure everything.** Every tool call, every token, every dollar.
   Colibri's glasspane architecture already captures the event stream;
   the trifecta dashboard makes it actionable.

2. **Arbitrage cache vs spend.** The stable prefix is close to free money:
   cache hits bill at ~10% of the fresh price. Maximize its size, minimize
   its churn.

3. **Cost per intelligence, not per token.** Compare cost-per-successful-task,
   not raw model prices in isolation. A $0.05 task that
   works is infinitely cheaper than a $0.01 task that fails.

4. **Trade-offs are engineering.** There is no "best" model. There is
   only the right model for THIS task, under THESE constraints.

5. **Closed loop: measure → analyze → improve.** The trifecta dashboard
   isn't a report — it's a feedback loop. Every task feeds back into
   model selection, prompt design, and cache strategy.

## Integration with Existing Work

| Colibri component           | Trifecta role                         | Status  |
| --------------------------- | ------------------------------------- | ------- |
| `colibri-deepseek`          | Cache probe, hit metering             | ✅ done |
| `colibri-daemon/cost.rs`    | CostMode, budget enforcement          | ✅ done |
| `colibri-daemon/session.rs` | 3-region prompt, compaction           | ✅ done |
| Cache warming (T1.4 PR3b)   | Pre-warm stable prefix                | ✅ done |
| Prompt discipline (T1.4)    | Byte-stable assembly, cost-aware trim | 🔧 WIP  |
| Trifecta dashboard (T1.5)   | Per-task cost/speed/perf metrics      | 📋 plan |
| Eval harness (T1.6)         | Task success measurement              | 📋 plan |
| Model selection (T2.x)      | Arbitrage engine, cost-aware routing  | 📋 plan |
| VSpec support (T2.x)        | Image tokens in prompt assembly       | 📋 plan |

## Reference

- Video: "Agent Specs: The Unreasonable Effectiveness of Useful Tokens"
  https://www.youtube.com/watch?v=o4KZH_KSqYQ
- Colibri T1.4 Prompt Discipline: `docs/T1.4-PROMPT-DISCIPLINE-PLAN.md`
- Colibri Glasspane Design: `docs/COLIBRI-GLASSPANE-DESIGN.md`