review/per-task-cost #230

Merged
clawdie merged 10 commits from review/per-task-cost into main 2026-06-27 13:23:48 +02:00
Owner
No description provided.
clawdie added 10 commits 2026-06-27 13:23:12 +02:00
ProviderSmokeResult → ProviderTestResult
PROVIDER_SMOKE_SCHEMA → PROVIDER_TEST_SCHEMA
clawdie.provider-smoke.result.v1 → clawdie.provider-test.result.v1
Applied across manifests, golden tests, wiki, and zot_rpc comments.

Rationale: smoke is jargon; test is clear and consistent with
the project's naming conventions (avoid dead/fake/smoke labels).
Colibri now captures per-task cost metrics when an agent finishes a
  claimed task. End-to-end: zot usage events → glasspane accumulation →
  daemon heartbeat capture → store persistence → MCP query.

  Contracts  — TaskCostSummary schema (clawdie.task-cost-summary.v1)
  Glasspane  — PaneUsage with Eq-safe micro-cents storage, accumulates
               zot usage events (previously discarded)
  Store      — Task extended with 8 cost columns, TaskCost struct,
               set_task_cost() transitions to Done/Failed + writes cost
  Daemon     — heartbeat poll_exit reads pane usage, writes TaskCost,
               gated on session_id prefix "task-"
  MCP        — colibri_get_task_cost tool (read-only, returns cost data)
  Currency   — Renamed cost_usd → cost everywhere. Value stays in
               provider billing currency (USD). Multi-currency display
               is a consumer-layer concern for the future.
  Tests      — +5 new tests: contract round-trip, store write+not-found,
               glasspane accumulation (Zot+Pi), MCP tool list count

  Crate delta: contracts +30, glasspane +50, store +80, daemon +40,
  mcp +20, tests +60. 306 total tests (up from 297).
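
  A minimal sketch of the "Eq-safe micro-cents" idea from the Glasspane
  entry above, assuming cost arrives as a float in the billing currency.
  The field names, the record() method, and the is_task_session() helper
  are hypothetical and not taken from the Colibri source. Storing cost as
  an integer lets the struct derive Eq, which an f64 field would prevent.

```rust
/// Hypothetical stand-in for glasspane's PaneUsage; field names are assumptions.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct PaneUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    /// Cost in micro-cents: 1 unit = 1e-8 of one billing-currency unit.
    pub cost_micro_cents: u64,
}

/// Number of micro-cents in one billing-currency unit (100 cents * 1e6).
const MICRO_CENTS_PER_UNIT: f64 = 100_000_000.0;

impl PaneUsage {
    /// Adds one usage event to the running totals.
    /// Negative or NaN costs are treated as zero; sums saturate instead of wrapping.
    pub fn record(&mut self, input: u64, output: u64, cost: f64) {
        self.input_tokens = self.input_tokens.saturating_add(input);
        self.output_tokens = self.output_tokens.saturating_add(output);
        let micro = (cost.max(0.0) * MICRO_CENTS_PER_UNIT).round() as u64;
        self.cost_micro_cents = self.cost_micro_cents.saturating_add(micro);
    }

    /// Converts the stored integer back to a float for display.
    pub fn cost(&self) -> f64 {
        self.cost_micro_cents as f64 / MICRO_CENTS_PER_UNIT
    }
}

/// Mirrors the daemon gate described above: only sessions whose id
/// starts with "task-" have their cost written.
pub fn is_task_session(session_id: &str) -> bool {
    session_id.starts_with("task-")
}
```

  Recording the same event twice (input=150, output=80, cost=0.0042)
  yields 300 input tokens and 840000 micro-cents, and two PaneUsage
  values can be compared with == in tests.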
Documents the T1.5 feature: zot usage events → glasspane PaneUsage →
daemon heartbeat → store.set_task_cost() → MCP colibri_get_task_cost.

Adds contracts cross-link (TaskCostSummary schema v1) to see-also.
New integration test: spawn_agent_with_usage_captures_task_cost
- Spawns colibri-test-agent with --emit-usage flag
- Agent emits zot-compatible usage event (input=150, output=80, cost=0.0042)
- Calls heartbeat() manually to capture cost (heartbeat() was private, now pub)
- Verifies all 8 cost fields are persisted on the task

Test agent changes:
- New --emit-usage flag emits usage JSONL event with deterministic values
- New parses_emit_usage_flag unit test
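
The flag handling and the emitted event could look roughly like the sketch
below. The JSON key names and the "type":"usage" discriminator are
assumptions made for illustration; the real zot event shape is not shown
in this PR. Only the flag name and the three values (150, 80, 0.0042) come
from the commit message.

```rust
/// Returns true when the argument list contains the --emit-usage flag.
pub fn parses_emit_usage(args: &[&str]) -> bool {
    args.iter().any(|a| *a == "--emit-usage")
}

/// Builds one JSONL line for a usage event.
/// The key names here are hypothetical, not the confirmed zot schema.
pub fn usage_event_line(input: u64, output: u64, cost: f64) -> String {
    format!("{{\"type\":\"usage\",\"input\":{input},\"output\":{output},\"cost\":{cost}}}")
}

/// Deterministic values as listed in the integration test description.
pub fn deterministic_usage_line() -> String {
    usage_event_line(150, 80, 0.0042)
}
```

Because the values are fixed, a test can compare the emitted line or the
persisted cost fields against constants instead of tolerating ranges.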

Glasspane change: usage accumulation was Zot-only — now all runtimes
accumulate (Pi, Local included). This enables cost tracking for any agent
harness that emits usage events. Updated zot_usage_accumulates test.

Sam & Hermes
Design doc for T2.x routing covering:
  - Stable machine_id (not hostname) for hive identity
  - Extended capability matrix with ollama + llama.cpp probes
  - Cost-aware routing tiers (local $0 → DeepSeek $0.27 → premium)
  - Three implementation options (mother-centric, peer-to-peer, skill-based)
  - Integration with T1.5 per-task cost tracking
  - "Verify, don't guess" — all capabilities from hw-probe, not declarations

  Recommendation: Option A (mother-centric foundation) + Option C
  (skill-based agent routing) layered on top. Local LLM is the
  ultimate cache-hit token — $0.0000 per task on a beefy member.
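
  The tier ordering above can be sketched as "pick the cheapest tier that
  a probe has confirmed". The Tier struct, the probed_ok field, and the
  premium price passed in by the caller are hypothetical; the design doc
  gives only the $0 and $0.27 figures and does not state a unit for them.

```rust
/// Hypothetical routing tier; `price` uses whatever unit the design doc intends.
#[derive(Debug, Clone, PartialEq)]
pub struct Tier {
    pub name: &'static str,
    pub price: f64,
    /// True only when a hardware or provider probe confirmed the capability
    /// ("verify, don't guess"), never from a node's own declaration.
    pub probed_ok: bool,
}

/// Returns the cheapest tier whose capability was verified, if any.
pub fn cheapest_available(tiers: &[Tier]) -> Option<&Tier> {
    tiers
        .iter()
        .filter(|t| t.probed_ok)
        .min_by(|a, b| a.price.total_cmp(&b.price))
}

/// Example tier table. The premium price is a caller-supplied placeholder
/// because the source does not give one.
pub fn example_tiers(local_probed_ok: bool, premium_price: f64) -> Vec<Tier> {
    vec![
        Tier { name: "local", price: 0.0, probed_ok: local_probed_ok },
        Tier { name: "deepseek", price: 0.27, probed_ok: true },
        Tier { name: "premium", price: premium_price, probed_ok: true },
    ]
}
```

  With a verified local model the router picks "local"; if the probe
  fails, it falls through to "deepseek" before any premium tier.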
New wiki page: multi-node cost observability, A2A agent discovery,
and operator board design. Covers:
- What Hive Pane shows (node status, cost, tasks, GPU)
- Relationship to glasspane, mother-hive, task-board
- A2A integration: Agent Card, task exchange, cost data parts
- Data flow: node boot → A2A discovery → register → board
- Schema (hive_pane PostgreSQL view)
- Non-goals (not glasspane replacement, not Grafana)

Added to EN and SL wiki indexes.
wiki-lint --strict: PASS (163 refs, 0 failures).

Sam & Hermes
Merges: HIVE-PANE.md (glasspane for hive), end-to-end cost capture test,
  runtime-agnostic usage accumulation, test agent --emit-usage flag,
  heartbeat() pub for tests.

  Both wiki entries (hive-routing + hive-pane) preserved in index.
Three fixes to HIVE-PANE.md:
  1. machine_id as stable node identity — Agent Card input schema + hive_pane
     VIEW join key (was hostname-only)
  2. Local LLM column in the mockup board — ollama/llama.cpp model info
  3. cost_usd → cost in A2A cost data part (matches T1.5 rename)

  Cross-links:
  - hive-pane → hive-routing (engine vs presentation layer)
  - hive-routing → hive-pane (companion doc, A2A integration note)

  hive_pane VIEW now joins on machine_id, uses total_cost (not total_cost_usd).
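
  To illustrate why the join key matters, here is a small in-memory
  sketch of aggregating total_cost by machine_id rather than hostname.
  TaskRow and total_cost_by_machine() are hypothetical; the actual
  hive_pane object is a PostgreSQL view whose definition is not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape; only machine_id and cost mirror names from the PR.
pub struct TaskRow {
    pub machine_id: String,
    pub hostname: String,
    pub cost: f64,
}

/// Sums cost per machine_id, so a node that changes hostname
/// still accumulates into a single total.
pub fn total_cost_by_machine(rows: &[TaskRow]) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for row in rows {
        *totals.entry(row.machine_id.clone()).or_insert(0.0) += row.cost;
    }
    totals
}
```

  Grouping by hostname instead would split one node's costs into two
  rows after a rename, which is the failure mode the machine_id key avoids.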
docs(wiki): A2A complexity audit — when it pays off vs when it adds weight
Some checks failed
CI / rust (pull_request) Has been cancelled
CI / markdown (pull_request) Has been cancelled
CI / port (pull_request) Has been cancelled
CI / agent-jail-pkgs (pull_request) Has been cancelled
affee26afa
Full protocol surface audit across Colibri's 5 current protocols
  (~5,324 lines). Key finding: A2A is an interoperability play, not a
  complexity reduction play.

  Would be replaced:
  - Mother MCP-over-SSH bridge → A2A HTTP endpoint (−160 lines, +380 lines)
  - External MCP discovery → Agent Card (future, zero adopters today)
  - Ad-hoc cost format → typed A2A part (negligible code impact)

  Would not be replaced: Unix socket (local IPC), spawner (process lifecycle),
  glasspane (PTY observer), store (SQLite), MCP editor bridge (human↔tool).

  Net delta: ~0 lines (moves code, doesn't shrink it). Protocol count: 5→6.

  Recommendation: A2A is Phase 3 — not Phase 2, not 0.12. The current
  MCP-over-SSH bridge (437 lines) works for 4 nodes. A2A pays off at 10+
  nodes or when third-party tools ship A2A support. The Agent Card design
  in HIVE-PANE.md stays as a north star.

  Cross-linked from hive-pane.md + wiki index. 182 refs, clean lint.
clawdie merged commit 07ff198008 into main 2026-06-27 13:23:48 +02:00
clawdie deleted branch review/per-task-cost 2026-06-27 13:23:51 +02:00
Reference: clawdie/colibri#230