docs/model-selection-and-eval #244

Merged

clawdie merged 2 commits from docs/model-selection-and-eval into main

2026-06-27 22:24:29 +02:00

clawdie commented

2026-06-27 22:23:59 +02:00

Owner

No description provided.

clawdie added 2 commits 2026-06-27 22:24:04 +02:00

fix(tests): cargo fmt on cost_pipeline.rs — PR #243 followup 08cdae1c47

Cargo fmt drift in the new cost pipeline integration tests:
- Multi-line .args() calls (8+ args per line)
- Multi-line assert!() with format strings
- Braced if-let-else blocks

Sam & Claude

docs(wiki): model selection + evaluation harness design b096168aee

CI checks (pull_request), all waiting to run: rust, markdown, port, agent-jail-pkgs

New wiki page: model-selection-and-eval.md (445 lines)

Completes the T2.x trifecta design:
- Evaluation harness: 3 modes (self-report, local LLM, cloud LLM)
- Model selection: weighted scoring (success rate, cost, capability, latency)
- Integration with hive-routing: data flow + implementation phases
- 4 implementation phases, ~10 days total, ~570 lines
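
The weighted scoring named in the list above could look roughly like the sketch below. This is a minimal illustration, not the wiki page's design: the struct and function names (`ModelStats`, `Weights`, `score`, `select`), the units, and the min-max normalization of cost and latency are all assumptions made for the example.

```rust
/// Hypothetical per-model statistics; field names and units are assumptions.
#[derive(Debug, Clone)]
pub struct ModelStats {
    pub name: &'static str,
    pub success_rate: f64,  // 0.0..=1.0, higher is better
    pub cost_per_task: f64, // any consistent unit, lower is better
    pub capability: f64,    // 0.0..=1.0, higher is better
    pub latency_ms: f64,    // lower is better
}

/// Hypothetical weights for the four criteria (expected to sum to 1.0).
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub success: f64,
    pub cost: f64,
    pub capability: f64,
    pub latency: f64,
}

/// Smallest and largest value produced by an iterator of floats.
fn min_max(values: impl Iterator<Item = f64>) -> (f64, f64) {
    values.fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), v| {
        (lo.min(v), hi.max(v))
    })
}

/// Map a "lower is better" value into 0.0..=1.0, so the lowest candidate
/// gets 1.0 and the highest gets 0.0. If all candidates are equal, return 1.0.
fn invert_normalize(value: f64, min: f64, max: f64) -> f64 {
    if max > min {
        (max - value) / (max - min)
    } else {
        1.0
    }
}

/// Weighted sum of the four criteria, with cost and latency normalized
/// against the other candidates.
pub fn score(model: &ModelStats, all: &[ModelStats], w: Weights) -> f64 {
    let (cost_min, cost_max) = min_max(all.iter().map(|s| s.cost_per_task));
    let (lat_min, lat_max) = min_max(all.iter().map(|s| s.latency_ms));

    w.success * model.success_rate
        + w.cost * invert_normalize(model.cost_per_task, cost_min, cost_max)
        + w.capability * model.capability
        + w.latency * invert_normalize(model.latency_ms, lat_min, lat_max)
}

/// Pick the highest-scoring candidate, or `None` if there are no candidates.
pub fn select(candidates: &[ModelStats], w: Weights) -> Option<&ModelStats> {
    candidates
        .iter()
        .max_by(|a, b| score(a, candidates, w).total_cmp(&score(b, candidates, w)))
}
```

Under these assumptions, the success rate would come from the evaluation harness, while the weights stay a tuning knob: raising `cost` favors cheaper models, and raising `success` and `capability` favors stronger ones.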

Indexed in both en/index.md and sl/index.md.

Follows PR #241 (conflict marker fix) and the now-merged screenshot
pipeline. The eval harness provides the feedback loop that makes
model-selection decisions data-driven rather than heuristic.

Sam & Claude