docs/model-selection-and-eval #244

Merged
clawdie merged 2 commits from docs/model-selection-and-eval into main 2026-06-27 22:24:29 +02:00

2 commits

Sam & Claude
b096168aee docs(wiki): model selection + evaluation harness design
Some checks failed
CI / rust (pull_request) Has been cancelled
CI / markdown (pull_request) Has been cancelled
CI / port (pull_request) Has been cancelled
CI / agent-jail-pkgs (pull_request) Has been cancelled
New wiki page: model-selection-and-eval.md (445 lines)

Completes the T2.x trifecta design:
- Evaluation harness: 3 modes (self-report, local LLM, cloud LLM)
- Model selection: weighted scoring (success rate, cost, capability, latency)
- Integration with hive-routing: data flow + implementation phases
- 4 implementation phases, ~10 days total, ~570 lines
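The weighted-scoring idea above can be sketched roughly as follows. This is a minimal illustration, not the design from model-selection-and-eval.md. `ModelStats`, `Weights`, `score`, `select`, `default_weights` and `demo_models` are hypothetical names. The weight values and the normalization scheme are also assumptions: cost and latency are inverted against the worst candidate so that "cheaper/faster is better" fits into one sum.

```rust
/// Hypothetical per-model statistics; field names and units are illustrative.
#[derive(Debug, Clone)]
struct ModelStats {
    name: String,
    success_rate: f64, // fraction of evaluated tasks judged successful, 0.0..=1.0
    cost_usd: f64,     // average cost per task
    capability: f64,   // assumed capability rating, 0.0..=1.0
    latency_ms: f64,   // average wall-clock latency per task
}

/// Hypothetical weights for the four criteria; assumed to sum to 1.0.
struct Weights {
    success: f64,
    cost: f64,
    capability: f64,
    latency: f64,
}

/// Weighted score (0.0..=1.0 when weights sum to 1.0 and inputs are in range).
/// Cost and latency are "lower is better", so they are inverted against the
/// most expensive / slowest candidate in the pool.
fn score(m: &ModelStats, w: &Weights, max_cost: f64, max_latency: f64) -> f64 {
    let cost_score = if max_cost > 0.0 { 1.0 - m.cost_usd / max_cost } else { 1.0 };
    let latency_score = if max_latency > 0.0 { 1.0 - m.latency_ms / max_latency } else { 1.0 };
    w.success * m.success_rate
        + w.cost * cost_score
        + w.capability * m.capability
        + w.latency * latency_score
}

/// Pick the highest-scoring candidate; returns None for an empty slice.
fn select<'a>(models: &'a [ModelStats], w: &Weights) -> Option<&'a ModelStats> {
    let max_cost = models.iter().map(|m| m.cost_usd).fold(0.0, f64::max);
    let max_latency = models.iter().map(|m| m.latency_ms).fold(0.0, f64::max);
    models.iter().max_by(|a, b| {
        score(a, w, max_cost, max_latency).total_cmp(&score(b, w, max_cost, max_latency))
    })
}

/// Illustrative weights: success rate matters most, latency least.
fn default_weights() -> Weights {
    Weights { success: 0.4, cost: 0.3, capability: 0.2, latency: 0.1 }
}

/// Made-up candidates, for illustration only.
fn demo_models() -> Vec<ModelStats> {
    vec![
        ModelStats { name: "local-small".into(), success_rate: 0.70, cost_usd: 0.0, capability: 0.5, latency_ms: 800.0 },
        ModelStats { name: "cloud-large".into(), success_rate: 0.95, cost_usd: 0.05, capability: 0.9, latency_ms: 2000.0 },
    ]
}

fn main() {
    let models = demo_models();
    let w = default_weights();
    if let Some(best) = select(&models, &w) {
        println!("selected: {}", best.name);
    }
}
```

With these invented numbers, local-small scores 0.74 and cloud-large scores 0.56. The 0.3 cost weight outweighs cloud-large's higher success rate and capability. Normalizing against the pool maximum always gives the most expensive candidate a cost score of 0. A fixed budget ceiling would be an alternative if that proves too harsh.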

Indexed in both en/index.md and sl/index.md.

Follows PR #241 (conflict marker fix) and the now-merged screenshot
pipeline. The eval harness provides the feedback loop that makes
model-selection decisions data-driven rather than heuristic.
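One way to picture that feedback loop is to treat each graded task from any of the three evaluation modes as a success/failure record and fold it into a per-model success rate, the input a weighted selector would consume. This is a sketch under assumptions. `EvalMode`, `EvalResult` and `SuccessTracker` are hypothetical names, and computing the rate as a plain successes/total ratio across all modes is an assumption rather than the documented policy.

```rust
use std::collections::HashMap;

/// The three evaluation modes from the design; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum EvalMode {
    SelfReport,
    LocalLlm,
    CloudLlm,
}

/// Hypothetical record of one graded task.
struct EvalResult {
    model: String,
    mode: EvalMode,
    success: bool,
}

/// Running (successes, total) tallies keyed by (model, mode), so the
/// different graders can also be compared against each other later.
#[derive(Default)]
struct SuccessTracker {
    tallies: HashMap<(String, EvalMode), (u32, u32)>,
}

impl SuccessTracker {
    fn record(&mut self, r: &EvalResult) {
        let entry = self.tallies.entry((r.model.clone(), r.mode)).or_insert((0, 0));
        if r.success {
            entry.0 += 1;
        }
        entry.1 += 1;
    }

    /// Success rate for a model across all modes; None if it was never evaluated.
    fn success_rate(&self, model: &str) -> Option<f64> {
        let (ok, total) = self
            .tallies
            .iter()
            .filter(|((m, _), _)| m.as_str() == model)
            .fold((0u32, 0u32), |(a, b), (_, &(ok, n))| (a + ok, b + n));
        if total == 0 {
            None
        } else {
            Some(ok as f64 / total as f64)
        }
    }
}

fn main() {
    let mut tracker = SuccessTracker::default();
    for (mode, success) in [(EvalMode::SelfReport, true), (EvalMode::LocalLlm, false), (EvalMode::CloudLlm, true)] {
        tracker.record(&EvalResult { model: "local-small".into(), mode, success });
    }
    println!("{:?}", tracker.success_rate("local-small"));
}
```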

Sam & Claude, 2026-06-27 22:18:18 +02:00
Sam & Claude
08cdae1c47 fix(tests): cargo fmt on cost_pipeline.rs — PR #243 followup
Fixes cargo fmt drift in the new cost pipeline integration tests:
- Multi-line .args() calls (8+ args per line)
- Multi-line assert!() with format strings
- Braced if-let-else blocks

Sam & Claude, 2026-06-27 22:18:18 +02:00