docs/model-selection-and-eval #244

Merged

clawdie merged 2 commits from docs/model-selection-and-eval into main

2026-06-27 22:24:29 +02:00

Author	SHA1	Message	Date

Sam & Claude

b096168aee

docs(wiki): model selection + evaluation harness design

CI checks (pull_request), all cancelled: rust, markdown, port, agent-jail-pkgs

New wiki page: model-selection-and-eval.md (445 lines)

Completes the T2.x trifecta design:
- Evaluation harness: 3 modes (self-report, local LLM, cloud LLM)
- Model selection: weighted scoring (success rate, cost, capability, latency)
- Integration with hive-routing: data flow + implementation phases
- 4 implementation phases, ~10 days total, ~570 lines

Indexed in both en/index.md and sl/index.md.

Follows PR #241 (conflict marker fix) and the now-merged screenshot
pipeline. The eval harness provides the feedback loop that makes
model-selection decisions data-driven rather than heuristic.
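
For a concrete picture of the weighted scoring named in this commit message (success rate, cost, capability, latency), here is a minimal Rust sketch. It is not taken from model-selection-and-eval.md: the names `ModelStats`, `Weights`, `score`, and `select_best`, the field names, and the normalisation of cost and latency against the worst candidate are all assumptions made for illustration.

```rust
/// Hypothetical per-model statistics, e.g. as an eval harness might report them.
/// Field names and units are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct ModelStats {
    pub name: String,
    pub success_rate: f64,  // 0.0..=1.0, higher is better
    pub cost_per_task: f64, // arbitrary currency unit, lower is better
    pub capability: f64,    // 0.0..=1.0, higher is better
    pub latency_ms: f64,    // lower is better
}

/// Hypothetical relative weights for the four criteria.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub success: f64,
    pub cost: f64,
    pub capability: f64,
    pub latency: f64,
}

/// Map a lower-is-better metric into 0..=1, where 1 is best.
/// `worst` is the largest value among the candidates.
fn inverse_norm(value: f64, worst: f64) -> f64 {
    if worst <= 0.0 {
        return 1.0;
    }
    (1.0 - value / worst).clamp(0.0, 1.0)
}

/// Weighted average of the four normalised criteria, in 0..=1.
pub fn score(m: &ModelStats, w: &Weights, max_cost: f64, max_latency_ms: f64) -> f64 {
    let total = w.success + w.cost + w.capability + w.latency;
    if total <= 0.0 {
        return 0.0;
    }
    let weighted = w.success * m.success_rate.clamp(0.0, 1.0)
        + w.cost * inverse_norm(m.cost_per_task, max_cost)
        + w.capability * m.capability.clamp(0.0, 1.0)
        + w.latency * inverse_norm(m.latency_ms, max_latency_ms);
    weighted / total
}

/// Pick the highest-scoring candidate, or `None` if there are no candidates.
pub fn select_best<'a>(models: &'a [ModelStats], w: &Weights) -> Option<&'a ModelStats> {
    let max_cost = models.iter().map(|m| m.cost_per_task).fold(0.0_f64, f64::max);
    let max_latency = models.iter().map(|m| m.latency_ms).fold(0.0_f64, f64::max);
    models.iter().max_by(|a, b| {
        score(a, w, max_cost, max_latency).total_cmp(&score(b, w, max_cost, max_latency))
    })
}
```

Calling `select_best(&candidates, &weights)` returns a reference to the preferred model. Normalising against the worst candidate keeps every term in the same 0 to 1 range, so the weights express relative importance directly. A real implementation might normalise against fixed budgets instead.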

Sam & Claude

2026-06-27 22:18:18 +02:00

Sam & Claude

08cdae1c47

fix(tests): cargo fmt on cost_pipeline.rs — PR #243 followup

Cargo fmt drift in the new cost pipeline integration tests:
- Multi-line .args() calls (8+ args per line)
- Multi-line assert!() with format strings
- Braced if-let-else blocks

Sam & Claude

2026-06-27 22:18:18 +02:00