colibri/docs/COLIBRI-SKILLS-PLAN.md
Sam & Claude b878b4bdfb
docs: rewrite negative patterns as positive actionable instructions
Convert 'do not', 'cannot', 'never', 'avoid', 'don't' patterns across
AGENTS.md, README.md, and 11 docs/*.md files into positive,
actionable instructions that tell the reader what TO do.

Preserved: hard safety constraints (MUST NOT agent boundaries,
vault credential confinement intent) — these are enforceable
guardrails where the prohibition IS the instruction.
2026-06-21 13:09:19 +02:00


Colibri Skills Plan

Status: Phase 1 scaffolded — read-only split-brain consumer

Crate: crates/colibri-skills

Purpose

colibri-skills is Colibri's read-only runtime consumer for reviewed skill artifacts authored in the Clawdie-AI repo. It does not author, edit, or store canonical skills. Clawdie-AI remains the source of truth; Colibri indexes and serves typed/runtime views.

Clawdie-AI repo (source of truth)
  docs/astro-howto/
  docs/forgejo-admin/
  docs/vaultwarden-onboarding/
  ...

Colibri colibri-skills crate (read-only consumer)
  reads committed skill artifacts
  validates checksums
  indexes Markdown/transcript chunks
  exposes Skill, SkillArtifact, SkillChunk structs
  serves CLI/TUI/search later

This keeps the split-brain model explicit:

  • system_skills: committed built-in knowledge / manuals / reviewed skillpacks
  • system_brain: user and agent memory
  • system_ops: live runtime, task, service, and daemon state
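The consumer's first job, reading committed skill artifacts from a configured checkout, starts with finding skill directories. Below is a minimal std-only Rust sketch under stated assumptions: `find_skill_dirs` is a hypothetical name, a skill directory is any directory holding a `run_manifest.json`, hidden directories such as `.git` are skipped, and a skill directory is treated as a leaf (its subfolders are not scanned for more manifests).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk a Clawdie-AI checkout and return every directory
/// that contains a `run_manifest.json`, sorted for deterministic imports.
pub fn find_skill_dirs(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        if dir.join("run_manifest.json").is_file() {
            // Assumption: a skill directory is a leaf, so screenshots/ and
            // docs/ inside it are not searched for further manifests.
            found.push(dir);
            continue;
        }
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let hidden = entry.file_name().to_string_lossy().starts_with('.');
            // DirEntry::file_type does not follow symlinks, so symlinked
            // directories are skipped instead of risking a cycle.
            if entry.file_type()?.is_dir() && !hidden {
                stack.push(entry.path());
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Sorting the result keeps repeated scans in a stable order, which suits the idempotent upsert the import flow aims for.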

Seed artifact: Astro how-to

The first concrete skillpack is docs/astro-howto/ in Clawdie-AI. It is a useful seed because it goes beyond prose: it includes a transcript, generated how-to docs, commands, screenshots, a contact sheet, a manifest, checksums, and scripts.

{
  "skill_id": "astro-howto",
  "source": "local video-derived training artifact",
  "inputs": [
    "transcript_local.txt",
    "screenshots/",
    "contact-sheet/contact_sheet.jpg"
  ],
  "outputs": [
    "docs/HOWTO.md",
    "docs/COMMANDS.md",
    "docs/SCREENSHOTS.md",
    "docs/SUMMARY.md"
  ],
  "verification": "can user create and run an Astro project?",
  "media": "screenshots/*.jpg (paths + hashes, not blobs)",
  "manifest": "run_manifest.json",
  "checksums": "artifacts.sha256"
}

Pipeline shape:

video → local transcript → topic extraction → how-to/runbook
→ screenshots/contact sheet → commands → verification test
→ manifest + checksums → reviewed skill artifact → Colibri read-only index

Ownership

| Layer | Role | Writes | Reads |
|---|---|---|---|
| Clawdie-AI | Source of truth | Skill artifacts via PR | N/A |
| colibri-skills | Runtime consumer | Runtime store only; the source repo stays read-only for this consumer | Indexed skill structs from committed artifacts |
| Agents | Authors/reviewers | Candidate skill artifact PRs | Skill content for task routing |
| system_brain | Agent/user memory | Personal/user/agent context | Not canonical skill docs |
| system_ops | Runtime state | Live task/service state | Not skills |

What colibri-skills does

  • Read skill manifests from a configured Clawdie-AI checkout path
  • Parse run_manifest.json
  • Validate checksums against artifacts.sha256
  • Classify artifacts as document, image, script, transcript, manifest, checksum, report, contact sheet, or other
  • Index Markdown/transcript chunks for search
  • Expose stable typed structs for daemon/client/TUI callers
  • Persist runtime index metadata in SQLite
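The classification step can be pictured as a pure function over an artifact's relative path. This is a hypothetical sketch, not the crate's actual `ArtifactType` code: the variant names follow the category list above, while the file-name and extension rules are assumptions drawn from the astro-howto layout.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical mirror of the crate's ArtifactType; real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArtifactType {
    Document,
    Image,
    Script,
    Transcript,
    Manifest,
    Checksum,
    Report,
    ContactSheet,
    Other,
}

/// Lowercased UTF-8 view of a path part, or "" when absent or non-UTF-8.
fn lower(part: Option<&OsStr>) -> String {
    part.and_then(OsStr::to_str).unwrap_or("").to_ascii_lowercase()
}

/// Classify an artifact by its path within the skill directory.
/// The name and extension rules are assumptions, not the crate's rules.
pub fn classify(relative_path: &str) -> ArtifactType {
    let path = Path::new(relative_path);
    let name = lower(path.file_name());
    let stem = lower(path.file_stem());
    let ext = lower(path.extension());

    // Specific file names win over generic extensions.
    if name == "run_manifest.json" {
        return ArtifactType::Manifest;
    }
    if ext == "sha256" {
        return ArtifactType::Checksum;
    }
    if stem.starts_with("contact_sheet") {
        return ArtifactType::ContactSheet;
    }
    if stem.starts_with("transcript") {
        return ArtifactType::Transcript;
    }
    if stem.starts_with("report") {
        return ArtifactType::Report;
    }
    match ext.as_str() {
        "md" | "txt" | "html" => ArtifactType::Document,
        "jpg" | "jpeg" | "png" | "webp" => ArtifactType::Image,
        "sh" | "py" | "ts" | "js" => ArtifactType::Script,
        _ => ArtifactType::Other,
    }
}
```

Falling back to `Other` lets an unfamiliar file type be indexed with a neutral type instead of failing the whole import.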

What colibri-skills does not do

  • Author, edit, or create skills
  • Store image blobs in SQLite; store paths and hashes only
  • Replace system_brain
  • Replace system_ops
  • Own provider/API budget logic
  • Require nonportable local source media paths at runtime

Phase 1 delivered

The scaffold crate now provides:

  • Skill
  • SkillManifest
  • SkillArtifact
  • SkillChunk
  • ArtifactType
  • SkillStatus
  • ImportSummary
  • SearchResult
  • unit tests for artifact classification and status/summary behavior

Phase 1 is intentionally scaffold-only: it proves that the crate compiles and the types fit together, and it has no runtime import behavior yet.
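The schema target stores `status` as text with the values `active`, `archived`, and `superseded`. One way for a `SkillStatus` enum to round-trip through that column is sketched below; it is a hypothetical shape, and the crate's real definition and unit tests may differ.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical SkillStatus that round-trips through the TEXT `status` column.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SkillStatus {
    #[default]
    Active, // matches the schema's DEFAULT 'active'
    Archived,
    Superseded,
}

impl SkillStatus {
    /// The exact string written to SQLite.
    pub fn as_str(self) -> &'static str {
        match self {
            SkillStatus::Active => "active",
            SkillStatus::Archived => "archived",
            SkillStatus::Superseded => "superseded",
        }
    }
}

impl fmt::Display for SkillStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for SkillStatus {
    type Err = String;

    /// Strict parse: unknown values are an error rather than a silent default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "active" => Ok(SkillStatus::Active),
            "archived" => Ok(SkillStatus::Archived),
            "superseded" => Ok(SkillStatus::Superseded),
            other => Err(format!("unknown skill status: {other:?}")),
        }
    }
}
```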

SQLite schema target

CREATE TABLE system_skills (
    skill_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    source_path TEXT NOT NULL,            -- relative within Clawdie-AI repo
    manifest_hash TEXT,                   -- sha256 of run_manifest.json
    created_at TEXT NOT NULL,             -- ISO 8601
    updated_at TEXT NOT NULL,
    verification TEXT,                    -- natural-language verification test
    status TEXT NOT NULL DEFAULT 'active' -- active, archived, superseded
);

CREATE TABLE system_skill_artifacts (
    artifact_id INTEGER PRIMARY KEY AUTOINCREMENT,
    skill_id TEXT NOT NULL REFERENCES system_skills(skill_id),
    artifact_type TEXT NOT NULL,
    relative_path TEXT NOT NULL,          -- within the skill directory
    file_name TEXT NOT NULL,
    mime_type TEXT,
    size_bytes INTEGER,
    sha256_hash TEXT NOT NULL,
    UNIQUE(skill_id, relative_path)
);

CREATE TABLE system_skill_chunks (
    chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,
    skill_id TEXT NOT NULL REFERENCES system_skills(skill_id),
    artifact_id INTEGER NOT NULL REFERENCES system_skill_artifacts(artifact_id),
    chunk_type TEXT NOT NULL,
    heading TEXT,
    content TEXT NOT NULL,
    line_start INTEGER,
    line_end INTEGER,
    tokens_estimate INTEGER
);

CREATE INDEX idx_skills_status ON system_skills(status);
CREATE INDEX idx_artifacts_skill ON system_skill_artifacts(skill_id);
CREATE INDEX idx_artifacts_type ON system_skill_artifacts(artifact_type);
CREATE INDEX idx_chunks_skill ON system_skill_chunks(skill_id);
CREATE INDEX idx_chunks_type ON system_skill_chunks(chunk_type);

CREATE VIRTUAL TABLE IF NOT EXISTS skill_fts USING fts5(
    content,
    heading,
    skill_id,
    chunk_type,
    content=system_skill_chunks,
    content_rowid=chunk_id
);
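Because `search-skills` feeds user text into an FTS5 `MATCH`, raw input containing double quotes, `AND`/`OR`/`NOT`, `*`, or `column:` filters can raise syntax errors or change the query's meaning. A minimal sketch, assuming every whitespace-separated term should be required: quote each term as an FTS5 string (embedded double quotes are doubled), and FTS5 joins adjacent phrases with an implicit AND. `fts5_match_expr` is a hypothetical name.

```rust
/// Hypothetical helper: turn free-form input into a safe FTS5 MATCH expression.
/// Each term becomes a quoted FTS5 string, so operators and column filters in
/// the input are treated as literal text. Returns None for blank input.
pub fn fts5_match_expr(input: &str) -> Option<String> {
    let terms: Vec<String> = input
        .split_whitespace()
        // FTS5 escapes a double quote inside a string by doubling it.
        .map(|t| format!("\"{}\"", t.replace('"', "\"\"")))
        .collect();
    if terms.is_empty() {
        None
    } else {
        // Whitespace between phrases is an implicit AND in FTS5.
        Some(terms.join(" "))
    }
}
```

The result would be bound as a parameter, for example `SELECT skill_id, heading FROM skill_fts WHERE skill_fts MATCH ?1 ORDER BY rank`. Since `skill_fts` is an external-content table, FTS5 only sees chunk changes through triggers on `system_skill_chunks` or an `INSERT INTO skill_fts(skill_fts) VALUES('rebuild')` after an import.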

Import flow target

  1. Read the Clawdie-AI checkout path from config or the environment.
  2. Scan for directories containing run_manifest.json.
  3. Parse manifest and derive skill metadata.
  4. Read artifacts, compute SHA-256, and verify artifacts.sha256 when present.
  5. Chunk Markdown by heading and transcripts by timestamp/segment.
  6. Upsert SQLite rows idempotently.
  7. Return an ImportSummary reporting skills found, indexed, and skipped, plus artifact and chunk counts, checksum failures, and errors.
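Step 5 is the most algorithmic part of the flow. A minimal sketch for the Markdown half, assuming ATX headings (`#` through `######`) delimit chunks, headings inside backtick-fenced code are ignored, and `tokens_estimate` uses a rough four-characters-per-token heuristic. `chunk_markdown` and `MdChunk` are hypothetical names, and transcript segmentation by timestamp is not covered.

```rust
#[derive(Debug, PartialEq)]
pub struct MdChunk {
    pub heading: Option<String>,
    pub content: String,
    pub line_start: usize, // 1-based, inclusive (the heading line)
    pub line_end: usize,   // 1-based, inclusive
    pub tokens_estimate: usize,
}

/// Title of an ATX heading line, or None. Simplification: any leading
/// indentation is accepted, although CommonMark allows at most 3 spaces.
fn atx_heading(line: &str) -> Option<&str> {
    let trimmed = line.trim_start();
    let hashes = trimmed.bytes().take_while(|&b| b == b'#').count();
    let rest = &trimmed[hashes..];
    let separated = rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t');
    if (1..=6).contains(&hashes) && separated {
        Some(rest.trim())
    } else {
        None
    }
}

fn flush(chunks: &mut Vec<MdChunk>, heading: Option<String>, body: &[&str], start: usize, end: usize) {
    let content = body.join("\n").trim().to_string();
    // Skip an empty preamble before the first heading.
    if heading.is_none() && content.is_empty() {
        return;
    }
    let tokens_estimate = (content.chars().count() + 3) / 4; // ~4 chars/token, rounded up
    chunks.push(MdChunk { heading, content, line_start: start, line_end: end, tokens_estimate });
}

/// Split Markdown into one chunk per heading section.
pub fn chunk_markdown(text: &str) -> Vec<MdChunk> {
    let mut chunks = Vec::new();
    let mut heading: Option<String> = None;
    let mut body: Vec<&str> = Vec::new();
    let mut start = 1;
    let mut last = 0;
    let mut in_fence = false;

    for (idx, line) in text.lines().enumerate() {
        let lineno = idx + 1;
        last = lineno;
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence; // "# comment" inside code is not a heading
        }
        match (in_fence, atx_heading(line)) {
            (false, Some(title)) => {
                flush(&mut chunks, heading.take(), &body, start, lineno - 1);
                heading = Some(title.to_string());
                body.clear();
                start = lineno;
            }
            _ => body.push(line),
        }
    }
    flush(&mut chunks, heading, &body, start, last);
    chunks
}
```

Keeping `line_start` and `line_end` per chunk maps directly onto the columns of the same names in `system_skill_chunks`, so a search hit can point back into the committed file.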

CLI surface target

colibri list-skills
colibri show-skill <id>
colibri search-skills <query>
colibri index-skills
colibri verify-skill <id>
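These subcommands are targets, so the sketch below only shows one std-only way to map arguments onto a typed command. `SkillCommand` and `parse_skill_command` are hypothetical, and a real CLI may well use an argument-parsing crate instead.

```rust
#[derive(Debug, PartialEq)]
pub enum SkillCommand {
    List,
    Show { id: String },
    Search { query: String },
    Index,
    Verify { id: String },
}

/// Map `colibri <subcommand> ...` arguments (program name already removed).
pub fn parse_skill_command(args: &[&str]) -> Result<SkillCommand, String> {
    match args {
        ["list-skills"] => Ok(SkillCommand::List),
        ["show-skill", id] => Ok(SkillCommand::Show { id: String::from(*id) }),
        // An unquoted multi-word query arrives as several arguments; rejoin them.
        ["search-skills", terms @ ..] if !terms.is_empty() => Ok(SkillCommand::Search {
            query: terms.join(" "),
        }),
        ["index-skills"] => Ok(SkillCommand::Index),
        ["verify-skill", id] => Ok(SkillCommand::Verify { id: String::from(*id) }),
        _ => Err(String::from(
            "usage: colibri list-skills | show-skill <id> | search-skills <query> | index-skills | verify-skill <id>",
        )),
    }
}
```

A `main` would collect `std::env::args().skip(1)` into a `Vec<String>`, borrow each element as `&str`, and dispatch on the returned command.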

Portability rules

  • Store image paths and hashes, not blobs.
  • Treat local provenance paths like /home/samob/Videos/... as metadata only.
  • Verify checksums against committed artifacts, not local source paths.
  • Store paths relative to the Clawdie-AI repo.
  • Run normal tests with only local SQLite and committed test fixtures; keep PostgreSQL, remote Forgejo, and local media as optional integration dependencies.
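The checksum rule and the relative-path rule fit naturally into one parser. A minimal sketch, assuming `artifacts.sha256` uses the `sha256sum` line format (64 hex digits, then two spaces or a space and `*`, then the path). `parse_sha256_manifest` and `checksum_failures` are hypothetical names, and hashing is left to a caller-supplied closure because the Rust standard library has no SHA-256.

```rust
use std::path::{Component, Path};

#[derive(Debug, PartialEq)]
pub struct ChecksumEntry {
    pub relative_path: String,
    pub sha256: String, // normalized to lowercase hex
}

/// Parse sha256sum-style lines: 64 hex digits, then "  " (text mode)
/// or " *" (binary mode), then a path relative to the skill directory.
pub fn parse_sha256_manifest(text: &str) -> Result<Vec<ChecksumEntry>, String> {
    let mut entries = Vec::new();
    for (idx, raw) in text.lines().enumerate() {
        let lineno = idx + 1;
        let line = raw.trim_end();
        if line.is_empty() {
            continue;
        }
        let hash = line
            .get(..64)
            .filter(|h| h.bytes().all(|b| b.is_ascii_hexdigit()))
            .ok_or_else(|| format!("line {lineno}: expected 64 hex digits"))?;
        let path = line
            .get(64..)
            .and_then(|rest| rest.strip_prefix("  ").or_else(|| rest.strip_prefix(" *")))
            .ok_or_else(|| format!("line {lineno}: expected \"  \" or \" *\" after the hash"))?;
        if !is_portable(path) {
            return Err(format!("line {lineno}: path must be relative to the skill directory: {path}"));
        }
        entries.push(ChecksumEntry {
            relative_path: path.to_string(),
            sha256: hash.to_ascii_lowercase(),
        });
    }
    Ok(entries)
}

/// Relative, non-empty, and free of `..` or root components.
fn is_portable(path: &str) -> bool {
    !path.is_empty()
        && Path::new(path)
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Paths whose computed hash differs from the committed one. `hash_of`
/// returns None for a missing file, which also counts as a failure.
pub fn checksum_failures<F>(entries: &[ChecksumEntry], hash_of: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    entries
        .iter()
        .filter(|e| hash_of(e.relative_path.as_str()).as_deref() != Some(e.sha256.as_str()))
        .map(|e| e.relative_path.clone())
        .collect()
}
```

Rejecting absolute and `..` paths at parse time keeps local provenance paths out of verification, so checks only ever touch committed artifacts.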

Future skillpacks

astro-howto
forgejo-admin
vaultwarden-onboarding
freebsd-update-reboot
colibri-iso-build
zed-on-freebsd
pi-headless-login

Implementation phases

| Phase | What | Depends on |
|---|---|---|
| 1 | Scaffold crate + structs + schema plan | Nothing |
| 2 | Manifest parser (run_manifest.json → SkillManifest) | Phase 1 |
| 3 | Checksum validator (artifacts.sha256 → verify) | Phase 2 |
| 4 | Markdown/transcript chunker | Phase 1 |
| 5 | SQLite storage + FTS5 search | Phases 3, 4 |
| 6 | CLI commands (list, show, search, index, verify) | Phase 5 |
| 7 | Daemon/client/TUI integration | Phase 6 |
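The "Depends on" column forms a small dependency graph. The sketch below uses a hypothetical `phase_waves` helper to compute the earliest wave in which each phase could start, assuming every dependency is listed before the phase that needs it (true for this table).

```rust
/// Each phase with the phases it depends on, copied from the phase table.
pub const PHASES: &[(u8, &[u8])] = &[
    (1, &[]),
    (2, &[1]),
    (3, &[2]),
    (4, &[1]),
    (5, &[3, 4]),
    (6, &[5]),
    (7, &[6]),
];

/// Earliest wave per phase: 0 with no dependencies, otherwise one more than
/// the latest wave among its dependencies. Panics if a dependency appears
/// after the phase that needs it.
pub fn phase_waves(phases: &[(u8, &[u8])]) -> Vec<(u8, usize)> {
    let mut waves: Vec<(u8, usize)> = Vec::new();
    for &(phase, deps) in phases {
        let wave = deps
            .iter()
            .map(|d| {
                waves
                    .iter()
                    .find(|(p, _)| p == d)
                    .map(|&(_, w)| w + 1)
                    .expect("dependency must be listed earlier")
            })
            .max()
            .unwrap_or(0);
        waves.push((phase, wave));
    }
    waves
}
```

Phase 4 lands in the same wave as Phase 2, so the chunker could be built alongside the manifest parser and checksum validator; Phase 5 is the first point where both tracks must be finished.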
Related Clawdie-AI sources:

  • clawdie-ai/docs/astro-howto/
  • clawdie-ai/docs/VAULTWARDEN-SETUP.md
  • clawdie-ai/bootstrap/skills-memory/artifact.sql
  • clawdie-ai/src/split-brain-status.ts