hermes-bsd/website/docs/developer-guide/plugin-llm-access.md

---
sidebar_position: 11
title: "Plugin LLM Access"
description: "Run any LLM call from inside a plugin via ctx.llm — chat or structured, sync or async. Host-owned auth, fail-closed trust gate, optional JSON Schema validation."
---

# Plugin LLM Access

`ctx.llm` is the supported way for a plugin to make an LLM call.
Chat completion, structured extraction, sync, async, with or without
images — same surface, same trust gate, same host-owned credentials.

Plugins reach for this when they need to do something that involves
the model but isn't part of the agent's conversation. A hook that
rewrites a tool error into something a non-engineer can read. A
gateway adapter that translates an inbound message before queuing
it. A slash command that summarises a long paste. A scheduled job
that scores yesterday's activity and writes one line to a status
board. A pre-filter that decides whether a message is worth waking
the agent up for at all.

These are jobs the agent shouldn't be in the loop on. They want one
LLM call, a typed answer, and to be done.

## The smallest possible call

```python
result = ctx.llm.complete(messages=[{"role": "user", "content": "ping"}])
return result.text
```

That's the whole API in one call. No keys, no provider config, no
SDK initialisation. The plugin runs against whatever provider and
model the user is currently using — when they switch providers, the
plugin follows them automatically.

## A more complete chat example

```python
result = ctx.llm.complete(
    messages=[
        {"role": "system", "content": "Rewrite errors as one short sentence a non-engineer can act on."},
        {"role": "user",   "content": traceback_text},
    ],
    max_tokens=64,
    purpose="hooks.error-rewrite",
)
return result.text
```

`purpose` is a free-form audit string — it shows up in `agent.log`
and in `result.audit` so operators can see which plugin made which
call. Optional but recommended for anything that fires often.

## Structured output

When the plugin needs a typed answer, switch to the structured lane:

```python
result = ctx.llm.complete_structured(
    instructions="Score this support reply for urgency (0–1) and pick a category.",
    input=[{"type": "text", "text": message_body}],
    json_schema=TRIAGE_SCHEMA,
    purpose="support.triage",
    temperature=0.0,
    max_tokens=128,
)

# `result.parsed` is None when the model did not return valid JSON.
if result.parsed is not None and result.parsed["urgency"] > 0.8:
    dispatch_to_oncall(result.parsed["category"], message_body)
```

The host requests JSON output from the provider, parses it locally
as a fallback, validates against your schema if `jsonschema` is
installed, and hands back a Python object on `result.parsed`. If the
model couldn't produce valid JSON, `result.parsed` is `None` and
`result.text` carries the raw response.
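
The local fallback parse described above can be pictured roughly as follows. This is a minimal sketch, not the host's actual parser: `parse_json_lenient` is a hypothetical name, and the assumption is that the fallback only has to cope with a JSON body that is either bare or wrapped in a Markdown code fence.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically so this snippet can itself sit
# inside a fenced block without confusing Markdown tooling.
_FENCE = "`" * 3
_FENCE_RE = re.compile(
    _FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL | re.IGNORECASE
)


def parse_json_lenient(text: str) -> Optional[Any]:
    """Return the decoded JSON value, or None if nothing parses.

    Hypothetical helper: tries the raw text first, then the body of the
    first fenced block if the model wrapped its answer in one.
    """
    candidates = [text.strip()]
    match = _FENCE_RE.search(text)
    if match:
        candidates.append(match.group(1).strip())
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

A `None` return here corresponds to the `result.parsed is None` case: the caller still has the raw text and can decide whether to retry or report it.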

## What this lane gives you

* **One call, four shapes.** `complete()` for chat,
  `complete_structured()` for typed JSON, `acomplete()` and
  `acomplete_structured()` for asyncio. Each async variant takes the
  same arguments and returns the same result object as its sync
  counterpart.
* **Host-owned credentials.** OAuth tokens, refresh flows, the
  credential pool, per-task aux overrides — every credential
  concept Hermes already has applies. The plugin never sees a
  token; the host attributes the call back through `result.audit`.
* **Bounded.** Single sync or async call. No streaming, no tool
  loops, no conversation state to manage. State the input, get the
  result, return.
* **Fail-closed trust.** A plugin you've never configured cannot
  pick its own provider, model, agent, or stored credential. The
  default posture is "use what the user is using." Operators opt in
  to specific overrides, per plugin, in `config.yaml`.

## Quick start

Two complete plugins below — one chat, one structured. Both ship
inside a single `register(ctx)` function and need zero outside
configuration to run against whatever model the user has active.

### Chat completion — `/tldr`

```python
def register(ctx):
    ctx.register_command(
        name="tldr",
        handler=lambda raw: _tldr(ctx, raw),
        description="Summarise the supplied text in one paragraph.",
        args_hint="<text>",
    )


def _tldr(ctx, raw_args: str) -> str:
    text = raw_args.strip()
    if not text:
        return "Usage: /tldr <text to summarise>"
    result = ctx.llm.complete(
        messages=[
            {"role": "system",
             "content": "Summarise the user's text in one tight paragraph. No preamble."},
            {"role": "user", "content": text},
        ],
        max_tokens=256,
        temperature=0.3,
        purpose="tldr",
    )
    return result.text
```

`result.text` is the model's response; `result.usage` carries token
counts; `result.provider` and `result.model` carry attribution.

### Structured extraction — `/paste-to-tasks`

```python
def register(ctx):
    ctx.register_command(
        name="paste-to-tasks",
        handler=lambda raw: _paste_to_tasks(ctx, raw),
        description="Turn freeform meeting notes into structured tasks.",
        args_hint="<text>",
    )


_TASKS_SCHEMA = {
    "type": "object",
    "properties": {
        "tasks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "owner":  {"type": "string"},
                    "action": {"type": "string"},
                    "due":    {"type": "string", "description": "ISO date or empty"},
                },
                "required": ["action"],
            },
        },
    },
    "required": ["tasks"],
}


def _paste_to_tasks(ctx, raw_args: str) -> str:
    if not raw_args.strip():
        return "Usage: /paste-to-tasks <meeting notes>"
    result = ctx.llm.complete_structured(
        instructions=(
            "Extract concrete action items from these meeting notes. "
            "One task per actionable line. If no owner is named, leave 'owner' blank."
        ),
        input=[{"type": "text", "text": raw_args}],
        json_schema=_TASKS_SCHEMA,
        schema_name="meeting.tasks",
        purpose="paste-to-tasks",
        temperature=0.0,
        max_tokens=512,
    )
    if result.parsed is None:
        return f"Couldn't parse a response. Raw output:\n{result.text}"
    lines = [f"- [{t.get('owner') or '?'}] {t['action']}" for t in result.parsed["tasks"]]
    return "\n".join(lines) or "(no tasks found)"
```

A third worked example, this time with image input, lives in the
[`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example)
repo (companion repo for reference plugins — not bundled with
hermes-agent itself). For the async surface (`acomplete()` /
`acomplete_structured()` with `asyncio.gather()`), see
[`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example)
in the same repo.

## When to use which

| You want… | Reach for |
|---|---|
| A free-form text response (translation, summary, rewrite, generation) | `complete()` |
| A multi-turn prompt (system + few-shot examples + user) | `complete()` |
| A typed dict back, validated against a schema | `complete_structured()` |
| Image-or-text input with a typed dict back | `complete_structured()` |
| The same call from async code (gateway adapters, async hooks) | `acomplete()` / `acomplete_structured()` |

Everything else — provider selection, model resolution, auth, fallback,
timeout, vision routing — is the same across all four.

## API surface

`ctx.llm` is an instance of `agent.plugin_llm.PluginLlm`.

### `complete()`

```python
result = ctx.llm.complete(
    messages=[{"role": "user", "content": "Hi"}],
    provider=None,         # optional, gated — Hermes provider id (e.g. "openrouter")
    model=None,            # optional, gated — whatever string that provider expects
    temperature=None,
    max_tokens=None,
    timeout=None,          # seconds
    agent_id=None,         # optional, gated
    profile=None,          # optional, gated — explicit auth-profile name
    purpose="optional-audit-string",
)
# → PluginLlmCompleteResult(text, provider, model, agent_id, usage, audit)
```

Plain chat completion. `messages` is the standard OpenAI shape — a
list of `{"role": "...", "content": "..."}` dicts. Multi-turn
prompts (system + few-shot user/assistant pairs + final user) work
exactly as they would with the OpenAI SDK.

`provider=` and `model=` are independent and follow the same shape
as the host's main config (`model.provider` + `model.model`). Set
just `model=` to use the user's active provider with a different
model on it. Set both to switch providers entirely. Either argument
without operator opt-in raises `PluginLlmTrustError`.
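
A plugin that prefers a particular model but should still work unconfigured can catch the trust error and retry without the override. The sketch below is illustrative only: `complete_with_preferred_model` and `_DenyingLlm` are hypothetical, and `PluginLlmTrustError` is redefined locally as a stand-in because this page does not state where the real class is imported from.

```python
from types import SimpleNamespace


class PluginLlmTrustError(Exception):
    """Local stand-in for the host's exception so the sketch runs on its own."""


def complete_with_preferred_model(llm, messages, preferred_model):
    """Ask for a specific model; fall back to the user's active model if denied."""
    try:
        return llm.complete(messages=messages, model=preferred_model)
    except PluginLlmTrustError:
        # The operator has not enabled allow_model_override for this plugin.
        return llm.complete(messages=messages)


class _DenyingLlm:
    """Fake ctx.llm that behaves like an unconfigured plugin: model= is denied."""

    def complete(self, messages, model=None, **kwargs):
        if model is not None:
            raise PluginLlmTrustError("model override not allowed")
        return SimpleNamespace(text="ok", model="user-active-model")


result = complete_with_preferred_model(
    _DenyingLlm(),
    [{"role": "user", "content": "ping"}],
    preferred_model="openai/gpt-4o-mini",
)
```

In a real plugin the first argument would be `ctx.llm`; the fake only exists so the fallback path can be exercised without a provider.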

### `complete_structured()`

```python
result = ctx.llm.complete_structured(
    instructions="What you want extracted.",
    input=[
        {"type": "text",  "text": "..."},
        {"type": "image", "data": b"...", "mime_type": "image/png"},
        {"type": "image", "url":  "https://..."},
    ],
    json_schema={...},     # optional — triggers parsed result + validation
    json_mode=False,       # set True without a schema to ask for JSON anyway
    schema_name=None,      # optional human-readable schema name
    system_prompt=None,
    provider=None,         # optional, gated
    model=None,            # optional, gated
    temperature=None,
    max_tokens=None,
    timeout=None,
    agent_id=None,         # optional, gated
    profile=None,          # optional, gated
    purpose=None,
)
# → PluginLlmStructuredResult(text, provider, model, agent_id,
#                             usage, parsed, content_type, audit)
```

Inputs are typed text or image blocks (raw bytes get base64 encoded
as a `data:` URL automatically). When `json_schema` or
`json_mode=True` is supplied, the host requests JSON output via
`response_format`, parses it locally as a fallback, and validates
against your schema if `jsonschema` is installed.

* `result.content_type == "json"`: `result.parsed` is the decoded
  Python object. It has been checked against your schema only when
  `jsonschema` is installed.
* `result.content_type == "text"`: parsing or validation failed;
  inspect `result.text` for the raw model response.

### Async

```python
result = await ctx.llm.acomplete(messages=...)
result = await ctx.llm.acomplete_structured(instructions=..., input=...)
```

Same arguments and result types as their sync counterparts. Use
these from gateway adapters, async hooks, or any plugin code
already running on an asyncio loop.
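
Because the async methods are ordinary coroutines, independent calls can be run concurrently with `asyncio.gather()`. The following is a sketch under assumptions: `translate_and_classify` and `_EchoLlm` are hypothetical, and the fake stands in for `ctx.llm` so the example runs without a provider.

```python
import asyncio
from types import SimpleNamespace


async def translate_and_classify(llm, text):
    """Run two independent completions concurrently and return both texts."""
    translation, sentiment = await asyncio.gather(
        llm.acomplete(
            messages=[
                {"role": "system", "content": "Translate the user's text into French."},
                {"role": "user", "content": text},
            ],
            purpose="demo.translate",
        ),
        llm.acomplete(
            messages=[
                {"role": "system", "content": "Answer with one word: positive, neutral, or negative."},
                {"role": "user", "content": text},
            ],
            purpose="demo.sentiment",
        ),
    )
    return translation.text, sentiment.text


class _EchoLlm:
    """Fake ctx.llm: echoes the purpose instead of calling a provider."""

    async def acomplete(self, messages, purpose=None, **kwargs):
        await asyncio.sleep(0)  # yield to the loop, as a real network call would
        return SimpleNamespace(text=f"[{purpose}] {messages[-1]['content']}")


pair = asyncio.run(translate_and_classify(_EchoLlm(), "hello"))
```

`asyncio.gather()` returns results in the order the awaitables were passed, so the tuple unpacking is safe regardless of which call finishes first.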

### Result attributes

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class PluginLlmCompleteResult:
    text: str                    # the assistant's response
    provider: str                # e.g. "openrouter", "anthropic"
    model: str                   # whatever the provider returned for this call
    agent_id: str                # whose model/auth was used
    usage: PluginLlmUsage        # tokens + cache + cost estimate
    audit: Dict[str, Any]        # plugin_id, purpose, profile

@dataclass
class PluginLlmStructuredResult(PluginLlmCompleteResult):
    parsed: Optional[Any]        # JSON object when content_type == "json"
    content_type: str            # "json" or "text"
    # audit also carries schema_name when supplied
```

`usage` carries `input_tokens`, `output_tokens`, `total_tokens`,
`cache_read_tokens`, `cache_write_tokens`, and `cost_usd` when the
provider returns those fields.
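
Since providers do not always report every field, code that aggregates usage should tolerate gaps. A minimal sketch, assuming an unreported field is either absent or `None` (this page does not say which); `summarise_usage` is a hypothetical helper and the `SimpleNamespace` objects stand in for real results.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def summarise_usage(results: Iterable[Any]) -> Dict[str, float]:
    """Sum selected usage fields across results, treating gaps as zero."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0, "cost_usd": 0.0}
    for result in results:
        for field in totals:
            # Assumption: a field the provider did not report is absent or None.
            value = getattr(result.usage, field, None)
            if value is not None:
                totals[field] += value
    return totals


_demo_results = [
    SimpleNamespace(usage=SimpleNamespace(
        input_tokens=10, output_tokens=5, total_tokens=15, cost_usd=None)),
    SimpleNamespace(usage=SimpleNamespace(
        input_tokens=20, output_tokens=8, total_tokens=28)),
]
totals = summarise_usage(_demo_results)
```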

## Trust gate

The default behaviour is fail-closed. With no `plugins.entries`
config block, a plugin can:

* run any of the four methods against the user's active provider
  and model,
* set request-shaping arguments (`temperature`, `max_tokens`,
  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
  `input`, `json_schema`, `schema_name`, `json_mode`),

…and that's it. `provider=`, `model=`, `agent_id=`, and `profile=`
arguments raise `PluginLlmTrustError` until the operator opts in.

**Most plugins never need this section.** A plugin that just calls
`ctx.llm.complete(messages=...)` with no overrides runs against
whatever the user has active and works zero-config. The block below
is only relevant when a plugin specifically wants to pin to a
different model or provider than the user.

```yaml
plugins:
  entries:
    my-plugin:
      llm:
        # Allow this plugin to choose a different Hermes provider
        # (must be one Hermes already knows about — same names as
        # `hermes model` and config.yaml model.provider).
        allow_provider_override: true

        # Optionally restrict which providers. Use ["*"] for any.
        allowed_providers:
          - openrouter
          - anthropic

        # Allow this plugin to ask for a specific model.
        allow_model_override: true

        # Optionally restrict which models. Use ["*"] for any.
        # Models are matched literally against whatever string the
        # plugin sends — Hermes does not look anything up.
        allowed_models:
          - openai/gpt-4o-mini
          - anthropic/claude-3-5-haiku

        # Allow cross-agent calls (rare).
        allow_agent_id_override: false

        # Allow the plugin to request a specific stored auth profile
        # (e.g. a different OAuth account on the same provider).
        allow_profile_override: false
```

The plugin id is the manifest `name:` field for flat plugins, or the
path-derived key for nested plugins (`image_gen/openai`,
`memory/honcho`, etc.).

### What the gate enforces

| Override        | Default | Config key                       |
| --------------- | ------- | -------------------------------- |
| `provider=`     | denied  | `allow_provider_override: true`  |
| ↳ allowlist     | —       | `allowed_providers: [...]`       |
| `model=`        | denied  | `allow_model_override: true`     |
| ↳ allowlist     | —       | `allowed_models: [...]`          |
| `agent_id=`     | denied  | `allow_agent_id_override: true`  |
| `profile=`      | denied  | `allow_profile_override: true`   |

Each override is independently gated. Granting `allow_model_override`
does **not** also grant `allow_provider_override` — a plugin trusted
to pick a model is still pinned to the user's active provider unless
it gets the provider gate as well.
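
The per-axis rule can be modelled in a few lines. This is a sketch of the policy as described in the table, not the host's implementation: `check_overrides`, `_check_axis`, and `TrustDenied` are hypothetical names, and treating a missing allowlist as "no further restriction" is an assumption.

```python
class TrustDenied(Exception):
    """Stand-in for PluginLlmTrustError in this sketch."""


def _check_axis(policy, axis, value):
    if value is None:
        return  # no override requested on this axis
    if not policy.get(f"allow_{axis}_override", False):
        raise TrustDenied(f"{axis} override is not enabled for this plugin")
    allowlist = policy.get(f"allowed_{axis}s")
    # Assumption: a missing allowlist means "no further restriction".
    if allowlist is not None and "*" not in allowlist and value not in allowlist:
        raise TrustDenied(f"{axis} {value!r} is not in allowed_{axis}s")


def check_overrides(policy, provider=None, model=None):
    """Gate each axis on its own flag; one grant never implies the other."""
    _check_axis(policy, "provider", provider)
    _check_axis(policy, "model", model)


# Policy equivalent to an `llm:` block that only enables model overrides.
policy = {"allow_model_override": True, "allowed_models": ["openai/gpt-4o-mini"]}


def is_denied(**overrides):
    try:
        check_overrides(policy, **overrides)
    except TrustDenied:
        return True
    return False
```

With this policy, `is_denied(model="openai/gpt-4o-mini")` is false, while a model outside the allowlist or any `provider=` value is denied, because the provider flag was never granted.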

### What the gate does NOT need to enforce

* Request-shaping arguments — `temperature`, `max_tokens`,
  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
  `input`, `json_schema`, `schema_name`, `json_mode` — are always
  allowed; they don't pick credentials or routes.
* The default deny posture means an unconfigured plugin can still do
  useful work — it just runs against the active provider and model.
  Operators only need to think about `plugins.entries` for plugins
  that want finer routing.

## What the host owns

A complete list of the things `ctx.llm` does for the plugin so you
don't have to:

* **Provider resolution.** Reads `model.provider` + `model.model`
  from the user's config (or the explicit overrides when trusted).
* **Auth.** Pulls API keys, OAuth tokens, or refresh tokens from
  `~/.hermes/auth.json` / env, including the credential pool when
  one is configured. The plugin never sees them.
* **Vision routing.** When image input is supplied and the user's
  active text model is text-only, the host falls back to the
  configured vision model automatically.
* **Fallback chain.** If the user's primary provider 5xxs or 429s,
  the request goes through Hermes' usual aggregator-aware fallback
  before it returns an error to the plugin.
* **Timeout.** Honours your `timeout=` argument, falling back to
  `auxiliary.<task>.timeout` config or the global aux default.
* **JSON shaping.** Sends `response_format` to the provider when
  you ask for JSON, then re-parses locally from a code-fenced
  response if the provider returned one.
* **Schema validation.** Validates against your `json_schema` when
  `jsonschema` is installed; logs a debug line and skips strict
  validation otherwise.
* **Audit log.** Each call writes one INFO line to `agent.log` with
  the plugin id, provider/model, purpose, and token totals.
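
The schema-validation bullet describes an optional dependency: validate when `jsonschema` is importable, skip otherwise. A minimal sketch of that pattern follows; `validate_if_possible` is a hypothetical helper and its three-way return value is an illustration, not the host's behaviour.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def validate_if_possible(instance: Any, schema: Dict[str, Any]) -> Optional[bool]:
    """Return True/False when jsonschema is available, None when skipped."""
    try:
        import jsonschema  # optional dependency
    except ImportError:
        logger.debug("jsonschema not installed; skipping strict validation")
        return None
    try:
        jsonschema.validate(instance=instance, schema=schema)
    except jsonschema.ValidationError:
        return False
    return True
```

A plugin that must have strict validation can run a check like this itself on `result.parsed` and treat `None` as a configuration problem rather than a pass.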

## What the plugin owns

* **Request shape.** `messages` for chat, `instructions` + `input`
  for structured. The plugin builds the prompt; the host runs it.
* **Schema.** Whatever shape you want back. The host doesn't infer
  it for you.
* **Error handling.** `complete_structured()` raises `ValueError` on
  empty inputs and on schema-validation failure. `PluginLlmTrustError`
  fires when the trust gate denies an override. Anything else
  (provider 5xx, no credentials configured, timeout) raises whatever
  `auxiliary_client.call_llm()` raises.
* **Cost.** Every call runs against the user's paid provider. Don't
  loop on `complete()` for every gateway message without thinking
  about token spend.
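
One way to keep spend bounded is to wrap the client in a small budget guard. The sketch below is hypothetical: `TokenBudget` is not part of Hermes, returning `None` for a skipped call is just one possible convention, and it assumes `result.usage.total_tokens` may be missing when the provider omits usage.

```python
from types import SimpleNamespace


class TokenBudget:
    """Hypothetical guard: stop calling once a token budget is spent."""

    def __init__(self, llm, max_total_tokens):
        self._llm = llm
        self._remaining = max_total_tokens

    def complete(self, **kwargs):
        if self._remaining <= 0:
            return None  # the caller decides what a skipped call means
        result = self._llm.complete(**kwargs)
        # Assumption: total_tokens may be absent or None for some providers.
        self._remaining -= getattr(result.usage, "total_tokens", None) or 0
        return result


class _FixedCostLlm:
    """Fake ctx.llm whose every call reports 60 tokens."""

    def complete(self, **kwargs):
        return SimpleNamespace(text="ok", usage=SimpleNamespace(total_tokens=60))


budget = TokenBudget(_FixedCostLlm(), max_total_tokens=100)
outcomes = [
    budget.complete(messages=[{"role": "user", "content": "hi"}])
    for _ in range(3)
]
```

The check happens before each call, so the second call is allowed to overshoot the budget and the third is the first one skipped.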

## Where this fits in the plugin surface

Existing `ctx.*` methods extend an existing Hermes subsystem:

| Method | What it does |
|---|---|
| `ctx.register_tool` | adds a tool the agent can call |
| `ctx.register_platform` | wires a new gateway adapter |
| `ctx.register_image_gen_provider` | replaces an image-gen backend |
| `ctx.register_memory_provider` | replaces the memory backend |
| `ctx.register_context_engine` | replaces the context compressor |
| `ctx.register_hook` | observes a lifecycle event |

`ctx.llm` is the first surface that lets a plugin run the same
model the user is talking to, *out of band*, without any of the
above. That's its only job. If your plugin needs to register a
tool the agent invokes, use `register_tool`. If it needs to react
to a lifecycle event, use `register_hook`. If it needs to make its
own model call — for any reason, structured or not — `ctx.llm`.

## Reference

* Implementation: [`agent/plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/agent/plugin_llm.py)
* Tests: [`tests/agent/test_plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/tests/agent/test_plugin_llm.py)
* Reference plugins (companion repo):
  * [`plugin-llm-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example) — sync structured extraction with image input
  * [`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example) — async with `asyncio.gather()`
* Auxiliary client (the engine under the hood): see
  [Provider Runtime](/docs/developer-guide/provider-runtime).

feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)

* feat(plugins): host-owned LLM access via ctx.llm

Plugins can now ask the host to run a one-shot chat or structured
completion against the user's active model and auth, without ever
seeing an OAuth token or API key. Closes the gap where plugins that
needed bounded structured inference (receipts, CRM extraction,
support classification) had to either bring their own provider keys
or register a tool the agent had to call.

New surface on PluginContext:
- ctx.llm.complete(messages, ...)
- ctx.llm.complete_structured(instructions, input, json_schema, ...)
- async siblings ctx.llm.acomplete / acomplete_structured

Backed by the existing auxiliary_client.call_llm pipeline — every
provider, fallback chain, vision routing, and timeout policy Hermes
already supports applies automatically.

Trust gate (fail-closed by default):
- plugins.entries.<id>.llm.allow_model_override
- plugins.entries.<id>.llm.allowed_models (allowlist; '*' = any)
- plugins.entries.<id>.llm.allow_agent_id_override
- plugins.entries.<id>.llm.allow_profile_override

Embedded model@profile shorthand goes through the same gate as
explicit profile=, so it can't bypass the auth-profile policy.
Conflicting explicit and embedded profiles fail closed.

Also lands:
- plugins/plugin-llm-example/ — reference plugin that registers
  /receipt-extract, demonstrating image+text structured input,
  jsonschema validation, and the trust-gate config.
- website/docs/developer-guide/plugin-llm-access.md — full API docs.
- 45 unit tests covering trust gates, JSON parsing, schema
  validation, image encoding, async surface, and config loading.

Validation:
- 2628 tests pass in tests/agent/
- E2E: bundled plugin loaded with isolated HERMES_HOME, slash
  command produced parsed JSON via stubbed call_llm
- response_format extra_body wired correctly for both json_object
  and json_schema modes

* docs(plugin-llm): rewrite quickstart and framing

The quickstart now uses a meeting-notes-to-tasks example instead of
a receipt extractor, and the page leads with hook-time / gateway
pre-filter / scheduled-job framing rather than the OpenClaw
KB/support/CRM/finance/migration enumeration that the original
upstream PR used. Receipt example moved to a separate worked
example link so the docs page itself doesn't echo any of the
upstream framing.

Also clarifies where ctx.llm fits in the broader plugin surface
(table comparing register_tool / register_platform / register_hook
/ etc.) and what makes this lane different from auxiliary_client
internals.

No code change.

* docs(plugin-llm): reframe as any LLM call, not just structured output

The original draft leaned heavily on complete_structured() and made
the chat lane (complete() / acomplete()) feel like a footnote.
Restructure so:

- The page title and description say 'any LLM call.'
- The lead shows BOTH a plain chat call (error rewriter) AND a
  structured call (triage scorer) up top.
- Quick start has two complete plugin examples — /tldr (chat) and
  /paste-to-tasks (structured).
- New 'When to use which' table for choosing complete() vs
  complete_structured() vs the async siblings.
- Trust-gate sections explicitly note 'all four methods,' and the
  request-shaping list calls out chat-only fields (messages) and
  structured-only fields (instructions, input, json_schema)
  alongside each other.
- The 'Where this fits' section now says 'for any reason,
  structured or not.'

The receipt-extractor reference plugin still exists under
plugins/plugin-llm-example/ — but the docs page no longer treats
it as the canonical surface example. It's now described as 'a third
worked example, this time with image input.'

No code change.

* feat(plugin-llm): split provider/model into independent explicit kwargs

The first cut accepted a single 'provider/model' slug on every method
and split it internally. That looked clean but broke under live test:
the model-override path tried to use the slug's vendor prefix as a
literal Hermes provider id, which silently switched the user off
their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user
who routes through OpenRouter — host attempted to call the 'openai'
provider directly, failed because OPENAI_API_KEY wasn't set).

New shape mirrors the host's main config:

  ctx.llm.complete(
      messages=[...],
      provider='openrouter',         # gated, optional
      model='openai/gpt-4o-mini',    # gated, optional
      profile='work',                # gated, optional
      ...
  )

Each is independently gated by its own allow_*_override flag.
Granting model-override does NOT auto-grant provider-override.
Allowlists are now per-axis (allowed_providers, allowed_models)
matched literally against whatever string the plugin sends.

Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes
doesn't use that pattern anywhere else; profile= is its own kwarg.

Live E2E (against real OpenRouter via Teknium's config) confirms:
- zero-config call works
- default-deny blocks each override with a helpful error
- model-only override stays on user's active provider (the bug)
- provider+model override switches cleanly
- allowlist refuses non-listed entries
- structured output round-trip parses + schema-validates

Tests: 49 cases (up from 45); all green. Docs updated to match the
new shape, including a 'most plugins never need this section' callout
on the trust-gate config block.

* fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core

Three integration fixes for the ctx.llm surface:

1. Attribution bug — result.provider and result.model now reflect
   what call_llm actually used, not placeholder fallbacks ('auto',
   'default'). New _resolve_attribution() helper:

     - explicit overrides win (what the call targeted)
     - response.model wins for the recorded model (provider
       canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.)
     - falls back to _read_main_provider() / _read_main_model()
       when no override is set, so audit logs reflect the user's
       active main provider/model
     - 'auto' / 'default' only when EVERYTHING is empty

   Live verified: zero-config call now records
   provider='openrouter', model='anthropic/claude-4.7-opus-20260416'
   instead of provider='auto', model='default'.

2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete
   works from inside a registered post_tool_call callback. The
   docs page promised hook integration; now there's a test that
   exercises the lazy-import path through the real invoke_hook
   machinery. Two cases: traceback-rewrite hook with conditional
   ctx.llm.complete, and minimal hook regression for the
   sync-hook + sync-llm path.

3. Reference plugin moved out of core. plugins/plugin-llm-example/
   is gone from hermes-agent — it now lives in the new
   NousResearch/hermes-example-plugins companion repo. The docs
   page links there. Hermes' bundled plugins should be plugins
   users actually run; reference / docs-companion plugins live
   externally.

Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/
+ tests/gateway/ + tests/tools/ + tests/agent/ shows 16770
passing; the 12 failures are all pre-existing on origin/main
(verified by stashing this branch's changes and re-running) —
kanban-boards, delegate-task, gateway-restart, tts-routing —
none touch the plugin_llm surface.

* chore(plugins): move all example plugins to companion repo

Reference / docs-companion plugins now live exclusively in
NousResearch/hermes-example-plugins, not bundled with the core repo:

- example-dashboard
- strike-freedom-cockpit

A new fourth example, plugin-llm-async-example, was added to that
repo demonstrating ctx.llm's async surface (acomplete()) with
asyncio.gather() — registers /translate <lang>: <text> which fires
forward translation + sentiment classifier in parallel, then a
back-translation for QA. Live-tested at 2.5s for three real
provider round-trips (would be ~5-6s sequential).

Docs updated:
- developer-guide/plugin-llm-access.md links both sync and async
  examples in the Reference section
- user-guide/features/extending-the-dashboard.md repoints both demo
  sections to the companion repo with corrected install paths
- user-guide/features/built-in-plugins.md drops the two demo rows
- AGENTS.md notes that example plugins live in the companion repo

Net: hermes-agent's plugins/ directory now contains only plugins
users actually run (memory providers, dashboard tabs that ship real
features, the disk-cleanup hook, platform adapters). All four
demo / reference plugins live externally where they can be cloned
on demand instead of inflating the core install.
											
										
										
2026-05-10 07:09:28 -07:00
 								```python
 								def register(ctx):
 								    ctx.register_command(
 								        name="paste-to-tasks",
 								        handler=lambda raw: _paste_to_tasks(ctx, raw),
 								        description="Turn freeform meeting notes into structured tasks.",
 								        args_hint="<text>",
 								    )
 								_TASKS_SCHEMA = {
 								    "type": "object",
 								    "properties": {
 								        "tasks": {
 								            "type": "array",
 								            "items": {
 								                "type": "object",
 								                "properties": {
 								                    "owner":  {"type": "string"},
 								                    "action": {"type": "string"},
 								                    "due":    {"type": "string", "description": "ISO date or empty"},
 								                },
 								                "required": ["action"],
 								            },
 								        },
 								    },
 								    "required": ["tasks"],
 								}
 								def _paste_to_tasks(ctx, raw_args: str) -> str:
 								    if not raw_args.strip():
 								        return "Usage: /paste-to-tasks <meeting notes>"
 								    result = ctx.llm.complete_structured(
 								        instructions=(
 								            "Extract concrete action items from these meeting notes. "
 								            "One task per actionable line. If no owner is named, leave 'owner' blank."
 								        ),
 								        input=[{"type": "text", "text": raw_args}],
 								        json_schema=_TASKS_SCHEMA,
 								        schema_name="meeting.tasks",
 								        purpose="paste-to-tasks",
 								        temperature=0.0,
 								        max_tokens=512,
 								    )
 								    if result.parsed is None:
 								        return f"Couldn't parse a response. Raw output:\n{result.text}"
 								    lines = [f"- [{t.get('owner') or '?'}] {t['action']}" for t in result.parsed["tasks"]]
 								    return "\n".join(lines) or "(no tasks found)"
 								```
 								A third worked example, this time with image input, lives in the
 								[`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example)
 								repo (companion repo for reference plugins — not bundled with
 								hermes-agent itself). For the async surface (`acomplete()` /
 								`acomplete_structured()` with `asyncio.gather()`), see
 								[`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example)
 								in the same repo.
 								## When to use which
 								| You want… | Reach for |
 								|---|---|
 								| A free-form text response (translation, summary, rewrite, generation) | `complete()` |
 								| A multi-turn prompt (system + few-shot examples + user) | `complete()` |
 								| A typed dict back, validated against a schema | `complete_structured()` |
 								| Image-or-text input with a typed dict back | `complete_structured()` |
 								| The same call from async code (gateway adapters, async hooks) | `acomplete()` / `acomplete_structured()` |
 								Everything else — provider selection, model resolution, auth, fallback,
 								timeout, vision routing — is the same across all four.
 								## API surface
 								`ctx.llm` is an instance of `agent.plugin_llm.PluginLlm`.
 								### `complete()`
 								```python
 								result = ctx.llm.complete(
 								    messages=[{"role": "user", "content": "Hi"}],
 								    provider=None,         # optional, gated — Hermes provider id (e.g. "openrouter")
 								    model=None,            # optional, gated — whatever string that provider expects
 								    temperature=None,
 								    max_tokens=None,
 								    timeout=None,          # seconds
 								    agent_id=None,         # optional, gated
 								    profile=None,          # optional, gated — explicit auth-profile name
 								    purpose="optional-audit-string",
 								)
 								# → PluginLlmCompleteResult(text, provider, model, agent_id, usage, audit)
 								```
 								Plain chat completion. `messages` is the standard OpenAI shape — a
 								list of `{"role": "...", "content": "..."}` dicts. Multi-turn
 								prompts (system + few-shot user/assistant pairs + final user) work
 								exactly as they would with the OpenAI SDK.
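As a rough illustration of the multi-turn shape described above, the sketch below assembles a system prompt, few-shot pairs, and a final user turn into one `messages` list. `build_few_shot_messages` is a hypothetical helper written for this page, not part of `ctx.llm`.

```python
def build_few_shot_messages(system, examples, user_text):
    """Assemble system + alternating user/assistant example pairs + final user turn."""
    messages = [{"role": "system", "content": system}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_few_shot_messages(
    system="Classify the sentiment as positive or negative.",
    examples=[("I love this.", "positive"), ("This is broken.", "negative")],
    user_text="Works better than expected.",
)
print([m["role"] for m in messages])
```

The resulting list can be passed straight to `ctx.llm.complete(messages=messages)`.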
 								`provider=` and `model=` are independent and follow the same shape
 								as the host's main config (`model.provider` + `model.model`). Set
 								just `model=` to use the user's active provider with a different
 								model on it. Set both to switch providers entirely. Either argument
 								without operator opt-in raises `PluginLlmTrustError`.
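A plugin that prefers a particular model but should still work on unconfigured installs can catch the trust error and retry without the override. The following is a minimal sketch of that pattern, not a prescribed approach. `PluginLlmTrustError` is redefined locally and `StubLlm` is a hypothetical stand-in for `ctx.llm` so the snippet runs on its own; in a real plugin the exception comes from the host, and its import path is not stated on this page.

```python
from types import SimpleNamespace


class PluginLlmTrustError(Exception):
    """Local stand-in for the host's trust-gate exception."""


class StubLlm:
    """Hypothetical stand-in for ctx.llm that denies every model override."""

    def complete(self, messages, model=None, **kwargs):
        if model is not None:
            raise PluginLlmTrustError("model override not allowed")
        return SimpleNamespace(text="ok", model="user-default")


def complete_preferring(llm, messages, preferred_model):
    """Try the preferred model; fall back to the user's active model if gated."""
    try:
        return llm.complete(messages=messages, model=preferred_model)
    except PluginLlmTrustError:
        # Operator has not opted in: run against whatever the user has active.
        return llm.complete(messages=messages)


result = complete_preferring(
    StubLlm(),
    [{"role": "user", "content": "Hi"}],
    preferred_model="openai/gpt-4o-mini",
)
print(result.model)
```

Whether a silent fallback is appropriate depends on the plugin. If the output is only useful with a specific model, letting the error surface is the safer choice.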
 								### `complete_structured()`
 								```python
 								result = ctx.llm.complete_structured(
 								    instructions="What you want extracted.",
 								    input=[
 								        {"type": "text",  "text": "..."},
 								        {"type": "image", "data": b"...", "mime_type": "image/png"},
 								        {"type": "image", "url":  "https://..."},
 								    ],
 								    json_schema={...},     # optional — triggers parsed result + validation
 								    json_mode=False,       # set True without a schema to ask for JSON anyway
 								    schema_name=None,      # optional human-readable schema name
 								    system_prompt=None,
 								    provider=None,         # optional, gated
 								    model=None,            # optional, gated
 								    temperature=None,
 								    max_tokens=None,
 								    timeout=None,
 								    agent_id=None,
 								    profile=None,
 								    purpose=None,
 								)
 								# → PluginLlmStructuredResult(text, provider, model, agent_id,
 								#                             usage, parsed, content_type, audit)
 								```
 								Inputs are typed text or image blocks (raw bytes get base64 encoded
 								as a `data:` URL automatically). When `json_schema` or
 								`json_mode=True` is supplied, the host requests JSON output via
 								`response_format`, parses it locally as a fallback, and validates
 								against your schema if `jsonschema` is installed.
 								* `result.content_type == "json"` — `result.parsed` is a Python
 								  object that matches your schema.
 								* `result.content_type == "text"` — parsing or validation failed;
 								  inspect `result.text` for the raw model response.
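The automatic encoding of raw image bytes mentioned above can be pictured with a few lines of standard-library Python. This is only a sketch of the general `data:` URL technique. `to_data_url` is a hypothetical helper, and the host's actual encoding code may differ.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URL with a base64 payload."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


print(to_data_url(b"hi", "text/plain"))  # → data:text/plain;base64,aGk=
```

Plugins do not need to do this themselves: passing `data=` and `mime_type=` in an image block is enough.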
 								### Async
 								```python
 								result = await ctx.llm.acomplete(messages=...)
 								result = await ctx.llm.acomplete_structured(instructions=..., input=...)
 								```
 								Same arguments and result types as their sync counterparts. Use
 								these from gateway adapters, async hooks, or any plugin code
 								already running on an asyncio loop.
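One reason to reach for the async methods is fanning out several independent calls at once. The sketch below shows that pattern with `asyncio.gather()`. `StubAsyncLlm` is a hypothetical stand-in for `ctx.llm` so the example runs without a provider, and `summarise_all` is an illustrative name.

```python
import asyncio
from types import SimpleNamespace


class StubAsyncLlm:
    """Hypothetical stand-in for ctx.llm; echoes the last user message in upper case."""

    async def acomplete(self, messages, **kwargs):
        await asyncio.sleep(0)  # yield to the loop, as a real network call would
        return SimpleNamespace(text=messages[-1]["content"].upper())


async def summarise_all(llm, texts):
    """Issue one acomplete() per text concurrently and keep input order."""
    calls = [
        llm.acomplete(
            messages=[{"role": "user", "content": text}],
            purpose="batch-summarise",
        )
        for text in texts
    ]
    results = await asyncio.gather(*calls)
    return [r.text for r in results]


outputs = asyncio.run(summarise_all(StubAsyncLlm(), ["one", "two", "three"]))
print(outputs)
```

Inside a plugin that is already running on a loop, call `await summarise_all(ctx.llm, texts)` instead of `asyncio.run()`. Each gathered call is a separate billed request, so keep the fan-out small.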
 								### Result attributes
 								```python
 								@dataclass
 								class PluginLlmCompleteResult:
 								    text: str                    # the assistant's response
 								    provider: str                # e.g. "openrouter", "anthropic"
 								    model: str                   # whatever the provider returned for this call
 								    agent_id: str                # whose model/auth was used
 								    usage: PluginLlmUsage        # tokens + cache + cost estimate
 								    audit: Dict[str, Any]        # plugin_id, purpose, profile
 								@dataclass
 								class PluginLlmStructuredResult(PluginLlmCompleteResult):
 								    parsed: Optional[Any]        # JSON object when content_type == "json"
 								    content_type: str            # "json" or "text"
 								    # audit also carries schema_name when supplied
 								```
 								`usage` carries `input_tokens`, `output_tokens`, `total_tokens`,
 								`cache_read_tokens`, `cache_write_tokens`, and `cost_usd` when the
 								provider returns those fields.
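Because providers do not always return every usage field, code that aggregates usage should tolerate gaps. A minimal sketch follows. `UsageStub` is a hypothetical stand-in that mirrors some of the field names above, and representing a missing field as `None` is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class UsageStub:
    """Hypothetical stand-in mirroring some of the usage fields named above."""
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    cost_usd: Optional[float] = None


def total_usage(usages: Iterable[UsageStub]) -> dict:
    """Sum token counts and cost, treating missing fields as zero."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0, "cost_usd": 0.0}
    for usage in usages:
        for field in totals:
            # ASSUMPTION: an unreported field is None (or absent); count it as 0.
            totals[field] += getattr(usage, field, None) or 0
    return totals


summary = total_usage([
    UsageStub(input_tokens=10, output_tokens=5, total_tokens=15, cost_usd=0.25),
    UsageStub(input_tokens=7, output_tokens=3, total_tokens=10),  # no cost reported
])
print(summary)
```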
 								## Trust gate
 								The default behaviour is fail-closed. With no `plugins.entries`
 								config block, a plugin can:
 								* run any of the four methods against the user's active provider
 								  and model,
* set request-shaping arguments (`temperature`, `max_tokens`,
  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
  `input`, `json_schema`, `schema_name`, `json_mode`),
 								…and that's it. `provider=`, `model=`, `agent_id=`, and `profile=`
 								arguments raise `PluginLlmTrustError` until the operator opts in.
 								**Most plugins never need this section.** A plugin that just calls
 								`ctx.llm.complete(messages=...)` with no overrides runs against
 								whatever the user has active and works zero-config. The block below
 								is only relevant when a plugin specifically wants to pin to a
 								different model or provider than the user.
 								```yaml
 								plugins:
 								  entries:
 								    my-plugin:
 								      llm:
 								        # Allow this plugin to choose a different Hermes provider
 								        # (must be one Hermes already knows about — same names as
 								        # `hermes model` and config.yaml model.provider).
 								        allow_provider_override: true
 								        # Optionally restrict which providers. Use ["*"] for any.
 								        allowed_providers:
 								          - openrouter
 								          - anthropic
 								        # Allow this plugin to ask for a specific model.
 								        allow_model_override: true
 								        # Optionally restrict which models. Use ["*"] for any.
 								        # Models are matched literally against whatever string the
 								        # plugin sends — Hermes does not look anything up.
 								        allowed_models:
 								          - openai/gpt-4o-mini
 								          - anthropic/claude-3-5-haiku
 								        # Allow cross-agent calls (rare).
 								        allow_agent_id_override: false
 								        # Allow the plugin to request a specific stored auth profile
 								        # (e.g. a different OAuth account on the same provider).
 								        allow_profile_override: false
 								```
 								The plugin id is the manifest `name:` field for flat plugins, or the
 								path-derived key for nested plugins (`image_gen/openai`,
 								`memory/honcho`, etc.).
 								### What the gate enforces
 								| Override        | Default | Config key                       |
 								| --------------- | ------- | -------------------------------- |
 								| `provider=`     | denied  | `allow_provider_override: true`  |
 								| ↳ allowlist     | —       | `allowed_providers: [...]`       |
 								| `model=`        | denied  | `allow_model_override: true`     |
 								| ↳ allowlist     | —       | `allowed_models: [...]`          |
 								| `agent_id=`     | denied  | `allow_agent_id_override: true`  |
 								| `profile=`      | denied  | `allow_profile_override: true`   |
 								Each override is independently gated. Granting `allow_model_override`
 								does **not** also grant `allow_provider_override` — a plugin trusted
 								to pick a model is still pinned to the user's active provider unless
 								it gets the provider gate as well.
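The independence of the four gates can be modelled as a small pure function. This is an illustration of the rules in the table, not the host's implementation. `denied_overrides` is a hypothetical name, and treating a missing allowlist as "no further restriction" is an assumption.

```python
ALLOWLIST_KEYS = {"provider": "allowed_providers", "model": "allowed_models"}


def denied_overrides(llm_config: dict, requested: dict) -> list:
    """Return the names of requested overrides that the config does not permit."""
    denied = []
    for name in ("provider", "model", "agent_id", "profile"):
        value = requested.get(name)
        if value is None:
            continue  # not requested, nothing to gate
        if not llm_config.get(f"allow_{name}_override", False):
            denied.append(name)  # fail closed: no explicit opt-in
            continue
        # Only provider and model have allowlists in the table above.
        allowlist = llm_config.get(ALLOWLIST_KEYS.get(name, ""))
        # ASSUMPTION: a missing allowlist means "no further restriction".
        if allowlist is not None and "*" not in allowlist and value not in allowlist:
            denied.append(name)
    return denied


config = {"allow_model_override": True, "allowed_models": ["openai/gpt-4o-mini"]}
print(denied_overrides(config, {"model": "openai/gpt-4o-mini"}))
print(denied_overrides(config, {"provider": "anthropic", "model": "openai/gpt-4o-mini"}))
print(denied_overrides({}, {"profile": "work"}))
```

The second call shows the point made above: a plugin trusted to pick a model is still denied a provider switch.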
 								### What the gate does NOT need to enforce
 								* Request-shaping arguments — `temperature`, `max_tokens`,
 								  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
 								  `input`, `json_schema`, `schema_name`, `json_mode` — are always
 								  allowed; they don't pick credentials or routes.
 								* The default deny posture means an unconfigured plugin can still do
 								  useful work — it just runs against the active provider and model.
 								  Operators only need to think about `plugins.entries` for plugins
 								  that want finer routing.
 								## What the host owns
 								A complete list of the things `ctx.llm` does for the plugin so you
 								don't have to:
 								* **Provider resolution.** Reads `model.provider` + `model.model`
 								  from the user's config (or the explicit overrides when trusted).
 								* **Auth.** Pulls API keys, OAuth tokens, or refresh tokens from
 								  `~/.hermes/auth.json` / env, including the credential pool when
 								  one is configured. The plugin never sees them.
 								* **Vision routing.** When image input is supplied and the user's
 								  active text model is text-only, the host falls back to the
 								  configured vision model automatically.
 								* **Fallback chain.** If the user's primary provider 5xxs or 429s,
 								  the request goes through Hermes' usual aggregator-aware fallback
 								  before it returns an error to the plugin.
 								* **Timeout.** Honours your `timeout=` argument, falling back to
 								  `auxiliary.<task>.timeout` config or the global aux default.
 								* **JSON shaping.** Sends `response_format` to the provider when
 								  you ask for JSON, then re-parses locally from a code-fenced
 								  response if the provider returned one.
 								* **Schema validation.** Validates against your `json_schema` when
 								  `jsonschema` is installed; logs a debug line and skips strict
 								  validation otherwise.
 								* **Audit log.** Each call writes one INFO line to `agent.log` with
 								  the plugin id, provider/model, purpose, and token totals.
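The "re-parse from a code-fenced response" step in the list above is a common technique and can be sketched in a few lines. The helper below is an illustration under stated assumptions, not the host's code: `parse_json_lenient` is a hypothetical name, and the fence pattern only handles the simple case of one fenced block.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# Matches a fenced block, with or without a "json" language tag.
_FENCED = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def parse_json_lenient(text: str) -> Optional[Any]:
    """Parse JSON directly, then from the first code fence; None if both fail."""
    candidates = [text]
    match = _FENCED.search(text)
    if match:
        candidates.append(match.group(1))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None


fenced_reply = "Here you go:\n" + FENCE + 'json\n{"tasks": []}\n' + FENCE
print(parse_json_lenient(fenced_reply))
print(parse_json_lenient("no JSON here"))
```

Plugins get this behaviour from the host and do not need to reimplement it; the sketch is only meant to make the fallback concrete.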
 								## What the plugin owns
 								* **Request shape.** `messages` for chat, `instructions` + `input`
 								  for structured. The plugin builds the prompt; the host runs it.
 								* **Schema.** Whatever shape you want back. The host doesn't infer
 								  it for you.
 								* **Error handling.** `complete_structured()` raises `ValueError` on
 								  empty inputs and on schema-validation failure. `PluginLlmTrustError`
 								  fires when the trust gate denies an override. Anything else
 								  (provider 5xx, no credentials configured, timeout) raises whatever
 								  `auxiliary_client.call_llm()` raises.
 								* **Cost.** Every call runs against the user's paid provider. Don't
 								  loop on `complete()` for every gateway message without thinking
 								  about token spend.
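One way to keep token spend bounded is to wrap calls in a small budget guard. The sketch below is one possible approach, not a host feature. `TokenBudget` and `StubLlm` are hypothetical names, and it assumes `result.usage.total_tokens` is populated by the provider.

```python
from types import SimpleNamespace


class StubLlm:
    """Hypothetical stand-in for ctx.llm; every call reports 60 total tokens."""

    def complete(self, **kwargs):
        return SimpleNamespace(text="ok", usage=SimpleNamespace(total_tokens=60))


class TokenBudget:
    """Hypothetical guard: refuse further calls once a token budget is spent."""

    def __init__(self, llm, max_total_tokens):
        self._llm = llm
        self._remaining = max_total_tokens

    def complete(self, **kwargs):
        if self._remaining <= 0:
            raise RuntimeError("plugin token budget exhausted")
        result = self._llm.complete(**kwargs)
        # ASSUMPTION: total_tokens may be missing; count it as zero then.
        self._remaining -= getattr(result.usage, "total_tokens", None) or 0
        return result


budget = TokenBudget(StubLlm(), max_total_tokens=100)
completed = 0
try:
    for _ in range(5):
        budget.complete(messages=[{"role": "user", "content": "ping"}])
        completed += 1
except RuntimeError:
    pass  # budget spent; stop calling the model
print(completed)
```

The guard checks the budget before each call, so the last permitted call can overshoot. A stricter version would estimate the cost up front.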
 								## Where this fits in the plugin surface
 								Existing `ctx.*` methods extend an existing Hermes subsystem:
| Method | What it does |
|---|---|
| `ctx.register_tool` | adds a tool the agent can call |
| `ctx.register_platform` | wires a new gateway adapter |
| `ctx.register_image_gen_provider` | replaces an image-gen backend |
| `ctx.register_memory_provider` | replaces the memory backend |
| `ctx.register_context_engine` | replaces the context compressor |
| `ctx.register_hook` | observes a lifecycle event |
 								`ctx.llm` is the first surface that lets a plugin run the same
 								model the user is talking to, *out of band*, without any of the
 								above. That's its only job. If your plugin needs to register a
 								tool the agent invokes, use `register_tool`. If it needs to react
to a lifecycle event, use `register_hook`. If it needs to make its
own model call, for any reason, structured or not, use `ctx.llm`.
 								## Reference
 								* Implementation: [`agent/plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/agent/plugin_llm.py)
 								* Tests: [`tests/agent/test_plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/tests/agent/test_plugin_llm.py)
 								* Reference plugins (companion repo):
 								  * [`plugin-llm-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example) — sync structured extraction with image input
 								  * [`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example) — async with `asyncio.gather()`
 								* Auxiliary client (the engine under the hood): see
 								  [Provider Runtime](/docs/developer-guide/provider-runtime).