feat: add persistent CLI status bar and usage details (#1522)
Salvaged from PR #1104 by kshitijk4poor. Closes #683.
Adds a persistent status bar to the CLI showing model name, context
window usage with visual bar, estimated cost, and session duration.
Responsive layout degrades gracefully for narrow terminals.
Changes:
- agent/usage_pricing.py: shared pricing table, cost estimation with
Decimal arithmetic, duration/token formatting helpers
- agent/insights.py: refactored to reuse usage_pricing (eliminates
duplicate pricing table and formatting logic)
- cli.py: status bar with FormattedTextControl fragments, color-coded
context thresholds (green/yellow/orange/red), enhanced /usage with
cost breakdown, 1Hz idle refresh for status bar updates
- tests/test_cli_status_bar.py: status bar snapshot, width collapsing,
usage report with/without pricing, zero-priced model handling
- tests/test_insights.py: verify zero-priced providers show as unknown
Salvage fixes:
- Resolved conflict with voice status bar (both coexist in layout)
- Import _format_context_length from hermes_cli.banner (moved since PR)
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-16 04:42:48 -07:00
from __future__ import annotations

2026-05-07 16:24:31 -04:00
import re
2026-03-17 03:44:44 -07:00
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
2026-03-17 03:44:44 -07:00
from typing import Any, Dict, Literal, Optional

2026-03-18 03:04:07 -07:00
from agent.model_metadata import fetch_endpoint_model_metadata, fetch_model_metadata
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
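
A minimal sketch of the matching rule described above, using only the standard library. The name host_matches_sketch is hypothetical, and this is an illustration of the stated accept/reject behavior, not the actual utils.base_url_host_matches implementation:

```python
from urllib.parse import urlparse


def host_matches_sketch(base_url: str, domain: str) -> bool:
    """Hypothetical stand-in: True only for an exact host or a subdomain of it."""
    if not base_url or not domain:
        return False
    candidate = base_url.strip()
    # urlparse only populates .hostname when a scheme (or a leading //) is present.
    if "://" not in candidate:
        candidate = "//" + candidate
    # .hostname is already lowercased; drop a trailing dot from fully qualified names.
    host = (urlparse(candidate).hostname or "").rstrip(".")
    wanted = domain.strip().lower().rstrip(".")
    if not host or not wanted:
        return False
    return host == wanted or host.endswith("." + wanted)


# A naive substring check would accept every one of these URLs.
print(host_matches_sketch("https://api.openrouter.ai/v1", "openrouter.ai"))          # subdomain
print(host_matches_sketch("https://example.com/openrouter.ai/v1", "openrouter.ai"))  # path segment
print(host_matches_sketch("https://openrouter.ai.evil.com/v1", "openrouter.ai"))     # host suffix
print(host_matches_sketch("https://notopenrouter.ai/v1", "openrouter.ai"))           # prefix collision
```

Comparing against the parsed hostname rather than the raw URL string is what rules out the path-segment, suffix, and prefix cases.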
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
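
The two tightened checks can be sketched as follows. Both function names are hypothetical and the code only mirrors the rules as worded in this message; the real call sites may differ:

```python
from urllib.parse import urlparse


def looks_like_bedrock_runtime(base_url: str) -> bool:
    """Hypothetical: hostname starts with bedrock-runtime. AND sits under amazonaws.com."""
    host = (urlparse(base_url).hostname or "").rstrip(".")
    return host.startswith("bedrock-runtime.") and host.endswith(".amazonaws.com")


def looks_like_codex_backend(base_url: str) -> bool:
    """Hypothetical: hostname is chatgpt.com AND path contains /backend-api/codex."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").rstrip(".")
    return host == "chatgpt.com" and "/backend-api/codex" in parsed.path


print(looks_like_bedrock_runtime("https://bedrock-runtime.us-east-1.amazonaws.com"))
print(looks_like_bedrock_runtime("https://example.com/bedrock-runtime"))
print(looks_like_codex_backend("https://chatgpt.com/backend-api/codex/responses"))
print(looks_like_codex_backend("https://evil.example/chatgpt.com/backend-api/codex"))
```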
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
from utils import base_url_host_matches

DEFAULT_PRICING = {"input": 0.0, "output": 0.0}

2026-03-17 03:44:44 -07:00
_ZERO = Decimal("0")
_ONE_MILLION = Decimal("1000000")
2026-05-15 10:07:45 +10:00
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
2026-03-17 03:44:44 -07:00

CostStatus = Literal["actual", "estimated", "included", "unknown"]
CostSource = Literal[
    "provider_cost_api",
    "provider_generation_api",
    "provider_models_api",
    "official_docs_snapshot",
    "user_override",
    "custom_contract",
    "none",
]


@dataclass(frozen=True)
class CanonicalUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    reasoning_tokens: int = 0
    request_count: int = 1
    raw_usage: Optional[dict[str, Any]] = None

    @property
    def prompt_tokens(self) -> int:
        return self.input_tokens + self.cache_read_tokens + self.cache_write_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.output_tokens
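
To illustrate how the two derived properties combine the counters, here is a trimmed copy of the dataclass with made-up token counts. UsageSketch is a hypothetical name used only for this example; the property bodies are copied from CanonicalUsage:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSketch:
    """Trimmed, illustrative copy of CanonicalUsage (token fields only)."""

    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    @property
    def prompt_tokens(self) -> int:
        # Everything sent to the model: fresh input plus cached reads and writes.
        return self.input_tokens + self.cache_read_tokens + self.cache_write_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.output_tokens


usage = UsageSketch(
    input_tokens=1_200,
    output_tokens=300,
    cache_read_tokens=4_000,
    cache_write_tokens=500,
)
print(usage.prompt_tokens)  # 5700
print(usage.total_tokens)   # 6000
```

Note that reasoning_tokens does not appear in either sum in the class above.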


@dataclass(frozen=True)
class BillingRoute:
    provider: str
    model: str
    base_url: str = ""
    billing_mode: str = "unknown"


@dataclass(frozen=True)
class PricingEntry:
    input_cost_per_million: Optional[Decimal] = None
    output_cost_per_million: Optional[Decimal] = None
    cache_read_cost_per_million: Optional[Decimal] = None
    cache_write_cost_per_million: Optional[Decimal] = None
    request_cost: Optional[Decimal] = None
    source: CostSource = "none"
    source_url: Optional[str] = None
    pricing_version: Optional[str] = None
    fetched_at: Optional[datetime] = None


@dataclass(frozen=True)
class CostResult:
    amount_usd: Optional[Decimal]
    status: CostStatus
    source: CostSource
    label: str
    fetched_at: Optional[datetime] = None
    pricing_version: Optional[str] = None
    notes: tuple[str, ...] = ()


_UTC_NOW = lambda: datetime.now(timezone.utc)


# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003)
Anthropic released Claude Opus 4.8 on 2026-05-27, available on
OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS:
- https://openrouter.ai/anthropic/claude-opus-4.8
- https://openrouter.ai/anthropic/claude-opus-4.8-fast
The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast)
priced at 2x of the base model — a notable improvement over the 6x premium
on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request
parameter like Opus 4.6; Anthropic's native fast-mode beta still only
covers Opus 4.6.
Changes:
hermes_cli/models.py
- Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to
the OpenRouter fallback snapshot and the Nous Portal curated list
(live catalogs surface them automatically when reachable; the
fallback list matters when the manifest fetch fails).
- Add claude-opus-4-8 to the Anthropic-native picker list.
agent/model_metadata.py
- Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS
with 1M tokens (matches 4.6/4.7).
agent/anthropic_adapter.py
- Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and
_NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the
Opus 4.7 API contract: adaptive thinking only, xhigh effort level
supported, sampling parameters (temperature/top_p/top_k) return 400.
- Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output,
same as 4.7). Matches by substring so claude-opus-4-8-fast and
date-stamped variants resolve correctly.
agent/usage_pricing.py
- Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50
cache read, $6.25 cache write (same as 4.6/4.7).
- Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00
cache read, $12.50 cache write. Per OpenRouter, the 2x premium is
the only differentiator from regular Opus 4.8.
- OpenRouter routes still pull pricing from the live /models API, so
no static OpenRouter entry is needed.
tests/agent/test_model_metadata.py
- Extend the Claude 4.6+ context-length tag list with 4.8/4-8.
website/static/api/model-catalog.json
- Regenerated via `python scripts/build_model_catalog.py` to pick up
the new entries in the OpenRouter and Nous Portal fallback lists.
E2E verification (isolated sys.path import against the worktree):
- _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params
all return True for claude-opus-4.8 and claude-opus-4.8-fast.
- _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays
False for 4.8 — fast mode is a separate model ID on OpenRouter, not a
parameter Anthropic accepts on the base model.
- DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations.
- resolve_billing_route + _lookup_official_docs_pricing resolve the
correct $5/$25 (regular) and $10/$50 (fast) pricing for both
dot-notation and dash-notation inputs.
- 4.7 and 4.6 regression: behavior unchanged.
Unit tests: 305 passed across tests/agent/test_usage_pricing.py,
test_model_metadata.py, tests/hermes_cli/test_model_catalog.py,
test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.
2026-05-28 10:31:59 -07:00
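
As a worked example of the rates quoted in this commit message, the following sketch prices one made-up request at both the regular and the fast rate. The function estimate_cost_sketch and the token counts are assumptions for illustration; the module's real estimation routine is not shown in this excerpt:

```python
from decimal import Decimal

ONE_MILLION = Decimal("1000000")

# Per-million-token rates (USD) as stated in the commit message.
OPUS_4_8 = {
    "input": Decimal("5.00"),
    "output": Decimal("25.00"),
    "cache_read": Decimal("0.50"),
    "cache_write": Decimal("6.25"),
}
OPUS_4_8_FAST = {
    "input": Decimal("10.00"),
    "output": Decimal("50.00"),
    "cache_read": Decimal("1.00"),
    "cache_write": Decimal("12.50"),
}


def estimate_cost_sketch(rates: dict, tokens: dict) -> Decimal:
    """Hypothetical helper: sum tokens * rate / 1M over each token bucket."""
    return sum(
        (Decimal(tokens[bucket]) * rate / ONE_MILLION for bucket, rate in rates.items()),
        Decimal("0"),
    )


# Made-up token counts for a single request.
tokens = {"input": 120_000, "output": 8_000, "cache_read": 400_000, "cache_write": 40_000}

base = estimate_cost_sketch(OPUS_4_8, tokens)
fast = estimate_cost_sketch(OPUS_4_8_FAST, tokens)
print(base.quantize(Decimal("0.01")))  # 1.25
print(fast.quantize(Decimal("0.01")))  # 2.50
```

The breakdown at the regular rate is 0.60 (input) + 0.20 (output) + 0.20 (cache read) + 0.25 (cache write) = 1.25 USD. Every fast-mode rate is exactly double, so the fast total is 2.50 USD.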
    # ── Anthropic Claude 4.8 ─────────────────────────────────────────────
    # Same $5/$25 base pricing as 4.6/4.7. Fast-mode variant is a separate
    # model ID with 2x premium (vs the 6x premium on older Opus generations).
    # Source: https://openrouter.ai/anthropic/claude-opus-4.8
    (
        "anthropic",
        "claude-opus-4-8",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-opus-4-8-fast",
    ): PricingEntry(
        input_cost_per_million=Decimal("10.00"),
        output_cost_per_million=Decimal("50.00"),
        cache_read_cost_per_million=Decimal("1.00"),
        cache_write_cost_per_million=Decimal("12.50"),
        source="official_docs_snapshot",
        source_url="https://openrouter.ai/anthropic/claude-opus-4.8-fast",
        pricing_version="anthropic-pricing-2026-05",
    ),
2026-05-07 16:24:31 -04:00
    # ── Anthropic Claude 4.7 ─────────────────────────────────────────────
    # Opus 4.5/4.6/4.7 share $5/$25 pricing (new tokenizer, up to 35% more
    # tokens for the same text).
    # Source: https://platform.claude.com/docs/en/about-claude/pricing
    (
        "anthropic",
        "claude-opus-4-7",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-opus-4-7-20250507",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    # ── Anthropic Claude 4.6 ─────────────────────────────────────────────
    (
        "anthropic",
        "claude-opus-4-6",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-opus-4-6-20250414",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-sonnet-4-6",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        cache_read_cost_per_million=Decimal("0.30"),
        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-sonnet-4-6-20250414",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        cache_read_cost_per_million=Decimal("0.30"),
        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    # ── Anthropic Claude 4.5 ─────────────────────────────────────────────
    (
        "anthropic",
        "claude-opus-4-5",
    ): PricingEntry(
        input_cost_per_million=Decimal("5.00"),
        output_cost_per_million=Decimal("25.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        cache_write_cost_per_million=Decimal("6.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-sonnet-4-5",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        cache_read_cost_per_million=Decimal("0.30"),
        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    (
        "anthropic",
        "claude-haiku-4-5",
    ): PricingEntry(
        input_cost_per_million=Decimal("1.00"),
        output_cost_per_million=Decimal("5.00"),
        cache_read_cost_per_million=Decimal("0.10"),
        cache_write_cost_per_million=Decimal("1.25"),
        source="official_docs_snapshot",
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
    ),
    # ── Anthropic Claude 4 / 4.1 ─────────────────────────────────────────
2026-03-17 03:44:44 -07:00
    (
        "anthropic",
        "claude-opus-4-20250514",
    ): PricingEntry(
        input_cost_per_million=Decimal("15.00"),
        output_cost_per_million=Decimal("75.00"),
        cache_read_cost_per_million=Decimal("1.50"),
        cache_write_cost_per_million=Decimal("18.75"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    (
        "anthropic",
        "claude-sonnet-4-20250514",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        cache_read_cost_per_million=Decimal("0.30"),
        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    # OpenAI
    (
        "openai",
        "gpt-4o",
    ): PricingEntry(
        input_cost_per_million=Decimal("2.50"),
        output_cost_per_million=Decimal("10.00"),
        cache_read_cost_per_million=Decimal("1.25"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "gpt-4o-mini",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.15"),
        output_cost_per_million=Decimal("0.60"),
        cache_read_cost_per_million=Decimal("0.075"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "gpt-4.1",
    ): PricingEntry(
        input_cost_per_million=Decimal("2.00"),
        output_cost_per_million=Decimal("8.00"),
        cache_read_cost_per_million=Decimal("0.50"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "gpt-4.1-mini",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.40"),
        output_cost_per_million=Decimal("1.60"),
        cache_read_cost_per_million=Decimal("0.10"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "gpt-4.1-nano",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.10"),
        output_cost_per_million=Decimal("0.40"),
        cache_read_cost_per_million=Decimal("0.025"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "o3",
    ): PricingEntry(
        input_cost_per_million=Decimal("10.00"),
        output_cost_per_million=Decimal("40.00"),
        cache_read_cost_per_million=Decimal("2.50"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
    (
        "openai",
        "o3-mini",
    ): PricingEntry(
        input_cost_per_million=Decimal("1.10"),
        output_cost_per_million=Decimal("4.40"),
        cache_read_cost_per_million=Decimal("0.55"),
        source="official_docs_snapshot",
        source_url="https://openai.com/api/pricing/",
        pricing_version="openai-pricing-2026-03-16",
    ),
2026-05-07 16:24:31 -04:00
    # ── Anthropic older models (pre-4.5 generation) ────────────────────────
2026-03-17 03:44:44 -07:00
    (
        "anthropic",
        "claude-3-5-sonnet-20241022",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        cache_read_cost_per_million=Decimal("0.30"),
        cache_write_cost_per_million=Decimal("3.75"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    (
        "anthropic",
        "claude-3-5-haiku-20241022",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.80"),
        output_cost_per_million=Decimal("4.00"),
        cache_read_cost_per_million=Decimal("0.08"),
        cache_write_cost_per_million=Decimal("1.00"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    (
        "anthropic",
        "claude-3-opus-20240229",
    ): PricingEntry(
        input_cost_per_million=Decimal("15.00"),
        output_cost_per_million=Decimal("75.00"),
        cache_read_cost_per_million=Decimal("1.50"),
        cache_write_cost_per_million=Decimal("18.75"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    (
        "anthropic",
        "claude-3-haiku-20240307",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.25"),
        output_cost_per_million=Decimal("1.25"),
        cache_read_cost_per_million=Decimal("0.03"),
        cache_write_cost_per_million=Decimal("0.30"),
        source="official_docs_snapshot",
2026-05-07 16:24:31 -04:00
        source_url="https://platform.claude.com/docs/en/about-claude/pricing",
        pricing_version="anthropic-pricing-2026-05",
2026-03-17 03:44:44 -07:00
    ),
    # DeepSeek
    (
        "deepseek",
        "deepseek-chat",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.14"),
        output_cost_per_million=Decimal("0.28"),
        source="official_docs_snapshot",
        source_url="https://api-docs.deepseek.com/quick_start/pricing",
        pricing_version="deepseek-pricing-2026-03-16",
    ),
    (
        "deepseek",
        "deepseek-reasoner",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.55"),
        output_cost_per_million=Decimal("2.19"),
        source="official_docs_snapshot",
        source_url="https://api-docs.deepseek.com/quick_start/pricing",
        pricing_version="deepseek-pricing-2026-03-16",
    ),
2026-05-12 15:04:18 -07:00
    (
        "deepseek",
        "deepseek-v4-pro",
    ): PricingEntry(
        input_cost_per_million=Decimal("1.74"),
        output_cost_per_million=Decimal("3.48"),
        cache_read_cost_per_million=Decimal("0.0145"),
        source="official_docs_snapshot",
        source_url="https://api-docs.deepseek.com/quick_start/pricing",
        pricing_version="deepseek-pricing-2026-05-12",
    ),
2026-03-17 03:44:44 -07:00
    # Google Gemini
    (
        "google",
        "gemini-2.5-pro",
    ): PricingEntry(
        input_cost_per_million=Decimal("1.25"),
        output_cost_per_million=Decimal("10.00"),
        source="official_docs_snapshot",
        source_url="https://ai.google.dev/pricing",
        pricing_version="google-pricing-2026-03-16",
    ),
    (
        "google",
        "gemini-2.5-flash",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.15"),
        output_cost_per_million=Decimal("0.60"),
        source="official_docs_snapshot",
        source_url="https://ai.google.dev/pricing",
        pricing_version="google-pricing-2026-03-16",
    ),
    (
        "google",
        "gemini-2.0-flash",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.10"),
        output_cost_per_million=Decimal("0.40"),
        source="official_docs_snapshot",
        source_url="https://ai.google.dev/pricing",
        pricing_version="google-pricing-2026-03-16",
    ),
feat: native AWS Bedrock provider via Converse API
Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific
additions onto current main, skipping stale-branch reverts (293 commits
behind).
Dual-path architecture:
- Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets)
- Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral)
Includes:
- Core adapter (agent/bedrock_adapter.py, 1098 lines)
- Full provider registration (auth, models, providers, config, runtime, main)
- IAM credential chain + Bedrock API Key auth modes
- Dynamic model discovery via ListFoundationModels + ListInferenceProfiles
- Streaming with delta callbacks, error classification, guardrails
- hermes doctor + hermes auth integration
- /usage pricing for 7 Bedrock models
- 130 automated tests (79 unit + 28 integration + follow-up fixes)
- Documentation (website/docs/guides/aws-bedrock.md)
- boto3 optional dependency (pip install hermes-agent[bedrock])
Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>
2026-04-15 15:18:01 -07:00
    # AWS Bedrock — pricing per the Bedrock pricing page.
    # Bedrock charges the same per-token rates as the model provider but
    # through AWS billing. These are the on-demand prices (no commitment).
    # Source: https://aws.amazon.com/bedrock/pricing/
    (
        "bedrock",
        "anthropic.claude-opus-4-6",
    ): PricingEntry(
        input_cost_per_million=Decimal("15.00"),
        output_cost_per_million=Decimal("75.00"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "anthropic.claude-sonnet-4-6",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "anthropic.claude-sonnet-4-5",
    ): PricingEntry(
        input_cost_per_million=Decimal("3.00"),
        output_cost_per_million=Decimal("15.00"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "anthropic.claude-haiku-4-5",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.80"),
        output_cost_per_million=Decimal("4.00"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "amazon.nova-pro",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.80"),
        output_cost_per_million=Decimal("3.20"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "amazon.nova-lite",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.06"),
        output_cost_per_million=Decimal("0.24"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
    (
        "bedrock",
        "amazon.nova-micro",
    ): PricingEntry(
        input_cost_per_million=Decimal("0.035"),
        output_cost_per_million=Decimal("0.14"),
        source="official_docs_snapshot",
        source_url="https://aws.amazon.com/bedrock/pricing/",
        pricing_version="bedrock-pricing-2026-04",
    ),
2026-04-29 12:12:56 +01:00
|
|
|
# MiniMax
|
|
|
|
|
(
|
|
|
|
|
"minimax",
|
|
|
|
|
"minimax-m2.7",
|
|
|
|
|
): PricingEntry(
|
|
|
|
|
input_cost_per_million=Decimal("0.30"),
|
|
|
|
|
output_cost_per_million=Decimal("1.20"),
|
|
|
|
|
source="official_docs_snapshot",
|
|
|
|
|
pricing_version="minimax-pricing-2026-04",
|
|
|
|
|
),
|
|
|
|
|
(
|
|
|
|
|
"minimax-cn",
|
|
|
|
|
"minimax-m2.7",
|
|
|
|
|
): PricingEntry(
|
|
|
|
|
input_cost_per_million=Decimal("0.30"),
|
|
|
|
|
output_cost_per_million=Decimal("1.20"),
|
|
|
|
|
source="official_docs_snapshot",
|
|
|
|
|
pricing_version="minimax-pricing-2026-04",
|
|
|
|
|
),
|
2026-03-17 03:44:44 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _to_decimal(value: Any) -> Optional[Decimal]:
    if value is None:
        return None
    try:
        return Decimal(str(value))
    except Exception:
        return None


def _to_int(value: Any) -> int:
    try:
        return int(value or 0)
    except Exception:
        return 0
|
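The coercion helpers never raise and route every value through `str()` before building a `Decimal`. A minimal standalone sketch of that behavior follows; the names `to_decimal` and `to_int` are illustrative stand-ins, not the module's own helpers. The point of the `str()` step, as far as can be inferred, is that a float such as `0.1` becomes the short decimal `0.1` rather than its full binary expansion.

```python
from decimal import Decimal
from typing import Any, Optional


def to_decimal(value: Any) -> Optional[Decimal]:
    """Best-effort conversion; returns None instead of raising."""
    if value is None:
        return None
    try:
        # str() first: Decimal(0.1) would carry the float's binary expansion.
        return Decimal(str(value))
    except Exception:
        return None


def to_int(value: Any) -> int:
    """Best-effort conversion; None, empty, and unparsable inputs become 0."""
    try:
        return int(value or 0)
    except Exception:
        return 0
```

With this sketch, `to_decimal(0.1)` compares equal to `Decimal("0.1")`, while `Decimal(0.1)` does not.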
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def resolve_billing_route(
|
|
|
|
|
model_name: str,
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
base_url: Optional[str] = None,
|
|
|
|
|
) -> BillingRoute:
|
|
|
|
|
provider_name = (provider or "").strip().lower()
|
|
|
|
|
base = (base_url or "").strip().lower()
|
|
|
|
|
model = (model_name or "").strip()
|
|
|
|
|
if not provider_name and "/" in model:
|
|
|
|
|
inferred_provider, bare_model = model.split("/", 1)
|
|
|
|
|
if inferred_provider in {"anthropic", "openai", "google"}:
|
|
|
|
|
provider_name = inferred_provider
|
|
|
|
|
model = bare_model
|
|
|
|
|
|
|
|
|
|
if provider_name == "openai-codex":
|
|
|
|
|
return BillingRoute(provider="openai-codex", model=model, base_url=base_url or "", billing_mode="subscription_included")
|
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
|
|
|
if provider_name == "openrouter" or base_url_host_matches(base_url or "", "openrouter.ai"):
|
2026-03-17 03:44:44 -07:00
|
|
|
return BillingRoute(provider="openrouter", model=model, base_url=base_url or "", billing_mode="official_models_api")
|
2026-05-15 10:07:45 +10:00
|
|
|
if provider_name == "nous" or base_url_host_matches(base_url or "", "inference-api.nousresearch.com"):
|
|
|
|
|
return BillingRoute(provider="nous", model=model, base_url=base_url or _NOUS_DEFAULT_BASE_URL, billing_mode="official_models_api")
|
2026-03-17 03:44:44 -07:00
|
|
|
if provider_name == "anthropic":
|
|
|
|
|
return BillingRoute(provider="anthropic", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
|
|
|
|
|
if provider_name == "openai":
|
|
|
|
|
return BillingRoute(provider="openai", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
|
2026-04-29 12:12:56 +01:00
|
|
|
if provider_name in {"minimax", "minimax-cn"}:
|
|
|
|
|
return BillingRoute(provider=provider_name, model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
|
2026-03-17 03:44:44 -07:00
|
|
|
if provider_name in {"custom", "local"} or (base and "localhost" in base):
|
|
|
|
|
return BillingRoute(provider=provider_name or "custom", model=model, base_url=base_url or "", billing_mode="unknown")
|
|
|
|
|
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
|
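The OpenRouter and Nous branches rely on `base_url_host_matches`, which the commit message describes as accepting hostname equality and subdomain matches while rejecting path segments, host suffixes, and prefix collisions. The real helper lives in `utils` and is not shown here, so the following is only a sketch of that contract under those stated rules; `host_matches` is a hypothetical name.

```python
from urllib.parse import urlparse


def host_matches(base_url: str, domain: str) -> bool:
    """Hypothetical stand-in for utils.base_url_host_matches (assumed contract)."""
    if not base_url or not domain:
        return False
    candidate = base_url.strip()
    if "://" not in candidate:
        # Without a scheme, urlparse treats the host as a path; force a netloc.
        candidate = "//" + candidate
    host = (urlparse(candidate).hostname or "").lower().rstrip(".")
    wanted = domain.strip().lower().rstrip(".")
    if not host or not wanted:
        return False
    # Exact host, or a true subdomain (note the leading dot).
    return host == wanted or host.endswith("." + wanted)
```

Comparing parsed hostnames instead of using `domain in base_url` is what keeps a URL such as `https://evil.example/openrouter.ai/v1` from being billed as OpenRouter traffic.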
|
|
|
|
|
|
|
|
|
|
2026-05-07 16:24:31 -04:00
|
|
|
def _normalize_anthropic_model_name(model: str) -> str:
    """Normalize Anthropic model name variants to canonical form.

    Handles:
    - Dot notation: claude-opus-4.7 → claude-opus-4-7
    - Mixed case and surrounding whitespace: " Claude-Opus-4.7 " → claude-opus-4-7
    - Strips anthropic/ prefix if present
    """
    name = model.lower().strip()
    if name.startswith("anthropic/"):
        name = name[len("anthropic/"):]
    # Normalize dots to dashes in version numbers (e.g. 4.7 → 4-7, 4.6 → 4-6)
    # But preserve the rest of the name structure
    name = re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)
    return name
|
|
|
|
|
|
|
|
|
|
|
2026-03-17 03:44:44 -07:00
|
|
|
def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]:
|
2026-05-07 16:24:31 -04:00
|
|
|
model = route.model.lower()
|
|
|
|
|
# Direct lookup first
|
|
|
|
|
entry = _OFFICIAL_DOCS_PRICING.get((route.provider, model))
|
|
|
|
|
if entry:
|
|
|
|
|
return entry
|
|
|
|
|
# Try normalized name for Anthropic (handles dot-notation like opus-4.7)
|
|
|
|
|
if route.provider == "anthropic":
|
|
|
|
|
normalized = _normalize_anthropic_model_name(model)
|
|
|
|
|
if normalized != model:
|
|
|
|
|
entry = _OFFICIAL_DOCS_PRICING.get((route.provider, normalized))
|
|
|
|
|
if entry:
|
|
|
|
|
return entry
|
|
|
|
|
return None
|
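The lookup tries the exact `(provider, model)` key first and only then retries with a normalized Anthropic name. A small self-contained sketch of that two-step lookup is shown below. The table `PRICING`, its single entry, and the price in it are placeholders invented for illustration, and `normalize` and `lookup` are hypothetical names.

```python
import re
from decimal import Decimal
from typing import Dict, Optional, Tuple

# Placeholder table: the price is illustrative, not a real rate.
PRICING: Dict[Tuple[str, str], Decimal] = {
    ("anthropic", "claude-opus-4-7"): Decimal("1.00"),
}


def normalize(model: str) -> str:
    """Lowercase, drop an anthropic/ prefix, and turn 4.7 into 4-7."""
    name = model.lower().strip()
    if name.startswith("anthropic/"):
        name = name[len("anthropic/"):]
    return re.sub(r"(\d+)\.(\d+)", r"\1-\2", name)


def lookup(provider: str, model: str) -> Optional[Decimal]:
    key = model.lower()
    hit = PRICING.get((provider, key))
    if hit is not None:
        return hit
    # Only Anthropic names get the dot-to-dash retry.
    if provider == "anthropic":
        normalized = normalize(key)
        if normalized != key:
            return PRICING.get((provider, normalized))
    return None
```

Here `lookup("anthropic", "Claude-Opus-4.7")` resolves through the normalized key, while the same model name under another provider returns `None`.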
2026-03-17 03:44:44 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
def _openrouter_pricing_entry(route: BillingRoute) -> Optional[PricingEntry]:
|
2026-03-18 03:04:07 -07:00
|
|
|
return _pricing_entry_from_metadata(
|
|
|
|
|
fetch_model_metadata(),
|
|
|
|
|
route.model,
|
|
|
|
|
source_url="https://openrouter.ai/docs/api/api-reference/models/get-models",
|
|
|
|
|
pricing_version="openrouter-models-api",
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _pricing_entry_from_metadata(
|
|
|
|
|
metadata: Dict[str, Dict[str, Any]],
|
|
|
|
|
model_id: str,
|
|
|
|
|
*,
|
|
|
|
|
source_url: str,
|
|
|
|
|
pricing_version: str,
|
|
|
|
|
) -> Optional[PricingEntry]:
|
2026-03-17 03:44:44 -07:00
|
|
|
if model_id not in metadata:
|
|
|
|
|
return None
|
|
|
|
|
pricing = metadata[model_id].get("pricing") or {}
|
|
|
|
|
prompt = _to_decimal(pricing.get("prompt"))
|
|
|
|
|
completion = _to_decimal(pricing.get("completion"))
|
|
|
|
|
request = _to_decimal(pricing.get("request"))
|
|
|
|
|
cache_read = _to_decimal(
|
|
|
|
|
pricing.get("cache_read")
|
|
|
|
|
or pricing.get("cached_prompt")
|
|
|
|
|
or pricing.get("input_cache_read")
|
|
|
|
|
)
|
|
|
|
|
cache_write = _to_decimal(
|
|
|
|
|
pricing.get("cache_write")
|
|
|
|
|
or pricing.get("cache_creation")
|
|
|
|
|
or pricing.get("input_cache_write")
|
|
|
|
|
)
|
|
|
|
|
if prompt is None and completion is None and request is None:
|
|
|
|
|
return None
|
2026-03-18 03:04:07 -07:00
|
|
|
|
2026-03-17 03:44:44 -07:00
|
|
|
def _per_token_to_per_million(value: Optional[Decimal]) -> Optional[Decimal]:
|
|
|
|
|
if value is None:
|
|
|
|
|
return None
|
|
|
|
|
return value * _ONE_MILLION
|
|
|
|
|
|
|
|
|
|
return PricingEntry(
|
|
|
|
|
input_cost_per_million=_per_token_to_per_million(prompt),
|
|
|
|
|
output_cost_per_million=_per_token_to_per_million(completion),
|
|
|
|
|
cache_read_cost_per_million=_per_token_to_per_million(cache_read),
|
|
|
|
|
cache_write_cost_per_million=_per_token_to_per_million(cache_write),
|
|
|
|
|
request_cost=request,
|
|
|
|
|
source="provider_models_api",
|
2026-03-18 03:04:07 -07:00
|
|
|
source_url=source_url,
|
|
|
|
|
pricing_version=pricing_version,
|
2026-03-17 03:44:44 -07:00
|
|
|
fetched_at=_UTC_NOW(),
|
|
|
|
|
)
|
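The metadata path converts per-token prices into per-million prices before storing them. The sketch below shows that conversion in isolation. It assumes `_ONE_MILLION` is `Decimal("1000000")` (its definition is not shown in this section) and uses an invented `pricing` payload with per-token strings; `per_token_to_per_million` is an illustrative name.

```python
from decimal import Decimal
from typing import Optional

ONE_MILLION = Decimal("1000000")  # assumed value of the module's _ONE_MILLION


def per_token_to_per_million(value: Optional[Decimal]) -> Optional[Decimal]:
    """Scale a per-token price to a per-million-token price; None passes through."""
    if value is None:
        return None
    return value * ONE_MILLION


# Illustrative payload shaped like a models-API "pricing" object.
pricing = {"prompt": "0.000003", "completion": "0.000015"}
prompt_per_million = per_token_to_per_million(Decimal(pricing["prompt"]))
completion_per_million = per_token_to_per_million(Decimal(pricing["completion"]))
missing = per_token_to_per_million(None)
```

Keeping the values as `Decimal` from the string onward avoids float rounding in the scaling step, and a missing price stays `None` so the cost estimator can report it as unknown rather than as zero.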
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def get_pricing_entry(
|
|
|
|
|
model_name: str,
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
base_url: Optional[str] = None,
|
2026-03-18 03:04:07 -07:00
|
|
|
api_key: Optional[str] = None,
|
2026-03-17 03:44:44 -07:00
|
|
|
) -> Optional[PricingEntry]:
|
|
|
|
|
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
|
|
|
|
|
if route.billing_mode == "subscription_included":
|
|
|
|
|
return PricingEntry(
|
|
|
|
|
input_cost_per_million=_ZERO,
|
|
|
|
|
output_cost_per_million=_ZERO,
|
|
|
|
|
cache_read_cost_per_million=_ZERO,
|
|
|
|
|
cache_write_cost_per_million=_ZERO,
|
|
|
|
|
source="none",
|
|
|
|
|
pricing_version="included-route",
|
|
|
|
|
)
|
|
|
|
|
if route.provider == "openrouter":
|
|
|
|
|
return _openrouter_pricing_entry(route)
|
2026-03-18 03:04:07 -07:00
|
|
|
if route.base_url:
|
|
|
|
|
entry = _pricing_entry_from_metadata(
|
|
|
|
|
fetch_endpoint_model_metadata(route.base_url, api_key=api_key or ""),
|
|
|
|
|
route.model,
|
|
|
|
|
source_url=f"{route.base_url.rstrip('/')}/models",
|
|
|
|
|
pricing_version="openai-compatible-models-api",
|
|
|
|
|
)
|
|
|
|
|
if entry:
|
|
|
|
|
return entry
|
2026-03-17 03:44:44 -07:00
|
|
|
return _lookup_official_docs_pricing(route)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def normalize_usage(
|
|
|
|
|
response_usage: Any,
|
|
|
|
|
*,
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
api_mode: Optional[str] = None,
|
|
|
|
|
) -> CanonicalUsage:
|
|
|
|
|
"""Normalize raw API response usage into canonical token buckets.
|
|
|
|
|
|
|
|
|
|
Handles three API shapes:
|
|
|
|
|
- Anthropic: input_tokens/output_tokens/cache_read_input_tokens/cache_creation_input_tokens
|
|
|
|
|
- Codex Responses: input_tokens includes cache tokens; input_tokens_details.cached_tokens separates them
|
|
|
|
|
- OpenAI Chat Completions: prompt_tokens includes cache tokens; prompt_tokens_details.cached_tokens separates them
|
|
|
|
|
|
|
|
|
|
In both Codex and OpenAI modes, input_tokens is derived by subtracting cache
|
|
|
|
|
tokens from the total — the API contract is that input/prompt totals include
|
|
|
|
|
cached tokens and the details object breaks them out.
|
|
|
|
|
"""
|
|
|
|
|
if not response_usage:
|
|
|
|
|
return CanonicalUsage()
|
|
|
|
|
|
|
|
|
|
provider_name = (provider or "").strip().lower()
|
|
|
|
|
mode = (api_mode or "").strip().lower()
|
|
|
|
|
|
|
|
|
|
if mode == "anthropic_messages" or provider_name == "anthropic":
|
|
|
|
|
input_tokens = _to_int(getattr(response_usage, "input_tokens", 0))
|
|
|
|
|
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
|
|
|
|
|
cache_read_tokens = _to_int(getattr(response_usage, "cache_read_input_tokens", 0))
|
|
|
|
|
cache_write_tokens = _to_int(getattr(response_usage, "cache_creation_input_tokens", 0))
|
|
|
|
|
elif mode == "codex_responses":
|
|
|
|
|
input_total = _to_int(getattr(response_usage, "input_tokens", 0))
|
|
|
|
|
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
|
|
|
|
|
details = getattr(response_usage, "input_tokens_details", None)
|
|
|
|
|
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
|
|
|
|
|
cache_write_tokens = _to_int(
|
|
|
|
|
getattr(details, "cache_creation_tokens", 0) if details else 0
|
|
|
|
|
)
|
|
|
|
|
input_tokens = max(0, input_total - cache_read_tokens - cache_write_tokens)
|
|
|
|
|
else:
|
|
|
|
|
prompt_total = _to_int(getattr(response_usage, "prompt_tokens", 0))
|
|
|
|
|
output_tokens = _to_int(getattr(response_usage, "completion_tokens", 0))
|
|
|
|
|
details = getattr(response_usage, "prompt_tokens_details", None)
|
fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies
Port from cline/cline#10266.
When OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline)
route Claude models, they sometimes surface the Anthropic-native cache
counters (`cache_read_input_tokens`, `cache_creation_input_tokens`) at
the top level of the `usage` object instead of nesting them inside
`prompt_tokens_details`. Our chat-completions branch of
`normalize_usage()` only read the nested `prompt_tokens_details` fields,
so those responses:
- reported `cache_write_tokens = 0` even when the model actually did a
prompt-cache write,
- reported only some of the cache-read tokens when the proxy exposed them
top-level only,
- overstated `input_tokens` by the missed cache-write amount, which in
turn made cost estimation and the status-bar cache-hit percentage wrong
for Claude traffic going through these gateways.
Now the chat-completions branch tries the OpenAI-standard
`prompt_tokens_details` first and falls back to the top-level
Anthropic-shape fields only if the nested values are absent/zero. The
Anthropic and Codex Responses branches are unchanged.
Regression guards added for three shapes: top-level write + nested read,
top-level-only, and both-present (nested wins).
2026-04-22 17:03:35 -07:00
|
|
|
# Primary: OpenAI-style prompt_tokens_details. Fallback: Anthropic-style
|
remove Vercel AI Gateway and Vercel Sandbox (#33067)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
2026-05-27 00:43:32 -07:00
|
|
|
# top-level fields that some OpenAI-compatible proxies (OpenRouter, Cline)
|
|
|
|
|
# expose when routing Claude models — without this
|
|
|
|
# fallback, cache writes are undercounted as 0 and cache reads can be
|
|
|
|
|
# missed when the proxy only surfaces them at the top level.
|
|
|
|
|
# Port of cline/cline#10266.
|
2026-03-17 03:44:44 -07:00
|
|
|
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
|
|
|
|
if not cache_read_tokens:
|
|
|
|
|
cache_read_tokens = _to_int(getattr(response_usage, "cache_read_input_tokens", 0))
|
2026-03-17 03:44:44 -07:00
|
|
|
cache_write_tokens = _to_int(
|
|
|
|
|
getattr(details, "cache_write_tokens", 0) if details else 0
|
|
|
|
|
)
|
|
|
|
if not cache_write_tokens:
|
|
|
|
|
cache_write_tokens = _to_int(
|
|
|
|
|
getattr(response_usage, "cache_creation_input_tokens", 0)
|
|
|
|
|
)
|
2026-03-17 03:44:44 -07:00
|
|
|
input_tokens = max(0, prompt_total - cache_read_tokens - cache_write_tokens)
|
|
|
|
|
|
|
|
|
|
reasoning_tokens = 0
|
|
|
|
|
output_details = getattr(response_usage, "output_tokens_details", None)
|
|
|
|
|
if output_details:
|
|
|
|
|
reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))
|
|
|
|
|
|
|
|
|
|
return CanonicalUsage(
|
|
|
|
|
input_tokens=input_tokens,
|
|
|
|
|
output_tokens=output_tokens,
|
|
|
|
|
cache_read_tokens=cache_read_tokens,
|
|
|
|
|
cache_write_tokens=cache_write_tokens,
|
|
|
|
|
reasoning_tokens=reasoning_tokens,
|
|
|
|
|
)
|
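The chat-completions branch prefers the nested `prompt_tokens_details` counters and falls back to the top-level Anthropic-style fields only when the nested values are absent or zero. The sketch below isolates that rule. `split_prompt_tokens` is a hypothetical helper, and the token counts in the usage note are made up.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def _int(value: Any) -> int:
    try:
        return int(value or 0)
    except Exception:
        return 0


def split_prompt_tokens(usage: Any) -> Tuple[int, int, int]:
    """Return (input, cache_read, cache_write) for a chat-completions usage object."""
    total = _int(getattr(usage, "prompt_tokens", 0))
    details = getattr(usage, "prompt_tokens_details", None)

    # Primary source: OpenAI-style nested details.
    read = _int(getattr(details, "cached_tokens", 0) if details else 0)
    if not read:
        # Fallback: Anthropic-style top-level field exposed by some proxies.
        read = _int(getattr(usage, "cache_read_input_tokens", 0))

    write = _int(getattr(details, "cache_write_tokens", 0) if details else 0)
    if not write:
        write = _int(getattr(usage, "cache_creation_input_tokens", 0))

    # prompt_tokens includes cached tokens, so subtract them; never go negative.
    return max(0, total - read - write), read, write
```

For example, `SimpleNamespace(prompt_tokens=1000, cache_read_input_tokens=600, cache_creation_input_tokens=100)` splits into 300 input, 600 cache-read, and 100 cache-write tokens, and when both shapes are present the nested value wins.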
|
|
|
|
|
|
|
|
|
2026-03-17 03:44:44 -07:00
|
|
|
def estimate_usage_cost(
|
|
|
|
|
model_name: str,
|
|
|
|
|
usage: CanonicalUsage,
|
|
|
|
|
*,
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
base_url: Optional[str] = None,
|
2026-03-18 03:04:07 -07:00
|
|
|
api_key: Optional[str] = None,
|
2026-03-17 03:44:44 -07:00
|
|
|
) -> CostResult:
|
|
|
|
|
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
|
|
|
|
|
if route.billing_mode == "subscription_included":
|
|
|
|
|
return CostResult(
|
|
|
|
|
amount_usd=_ZERO,
|
|
|
|
|
status="included",
|
|
|
|
|
source="none",
|
|
|
|
|
label="included",
|
|
|
|
|
pricing_version="included-route",
|
|
|
|
|
)
|
|
|
|
|
|
2026-03-18 03:04:07 -07:00
|
|
|
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
|
2026-03-17 03:44:44 -07:00
|
|
|
if not entry:
|
|
|
|
|
return CostResult(amount_usd=None, status="unknown", source="none", label="n/a")
|
|
|
|
|
|
|
|
|
|
notes: list[str] = []
|
|
|
|
|
amount = _ZERO
|
|
|
|
|
|
|
|
|
|
if usage.input_tokens and entry.input_cost_per_million is None:
|
|
|
|
|
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
|
|
|
|
|
if usage.output_tokens and entry.output_cost_per_million is None:
|
|
|
|
|
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
|
|
|
|
|
if usage.cache_read_tokens:
|
|
|
|
|
if entry.cache_read_cost_per_million is None:
|
|
|
|
|
return CostResult(
|
|
|
|
|
amount_usd=None,
|
|
|
|
|
status="unknown",
|
|
|
|
|
source=entry.source,
|
|
|
|
|
label="n/a",
|
|
|
|
|
notes=("cache-read pricing unavailable for route",),
|
|
|
|
|
)
|
|
|
|
|
if usage.cache_write_tokens:
|
|
|
|
|
if entry.cache_write_cost_per_million is None:
|
|
|
|
|
return CostResult(
|
|
|
|
|
amount_usd=None,
|
|
|
|
|
status="unknown",
|
|
|
|
|
source=entry.source,
|
|
|
|
|
label="n/a",
|
|
|
|
|
notes=("cache-write pricing unavailable for route",),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
if entry.input_cost_per_million is not None:
|
|
|
|
|
amount += Decimal(usage.input_tokens) * entry.input_cost_per_million / _ONE_MILLION
|
|
|
|
|
if entry.output_cost_per_million is not None:
|
|
|
|
|
amount += Decimal(usage.output_tokens) * entry.output_cost_per_million / _ONE_MILLION
|
|
|
|
|
if entry.cache_read_cost_per_million is not None:
|
|
|
|
|
amount += Decimal(usage.cache_read_tokens) * entry.cache_read_cost_per_million / _ONE_MILLION
|
|
|
|
|
if entry.cache_write_cost_per_million is not None:
|
|
|
|
|
amount += Decimal(usage.cache_write_tokens) * entry.cache_write_cost_per_million / _ONE_MILLION
|
|
|
|
|
if entry.request_cost is not None and usage.request_count:
|
|
|
|
|
amount += Decimal(usage.request_count) * entry.request_cost
|
|
|
|
|
|
|
|
|
|
status: CostStatus = "estimated"
|
|
|
|
|
label = f"~${amount:.2f}"
|
|
|
|
|
if entry.source == "none" and amount == _ZERO:
|
|
|
|
|
status = "included"
|
|
|
|
|
label = "included"
|
|
|
|
|
|
|
|
|
|
if route.provider == "openrouter":
|
|
|
|
|
notes.append("OpenRouter cost is estimated from the models API until reconciled.")
|
|
|
|
|
|
|
|
|
|
return CostResult(
|
|
|
|
|
amount_usd=amount,
|
|
|
|
|
status=status,
|
|
|
|
|
source=entry.source,
|
|
|
|
|
label=label,
|
|
|
|
|
fetched_at=entry.fetched_at,
|
|
|
|
|
pricing_version=entry.pricing_version,
|
|
|
|
|
notes=tuple(notes),
|
|
|
|
)
|
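Each token bucket contributes `tokens * price_per_million / 1_000_000` to the estimate, all in `Decimal`. The worked example below uses the 3.00 and 15.00 per-million rates that appear in the pricing table earlier in this file; the token counts are invented, `token_cost` is an illustrative name, and `ONE_MILLION` is assumed to match the module's `_ONE_MILLION`.

```python
from decimal import Decimal

ONE_MILLION = Decimal("1000000")  # assumed value of the module's _ONE_MILLION


def token_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost in USD of one token bucket at a per-million-token price."""
    return Decimal(tokens) * price_per_million / ONE_MILLION


# Invented usage: 120k input tokens and 8k output tokens.
input_cost = token_cost(120_000, Decimal("3.00"))
output_cost = token_cost(8_000, Decimal("15.00"))
amount = input_cost + output_cost
label = f"~${amount:.2f}"
```

The two buckets come to 0.36 and 0.12 USD, so the label reads `~$0.48`. The tilde mirrors the estimator's own label format and signals that the figure is an estimate.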
|
|
|
|
|
|
|
|
|
|
2026-03-17 03:44:44 -07:00
|
|
|
def has_known_pricing(
|
|
|
|
|
model_name: str,
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
base_url: Optional[str] = None,
|
2026-03-18 03:04:07 -07:00
|
|
|
api_key: Optional[str] = None,
|
2026-03-17 03:44:44 -07:00
|
|
|
) -> bool:
|
|
|
|
|
"""Check whether we have pricing data for this model+route.
|
|
|
|
|
|
|
|
|
|
Uses direct lookup instead of routing through the full estimation
|
|
|
|
|
pipeline — avoids creating dummy usage objects just to check status.
|
|
|
|
|
"""
|
|
|
|
|
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
|
|
|
|
|
if route.billing_mode == "subscription_included":
|
|
|
|
|
return True
|
2026-03-18 03:04:07 -07:00
|
|
|
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
|
2026-03-17 03:44:44 -07:00
|
|
|
return entry is not None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def format_duration_compact(seconds: float) -> str:
    if seconds < 60:
        return f"{seconds:.0f}s"
    minutes = seconds / 60
    if minutes < 60:
        return f"{minutes:.0f}m"
    hours = minutes / 60
    if hours < 24:
        remaining_min = int(minutes % 60)
        return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
    days = hours / 24
    return f"{days:.1f}d"
|
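To make the unit thresholds concrete, here is a standalone copy of the duration formatter under the illustrative name `fmt_duration`, intended only for trying out sample values. The logic is meant to mirror the function above and makes no claim about how callers use it.

```python
def fmt_duration(seconds: float) -> str:
    """Compact duration: seconds, minutes, hours (+ minutes), then days."""
    if seconds < 60:
        return f"{seconds:.0f}s"
    minutes = seconds / 60
    if minutes < 60:
        return f"{minutes:.0f}m"
    hours = minutes / 60
    if hours < 24:
        remaining_min = int(minutes % 60)
        return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
    days = hours / 24
    return f"{days:.1f}d"


samples = {s: fmt_duration(s) for s in (45, 600, 3600, 5400, 172800)}
```

The samples map to `45s`, `10m`, `1h`, `1h 30m`, and `2.0d`. One thing to be aware of: the seconds and minutes branches round rather than truncate, so values just under a boundary can render as `60s` or `60m`.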
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def format_token_count_compact(value: int) -> str:
|
|
|
|
|
abs_value = abs(int(value))
|
|
|
|
|
if abs_value < 1_000:
|
|
|
|
|
return str(int(value))
|
|
|
|
|
|
|
|
|
|
sign = "-" if value < 0 else ""
|
|
|
|
|
units = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
|
|
|
|
|
for threshold, suffix in units:
|
|
|
|
|
if abs_value >= threshold:
|
|
|
|
|
scaled = abs_value / threshold
|
|
|
|
|
if scaled < 10:
|
|
|
|
|
text = f"{scaled:.2f}"
|
|
|
|
|
elif scaled < 100:
|
|
|
|
|
text = f"{scaled:.1f}"
|
|
|
|
|
else:
|
|
|
|
|
text = f"{scaled:.0f}"
|
2026-03-25 12:45:58 -07:00
|
|
|
if "." in text:
|
|
|
|
|
text = text.rstrip("0").rstrip(".")
|
|
|
|
return f"{sign}{text}{suffix}"
|
|
|
|
|
|
|
|
|
|
return f"{value:,}"
|
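The token formatter keeps roughly three significant digits and trims trailing zeros. The standalone copy below, under the illustrative name `fmt_tokens`, is a sketch for checking sample inputs and follows the same steps as the function above.

```python
def fmt_tokens(value: int) -> str:
    """Compact token count with K/M/B suffixes and trimmed trailing zeros."""
    abs_value = abs(int(value))
    if abs_value < 1_000:
        return str(int(value))

    sign = "-" if value < 0 else ""
    units = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
    for threshold, suffix in units:
        if abs_value >= threshold:
            scaled = abs_value / threshold
            # Fewer decimals as the scaled number grows.
            if scaled < 10:
                text = f"{scaled:.2f}"
            elif scaled < 100:
                text = f"{scaled:.1f}"
            else:
                text = f"{scaled:.0f}"
            if "." in text:
                text = text.rstrip("0").rstrip(".")
            return f"{sign}{text}{suffix}"
    return f"{value:,}"


samples = [fmt_tokens(v) for v in (999, 1_500, 12_345, 250_000, 1_000_000, -2_000)]
```

These inputs render as `999`, `1.5K`, `12.3K`, `250K`, `1M`, and `-2K`. Because rounding happens after the unit is chosen, a value such as 999,950 renders as `1000K` rather than `1M`.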