Compare commits

...

7 Commits

Author SHA1 Message Date
Teknium
ceb467b6a9 docs(plugins): cover every pluggable surface in both the overview and how-to
Both plugins.md and build-a-hermes-plugin.md now cover every extension
surface end-to-end \u2014 general plugin APIs, specialized plugin types,
config-driven surfaces \u2014 with concrete authoring patterns for each.

plugins.md:
- 'What plugins can do' table grows from 9 rows (general ctx.register_*
  only) to 14 rows covering register_platform, register_image_gen_provider,
  register_context_engine, MemoryProvider subclass, register_provider
  (model). Each row links to its full authoring guide.
- New 'Plugin sub-categories' section under Plugin Discovery explains
  how plugins/platforms/, plugins/image_gen/, plugins/memory/,
  plugins/context_engine/, plugins/model-providers/ are routed to
  different loaders \u2014 PluginManager vs the per-category own-loader
  systems.
- Explicit mention of user-override semantics at
  ~/.hermes/plugins/model-providers/ and ~/.hermes/plugins/memory/.

build-a-hermes-plugin.md:
- New '## Specialized plugin types' section (5 sub-sections):
  - Model provider plugins \u2014 ProviderProfile + plugin.yaml example,
    auto-wiring summary, link to full guide
  - Platform plugins \u2014 BasePlatformAdapter + register_platform() skeleton
  - Memory provider plugins \u2014 MemoryProvider subclass example
  - Context engine plugins \u2014 ContextEngine subclass example
  - Image-generation backends \u2014 ImageGenProvider + kind: backend example
- New '## Non-Python extension surfaces' section (5 sub-sections):
  - MCP servers \u2014 config.yaml mcp_servers.<name> example
  - Gateway event hooks \u2014 HOOK.yaml + handler.py example
  - Shell hooks \u2014 hooks: block in config.yaml example
  - Skill sources (taps) \u2014 hermes skills tap add example
  - TTS / STT command templates \u2014 tts.providers.<name> with type: command
- Distribute via pip / NixOS promoted from ### to ## (they were orphaned
  after the reorganization)

Each specialized / non-Python section has a concrete, copy-pasteable
example plus a 'Full guide:' link to the authoritative doc. Devs arriving
at the build-a-hermes-plugin guide now see every extension surface at
their disposal, not just the general tool/hook/slash-command surface.

Verified:
- Docusaurus build SUCCESS, zero new broken links
- All new cross-links (developer-guide/model-provider-plugin,
  adding-platform-adapters, memory-provider-plugin, context-engine-plugin,
  user-guide/features/mcp, skills#skills-hub, hooks#gateway-event-hooks,
  hooks#shell-hooks, tts#custom-command-providers,
  tts#voice-message-transcription-stt) resolve
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist)
2026-05-06 05:03:39 -07:00
Teknium
8026e5f54c docs(plugins): expand pluggable interfaces table with MCP / event hooks / shell hooks / skill taps
Broadened the scope beyond Python register_* hooks. Hermes has MULTIPLE
plugin-style extension surfaces; they're now all in one table instead of
being scattered across feature docs.

Added rows for:
- **MCP servers** — config.yaml mcp_servers.<name> auto-registers external
  tools from any MCP server. Huge extensibility surface, previously not
  linked from the plugin map.
- **Gateway event hooks** — drop HOOK.yaml + handler.py into
  ~/.hermes/hooks/<name>/ to fire on gateway:startup, session:*, agent:*,
  command:* events. Separate from Python plugin hooks.
- **Shell hooks** — hooks: block in config.yaml runs shell commands on
  events (notifications, auditing, etc.).
- **Skill sources (taps)** — hermes skills tap add <repo> to pull in new
  skill registries beyond the built-in sources.

Both docs updated:
- user-guide/features/plugins.md: table column renamed to 'How' (mixes
  Python API + config-driven + drop-in-dir surfaces accurately)
- guides/build-a-hermes-plugin.md: :::info map at top mirrors the new
  surfaces with a forward-link to the consolidated table

Note block rewritten: instead of singling out TTS/STT as the 'different
style' exception, now honestly describes that Hermes deliberately
supports three plugin styles — Python APIs, config-driven commands, and
drop-in manifest directories — and devs should pick the one that fits
their integration.

Not included (considered and rejected):
- Transport layer (register_transport) — internal, not user-facing
- Tool-call parsers — internal, VLLM phase-2 thing
- Cloud browser providers — hardcoded registry, not drop-in yet
- Terminal backends — hardcoded if/elif, not drop-in yet
- Skill sources (the ABC) — hardcoded list, only taps are user-extensible

Verified:
- All 5 new anchors resolve (gateway-event-hooks, shell-hooks, skills-hub,
  custom-command-providers, voice-message-transcription-stt)
- Docusaurus build SUCCESS, zero new broken links
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist)
2026-05-06 04:55:05 -07:00
Teknium
a401f8172b docs(plugins): correct TTS/STT pluggability \u2014 they ARE plugins (command-providers)
Previous commit incorrectly said TTS/STT 'aren't plugin-extensible'. They
are, via the config-driven command-provider pattern \u2014 any CLI that reads
text and writes audio (or vice versa for STT) is automatically a plugin
with zero Python. The tts.md docs cover this extensively and I missed it.

plugins.md:
- TTS row: 'Config-driven (not a Python plugin)', points at
  tts.md#custom-command-providers
- STT row: points at tts.md#voice-message-transcription-stt (STT docs
  live in tts.md despite the filename)
- Expanded note: TTS/STT use config-driven shell-command templates as
  their plugin surface (full tts.providers.<name> registry for TTS;
  HERMES_LOCAL_STT_COMMAND escape hatch for STT)
- Any CLI that reads/writes files is automatically a plugin \u2014 no Python
  register_* API needed
- Future register_tts_provider()/register_stt_provider() hooks mentioned
  as nice-to-have for SDK/streaming cases, not as the primary story

build-a-hermes-plugin.md:
- Same map update: TTS/STT rows explicit, footer note corrected

Verified:
- tts.md anchors (custom-command-providers, voice-message-transcription-stt)
  exist and resolve in docusaurus build (SUCCESS, no new broken links)
2026-05-06 04:29:55 -07:00
Teknium
c7f5ed4474 docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin
Devs landing on either the user-guide plugin page or the build-a-plugin
guide now get an upfront table of every distinct pluggable surface with
a link to the right authoring doc. Previously they'd have to read the
full general-plugin guide to discover that model providers / platforms
/ memory / context engines are separate systems.

user-guide/features/plugins.md:
- New 'Pluggable interfaces — where to go for each' section below the
  existing 4-kinds table
- 10 rows covering every register_* surface (tool, hook, slash command,
  CLI subcommand, skill, model provider, platform, memory, context
  engine, image-gen)
- Explicit note: TTS/STT are NOT plugin-extensible yet — documented
  with a pointer to the current config.yaml 'command providers' pattern
  and a note that register_tts_provider()/register_stt_provider() may
  come later

guides/build-a-hermes-plugin.md:
- New :::info 'Not sure which guide you need?' map at the top so devs
  see all pluggable interfaces before investing in this 737-line
  general-plugin walkthrough
- Existing bottom :::tip expanded to include platform adapters alongside
  model/memory/context plugins

Verified:
- All 8 cross-doc links in the new plugins.md table resolve in a
  docusaurus build (SUCCESS, no new broken links)
- TTS link corrected (features/voice → features/tts; latter exists)
- Pre-existing broken links/anchors (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist) are unchanged
2026-05-06 04:23:01 -07:00
Teknium
d69d65ad00 docs(providers): add model-provider-plugin authoring guide + fix stale refs
New docs:
- website/docs/developer-guide/model-provider-plugin.md — full authoring
  guide (directory layout, minimal example, ProviderProfile fields,
  overridable hooks, user overrides, api_mode selection, auth types,
  testing, pip distribution)
- Wired into website/sidebars.ts under 'Extending'
- Cross-references added in:
  - guides/build-a-hermes-plugin.md (tip block)
  - developer-guide/adding-providers.md
  - developer-guide/provider-runtime.md

User guide:
- user-guide/features/plugins.md: Plugin types table grows from 3 to 4
  with 'Model providers' row

Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/):
- hermes_cli/main.py:_is_profile_api_key_provider docstring
- hermes_cli/doctor.py:_build_apikey_providers_list docstring
- hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments
- hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment

AGENTS.md:
- Project-structure tree: added plugins/model-providers/ row
- New section: 'Model-provider plugins' explaining discovery, override
  semantics, PluginManager integration, kind auto-coerce heuristic

Verified: docusaurus build succeeds, new page renders, all 3 cross-links
resolve. 347/347 targeted tests pass (tests/providers/,
tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py,
tests/run_agent/test_provider_parity.py).
2026-05-06 04:06:06 -07:00
Teknium
579c73384c feat(providers): make all 33 providers pluggable under plugins/model-providers/
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.

- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
  then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
  in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
  out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
  plugins (providers/ discovery owns them); standalone plugins with
  register_provider+ProviderProfile in their __init__.py auto-coerce to
  this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
  PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
  bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
  provider-runtime.md, providers/README.md, plugins/model-providers/README.md

No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.

Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.
2026-05-05 13:36:08 -07:00
kshitijk4poor
abce1a5d08 feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.

What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
  env_vars, base_url, models_url, auth_type, fallback_models, hostname,
  default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
  build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
  legacy flag path retained for lmstudio/tencent-tokenhub (which have
  session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
  variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
  doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
  parity, hook overrides, e2e kwargs assembly)

Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
  (were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
  reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
  (resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
  finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
  preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
  beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
  main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
  test imports

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #14418
2026-05-05 10:18:49 -07:00
93 changed files with 4059 additions and 190 deletions

View File

@@ -42,6 +42,7 @@ hermes-agent/
├── plugins/ # Plugin system (see "Plugins" section below)
│ ├── memory/ # Memory-provider plugins (honcho, mem0, supermemory, ...)
│ ├── context_engine/ # Context-engine plugins
│ ├── model-providers/ # Inference backend plugins (openrouter, anthropic, gmi, ...)
│ ├── kanban/ # Multi-agent board dispatcher + worker plugin
│ ├── hermes-achievements/ # Gamified achievement tracking
│ ├── observability/ # Metrics / traces / logs plugin
@@ -512,6 +513,31 @@ generic plugin surface (new hook, new ctx method) — never hardcode
plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
honcho argparse from `main.py` for exactly this reason.
### Model-provider plugins (`plugins/model-providers/<name>/`)
Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
ships as a plugin here. Each plugin's `__init__.py` calls
`providers.register_provider(ProviderProfile(...))` at module load.
`providers/__init__.py._discover_providers()` is a **lazy, separate
discovery system** — scanned on first `get_provider_profile()` or
`list_providers()` call, NOT by the general PluginManager.
Scan order:
1. Bundled: `<repo>/plugins/model-providers/<name>/`
2. User: `$HERMES_HOME/plugins/model-providers/<name>/`
3. Legacy: `<repo>/providers/<name>.py` (back-compat)
User plugins of the same name override bundled ones — `register_provider()`
is last-writer-wins. This lets third parties swap out any built-in
profile without a repo patch.
The general PluginManager records `kind: model-provider` manifests but does
NOT import them (would double-instantiate `ProviderProfile`). Plugins
without an explicit `kind:` get auto-coerced via a source-text heuristic
(`register_provider` + `ProviderProfile` in `__init__.py`).
Full authoring guide: `website/docs/developer-guide/model-provider-plugin.md`.
### Dashboard / context-engine / image-gen plugin directories
`plugins/context_engine/`, `plugins/image_gen/`, `plugins/example-dashboard/`,

View File

@@ -216,7 +216,26 @@ def _fixed_temperature_for_model(
return None
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
def _get_aux_model_for_provider(provider_id: str) -> str:
"""Return the cheap auxiliary model for a provider.
Reads from ProviderProfile.default_aux_model first, falling back to the
legacy hardcoded dict for providers that predate the profiles system.
"""
try:
from providers import get_provider_profile
_p = get_provider_profile(provider_id)
if _p and _p.default_aux_model:
return _p.default_aux_model
except Exception:
pass
return _API_KEY_PROVIDER_AUX_MODELS_FALLBACK.get(provider_id, "")
# Fallback for providers not yet migrated to ProviderProfile.default_aux_model,
# plus providers we intentionally keep pinned here (e.g. Anthropic predates
# profiles). New providers should set default_aux_model on their profile instead.
_API_KEY_PROVIDER_AUX_MODELS_FALLBACK: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
@@ -235,6 +254,10 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"tencent-tokenhub": "hy3-preview",
}
# Legacy alias — callers that haven't been updated to _get_aux_model_for_provider()
# can still use this dict directly. Kept in sync with _FALLBACK above.
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = _API_KEY_PROVIDER_AUX_MODELS_FALLBACK
# Vision-specific model overrides for direct providers.
# When the user's main provider has a dedicated vision/multimodal model that
# differs from their main chat model, map it here. The vision auto-detect
@@ -1155,7 +1178,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
raw_base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
base_url = _to_openai_base_url(raw_base_url)
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
model = _get_aux_model_for_provider(provider_id) or None
if model is None:
continue # skip provider if we don't know a valid aux model
logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
@@ -1171,6 +1194,14 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
else:
try:
from providers import get_provider_profile as _gpf_aux
_ph_aux = _gpf_aux(provider_id)
if _ph_aux and _ph_aux.default_headers:
extra["default_headers"] = dict(_ph_aux.default_headers)
except Exception:
pass
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1182,7 +1213,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
raw_base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
base_url = _to_openai_base_url(raw_base_url)
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
model = _get_aux_model_for_provider(provider_id) or None
if model is None:
continue # skip provider if we don't know a valid aux model
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
@@ -1198,6 +1229,14 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
else:
try:
from providers import get_provider_profile as _gpf_aux2
_ph_aux2 = _gpf_aux2(provider_id)
if _ph_aux2 and _ph_aux2.default_headers:
extra["default_headers"] = dict(_ph_aux2.default_headers)
except Exception:
pass
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1570,7 +1609,7 @@ def _try_anthropic(explicit_api_key: str = None) -> Tuple[Optional[Any], Optiona
from agent.anthropic_adapter import _is_oauth_token
is_oauth = _is_oauth_token(token)
model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
model = _get_aux_model_for_provider("anthropic") or "claude-haiku-4-5-20251001"
logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
try:
real_client = build_anthropic_client(token, base_url)
@@ -2373,7 +2412,7 @@ def resolve_provider_client(
if explicit_base_url:
base_url = _to_openai_base_url(explicit_base_url.strip().rstrip("/"))
default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
default_model = _get_aux_model_for_provider(provider)
final_model = _normalize_resolved_model(model or default_model, provider)
if provider == "gemini":

View File

@@ -318,6 +318,17 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"ollama.com": "ollama-cloud",
}
# Auto-extend with hostnames derived from provider profiles.
# Any provider with a base_url not already in the map gets added automatically.
try:
from providers import list_providers as _list_providers
for _pp in _list_providers():
_host = _pp.get_hostname()
if _host and _host not in _URL_TO_PROVIDER:
_URL_TO_PROVIDER[_host] = _pp.name
except Exception:
pass
def _infer_provider_from_url(base_url: str) -> Optional[str]:
"""Infer the models.dev provider name from a base URL.

View File

@@ -6,9 +6,16 @@ Usage:
result = transport.normalize_response(raw_response)
"""
from agent.transports.types import NormalizedResponse, ToolCall, Usage, build_tool_call, map_finish_reason # noqa: F401
from agent.transports.types import (
NormalizedResponse,
ToolCall,
Usage,
build_tool_call,
map_finish_reason,
) # noqa: F401
_REGISTRY: dict = {}
_discovered: bool = False
def register_transport(api_mode: str, transport_cls: type) -> None:
@@ -23,6 +30,9 @@ def get_transport(api_mode: str):
This allows gradual migration — call sites can check for None
and fall back to the legacy code path.
"""
global _discovered
if not _discovered:
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
# The registry can be partially populated when a specific transport
@@ -38,6 +48,8 @@ def get_transport(api_mode: str):
def _discover_transports() -> None:
"""Import all transport modules to trigger auto-registration."""
global _discovered
_discovered = True
try:
import agent.transports.anthropic # noqa: F401
except ImportError:

View File

@@ -109,7 +109,9 @@ class ChatCompletionsTransport(ProviderTransport):
def api_mode(self) -> str:
return "chat_completions"
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
def convert_messages(
self, messages: list[dict[str, Any]], **kwargs
) -> list[dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` /
@@ -126,7 +128,9 @@ class ChatCompletionsTransport(ProviderTransport):
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
if isinstance(tc, dict) and ("call_id" in tc or "response_item_id" in tc):
if isinstance(tc, dict) and (
"call_id" in tc or "response_item_id" in tc
):
needs_sanitize = True
break
if needs_sanitize:
@@ -149,39 +153,41 @@ class ChatCompletionsTransport(ProviderTransport):
tc.pop("response_item_id", None)
return sanitized
def convert_tools(self, tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
def convert_tools(self, tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Tools are already in OpenAI format — identity."""
return tools
def build_kwargs(
self,
model: str,
messages: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
messages: list[dict[str, Any]],
tools: list[dict[str, Any]] | None = None,
**params,
) -> Dict[str, Any]:
) -> dict[str, Any]:
"""Build chat.completions.create() kwargs.
This is the most complex transport method — it handles ~16 providers
via params rather than subclasses.
params:
params (all optional):
timeout: float — API call timeout
max_tokens: int | None — user-configured max tokens
ephemeral_max_output_tokens: int | None — one-shot override (error recovery)
ephemeral_max_output_tokens: int | None — one-shot override
max_tokens_param_fn: callable — returns {max_tokens: N} or {max_completion_tokens: N}
reasoning_config: dict | None
request_overrides: dict | None
session_id: str | None
qwen_session_metadata: dict | None — {sessionId, promptId} precomputed
model_lower: str — lowercase model name for pattern matching
# Provider detection flags (all optional, default False)
# Provider profile path (all per-provider quirks live in providers/)
provider_profile: ProviderProfile | None — when present, delegates to
_build_kwargs_from_profile(); all flag params below are bypassed.
# Legacy-path flags — only used when provider_profile is None
# (i.e. custom / unregistered providers). Known providers all go
# through provider_profile.
is_openrouter: bool
is_nous: bool
is_qwen_portal: bool
is_github_models: bool
is_nvidia_nim: bool
is_kimi: bool
is_tokenhub: bool
is_lmstudio: bool
is_custom_provider: bool
ollama_num_ctx: int | None
@@ -190,6 +196,7 @@ class ChatCompletionsTransport(ProviderTransport):
# Qwen-specific
qwen_prepare_fn: callable | None — runs AFTER codex sanitization
qwen_prepare_inplace_fn: callable | None — in-place variant for deepcopied lists
qwen_session_metadata: dict | None
# Temperature
fixed_temperature: Any — from _fixed_temperature_for_model()
omit_temperature: bool
@@ -199,28 +206,21 @@ class ChatCompletionsTransport(ProviderTransport):
lmstudio_reasoning_options: list[str] | None # raw allowed_options from /api/v1/models
# Claude on OpenRouter/Nous max output
anthropic_max_output: int | None
# Extra
extra_body_additions: dict | None — pre-built extra_body entries
extra_body_additions: dict | None
"""
# Codex sanitization: drop reasoning_items / call_id / response_item_id
sanitized = self.convert_messages(messages)
# Qwen portal prep AFTER codex sanitization. If sanitize already
# deepcopied, reuse that copy via the in-place variant to avoid a
# second deepcopy.
is_qwen = params.get("is_qwen_portal", False)
if is_qwen:
qwen_prep = params.get("qwen_prepare_fn")
qwen_prep_inplace = params.get("qwen_prepare_inplace_fn")
if sanitized is messages:
if qwen_prep is not None:
sanitized = qwen_prep(sanitized)
else:
# Already deepcopied — transform in place
if qwen_prep_inplace is not None:
qwen_prep_inplace(sanitized)
elif qwen_prep is not None:
sanitized = qwen_prep(sanitized)
# ── Provider profile: single-path when present ──────────────────
_profile = params.get("provider_profile")
if _profile:
return self._build_kwargs_from_profile(
_profile, model, sanitized, tools, params
)
# ── Legacy fallback (unregistered / unknown provider) ───────────
# Reached only when get_provider_profile() returned None.
# Known providers always go through the profile path above.
# Developer role swap for GPT-5/Codex models
model_lower = params.get("model_lower", (model or "").lower())
@@ -233,7 +233,7 @@ class ChatCompletionsTransport(ProviderTransport):
sanitized = list(sanitized)
sanitized[0] = {**sanitized[0], "role": "developer"}
api_kwargs: Dict[str, Any] = {
api_kwargs: dict[str, Any] = {
"model": model,
"messages": sanitized,
}
@@ -242,19 +242,6 @@ class ChatCompletionsTransport(ProviderTransport):
if timeout is not None:
api_kwargs["timeout"] = timeout
# Temperature
fixed_temp = params.get("fixed_temperature")
omit_temp = params.get("omit_temperature", False)
if omit_temp:
api_kwargs.pop("temperature", None)
elif fixed_temp is not None:
api_kwargs["temperature"] = fixed_temp
# Qwen metadata (caller precomputes {sessionId, promptId})
qwen_meta = params.get("qwen_session_metadata")
if qwen_meta and is_qwen:
api_kwargs["metadata"] = qwen_meta
# Tools
if tools:
# Moonshot/Kimi uses a stricter flavored JSON Schema. Rewriting
@@ -278,13 +265,6 @@ class ChatCompletionsTransport(ProviderTransport):
api_kwargs.update(max_tokens_fn(ephemeral))
elif max_tokens is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(max_tokens))
elif is_nvidia_nim and max_tokens_fn:
api_kwargs.update(max_tokens_fn(16384))
elif is_qwen and max_tokens_fn:
api_kwargs.update(max_tokens_fn(65536))
elif is_kimi and max_tokens_fn:
# Kimi/Moonshot: 32000 matches Kimi CLI's default
api_kwargs.update(max_tokens_fn(32000))
elif anthropic_max_out is not None:
api_kwargs["max_tokens"] = anthropic_max_out
@@ -331,7 +311,7 @@ class ChatCompletionsTransport(ProviderTransport):
api_kwargs["reasoning_effort"] = _lm_effort
# extra_body assembly
extra_body: Dict[str, Any] = {}
extra_body: dict[str, Any] = {}
is_openrouter = params.get("is_openrouter", False)
is_nous = params.get("is_nous", False)
@@ -361,35 +341,7 @@ class ChatCompletionsTransport(ProviderTransport):
if gh_reasoning is not None:
extra_body["reasoning"] = gh_reasoning
else:
if reasoning_config is not None:
rc = dict(reasoning_config)
if is_nous and rc.get("enabled") is False:
pass # omit for Nous when disabled
else:
extra_body["reasoning"] = rc
else:
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
if is_nous:
extra_body["tags"] = ["product=hermes-agent"]
# Ollama num_ctx
ollama_ctx = params.get("ollama_num_ctx")
if ollama_ctx:
options = extra_body.get("options", {})
options["num_ctx"] = ollama_ctx
extra_body["options"] = options
# Ollama/custom think=false
if params.get("is_custom_provider", False):
if reasoning_config and isinstance(reasoning_config, dict):
_effort = (reasoning_config.get("effort") or "").strip().lower()
_enabled = reasoning_config.get("enabled", True)
if _effort == "none" or _enabled is False:
extra_body["think"] = False
if is_qwen:
extra_body["vl_high_resolution_images"] = True
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
if provider_name == "gemini":
raw_thinking_config = _build_gemini_thinking_config(model, reasoning_config)
@@ -423,6 +375,120 @@ class ChatCompletionsTransport(ProviderTransport):
return api_kwargs
def _build_kwargs_from_profile(self, profile, model, sanitized, tools, params):
"""Build API kwargs using a ProviderProfile — single path, no legacy flags.
This method replaces the entire flag-based kwargs assembly when a
provider_profile is passed. Every quirk comes from the profile object.
"""
from providers.base import OMIT_TEMPERATURE
# Message preprocessing
sanitized = profile.prepare_messages(sanitized)
# Developer role swap — model-name-based, applies to all providers
_model_lower = (model or "").lower()
if (
sanitized
and isinstance(sanitized[0], dict)
and sanitized[0].get("role") == "system"
and any(p in _model_lower for p in DEVELOPER_ROLE_MODELS)
):
sanitized = list(sanitized)
sanitized[0] = {**sanitized[0], "role": "developer"}
api_kwargs: dict[str, Any] = {
"model": model,
"messages": sanitized,
}
# Temperature
if profile.fixed_temperature is OMIT_TEMPERATURE:
pass # Don't include temperature at all
elif profile.fixed_temperature is not None:
api_kwargs["temperature"] = profile.fixed_temperature
else:
# Use caller's temperature if provided
temp = params.get("temperature")
if temp is not None:
api_kwargs["temperature"] = temp
# Timeout
timeout = params.get("timeout")
if timeout is not None:
api_kwargs["timeout"] = timeout
# Tools — apply Moonshot/Kimi schema sanitization regardless of path
if tools:
if is_moonshot_model(model):
tools = sanitize_moonshot_tools(tools)
api_kwargs["tools"] = tools
# max_tokens resolution — priority: ephemeral > user > profile default
max_tokens_fn = params.get("max_tokens_param_fn")
ephemeral = params.get("ephemeral_max_output_tokens")
user_max = params.get("max_tokens")
anthropic_max = params.get("anthropic_max_output")
if ephemeral is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(ephemeral))
elif user_max is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(user_max))
elif profile.default_max_tokens and max_tokens_fn:
api_kwargs.update(max_tokens_fn(profile.default_max_tokens))
elif anthropic_max is not None:
api_kwargs["max_tokens"] = anthropic_max
# Provider-specific api_kwargs extras (reasoning_effort, metadata, etc.)
reasoning_config = params.get("reasoning_config")
extra_body_from_profile, top_level_from_profile = (
profile.build_api_kwargs_extras(
reasoning_config=reasoning_config,
supports_reasoning=params.get("supports_reasoning", False),
qwen_session_metadata=params.get("qwen_session_metadata"),
model=model,
ollama_num_ctx=params.get("ollama_num_ctx"),
)
)
api_kwargs.update(top_level_from_profile)
# extra_body assembly
extra_body: dict[str, Any] = {}
# Profile's extra_body (tags, provider prefs, vl_high_resolution, etc.)
profile_body = profile.build_extra_body(
session_id=params.get("session_id"),
provider_preferences=params.get("provider_preferences"),
model=model,
base_url=params.get("base_url"),
reasoning_config=reasoning_config,
)
if profile_body:
extra_body.update(profile_body)
# Profile's reasoning/thinking extra_body entries
if extra_body_from_profile:
extra_body.update(extra_body_from_profile)
# Merge any pre-built extra_body additions from the caller
additions = params.get("extra_body_additions")
if additions:
extra_body.update(additions)
# Request overrides (user config)
overrides = params.get("request_overrides")
if overrides:
for k, v in overrides.items():
if k == "extra_body" and isinstance(v, dict):
extra_body.update(v)
else:
api_kwargs[k] = v
if extra_body:
api_kwargs["extra_body"] = extra_body
return api_kwargs
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize OpenAI ChatCompletion to NormalizedResponse.
@@ -444,7 +510,7 @@ class ChatCompletionsTransport(ProviderTransport):
# Gemini 3 thinking models attach extra_content with
# thought_signature — without replay on the next turn the API
# rejects the request with 400.
tc_provider_data: Dict[str, Any] = {}
tc_provider_data: dict[str, Any] = {}
extra = getattr(tc, "extra_content", None)
if extra is None and hasattr(tc, "model_extra"):
extra = (tc.model_extra or {}).get("extra_content")
@@ -455,12 +521,14 @@ class ChatCompletionsTransport(ProviderTransport):
except Exception:
pass
tc_provider_data["extra_content"] = extra
tool_calls.append(ToolCall(
id=tc.id,
name=tc.function.name,
arguments=tc.function.arguments,
provider_data=tc_provider_data or None,
))
tool_calls.append(
ToolCall(
id=tc.id,
name=tc.function.name,
arguments=tc.function.arguments,
provider_data=tc_provider_data or None,
)
)
usage = None
if hasattr(response, "usage") and response.usage:
@@ -508,7 +576,7 @@ class ChatCompletionsTransport(ProviderTransport):
return False
return True
def extract_cache_stats(self, response: Any) -> Optional[Dict[str, int]]:
def extract_cache_stats(self, response: Any) -> dict[str, int] | None:
"""Extract OpenRouter/OpenAI cache stats from prompt_tokens_details."""
usage = getattr(response, "usage", None)
if usage is None:

View File

@@ -12,7 +12,7 @@ from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from typing import Any
@dataclass
@@ -32,10 +32,10 @@ class ToolCall:
* Others: ``None``
"""
id: Optional[str]
id: str | None
name: str
arguments: str # JSON string
provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
provider_data: dict[str, Any] | None = field(default=None, repr=False)
# ── Backward compatibility ──────────────────────────────────
# The agent loop reads tc.function.name / tc.function.arguments
@@ -47,17 +47,17 @@ class ToolCall:
return "function"
@property
def function(self) -> "ToolCall":
def function(self) -> ToolCall:
"""Return self so tc.function.name / tc.function.arguments work."""
return self
@property
def call_id(self) -> Optional[str]:
def call_id(self) -> str | None:
"""Codex call_id from provider_data, accessed via getattr by _build_assistant_message."""
return (self.provider_data or {}).get("call_id")
@property
def response_item_id(self) -> Optional[str]:
def response_item_id(self) -> str | None:
"""Codex response_item_id from provider_data."""
return (self.provider_data or {}).get("response_item_id")
@@ -101,18 +101,18 @@ class NormalizedResponse:
* Others: ``None``
"""
content: Optional[str]
tool_calls: Optional[List[ToolCall]]
content: str | None
tool_calls: list[ToolCall] | None
finish_reason: str # "stop", "tool_calls", "length", "content_filter"
reasoning: Optional[str] = None
usage: Optional[Usage] = None
provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
reasoning: str | None = None
usage: Usage | None = None
provider_data: dict[str, Any] | None = field(default=None, repr=False)
# ── Backward compatibility ──────────────────────────────────
# The shim _nr_to_assistant_message() mapped these from provider_data.
# These properties let NormalizedResponse pass through directly.
@property
def reasoning_content(self) -> Optional[str]:
def reasoning_content(self) -> str | None:
pd = self.provider_data or {}
return pd.get("reasoning_content")
@@ -136,8 +136,9 @@ class NormalizedResponse:
# Factory helpers
# ---------------------------------------------------------------------------
def build_tool_call(
id: Optional[str],
id: str | None,
name: str,
arguments: Any,
**provider_fields: Any,
@@ -151,7 +152,7 @@ def build_tool_call(
return ToolCall(id=id, name=name, arguments=args_str, provider_data=pd)
def map_finish_reason(reason: Optional[str], mapping: Dict[str, str]) -> str:
def map_finish_reason(reason: str | None, mapping: dict[str, str]) -> str:
"""Translate a provider-specific stop reason to the normalised set.
Falls back to ``"stop"`` for unknown or ``None`` reasons.

View File

@@ -416,6 +416,40 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
),
}
# Auto-extend PROVIDER_REGISTRY with any api-key provider registered in
# providers/ that is not already declared above. New providers only need a
# plugins/model-providers/<name>/ plugin — no edits to this file required.
try:
from providers import list_providers as _list_providers_for_registry
for _pp in _list_providers_for_registry():
if _pp.name in PROVIDER_REGISTRY:
continue
if _pp.auth_type != "api_key" or not _pp.env_vars:
continue
# Skip providers that need custom token resolution or are special-cased
# in resolve_provider() (copilot/kimi/zai have bespoke token refresh;
# openrouter/custom are aggregator/user-supplied and handled outside
# the registry — adding them here breaks runtime_provider resolution
# that relies on `openrouter not in PROVIDER_REGISTRY`).
if _pp.name in {"copilot", "kimi-coding", "kimi-coding-cn", "zai", "openrouter", "custom"}:
continue
_api_key_vars = tuple(v for v in _pp.env_vars if not v.endswith("_BASE_URL") and not v.endswith("_URL"))
_base_url_var = next((v for v in _pp.env_vars if v.endswith("_BASE_URL") or v.endswith("_URL")), None)
PROVIDER_REGISTRY[_pp.name] = ProviderConfig(
id=_pp.name,
name=_pp.display_name or _pp.name,
auth_type="api_key",
inference_base_url=_pp.base_url,
api_key_env_vars=_api_key_vars or _pp.env_vars,
base_url_env_var=_base_url_var or "",
)
# Also register aliases so resolve_provider() resolves them
for _alias in _pp.aliases:
if _alias not in PROVIDER_REGISTRY:
PROVIDER_REGISTRY[_alias] = PROVIDER_REGISTRY[_pp.name]
except Exception:
pass
# =============================================================================
# Anthropic Key Helper
@@ -1195,6 +1229,17 @@ def resolve_provider(
"vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
}
# Extend with aliases declared in plugins/model-providers/<name>/ that aren't already mapped.
# This keeps providers/ as the single source for new aliases while the
# hardcoded dict above remains authoritative for existing ones.
try:
from providers import list_providers as _lp
for _pp in _lp():
for _alias in _pp.aliases:
if _alias not in _PROVIDER_ALIASES:
_PROVIDER_ALIASES[_alias] = _pp.name
except Exception:
pass
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
if normalized == "openrouter":

View File

@@ -4840,3 +4840,45 @@ def config_command(args):
print(" hermes config path Show config file path")
print(" hermes config env-path Show .env file path")
sys.exit(1)
# ── Profile-driven env var injection ─────────────────────────────────────────
# Any provider registered in providers/ with auth_type="api_key" automatically
# gets its env_vars exposed in OPTIONAL_ENV_VARS without editing this file.
# Runs once at import time.
_profile_env_vars_injected = False
def _inject_profile_env_vars() -> None:
"""Populate OPTIONAL_ENV_VARS from provider profiles not already listed.
Called once at module load time. Idempotent — repeated calls are no-ops.
"""
global _profile_env_vars_injected
if _profile_env_vars_injected:
return
_profile_env_vars_injected = True
try:
from providers import list_providers
for _pp in list_providers():
if _pp.auth_type not in ("api_key",):
continue
for _var in _pp.env_vars:
if _var in OPTIONAL_ENV_VARS:
continue
_is_key = not _var.endswith("_BASE_URL") and not _var.endswith("_URL")
OPTIONAL_ENV_VARS[_var] = {
"description": f"{_pp.display_name or _pp.name} {'API key' if _is_key else 'base URL override'}",
"prompt": f"{_pp.display_name or _pp.name} {'API key' if _is_key else 'base URL (leave empty for default)'}",
"url": _pp.signup_url or None,
"password": _is_key,
"category": "provider",
"advanced": True,
}
except Exception:
pass
# Eagerly inject so that OPTIONAL_ENV_VARS is fully populated at import time.
_inject_profile_env_vars()

View File

@@ -175,6 +175,85 @@ def _check_gateway_service_linger(issues: list[str]) -> None:
check_warn("Could not verify systemd linger", f"({linger_detail})")
_APIKEY_PROVIDERS_CACHE: list | None = None
def _build_apikey_providers_list() -> list:
"""Build the API-key provider health-check list once and cache it.
Tuple format: (name, env_vars, default_url, base_env, supports_models_endpoint)
Base list augmented with any ProviderProfile with auth_type="api_key" not
already present — adding plugins/model-providers/<name>/ is sufficient to get into doctor.
"""
_static = [
("Z.AI / GLM", ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
("Kimi / Moonshot", ("KIMI_API_KEY",), "https://api.moonshot.ai/v1/models", "KIMI_BASE_URL", True),
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
("Alibaba/DashScope", ("DASHSCOPE_API_KEY",), "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models", "DASHSCOPE_BASE_URL", True),
# MiniMax global: /v1 endpoint supports /models.
("MiniMax", ("MINIMAX_API_KEY",), "https://api.minimax.io/v1/models", "MINIMAX_BASE_URL", True),
# MiniMax CN: /v1 endpoint does NOT support /models (returns 404).
("MiniMax (China)", ("MINIMAX_CN_API_KEY",), "https://api.minimaxi.com/v1/models", "MINIMAX_CN_BASE_URL", False),
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
("OpenCode Go", ("OPENCODE_GO_API_KEY",), None, "OPENCODE_GO_BASE_URL", False),
]
_known_names = {t[0] for t in _static}
# Also index by profile canonical name so profiles without display_name
# don't create duplicate entries for providers already in the static list.
_known_canonical: set[str] = set()
_name_to_canonical = {
"Z.AI / GLM": "zai", "Kimi / Moonshot": "kimi-coding",
"StepFun Step Plan": "stepfun", "Kimi / Moonshot (China)": "kimi-coding-cn",
"Arcee AI": "arcee", "GMI Cloud": "gmi", "DeepSeek": "deepseek",
"Hugging Face": "huggingface", "NVIDIA NIM": "nvidia",
"Alibaba/DashScope": "alibaba", "MiniMax": "minimax",
"MiniMax (China)": "minimax-cn", "Vercel AI Gateway": "ai-gateway",
"Kilo Code": "kilocode", "OpenCode Zen": "opencode-zen",
"OpenCode Go": "opencode-go",
}
for _label, _canonical in _name_to_canonical.items():
_known_canonical.add(_canonical)
try:
from providers import list_providers
from providers.base import ProviderProfile as _PP
for _pp in list_providers():
if not isinstance(_pp, _PP) or _pp.auth_type != "api_key" or not _pp.env_vars:
continue
_label = _pp.display_name or _pp.name
if _label in _known_names or _pp.name in _known_canonical:
continue
# Separate API-key vars from base-URL override vars — the health-check
# loop sends the first found value as Authorization: Bearer, so a URL
# string must never be picked.
_key_vars = tuple(
v for v in _pp.env_vars
if not v.endswith("_BASE_URL") and not v.endswith("_URL")
)
_base_var = next(
(v for v in _pp.env_vars if v.endswith("_BASE_URL") or v.endswith("_URL")),
None,
)
if not _key_vars:
continue
_models_url = (
(_pp.models_url or (_pp.base_url.rstrip("/") + "/models"))
if _pp.base_url else None
)
_static.append((_label, _key_vars, _models_url, _base_var, True))
except Exception:
pass
return _static
def run_doctor(args):
"""Run diagnostic checks."""
should_fix = getattr(args, 'fix', False)
@@ -1087,27 +1166,11 @@ def run_doctor(args):
# -- API-key providers --
# Tuple: (name, env_vars, default_url, base_env, supports_models_endpoint)
# If supports_models_endpoint is False, we skip the health check and just show "configured"
_apikey_providers = [
("Z.AI / GLM", ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
("Kimi / Moonshot", ("KIMI_API_KEY",), "https://api.moonshot.ai/v1/models", "KIMI_BASE_URL", True),
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
("Alibaba/DashScope", ("DASHSCOPE_API_KEY",), "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models", "DASHSCOPE_BASE_URL", True),
# MiniMax global: /v1 endpoint supports /models.
("MiniMax", ("MINIMAX_API_KEY",), "https://api.minimax.io/v1/models", "MINIMAX_BASE_URL", True),
# MiniMax CN: /v1 endpoint does NOT support /models (returns 404).
("MiniMax (China)", ("MINIMAX_CN_API_KEY",), "https://api.minimaxi.com/v1/models", "MINIMAX_CN_BASE_URL", False),
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
("OpenCode Go", ("OPENCODE_GO_API_KEY",), None, "OPENCODE_GO_BASE_URL", False),
]
# Cached at module level after first build — profiles auto-extend it.
global _APIKEY_PROVIDERS_CACHE
if _APIKEY_PROVIDERS_CACHE is None:
_APIKEY_PROVIDERS_CACHE = _build_apikey_providers_list()
_apikey_providers = _APIKEY_PROVIDERS_CACHE
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
for _ev in _env_vars:

View File

@@ -1611,6 +1611,21 @@ def cmd_model(args):
select_provider_and_model(args=args)
def _is_profile_api_key_provider(provider_id: str) -> bool:
"""Return True when provider_id maps to a profile with auth_type='api_key'.
Used as a catch-all in select_provider_and_model() so that new providers
declared in plugins/model-providers/<name>/ automatically dispatch to _model_flow_api_key_provider
without requiring an explicit elif branch here.
"""
try:
from providers import get_provider_profile
_p = get_provider_profile(provider_id)
return _p is not None and _p.auth_type == "api_key"
except Exception:
return False
def select_provider_and_model(args=None):
"""Core provider selection + model picking logic.
@@ -1907,7 +1922,7 @@ def select_provider_and_model(args=None):
"ollama-cloud",
"tencent-tokenhub",
"lmstudio",
):
) or _is_profile_api_key_provider(selected_provider):
_model_flow_api_key_provider(config, selected_provider, current_model)
# ── Post-switch cleanup: clear stale OPENAI_BASE_URL ──────────────
@@ -8215,6 +8230,22 @@ def cmd_logs(args):
)
def _build_provider_choices() -> list[str]:
"""Build the --provider choices list from CANONICAL_PROVIDERS + 'auto'."""
try:
from hermes_cli.models import CANONICAL_PROVIDERS as _cp
return ["auto"] + [p.slug for p in _cp]
except Exception:
# Fallback: static list guarantees the CLI always works
return [
"auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot",
"anthropic", "gemini", "google-gemini-cli", "xai", "bedrock", "azure-foundry",
"ollama-cloud", "huggingface", "zai", "kimi-coding", "kimi-coding-cn",
"stepfun", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee",
"nvidia", "deepseek", "alibaba", "qwen-oauth", "opencode-zen", "opencode-go",
]
def main():
"""Main entry point for hermes CLI."""
from hermes_cli._parser import build_top_level_parser

View File

@@ -806,6 +806,25 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway"),
]
# Auto-extend CANONICAL_PROVIDERS with any provider registered in providers/
# that is not already in the list above. Adding plugins/model-providers/<name>/
# is sufficient to expose a new provider in the model picker, /model, and all
# downstream consumers — no edits to this file needed.
_canonical_slugs = {p.slug for p in CANONICAL_PROVIDERS}
try:
from providers import list_providers as _list_providers_for_canonical
for _pp in _list_providers_for_canonical():
if _pp.name in _canonical_slugs:
continue
if _pp.auth_type in ("oauth_device_code", "oauth_external", "external_process", "aws_sdk", "copilot"):
continue # non-api-key flows need bespoke picker UX; skip auto-inject
_label = _pp.display_name or _pp.name
_desc = _pp.description or f"{_label} (direct API)"
CANONICAL_PROVIDERS.append(ProviderEntry(_pp.name, _label, _desc))
_canonical_slugs.add(_pp.name)
except Exception:
pass
# Derived dicts — used throughout the codebase
_PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
_PROVIDER_LABELS["custom"] = "Custom endpoint" # special case: not a named provider
@@ -2023,6 +2042,34 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return ids
except Exception:
pass
# ── Profile-based generic live fetch (all simple api-key providers) ──
# Handles any provider registered in providers/ with auth_type="api_key".
# Replaces per-provider copy-paste blocks (stepfun, gmi, zai, etc.).
try:
from providers import get_provider_profile
from hermes_cli.auth import resolve_api_key_provider_credentials
_p = get_provider_profile(normalized)
if _p and _p.auth_type == "api_key" and _p.base_url:
try:
creds = resolve_api_key_provider_credentials(normalized)
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
except Exception:
api_key, base_url = "", _p.base_url
if not base_url:
base_url = _p.base_url
if api_key:
live = _p.fetch_models(api_key=api_key)
if live:
return live
# Use profile's fallback_models if defined
if _p.fallback_models:
return list(_p.fallback_models)
except Exception:
pass
curated_static = list(_PROVIDER_MODELS.get(normalized, []))
if normalized in _MODELS_DEV_PREFERRED:
return _merge_with_models_dev(normalized, curated_static)

View File

@@ -173,7 +173,7 @@ def _get_enabled_plugins() -> Optional[set]:
# Data classes
# ---------------------------------------------------------------------------
_VALID_PLUGIN_KINDS: Set[str] = {"standalone", "backend", "exclusive", "platform"}
_VALID_PLUGIN_KINDS: Set[str] = {"standalone", "backend", "exclusive", "platform", "model-provider"}
@dataclass
@@ -643,15 +643,17 @@ class PluginManager:
# - flat: ``plugins/disk-cleanup/plugin.yaml`` (standalone)
# - category: ``plugins/image_gen/openai/plugin.yaml`` (backend)
#
# ``memory/`` and ``context_engine/`` are skipped at the top level —
# they have their own discovery systems. ``platforms/`` is a category
# holding platform adapters (scanned one level deeper below).
# ``memory/``, ``context_engine/``, and ``model-providers/`` are
# skipped at the top level — they have their own discovery systems
# (plugins/memory/__init__.py, providers/__init__.py). ``platforms/``
# is a category holding platform adapters (scanned one level deeper
# below).
repo_plugins = get_bundled_plugins_dir()
manifests.extend(
self._scan_directory(
repo_plugins,
source="bundled",
skip_names={"memory", "context_engine", "platforms"},
skip_names={"memory", "context_engine", "platforms", "model-providers"},
)
)
manifests.extend(
@@ -709,6 +711,21 @@ class PluginManager:
)
continue
# Model provider plugins are loaded by providers/__init__.py
# (its own lazy discovery keyed off first get_provider_profile()
# call). We record the manifest here for introspection but do
# not import the module — a second import would create two
# ProviderProfile instances and break the "last writer wins"
# override semantics between bundled and user plugins.
if manifest.kind == "model-provider":
loaded = LoadedPlugin(manifest=manifest, enabled=True)
self._plugins[lookup_key] = loaded
logger.debug(
"Skipping '%s' (model-provider, handled by providers/ discovery)",
lookup_key,
)
continue
# Built-in backends auto-load — they ship with hermes and must
# just work. Selection among them (e.g. which image_gen backend
# services calls) is driven by ``<category>.provider`` config,
@@ -886,6 +903,19 @@ class PluginManager:
"treating as kind='exclusive'",
key,
)
elif (
"register_provider" in source_text
and "ProviderProfile" in source_text
):
# Model provider plugin (calls register_provider()
# from ``providers`` with a ProviderProfile). Route
# to providers/__init__.py discovery.
kind = "model-provider"
logger.debug(
"Plugin %s: detected model provider, "
"treating as kind='model-provider'",
key,
)
except Exception:
pass

View File

@@ -0,0 +1,70 @@
# Model Provider Plugins
Each subdirectory is a self-contained provider profile plugin. The
directory layout mirrors `plugins/platforms/`:
```
plugins/model-providers/
├── openrouter/
│ ├── __init__.py # registers the ProviderProfile
│ └── plugin.yaml # manifest: name, kind, version, description
├── anthropic/
│ ├── __init__.py
│ └── plugin.yaml
└── ...
```
## How discovery works
`providers/__init__.py._discover_providers()` scans this directory (and
`$HERMES_HOME/plugins/model-providers/`) the first time anything calls
`get_provider_profile()` or `list_providers()`. Each `__init__.py` is
imported and expected to call `providers.register_provider(profile)`.
User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override
bundled plugins of the same name — last-writer-wins in
`register_provider()`. Drop a file there to replace a built-in.
## Adding a new provider
1. Create `plugins/model-providers/<your_provider>/__init__.py`:
```python
from providers import register_provider
from providers.base import ProviderProfile
my_provider = ProviderProfile(
name="your-provider",
aliases=("alias1", "alias2"),
display_name="Your Provider",
description="One-line description shown in the setup picker",
signup_url="https://your-provider.example.com/keys",
env_vars=("YOUR_PROVIDER_API_KEY", "YOUR_PROVIDER_BASE_URL"),
base_url="https://api.your-provider.example.com/v1",
default_aux_model="your-cheap-model",
)
register_provider(my_provider)
```
2. Create `plugins/model-providers/<your_provider>/plugin.yaml`:
```yaml
name: your-provider-profile
kind: model-provider
version: 1.0.0
description: Short sentence about the provider
author: Your Name
```
Nothing else needs to change. `auth.py`, `config.py`, `models.py`,
`doctor.py`, `model_metadata.py`, `runtime_provider.py`, and the
chat_completions transport all auto-wire from the registry.
## Non-trivial profiles
Override the `ProviderProfile` hooks in a subclass for per-provider
quirks — see `plugins/model-providers/openrouter/__init__.py` for
`build_extra_body` and `build_api_kwargs_extras` examples, and
`plugins/model-providers/gemini/__init__.py` for `thinking_config`
translation.

View File

@@ -0,0 +1,43 @@
"""Vercel AI Gateway provider profile.
AI Gateway routes to multiple backends. Hermes sends attribution
headers and full reasoning config passthrough.
"""
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class VercelAIGatewayProfile(ProviderProfile):
"""Vercel AI Gateway — attribution headers + reasoning passthrough."""
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
supports_reasoning: bool = True,
**ctx: Any,
) -> tuple[dict[str, Any], dict[str, Any]]:
extra_body: dict[str, Any] = {}
if supports_reasoning and reasoning_config is not None:
extra_body["reasoning"] = dict(reasoning_config)
elif supports_reasoning:
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
return extra_body, {}
vercel = VercelAIGatewayProfile(
name="ai-gateway",
aliases=("vercel", "vercel-ai-gateway", "ai_gateway", "aigateway"),
env_vars=("AI_GATEWAY_API_KEY",),
base_url="https://ai-gateway.vercel.sh/v1",
default_headers={
"HTTP-Referer": "https://hermes-agent.nousresearch.com",
"X-Title": "Hermes Agent",
},
default_aux_model="google/gemini-3-flash",
)
register_provider(vercel)

View File

@@ -0,0 +1,5 @@
name: ai-gateway-provider
kind: model-provider
version: 1.0.0
description: Vercel AI Gateway
author: Nous Research

View File

@@ -0,0 +1,21 @@
"""Alibaba Cloud Coding Plan provider profile.
Separate from the standard `alibaba` profile because it hits a different
endpoint (coding-intl.dashscope.aliyuncs.com) with a dedicated API key tier.
"""
from providers import register_provider
from providers.base import ProviderProfile
alibaba_coding_plan = ProviderProfile(
name="alibaba-coding-plan",
aliases=("alibaba_coding", "alibaba-coding", "dashscope-coding"),
display_name="Alibaba Cloud (Coding Plan)",
description="Alibaba Cloud Coding Plan — dedicated coding tier",
signup_url="https://help.aliyun.com/zh/model-studio/",
env_vars=("ALIBABA_CODING_PLAN_API_KEY", "DASHSCOPE_API_KEY", "ALIBABA_CODING_PLAN_BASE_URL"),
base_url="https://coding-intl.dashscope.aliyuncs.com/v1",
auth_type="api_key",
)
register_provider(alibaba_coding_plan)

View File

@@ -0,0 +1,5 @@
name: alibaba-coding-plan-provider
kind: model-provider
version: 1.0.0
description: Alibaba Cloud Coding Plan
author: Nous Research

View File

@@ -0,0 +1,13 @@
"""Alibaba Cloud DashScope provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
alibaba = ProviderProfile(
name="alibaba",
aliases=("dashscope", "alibaba-cloud", "qwen-dashscope"),
env_vars=("DASHSCOPE_API_KEY",),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
register_provider(alibaba)

View File

@@ -0,0 +1,5 @@
name: alibaba-provider
kind: model-provider
version: 1.0.0
description: Alibaba DashScope (international)
author: Nous Research

View File

@@ -0,0 +1,52 @@
"""Native Anthropic provider profile."""
import json
import logging
import urllib.request
from providers import register_provider
from providers.base import ProviderProfile
logger = logging.getLogger(__name__)
class AnthropicProfile(ProviderProfile):
"""Native Anthropic — uses x-api-key header, not Bearer."""
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Anthropic uses x-api-key header and anthropic-version."""
if not api_key:
return None
try:
req = urllib.request.Request("https://api.anthropic.com/v1/models")
req.add_header("x-api-key", api_key)
req.add_header("anthropic-version", "2023-06-01")
req.add_header("Accept", "application/json")
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
return [
m["id"]
for m in data.get("data", [])
if isinstance(m, dict) and "id" in m
]
except Exception as exc:
logger.debug("fetch_models(anthropic): %s", exc)
return None
anthropic = AnthropicProfile(
name="anthropic",
aliases=("claude", "claude-oauth", "claude-code"),
api_mode="anthropic_messages",
env_vars=("ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"),
base_url="https://api.anthropic.com",
auth_type="api_key",
default_aux_model="claude-haiku-4-5-20251001",
)
register_provider(anthropic)

View File

@@ -0,0 +1,5 @@
name: anthropic-provider
kind: model-provider
version: 1.0.0
description: Anthropic (Claude)
author: Nous Research

View File

@@ -0,0 +1,13 @@
"""Arcee AI provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
arcee = ProviderProfile(
name="arcee",
aliases=("arcee-ai", "arceeai"),
env_vars=("ARCEEAI_API_KEY",),
base_url="https://api.arcee.ai/api/v1",
)
register_provider(arcee)

View File

@@ -0,0 +1,5 @@
name: arcee-provider
kind: model-provider
version: 1.0.0
description: Arcee AI
author: Nous Research

View File

@@ -0,0 +1,21 @@
"""Azure AI Foundry provider profile.
Azure Foundry exposes an OpenAI-compatible endpoint; users supply their own
base URL at setup since endpoints are per-resource.
"""
from providers import register_provider
from providers.base import ProviderProfile
azure_foundry = ProviderProfile(
name="azure-foundry",
aliases=("azure", "azure-ai-foundry", "azure-ai"),
display_name="Azure Foundry",
description="Azure AI Foundry — OpenAI-compatible endpoint (user-supplied base URL)",
signup_url="https://ai.azure.com/",
env_vars=("AZURE_FOUNDRY_API_KEY", "AZURE_FOUNDRY_BASE_URL"),
base_url="", # per-resource; user provides at setup
auth_type="api_key",
)
register_provider(azure_foundry)

View File

@@ -0,0 +1,5 @@
name: azure-foundry-provider
kind: model-provider
version: 1.0.0
description: Azure AI Foundry
author: Nous Research

View File

@@ -0,0 +1,29 @@
"""AWS Bedrock provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
class BedrockProfile(ProviderProfile):
"""AWS Bedrock — no REST /v1/models endpoint; uses AWS SDK."""
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Bedrock model listing requires AWS SDK, not a REST call."""
return None
bedrock = BedrockProfile(
name="bedrock",
aliases=("aws", "aws-bedrock", "amazon-bedrock", "amazon"),
api_mode="bedrock_converse",
env_vars=(), # AWS SDK credentials — not env vars
base_url="https://bedrock-runtime.us-east-1.amazonaws.com",
auth_type="aws_sdk",
)
register_provider(bedrock)

View File

@@ -0,0 +1,5 @@
name: bedrock-provider
kind: model-provider
version: 1.0.0
description: AWS Bedrock
author: Nous Research

View File

@@ -0,0 +1,34 @@
"""GitHub Copilot ACP provider profile.
copilot-acp uses an external ACP subprocess — NOT the standard
transport. api_mode="copilot_acp" is handled separately in run_agent.py.
The profile captures auth + endpoint metadata for registry migration.
"""
from providers import register_provider
from providers.base import ProviderProfile
class CopilotACPProfile(ProviderProfile):
"""GitHub Copilot ACP — external process, no REST models endpoint."""
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Model listing is handled by the ACP subprocess."""
return None
copilot_acp = CopilotACPProfile(
name="copilot-acp",
aliases=("github-copilot-acp", "copilot-acp-agent"),
api_mode="chat_completions", # ACP subprocess uses chat_completions routing
env_vars=(), # Managed by ACP subprocess
base_url="acp://copilot", # ACP internal scheme
auth_type="external_process",
)
register_provider(copilot_acp)

View File

@@ -0,0 +1,5 @@
name: copilot-acp-provider
kind: model-provider
version: 1.0.0
description: GitHub Copilot via ACP subprocess
author: Nous Research

View File

@@ -0,0 +1,58 @@
"""Copilot / GitHub Models provider profile.
Copilot uses per-model api_mode routing:
- GPT-5+ / Codex models → codex_responses
- Claude models → anthropic_messages
- Everything else → chat_completions (this profile covers that subset)
Key quirks for the chat_completions subset:
- Editor attribution headers (via copilot_default_headers())
- GitHub Models reasoning extra_body (model-catalog gated)
"""
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class CopilotProfile(ProviderProfile):
"""GitHub Copilot / GitHub Models — editor headers + reasoning."""
def build_api_kwargs_extras(
self,
*,
model: str | None = None,
reasoning_config: dict | None = None,
supports_reasoning: bool = False,
**ctx,
) -> tuple[dict[str, Any], dict[str, Any]]:
extra_body: dict[str, Any] = {}
if supports_reasoning and model:
try:
from hermes_cli.models import github_model_reasoning_efforts
supported_efforts = github_model_reasoning_efforts(model)
if supported_efforts and reasoning_config:
effort = reasoning_config.get("effort", "medium")
# Normalize non-standard effort levels to the nearest supported
if effort == "xhigh":
effort = "high"
if effort in supported_efforts:
extra_body["reasoning"] = {"effort": effort}
elif supported_efforts:
extra_body["reasoning"] = {"effort": "medium"}
except Exception:
pass
return extra_body, {}
copilot = CopilotProfile(
name="copilot",
aliases=("github-copilot", "github-models", "github-model", "github"),
env_vars=("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"),
base_url="https://api.githubcopilot.com",
auth_type="copilot",
)
register_provider(copilot)

View File

@@ -0,0 +1,5 @@
name: copilot-provider
kind: model-provider
version: 1.0.0
description: GitHub Copilot
author: Nous Research

View File

@@ -0,0 +1,68 @@
"""Custom / Ollama (local) provider profile.
Covers any endpoint registered as provider="custom", including local
Ollama instances. Key quirks:
- ollama_num_ctx → extra_body.options.num_ctx (local context window)
- reasoning_config disabled → extra_body.think = False
"""
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class CustomProfile(ProviderProfile):
"""Custom/Ollama local provider — think=false and num_ctx support."""
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
ollama_num_ctx: int | None = None,
**ctx: Any,
) -> tuple[dict[str, Any], dict[str, Any]]:
extra_body: dict[str, Any] = {}
# Ollama context window
if ollama_num_ctx:
options = extra_body.get("options", {})
options["num_ctx"] = ollama_num_ctx
extra_body["options"] = options
# Disable thinking when reasoning is turned off
if reasoning_config and isinstance(reasoning_config, dict):
_effort = (reasoning_config.get("effort") or "").strip().lower()
_enabled = reasoning_config.get("enabled", True)
if _effort == "none" or _enabled is False:
extra_body["think"] = False
return extra_body, {}
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Custom/Ollama: base_url is user-configured; fetch if set."""
if not self.base_url:
return None
return super().fetch_models(api_key=api_key, timeout=timeout)
custom = CustomProfile(
name="custom",
aliases=(
"ollama",
"local",
"vllm",
"llamacpp",
"llama.cpp",
"llama-cpp",
),
env_vars=(), # No fixed key — custom endpoint
base_url="", # User-configured
)
register_provider(custom)

View File

@@ -0,0 +1,5 @@
name: custom-provider
kind: model-provider
version: 1.0.0
description: Custom / Ollama / local OpenAI-compatible endpoint
author: Nous Research

View File

@@ -0,0 +1,20 @@
"""DeepSeek provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
deepseek = ProviderProfile(
name="deepseek",
aliases=("deepseek-chat",),
env_vars=("DEEPSEEK_API_KEY",),
display_name="DeepSeek",
description="DeepSeek — native DeepSeek API",
signup_url="https://platform.deepseek.com/",
fallback_models=(
"deepseek-chat",
"deepseek-reasoner",
),
base_url="https://api.deepseek.com/v1",
)
register_provider(deepseek)

View File

@@ -0,0 +1,5 @@
name: deepseek-provider
kind: model-provider
version: 1.0.0
description: DeepSeek
author: Nous Research

View File

@@ -0,0 +1,72 @@
"""Google Gemini provider profiles.
gemini: Google AI Studio (API key) — uses GeminiNativeClient
google-gemini-cli: Google Cloud Code Assist (OAuth) — uses GeminiCloudCodeClient
Both report api_mode="chat_completions" but use custom native clients
that bypass the standard OpenAI transport. The profile captures auth
and endpoint metadata for auth.py / runtime_provider.py migration, and
carries the thinking_config translation hook so the transport's profile
path produces the same extra_body shape the legacy flag path did.
"""
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class GeminiProfile(ProviderProfile):
"""Gemini — translate reasoning_config to thinking_config in extra_body."""
def build_extra_body(
self, *, session_id: str | None = None, **context: Any
) -> dict[str, Any]:
"""Emit extra_body.thinking_config (native) or extra_body.extra_body.google.thinking_config
(OpenAI-compat /openai subpath), mirroring the legacy path's behavior.
"""
from agent.transports.chat_completions import (
_build_gemini_thinking_config,
_is_gemini_openai_compat_base_url,
_snake_case_gemini_thinking_config,
)
model = context.get("model") or ""
reasoning_config = context.get("reasoning_config")
base_url = context.get("base_url") or self.base_url
raw_thinking_config = _build_gemini_thinking_config(model, reasoning_config)
if not raw_thinking_config:
return {}
body: dict[str, Any] = {}
if self.name == "gemini" and _is_gemini_openai_compat_base_url(base_url):
thinking_config = _snake_case_gemini_thinking_config(raw_thinking_config)
if thinking_config:
body["extra_body"] = {"google": {"thinking_config": thinking_config}}
else:
body["thinking_config"] = raw_thinking_config
return body
gemini = GeminiProfile(
name="gemini",
aliases=("google", "google-gemini", "google-ai-studio"),
api_mode="chat_completions",
env_vars=("GOOGLE_API_KEY", "GEMINI_API_KEY"),
base_url="https://generativelanguage.googleapis.com/v1beta",
auth_type="api_key",
default_aux_model="gemini-3-flash-preview",
)
google_gemini_cli = GeminiProfile(
name="google-gemini-cli",
aliases=("gemini-cli", "gemini-oauth"),
api_mode="chat_completions",
env_vars=(), # OAuth — no API key
base_url="cloudcode-pa://google", # Cloud Code Assist internal scheme
auth_type="oauth_external",
)
register_provider(gemini)
register_provider(google_gemini_cli)

View File

@@ -0,0 +1,5 @@
name: gemini-provider
kind: model-provider
version: 1.0.0
description: Google Gemini (API key + Cloud Code OAuth)
author: Nous Research

View File

@@ -0,0 +1,26 @@
"""GMI Cloud provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
gmi = ProviderProfile(
name="gmi",
aliases=("gmi-cloud", "gmicloud"),
display_name="GMI Cloud",
description="GMI Cloud — multi-model direct API (slash-form model IDs)",
signup_url="https://www.gmicloud.ai/",
env_vars=("GMI_API_KEY", "GMI_BASE_URL"),
base_url="https://api.gmi-serving.com/v1",
auth_type="api_key",
default_aux_model="google/gemini-3.1-flash-lite-preview",
fallback_models=(
"zai-org/GLM-5.1-FP8",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"google/gemini-3.1-flash-lite-preview",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
),
)
register_provider(gmi)

View File

@@ -0,0 +1,5 @@
name: gmi-provider
kind: model-provider
version: 1.0.0
description: GMI Cloud
author: Nous Research

View File

@@ -0,0 +1,20 @@
"""Hugging Face provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
huggingface = ProviderProfile(
name="huggingface",
aliases=("hf", "hugging-face", "huggingface-hub"),
env_vars=("HF_TOKEN",),
display_name="HuggingFace",
description="HuggingFace Inference API",
signup_url="https://huggingface.co/settings/tokens",
fallback_models=(
"Qwen/Qwen3.5-72B-Instruct",
"deepseek-ai/DeepSeek-V3.2",
),
base_url="https://router.huggingface.co/v1",
)
register_provider(huggingface)

View File

@@ -0,0 +1,5 @@
name: huggingface-provider
kind: model-provider
version: 1.0.0
description: HuggingFace Inference Providers
author: Nous Research

View File

@@ -0,0 +1,14 @@
"""Kilo Code provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
kilocode = ProviderProfile(
name="kilocode",
aliases=("kilo-code", "kilo", "kilo-gateway"),
env_vars=("KILOCODE_API_KEY",),
base_url="https://api.kilo.ai/api/gateway",
default_aux_model="google/gemini-3-flash-preview",
)
register_provider(kilocode)

View File

@@ -0,0 +1,5 @@
name: kilocode-provider
kind: model-provider
version: 1.0.0
description: Kilo Code
author: Nous Research

View File

@@ -0,0 +1,71 @@
"""Kimi / Moonshot provider profiles.
Kimi has dual endpoints:
- sk-kimi-* keys → api.kimi.com/coding (Anthropic Messages API)
- legacy keys → api.moonshot.ai/v1 (OpenAI chat completions)
This module covers the chat_completions path (/v1 endpoint).
"""
from typing import Any
from providers import register_provider
from providers.base import OMIT_TEMPERATURE, ProviderProfile
class KimiProfile(ProviderProfile):
"""Kimi/Moonshot — temperature omitted, thinking + reasoning_effort."""
def build_api_kwargs_extras(
self, *, reasoning_config: dict | None = None, **context
) -> tuple[dict[str, Any], dict[str, Any]]:
"""Kimi uses extra_body.thinking + top-level reasoning_effort."""
extra_body = {}
top_level = {}
if not reasoning_config or not isinstance(reasoning_config, dict):
# No config → thinking enabled, default effort
extra_body["thinking"] = {"type": "enabled"}
top_level["reasoning_effort"] = "medium"
return extra_body, top_level
enabled = reasoning_config.get("enabled", True)
if enabled is False:
extra_body["thinking"] = {"type": "disabled"}
return extra_body, top_level
# Enabled
extra_body["thinking"] = {"type": "enabled"}
effort = (reasoning_config.get("effort") or "").strip().lower()
if effort in ("low", "medium", "high"):
top_level["reasoning_effort"] = effort
else:
top_level["reasoning_effort"] = "medium"
return extra_body, top_level
kimi = KimiProfile(
name="kimi-coding",
aliases=("kimi", "moonshot", "kimi-for-coding"),
env_vars=("KIMI_API_KEY", "KIMI_CODING_API_KEY"),
base_url="https://api.moonshot.ai/v1",
fixed_temperature=OMIT_TEMPERATURE,
default_max_tokens=32000,
default_headers={"User-Agent": "hermes-agent/1.0"},
default_aux_model="kimi-k2-turbo-preview",
)
kimi_cn = KimiProfile(
name="kimi-coding-cn",
aliases=("kimi-cn", "moonshot-cn"),
env_vars=("KIMI_CN_API_KEY",),
base_url="https://api.moonshot.cn/v1",
fixed_temperature=OMIT_TEMPERATURE,
default_max_tokens=32000,
default_headers={"User-Agent": "hermes-agent/1.0"},
default_aux_model="kimi-k2-turbo-preview",
)
register_provider(kimi)
register_provider(kimi_cn)

View File

@@ -0,0 +1,5 @@
name: kimi-coding-provider
kind: model-provider
version: 1.0.0
description: Moonshot Kimi Coding (global + China)
author: Nous Research

View File

@@ -0,0 +1,45 @@
"""MiniMax provider profiles (international + China).
Both use anthropic_messages api_mode — their inference_base_url
ends with /anthropic which triggers auto-detection to anthropic_messages.
"""
from providers import register_provider
from providers.base import ProviderProfile
minimax = ProviderProfile(
name="minimax",
aliases=("mini-max",),
api_mode="anthropic_messages",
env_vars=("MINIMAX_API_KEY",),
base_url="https://api.minimax.io/anthropic",
auth_type="api_key",
default_aux_model="MiniMax-M2.7",
)
minimax_cn = ProviderProfile(
name="minimax-cn",
aliases=("minimax-china", "minimax_cn"),
api_mode="anthropic_messages",
env_vars=("MINIMAX_CN_API_KEY",),
base_url="https://api.minimaxi.com/anthropic",
auth_type="api_key",
default_aux_model="MiniMax-M2.7",
)
minimax_oauth = ProviderProfile(
name="minimax-oauth",
aliases=("minimax_oauth", "minimax-oauth-io"),
api_mode="anthropic_messages",
display_name="MiniMax (OAuth)",
description="MiniMax via OAuth browser flow — no API key required",
signup_url="https://api.minimax.io/",
env_vars=(), # OAuth — tokens in auth.json, not env
base_url="https://api.minimax.io/anthropic",
auth_type="oauth_external",
default_aux_model="MiniMax-M2.7-highspeed",
)
register_provider(minimax)
register_provider(minimax_cn)
register_provider(minimax_oauth)

View File

@@ -0,0 +1,5 @@
name: minimax-provider
kind: model-provider
version: 1.0.0
description: MiniMax M-series (global + China + OAuth)
author: Nous Research

View File

@@ -0,0 +1,53 @@
"""Nous Portal provider profile."""
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class NousProfile(ProviderProfile):
"""Nous Portal — product tags, reasoning with Nous-specific omission."""
def build_extra_body(
self, *, session_id: str | None = None, **context
) -> dict[str, Any]:
return {"tags": ["product=hermes-agent"]}
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
supports_reasoning: bool = False,
**context,
) -> tuple[dict[str, Any], dict[str, Any]]:
"""Nous: passes full reasoning_config, but OMITS when disabled."""
extra_body = {}
if supports_reasoning:
if reasoning_config is not None:
rc = dict(reasoning_config)
if rc.get("enabled") is False:
pass # Nous omits reasoning when disabled
else:
extra_body["reasoning"] = rc
else:
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
return extra_body, {}
nous = NousProfile(
name="nous",
aliases=("nous-portal", "nousresearch"),
env_vars=("NOUS_API_KEY",),
display_name="Nous Research",
description="Nous Research — Hermes model family",
signup_url="https://nousresearch.com/",
fallback_models=(
"hermes-3-405b",
"hermes-3-70b",
),
base_url="https://inference.nousresearch.com/v1",
auth_type="oauth_device_code",
)
register_provider(nous)

View File

@@ -0,0 +1,5 @@
name: nous-provider
kind: model-provider
version: 1.0.0
description: Nous Research Portal
author: Nous Research

View File

@@ -0,0 +1,21 @@
"""NVIDIA NIM provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
nvidia = ProviderProfile(
name="nvidia",
aliases=("nvidia-nim",),
env_vars=("NVIDIA_API_KEY",),
display_name="NVIDIA NIM",
description="NVIDIA NIM — accelerated inference",
signup_url="https://build.nvidia.com/",
fallback_models=(
"nvidia/llama-3.1-nemotron-70b-instruct",
"nvidia/llama-3.3-70b-instruct",
),
base_url="https://integrate.api.nvidia.com/v1",
default_max_tokens=16384,
)
register_provider(nvidia)

View File

@@ -0,0 +1,5 @@
name: nvidia-provider
kind: model-provider
version: 1.0.0
description: NVIDIA NIM
author: Nous Research

View File

@@ -0,0 +1,14 @@
"""Ollama Cloud provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
ollama_cloud = ProviderProfile(
name="ollama-cloud",
aliases=("ollama_cloud",),
default_aux_model="nemotron-3-nano:30b",
env_vars=("OLLAMA_API_KEY",),
base_url="https://ollama.com/v1",
)
register_provider(ollama_cloud)

View File

@@ -0,0 +1,5 @@
name: ollama-cloud-provider
kind: model-provider
version: 1.0.0
description: Ollama Cloud
author: Nous Research

View File

@@ -0,0 +1,15 @@
"""OpenAI Codex (Responses API) provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
openai_codex = ProviderProfile(
name="openai-codex",
aliases=("codex", "openai_codex"),
api_mode="codex_responses",
env_vars=(), # OAuth external — no API key
base_url="https://chatgpt.com/backend-api/codex",
auth_type="oauth_external",
)
register_provider(openai_codex)

View File

@@ -0,0 +1,5 @@
name: openai-codex-provider
kind: model-provider
version: 1.0.0
description: OpenAI Codex (Responses API)
author: Nous Research

View File

@@ -0,0 +1,30 @@
"""OpenCode provider profiles (Zen + Go).
Both use per-model api_mode routing:
- OpenCode Zen: Claude → anthropic_messages, GPT-5/Codex → codex_responses,
everything else → chat_completions (this profile)
- OpenCode Go: MiniMax → anthropic_messages, GLM/Kimi → chat_completions
(this profile)
"""
from providers import register_provider
from providers.base import ProviderProfile
opencode_zen = ProviderProfile(
name="opencode-zen",
aliases=("opencode", "opencode_zen", "zen"),
env_vars=("OPENCODE_ZEN_API_KEY",),
base_url="https://opencode.ai/zen/v1",
default_aux_model="gemini-3-flash",
)
opencode_go = ProviderProfile(
name="opencode-go",
aliases=("opencode_go", "go", "opencode-go-sub"),
env_vars=("OPENCODE_GO_API_KEY",),
base_url="https://opencode.ai/zen/go/v1",
default_aux_model="glm-5",
)
register_provider(opencode_zen)
register_provider(opencode_go)

View File

@@ -0,0 +1,5 @@
name: opencode-zen-provider
kind: model-provider
version: 1.0.0
description: OpenCode (Zen + Go)
author: Nous Research

View File

@@ -0,0 +1,86 @@
"""OpenRouter provider profile."""
import logging
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
logger = logging.getLogger(__name__)
_CACHE: list[str] | None = None
class OpenRouterProfile(ProviderProfile):
"""OpenRouter aggregator — provider preferences, reasoning config passthrough."""
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Fetch from public OpenRouter catalog — no auth required.
Note: Tool-call capability filtering is applied by hermes_cli/models.py
via fetch_openrouter_models() → _openrouter_model_supports_tools(), not
here. The picker early-returns via the dedicated openrouter path before
reaching this method, so filtering here would be unreachable.
"""
global _CACHE # noqa: PLW0603
if _CACHE is not None:
return _CACHE
try:
result = super().fetch_models(api_key=None, timeout=timeout)
if result is not None:
_CACHE = result
return result
except Exception as exc:
logger.debug("fetch_models(openrouter): %s", exc)
return None
def build_extra_body(
self, *, session_id: str | None = None, **context: Any
) -> dict[str, Any]:
body: dict[str, Any] = {}
prefs = context.get("provider_preferences")
if prefs:
body["provider"] = prefs
return body
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
supports_reasoning: bool = False,
**context: Any,
) -> tuple[dict[str, Any], dict[str, Any]]:
"""OpenRouter passes the full reasoning_config dict as extra_body.reasoning."""
extra_body: dict[str, Any] = {}
if supports_reasoning:
if reasoning_config is not None:
extra_body["reasoning"] = dict(reasoning_config)
else:
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
return extra_body, {}
openrouter = OpenRouterProfile(
name="openrouter",
aliases=("or",),
env_vars=("OPENROUTER_API_KEY",),
display_name="OpenRouter",
description="OpenRouter — unified API for 200+ models",
signup_url="https://openrouter.ai/keys",
base_url="https://openrouter.ai/api/v1",
models_url="https://openrouter.ai/api/v1/models",
fallback_models=(
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
"deepseek/deepseek-chat",
"google/gemini-3-flash-preview",
"qwen/qwen3-plus",
),
)
register_provider(openrouter)

View File

@@ -0,0 +1,5 @@
name: openrouter-provider
kind: model-provider
version: 1.0.0
description: OpenRouter aggregator
author: Nous Research

View File

@@ -0,0 +1,82 @@
"""Qwen Portal provider profile."""
import copy
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
class QwenProfile(ProviderProfile):
"""Qwen Portal — message normalization, vl_high_resolution, metadata top-level."""
def prepare_messages(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Normalize content to list-of-dicts format.
Inject cache_control on system message.
Matches the behavior of run_agent.py:_qwen_prepare_chat_messages().
"""
prepared = copy.deepcopy(messages)
if not prepared:
return prepared
for msg in prepared:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if isinstance(content, str):
msg["content"] = [{"type": "text", "text": content}]
elif isinstance(content, list):
normalized_parts = []
for part in content:
if isinstance(part, str):
normalized_parts.append({"type": "text", "text": part})
elif isinstance(part, dict):
normalized_parts.append(part)
if normalized_parts:
msg["content"] = normalized_parts
# Inject cache_control on the last part of the system message.
for msg in prepared:
if isinstance(msg, dict) and msg.get("role") == "system":
content = msg.get("content")
if (
isinstance(content, list)
and content
and isinstance(content[-1], dict)
):
content[-1]["cache_control"] = {"type": "ephemeral"}
break
return prepared
def build_extra_body(
self, *, session_id: str | None = None, **context
) -> dict[str, Any]:
return {"vl_high_resolution_images": True}
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
qwen_session_metadata: dict | None = None,
**context,
) -> tuple[dict[str, Any], dict[str, Any]]:
"""Qwen metadata goes to top-level api_kwargs, not extra_body."""
top_level = {}
if qwen_session_metadata:
top_level["metadata"] = qwen_session_metadata
return {}, top_level
qwen = QwenProfile(
name="qwen-oauth",
aliases=("qwen", "qwen-portal", "qwen-cli"),
env_vars=("QWEN_API_KEY",),
base_url="https://portal.qwen.ai/v1",
auth_type="oauth_external",
default_max_tokens=65536,
)
register_provider(qwen)

View File

@@ -0,0 +1,5 @@
name: qwen-oauth-provider
kind: model-provider
version: 1.0.0
description: Qwen Portal (OAuth)
author: Nous Research

View File

@@ -0,0 +1,14 @@
"""StepFun provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
stepfun = ProviderProfile(
name="stepfun",
aliases=("step", "stepfun-coding-plan"),
default_aux_model="step-3.5-flash",
env_vars=("STEPFUN_API_KEY",),
base_url="https://api.stepfun.ai/step_plan/v1",
)
register_provider(stepfun)

View File

@@ -0,0 +1,5 @@
name: stepfun-provider
kind: model-provider
version: 1.0.0
description: StepFun Step Plan
author: Nous Research

View File

@@ -0,0 +1,15 @@
"""xAI (Grok) provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
xai = ProviderProfile(
name="xai",
aliases=("grok", "x-ai", "x.ai"),
api_mode="codex_responses",
env_vars=("XAI_API_KEY",),
base_url="https://api.x.ai/v1",
auth_type="api_key",
)
register_provider(xai)

View File

@@ -0,0 +1,5 @@
name: xai-provider
kind: model-provider
version: 1.0.0
description: xAI Grok (Responses API)
author: Nous Research

View File

@@ -0,0 +1,13 @@
"""Xiaomi MiMo provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
xiaomi = ProviderProfile(
name="xiaomi",
aliases=("mimo", "xiaomi-mimo"),
env_vars=("XIAOMI_API_KEY",),
base_url="https://api.xiaomimimo.com/v1",
)
register_provider(xiaomi)

View File

@@ -0,0 +1,5 @@
name: xiaomi-provider
kind: model-provider
version: 1.0.0
description: Xiaomi MiMo
author: Nous Research

View File

@@ -0,0 +1,21 @@
"""ZAI / GLM provider profile."""
from providers import register_provider
from providers.base import ProviderProfile
zai = ProviderProfile(
name="zai",
aliases=("glm", "z-ai", "z.ai", "zhipu"),
env_vars=("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
display_name="Z.AI (GLM)",
description="Z.AI / GLM — Zhipu AI models",
signup_url="https://z.ai/",
fallback_models=(
"glm-5",
"glm-4-9b",
),
base_url="https://api.z.ai/api/paas/v4",
default_aux_model="glm-4.5-flash",
)
register_provider(zai)

View File

@@ -0,0 +1,5 @@
name: zai-provider
kind: model-provider
version: 1.0.0
description: Z.AI / GLM
author: Nous Research

78
providers/README.md Normal file
View File

@@ -0,0 +1,78 @@
# providers/
Registry and ABC for every inference provider Hermes knows about.
Each provider is declared once as a `ProviderProfile`. Every other layer —
auth resolution, transport kwargs, model listing, runtime routing — reads from
these profiles instead of maintaining its own parallel data.
---
## Layout
```
providers/
├── base.py ProviderProfile dataclass + OMIT_TEMPERATURE sentinel
├── __init__.py Registry: register_provider(), get_provider_profile(), list_providers()
└── README.md This file
```
The **profiles themselves** live as plugins under
`plugins/model-providers/<name>/` (bundled in this repo) and
`$HERMES_HOME/plugins/model-providers/<name>/` (per-user overrides). The
registry in `providers/__init__.py` lazily discovers them the first time any
consumer calls `get_provider_profile()` or `list_providers()`. See
`plugins/model-providers/README.md` for the plugin contract and examples.
---
## How it wires in
The registry is populated on first access. After that, every downstream
layer reads from it:
- `hermes_cli/auth.py` extends `PROVIDER_REGISTRY` with every api-key
profile it sees (skipping `copilot`, `kimi-coding`, `kimi-coding-cn`,
`zai`, `openrouter`, `custom` — those need bespoke token resolution).
- `hermes_cli/models.py` extends `CANONICAL_PROVIDERS` and calls
`profile.fetch_models()` inside `provider_model_ids()`.
- `hermes_cli/doctor.py` adds a `/models` health check for each
`auth_type="api_key"` profile.
- `hermes_cli/config.py` injects every `env_var` into
`OPTIONAL_ENV_VARS` so the setup wizard knows about it.
- `hermes_cli/runtime_provider.py` reads `profile.api_mode` as a fallback
when URL detection finds nothing.
- `agent/model_metadata.py` maps hostname → provider via
`profile.get_hostname()`.
- `agent/auxiliary_client.py` reads `profile.default_aux_model` first
before falling back to the legacy hardcoded dict.
- `agent/transports/chat_completions.py::_build_kwargs_from_profile()`
invokes `profile.prepare_messages()`, `profile.build_extra_body()`,
and `profile.build_api_kwargs_extras()` on every call.
- `run_agent.py` passes `provider_profile=<ProviderProfile>` so the
transport takes the profile path instead of the legacy flag path.
---
## Adding a provider
See `plugins/model-providers/README.md` — drop a new directory there (or
under `$HERMES_HOME/plugins/model-providers/` for a private plugin).
---
## Hooks you can override on `ProviderProfile`
| Hook | Purpose |
|------|---------|
| `get_hostname()` | URL-based detection — default derives from `base_url`. |
| `prepare_messages(msgs)` | Provider-specific message preprocessing (Qwen normalises to list-of-parts, injects `cache_control`). |
| `build_extra_body(**ctx)` | Provider-specific `extra_body` (OpenRouter provider prefs, Gemini `thinking_config`). |
| `build_api_kwargs_extras(**ctx)` | `(extra_body_additions, top_level_kwargs)` — Kimi puts reasoning_effort top-level, Qwen splits `enable_thinking`/`thinking_budget`. |
| `fetch_models(*, api_key)` | Live catalog fetch — default hits `{models_url or base_url}/models` with Bearer auth. Override for no-REST providers (Bedrock), OAuth catalogs (Anthropic), or public catalogs (OpenRouter). |
---
## Configuration fields
Full reference in `providers/base.py` dataclass definition.

191
providers/__init__.py Normal file
View File

@@ -0,0 +1,191 @@
"""Provider module registry.
Provider profiles can live in two places:
1. Bundled plugins: ``plugins/model-providers/<name>/`` (shipped with hermes-agent)
2. User plugins: ``$HERMES_HOME/plugins/model-providers/<name>/``
Each plugin directory contains:
- ``__init__.py`` — calls ``register_provider(profile)`` at import
- ``plugin.yaml`` — manifest (name, kind: model-provider, version, description)
Discovery is lazy: the first call to ``get_provider_profile()`` or
``list_providers()`` scans both locations and imports every plugin. User
plugins override bundled plugins on name collision (last-writer-wins), so
third parties can monkey-patch or replace any built-in profile without
editing the repo.
For backward compatibility, ``providers/*.py`` files (other than ``base.py``
and ``__init__.py``) are still discovered via ``pkgutil.iter_modules``.
This lets out-of-tree users drop a single-file profile into an editable
install without the plugin dir structure. New profiles should prefer the
plugin layout.
Usage::
from providers import get_provider_profile
profile = get_provider_profile("nvidia") # ProviderProfile or None
profile = get_provider_profile("kimi") # checks name + aliases
"""
from __future__ import annotations
import importlib
import importlib.util
import logging
import sys
from pathlib import Path
from providers.base import OMIT_TEMPERATURE, ProviderProfile # noqa: F401
logger = logging.getLogger(__name__)
_REGISTRY: dict[str, ProviderProfile] = {}
_ALIASES: dict[str, str] = {}
_discovered = False
# Repo-root ``plugins/model-providers/`` — populated at discovery time.
_BUNDLED_PLUGINS_DIR = (
Path(__file__).resolve().parent.parent / "plugins" / "model-providers"
)
def register_provider(profile: ProviderProfile) -> None:
"""Register a provider profile by name and aliases.
Later registrations with the same name replace earlier ones — so user
plugins under ``$HERMES_HOME/plugins/model-providers/`` can override
bundled profiles without editing repo code.
"""
_REGISTRY[profile.name] = profile
for alias in profile.aliases:
_ALIASES[alias] = profile.name
def get_provider_profile(name: str) -> ProviderProfile | None:
"""Look up a provider profile by name or alias.
Returns None if the provider has no profile (falls back to generic).
"""
if not _discovered:
_discover_providers()
canonical = _ALIASES.get(name, name)
return _REGISTRY.get(canonical)
def list_providers() -> list[ProviderProfile]:
"""Return all registered provider profiles (one per canonical name)."""
if not _discovered:
_discover_providers()
# Deduplicate: _REGISTRY has canonical names; _ALIASES points to same objects
seen: set[int] = set()
result: list[ProviderProfile] = []
for profile in _REGISTRY.values():
pid = id(profile)
if pid not in seen:
seen.add(pid)
result.append(profile)
return result
def _user_plugins_dir() -> Path | None:
"""Return ``$HERMES_HOME/plugins/model-providers/`` if it exists."""
try:
from hermes_constants import get_hermes_home
d = get_hermes_home() / "plugins" / "model-providers"
return d if d.is_dir() else None
except Exception:
return None
def _import_plugin_dir(plugin_dir: Path, source: str) -> None:
"""Import a single plugin directory so it self-registers.
``source`` is "bundled" or "user", used only for log messages.
"""
init_file = plugin_dir / "__init__.py"
if not init_file.exists():
return
# Give bundled plugins a stable import path (``plugins.model_providers.<name>``)
# so relative imports within the plugin work. User plugins load via
# ``importlib.util.spec_from_file_location`` with a unique module name so
# multiple HERMES_HOME profiles don't alias each other.
safe_name = plugin_dir.name.replace("-", "_")
if source == "bundled":
module_name = f"plugins.model_providers.{safe_name}"
else:
module_name = f"_hermes_user_provider_{safe_name}"
if module_name in sys.modules:
return # already imported
try:
spec = importlib.util.spec_from_file_location(
module_name, init_file, submodule_search_locations=[str(plugin_dir)]
)
if spec is None or spec.loader is None:
return
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
except Exception as exc:
logger.warning(
"Failed to load %s provider plugin %s: %s", source, plugin_dir.name, exc
)
sys.modules.pop(module_name, None)
def _discover_providers() -> None:
"""Populate the registry by importing every provider plugin.
Order:
1. Bundled plugins at ``<repo>/plugins/model-providers/<name>/``
2. User plugins at ``$HERMES_HOME/plugins/model-providers/<name>/``
3. Legacy per-file modules at ``providers/<name>.py`` (back-compat)
Each step imports its plugins, which call ``register_provider()`` at
module-level. Later steps win on name collision.
"""
global _discovered
if _discovered:
return
_discovered = True
# 1. Bundled plugins — shipped with hermes-agent.
if _BUNDLED_PLUGINS_DIR.is_dir():
for child in sorted(_BUNDLED_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
_import_plugin_dir(child, "bundled")
# 2. User plugins — under $HERMES_HOME/plugins/model-providers/<name>/.
# These can override any bundled profile of the same name (last-writer-wins
# in register_provider()).
user_dir = _user_plugins_dir()
if user_dir is not None:
for child in sorted(user_dir.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
_import_plugin_dir(child, "user")
# 3. Legacy single-file profiles at providers/<name>.py. Kept for
# back-compat — if someone drops a ``providers/foo.py`` into an
# editable install, it still works without the plugin layout.
try:
import pkgutil
import providers as _pkg
for _importer, modname, _ispkg in pkgutil.iter_modules(_pkg.__path__):
if modname.startswith("_") or modname == "base":
continue
try:
importlib.import_module(f"providers.{modname}")
except ImportError as exc:
logger.warning(
"Failed to import legacy provider module %s: %s", modname, exc
)
except Exception:
pass

165
providers/base.py Normal file
View File

@@ -0,0 +1,165 @@
"""Provider profile base class.
A ProviderProfile declares everything about an inference provider in one place:
auth, endpoints, client quirks, request-time quirks. The transport reads this
instead of receiving 20+ boolean flags.
Provider profiles are DECLARATIVE — they describe the provider's behavior.
They do NOT own client construction, credential rotation, or streaming.
Those stay on AIAgent.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from typing import Any
logger = logging.getLogger(__name__)
# Sentinel for "omit temperature entirely" (Kimi: server manages it)
OMIT_TEMPERATURE = object()
@dataclass
class ProviderProfile:
"""Base provider profile — subclass or instantiate with overrides."""
# ── Identity ─────────────────────────────────────────────
name: str
api_mode: str = "chat_completions"
aliases: tuple = ()
# ── Human-readable metadata ───────────────────────────────
display_name: str = "" # e.g. "GMI Cloud" — shown in picker/labels
description: str = "" # e.g. "GMI Cloud (multi-model direct API)" — picker subtitle
signup_url: str = "" # e.g. "https://www.gmicloud.ai/" — shown during setup
# ── Auth & endpoints ─────────────────────────────────────
env_vars: tuple = ()
base_url: str = ""
models_url: str = "" # explicit models endpoint; falls back to {base_url}/models
auth_type: str = "api_key" # api_key|oauth_device_code|oauth_external|copilot|aws_sdk
# ── Model catalog ─────────────────────────────────────────
# fallback_models: curated list shown in /model picker when live fetch fails.
# Only agentic models that support tool calling should appear here.
fallback_models: tuple = ()
# hostname: base hostname for URL→provider reverse-mapping in model_metadata.py
# e.g. "api.gmi-serving.com". Derived from base_url when empty.
hostname: str = ""
# ── Client-level quirks (set once at client construction) ─
default_headers: dict[str, str] = field(default_factory=dict)
# ── Request-level quirks ─────────────────────────────────
# Temperature: None = use caller's default, OMIT_TEMPERATURE = don't send
fixed_temperature: Any = None
default_max_tokens: int | None = None
default_aux_model: str = (
"" # cheap model for auxiliary tasks (compression, vision, etc.)
)
# empty = use main model
# ── Hooks (override in subclass for complex providers) ───
def get_hostname(self) -> str:
"""Return the provider's base hostname for URL-based detection.
Uses self.hostname if set explicitly, otherwise derives it from base_url.
e.g. 'https://api.gmi-serving.com/v1''api.gmi-serving.com'
"""
if self.hostname:
return self.hostname
if self.base_url:
from urllib.parse import urlparse
return urlparse(self.base_url).hostname or ""
return ""
def prepare_messages(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Provider-specific message preprocessing.
Called AFTER codex field sanitization, BEFORE developer role swap.
Default: pass-through.
"""
return messages
def build_extra_body(
self, *, session_id: str | None = None, **context: Any
) -> dict[str, Any]:
"""Provider-specific extra_body fields.
Merged into the API kwargs extra_body. Default: empty dict.
"""
return {}
def build_api_kwargs_extras(
self,
*,
reasoning_config: dict | None = None,
**context: Any,
) -> tuple[dict[str, Any], dict[str, Any]]:
"""Provider-specific kwargs split between extra_body and top-level api_kwargs.
Returns (extra_body_additions, top_level_kwargs).
The transport merges extra_body_additions into extra_body, and
top_level_kwargs directly into api_kwargs.
This split exists because some providers put reasoning config in
extra_body (OpenRouter: extra_body.reasoning) while others put it
as top-level api_kwargs (Kimi: api_kwargs.reasoning_effort).
Default: ({}, {}).
"""
return {}, {}
def fetch_models(
self,
*,
api_key: str | None = None,
timeout: float = 8.0,
) -> list[str] | None:
"""Fetch the live model list from the provider's models endpoint.
Returns a list of model ID strings, or None if the fetch failed or
the provider does not support live model listing.
Resolution order for the endpoint URL:
1. self.models_url (explicit override — use when the models
endpoint differs from the inference base URL, e.g. OpenRouter
exposes a public catalog at /api/v1/models while inference is
at /api/v1)
2. self.base_url + "/models" (standard OpenAI-compat fallback)
The default implementation sends Bearer auth when api_key is given
and forwards self.default_headers. Override to customise auth, path,
response shape, or to return None for providers with no REST catalog.
Callers must always fall back to the static _PROVIDER_MODELS list
when this returns None.
"""
url = (self.models_url or "").strip()
if not url:
if not self.base_url:
return None
url = self.base_url.rstrip("/") + "/models"
import json
import urllib.request
req = urllib.request.Request(url)
if api_key:
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("Accept", "application/json")
for k, v in self.default_headers.items():
req.add_header(k, v)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
items = data if isinstance(data, list) else data.get("data", [])
return [m["id"] for m in items if isinstance(m, dict) and "id" in m]
except Exception as exc:
logger.debug("fetch_models(%s): %s", self.name, exc)
return None

View File

@@ -142,7 +142,7 @@ hermes_cli = ["web_dist/**/*"]
gateway = ["assets/**/*"]
[tool.setuptools.packages.find]
include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*", "providers", "providers.*"]
[tool.pytest.ini_options]
testpaths = ["tests"]

View File

@@ -1451,6 +1451,17 @@ class AIAgent:
elif base_url_host_matches(effective_base, "chatgpt.com"):
from agent.auxiliary_client import _codex_cloudflare_headers
client_kwargs["default_headers"] = _codex_cloudflare_headers(api_key)
elif "default_headers" not in client_kwargs:
# Fall back to profile.default_headers for providers that
# declare custom headers (e.g. Vercel AI Gateway attribution,
# Kimi User-Agent on non-kimi.com endpoints).
try:
from providers import get_provider_profile as _gpf
_ph = _gpf(self.provider)
if _ph and _ph.default_headers:
client_kwargs["default_headers"] = dict(_ph.default_headers)
except Exception:
pass
else:
# No explicit creds — use the centralized provider router
from agent.auxiliary_client import resolve_provider_client
@@ -6251,7 +6262,19 @@ class AIAgent:
self._client_kwargs.get("api_key", "")
)
else:
self._client_kwargs.pop("default_headers", None)
# No URL-specific headers — check profile.default_headers before clearing.
_ph_headers = None
try:
from providers import get_provider_profile as _gpf2
_ph2 = _gpf2(self.provider)
if _ph2 and _ph2.default_headers:
_ph_headers = dict(_ph2.default_headers)
except Exception:
pass
if _ph_headers:
self._client_kwargs["default_headers"] = _ph_headers
else:
self._client_kwargs.pop("default_headers", None)
def _swap_credential(self, entry) -> None:
runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
@@ -8469,7 +8492,7 @@ class AIAgent:
_omit_temp = False
_fixed_temp = None
# Provider preferences (OpenRouter-specific)
# Provider preferences (OpenRouter-style)
_prefs: Dict[str, Any] = {}
if self.providers_allowed:
_prefs["only"] = self.providers_allowed
@@ -8484,16 +8507,16 @@ class AIAgent:
if self.provider_data_collection:
_prefs["data_collection"] = self.provider_data_collection
# Anthropic max output for Claude on OpenRouter/Nous
# Claude max-output override on aggregators
_ant_max = None
if (_is_or or _is_nous) and "claude" in (self.model or "").lower():
try:
from agent.anthropic_adapter import _get_anthropic_max_output
_ant_max = _get_anthropic_max_output(self.model)
except Exception:
pass # fail open — let the proxy pick its default
pass
# Qwen session metadata precomputed here (promptId is per-call random)
# Qwen session metadata
_qwen_meta = None
if _is_qwen:
_qwen_meta = {
@@ -8501,8 +8524,44 @@ class AIAgent:
"promptId": str(uuid.uuid4()),
}
# Ephemeral max output override — consume immediately so the next
# turn doesn't inherit it.
# ── Provider profile path (registered providers) ───────────────────
# Profiles handle per-provider quirks via hooks. When a profile is
# found, delegate fully; otherwise fall through to the legacy flag path.
try:
from providers import get_provider_profile
_profile = get_provider_profile(self.provider)
except Exception:
_profile = None
if _profile:
_ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None
return _ct.build_kwargs(
model=self.model,
messages=api_messages,
tools=self.tools,
base_url=self.base_url,
timeout=self._resolved_api_call_timeout(),
max_tokens=self.max_tokens,
ephemeral_max_output_tokens=_ephemeral_out,
max_tokens_param_fn=self._max_tokens_param,
reasoning_config=self.reasoning_config,
request_overrides=self.request_overrides,
session_id=getattr(self, "session_id", None),
provider_profile=_profile,
ollama_num_ctx=self._ollama_num_ctx,
# Context forwarded to profile hooks:
provider_preferences=_prefs or None,
anthropic_max_output=_ant_max,
supports_reasoning=self._supports_reasoning_extra_body(),
qwen_session_metadata=_qwen_meta,
)
# ── Legacy flag path ────────────────────────────────────────────
# Reached only when get_provider_profile() returns None — i.e. a
# completely unknown provider not in providers/ registry.
_ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None

View File

@@ -71,17 +71,17 @@ class TestMinimaxThinkingSupport:
class TestMinimaxAuxModel:
"""Verify auxiliary model is standard (not highspeed)."""
"""Verify auxiliary model is standard (not highspeed) — now reads from profiles."""
def test_minimax_aux_is_standard(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert _API_KEY_PROVIDER_AUX_MODELS["minimax"] == "MiniMax-M2.7"
assert _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] == "MiniMax-M2.7"
from agent.auxiliary_client import _get_aux_model_for_provider
assert _get_aux_model_for_provider("minimax") == "MiniMax-M2.7"
assert _get_aux_model_for_provider("minimax-cn") == "MiniMax-M2.7"
def test_minimax_aux_not_highspeed(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax"]
assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"]
from agent.auxiliary_client import _get_aux_model_for_provider
assert "highspeed" not in _get_aux_model_for_provider("minimax")
assert "highspeed" not in _get_aux_model_for_provider("minimax-cn")
class TestMinimaxBetaHeaders:

View File

@@ -73,17 +73,21 @@ class TestChatCompletionsBuildKwargs:
assert kw["tools"] == tools
def test_openrouter_provider_prefs(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("openrouter")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
is_openrouter=True,
provider_profile=profile,
provider_preferences={"only": ["openai"]},
)
assert kw["extra_body"]["provider"] == {"only": ["openai"]}
def test_nous_tags(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("nous")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, is_nous=True)
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, provider_profile=profile)
assert kw["extra_body"]["tags"] == ["product=hermes-agent"]
def test_reasoning_default(self, transport):
@@ -95,29 +99,36 @@ class TestChatCompletionsBuildKwargs:
assert kw["extra_body"]["reasoning"] == {"enabled": True, "effort": "medium"}
def test_nous_omits_disabled_reasoning(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("nous")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
provider_profile=profile,
supports_reasoning=True,
is_nous=True,
reasoning_config={"enabled": False},
)
# Nous rejects enabled=false; reasoning omitted entirely
assert "reasoning" not in kw.get("extra_body", {})
def test_ollama_num_ctx(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("custom")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="llama3", messages=msgs,
provider_profile=profile,
ollama_num_ctx=32768,
)
assert kw["extra_body"]["options"]["num_ctx"] == 32768
def test_custom_think_false(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("custom")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="qwen3", messages=msgs,
is_custom_provider=True,
provider_profile=profile,
reasoning_config={"effort": "none"},
)
assert kw["extra_body"]["think"] is False
@@ -304,23 +315,29 @@ class TestChatCompletionsBuildKwargs:
assert kw["max_tokens"] == 2048
def test_nvidia_default_max_tokens(self, transport):
"""NVIDIA max_tokens=16384 is now set via ProviderProfile, not legacy flag."""
from providers import get_provider_profile
profile = get_provider_profile("nvidia")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="glm-4.7", messages=msgs,
is_nvidia_nim=True,
model="nvidia/llama-3.1-405b-instruct",
messages=msgs,
max_tokens_param_fn=lambda n: {"max_tokens": n},
provider_profile=profile,
)
# NVIDIA default: 16384
assert kw["max_tokens"] == 16384
def test_qwen_default_max_tokens(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("qwen-oauth")
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="qwen3-coder-plus", messages=msgs,
is_qwen_portal=True,
provider_profile=profile,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
# Qwen default: 65536
# Qwen default: 65536 from profile.default_max_tokens
assert kw["max_tokens"] == 65536
def test_anthropic_max_output_for_claude_on_aggregator(self, transport):
@@ -343,14 +360,23 @@ class TestChatCompletionsBuildKwargs:
assert kw["service_tier"] == "priority"
def test_fixed_temperature(self, transport):
"""Fixed temperature is now set via ProviderProfile.fixed_temperature."""
from providers.base import ProviderProfile
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, fixed_temperature=0.6)
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
provider_profile=ProviderProfile(name="_t", fixed_temperature=0.6),
)
assert kw["temperature"] == 0.6
def test_omit_temperature(self, transport):
"""Omit temperature is set via ProviderProfile with OMIT_TEMPERATURE sentinel."""
from providers.base import ProviderProfile, OMIT_TEMPERATURE
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, omit_temperature=True, fixed_temperature=0.5)
# omit wins
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
provider_profile=ProviderProfile(name="_t", fixed_temperature=OMIT_TEMPERATURE),
)
assert "temperature" not in kw
@@ -358,18 +384,22 @@ class TestChatCompletionsKimi:
"""Regression tests for the Kimi/Moonshot quirks migrated into the transport."""
def test_kimi_max_tokens_default(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("kimi-coding")
kw = transport.build_kwargs(
model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
is_kimi=True,
provider_profile=profile,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
# Kimi CLI default: 32000
# Kimi CLI default: 32000 from KimiProfile.default_max_tokens
assert kw["max_tokens"] == 32000
def test_kimi_reasoning_effort_top_level(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("kimi-coding")
kw = transport.build_kwargs(
model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
is_kimi=True,
provider_profile=profile,
reasoning_config={"effort": "high"},
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
@@ -387,17 +417,21 @@ class TestChatCompletionsKimi:
assert "reasoning_effort" not in kw
def test_kimi_thinking_enabled_extra_body(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("kimi-coding")
kw = transport.build_kwargs(
model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
is_kimi=True,
provider_profile=profile,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
assert kw["extra_body"]["thinking"] == {"type": "enabled"}
def test_kimi_thinking_disabled_extra_body(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("kimi-coding")
kw = transport.build_kwargs(
model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
is_kimi=True,
provider_profile=profile,
reasoning_config={"enabled": False},
max_tokens_param_fn=lambda n: {"max_tokens": n},
)

View File

@@ -269,9 +269,9 @@ class TestGmiModelMetadata:
class TestGmiAuxiliary:
def test_aux_default_model(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
from agent.auxiliary_client import _get_aux_model_for_provider
assert _API_KEY_PROVIDER_AUX_MODELS["gmi"] == "google/gemini-3.1-flash-lite-preview"
assert _get_aux_model_for_provider("gmi") == "google/gemini-3.1-flash-lite-preview"
def test_resolve_provider_client_uses_gmi_aux_default(self, monkeypatch):
monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")

View File

View File

@@ -0,0 +1,118 @@
"""E2E tests: verify _build_kwargs_from_profile produces correct output.
These tests call _build_kwargs_from_profile on the transport directly,
without importing run_agent (which would cause xdist worker contamination).
"""
import pytest
from agent.transports.chat_completions import ChatCompletionsTransport
from providers import get_provider_profile
@pytest.fixture
def transport():
return ChatCompletionsTransport()
def _msgs():
return [{"role": "user", "content": "hi"}]
class TestNvidiaProfileWiring:
def test_nvidia_gets_default_max_tokens(self, transport):
profile = get_provider_profile("nvidia")
kwargs = transport.build_kwargs(
model="nvidia/llama-3.1-nemotron-70b-instruct",
messages=_msgs(),
tools=None,
provider_profile=profile,
max_tokens=None,
max_tokens_param_fn=lambda x: {"max_tokens": x} if x else {},
timeout=300,
reasoning_config=None,
request_overrides=None,
session_id="test",
ollama_num_ctx=None,
)
# NVIDIA profile sets default_max_tokens=16384
assert kwargs.get("max_tokens") == 16384
def test_nvidia_nim_alias(self, transport):
profile = get_provider_profile("nvidia-nim")
assert profile is not None
assert profile.name == "nvidia"
assert profile.default_max_tokens == 16384
def test_nvidia_model_passed(self, transport):
profile = get_provider_profile("nvidia")
kwargs = transport.build_kwargs(
model="nvidia/test-model",
messages=_msgs(),
tools=None,
provider_profile=profile,
max_tokens=None,
max_tokens_param_fn=lambda x: {"max_tokens": x} if x else {},
timeout=300,
reasoning_config=None,
request_overrides=None,
session_id="test",
ollama_num_ctx=None,
)
assert kwargs["model"] == "nvidia/test-model"
def test_nvidia_messages_passed(self, transport):
profile = get_provider_profile("nvidia")
msgs = _msgs()
kwargs = transport.build_kwargs(
model="nvidia/test",
messages=msgs,
tools=None,
provider_profile=profile,
max_tokens=None,
max_tokens_param_fn=lambda x: {"max_tokens": x} if x else {},
timeout=300,
reasoning_config=None,
request_overrides=None,
session_id="test",
ollama_num_ctx=None,
)
assert kwargs["messages"] == msgs
class TestDeepSeekProfileWiring:
def test_deepseek_no_forced_max_tokens(self, transport):
profile = get_provider_profile("deepseek")
kwargs = transport.build_kwargs(
model="deepseek-chat",
messages=_msgs(),
tools=None,
provider_profile=profile,
max_tokens=None,
max_tokens_param_fn=lambda x: {"max_tokens": x} if x else {},
timeout=300,
reasoning_config=None,
request_overrides=None,
session_id="test",
ollama_num_ctx=None,
)
# DeepSeek has no default_max_tokens
assert kwargs["model"] == "deepseek-chat"
assert kwargs.get("max_tokens") is None or "max_tokens" not in kwargs
def test_deepseek_messages_passed(self, transport):
profile = get_provider_profile("deepseek")
msgs = _msgs()
kwargs = transport.build_kwargs(
model="deepseek-chat",
messages=msgs,
tools=None,
provider_profile=profile,
max_tokens=None,
max_tokens_param_fn=lambda x: {"max_tokens": x} if x else {},
timeout=300,
reasoning_config=None,
request_overrides=None,
session_id="test",
ollama_num_ctx=None,
)
assert kwargs["messages"] == msgs

View File

@@ -0,0 +1,145 @@
"""Tests for the model-providers plugin discovery system.
Verifies that:
1. All bundled providers at plugins/model-providers/<name>/ are discovered
2. User plugins at $HERMES_HOME/plugins/model-providers/<name>/ override bundled
3. plugin.yaml manifests with kind=model-provider are correctly categorized
"""
from __future__ import annotations
import importlib
import sys
from pathlib import Path
import pytest
REPO_ROOT = Path(__file__).resolve().parents[2]
def _clear_provider_caches():
"""Force providers/__init__.py to re-discover on next list_providers()."""
import providers as _pkg
_pkg._REGISTRY.clear()
_pkg._ALIASES.clear()
_pkg._discovered = False
# Evict any cached plugin modules so the next import re-executes.
for mod in list(sys.modules.keys()):
if (
mod.startswith("plugins.model_providers")
or mod.startswith("_hermes_user_provider")
):
del sys.modules[mod]
def test_bundled_plugins_discovered():
"""Every plugins/model-providers/<name>/ should contain a plugin.yaml + __init__.py."""
plugins_dir = REPO_ROOT / "plugins" / "model-providers"
assert plugins_dir.is_dir(), f"Missing {plugins_dir}"
child_dirs = [c for c in plugins_dir.iterdir() if c.is_dir()]
assert len(child_dirs) >= 28, f"Expected at least 28 provider plugins, found {len(child_dirs)}"
for child in child_dirs:
assert (child / "__init__.py").exists(), f"{child.name} missing __init__.py"
assert (child / "plugin.yaml").exists(), f"{child.name} missing plugin.yaml"
def test_all_33_profiles_register():
"""After discovery, the registry must contain exactly 33 distinct profiles."""
_clear_provider_caches()
from providers import list_providers
profiles = list_providers()
names = sorted(p.name for p in profiles)
assert len(names) == 33, f"Expected 33 profiles, got {len(names)}: {names}"
# Spot-check representative providers from different categories
for required in (
"openrouter", "anthropic", "custom", "bedrock", "openai-codex",
"minimax-oauth", "gmi", "xiaomi", "alibaba-coding-plan",
):
assert required in names, f"Missing profile: {required}"
def test_user_plugin_overrides_bundled(tmp_path, monkeypatch):
"""A user plugin with the same name must override the bundled profile."""
# Point HERMES_HOME at a fresh temp dir
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
# get_hermes_home() may be module-cached depending on codebase; ensure the
# env var is the source of truth. Most code paths re-read it each call.
# Drop a user plugin that replaces 'gmi'
user_gmi = hermes_home / "plugins" / "model-providers" / "gmi"
user_gmi.mkdir(parents=True)
(user_gmi / "__init__.py").write_text(
"from providers import register_provider\n"
"from providers.base import ProviderProfile\n"
"\n"
"custom_gmi = ProviderProfile(\n"
' name="gmi",\n'
' aliases=("gmi-user-override-test",),\n'
' env_vars=("GMI_API_KEY",),\n'
' base_url="https://user-override.example.com/v1",\n'
' auth_type="api_key",\n'
")\n"
"register_provider(custom_gmi)\n"
)
(user_gmi / "plugin.yaml").write_text(
"name: gmi-user-override\n"
"kind: model-provider\n"
"version: 0.0.1\n"
"description: Test user override\n"
)
_clear_provider_caches()
from providers import get_provider_profile
gmi = get_provider_profile("gmi")
assert gmi is not None
assert gmi.base_url == "https://user-override.example.com/v1", (
f"User override not applied; got base_url={gmi.base_url!r}"
)
assert "gmi-user-override-test" in gmi.aliases
# Clean up: reset discovery state so other tests see the bundled version
_clear_provider_caches()
def test_general_plugin_manager_skips_model_provider_kind(tmp_path, monkeypatch):
"""The general PluginManager must NOT import model-provider plugins
(providers/__init__.py handles them). It records the manifest only."""
from hermes_cli import plugins as plugin_mod
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
# Create a user-installed plugin with an explicit kind: model-provider.
user_plugin = hermes_home / "plugins" / "test-model-provider"
user_plugin.mkdir(parents=True)
(user_plugin / "plugin.yaml").write_text(
"name: test-model-provider\n"
"kind: model-provider\n"
"version: 0.0.1\n"
)
(user_plugin / "__init__.py").write_text(
# Intentionally broken import — if the general loader tries to
# import this module, the test will fail with ImportError.
"raise AssertionError('model-provider plugins must not be imported by PluginManager')\n"
)
# Fresh manager
manager = plugin_mod.PluginManager()
manager.discover_and_load(force=True)
# The manifest should be recorded but not loaded
loaded = manager._plugins.get("test-model-provider")
assert loaded is not None
assert loaded.manifest.kind == "model-provider"
# No import means the module must NOT be in the plugins list as a loaded one.
# We check that the general loader didn't crash and didn't raise from the
# broken __init__.py.

View File

@@ -0,0 +1,290 @@
"""Profile-path parity tests: verify profile path produces identical output to legacy flags.
Each test calls build_kwargs twice — once with legacy flags, once with provider_profile —
and asserts the output is identical. This catches any behavioral drift between the two paths.
"""
import pytest
from agent.transports.chat_completions import ChatCompletionsTransport
from providers import get_provider_profile
@pytest.fixture
def transport():
return ChatCompletionsTransport()
def _msgs():
return [{"role": "user", "content": "hello"}]
def _max_tokens_fn(n):
return {"max_completion_tokens": n}
class TestNvidiaProfileParity:
def test_max_tokens_match(self, transport):
"""NVIDIA profile sets max_tokens=16384; legacy flag is removed."""
profile = transport.build_kwargs(
model="nvidia/nemotron", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("nvidia"),
max_tokens_param_fn=_max_tokens_fn,
)
assert profile["max_completion_tokens"] == 16384
class TestKimiProfileParity:
def test_temperature_omitted(self, transport):
legacy = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi-coding"), omit_temperature=True,
)
profile = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi"),
)
assert "temperature" not in legacy
assert "temperature" not in profile
def test_max_tokens(self, transport):
legacy = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi-coding"), max_tokens_param_fn=_max_tokens_fn,
)
profile = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi"),
max_tokens_param_fn=_max_tokens_fn,
)
assert profile["max_completion_tokens"] == legacy["max_completion_tokens"] == 32000
def test_thinking_enabled(self, transport):
rc = {"enabled": True, "effort": "high"}
legacy = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi-coding"), reasoning_config=rc,
)
profile = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi"),
reasoning_config=rc,
)
assert profile["extra_body"]["thinking"] == legacy["extra_body"]["thinking"]
assert profile["reasoning_effort"] == legacy["reasoning_effort"] == "high"
def test_thinking_disabled(self, transport):
rc = {"enabled": False}
legacy = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi-coding"), reasoning_config=rc,
)
profile = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi"),
reasoning_config=rc,
)
assert profile["extra_body"]["thinking"] == legacy["extra_body"]["thinking"]
assert profile["extra_body"]["thinking"]["type"] == "disabled"
assert "reasoning_effort" not in profile
assert "reasoning_effort" not in legacy
def test_reasoning_effort_default(self, transport):
rc = {"enabled": True}
legacy = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi-coding"), reasoning_config=rc,
)
profile = transport.build_kwargs(
model="kimi-k2", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("kimi"),
reasoning_config=rc,
)
assert profile["reasoning_effort"] == legacy["reasoning_effort"] == "medium"
class TestOpenRouterProfileParity:
def test_provider_preferences(self, transport):
prefs = {"allow": ["anthropic"]}
legacy = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"), provider_preferences=prefs,
)
profile = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
provider_preferences=prefs,
)
assert profile["extra_body"]["provider"] == legacy["extra_body"]["provider"]
def test_reasoning_full_config(self, transport):
rc = {"enabled": True, "effort": "high"}
legacy = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"), supports_reasoning=True, reasoning_config=rc,
)
profile = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
supports_reasoning=True, reasoning_config=rc,
)
assert profile["extra_body"]["reasoning"] == legacy["extra_body"]["reasoning"]
def test_default_reasoning(self, transport):
legacy = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"), supports_reasoning=True,
)
profile = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
supports_reasoning=True,
)
assert profile["extra_body"]["reasoning"] == legacy["extra_body"]["reasoning"]
class TestNousProfileParity:
def test_tags(self, transport):
legacy = transport.build_kwargs(
model="hermes-3", messages=_msgs(), tools=None, provider_profile=get_provider_profile("nous"),
)
profile = transport.build_kwargs(
model="hermes-3", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("nous"),
)
assert profile["extra_body"]["tags"] == legacy["extra_body"]["tags"]
def test_reasoning_omitted_when_disabled(self, transport):
rc = {"enabled": False}
legacy = transport.build_kwargs(
model="hermes-3", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("nous"), supports_reasoning=True, reasoning_config=rc,
)
profile = transport.build_kwargs(
model="hermes-3", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("nous"),
supports_reasoning=True, reasoning_config=rc,
)
assert "reasoning" not in legacy.get("extra_body", {})
assert "reasoning" not in profile.get("extra_body", {})
class TestQwenProfileParity:
def test_max_tokens(self, transport):
legacy = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("qwen-oauth"), max_tokens_param_fn=_max_tokens_fn,
)
profile = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("qwen"),
max_tokens_param_fn=_max_tokens_fn,
)
assert profile["max_completion_tokens"] == legacy["max_completion_tokens"] == 65536
def test_vl_high_resolution(self, transport):
legacy = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None, provider_profile=get_provider_profile("qwen-oauth"),
)
profile = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("qwen"),
)
assert profile["extra_body"]["vl_high_resolution_images"] == legacy["extra_body"]["vl_high_resolution_images"]
def test_metadata_top_level(self, transport):
meta = {"sessionId": "s123", "promptId": "p456"}
legacy = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("qwen-oauth"), qwen_session_metadata=meta,
)
profile = transport.build_kwargs(
model="qwen3.5", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("qwen"),
qwen_session_metadata=meta,
)
assert profile["metadata"] == legacy["metadata"] == meta
assert "metadata" not in profile.get("extra_body", {})
def test_message_preprocessing(self, transport):
"""Qwen profile normalizes string content to list-of-parts."""
msgs = [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "hello"},
]
profile = transport.build_kwargs(
model="qwen3.5", messages=msgs, tools=None,
provider_profile=get_provider_profile("qwen"),
)
out_msgs = profile["messages"]
# System message content normalized + cache_control injected
assert isinstance(out_msgs[0]["content"], list)
assert out_msgs[0]["content"][0]["type"] == "text"
assert "cache_control" in out_msgs[0]["content"][-1]
# User message content normalized
assert isinstance(out_msgs[1]["content"], list)
assert out_msgs[1]["content"][0] == {"type": "text", "text": "hello"}
class TestDeveloperRoleParity:
"""Developer role swap must work on BOTH legacy and profile paths."""
def test_legacy_path_swaps_for_gpt5(self, transport):
msgs = [{"role": "system", "content": "Be helpful"}, {"role": "user", "content": "hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=msgs, tools=None,
)
assert kw["messages"][0]["role"] == "developer"
def test_profile_path_swaps_for_gpt5(self, transport):
msgs = [{"role": "system", "content": "Be helpful"}, {"role": "user", "content": "hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=msgs, tools=None,
provider_profile=get_provider_profile("openrouter"),
)
assert kw["messages"][0]["role"] == "developer"
def test_profile_path_no_swap_for_claude(self, transport):
msgs = [{"role": "system", "content": "Be helpful"}, {"role": "user", "content": "hi"}]
kw = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=msgs, tools=None,
provider_profile=get_provider_profile("openrouter"),
)
assert kw["messages"][0]["role"] == "system"
class TestRequestOverridesParity:
"""request_overrides with extra_body must merge identically on both paths."""
def test_extra_body_override_legacy(self, transport):
kw = transport.build_kwargs(
model="gpt-5.4", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
request_overrides={"extra_body": {"custom_key": "custom_val"}},
)
assert kw["extra_body"]["custom_key"] == "custom_val"
def test_extra_body_override_profile(self, transport):
kw = transport.build_kwargs(
model="gpt-5.4", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
request_overrides={"extra_body": {"custom_key": "custom_val"}},
)
assert kw["extra_body"]["custom_key"] == "custom_val"
def test_extra_body_override_merges_with_provider_body(self, transport):
"""Override extra_body merges WITH provider extra_body, not replaces."""
kw = transport.build_kwargs(
model="hermes-3", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("nous"),
request_overrides={"extra_body": {"custom": True}},
)
assert kw["extra_body"]["tags"] == ["product=hermes-agent"] # from profile
assert kw["extra_body"]["custom"] is True # from override
def test_top_level_override(self, transport):
kw = transport.build_kwargs(
model="gpt-5.4", messages=_msgs(), tools=None,
provider_profile=get_provider_profile("openrouter"),
request_overrides={"top_p": 0.9},
)
assert kw["top_p"] == 0.9

View File

@@ -0,0 +1,203 @@
"""Tests for the provider module registry and profiles."""
import pytest
from providers import get_provider_profile, _REGISTRY
from providers.base import ProviderProfile, OMIT_TEMPERATURE
class TestRegistry:
def test_discovery_populates_registry(self):
p = get_provider_profile("nvidia")
assert p is not None
assert p.name == "nvidia"
def test_alias_lookup(self):
assert get_provider_profile("kimi").name == "kimi-coding"
assert get_provider_profile("moonshot").name == "kimi-coding"
assert get_provider_profile("kimi-coding-cn").name == "kimi-coding-cn"
assert get_provider_profile("or").name == "openrouter"
assert get_provider_profile("nous-portal").name == "nous"
assert get_provider_profile("qwen").name == "qwen-oauth"
assert get_provider_profile("qwen-portal").name == "qwen-oauth"
def test_unknown_provider_returns_none(self):
assert get_provider_profile("nonexistent-provider") is None
def test_all_providers_have_name(self):
get_provider_profile("nvidia") # trigger discovery
for name, profile in _REGISTRY.items():
assert profile.name == name
class TestNvidiaProfile:
def test_max_tokens(self):
p = get_provider_profile("nvidia")
assert p.default_max_tokens == 16384
def test_no_special_temperature(self):
p = get_provider_profile("nvidia")
assert p.fixed_temperature is None
def test_base_url(self):
p = get_provider_profile("nvidia")
assert "nvidia.com" in p.base_url
class TestKimiProfile:
def test_temperature_omit(self):
p = get_provider_profile("kimi")
assert p.fixed_temperature is OMIT_TEMPERATURE
def test_max_tokens(self):
p = get_provider_profile("kimi")
assert p.default_max_tokens == 32000
def test_cn_separate_profile(self):
p = get_provider_profile("kimi-coding-cn")
assert p.name == "kimi-coding-cn"
assert p.env_vars == ("KIMI_CN_API_KEY",)
assert "moonshot.cn" in p.base_url
def test_cn_not_alias_of_kimi(self):
kimi = get_provider_profile("kimi-coding")
cn = get_provider_profile("kimi-coding-cn")
assert kimi is not cn
assert kimi.base_url != cn.base_url
def test_thinking_enabled(self):
p = get_provider_profile("kimi")
eb, tl = p.build_api_kwargs_extras(reasoning_config={"enabled": True, "effort": "high"})
assert eb["thinking"] == {"type": "enabled"}
assert tl["reasoning_effort"] == "high"
def test_thinking_disabled(self):
p = get_provider_profile("kimi")
eb, tl = p.build_api_kwargs_extras(reasoning_config={"enabled": False})
assert eb["thinking"] == {"type": "disabled"}
assert "reasoning_effort" not in tl
def test_reasoning_effort_default(self):
p = get_provider_profile("kimi")
eb, tl = p.build_api_kwargs_extras(reasoning_config={"enabled": True})
assert tl["reasoning_effort"] == "medium"
def test_no_config_defaults(self):
p = get_provider_profile("kimi")
eb, tl = p.build_api_kwargs_extras(reasoning_config=None)
assert eb["thinking"] == {"type": "enabled"}
assert tl["reasoning_effort"] == "medium"
class TestOpenRouterProfile:
def test_extra_body_with_prefs(self):
p = get_provider_profile("openrouter")
body = p.build_extra_body(provider_preferences={"allow": ["anthropic"]})
assert body["provider"] == {"allow": ["anthropic"]}
def test_extra_body_no_prefs(self):
p = get_provider_profile("openrouter")
body = p.build_extra_body()
assert body == {}
def test_reasoning_full_config(self):
p = get_provider_profile("openrouter")
eb, _ = p.build_api_kwargs_extras(
reasoning_config={"enabled": True, "effort": "high"},
supports_reasoning=True,
)
assert eb["reasoning"] == {"enabled": True, "effort": "high"}
def test_reasoning_disabled_still_passes(self):
"""OpenRouter passes disabled reasoning through (unlike Nous)."""
p = get_provider_profile("openrouter")
eb, _ = p.build_api_kwargs_extras(
reasoning_config={"enabled": False},
supports_reasoning=True,
)
assert eb["reasoning"] == {"enabled": False}
def test_default_reasoning(self):
p = get_provider_profile("openrouter")
eb, _ = p.build_api_kwargs_extras(supports_reasoning=True)
assert eb["reasoning"] == {"enabled": True, "effort": "medium"}
class TestNousProfile:
def test_tags(self):
p = get_provider_profile("nous")
body = p.build_extra_body()
assert body["tags"] == ["product=hermes-agent"]
def test_auth_type(self):
p = get_provider_profile("nous")
assert p.auth_type == "oauth_device_code"
def test_reasoning_enabled(self):
p = get_provider_profile("nous")
eb, _ = p.build_api_kwargs_extras(
reasoning_config={"enabled": True, "effort": "medium"},
supports_reasoning=True,
)
assert eb["reasoning"] == {"enabled": True, "effort": "medium"}
def test_reasoning_omitted_when_disabled(self):
p = get_provider_profile("nous")
eb, _ = p.build_api_kwargs_extras(
reasoning_config={"enabled": False},
supports_reasoning=True,
)
assert "reasoning" not in eb
class TestQwenProfile:
def test_max_tokens(self):
p = get_provider_profile("qwen-oauth")
assert p.default_max_tokens == 65536
def test_auth_type(self):
p = get_provider_profile("qwen-oauth")
assert p.auth_type == "oauth_external"
def test_extra_body_vl(self):
p = get_provider_profile("qwen-oauth")
body = p.build_extra_body()
assert body["vl_high_resolution_images"] is True
def test_prepare_messages_normalizes_content(self):
p = get_provider_profile("qwen-oauth")
msgs = [
{"role": "system", "content": "Be helpful"},
{"role": "user", "content": "hello"},
]
result = p.prepare_messages(msgs)
# System message: content normalized to list, cache_control on last part
assert isinstance(result[0]["content"], list)
assert result[0]["content"][-1].get("cache_control") == {"type": "ephemeral"}
assert result[0]["content"][-1]["text"] == "Be helpful"
# User message: content normalized to list
assert isinstance(result[1]["content"], list)
assert result[1]["content"][0]["text"] == "hello"
def test_metadata_top_level(self):
p = get_provider_profile("qwen-oauth")
meta = {"sessionId": "s123", "promptId": "p456"}
eb, tl = p.build_api_kwargs_extras(qwen_session_metadata=meta)
assert tl["metadata"] == meta
assert "metadata" not in eb
class TestBaseProfile:
def test_prepare_messages_passthrough(self):
p = ProviderProfile(name="test")
msgs = [{"role": "user", "content": "hi"}]
assert p.prepare_messages(msgs) is msgs
def test_build_extra_body_empty(self):
p = ProviderProfile(name="test")
assert p.build_extra_body() == {}
def test_build_api_kwargs_extras_empty(self):
p = ProviderProfile(name="test")
eb, tl = p.build_api_kwargs_extras()
assert eb == {}
assert tl == {}

View File

@@ -0,0 +1,258 @@
"""Parity tests: pin the exact current transport behavior per provider.
These tests document the flag-based contract between run_agent.py and
ChatCompletionsTransport.build_kwargs(). When the next PR wires profiles
to replace flags, every assertion here must still pass — any failure is
a behavioral regression.
"""
import pytest
from agent.transports.chat_completions import ChatCompletionsTransport
from providers import get_provider_profile
@pytest.fixture
def transport():
return ChatCompletionsTransport()
def _simple_messages():
return [{"role": "user", "content": "hello"}]
def _max_tokens_fn(n):
return {"max_completion_tokens": n}
class TestNvidiaParity:
"""NVIDIA NIM: default max_tokens=16384."""
def test_default_max_tokens(self, transport):
"""NVIDIA default max_tokens=16384 comes from profile, not legacy is_nvidia_nim flag."""
from providers import get_provider_profile
profile = get_provider_profile("nvidia")
kw = transport.build_kwargs(
model="nvidia/llama-3.1-nemotron-70b-instruct",
messages=_simple_messages(),
tools=None,
max_tokens_param_fn=_max_tokens_fn,
provider_profile=profile,
)
assert kw["max_completion_tokens"] == 16384
def test_user_max_tokens_overrides(self, transport):
from providers import get_provider_profile
profile = get_provider_profile("nvidia")
kw = transport.build_kwargs(
model="nvidia/llama-3.1-nemotron-70b-instruct",
messages=_simple_messages(),
tools=None,
max_tokens=4096,
max_tokens_param_fn=_max_tokens_fn,
provider_profile=profile,
)
assert kw["max_completion_tokens"] == 4096 # user overrides default
class TestKimiParity:
"""Kimi: OMIT temperature, max_tokens=32000, thinking + reasoning_effort."""
def test_temperature_omitted(self, transport):
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
omit_temperature=True,
)
assert "temperature" not in kw
def test_default_max_tokens(self, transport):
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
max_tokens_param_fn=_max_tokens_fn,
)
assert kw["max_completion_tokens"] == 32000
def test_thinking_enabled(self, transport):
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
reasoning_config={"enabled": True, "effort": "high"},
)
assert kw["extra_body"]["thinking"] == {"type": "enabled"}
def test_thinking_disabled(self, transport):
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
reasoning_config={"enabled": False},
)
assert kw["extra_body"]["thinking"] == {"type": "disabled"}
def test_reasoning_effort_top_level(self, transport):
"""Kimi reasoning_effort is a TOP-LEVEL api_kwargs key, NOT in extra_body."""
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
reasoning_config={"enabled": True, "effort": "high"},
)
assert kw.get("reasoning_effort") == "high"
assert "reasoning_effort" not in kw.get("extra_body", {})
def test_reasoning_effort_default_medium(self, transport):
kw = transport.build_kwargs(
model="kimi-k2",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("kimi-coding"),
reasoning_config={"enabled": True},
)
assert kw.get("reasoning_effort") == "medium"
class TestOpenRouterParity:
"""OpenRouter: provider preferences, reasoning in extra_body."""
def test_provider_preferences(self, transport):
prefs = {"allow": ["anthropic"], "sort": "price"}
kw = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("openrouter"),
provider_preferences=prefs,
)
assert kw["extra_body"]["provider"] == prefs
def test_reasoning_passes_full_config(self, transport):
"""OpenRouter passes the FULL reasoning_config dict, not just effort."""
rc = {"enabled": True, "effort": "high"}
kw = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("openrouter"),
supports_reasoning=True,
reasoning_config=rc,
)
assert kw["extra_body"]["reasoning"] == rc
def test_default_reasoning_when_no_config(self, transport):
"""When supports_reasoning=True but no config, adds default."""
kw = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("openrouter"),
supports_reasoning=True,
)
assert kw["extra_body"]["reasoning"] == {"enabled": True, "effort": "medium"}
class TestNousParity:
"""Nous: product tags, reasoning, omit when disabled."""
def test_tags(self, transport):
kw = transport.build_kwargs(
model="hermes-3-llama-3.1-405b",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("nous"),
)
assert kw["extra_body"]["tags"] == ["product=hermes-agent"]
def test_reasoning_omitted_when_disabled(self, transport):
"""Nous special case: reasoning omitted entirely when disabled."""
kw = transport.build_kwargs(
model="hermes-3-llama-3.1-405b",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("nous"),
supports_reasoning=True,
reasoning_config={"enabled": False},
)
assert "reasoning" not in kw.get("extra_body", {})
def test_reasoning_enabled(self, transport):
rc = {"enabled": True, "effort": "high"}
kw = transport.build_kwargs(
model="hermes-3-llama-3.1-405b",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("nous"),
supports_reasoning=True,
reasoning_config=rc,
)
assert kw["extra_body"]["reasoning"] == rc
class TestQwenParity:
"""Qwen: max_tokens=65536, vl_high_resolution, metadata top-level."""
def test_default_max_tokens(self, transport):
kw = transport.build_kwargs(
model="qwen3.5-plus",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("qwen-oauth"),
max_tokens_param_fn=_max_tokens_fn,
)
assert kw["max_completion_tokens"] == 65536
def test_vl_high_resolution(self, transport):
kw = transport.build_kwargs(
model="qwen3.5-plus",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("qwen-oauth"),
)
assert kw["extra_body"]["vl_high_resolution_images"] is True
def test_metadata_top_level(self, transport):
"""Qwen metadata goes to top-level api_kwargs, NOT extra_body."""
meta = {"sessionId": "s123", "promptId": "p456"}
kw = transport.build_kwargs(
model="qwen3.5-plus",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("qwen-oauth"),
qwen_session_metadata=meta,
)
assert kw["metadata"] == meta
assert "metadata" not in kw.get("extra_body", {})
class TestCustomOllamaParity:
"""Custom/Ollama: num_ctx, think=false — now tested via profile."""
def test_ollama_num_ctx(self, transport):
kw = transport.build_kwargs(
model="llama3.1",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("custom"),
ollama_num_ctx=131072,
)
assert kw["extra_body"]["options"]["num_ctx"] == 131072
def test_think_false_when_disabled(self, transport):
kw = transport.build_kwargs(
model="qwen3:72b",
messages=_simple_messages(),
tools=None,
provider_profile=get_provider_profile("custom"),
reasoning_config={"enabled": False, "effort": "none"},
)
assert kw["extra_body"]["think"] is False

View File

@@ -1117,6 +1117,7 @@ class TestBuildApiKwargs:
assert "temperature" not in kwargs
def test_kimi_coding_endpoint_omits_temperature(self, agent):
agent.provider = "kimi-coding"
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-k2.5"
@@ -1129,6 +1130,7 @@ class TestBuildApiKwargs:
def test_kimi_coding_endpoint_sends_max_tokens_and_reasoning(self, agent):
"""Kimi endpoint should send max_tokens=32000 and reasoning_effort as
top-level params, matching Kimi CLI's default behavior."""
agent.provider = "kimi-coding"
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-for-coding"
@@ -1141,6 +1143,7 @@ class TestBuildApiKwargs:
def test_kimi_coding_endpoint_respects_custom_effort(self, agent):
"""reasoning_effort should reflect reasoning_config.effort when set."""
agent.provider = "kimi-coding"
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-for-coding"
@@ -1154,6 +1157,7 @@ class TestBuildApiKwargs:
def test_kimi_coding_endpoint_sends_thinking_extra_body(self, agent):
"""Kimi endpoint should send extra_body.thinking={"type":"enabled"}
to activate reasoning mode, mirroring Kimi CLI's with_thinking()."""
agent.provider = "kimi-coding"
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-for-coding"
@@ -1167,6 +1171,7 @@ class TestBuildApiKwargs:
"""When reasoning_config.enabled=False, thinking should be disabled
and reasoning_effort should be omitted entirely — mirroring Kimi
CLI's with_thinking("off") which maps to reasoning_effort=None."""
agent.provider = "kimi-coding"
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-for-coding"
@@ -1180,6 +1185,7 @@ class TestBuildApiKwargs:
def test_moonshot_endpoint_sends_max_tokens_and_reasoning(self, agent):
"""api.moonshot.ai should get the same Kimi-compatible params."""
agent.provider = "kimi-coding"
agent.base_url = "https://api.moonshot.ai/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-k2.5"
@@ -1193,6 +1199,7 @@ class TestBuildApiKwargs:
def test_moonshot_cn_endpoint_sends_max_tokens_and_reasoning(self, agent):
"""api.moonshot.cn (China endpoint) should get the same params."""
agent.provider = "kimi-coding-cn"
agent.base_url = "https://api.moonshot.cn/v1"
agent._base_url_lower = agent.base_url.lower()
agent.model = "kimi-k2.5"
@@ -1205,6 +1212,7 @@ class TestBuildApiKwargs:
assert kwargs["extra_body"]["thinking"] == {"type": "enabled"}
def test_provider_preferences_injected(self, agent):
agent.provider = "openrouter"
agent.base_url = "https://openrouter.ai/api/v1"
agent.providers_allowed = ["Anthropic"]
messages = [{"role": "user", "content": "hi"}]
@@ -1213,6 +1221,7 @@ class TestBuildApiKwargs:
def test_reasoning_config_default_openrouter(self, agent):
"""Default reasoning config for OpenRouter should be medium."""
agent.provider = "openrouter"
agent.base_url = "https://openrouter.ai/api/v1"
agent.model = "anthropic/claude-sonnet-4-20250514"
messages = [{"role": "user", "content": "hi"}]
@@ -1222,6 +1231,7 @@ class TestBuildApiKwargs:
assert reasoning["effort"] == "medium"
def test_reasoning_config_custom(self, agent):
agent.provider = "openrouter"
agent.base_url = "https://openrouter.ai/api/v1"
agent.model = "anthropic/claude-sonnet-4-20250514"
agent.reasoning_config = {"enabled": False}
@@ -1237,6 +1247,7 @@ class TestBuildApiKwargs:
assert "reasoning" not in kwargs.get("extra_body", {})
def test_reasoning_sent_for_supported_openrouter_model(self, agent):
agent.provider = "openrouter"
agent.base_url = "https://openrouter.ai/api/v1"
agent.model = "qwen/qwen3.5-plus-02-15"
messages = [{"role": "user", "content": "hi"}]
@@ -1244,6 +1255,7 @@ class TestBuildApiKwargs:
assert kwargs["extra_body"]["reasoning"]["effort"] == "medium"
def test_reasoning_sent_for_nous_route(self, agent):
agent.provider = "nous"
agent.base_url = "https://inference-api.nousresearch.com/v1"
agent.model = "minimax/minimax-m2.5"
messages = [{"role": "user", "content": "hi"}]
@@ -1251,18 +1263,38 @@ class TestBuildApiKwargs:
assert kwargs["extra_body"]["reasoning"]["effort"] == "medium"
def test_reasoning_sent_for_copilot_gpt5(self, agent):
agent.base_url = "https://api.githubcopilot.com"
agent.model = "gpt-5.4"
messages = [{"role": "user", "content": "hi"}]
kwargs = agent._build_api_kwargs(messages)
"""Copilot/GitHub Models: GPT-5 reasoning goes in extra_body.reasoning."""
from agent.transports import get_transport
from providers import get_provider_profile
transport = get_transport("chat_completions")
profile = get_provider_profile("copilot")
msgs = [{"role": "user", "content": "hi"}]
kwargs = transport.build_kwargs(
model="gpt-5.4",
messages=msgs,
tools=None,
supports_reasoning=True,
provider_profile=profile,
)
assert kwargs["extra_body"]["reasoning"] == {"effort": "medium"}
def test_reasoning_xhigh_normalized_for_copilot(self, agent):
agent.base_url = "https://api.githubcopilot.com"
agent.model = "gpt-5.4"
agent.reasoning_config = {"enabled": True, "effort": "xhigh"}
messages = [{"role": "user", "content": "hi"}]
kwargs = agent._build_api_kwargs(messages)
"""xhigh effort should normalize to high for Copilot GitHub Models."""
from agent.transports import get_transport
from providers import get_provider_profile
transport = get_transport("chat_completions")
profile = get_provider_profile("copilot")
msgs = [{"role": "user", "content": "hi"}]
kwargs = transport.build_kwargs(
model="gpt-5.4",
messages=msgs,
tools=None,
supports_reasoning=True,
reasoning_config={"enabled": True, "effort": "xhigh"},
provider_profile=profile,
)
assert kwargs["extra_body"]["reasoning"] == {"effort": "high"}
def test_reasoning_omitted_for_non_reasoning_copilot_model(self, agent):
@@ -1280,6 +1312,7 @@ class TestBuildApiKwargs:
def test_qwen_portal_formats_messages_and_metadata(self, agent):
agent.provider = "qwen-oauth"
agent.base_url = "https://portal.qwen.ai/v1"
agent._base_url_lower = agent.base_url.lower()
agent.session_id = "sess-123"
@@ -1296,6 +1329,7 @@ class TestBuildApiKwargs:
assert kwargs["messages"][2]["content"][0]["text"] == "hi"
def test_qwen_portal_normalizes_bare_string_content_parts(self, agent):
agent.provider = "qwen-oauth"
agent.base_url = "https://portal.qwen.ai/v1"
agent._base_url_lower = agent.base_url.lower()
messages = [
@@ -1308,6 +1342,7 @@ class TestBuildApiKwargs:
assert user_content[1] == {"type": "text", "text": "world"}
def test_qwen_portal_no_system_message(self, agent):
agent.provider = "qwen-oauth"
agent.base_url = "https://portal.qwen.ai/v1"
agent._base_url_lower = agent.base_url.lower()
messages = [{"role": "user", "content": "hi"}]
@@ -1328,6 +1363,7 @@ class TestBuildApiKwargs:
def test_qwen_portal_default_max_tokens(self, agent):
"""When max_tokens is None, Qwen Portal gets a default of 65536
to prevent reasoning models from exhausting their output budget."""
agent.provider = "qwen-oauth"
agent.base_url = "https://portal.qwen.ai/v1"
agent._base_url_lower = agent.base_url.lower()
agent.max_tokens = None

View File

@@ -93,6 +93,46 @@ This path includes everything from Path A plus:
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required
## Fast path: Simple API-key providers
If your provider is just an OpenAI-compatible endpoint that authenticates with a single API key, you do not need to touch `auth.py`, `runtime_provider.py`, `main.py`, or any of the other files in the full checklist below.
All you need is:
1. A plugin directory under `plugins/model-providers/<your-provider>/` containing:
- `__init__.py` — calls `register_provider(profile)` at module-level
- `plugin.yaml` — manifest (name, kind: model-provider, version, description)
2. That's it. Provider plugins auto-load the first time anything calls `get_provider_profile()` or `list_providers()` — bundled plugins (this repo) and user plugins at `$HERMES_HOME/plugins/model-providers/` both get picked up.
When you add a plugin and it calls `register_provider()`, the following wire up automatically:
1. `PROVIDER_REGISTRY` entry in `auth.py` (credential resolution, env-var lookup)
2. `api_mode` set to `chat_completions`
3. `base_url` sourced from the config or the declared env var
4. `env_vars` checked in priority order for the API key
5. `fallback_models` list registered for the provider
6. `--provider` CLI flag accepts the provider id
7. `hermes model` menu includes the provider
8. `hermes setup` wizard delegates to `main.py` automatically
9. `provider:model` alias syntax works
10. Runtime resolver returns the correct `base_url` and `api_key`
11. `HERMES_INFERENCE_PROVIDER` env-var override accepts the provider id
12. Fallback model activation can switch into the provider cleanly
User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override bundled plugins of the same name (last-writer-wins in `register_provider()`) — so third parties can monkey-patch or replace any built-in profile without editing the repo.
See `plugins/model-providers/nvidia/` or `plugins/model-providers/gmi/` as a template, and the full [Model Provider Plugin guide](/docs/developer-guide/model-provider-plugin) for field reference, hook idioms, and end-to-end examples.
## Full path: OAuth and complex providers
Use the full checklist below when your provider needs any of the following:
- OAuth or token refresh (Nous Portal, Codex, Google Gemini, Qwen Portal, Copilot)
- A non-OpenAI API shape that requires a new adapter (Anthropic Messages, Codex Responses)
- Custom endpoint detection or multi-region probing (z.ai, Kimi)
- A curated static model catalog or live `/models` fetch
- Provider-specific `hermes model` menu entries with bespoke auth flows
## Step 1: Pick one canonical provider id
Choose a single provider id and use it everywhere.

View File

@@ -0,0 +1,267 @@
---
sidebar_position: 10
title: "Model Provider Plugins"
description: "How to build a model provider (inference backend) plugin for Hermes Agent"
---
# Building a Model Provider Plugin
Model provider plugins declare an inference backend — an OpenAI-compatible endpoint, an Anthropic Messages server, a Codex-style Responses API, or a Bedrock-native surface — that Hermes can route `AIAgent` calls through. Every built-in provider (OpenRouter, Anthropic, GMI, DeepSeek, Nvidia, …) ships as one of these plugins. Third parties can add their own by dropping a directory under `$HERMES_HOME/plugins/model-providers/` with zero changes to the repo.
:::tip
Model provider plugins are the third kind of **provider plugin**. The others are [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) (cross-session knowledge) and [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) (context compression strategies). All three follow the same "drop a directory, declare a profile, no repo edits" pattern.
:::
## How discovery works
`providers/__init__.py._discover_providers()` runs lazily the first time any code calls `get_provider_profile()` or `list_providers()`. Discovery order:
1. **Bundled plugins**`<repo>/plugins/model-providers/<name>/` — ship with Hermes
2. **User plugins**`$HERMES_HOME/plugins/model-providers/<name>/` — drop in any directory; no restart required for subsequent sessions
3. **Legacy single-file**`<repo>/providers/<name>.py` — back-compat for out-of-tree editable installs
**User plugins override bundled plugins of the same name** because `register_provider()` is last-writer-wins. Drop a `$HERMES_HOME/plugins/model-providers/gmi/` directory to replace the built-in GMI profile without touching the repo.
## Directory structure
```
plugins/model-providers/my-provider/
├── __init__.py # Calls register_provider(profile) at module-level
├── plugin.yaml # kind: model-provider + metadata (optional but recommended)
└── README.md # Setup instructions (optional)
```
The only required file is `__init__.py`. `plugin.yaml` is used by `hermes plugins` for introspection and by the general PluginManager to route the plugin to the right loader; without it, the general loader falls back to a source-text heuristic.
## Minimal example — a simple API-key provider
```python
# plugins/model-providers/acme-inference/__init__.py
from providers import register_provider
from providers.base import ProviderProfile
acme = ProviderProfile(
name="acme-inference",
aliases=("acme",),
display_name="Acme Inference",
description="Acme — OpenAI-compatible direct API",
signup_url="https://acme.example.com/keys",
env_vars=("ACME_API_KEY", "ACME_BASE_URL"),
base_url="https://api.acme.example.com/v1",
auth_type="api_key",
default_aux_model="acme-small-fast",
fallback_models=(
"acme-large-v3",
"acme-medium-v3",
"acme-small-fast",
),
)
register_provider(acme)
```
```yaml
# plugins/model-providers/acme-inference/plugin.yaml
name: acme-inference
kind: model-provider
version: 1.0.0
description: Acme Inference — OpenAI-compatible direct API
author: Your Name
```
That's it. After dropping these two files, the following **auto-wire** with no other edits:
| Integration | Where | What it gets |
|---|---|---|
| Credential resolution | `hermes_cli/auth.py` | `PROVIDER_REGISTRY["acme-inference"]` populated from profile |
| `--provider` CLI flag | `hermes_cli/main.py` | Accepts `acme-inference` |
| `hermes model` picker | `hermes_cli/models.py` | Appears in `CANONICAL_PROVIDERS`, model list fetched from `{base_url}/models` |
| `hermes doctor` | `hermes_cli/doctor.py` | Health check for `ACME_API_KEY` + `{base_url}/models` probe |
| `hermes setup` | `hermes_cli/config.py` | `ACME_API_KEY` appears in `OPTIONAL_ENV_VARS` and the setup wizard |
| URL reverse-mapping | `agent/model_metadata.py` | Hostname → provider name for auto-detection |
| Auxiliary model | `agent/auxiliary_client.py` | Uses `default_aux_model` for compression / summarization |
| Runtime resolution | `hermes_cli/runtime_provider.py` | Returns correct `base_url`, `api_key`, `api_mode` |
| Transport | `agent/transports/chat_completions.py` | Profile path generates kwargs via `prepare_messages` / `build_extra_body` / `build_api_kwargs_extras` |
## ProviderProfile fields
Full definition in `providers/base.py`. The most useful ones:
| Field | Type | Purpose |
|---|---|---|
| `name` | str | Canonical id — matches `--provider` choices and `HERMES_INFERENCE_PROVIDER` |
| `aliases` | `tuple[str, ...]` | Alternative names resolved by `get_provider_profile()` (e.g. `grok``xai`) |
| `api_mode` | str | `chat_completions` \| `codex_responses` \| `anthropic_messages` \| `bedrock_converse` |
| `display_name` | str | Human label shown in `hermes model` picker |
| `description` | str | Picker subtitle |
| `signup_url` | str | Shown during first-run setup ("get an API key here") |
| `env_vars` | `tuple[str, ...]` | API-key env vars in priority order; a final `*_BASE_URL` entry is used as the user base-URL override |
| `base_url` | str | Default inference endpoint |
| `models_url` | str | Explicit catalog URL (falls back to `{base_url}/models`) |
| `auth_type` | str | `api_key` \| `oauth_device_code` \| `oauth_external` \| `copilot` \| `aws_sdk` \| `external_process` |
| `fallback_models` | `tuple[str, ...]` | Curated list shown when live catalog fetch fails |
| `default_headers` | `dict[str, str]` | Sent on every request (e.g. Copilot's `Editor-Version`) |
| `fixed_temperature` | Any | `None` = use caller's value; `OMIT_TEMPERATURE` sentinel = don't send temperature at all (Kimi) |
| `default_max_tokens` | `int \| None` | Provider-level max_tokens cap (Nvidia: 16384) |
| `default_aux_model` | str | Cheap model for auxiliary tasks (compression, vision, summarization) |
## Overridable hooks
Subclass `ProviderProfile` for non-trivial quirks:
```python
from typing import Any
from providers.base import ProviderProfile
class AcmeProfile(ProviderProfile):
def prepare_messages(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Provider-specific message preprocessing. Runs after codex
sanitization, before developer-role swap. Default: pass-through."""
# Example: Qwen normalizes plain-text content to a list-of-parts
# array and injects cache_control; Kimi rewrites tool-call JSON
return messages
def build_extra_body(self, *, session_id=None, **context) -> dict:
"""Provider-specific extra_body fields merged into the API call.
Context includes: session_id, provider_preferences, model, base_url,
reasoning_config. Default: empty dict."""
# Example: OpenRouter's provider-preferences block,
# Gemini's thinking_config translation.
return {}
def build_api_kwargs_extras(self, *, reasoning_config=None, **context):
"""Returns (extra_body_additions, top_level_kwargs). Needed when some
fields go top-level (Kimi's reasoning_effort) and some go in extra_body
(OpenRouter's reasoning dict). Default: ({}, {})."""
return {}, {}
def fetch_models(self, *, api_key=None, timeout=8.0) -> list[str] | None:
"""Live catalog fetch. Default hits {models_url or base_url}/models with
Bearer auth. Override for: custom auth (Anthropic), no REST endpoint
(Bedrock → None), or public/unauthenticated catalogs (OpenRouter)."""
return super().fetch_models(api_key=api_key, timeout=timeout)
```
## Hook reference examples
Look at these bundled plugins for idioms:
| Plugin | Why look |
|---|---|
| `plugins/model-providers/openrouter/` | Aggregator with provider preferences, public model catalog |
| `plugins/model-providers/gemini/` | `thinking_config` translation (native + OpenAI-compat nested forms) |
| `plugins/model-providers/kimi-coding/` | `OMIT_TEMPERATURE`, `extra_body.thinking`, top-level `reasoning_effort` |
| `plugins/model-providers/qwen-oauth/` | Message normalization, `cache_control` injection, VL high-res |
| `plugins/model-providers/nous/` | Attribution tags, "omit reasoning when disabled" |
| `plugins/model-providers/custom/` | Ollama `num_ctx` + `think: false` quirks |
| `plugins/model-providers/bedrock/` | `api_mode="bedrock_converse"`, `fetch_models` returns None (no REST endpoint) |
## User overrides — replace a built-in without editing the repo
Say you want to point `gmi` at your private staging endpoint for testing. Create `~/.hermes/plugins/model-providers/gmi/__init__.py`:
```python
from providers import register_provider
from providers.base import ProviderProfile
register_provider(ProviderProfile(
name="gmi",
aliases=("gmi-cloud", "gmicloud"),
env_vars=("GMI_API_KEY",),
base_url="https://gmi-staging.internal.example.com/v1",
auth_type="api_key",
default_aux_model="google/gemini-3.1-flash-lite-preview",
))
```
Next session, `get_provider_profile("gmi").base_url` returns the staging URL. No repo patch, no rebuild. Because user plugins are discovered after bundled ones, the user `register_provider()` call wins.
## api_mode selection
Four values are recognized. Hermes picks one based on:
1. User explicit override (`config.yaml` `model.api_mode` when set)
2. OpenCode's per-model dispatch (`opencode_model_api_mode` for Zen and Go)
3. URL auto-detection — `/anthropic` suffix → `anthropic_messages`, `api.openai.com``codex_responses`, `api.x.ai``codex_responses`, `/coding` on Kimi domains → `chat_completions`
4. **Profile `api_mode`** as a fallback when URL detection finds nothing
5. Default `chat_completions`
Set `profile.api_mode` to match the default your provider ships — it acts as a hint. User URL overrides still win.
## Auth types
| `auth_type` | Meaning | Who uses it |
|---|---|---|
| `api_key` | Single env var carries a static API key | Most providers |
| `oauth_device_code` | Device-code OAuth flow | — |
| `oauth_external` | User signs in elsewhere, tokens land in `auth.json` | Anthropic OAuth, MiniMax OAuth, Gemini Cloud Code, Qwen Portal, Nous Portal |
| `copilot` | GitHub Copilot token refresh cycle | `copilot` plugin only |
| `aws_sdk` | AWS SDK credential chain (IAM role, profile, env) | `bedrock` plugin only |
| `external_process` | Auth handled by a subprocess the agent spawns | `copilot-acp` plugin only |
`auth_type` gates which codepaths treat your provider as a "simple api-key provider" — if it's not `api_key`, the PluginManager still records the manifest but Hermes' CLI-level automation (doctor checks, `--provider` flag, setup wizard delegation) may skip over it.
## Discovery timing
Provider discovery is **lazy** — triggered by the first `get_provider_profile()` or `list_providers()` call in the process. In practice this happens early at startup (`auth.py` module load extends `PROVIDER_REGISTRY` eagerly). If you need to verify your plugin loaded, run:
```bash
hermes doctor
```
— a successful `auth_type="api_key"` profile appears under the Provider Connectivity section with a `/models` probe.
For programmatic inspection:
```python
from providers import list_providers
for p in list_providers():
print(p.name, p.base_url, p.api_mode)
```
## Testing your plugin
Point `HERMES_HOME` at a temp directory so you don't pollute your real config:
```bash
export HERMES_HOME=/tmp/hermes-plugin-test
mkdir -p $HERMES_HOME/plugins/model-providers/my-provider
cat > $HERMES_HOME/plugins/model-providers/my-provider/__init__.py <<'EOF'
from providers import register_provider
from providers.base import ProviderProfile
register_provider(ProviderProfile(
name="my-provider",
env_vars=("MY_API_KEY",),
base_url="https://api.my-provider.example.com/v1",
auth_type="api_key",
))
EOF
export MY_API_KEY=your-test-key
hermes -z "hello" --provider my-provider -m some-model
```
## General PluginManager integration
The general `PluginManager` (the thing `hermes plugins` operates on) **sees** model-provider plugins but does not import them — `providers/__init__.py` owns their lifecycle. The manager records the manifest for introspection and categorizes by `kind: model-provider`. When you drop an unlabeled user plugin into `$HERMES_HOME/plugins/` that happens to call `register_provider` with a `ProviderProfile`, the manager auto-coerces it to `kind: model-provider` via a source-text heuristic — so the plugin still routes correctly even without `plugin.yaml`.
## Distribute via pip
Like any Hermes plugin, model providers can ship as a pip package. Add an entry point to your `pyproject.toml`:
```toml
[project.entry-points."hermes.plugins"]
acme-inference = "acme_hermes_plugin:register"
```
…where `acme_hermes_plugin:register` is a function that calls `register_provider(profile)`. The general PluginManager picks up entry-point plugins during `discover_and_load()`. For `kind: model-provider` pip plugins, you still need to declare the kind in your manifest (or rely on the source-text heuristic).
See [Building a Hermes Plugin](/docs/guides/build-a-hermes-plugin#distribute-via-pip) for the full entry-points setup.
## Related pages
- [Provider Runtime](/docs/developer-guide/provider-runtime) — resolution precedence + where each layer reads the profile
- [Adding Providers](/docs/developer-guide/adding-providers) — end-to-end checklist for new inference backends (covers both the fast plugin path and the full CLI/auth integration)
- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin)
- [Context Engine Plugins](/docs/developer-guide/context-engine-plugin)
- [Building a Hermes Plugin](/docs/guides/build-a-hermes-plugin) — general plugin authoring

View File

@@ -20,8 +20,12 @@ Primary implementation:
- `hermes_cli/auth.py` — provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py` — auxiliary model routing
- `providers/` — ABC + registry entry points (`ProviderProfile`, `register_provider`, `get_provider_profile`, `list_providers`)
- `plugins/model-providers/<name>/` — per-provider plugins (bundled) that declare `api_mode`, `base_url`, `env_vars`, `fallback_models` and register themselves into the registry on first access. User plugins at `$HERMES_HOME/plugins/model-providers/<name>/` override bundled ones of the same name.
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
`get_provider_profile()` in `providers/` returns a `ProviderProfile` for a given provider id. `runtime_provider.py` calls this at resolution time to get the canonical `base_url`, `env_vars` priority list, `api_mode`, and `fallback_models` without needing to duplicate that data in multiple files. Adding a new plugin under `plugins/model-providers/<your-provider>/` (or `$HERMES_HOME/plugins/model-providers/<your-provider>/`) that calls `register_provider()` is enough for `runtime_provider.py` to pick it up — no branch needed in the resolver itself.
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) and the [Model Provider Plugin guide](./model-provider-plugin.md) alongside this page.
## Resolution precedence

View File

@@ -9,6 +9,28 @@ description: "Step-by-step guide to building a complete Hermes plugin with tools
This guide walks through building a complete Hermes plugin from scratch. By the end you'll have a working plugin with multiple tools, lifecycle hooks, shipped data files, and a bundled skill — everything the plugin system supports.
:::info Not sure which guide you need?
Hermes has several distinct pluggable interfaces — some use Python `register_*` APIs, others are config-driven or drop-in directories. Use this map first:
| If you want to add… | Read |
|---|---|
| Custom tools, hooks, slash commands, skills, or CLI subcommands | **This guide** (the general plugin surface) |
| An **LLM / inference backend** (new provider) | [Model Provider Plugins](/docs/developer-guide/model-provider-plugin) |
| A **gateway channel** (Discord/Telegram/IRC/Teams/etc.) | [Adding Platform Adapters](/docs/developer-guide/adding-platform-adapters) |
| A **memory backend** (Honcho/Mem0/Supermemory/etc.) | [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) |
| A **context-compression engine** | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| An **image-generation backend** | See bundled examples in `plugins/image_gen/openai/` and `plugins/image_gen/xai/` |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, voice cloning, …) | [TTS custom command providers](/docs/user-guide/features/tts#custom-command-providers) — config-driven, no Python needed |
| An **STT backend** (custom whisper / ASR CLI) | [Voice Message Transcription](/docs/user-guide/features/tts#voice-message-transcription-stt) — set `HERMES_LOCAL_STT_COMMAND` to a shell template |
| **External tools via MCP** (filesystem, GitHub, Linear, any MCP server) | [MCP](/docs/user-guide/features/mcp) — declare `mcp_servers.<name>` in `config.yaml` |
| **Gateway event hooks** (fire on startup, session events, commands) | [Event Hooks](/docs/user-guide/features/hooks#gateway-event-hooks) — drop `HOOK.yaml` + `handler.py` into `~/.hermes/hooks/<name>/` |
| **Shell hooks** (run a shell command on events) | [Shell Hooks](/docs/user-guide/features/hooks#shell-hooks) — declare under `hooks:` in `config.yaml` |
| **Additional skill sources** (custom GitHub repos, private skill indexes) | [Skills](/docs/user-guide/features/skills) — `hermes skills tap add <repo>` |
| A first-class **core** inference provider (not a plugin) | [Adding Providers](/docs/developer-guide/adding-providers) |
See the full [Pluggable interfaces table](/docs/user-guide/features/plugins#pluggable-interfaces--where-to-go-for-each) for a consolidated view of every extension surface including config-driven (TTS, STT, MCP, shell hooks) and drop-in directory (gateway hooks) styles.
:::
## What you're building
A **calculator** plugin with two tools:
@@ -629,12 +651,267 @@ def register(ctx):
```
:::tip
This guide covers **general plugins** (tools, hooks, slash commands, CLI commands). For specialized plugin types, see:
- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — cross-session knowledge backends
- [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) — alternative context management strategies
This guide covers **general plugins** (tools, hooks, slash commands, CLI commands). The sections below sketch the authoring pattern for each specialized plugin type; each links to its full guide for field reference and examples.
:::
### Distribute via pip
## Specialized plugin types
Hermes has five specialized plugin types beyond the general surface. Each ships as a directory under `plugins/<category>/<name>/` (bundled) or `~/.hermes/plugins/<category>/<name>/` (user). The contract differs by category — pick the one you need, then read its full guide.
### Model provider plugins — add an LLM backend
Drop a profile into `plugins/model-providers/<name>/`:
```python
# plugins/model-providers/acme/__init__.py
from providers import register_provider
from providers.base import ProviderProfile
register_provider(ProviderProfile(
name="acme",
aliases=("acme-inference",),
display_name="Acme Inference",
env_vars=("ACME_API_KEY", "ACME_BASE_URL"),
base_url="https://api.acme.example.com/v1",
auth_type="api_key",
default_aux_model="acme-small-fast",
fallback_models=("acme-large-v3", "acme-medium-v3"),
))
```
```yaml
# plugins/model-providers/acme/plugin.yaml
name: acme-provider
kind: model-provider
version: 1.0.0
description: Acme Inference — OpenAI-compatible direct API
```
Lazy-discovered the first time anything calls `get_provider_profile()` or `list_providers()``auth.py`, `config.py`, `doctor.py`, `models.py`, `runtime_provider.py`, and the chat_completions transport auto-wire to it. User plugins override bundled ones by name.
**Full guide:** [Model Provider Plugins](/docs/developer-guide/model-provider-plugin) — field reference, overridable hooks (`prepare_messages`, `build_extra_body`, `build_api_kwargs_extras`, `fetch_models`), api_mode selection, auth types, testing.
### Platform plugins — add a gateway channel
Drop an adapter into `plugins/platforms/<name>/`:
```python
# plugins/platforms/myplatform/adapter.py
from gateway.platforms.base import BasePlatformAdapter
class MyPlatformAdapter(BasePlatformAdapter):
async def connect(self): ...
async def send(self, chat_id, text): ...
async def disconnect(self): ...
def check_requirements():
import os
return bool(os.environ.get("MYPLATFORM_TOKEN"))
def register(ctx):
ctx.register_platform(
name="myplatform",
label="MyPlatform",
adapter_factory=lambda cfg: MyPlatformAdapter(cfg),
check_fn=check_requirements,
required_env=["MYPLATFORM_TOKEN"],
emoji="💬",
platform_hint="You are chatting via MyPlatform. Keep responses concise.",
)
```
```yaml
# plugins/platforms/myplatform/plugin.yaml
name: myplatform-platform
kind: platform
version: 1.0.0
description: MyPlatform gateway adapter
requires_env: [MYPLATFORM_TOKEN]
```
**Full guide:** [Adding Platform Adapters](/docs/developer-guide/adding-platform-adapters) — complete `BasePlatformAdapter` contract, message routing, auth gating, setup wizard integration. Look at `plugins/platforms/irc/` for a stdlib-only working example.
### Memory provider plugins — add a cross-session knowledge backend
Drop an implementation of `MemoryProvider` into `plugins/memory/<name>/`:
```python
# plugins/memory/my-memory/__init__.py
from agent.memory_provider import MemoryProvider
class MyMemoryProvider(MemoryProvider):
@property
def name(self) -> str:
return "my-memory"
def is_available(self) -> bool:
import os
return bool(os.environ.get("MY_MEMORY_API_KEY"))
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = session_id
def sync_turn(self, user_message, assistant_response, **kwargs) -> None:
...
def prefetch(self, query: str, **kwargs) -> str | None:
...
def register(ctx):
ctx.register_memory_provider(MyMemoryProvider())
```
Memory providers are single-select — only one is active at a time, chosen via `memory.provider` in `config.yaml`.
**Full guide:** [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — full `MemoryProvider` ABC, threading contract, profile isolation, CLI command registration via `cli.py`.
### Context engine plugins — replace the context compressor
```python
# plugins/context_engine/my-engine/__init__.py
from agent.context_engine import ContextEngine
class MyContextEngine(ContextEngine):
@property
def name(self) -> str:
return "my-engine"
def should_compress(self, messages, model) -> bool: ...
def compress(self, messages, model) -> list[dict]: ...
def register(ctx):
ctx.register_context_engine(MyContextEngine())
```
Context engines are single-select — chosen via `context.engine` in `config.yaml`.
**Full guide:** [Context Engine Plugins](/docs/developer-guide/context-engine-plugin).
### Image-generation backends
Drop a provider into `plugins/image_gen/<name>/`:
```python
# plugins/image_gen/my-imggen/__init__.py
from agent.image_gen_provider import ImageGenProvider
class MyImageGenProvider(ImageGenProvider):
@property
def name(self) -> str:
return "my-imggen"
def is_available(self) -> bool: ...
def generate(self, prompt: str, **kwargs) -> str: ... # returns image path
def register(ctx):
ctx.register_image_gen_provider(MyImageGenProvider())
```
```yaml
# plugins/image_gen/my-imggen/plugin.yaml
name: my-imggen
kind: backend
version: 1.0.0
description: Custom image generation backend
```
**Reference examples:** `plugins/image_gen/openai/` (DALL-E / GPT-Image via OpenAI SDK), `plugins/image_gen/openai-codex/`, `plugins/image_gen/xai/` (Grok image gen).
## Non-Python extension surfaces
Hermes also accepts extensions that aren't Python plugins at all. These are shown in the [Pluggable interfaces table](/docs/user-guide/features/plugins#pluggable-interfaces--where-to-go-for-each); the sections below sketch each authoring style briefly.
### MCP servers — register external tools
Model Context Protocol (MCP) servers register their own tools into Hermes without any Python plugin. Declare them in `~/.hermes/config.yaml`:
```yaml
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
timeout: 120
linear:
url: "https://mcp.linear.app/sse"
auth:
type: "oauth"
```
Hermes connects to each server at startup, lists its tools, and registers them alongside built-ins. The LLM sees them exactly like any other tool. **Full guide:** [MCP](/docs/user-guide/features/mcp).
### Gateway event hooks — fire on lifecycle events
Drop a manifest + handler into `~/.hermes/hooks/<name>/`:
```yaml
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
name: long-task-alert
description: Send a push notification when a long task finishes
events:
- agent:end
```
```python
# ~/.hermes/hooks/long-task-alert/handler.py
async def handle(event_type: str, context: dict) -> None:
if context.get("duration_seconds", 0) > 120:
# send notification …
pass
```
Events include `gateway:startup`, `session:start`, `session:end`, `session:reset`, `agent:start`, `agent:step`, `agent:end`, and wildcard `command:*`. Errors in hooks are caught and logged — they never block the main pipeline.
**Full guide:** [Gateway Event Hooks](/docs/user-guide/features/hooks#gateway-event-hooks).
### Shell hooks — run a shell command on tool calls
If you just want to run a script when a tool fires (notifications, audit logs, desktop alerts, auto-formatters), use shell hooks in `config.yaml` — no Python required:
```yaml
hooks:
- event: post_tool_call
command: "notify-send 'Tool ran: {tool_name}'"
when:
tools: [terminal, patch, write_file]
```
Supports all the same events as Python plugin hooks (`pre_tool_call`, `post_tool_call`, `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end`, `pre_gateway_dispatch`) plus structured JSON output for `pre_tool_call` blocking decisions.
**Full guide:** [Shell Hooks](/docs/user-guide/features/hooks#shell-hooks).
### Skill sources — add a custom skill registry
If you maintain a private GitHub repo of skills (or want to pull from a community index beyond the built-in sources), add it as a **tap**:
```bash
hermes skills tap add myorg/skills-repo
hermes skills search my-workflow --source myorg/skills-repo
hermes skills install myorg/skills-repo/my-workflow
```
**Full guide:** [Skills Hub](/docs/user-guide/features/skills#skills-hub).
### TTS / STT via command templates
Any CLI that reads/writes audio or text can be plugged in through `config.yaml` — no Python code:
```yaml
tts:
provider: voxcpm
providers:
voxcpm:
type: command
command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
output_format: mp3
voice_compatible: true
```
For STT, point `HERMES_LOCAL_STT_COMMAND` at a shell template. Supported placeholders: `{input_path}`, `{output_path}`, `{format}`, `{voice}`, `{model}`, `{speed}` (TTS); `{input_path}`, `{output_dir}`, `{language}`, `{model}` (STT). Any path-interacting CLI is automatically a plugin.
**Full guides:** [TTS custom command providers](/docs/user-guide/features/tts#custom-command-providers) · [STT](/docs/user-guide/features/tts#voice-message-transcription-stt).
## Distribute via pip
For sharing plugins publicly, add an entry point to your Python package:
@@ -649,7 +926,7 @@ pip install hermes-plugin-calculator
# Plugin auto-discovered on next hermes startup
```
### Distribute for NixOS
## Distribute for NixOS
NixOS users can install your plugin declaratively if you provide a `pyproject.toml` with entry points:

View File

@@ -480,6 +480,44 @@ model:
For on-prem deployments (DGX Spark, local GPU), set `NVIDIA_BASE_URL=http://localhost:8000/v1`. NIM exposes the same OpenAI-compatible chat completions API as build.nvidia.com, so switching between cloud and local is a one-line env-var change.
:::
### GMI Cloud
Open and reasoning models via [GMI Cloud](https://inference.gmi.ai) — OpenAI-compatible API, API key authentication.
```bash
# GMI Cloud
hermes chat --provider gmi --model deepseek-ai/DeepSeek-R1
# Requires: GMI_API_KEY in ~/.hermes/.env
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "gmi"
default: "deepseek-ai/DeepSeek-R1"
```
The base URL can be overridden with `GMI_BASE_URL` (default: `https://api.gmi.ai/v1`).
### StepFun
Step-series models via [StepFun](https://platform.stepfun.com) — OpenAI-compatible API, API key authentication.
```bash
# StepFun
hermes chat --provider stepfun --model step-3-mini
# Requires: STEPFUN_API_KEY in ~/.hermes/.env
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "stepfun"
default: "step-3-mini"
```
The base URL can be overridden with `STEPFUN_BASE_URL` (default: `https://api.stepfun.com/v1`).
### Hugging Face Inference Providers
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.
@@ -1239,7 +1277,7 @@ fallback_model:
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `bedrock`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `gmi`, `alibaba`, `tencent-tokenhub`, `custom`.
Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `bedrock`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `gmi`, `stepfun`, `alibaba`, `tencent-tokenhub`, `custom`.
:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).

View File

@@ -69,6 +69,10 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `DEEPSEEK_BASE_URL` | Custom DeepSeek API base URL |
| `NVIDIA_API_KEY` | NVIDIA NIM API key — Nemotron and open models ([build.nvidia.com](https://build.nvidia.com)) |
| `NVIDIA_BASE_URL` | Override NVIDIA base URL (default: `https://integrate.api.nvidia.com/v1`; set to `http://localhost:8000/v1` for a local NIM endpoint) |
| `GMI_API_KEY` | GMI Cloud API key — open and reasoning models ([inference.gmi.ai](https://inference.gmi.ai)) |
| `GMI_BASE_URL` | Override GMI Cloud base URL (default: `https://api.gmi.ai/v1`) |
| `STEPFUN_API_KEY` | StepFun API key — Step-series models ([platform.stepfun.com](https://platform.stepfun.com)) |
| `STEPFUN_BASE_URL` | Override StepFun base URL (default: `https://api.stepfun.com/v1`) |
| `OLLAMA_API_KEY` | Ollama Cloud API key — managed Ollama catalog without local GPU ([ollama.com/settings/keys](https://ollama.com/settings/keys)) |
| `OLLAMA_BASE_URL` | Override Ollama Cloud base URL (default: `https://ollama.com/v1`) |
| `XAI_API_KEY` | xAI (Grok) API key for chat + TTS ([console.x.ai](https://console.x.ai/)) |
@@ -99,7 +103,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| Variable | Description |
|----------|-------------|
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `custom`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `gemini`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth` (browser OAuth login — no API key required; see [MiniMax OAuth guide](../guides/minimax-oauth.md)), `kilocode`, `xiaomi`, `arcee`, `gmi`, `alibaba`, `alibaba-coding-plan` (alias `alibaba_coding`), `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `tencent-tokenhub` (default: `auto`) |
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `custom`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `gemini`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth` (browser OAuth login — no API key required; see [MiniMax OAuth guide](../guides/minimax-oauth.md)), `kilocode`, `xiaomi`, `arcee`, `gmi`, `stepfun`, `alibaba`, `alibaba-coding-plan` (alias `alibaba_coding`), `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `tencent-tokenhub` (default: `auto`) |
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |

View File

@@ -60,6 +60,8 @@ Both `provider` and `model` are **required**. If either is missing, the fallback
| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
| DeepSeek | `deepseek` | `DEEPSEEK_API_KEY` |
| NVIDIA NIM | `nvidia` | `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) |
| GMI Cloud | `gmi` | `GMI_API_KEY` (optional: `GMI_BASE_URL`) |
| StepFun | `stepfun` | `STEPFUN_API_KEY` (optional: `STEPFUN_BASE_URL`) |
| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` |
| Google Gemini (OAuth) | `google-gemini-cli` | `hermes model` (Google OAuth; optional: `HERMES_GEMINI_PROJECT_ID`) |
| Google AI Studio | `gemini` | `GOOGLE_API_KEY` (alias: `GEMINI_API_KEY`) |

View File

@@ -93,6 +93,8 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable
## What plugins can do
Every `ctx.*` API below is available inside a plugin's `register(ctx)` function.
| Capability | How |
|-----------|-----|
| Add tools | `ctx.register_tool(name=..., toolset=..., schema=..., handler=...)` |
@@ -104,6 +106,11 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable
| Bundle skills | `ctx.register_skill(name, path)` — namespaced as `plugin:skill`, loaded via `skill_view("plugin:skill")` |
| Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml — prompted during `hermes plugins install` |
| Distribute via pip | `[project.entry-points."hermes_agent.plugins"]` |
| Register a gateway platform (Discord, Telegram, IRC, …) | `ctx.register_platform(name, label, adapter_factory, check_fn, ...)` — see [Adding Platform Adapters](/docs/developer-guide/adding-platform-adapters) |
| Register an image-generation backend | `ctx.register_image_gen_provider(provider)` — see `plugins/image_gen/openai/` for an example |
| Register a context-compression engine | `ctx.register_context_engine(engine)` — see [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| Register a memory backend | Subclass `MemoryProvider` in `plugins/memory/<name>/__init__.py` — see [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) (uses a separate discovery system) |
| Register an inference backend (LLM provider) | `register_provider(ProviderProfile(...))` in `plugins/model-providers/<name>/__init__.py` — see [Model Provider Plugins](/docs/developer-guide/model-provider-plugin) (uses a separate discovery system) |
## Plugin discovery
@@ -117,6 +124,21 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable
Later sources override earlier ones on name collision, so a user plugin with the same name as a bundled plugin replaces it.
### Plugin sub-categories
Within each source, Hermes also recognizes sub-category directories that route plugins to specialized discovery systems:
| Sub-directory | What it holds | Discovery system |
|---|---|---|
| `plugins/` (root) | General plugins — tools, hooks, slash commands, CLI commands, bundled skills | `PluginManager` (kind: `standalone` or `backend`) |
| `plugins/platforms/<name>/` | Gateway channel adapters (`ctx.register_platform()`) | `PluginManager` (kind: `platform`, one level deeper) |
| `plugins/image_gen/<name>/` | Image-generation backends (`ctx.register_image_gen_provider()`) | `PluginManager` (kind: `backend`, one level deeper) |
| `plugins/memory/<name>/` | Memory providers (subclass `MemoryProvider`) | **Own loader** in `plugins/memory/__init__.py` (kind: `exclusive` — one active at a time) |
| `plugins/context_engine/<name>/` | Context-compression engines (`ctx.register_context_engine()`) | **Own loader** in `plugins/context_engine/__init__.py` (one active at a time) |
| `plugins/model-providers/<name>/` | LLM provider profiles (`register_provider(ProviderProfile(...))`) | **Own loader** in `providers/__init__.py` (lazily scanned on first `get_provider_profile()` call) |
User plugins at `~/.hermes/plugins/model-providers/<name>/` and `~/.hermes/plugins/memory/<name>/` override bundled plugins of the same name — last-writer-wins in `register_provider()` / `register_memory_provider()`. Drop a directory in, and it replaces the built-in without any repo edits.
## Plugins are opt-in
**Every plugin — user-installed, bundled, or pip — is disabled by default.** Discovery finds them (so they show up in `hermes plugins` and `/plugins`), but nothing loads until you add the plugin's name to `plugins.enabled` in `~/.hermes/config.yaml`. This stops anything with hooks or tools from running without your explicit consent.
@@ -163,15 +185,43 @@ Plugins can register callbacks for these lifecycle events. See the **[Event Hook
## Plugin types
Hermes has three kinds of plugins:
Hermes has four kinds of plugins:
| Type | What it does | Selection | Location |
|------|-------------|-----------|----------|
| **General plugins** | Add tools, hooks, slash commands, CLI commands | Multi-select (enable/disable) | `~/.hermes/plugins/` |
| **Memory providers** | Replace or augment built-in memory | Single-select (one active) | `plugins/memory/` |
| **Context engines** | Replace the built-in context compressor | Single-select (one active) | `plugins/context_engine/` |
| **Model providers** | Declare an inference backend (OpenRouter, Anthropic, …) | Multi-register, picked by `--provider` / `config.yaml` | `plugins/model-providers/` |
Memory providers and context engines are **provider plugins** — only one of each type can be active at a time. General plugins can be enabled in any combination.
Memory providers and context engines are **provider plugins** — only one of each type can be active at a time. Model providers are also plugins, but many load simultaneously; the user picks one at a time via `--provider` or `config.yaml`. General plugins can be enabled in any combination.
## Pluggable interfaces — where to go for each
The table above shows the four plugin categories, but within "General plugins" the `PluginContext` exposes several distinct extension points — and Hermes also accepts extensions outside the Python plugin system (config-driven backends, shell-hooked commands, external servers, etc.). Use this table to find the right doc for what you want to build:
| Want to add… | How | Authoring guide |
|---|---|---|
| A **tool** the LLM can call | Python plugin — `ctx.register_tool()` | [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin) · [Adding Tools](/docs/developer-guide/adding-tools) |
| A **lifecycle hook** (pre/post LLM, session start/end, tool filter) | Python plugin — `ctx.register_hook()` | [Hooks reference](/docs/user-guide/features/hooks) · [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin) |
| A **slash command** for the CLI / gateway | Python plugin — `ctx.register_command()` | [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin) · [Extending the CLI](/docs/developer-guide/extending-the-cli) |
| A **subcommand** for `hermes <thing>` | Python plugin — `ctx.register_cli_command()` | [Extending the CLI](/docs/developer-guide/extending-the-cli) |
| A bundled **skill** that your plugin ships | Python plugin — `ctx.register_skill()` | [Creating Skills](/docs/developer-guide/creating-skills) |
| An **inference backend** (LLM provider: OpenAI-compat, Codex, Anthropic-Messages, Bedrock) | Provider plugin — `register_provider(ProviderProfile(...))` in `plugins/model-providers/<name>/` | **[Model Provider Plugins](/docs/developer-guide/model-provider-plugin)** · [Adding Providers](/docs/developer-guide/adding-providers) |
| A **gateway channel** (Discord / Telegram / IRC / Teams / etc.) | Platform plugin — `ctx.register_platform()` in `plugins/platforms/<name>/` | [Adding Platform Adapters](/docs/developer-guide/adding-platform-adapters) |
| A **memory backend** (Honcho, Mem0, Supermemory, …) | Memory plugin — subclass `MemoryProvider` in `plugins/memory/<name>/` | [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) |
| A **context-compression strategy** | Context-engine plugin — `ctx.register_context_engine()` | [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) |
| An **image-generation backend** (DALL·E, SDXL, …) | Backend plugin — `ctx.register_image_gen_provider()` | See bundled examples in `plugins/image_gen/openai/` and `plugins/image_gen/xai/` |
| A **TTS backend** (any CLI — Piper, VoxCPM, Kokoro, xtts, voice-cloning scripts, …) | Config-driven — declare under `tts.providers.<name>` with `type: command` in `config.yaml` | [TTS setup](/docs/user-guide/features/tts#custom-command-providers) |
| An **STT backend** (custom whisper binary, local ASR CLI) | Config-driven — set `HERMES_LOCAL_STT_COMMAND` env var to a shell template | [Voice Message Transcription (STT)](/docs/user-guide/features/tts#voice-message-transcription-stt) |
| **External tools via MCP** (filesystem, GitHub, Linear, Notion, any MCP server) | Config-driven — declare `mcp_servers.<name>` with `command:` / `url:` in `config.yaml`. Hermes auto-discovers the server's tools and registers them alongside built-ins. | [MCP](/docs/user-guide/features/mcp) |
| **Additional skill sources** (custom GitHub repos, private skill indexes) | CLI — `hermes skills tap add <repo>` | [Skills Hub](/docs/user-guide/features/skills#skills-hub) |
| **Gateway event hooks** (fire on `gateway:startup`, `session:start`, `agent:end`, `command:*`) | Drop `HOOK.yaml` + `handler.py` into `~/.hermes/hooks/<name>/` | [Event Hooks](/docs/user-guide/features/hooks#gateway-event-hooks) |
| **Shell hooks** (run a shell command on events — notifications, audit logs, desktop alerts) | Config-driven — declare under `hooks:` in `config.yaml` | [Shell Hooks](/docs/user-guide/features/hooks#shell-hooks) |
:::note
Not everything is a Python plugin. Some extension surfaces intentionally use **config-driven shell commands** (TTS, STT, shell hooks) so any CLI you already have becomes a plugin without writing Python. Others are **external servers** (MCP) the agent connects to and auto-registers tools from. And some are **drop-in directories** (gateway hooks) with their own manifest format. Pick the right surface for the integration style that fits your use case; the authoring guides in the table above each cover placeholders, discovery, and examples.
:::
## NixOS declarative plugins

View File

@@ -209,6 +209,7 @@ const sidebars: SidebarsConfig = {
'developer-guide/adding-platform-adapters',
'developer-guide/memory-provider-plugin',
'developer-guide/context-engine-plugin',
'developer-guide/model-provider-plugin',
'developer-guide/creating-skills',
'developer-guide/extending-the-cli',
],