hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 06:51:16 +08:00

Author	SHA1	Message	Date
Teknium	1e37ddc929	feat(cli): add 'hermes fallback' command to manage fallback providers (#16052 ) Manage the fallback_providers chain from the CLI instead of hand-editing config.yaml. The picker reuses select_provider_and_model() from 'hermes model' — same provider list, same credential prompts, same model picker. hermes fallback [list] Show the current chain (primary + fallbacks) hermes fallback add Run the model picker, append selection to chain hermes fallback remove Pick an entry to delete (arrow-key menu) hermes fallback clear Remove all entries (with confirmation) 'add' snapshots config['model'] before calling the picker, extracts the user's selection from the post-picker state, then restores the primary and appends {provider, model, base_url?, api_mode?} to fallback_providers. Auth store's active_provider is snapshot/restored too so OAuth-provider fallbacks don't silently deactivate the user's primary. Duplicates and self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries are auto-migrated to the list format on first write.	2026-04-26 06:19:04 -07:00
Teknium	83c1c201f6	feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046 ) Instead of a blocking first-run questionnaire, show a one-time hint the first time the user hits each behavior fork: 1. First message while the agent is working — appends a hint to the busy-ack explaining the /busy queue vs /busy interrupt knob, phrased to match the mode that was just applied (don't tell a queue-mode user to switch to queue). 2. First tool that runs for >= 30s in the noisiest progress mode (tool_progress: all) — prints a hint about /verbose to cycle display modes (all -> new -> off -> verbose). Gated on /verbose actually being usable on the surface: always shown on CLI; on gateway only shown when display.tool_progress_command is enabled. Each hint is latched in config.yaml under onboarding.seen.<flag>, so it fires exactly once per install across CLI, gateway, and cron, then never again. Users can wipe the section to re-see hints. New: - agent/onboarding.py — is_seen / mark_seen / hint strings, shared by both CLI and gateway. - onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in load_cli_config defaults (cli.py). No _config_version bump — deep merge handles new keys. Wired: - gateway/run.py: _handle_active_session_busy_message appends the hint after building the ack. progress_callback tracks tool.completed duration and queues the tool-progress hint into the progress bubble. - cli.py: CLI input loop appends the busy-input hint on the first busy Enter; _on_tool_progress appends the tool-progress hint on the first >=30s tool completion. In-memory CLI_CONFIG is also updated so subsequent fires in the same process are suppressed immediately. All writes go through atomic_yaml_write and are wrapped in try/except so onboarding can never break the input/busy-ack paths.	2026-04-26 06:06:27 -07:00
Teknium	4bda9dcade	fix(gateway): honor voice.auto_tts config in auto-TTS gate (#16007 ) (#16039 ) The base adapter's auto-TTS path fired on any voice message unless the chat had explicitly run /voice off — it never read voice.auto_tts from config.yaml, so users who set auto_tts: false still got audio replies. Gate the base adapter on a three-layer decision instead: 1. chat in _auto_tts_enabled_chats (explicit /voice on\|tts) → fire 2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress 3. else → voice.auto_tts global default Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default and mirrors /voice on\|tts chats into _auto_tts_enabled_chats via the existing _sync_voice_mode_state_to_adapter path. /voice off still wins. Closes #16007.	2026-04-26 05:52:05 -07:00
Teknium	67dcace412	docs(config): show options in comments for display settings (#16038 ) Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim (including comments) to ~/.hermes/config.yaml. But several display settings had thin comments that didn't enumerate the valid options, so users couldn't tell from reading their config what values each key accepts. - busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms'; note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint - compact, interim_assistant_messages, bell_on_complete, show_reasoning, streaming: add true/false option lines showing effect of each value - skin: refresh the built-in skin list (was missing daylight, warm-lightmode, poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)	2026-04-26 05:51:37 -07:00
Teknium	35c57cc46b	fix(gateway): suppress tool-progress bubbles after interrupt (#16034 ) When the LLM response carries N parallel tool calls, the agent fires N tool.started events back-to-back before its interrupt check runs. A user sending /stop mid-batch would see the '⚡ Interrupting current task' ack followed by a trail of 🔍 web_search bubbles for the remaining events in the batch — making the interrupt feel ignored. progress_callback and the drain loop in send_progress_messages now check agent.is_interrupted (via agent_holder[0], the existing cross-scope handle). Events that arrive after interrupt are dropped at both the queueing and rendering stages. The '⚡ Interrupting' message is sent through a separate adapter path and is unaffected.	2026-04-26 05:47:37 -07:00
Teknium	e8441c4c0f	fix(clipboard): report native/tmux success, keep Ctrl+Shift+C on dashboard Follow-up on #16020 salvage. Three corrections: 1. Truth signal for /copy Before: success was 'OSC 52 sequence was emitted to stdout'. That's false on local Linux inside tmux (emitSequence=false), so /copy kept printing 'clipboard copy failed' to users whose xclip/wl-copy had already succeeded fire-and-forget. Fix: setClipboard() now returns { sequence, success } where success = native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative() returns a boolean telling setClipboard whether a native attempt was made. /copy only shows 'failed' when literally no path was taken. 2. Dashboard keybinding Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste). That swallows SIGINT when a stale selection is present and breaks the xterm/gnome-terminal/konsole/Windows-Terminal convention where Ctrl+C in a terminal emulator is always SIGINT. The real bug was that clipboard writes lost user-gesture through OSC-52 round-trips, which the direct writeText already fixes. Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct writeText in the keydown handler preserves user gesture. term.write Escape replaced with term.clearSelection() (works without relying on TUI input mode). 3. Error toast text Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to debug but not how to fix. Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual escape hatch), mention the debug var second.	2026-04-26 05:46:45 -07:00
Harry Riddle	2511207cb0	chore: revert docs	2026-04-26 05:46:45 -07:00
Harry Riddle	0f3a6f0fb3	fix(clipboard): dashboard Ctrl+C direct copy; TUI honest feedback; HERMES_TUI_FORCE_OSC52 - Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture); send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback. - TUI /copy: copySelection() async; only reports success if OSC52 emitted. - Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection. - Fixes "copied N chars" false-positive when clipboard backend absent. Changes: web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52 ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback	2026-04-26 05:46:45 -07:00
Harry Riddle	a562420383	fix(tui): robust clipboard handling with debug logging and headless detection Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty. Root causes: - Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in headless Docker/SSH they fail or hang. - OSC 52 fallback requires terminal emulator support; when absent, sequence is dropped silently. - Dashboard OSC 52 → Clipboard API path fails due to missing user gesture; errors were silently caught. - User feedback 'copied selection' was shown unconditionally, regardless of success. Solution implemented: - Short-circuit Linux native clipboard probing when no display server is present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and timeouts. - Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to stderr which clipboard path is used, probe results on Linux, and whether OSC 52 was emitted. Greatly improves diagnosability. - Improve dashboard clipboard error handling: replace empty catch blocks with console.warn messages for OSC 52 decode/Write failures and direct copy/paste errors. Makes browser permission/user-gesture failures visible in DevTools. - Add comprehensive clipboard troubleshooting documentation to README and AGENTS, covering OSC 52 verification, tmux config, Docker/headless constraints, env vars, dashboard caveats, and fallback strategies. Technical details: - in ui-tui/packages/hermes-ink/src/ink/termio/osc.ts: - Early return on Linux if both DISPLAY and WAYLAND_DISPLAY unset. - Refactor probe sequence to async with 500ms timeout, caching result; subsequent copies use cached tool immediately. - Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1. - in ink.tsx: log when OSC 52 not emitted (native or tmux path in use) in debug mode. - : OSC 52 handler and Ctrl+Shift+C handler now log warnings to console on Clipboard API rejection with error message. - Documentation: new 'Clipboard Troubleshooting' section in README; new 'Clipboard environment variables and pitfalls' subsection in AGENTS.md (Known Pitfalls). Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests unaffected. No breaking changes. Files changed: - ui-tui/packages/hermes-ink/src/ink/termio/osc.ts - ui-tui/packages/hermes-ink/src/ink/ink.tsx - web/src/pages/ChatPage.tsx - README.md - AGENTS.md - CHANGELOG.md (new)	2026-04-26 05:46:45 -07:00
Teknium	855366909f	feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033 ) OpenRouter and Nous Portal curated picker lists now resolve via a JSON manifest served by the docs site, falling back to the in-repo snapshot when unreachable. Lets us update model lists without shipping a release. Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json (source at website/static/api/model-catalog.json; auto-deploys via the existing deploy-site.yml GitHub Pages pipeline on every merge to main). Schema (v1) carries id + optional description + free-form metadata at manifest, provider, and model levels. Pricing and context length stay live-fetched via existing machinery (/v1/models endpoints, models.dev). Config (new model_catalog section, default enabled): model_catalog.url master manifest URL model_catalog.ttl_hours disk cache TTL (default 24h) model_catalog.providers.<name>.url optional per-provider override Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last resort. Never raises to callers; at worst returns the bundled list. Changes: - website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous) - scripts/build_model_catalog.py regenerator from in-repo lists - hermes_cli/model_catalog.py fetch + validate + cache module - hermes_cli/models.py fetch_openrouter_models() + new get_curated_nous_model_ids() - hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper - hermes_cli/config.py model_catalog defaults - website/docs/reference/model-catalog.md + sidebars.ts - tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch success/failure, accessors, disabled, overrides, integration)	2026-04-26 05:46:43 -07:00
Teknium	d09ab8ff13	fix(mcp-oauth): preserve server_url path for protected-resource validation (#16031 ) Stop pre-stripping the path from the configured MCP server URL before constructing OAuthClientProvider. The MCP SDK strips the path itself via OAuthContext.get_authorization_base_url() for authorization-server discovery, but uses the full server_url through resource_url_from_server_url() + check_resource_allowed() to validate against the server's RFC 9728 Protected Resource Metadata. For servers whose PRM advertises a path-scoped resource (e.g. Notion's https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to the origin, so check_resource_allowed() saw requested='/' vs configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP (and any other path-scoped resource). Closes #16015.	2026-04-26 05:43:54 -07:00
Teknium	438db0c7b0	fix(cli): /model picker honors provider-specific context caps (#16030 ) `_apply_model_switch_result` (the interactive `/model` picker's confirmation path) printed `ModelInfo.context_window` straight from models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker showed 1M while the runtime (compressor, gateway `/model`, typed `/model <name>`) correctly used 272K — the classic 'sometimes 1M, sometimes 272K' mismatch on a single model. Both display paths now go through `resolve_display_context_length()`, matching the fix that `_handle_model_switch` received earlier. Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS (`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the 272K Codex cap is already enforced via the Codex-OAuth branch, so the fallback now reflects what every non-Codex probe-miss should see. Tests: adds `test_apply_model_switch_result_context.py` with three scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls back to ModelInfo). Updates the existing non-Codex fallback test to assert 1.05M (the correct value). ## Validation \| path \| before \| after \| \|-------------------------------\|-----------\|-----------\| \| picker -> gpt-5.5 on Codex \| 1,050,000 \| 272,000 \| \| picker -> gpt-5.5 on OpenAI \| 1,050,000 \| 1,050,000 \| \| picker -> gpt-5.5 on OpenRouter \| 1,050,000 \| 1,050,000 \| \| typed /model gpt-5.5 on Codex \| 272,000 \| 272,000 \|	2026-04-26 05:43:31 -07:00
zkl	2ccdadcca6	fix(deepseek): bump V4 family context window to 1M tokens #14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native provider but the context-window lookup still falls back to the existing "deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context window, so any caller relying on get_model_context_length() for pre-flight token budgeting (compression, context warnings) under-counts by ~8x. Add explicit lowercase entries for the four DeepSeek model ids that ship 1M context: - deepseek-v4-pro - deepseek-v4-flash - deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking) - deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking) Longest-key-first substring matching means these explicit entries also cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter and Nous Portal) without regressing the existing 128K fallback for older / unknown DeepSeek model ids on custom endpoints. Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing	2026-04-26 05:32:54 -07:00
Teknium	76042f5867	feat(review): class-first skill review prompt (#16026 ) The background skill-review prompt (spawned after N user turns) now instructs the reviewer to SURVEY existing skills first, identify the CLASS of task, and PREFER updating/generalizing an existing skill over creating a new narrow one. This reduces near-duplicate skill accumulation at the source. Catches the common failure mode where repeated tasks of the same class each spawn their own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead of a single class-level skill ("desktop-app-build-troubleshooting"). Applied to both _SKILL_REVIEW_PROMPT and the Skills half of _COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged. Groundwork for the Curator feature (issue #7816) — the creation-side fix. Curator handles the retirement/consolidation side in a follow-up PR. Tests assert the behavioral instructions are present (survey, class, update- over-create, overlap-flagging, opt-out clause) rather than snapshotting the full prompt text.	2026-04-26 05:17:10 -07:00
Teknium	192e7eb21f	fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898 ) Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi, MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of those models recorded a cross-session file breaker that blocked EVERY model on Nous for the cooldown window -- even though the caller's own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro capacity error, restarted, switched to Kimi 2.6, and still got 'Nous Portal rate limit active -- resets in 46m 53s'. Nous already emits the full x-ratelimit-* header suite on every response (captured by rate_limit_tracker into agent._rate_limit_state). We now gate the breaker on that data: trip it only when either the 429's own headers or the last-known-good state show a bucket with remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s (healthy buckets everywhere, but upstream out of capacity) fall through to normal retry/fallback and the breaker is never written. Note: the in-memory 'restart TUI/gateway to clear' workaround circulated in Discord does NOT work -- the breaker is file-backed at ~/.hermes/rate_limits/nous.json. The workaround for users still affected by a bad state file is to delete it. Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).	2026-04-26 04:53:42 -07:00
Brooklyn Nicholson	d91e24547c	fix(tui): attach inline diffs to tool timeline	2026-04-26 05:17:26 -05:00
Brooklyn Nicholson	05dc2eec36	fix(tui): tighten timeline detail spacing	2026-04-26 05:13:21 -05:00
Brooklyn Nicholson	2e6c3c7d23	fix(tui): address follow-up review nits	2026-04-26 05:06:57 -05:00
Brooklyn Nicholson	a0aebad673	fix(tui): anchor details to stream timeline	2026-04-26 04:59:44 -05:00
Brooklyn Nicholson	7143d22a83	fix(tui): keep queued sends in queue UI	2026-04-26 04:49:56 -05:00
Brooklyn Nicholson	5ac4088856	fix(tui): keep live progress visible while scrolling	2026-04-26 04:46:44 -05:00
Brooklyn Nicholson	e16e196c7e	fix(tui): keep selection drag responsive	2026-04-26 04:44:19 -05:00
Brooklyn Nicholson	7d68ea9501	fix(tui): stream legacy thinking deltas visibly	2026-04-26 04:42:04 -05:00
Brooklyn Nicholson	bc17310442	fix(tui): smooth selection drag behavior	2026-04-26 04:39:25 -05:00
Brooklyn Nicholson	8f0fa0836f	fix(tui): preserve composer width on narrow panes	2026-04-26 04:35:54 -05:00
Brooklyn Nicholson	bbd950efcf	fix(tui): keep stream cadence responsive while typing	2026-04-26 04:32:55 -05:00
Brooklyn Nicholson	381121025e	fix(tui): address review feedback	2026-04-26 04:28:55 -05:00
Brooklyn Nicholson	355e0ae960	fix(tui): keep streaming progress stable during interaction	2026-04-26 04:23:57 -05:00
Brooklyn Nicholson	1c964ed43f	fix(tui): rely on native cursor for input	2026-04-26 03:47:05 -05:00
Brooklyn Nicholson	cd7c5e5606	perf(tui): defer local input render during echo	2026-04-26 03:38:56 -05:00
Brooklyn Nicholson	ee7ef33b02	fix(tui): queue busy submissions gracefully	2026-04-26 03:27:45 -05:00
Brooklyn Nicholson	5cd41d2b3b	perf(tui): widen native input echo	2026-04-26 03:22:50 -05:00
Brooklyn Nicholson	9bb3bc422d	perf(tui): optimistically echo simple input	2026-04-26 03:07:15 -05:00
Brooklyn Nicholson	19d75d1797	perf(tui): coalesce composer echo updates	2026-04-26 02:21:22 -05:00
Brooklyn Nicholson	458ce792d2	fix(tui): persist model switches by default	2026-04-26 02:15:10 -05:00
Brooklyn Nicholson	14fcff60c9	style(tui): apply formatter	2026-04-26 01:48:10 -05:00
Brooklyn Nicholson	db4e4acca0	perf(tui): stabilize long-session scrolling	2026-04-26 01:47:05 -05:00
Teknium	59b56d445c	feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429 ) Plugin hooks fired after a tool dispatch now receive an integer duration_ms kwarg measuring how long the tool's registry.dispatch() call took (time.monotonic() before/after). Inspired by Claude Code 2.1.119 which added the same field to PostToolUse hook inputs. Wire points: - model_tools.py: measure dispatch latency, pass duration_ms to invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...) - hermes_cli/hooks.py: include duration_ms in the synthetic payload used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook authors see the same shape at development time as runtime - shell hooks (agent/shell_hooks.py): no code change needed; _serialize_payload already surfaces non-top-level kwargs under payload['extra'], so duration_ms lands at extra.duration_ms for shell-hook scripts Plugin authors can now build latency dashboards, per-tool SLO alerts, and regression canaries without having to wrap every tool manually. Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep, hook callback observes duration_ms=50 (int). Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)	2026-04-25 22:13:12 -07:00
Teknium	eb28145f36	feat(approval): hardline blocklist for unrecoverable commands (#15878 ) Adds a floor below --yolo: a tiny set of commands so catastrophic they should never run via the agent, regardless of --yolo, gateway /yolo, approvals.mode=off, or cron approve mode. Opting into yolo is trusting the agent with your files and services — not trusting it to wipe the disk or power the box off. The list is deliberately small (12 patterns), covering only unrecoverable ops: - rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin, /lib, ~, $HOME - mkfs (any variant) - dd + redirection to raw block devices (/dev/sd, /dev/nvme, etc.) - fork bomb - kill -1 / kill -9 -1 - shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6, systemctl poweroff/reboot/halt/kexec Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x, chmod -R 777, curl \| sh) stay in DANGEROUS_PATTERNS where yolo can still pass them through — that's what yolo is for. Container backends (docker/singularity/modal/daytona) continue to bypass both hardline and dangerous checks, since nothing they do can touch the host. Inspired by Mercury Agent's permission-hardened blocklist.	2026-04-25 22:07:12 -07:00
Teknium	a55de5bcd0	feat(setup): auto-reconfigure on existing installs (#15879 ) Bare `hermes setup` on a returning user now drops straight into the full reconfigure wizard — every prompt shows the current value as its default, press Enter to keep or type a new value to change it. The returning-user menu is gone. Behavior: - First-time user: first-time wizard (unchanged) - Returning user, bare command: full reconfigure wizard (new default) - Returning user, `--quick`: only prompt for missing/unset items - Returning user, one section: `hermes setup model\|terminal\|gateway\|tools\|agent` - `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default) The section functions already used current values as prompt defaults — this change just removes the extra click to get to them. The 'Quick Setup - configure missing items only' menu option is now exposed as the explicit `--quick` flag; it's the narrow case of filling in missing config (e.g. after a partial OpenClaw migration or when a required API key got cleared). Inspired by Mercury Agent's `mercury doctor` UX. Also removes: - RETURNING_USER_MENU_SECTION_KEYS (orphaned constant) - Two returning-user menu tests in test_setup_noninteractive.py (guarding behavior that no longer exists — covered by test_setup_reconfigure.py instead)	2026-04-25 22:02:02 -07:00
brooklyn!	cec0af02ad	Merge pull request #15870 from NousResearch/bb/fix-skills-search fix(tui): restore skills search RPC	2026-04-25 22:13:28 -05:00
Brooklyn Nicholson	91a7a0acbe	fix(tui): restore skills search RPC	2026-04-25 22:11:52 -05:00
Teknium	7c50ed707c	docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP - New website/docs/guides/azure-foundry.md covering both OpenAI-style and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x routing, /v1 stripping, api-version query forwarding, and the provider: anthropic + Azure URL alternative setup. - environment-variables.md picks up AZURE_FOUNDRY_API_KEY, AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY. - cli-commands.md includes azure-foundry in the provider choices list. - configuration.md lists azure-foundry among auxiliary-task providers. - sidebars.ts wires the new guide into the Guides section. - scripts/release.py AUTHOR_MAP entries for TechPrototyper, HangGlidersRule (noreply), and pein892 so the contributor-attribution CI check does not reject the salvage.	2026-04-25 18:48:43 -07:00
Teknium	731e1ef8cb	feat(azure-foundry): auto-detect transport, models, context length The azure-foundry wizard now probes the endpoint before asking the user to pick anything by hand: 1. URL path sniff — endpoints ending in /anthropic are Azure Foundry Claude routes and skip to anthropic_messages. 2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped model list, we switch to chat_completions and prefill the picker with the returned deployment/model IDs. 3. Anthropic Messages probe — fallback for endpoints that don't expose /models but do speak the Anthropic Messages shape. 4. Manual fallback — private endpoints / custom routes still work; the user picks API mode + types a deployment name. Context length for the selected model is resolved through the existing agent.model_metadata.get_model_context_length chain (models.dev, provider metadata, hardcoded family fallbacks) and stored in model.context_length when a non-default value is found. Also refactors runtime_provider so Azure Foundry resolution is reused between the explicit-credentials path and the default top-level path — previously the /v1 strip for Anthropic-style Azure only ran when the caller passed explicit_* args, which meant config-driven sessions hit a double-/v1 URL. New module hermes_cli/azure_detect.py with 19 unit tests covering: - path sniff, model ID extraction, probe fallbacks - HTTP error handling (URLError, HTTPError) - context-length lookup passthrough - DEFAULT_FALLBACK_CONTEXT rejection New runtime tests cover: - OpenAI-style Azure Foundry - Anthropic-style Azure Foundry with /v1 stripping - Missing base_url / API key raising AuthError Rationale: Microsoft confirms there's no pure-API-key endpoint to list Azure deployments (that requires ARM management auth). The v1 Azure OpenAI endpoint does expose /models with the resource's available model catalog, which is good enough for picker prefill in the common case. Users on private/gated endpoints fall through to manual entry.	2026-04-25 18:48:43 -07:00
akhater	ac57114284	fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint Azure OpenAI exposes an OpenAI-compatible endpoint at `{resource}.openai.azure.com/openai/v1` that accepts the standard `openai` Python client. Two issues prevented gpt-5.x models from working: 1. `_max_tokens_param()` only sent `max_completion_tokens` for `api.openai.com` URLs. Azure also requires `max_completion_tokens` for gpt-5.x models. 2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x to Responses API. Azure does NOT support the Responses API — it serves gpt-5.x on the regular `/chat/completions` path, causing a 404. Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs. - `_max_tokens_param()` now returns `max_completion_tokens` for Azure. - The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on `chat_completions` where Azure actually serves it. - The fallback-provider api_mode picker also recognises Azure and stays on chat_completions. - Tests cover max_tokens routing, api_mode behaviour, and URL detection. gpt-4.x models on Azure are unaffected (already used chat_completions + max_tokens, which Azure accepts for those models). Salvage of PR #10086 — rewritten against current main where the codex_responses upgrade gate gained copilot-acp / explicit-api_mode exclusions.	2026-04-25 18:48:43 -07:00
pein892	24b4b24d79	fix: preserve URL query params for Azure OpenAI and custom endpoints Azure OpenAI requires an `api-version` query parameter on every request. When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`), the OpenAI SDK silently drops it during URL construction, causing 404 errors. Extract query params from base_url and pass them via `default_query` so the SDK appends them to every request. This is a generic solution that works for any custom endpoint requiring query parameters, not just Azure. No-op for URLs without query params — fully backward compatible.	2026-04-25 18:48:43 -07:00
HangGlidersRule	c15064fa37	fix: pass api-version as default_query param, not in base_url — SDK was producing malformed URLs like /anthropic?api-version=.../v1/messages	2026-04-25 18:48:43 -07:00
HangGlidersRule	7bfa9442de	fix: skip OAuth token refresh for Azure Anthropic endpoints — prevents ~/.claude/.credentials.json from overwriting Azure key mid-session	2026-04-25 18:48:43 -07:00
HangGlidersRule	d8e4c7214e	fix: Azure Anthropic short-circuit in resolve_runtime_provider — bypass custom runtime when provider=anthropic + azure.com URL	2026-04-25 18:48:43 -07:00
HangGlidersRule	6ef3a47ce5	fix: use Azure API key directly for Azure endpoints, bypass OAuth token priority chain	2026-04-25 18:48:43 -07:00

... 3 4 5 6 7 ...

6194 Commits