When a Codex Responses API response contains only reasoning items
(encrypted thinking state) with no message text or tool calls, the
_normalize_codex_response method was setting finish_reason='stop'.
This sent the response into the empty-content retry loop, which
burned 3 retries and then failed — exactly the pattern Nester
reported in Discord.
Two fixes:
1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw
non-empty but no final_text) now get finish_reason='incomplete', routing
them to the Codex continuation path instead of the retry loop.
2. Incomplete handling: also checks for codex_reasoning_items when deciding
whether to preserve an interim message, so encrypted reasoning state is
not silently dropped when there is no visible reasoning text.
Adds 4 regression tests covering:
- Unit: reasoning-only → incomplete, reasoning+content → stop
- E2E: reasoning-only → continuation → final answer succeeds
- E2E: encrypted reasoning items preserved in interim messages
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add HERMES_API_MODE env var and model.api_mode config field to let
custom OpenAI-compatible endpoints opt into codex_responses mode
without requiring the OpenAI Codex OAuth provider path.
- _get_configured_api_mode() reads HERMES_API_MODE env (precedence)
then model.api_mode from config.yaml; validates against whitelist
- Applied in both _resolve_openrouter_runtime() and
_resolve_named_custom_runtime() (original PR only covered openrouter)
- Fix _dump_api_request_debug() to show /responses URL when in
codex_responses mode instead of always showing /chat/completions
- Tests for config override, env override, invalid values, named
custom providers, and debug dump URL for both API modes
Inspired by PR #1041 by @mxyhi.
Co-authored-by: mxyhi <mxyhi@users.noreply.github.com>
Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the
Codex Responses API request kwargs — matching what the official Codex
CLI sends.
- tool_choice: 'auto' — enables the model to proactively call tools.
Without this, the model may default to not using tools, which explains
reports of the agent claiming it lacks shell access (#747).
- parallel_tool_calls: True — allows the model to issue multiple tool
calls in a single turn for efficiency.
- prompt_cache_key: session_id — enables server-side prompt caching
across turns in the same session, reducing latency and cost.
Refs #747
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.