mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 23:11:37 +08:00
Azure Foundry deploys GPT-5.x, codex-*, and the o1/o3/o4 reasoning models as Responses-API-only: calling /chat/completions against these deployments returns 400 'The requested operation is unsupported.'. This broke any user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment, and kept the default api_mode: chat_completions. Verified in a user debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com with that exact payload, while gpt-4o-pure on the same endpoint worked.

Adds azure_foundry_model_api_mode(model_name), which returns codex_responses when the model name starts with gpt-5, codex, o1, o3, or o4, and otherwise returns None so chat_completions / anthropic_messages stay untouched for gpt-4o, Llama, Claude-via-Anthropic, etc. The resolver (both the direct Azure Foundry path and the pool-entry path) consults it and upgrades api_mode unless the user explicitly picked anthropic_messages. target_model (from a /model mid-session switch) takes precedence over the persisted default, so switching from gpt-4o to gpt-5.3-codex routes correctly before the next request.

Docs: correct the azure-foundry guide, which previously claimed Azure keeps gpt-5.x on chat completions; that was only true for early Azure OpenAI, not for Azure Foundry codex/o-series deployments.

Tests: 14 unit tests for azure_foundry_model_api_mode plus 6 integration tests in TestAzureFoundryResolution covering Bob's exact scenario, the target_model override, the anthropic_messages guard, and o3-mini.