Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-05-01 16:31:56 +08:00
Add MCP sampling/createMessage capability allowing MCP servers to request
LLM completions through the Hermes agent during tool execution. Enables
agent-in-the-loop workflows (data analysis, content generation, decision
making) where servers can leverage the LLM as needed.
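At its core, serving such a request means running the agent's (blocking) LLM client off the event loop and bounding it with a timeout. A minimal sketch under assumed names (serve_sampling_request and call_llm are illustrative placeholders, not the actual Hermes API):

```python
import asyncio

async def serve_sampling_request(call_llm, prompt: str, timeout: float = 30.0) -> str:
    """Serve one sampling request without blocking the event loop.

    call_llm stands in for whatever synchronous LLM client the agent uses;
    it is offloaded to a worker thread and cancelled if the timeout elapses.
    """
    return await asyncio.wait_for(asyncio.to_thread(call_llm, prompt), timeout)
```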
Implemented as a SamplingHandler class (one instance per server, no globals):
- Text-only sampling: server asks LLM a question, gets text back
- Tool use in sampling: server provides tools, LLM can use them in a
  multi-turn loop bounded by a configurable max_tool_rounds limit
- Rate limiting (sliding window, configurable max_rpm per server)
- Model resolution (config override > server hint > default)
- Model whitelist (allowed_models per server)
- Token cap (max_tokens_cap per server)
- LLM timeout with asyncio.wait_for
- Credential stripping on responses
- Per-server audit metrics (requests, errors, tokens_used, tool_use_count)
- Configurable log_level for audit verbosity
- Non-blocking: LLM calls offloaded via asyncio.to_thread()
- Proper MCP SDK types: CreateMessageResult for text responses,
CreateMessageResultWithTools + ToolUseContent for tool use responses
- SamplingCapability with SamplingToolsCapability advertised to servers
- Backward compatible: silently disabled if the MCP SDK lacks sampling types
Config (all optional, zero breaking changes):

  mcp_servers:
    my_server:
      sampling:
        enabled: true  # default
        model: 'gemini-3-flash'
        max_tokens_cap: 4096
        timeout: 30
        max_rpm: 10
        allowed_models: []
        max_tool_rounds: 5
        log_level: 'info'
Based on the sampling concept from PR #366 by eren-karakus0. Restructured
as a class-based design, fixed critical bugs (wrong return types for tool
use, missing capability advertisement, broken Pydantic validation), and
added tests using real MCP SDK types.
50 new tests, full suite passes (2600 tests).