hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 06:51:16 +08:00

Author	SHA1	Message	Date
Erosika	894e0b935b	feat(honcho): explain why when honcho_profile returns an empty card Closed PR #5137 addressed the retrieval path (peer cards via get_card() instead of the session-scoped lookup that returned empty for per-session messaging flows) — that architectural fix is already in main as _fetch_peer_card / _fetch_peer_context. What never got fixed is the user-visible side: honcho_profile returning a flat 'No profile facts available yet.' leaves the model to guess at why. The model then often surfaces it to the user as a cryptic error. Adds a diagnostic hint next to the existing 'result' message, enumerating the likely causes in rough order of frequency: 1. Observation disabled for this peer (user_observe_me/others off) 2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence hasn't fired enough turns — cards build over time) 3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards The hint also suggests alternative tools (honcho_reasoning / honcho_search) so the model can route around the empty card rather than giving up. Schema description updated so the model knows the hint field exists and that an empty card is NOT an error state. 7 tests cover the hint paths: warmup, observation-disabled for user + ai, generic fallback, populated card still returns plain result (no hint), alternative-tool suggestion present.	2026-04-27 12:37:33 -07:00
Erosika	5883df5574	fix(honcho): keep legacy schemeless baseUrl configs working The scheme-validation commit (e77a3f2c) was too strict: a user with legacy 'baseUrl: localhost:8000' (no 'http://' prefix) in their '~/.honcho/config.json' would get 'No API key configured' from the CLI after that change, even though their setup worked before. urlparse on a schemeless host:port treats the host segment as the scheme and leaves netloc empty, so the http/https check rejected it. Falls back to a lenient check for schemeless strings that look like hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure digits. The SDK still rejects truly malformed URLs at connect time with a clearer error than ours. Three new tests: legacy schemeless hosts accepted; obvious garbage literals ('true', 'null', '12345') still rejected. Reviewer noted concern #1: schemeless regression for self-hosters with old configs.	2026-04-27 12:37:33 -07:00
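
The two-stage check described in 5883df5574 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `looks_like_base_url` is a hypothetical name, and treating any value containing "://" as "has an explicit scheme" is an assumption made here to keep non-http schemes rejected.

```python
from urllib.parse import urlparse

# Literals a config typo could produce; never treat them as hosts.
_GARBAGE_LITERALS = {"true", "false", "null"}


def looks_like_base_url(value: str) -> bool:
    """Return True if `value` is plausibly a base URL (hypothetical helper)."""
    value = value.strip()
    if not value:
        return False

    parsed = urlparse(value)
    # Strict path: explicit http/https scheme with a non-empty host.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return True

    # Assumption: an explicit "://" with any other scheme is rejected outright.
    if "://" in value:
        return False

    # Lenient path for legacy schemeless values such as "localhost:8000".
    if value.lower() in _GARBAGE_LITERALS or value.isdigit():
        return False
    return "." in value or ":" in value
```

Under these assumptions `looks_like_base_url("localhost:8000")` is accepted while `"true"`, `"12345"` and bare `"localhost"` are not; anything that slips through is still left to the SDK to reject at connect time.
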
Erosika	02ab255a0d	style(honcho): hoist hashlib import; validate baseUrl scheme before 'local' sentinel Two small follow-ups to the PR review: - Hoist hashlib import from _enforce_session_id_limit() to module top. stdlib imports are free after first cache, but keeping all imports at module top matches the rest of the codebase. - _resolve_api_key now URL-parses baseUrl and requires http/https + non-empty netloc before returning the 'local' sentinel. A typo like baseUrl: 'true' (or bare 'localhost') no longer silently passes the credential guard; the CLI correctly reports 'not configured'. Three new tests cover the new validation (garbage strings, non-http schemes, valid https).	2026-04-27 12:37:33 -07:00
Erosika	3b2edb347d	fix(gateway): scrub memory-context leaks from vision auto-analysis output fixes #5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)	2026-04-27 12:37:33 -07:00
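
The two leak shapes named in 3b2edb347d (a fenced `<memory-context>` block and a trailing '## Honcho Context' section) could be stripped along these lines. This is a sketch only: `scrub_memory_context` is a hypothetical name, the regexes are assumptions rather than the patterns in the repository, and unterminated or split tags (which the stateful streaming scrubber covers) are not handled here.

```python
import re

# Fenced form: <memory-context> ... </memory-context>, possibly multi-line.
_FENCED_BLOCK = re.compile(
    r"<memory-context>.*?</memory-context>", re.IGNORECASE | re.DOTALL
)
# Trailing form: the header line and everything after it.
_TRAILING_SECTION = re.compile(
    r"^## Honcho Context\b.*", re.MULTILINE | re.DOTALL
)


def scrub_memory_context(text: str) -> str:
    """Remove both leak forms from a vision description (hypothetical helper)."""
    text = _FENCED_BLOCK.sub("", text)
    text = _TRAILING_SECTION.sub("", text)
    return text.strip()
```

A clean description passes through unchanged apart from surrounding whitespace, which matches the "clean pass-through" case listed in the commit's gateway tests.
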
twozle	82205276c1	fix(plugins/memory/honcho): default Honcho SDK HTTP timeout to 30s When no explicit timeout is configured (HonchoClientConfig.timeout, honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client previously constructed the SDK with no timeout kwarg, letting the underlying httpx client hang indefinitely if the Honcho backend became unreachable mid-request. This is a silent-failure hazard on the post-response path of run_conversation: the memory_manager.sync_all() / queue_prefetch_all() calls fire after the agent has already generated its final reply, so a stalled Honcho request blocks run_conversation from returning. The gateway never logs "response ready" and never delivers the response to the platform (Telegram, etc.), even though the text is already saved to the session file. Repro: unplug the network or block app.honcho.dev mid-turn after the model has produced its final message. Without this change, _run_agent never returns. With it, the call aborts after 30s, run_conversation returns, and the gateway delivers the response (Honcho sync failure is logged and swallowed as before). The default applies only when nothing is configured, so any deployment that has explicitly set timeout / HONCHO_TIMEOUT / honcho.timeout / honcho.requestTimeout keeps its existing value. Self-hosted deployments that genuinely need a longer ceiling can still override via any of those knobs.	2026-04-27 12:37:33 -07:00
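
The "default applies only when nothing is configured" rule from 82205276c1 can be illustrated with a small resolver. The function name and the precedence between the knobs are assumptions for illustration; the commit only states which knobs exist and that 30s is used when none is set.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 30.0  # default named in the commit message


def resolve_timeout(
    explicit: Optional[float],
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> float:
    """Pick the HTTP timeout (hypothetical helper; precedence is assumed)."""
    if explicit is not None:
        return float(explicit)
    # Config keys mentioned in the commit: timeout / requestTimeout.
    for key in ("timeout", "requestTimeout"):
        if config.get(key) is not None:
            return float(config[key])
    raw = env.get("HONCHO_TIMEOUT")
    if raw:
        return float(raw)
    # Nothing configured: fall back to the default instead of no timeout.
    return DEFAULT_TIMEOUT_SECONDS
```

The point of the sketch is the last line: the resolver never returns "no timeout", so a stalled request cannot block the caller indefinitely.
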
Alexander Yususpov	36d6b643f6	fix(honcho): CLI credential guard rejects self-hosted baseUrl configs _resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all CLI subcommands (identity --show, status, migrate, etc.) bail with "No API key configured" on self-hosted instances that use baseUrl without an API key. Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the client.py behavior that already handles this case for the SDK. Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).	2026-04-27 12:37:33 -07:00
HiddenPuppy	5d36871d92	Fix Honcho HOME-aware global config fallback	2026-04-27 12:37:33 -07:00
dontcallmejames	f1ba4014e1	fix: harden memory-context leak boundaries	2026-04-27 12:37:33 -07:00
dontcallmejames	39713ba2ae	fix: strip leaked memory context from commentary	2026-04-27 12:37:33 -07:00
Sanjays2402	cd1c4812ab	fix(honcho): truncate resolve_session_name output to Honcho's 100-char limit (#13868) Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram supergroup reply chains, Slack thread IDs with long workspace prefixes) can exceed Honcho's 100-character session ID limit after sanitization. Every Honcho API call for those sessions then 400s with "session_id too long". Add a helper that enforces the 100-char limit after sanitization: short keys (the common case) short-circuit unchanged; over-limit keys keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over the original key so two long keys sharing a leading segment can't collide onto the same truncated ID. Adds 7 regression tests in tests/honcho_plugin/test_client.py covering short / exact-limit / long / deterministic / collision-resistant / allowlist-preserving / hash-suffix-present cases.	2026-04-27 12:37:33 -07:00
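
The truncate-and-hash scheme in cd1c4812ab can be sketched as below. The function signature is a guess (the real helper is `_enforce_session_id_limit`, whose arguments are not shown in the log); only the 100-char limit, the `-<8 hex>` suffix and the use of SHA-256 over the original key come from the commit message.

```python
import hashlib

MAX_SESSION_ID_LEN = 100  # Honcho's limit, per the commit message
_HASH_HEX_LEN = 8


def enforce_session_id_limit(sanitized: str, original: str) -> str:
    """Cap a sanitized session ID at 100 chars (illustrative signature)."""
    if len(sanitized) <= MAX_SESSION_ID_LEN:
        return sanitized  # common case: short keys pass through unchanged

    # Hash the ORIGINAL key so two long keys sharing a prefix stay distinct.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:_HASH_HEX_LEN]
    # Leave room for "-" plus the 8 hex characters.
    prefix_len = MAX_SESSION_ID_LEN - _HASH_HEX_LEN - 1
    return f"{sanitized[:prefix_len]}-{digest}"
```

Because the suffix depends only on the original key, the same gateway session always maps to the same Honcho session ID across restarts.
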
Brian D. Evans	d03c6fcc45	fix(honcho): pinPeerName opt-in keeps memory unified across platforms (#14984 ) When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the platform-native user ID as ``runtime_user_peer_name`` into the Honcho session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a single user who connects over three platforms ends up as three separate Honcho peers — one per platform — with fragmented memory and no cross- platform context continuity. For multi-user bots this is correct (and must not change): each user gets their own peer scope. For the vast majority of personal Hermes deployments the configured ``peer_name`` is an unambiguous identity, though, so the reporter asked for an opt-in knob that pins the user peer to that value. Fix: new ``pinPeerName`` boolean on the host config, default ``false``. When ``true`` AND ``peerName`` is set, the configured peer_name beats the gateway's runtime identity; every other resolution case is unchanged. honcho.json: { "peerName": "Igor", "hosts": { "hermes": { "pinPeerName": true } } } session.py (resolution order, pinned case): runtime_user_peer_name → skipped (opt-in flag active) config.peer_name → WINS "Igor" session-key fallback → unreached Parsing follows the same host-block-overrides-root pattern as every other flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper). Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups): - Config parsing: default, root true, host-block true, host overrides root, explicit false. - Peer resolution: runtime wins by default (regression guard for multi- user bots), config wins when pinned, pin-without-peer_name is a no-op (prevents silent peer-id collapse to session-key fallback), CLI path where runtime is absent, deepest fallback intact, assistant peer untouched by the flag. 
- Cross-platform unification: Telegram UID + Discord snowflake collapse to one peer when pinned; negative control confirms two distinct runtime IDs still produce two peers when unpinned. 244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions. Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name", False)`` so callers building partial config objects (several test fixtures across the codebase do this) don't break if they haven't updated yet. Runtime cost: one attr lookup per new session. Closes #14984 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:37:33 -07:00
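
The resolution order described in d03c6fcc45 can be written as a small function. This is a sketch under assumptions: `resolve_user_peer` and its parameters are hypothetical names, and the session-key fallback is simplified to returning the key itself.

```python
from typing import Optional


def resolve_user_peer(
    runtime_peer: Optional[str],
    config_peer_name: Optional[str],
    pin_peer_name: bool,
    session_key: str,
) -> str:
    """Choose the user peer ID (hypothetical helper mirroring the commit)."""
    # Pinned case: only when the flag is on AND a peer name is configured.
    if pin_peer_name and config_peer_name:
        return config_peer_name
    # Default: the gateway's platform-native ID wins (multi-user bots).
    if runtime_peer:
        return runtime_peer
    # CLI path: no runtime identity, so the configured name is used.
    if config_peer_name:
        return config_peer_name
    # Deepest fallback (simplified here to the raw session key).
    return session_key
```

With the flag set, a Telegram UID and a Discord snowflake both resolve to the configured name; with the flag unset they remain two distinct peers, and a pin without a configured name changes nothing.
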
Siddharth Balyan	1fa76607c0	feat: trigram FTS5 index for CJK search, replace LIKE fallback (#16651 ) * fix: bypass FTS5 for CJK queries in session_search FTS5 default tokenizer splits CJK characters into individual tokens, so multi-character queries like "大别山项目" become AND of single chars. This produces few/no results compared to LIKE substring search. For CJK queries, skip FTS5 entirely and use LIKE for accurate phrase matching. Fixes NousResearch/hermes-agent#15500 * fix: cache _contains_cjk, escape LIKE wildcards, add regression tests On top of the CJK FTS5 bypass from #15509: - Cache _contains_cjk() result in a local var to avoid redundant O(n) scans on every CJK query - Escape %, _ in LIKE queries so literal wildcards in user input are not treated as SQL wildcards (consistent with other LIKE queries in hermes_state.py that use ESCAPE '\') - Fix misleading comment ('or CJK fallback' → accurate description) - Add 3 regression tests: - test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829) - test_cjk_like_dedup_no_duplicates - test_cjk_like_escapes_wildcards (new wildcard escaping) * feat: trigram FTS5 index for CJK search, replace LIKE fallback Replace the LIKE '%query%' full-table-scan fallback for CJK queries with a proper trigram FTS5 index (messages_fts_trigram). The trigram tokenizer creates overlapping 3-byte sequences so substring matching works natively for any script — CJK, Thai, etc. For queries with 3+ CJK characters: uses the trigram FTS5 table with proper ranking, snippets, and indexed lookups. For shorter queries (1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs ≥9 UTF-8 bytes (3 CJK chars) minimum. Schema v10 migration creates the trigram table and backfills existing messages. Triggers keep the index in sync on INSERT/UPDATE/DELETE. Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards). --------- Co-authored-by: vominh1919 <vominh1919@gmail.com>	2026-04-28 00:12:07 +05:30
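
The query routing and wildcard escaping described in 1fa76607c0 might look roughly like this. Everything named here is illustrative: the Unicode ranges are an assumption about what `_contains_cjk` checks, `choose_search_strategy` and `like_search` are hypothetical, and the `messages(content)` table is a stand-in for the real schema. The trigram FTS5 table itself is not created, since it depends on the SQLite build.

```python
import sqlite3

# Assumed ranges: CJK ideographs, kana, Hangul syllables.
_CJK_RANGES = ((0x4E00, 0x9FFF), (0x3040, 0x30FF), (0xAC00, 0xD7AF))


def count_cjk(text: str) -> int:
    """Count characters falling in the assumed CJK ranges."""
    return sum(
        1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in _CJK_RANGES)
    )


def choose_search_strategy(query: str) -> str:
    """Route a query per the commit: trigram for 3+ CJK chars, LIKE for 1-2."""
    n = count_cjk(query)  # computed once, not rescanned per branch
    if n == 0:
        return "fts5"
    return "trigram" if n >= 3 else "like"


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_search(conn: sqlite3.Connection, term: str) -> list:
    """Substring search using LIKE with an explicit ESCAPE character."""
    rows = conn.execute(
        "SELECT content FROM messages WHERE content LIKE ? ESCAPE '\\'",
        (f"%{escape_like(term)}%",),
    )
    return [row[0] for row in rows]
```

The backslash is escaped first so that the escapes added for `%` and `_` are not themselves doubled.
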
brooklyn!	e80504b088	Merge pull request #16656 from NousResearch/bb/tui-parity-mutating-commands fix(tui): route mutating slash commands through live gateway state	2026-04-27 13:30:19 -05:00
kshitijk4poor	56724147ef	fix(providers/gmi): post-salvage review fixes - config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version is 22, so all users are past version 17 and would never be prompted for GMI_API_KEY on upgrade — consistent with how arcee was added) - auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview, stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview) - test_gmi_provider.py: fix malformed write_text() call in doctor test (was: write_text("GMI_API_KEY=* encoding="utf-8") → missing closing quote, wrote literal string 'GMI_API_KEY=* encoding=' to .env file) - test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions to match new cheaper default - docs/integrations/providers.md: add 'gmi' to inline 'Supported providers' fallback list (was only in the table, not the inline list at line ~1181) - docs/reference/cli-commands.md: add 'gmi' to --provider choices list	2026-04-27 11:17:59 -07:00
Isaac Huang	c53fcb0173	feat(providers): add GMI Cloud as a first-class API-key provider (#11955 ) Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>	2026-04-27 11:17:59 -07:00
Brooklyn Nicholson	4f59510dd4	fix(tui): tighten fast-mode support validation Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.	2026-04-27 13:00:11 -05:00
Brooklyn Nicholson	4a08f1015a	fix(tui): reject fast mode for unsupported live models Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.	2026-04-27 12:55:41 -05:00
Brooklyn Nicholson	b8556eb15e	fix(tui): address fast-mode live sync review feedback Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.	2026-04-27 12:47:42 -05:00
Brooklyn Nicholson	a13449a40a	fix(tui): address Copilot review feedback on mutating command parity Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.	2026-04-27 12:30:30 -05:00
Brooklyn Nicholson	a4cb3ef66c	fix(tui): make mutating slash paths native and lifecycle-safe Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.	2026-04-27 12:20:08 -05:00
brooklyn!	d5a89283b7	Merge pull request #16625 from NousResearch/bb/fix-tui-title-session-sync fix(tui): keep /title session names in sync	2026-04-27 12:05:54 -05:00
Brooklyn Nicholson	633f74504f	fix(ci): resolve follow-up title edge case and flaky checks Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.	2026-04-27 11:49:02 -05:00
Brooklyn Nicholson	27936ee02d	fix(tui-gateway): keep queued user titles from being dropped Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.	2026-04-27 11:31:49 -05:00
Brooklyn Nicholson	3aa86717b6	fix(tui-gateway): harden pending-title retry and user errors Retry persisting queued titles on session.title reads and map title validation failures to a user-facing 4022 code instead of generic 5007.	2026-04-27 11:27:51 -05:00
Brooklyn Nicholson	492c4c6573	fix(tui-gateway): address follow-up Copilot title threads Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.	2026-04-27 11:15:37 -05:00
Brooklyn Nicholson	3824b03237	fix(tui-gateway): harden session title RPC edge cases Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.	2026-04-27 11:05:10 -05:00
Brooklyn Nicholson	42b917c92c	chore: uptick	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	7ccfb97fee	test(cli): assert active-session file lifecycle in launch_tui Validate that the temp active-session file exists while the TUI subprocess runs and is removed after launch cleanup to match mkstemp semantics.	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	7a6128cc4f	fix(tui): harden active-session temp file handling - create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally - add assertions that launch wiring uses a randomized session file path and removes it on exit	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	4b28140912	fix(cli): tighten MRU lookup and session DB cleanup - use a grouped last_active join in search_sessions to avoid per-row correlated max lookups - always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	653b5ec128	fix(tui): report actual session on exit	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	164e33aa46	fix(cli): resolve -c by true MRU session - order session listing by computed last_active in SessionDB so callers get MRU rows directly - keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling	2026-04-27 08:52:12 -07:00
Brooklyn Nicholson	cdfbd89ea5	fix(tui): keep /title session names in sync Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.	2026-04-27 10:51:14 -05:00
hermes-agent-dhabibi	aa53fb661a	fix(copilot): mark native image requests as vision Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi	8402ba150e	fix(copilot): send vision header for Copilot vision requests Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route. Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
Brooklyn Nicholson	b479205396	fix(docker): tighten TUI build contract	2026-04-27 10:15:00 -05:00
Brooklyn Nicholson	4424a0e0f7	fix(docker): prebuild TUI assets in image	2026-04-27 10:05:07 -05:00
Teknium	9b55365f6f	fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979 ) (#16598 ) * fix: clean gateway auxiliary client caches on teardown * fix(gateway): recover from stale pid files and close cron agents Two issues were keeping the gateway from surviving long runs: 1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which refuses to unlink when the file's pid differs from our own. That safety check exists for the --replace atexit handoff, but it also applied to stale-record cleanup, so after a crashy exit the pid file was orphaned: `write_pid_file()`'s O_EXCL create then failed with `FileExistsError`, and systemd looped on "PID file race lost to another gateway instance". Unlink unconditionally from this helper since the caller has already verified the record is dead. 2. The cron scheduler never closed the ephemeral `AIAgent` it creates per tick, and never swept the process-global auxiliary-client cache. Over days of 10-minute ticks this leaked subprocesses and async httpx transports until the gateway hit EMFILE. Release the agent and call `cleanup_stale_async_clients()` in `run_job`'s outer `finally`, matching the gateway's own per-turn cleanup. * chore(release): map bloodcarter@gmail.com -> bloodcarter --------- Co-authored-by: bloodcarter <bloodcarter@gmail.com>	2026-04-27 07:41:42 -07:00
Teknium	817633bc5d	feat(backup): exclude SQLite WAL/SHM/journal sidecars (#16576) The backup takes a consistent snapshot of each .db via sqlite3.backup(), so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the fresh snapshot with stale sidecar state and produces a torn restore on first open. Sidecars are transient and SQLite regenerates them on next connection anyway. This also trims multi-MB of junk from every zip: state.db-wal alone was ~9 MB here, doubled by the fact the WAL is the live write-ahead log, not data.	2026-04-27 06:43:52 -07:00
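
The snapshot-plus-exclusion idea in 817633bc5d can be sketched with the standard library's `sqlite3.Connection.backup`. The helper names and the exact suffix list are assumptions for illustration; the actual backup code is not shown in the log.

```python
import sqlite3
from pathlib import Path

# Transient files SQLite regenerates on the next connection.
SIDECAR_SUFFIXES = (".db-wal", ".db-shm", ".db-journal")


def is_sqlite_sidecar(path: Path) -> bool:
    """True for WAL/SHM/journal sidecars that should be left out of a backup."""
    return path.name.endswith(SIDECAR_SUFFIXES)


def snapshot_db(src: Path, dest: Path) -> None:
    """Write a consistent copy of `src` to `dest` via the SQLite backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Copies committed pages, including any still held in the WAL.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Since the snapshot already contains everything that was committed, archiving the live sidecars next to it adds nothing and risks pairing the copy with stale state.
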
Teknium	008860a23f	fix(approval): close remaining prompt_toolkit deadlock vectors (#15216 ) PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor workers didn't inherit the CLI's TLS approval callback). Two vectors remained that could still land in the deadlocking input() fallback: 1. _spawn_background_review spawns a raw threading.Thread with no approval callback installed, so any dangerous-command guard the review agent trips falls back to input() -> deadlock against the parent's prompt_toolkit TUI (same class as delegate_task subagents, fixed in `023b1bff1` / #15491). Install a _bg_review_auto_deny callback at thread start, clear on finally. 2. prompt_dangerous_approval's fallback unconditionally spawned a daemon thread calling input() when approval_callback was None. That fallback can never succeed under prompt_toolkit because the user's Enter goes to pt's raw-mode stdin capture. Detect an active pt Application via get_app_or_none() and fail closed (deny + log) instead, so future threads that forget to install a callback degrade gracefully instead of hanging 60s invisibly. Regression guards: - tests/run_agent/test_background_review.py verifies the review worker thread sees a callable auto-deny callback mid-run and that the slot is cleared in the finally block. - tests/tools/test_approval.py TestFailClosedUnderPromptToolkit verifies prompt_dangerous_approval returns 'deny' fast under a mocked pt Application, and that a real callback still wins over the guard.	2026-04-27 06:42:32 -07:00
luyao618	8ad29a938a	fix(agent): restrict background review agent to memory and skills toolsets The background skill/memory review agent was created without toolset restrictions, inheriting the full default tool set. This allowed it to use terminal, send_message, delegate_task, and other tools outside its intended scope, potentially performing unrelated side effects after skill creation. Restrict the review agent to only memory and skills toolsets by passing enabled_toolsets=['memory', 'skills'] during AIAgent construction. Fixes #15204	2026-04-27 06:41:23 -07:00
Teknium	a59a98b180	fix(cli): pass session messages to shutdown_memory_provider (#15165 sibling) The gateway fix in the previous commit forwards _session_messages on gateway session teardown. The CLI exit cleanup path had the same bug: it read getattr(agent, 'conversation_history', None) or [] — but AIAgent has no conversation_history attribute, so providers always received []. Switch to _session_messages (same attribute the gateway now uses), guarded by isinstance(..., list) to preserve the no-arg fallback for MagicMock-based CLI test stubs. Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring the gateway suite).	2026-04-27 06:41:16 -07:00
briandevans	500774e30e	fix(gateway): pass session messages to shutdown_memory_provider (#15165 ) ``_cleanup_agent_resources`` previously invoked ``agent.shutdown_memory_provider()`` with no arguments, so every memory provider's ``on_session_end`` hook received an empty list. Providers with an early-return guard on empty input (Holographic, Hindsight) never extracted facts from the conversation, and users hit "抱歉，找不到相關的對話記錄" on the first turn after any gateway restart, session reset, or idle expiry. Forward ``agent._session_messages`` — the transcript the agent itself maintains and refreshes every turn via ``_persist_session`` — so providers see the actual conversation. Falls back to the legacy no-arg call whenever the attribute is absent or not a list (test stubs built via ``object.__new__`` or ``MagicMock``) to preserve backward compatibility with existing suites. ``AIAgent.shutdown_memory_provider`` already accepts ``messages: list = None`` (run_agent.py:4126), so this is a pure caller-side fix. Paths that use ``skip_memory=True`` temporary agents (memory flush, hygiene auto-compress, ``/compress``) are no-ops inside ``shutdown_memory_provider`` because ``self._memory_manager`` is None — no behaviour change for them. Covers Part A of the bug report. Part B (adding ``on_session_end`` to the Hindsight plugin) is a separate concern that would benefit from this fix landing first. Regression test added at ``tests/gateway/test_shutdown_memory_provider_messages.py`` covering: populated messages forwarded, empty list still forwarded, attribute missing falls back, non-list (MagicMock) falls back, provider exceptions don't block ``close()``, None agent no-op, and agent without ``shutdown_memory_provider`` tolerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 06:41:16 -07:00
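
The caller-side guard described in 500774e30e (and mirrored for the CLI in a59a98b180) can be sketched as follows. `cleanup_memory_provider` is a hypothetical stand-in for the relevant part of `_cleanup_agent_resources`; swallowing the exception silently is a simplification, since the real code presumably logs it.

```python
def cleanup_memory_provider(agent) -> None:
    """Forward the session transcript on shutdown (illustrative sketch)."""
    shutdown = getattr(agent, "shutdown_memory_provider", None)
    if agent is None or not callable(shutdown):
        return  # None agent, or an agent without the hook: nothing to do

    messages = getattr(agent, "_session_messages", None)
    try:
        if isinstance(messages, list):
            # Real transcript (possibly empty): hand it to the providers.
            shutdown(messages)
        else:
            # Attribute missing or a mock object: legacy no-arg call.
            shutdown()
    except Exception:
        # Provider failures must not block the rest of the cleanup.
        pass
```

The `isinstance(..., list)` check is what keeps mock-based test stubs on the old code path while real agents pass their transcript through.
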
Christian Scheid	75b460bc94	fix(email): add required Date header to outbound mail	2026-04-27 06:41:11 -07:00
Teknium	a9033c9220	feat(backup): exclude checkpoints/ from backups (#16572) checkpoints/ is a session-local trajectory cache: keyed by session hash, regenerated per-session, won't port to another machine anyway. On a large install this was multiple GB of pure noise in every zip. Also adds a regression test for the pre-existing backups/ exclusion so the two machine-local dirs share coverage.	2026-04-27 06:40:18 -07:00
Teknium	ea3c5a14c3	feat(update): make pre-update backup opt-in (off by default) (#16566) The zip backup could add minutes to every 'hermes update' on large HERMES_HOME directories. Flip the default to off and add a --backup flag for one-off opt-in runs. - updates.pre_update_backup default: True -> False - hermes update: new --backup flag (opposite of existing --no-backup) - Silent no-op when disabled (no message spam on every update) - Existing --no-backup still works and wins over --backup - Users who explicitly set pre_update_backup: true keep the old behavior - Tests updated to cover default-off, --backup opt-in, and config-enabled paths	2026-04-27 06:36:35 -07:00
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). 
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). 
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. 
Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
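The auto-mode routing rule described in commit ec671c4154 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in agent/image_routing.py: the function name, its parameters, and the handling of unknown mode strings are hypothetical, and only the three-step auto rule comes from the commit message.

```python
from typing import Optional

VALID_MODES = {"auto", "native", "text"}


def sketch_decide_image_input_mode(
    configured_mode: str = "auto",
    aux_vision_configured: bool = False,
    model_supports_vision: Optional[bool] = None,
) -> str:
    """Return "native" or "text" for a user-sent image (hypothetical signature)."""
    mode = (configured_mode or "auto").strip().lower()
    if mode not in VALID_MODES:
        # Assumption: unknown values are treated like the default, "auto".
        mode = "auto"
    if mode != "auto":
        # Explicit "native" or "text" is honoured as-is (assumption).
        return mode
    if aux_vision_configured:
        # A dedicated vision backend was configured, so keep the text pipeline.
        return "text"
    if model_supports_vision is True:
        # The capability lookup reported vision support: attach natively.
        return "native"
    # Unknown or unsupported: fall back to the text description path.
    return "text"
```

Treating a missing capability report (None) the same as "no vision" keeps the fallback conservative, which matches the "else fall back to text" wording above.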
Teknium	df3c9593f8	feat(plugins): google_meet — join, transcribe, speak, follow up (#16364) * feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls v1 shipping transcribe-only. Spawns headless Chromium via Playwright, joins an explicit https://meet.google.com/ URL, enables live captions, and scrapes them into a transcript file the agent can read across turns. The agent then has the meeting content in context and can do followup work (send recap, file issues, schedule followups) with its regular tools. Surface: - Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say (meet_say is a v1 stub — returns not-implemented; v2 will wire realtime duplex audio via OpenAI Realtime / Gemini Live + BlackHole / PulseAudio null-sink.) - CLI: hermes meet setup | auth | join | status | transcript | stop - Lifecycle: on_session_end auto-leaves any still-running bot. Safety: - URL regex rejects anything that isn't https://meet.google.com/... - No calendar scanning, no auto-dial, no auto-consent announcement. - Single active meeting per install; a second meet_join leaves the first. - Platform-gated to Linux + macOS (Windows audio routing for v2 untested). - Opt-in: standalone plugin, user must add 'google_meet' to plugins.enabled in config.yaml. Zero core changes. Plugin uses existing register_tool / register_cli_command / register_hook surfaces. 21 new unit tests cover the URL safety gate, transcript dedup + status round-trip, process-manager refusals/start/stop paths, tool-handler JSON shape under each branch, session-end cleanup, and platform-gated register(). * feat(plugins/google_meet): v2 realtime audio + v3 remote node host v2 — agent speaks in-meeting audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS). On Linux we load pactl module-null-sink + module-virtual-source, track module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its fake mic reads what we write to the sink. 
macOS just probes BlackHole 2ch and returns its device name — the plugin refuses to switch the user's default audio input (that would surprise them). realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime API. RealtimeSession.speak(text) sends conversation.item.create + response.create, accumulates response.audio.delta PCM bytes, appends them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming meet_say calls. 'websockets' is an optional dep imported lazily. meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge, starts RealtimeSession + speaker thread, spawns paplay to pump PCM into the null-sink, then cleans everything up on SIGTERM. If any realtime setup step fails, falls back cleanly to transcribe mode with an error flagged in status.json. process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl; refuses when no active meeting or active meeting is transcribe-only. tools.meet_say: real implementation; requires active mode='realtime'. meet_join: adds mode='transcribe'|'realtime' param. v3 — remote node host node/protocol.py: JSON envelope (type, id, token, payload) + validate. node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with resolve() auto-selecting the sole registered node when name is None. node/server.py: NodeServer — websockets.serve, bearer-token auth, dispatches start_bot/stop/status/transcript/say/ping onto the local process_manager. Token auto-generated + persisted on first run. node/client.py: NodeClient — short-lived sync WS per RPC, raises RuntimeError on error envelopes, clean API matching the server. node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}' subtree; wired into the main meet CLI by cli.py so 'hermes meet node' Just Works. tools.py: every meet_* tool accepts node='<name>'|'auto'; when set, routes through NodeClient to the remote bot instead of running locally. Unknown node → clear 'no registered meet node matches ...' error. 
cli.py: 'hermes meet join --node my-mac --mode realtime' and 'hermes meet say "..." --node my-mac' route to the node; 'hermes meet node approve <name> <url> <token>' registers one. Tests 21 v1 tests updated (meet_say is no longer a stub; active-record now carries mode). 20 new audio_bridge + realtime tests. 42 new node tests (protocol/registry/server/client/cli). 17 new v1/v2/v3 integration tests at the plugin level covering enqueue_say edge cases, env var passthrough, mode validation, node routing (known/unknown/auto/ambiguous), and argparse wiring for `hermes meet say` + `hermes meet node` + --mode/--node flags. Total: 100 plugin tests + 58 plugin-system tests = 158 passing. E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools register, on_session_end hook wires, 'hermes meet' CLI tree wires including the node subtree, NodeRegistry round-trips, meet_join routes correctly to NodeClient under node='my-mac' with mode='realtime', enqueue_say accepts realtime/rejects transcribe, argparse parses every new flag cleanly. Zero changes to core. All new code lives under plugins/google_meet/. * feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status Ready-for-live-test follow-up on PR #16364. Five additions that matter for the first live run on a real Meet, in priority order: 1. hermes meet install [--realtime] [--yes] pip install playwright websockets + python -m playwright install chromium --realtime: installs platform audio deps (pulseaudio-utils on Linux via sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the macOS default input — user still selects BlackHole in System Settings (deliberate; surprise audio rerouting is worse than a manual step). 2. Admission detection _detect_admission(page): Leave-button visible OR caption region attached OR participants list present → we're in-call. 
_detect_denied(page): 'You can\'t join this video call' / 'You were removed' / 'No one responded to your request' → bail out. HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in the lobby before giving up. in_call stays False until admitted. Status surfaces leaveReason: duration_expired | lobby_timeout | denied | page_closed. 3. macOS PCM pump ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the BlackHole AVFoundation output via -f audiotoolbox -audio_device_index <N>. _mac_audio_device_index() probes ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch' → numeric index. Falls back to index 0 on probe failure. Linux paplay pump unchanged. 4. Richer status dict _BotState now tracks realtime, realtimeReady, realtimeDevice, audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt, leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at counters fold into the status file once a second so meet_status() can show the agent's voice activity in near-real-time. 5. Barge-in RealtimeSession.cancel_response() sends type='response.cancel' over the same WS (lock-guarded so it's safe to call from the caption thread while speak() is reading frames). Handles response.cancelled as a terminal frame type. _looks_like_human_speaker() gates triggers so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel. Called from the caption drain loop: when a new caption arrives attributed to a real participant while rt.session exists, we fire cancel_response() and stamp lastBargeInAt. Tests: 20 new unit tests across _BotState telemetry, barge-in gating, admission/denied probe error handling, cancel_response with and without a connected WS, and `hermes meet install` CLI wiring (flag parsing + end-to-end subprocess.run verification + Linux-already-installed fast path). Total 171 passing across all google_meet test files + the plugin-system regression suite. 
E2E verified on Linux: plugin loads, all 5 tools register, `hermes meet install --realtime --yes` parses, fresh-bot status.json has every new telemetry key, cancel_response on a disconnected session returns False without raising, barge-in helper gates the bot's own name correctly. Still out of scope (for a future PR, not blocking live test): mic → Realtime duplex (the agent listening to meeting audio via WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio. Docs updated: SKILL.md now lists the installer subcommand, lobby timeout, barge-in caveat, and the full status-dict reference table. README.md quick-start uses hermes meet install.	2026-04-27 06:22:25 -07:00
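The barge-in gate described in commit df3c9593f8 (the bot's own name, 'You', 'Unknown', and blanks must not trigger a cancel) could look roughly like the sketch below. The function name, the bot_name parameter, and the case-insensitive comparison are assumptions made for illustration; the plugin's actual _looks_like_human_speaker() may differ.

```python
from typing import Optional


def sketch_looks_like_human_speaker(speaker: Optional[str], bot_name: str) -> bool:
    """Return True only for caption speakers that should trigger barge-in."""
    if speaker is None:
        return False
    name = speaker.strip()
    if not name:
        # Blank attribution: nothing to attribute the caption to.
        return False
    # Assumption: labels are compared case-insensitively.
    ignored = {"you", "unknown", bot_name.strip().casefold()}
    return name.casefold() not in ignored
```

Filtering on the speaker label rather than the caption text means the bot's own captions never cancel its in-flight response, whatever it happens to be saying.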
Teknium	8ed599dc05	feat(update): auto-backup HERMES_HOME before hermes update (#16539) Every 'hermes update' now runs a full backup of ~/.hermes/ first, so users can always roll back to the exact state they had before the update if anything goes wrong (corrupted sessions.db, broken skills, config migrations that don't round-trip, etc.). Changes: - hermes_cli/backup.py: new create_pre_update_backup() helper. Writes to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates (keep last N, pre-update-*.zip only — hand-dropped zips in backups/ are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups don't nest prior ones. - hermes_cli/main.py: _run_pre_update_backup() wired into _cmd_update_impl before any git operation. Prints save path, restore command, and how to disable. Swallows failures so a broken backup never blocks the update itself. New --no-backup flag on 'hermes update' for one-off override. - hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with pre_update_backup (default true) and backup_keep (default 5). Auto-surfaces in the dashboard config UI. - tests/hermes_cli/test_backup.py: +11 tests covering backup location, content parity with 'hermes backup', no-recursion, rotation, manual file preservation, config gate, --no-backup flag, flag-wins-over-config.	2026-04-27 05:36:19 -07:00
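The rotation rule in commit 8ed599dc05 (keep the last N pre-update-*.zip archives and leave hand-dropped zips alone) can be illustrated with the sketch below. The function name and the assumption that the <stamp> part sorts chronologically as a string are hypothetical; the real helper in hermes_cli/backup.py may order archives differently, for example by modification time.

```python
from pathlib import Path
from typing import List


def rotate_pre_update_backups(backups_dir: Path, keep: int = 5) -> List[Path]:
    """Delete all but the newest `keep` pre-update archives; return what was removed."""
    # Only files matching the pre-update naming pattern are candidates, so a
    # manually placed archive such as "manual.zip" is never touched.
    candidates = sorted(
        backups_dir.glob("pre-update-*.zip"),
        key=lambda p: p.name,  # assumption: the stamp sorts chronologically
        reverse=True,          # newest first
    )
    removed: List[Path] = []
    for stale in candidates[max(keep, 0):]:
        stale.unlink()
        removed.append(stale)
    return removed
```

Matching on the filename pattern instead of "every zip in the directory" is what keeps user-placed archives safe, at the cost of ignoring any pre-update archive that a user has renamed.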
Teknium	bb00b783fb	fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift The CLI renders through prompt_toolkit in non-full-screen mode, so every repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase before drawing the new frame. Any time that tracked position drifts from terminal reality, redraws stack on top of stale content instead of overwriting it. Four user-visible bugs share this root cause. Fixes: - #5474 (SIGWINCH ghosts): the resize wrapper previously only handled column-shrink reflow. Generalize it to force a full screen-clear (erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize — covers widen, row-shrink, and multiplexer SIGWINCH-less redraws. - #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so prompt_toolkit has no signal to recover. Add a _force_full_redraw() helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed as /redraw. Users can manually clear drift without restarting Hermes. - #14692 (DSR response leaks — ^[[53;1R): resize storms make prompt_toolkit's CSI 6n queries race past the input parser; the terminal's reply ends up as literal input text. Add a sibling of the bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the caret-escape visible form from paste text, buffer text-filter, and the input-processing loop. The idle-redraw removal (#12641) is in the preceding commit from @foxion37 — keeping them as separate commits preserves attribution.	2026-04-27 05:31:47 -07:00
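The DSR sanitizer from commit bb00b783fb strips cursor-position replies of the form \x1b[<row>;<col>R, plus their caret-escape visible form such as ^[[53;1R. A minimal sketch follows. The regex and function name are illustrative assumptions, not the code shipped in the CLI.

```python
import re

# Matches a cursor-position report in either form:
#   raw:     ESC [ <row> ; <col> R
#   visible: ^[ [ <row> ; <col> R   (caret notation for ESC)
_DSR_RESPONSE_RE = re.compile(r"(?:\x1b|\^\[)\[\d+;\d+R")


def strip_dsr_responses(text: str) -> str:
    """Remove leaked DSR replies from input text, leaving everything else intact."""
    return _DSR_RESPONSE_RE.sub("", text)
```

Requiring both numeric fields and the trailing R keeps the pattern narrow, so ordinary text containing a stray "^[" or a lone escape byte passes through unchanged.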

2699 Commits