Today there's no way to remove a queued message — ↑ loads it for edit,
ctrl-K dispatches the head, but a draft you no longer want stays put
forever. ctrl-C just clears the composer and exits edit mode without
touching the queue.
Two new bindings, both gated on queueEditIdx !== null so they're
inert when the user isn't pointing at a queue item:
- ctrl-X — delete the queue item being edited, clear composer, exit
edit mode. "cut" matches the mental model and doesn't collide with
any existing binding.
- esc — cancel the edit (composer clears, item stays in queue).
Mirrors ctrl-C's existing behavior so muscle memory has two paths.
Header line now reads `queued (3) · editing 2 · ⌃X delete · esc cancel`
when in edit mode, so the affordance is discoverable without /help.
The /help hotkey table also gets a Ctrl+X entry.
ctrl-C is intentionally unchanged: it should never destroy queued
content. Cancel is non-destructive (esc / ctrl-C); only ctrl-X
removes the item.
Same layering concern as the persisted-assistant scrub already removed:
_emit_interim_assistant_message and the final_response return path were
mutating model output broadly. Streaming scrubber covers real leaks
delta-by-delta; these post-stream scrubs were redundant.
Reviewer pushback on the original boundary-hardening commits — three
overreach points pulled plugin-specific policy into shared core paths:
1. gateway/run.py hardcoded a '## Honcho Context' literal split for
vision-LLM output. Plugin-format heading in framework code; could
truncate legitimate output naturally containing that header.
Drop the literal split; keep generic sanitize_context (the wrapper
strip is plugin-agnostic). Plugin-specific cleanup belongs at the
provider boundary, not the shared gateway path.
2. run_agent.run_conversation scrubbed user_message and
persist_user_message before the conversation loop. User text is
sacred — if a user types a literal <memory-context> tag we must
not silently delete it. The producer (build_memory_context_block)
is the only legitimate emitter; user input should never need the
reverse op.
3. _build_assistant_message scrubbed model output before persistence.
Same hazard: would silently mutate legitimate documentation/code
the model emits containing the literal markers. The streaming
scrubber catches real leaks delta-by-delta before content is
concatenated; persist-time scrub was redundant belt-and-suspenders.
4. _fire_stream_delta stripped leading newlines from every delta unless
a paragraph break flag was set. Mid-stream '\n' is legitimate
markdown — lists, code fences, paragraph breaks — and chunk
boundaries are arbitrary. Narrow lstrip to the very first delta
of the stream only (so stale provider preamble still gets cleaned
on turn start, but mid-stream formatting survives).
Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.
Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation). Plugin-specific strings
stay out of shared runtime paths. User input and persisted assistant
output are no longer mutated.
Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
Closed PR #5137 addressed the retrieval path (peer cards via get_card()
instead of the session-scoped lookup that returned empty for per-session
messaging flows) — that architectural fix is already in main as
_fetch_peer_card / _fetch_peer_context.
What never got fixed is the user-visible side: honcho_profile returning
a flat 'No profile facts available yet.' leaves the model to guess at
why. The model then often surfaces it to the user as a cryptic error.
Adds a diagnostic hint next to the existing 'result' message, enumerating
the likely causes in rough order of frequency:
1. Observation disabled for this peer (user_observe_me/others off)
2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence
hasn't fired enough turns — cards build over time)
3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards
The hint also suggests alternative tools (honcho_reasoning / honcho_search)
so the model can route around the empty card rather than giving up.
Schema description updated so the model knows the hint field exists and
that an empty card is NOT an error state.
7 tests cover the hint paths: warmup, observation-disabled for user + ai,
generic fallback, populated card still returns plain result (no hint),
alternative-tool suggestion present.
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy ''baseUrl: localhost:8000'' (no ''http://'' prefix) in their
''~/.honcho/config.json'' would get ''No API key configured'' from the
CLI after that change, even though their setup worked before.
urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.
Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.
Three new tests: legacy schemeless hosts accepted; obvious garbage
literals (''true'', ''null'', ''12345'') still rejected. Reviewer
noted concern #1: schemeless regression for self-hosters with old
configs.
main's 6a957a74 added an optional 'metadata' kwarg to
MemoryProvider.on_memory_write so providers can distinguish tool-driven
memory writes from background-review writes. MemoryManager already
does a getfullargspec-based introspection, so the old 3-arg signature
didn't break at runtime — but it missed the origin hint entirely.
Updates HonchoMemoryProvider.on_memory_write to accept the kwarg. The
metadata isn't yet threaded into Honcho's create_conclusion payload —
that's worth its own PR once the consolidation lands and the new
metadata shape stabilises.
Two small follow-ups to the PR review:
- Hoist hashlib import from _enforce_session_id_limit() to module top.
stdlib imports are free after first cache, but keeping all imports at
module top matches the rest of the codebase.
- _resolve_api_key now URL-parses baseUrl and requires http/https +
non-empty netloc before returning the 'local' sentinel. A typo like
baseUrl: 'true' (or bare 'localhost') no longer silently passes the
credential guard; the CLI correctly reports 'not configured'.
Three new tests cover the new validation (garbage strings, non-http
schemes, valid https).
fixes#5719
The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description. That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.
Strips both forms of leak before embedding:
- <memory-context>...</memory-context> fenced blocks (sanitize_context)
- trailing '## Honcho Context' sections (header + everything after)
Plus regression tests:
- tests/agent/test_streaming_context_scrubber.py — 13 tests on the
stateful scrubber (whole block, split tags, false-positive partial
tags, unterminated span, reset, case-insensitivity)
- tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
_fire_stream_delta covering the realistic 7-chunk leak scenario and
the cross-turn scrubber reset
- tests/gateway/test_vision_memory_leak.py — 4 tests covering the
vision auto-analysis boundary (clean pass-through, '## Honcho Context'
header, fenced block, both patterns together)
sanitize_context() uses a non-greedy block regex that needs both
<memory-context> open and close tags present in a single string. When a
provider streams the fenced memory block across multiple deltas (typical
for recalled-context leaks — the payload often arrives in 10+ 1-80 char
chunks), the per-delta sanitize stripped the lone open/close tags via
_FENCE_TAG_RE but let the payload in between flow straight to the UI.
Adds StreamingContextScrubber: a small stateful scrubber that tracks
open/close tag pairs across deltas, holds back partial-tag tails at
chunk boundaries, and discards span contents wholesale (including the
system-note line that fragments across deltas).
Wired into _fire_stream_delta; reset per user turn; benign trailing
partial-tag tails are flushed at the end of each model call. Mid-span
interruption (provider drops closing tag) drops the orphaned content
rather than leaking it — truncated answer > leaked memory.
Follow-up to #13672 (@dontcallmejames).
new_session() was popping the old cached session, releasing the lock,
calling get_or_create, then re-acquiring the lock to insert. A concurrent
caller could observe the empty-cache window and race-create its own
session, producing two divergent session objects for the same key.
_cache_lock is an RLock, so nested reacquisition inside get_or_create is
safe. Hold it across the whole pop/create/insert sequence.
Follow-up to #13510 (@hekaru-agent).
When no explicit timeout is configured (HonchoClientConfig.timeout,
honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client
previously constructed the SDK with no timeout kwarg, letting the
underlying httpx client hang indefinitely if the Honcho backend
became unreachable mid-request.
This is a silent-failure hazard on the post-response path of
run_conversation: the memory_manager.sync_all() / queue_prefetch_all()
calls fire after the agent has already generated its final reply, so
a stalled Honcho request blocks run_conversation from returning.
The gateway never logs "response ready" and never delivers the
response to the platform (Telegram, etc.), even though the text is
already saved to the session file.
Repro: unplug the network or block app.honcho.dev mid-turn after
the model has produced its final message. Without this change,
_run_agent never returns. With it, the call aborts after 30s,
run_conversation returns, and the gateway delivers the response
(Honcho sync failure is logged and swallowed as before).
The default applies only when nothing is configured, so any
deployment that has explicitly set timeout / HONCHO_TIMEOUT /
honcho.timeout / honcho.requestTimeout keeps its existing value.
Self-hosted deployments that genuinely need a longer ceiling can
still override via any of those knobs.
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all
CLI subcommands (identity --show, status, migrate, etc.) bail with
"No API key configured" on self-hosted instances that use baseUrl
without an API key.
Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the
client.py behavior that already handles this case for the SDK.
Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
Wraps _session_cache mutations in threading.RLock. Without this, concurrent
gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time)
can race on the cache and silently lose conclusions or memory writes.
Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup
hunk from that PR is dropped here for scope isolation. Resolved a small
conflict with the pinPeerName guard (kept both).
Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram
supergroup reply chains, Slack thread IDs with long workspace prefixes) can
exceed Honcho's 100-character session ID limit after sanitization. Every
Honcho API call for those sessions then 400s with "session_id too long".
Add a helper that enforces the 100-char limit after sanitization:
short keys (the common case) short-circuit unchanged; over-limit keys
keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over
the original key so two long keys sharing a leading segment can't
collide onto the same truncated ID.
Adds 7 regression tests in tests/honcho_plugin/test_client.py covering
short / exact-limit / long / deterministic / collision-resistant /
allowlist-preserving / hash-suffix-present cases.
CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name``
in ``tests/agent/test_memory_user_id.py`` failed after this branch: that
test passes a ``MagicMock`` for ``config``, where
``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by
default. My ``getattr(..., "pin_peer_name", False)`` fallback was
supposed to guard against callers that haven't added the new attr, but
MagicMock *does* have the attr — it just returns a live mock for it.
Tightened the gate to ``getattr(..., False) is True``. Real configs
built via ``HonchoClientConfig.from_global_config`` always yield a
proper boolean, so strict equality matches the pinned case and rejects
both the unset-attr fallback and MagicMock stand-ins. Added a comment
explaining why ``is True`` is intentional, not paranoid.
Also tightened the ``peer_name`` existence check to
``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its
default (also truthy) doesn't spuriously enable pinning either.
Verified against both the new ``test_pin_peer_name.py`` suite (13/13
pass) and the previously-failing
``TestHonchoUserIdScoping`` (3/3 pass). Zero behaviour change for real
``HonchoClientConfig`` values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the
platform-native user ID as ``runtime_user_peer_name`` into the Honcho
session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a
single user who connects over three platforms ends up as three separate
Honcho peers — one per platform — with fragmented memory and no cross-
platform context continuity.
For multi-user bots this is correct (and must not change): each user gets
their own peer scope. For the vast majority of personal Hermes deployments
the configured ``peer_name`` is an unambiguous identity, though, so the
reporter asked for an opt-in knob that pins the user peer to that value.
Fix: new ``pinPeerName`` boolean on the host config, default ``false``.
When ``true`` AND ``peerName`` is set, the configured peer_name beats the
gateway's runtime identity; every other resolution case is unchanged.
honcho.json:
{
"peerName": "Igor",
"hosts": {
"hermes": { "pinPeerName": true }
}
}
session.py (resolution order, pinned case):
runtime_user_peer_name → skipped (opt-in flag active)
config.peer_name → WINS "Igor"
session-key fallback → unreached
Parsing follows the same host-block-overrides-root pattern as every other
flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper).
Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups):
- Config parsing: default, root true, host-block true, host overrides
root, explicit false.
- Peer resolution: runtime wins by default (regression guard for multi-
user bots), config wins when pinned, pin-without-peer_name is a no-op
(prevents silent peer-id collapse to session-key fallback), CLI path
where runtime is absent, deepest fallback intact, assistant peer
untouched by the flag.
- Cross-platform unification: Telegram UID + Discord snowflake collapse
to one peer when pinned; negative control confirms two distinct
runtime IDs still produce two peers when unpinned.
244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions.
Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name",
False)`` so callers building partial config objects (several test fixtures
across the codebase do this) don't break if they haven't updated yet.
Runtime cost: one attr lookup per new session.
Closes#14984
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(nix): parameterize dependency-groups in python.nix
* refactor(nix): extract package to callPackage-able hermes-agent.nix
Makes the package overridable via .override{} and adds
extraPythonPackages parameter for PYTHONPATH injection.
Includes build-time collision check using PEP 503 name
canonicalization.
* feat(nix): add overlay for external NixOS consumption
External flakes can now add overlays = [ inputs.hermes-agent.overlays.default ]
to get pkgs.hermes-agent with full .override support.
* test(nix): add check for extraPythonPackages PYTHONPATH injection
Verifies wrapper has PYTHONPATH when extras provided, and
base package has no PYTHONPATH without extras.
* feat(nix): add extraPlugins option for directory-based plugins
Symlinks plugin packages into HERMES_HOME/plugins/ at activation time.
Validates plugin.yaml presence. Asserts unique plugin names at eval time.
Hermes discovers them automatically via its directory scan.
* feat(nix): add extraPythonPackages option for entry-point plugins
Overrides the hermes package with PYTHONPATH injection when
extraPythonPackages is non-empty. Plugin .dist-info directories
become visible to importlib.metadata for entry-point discovery.
Works in both native systemd and container modes.
* docs: add NixOS declarative plugin installation to nix-setup, plugins, and build-a-plugin guides
- nix-setup.md: new Plugins section with extraPlugins/extraPythonPackages
examples, overlay usage, collision checking note, options reference rows
- plugins.md: Nix row in discovery table, NixOS declarative plugins section
- build-a-hermes-plugin.md: Distribute for NixOS section after pip section
* fix: address review feedback — remove unrelated umask, fix fetchFromGitHub naming, simplify checks
- Remove accidentally introduced umask/migration changes (unrelated to plugins)
- Add pluginName helper, fix fetchFromGitHub producing name='source'
- Show name= in extraPlugins example docs
- Simplify checks.nix: use hermes-agent.override instead of re-callPackage
- Fix fragile grep shell logic in checks
* refactor: address simplify feedback — lib.getName, drop unused inputs', Python list for extras
- Use lib.getName instead of custom pluginName helper
- Drop unused inputs' from checks.nix perSystem args
- Pass extraPythonPackages as Python list literal instead of colon-split string
* fix: walk propagatedBuildInputs for plugin PYTHONPATH and collision check
Uses python312.pkgs.requiredPythonModules to resolve the full transitive
closure of extraPythonPackages. Without this, a plugin with third-party
deps (e.g. requests) would fail at runtime if those deps weren't already
in the sealed uv2nix venv. The collision check now also scans the full
closure, catching transitive conflicts.
* cleanup: fold plugins into subdir loop, use find for symlink cleanup, inline lib.getName
- Add 'plugins' to the existing cron/sessions/logs/memories subdir loop
instead of a separate mkdir/chown/chmod block
- Replace fragile for-glob with find -delete for stale symlink cleanup
- Inline lib.getName at both call sites, remove pluginName wrapper
* fix: bypass FTS5 for CJK queries in session_search
FTS5 default tokenizer splits CJK characters into individual tokens,
so multi-character queries like "大别山项目" become AND of single chars.
This produces few/no results compared to LIKE substring search.
For CJK queries, skip FTS5 entirely and use LIKE for accurate
phrase matching.
Fixes NousResearch/hermes-agent#15500
* fix: cache _contains_cjk, escape LIKE wildcards, add regression tests
On top of the CJK FTS5 bypass from #15509:
- Cache _contains_cjk() result in a local var to avoid redundant O(n)
scans on every CJK query
- Escape %, _ in LIKE queries so literal wildcards in user input are
not treated as SQL wildcards (consistent with other LIKE queries in
hermes_state.py that use ESCAPE '\')
- Fix misleading comment ('or CJK fallback' → accurate description)
- Add 3 regression tests:
- test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829)
- test_cjk_like_dedup_no_duplicates
- test_cjk_like_escapes_wildcards (new wildcard escaping)
* feat: trigram FTS5 index for CJK search, replace LIKE fallback
Replace the LIKE '%query%' full-table-scan fallback for CJK queries with
a proper trigram FTS5 index (messages_fts_trigram). The trigram tokenizer
creates overlapping 3-byte sequences so substring matching works natively
for any script — CJK, Thai, etc.
For queries with 3+ CJK characters: uses the trigram FTS5 table with
proper ranking, snippets, and indexed lookups. For shorter queries
(1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs
≥9 UTF-8 bytes (3 CJK chars) minimum.
Schema v10 migration creates the trigram table and backfills existing
messages. Triggers keep the index in sync on INSERT/UPDATE/DELETE.
Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards).
---------
Co-authored-by: vominh1919 <vominh1919@gmail.com>
Keep the parity test backed by the real Python command registry while avoiding hard failures in Node-only Vitest environments that cannot import hermes_cli.commands.
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.
Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.
Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.