- gateway handler: turnController always archives in recordMessageComplete,
so the post-complete archiveTodosAtTurnEnd().forEach is dead code. Drop
it and the now-unused import.
- turnController: collapse archive prepend into a single spread expression.
- gateway server: one-line comment for the tool.start todo skip.
Two bugs surfaced together while the model fired the todo tool:
1. Count flickered (e.g. 3 → 1 → 3) because tool.start echoed
args.todos as the live state. With merge=true (or any partial
replacement) args.todos is just the items being updated, not the
full list. Drop the early echo — tool.complete already carries the
canonical full list from the tool result.
2. After turn end the panel jumped from under the user prompt to below
thinking/tools because archiveDoneTodos() was pushed AFTER segments
in finalMessages. Prepend the archive trail msg so it sits right
after the user prompt — same visual slot the live panel occupied
during streaming.
Keep history metadata consistent with lineage replay, globally order replayed lineage messages, and make Ink cache eviction report post-eviction sizes. Also keys TUI config cache by path to avoid cross-home test leakage.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
* fix(tui): call maybe_auto_title for TUI sessions (#15961)
The maybe_auto_title() helper is called from cli.py and gateway/run.py
but was never wired into tui_gateway/server.py, so every session started
via 'hermes --tui' landed in state.db with an empty title. Evidence from
the issue reporter: 0/154 TUI sessions titled vs 91/383 CLI.
Mirror the CLI/Gateway pattern: after emitting message.complete, when the
turn finished cleanly, fire-and-forget title generation using the session
key, user prompt, agent response, and current history.
Fixes#15949.
Co-authored-by: math0r-be <math0r-be@github.com>
* chore(release): map math0r-be placeholder email in AUTHOR_MAP
---------
Co-authored-by: math0r-be <math0r-be@github.com>
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
to switch_model so the config model list is available for validation
Closes#15088
On Windows WSL2, ConPTY implicitly enables mouse event injection when
the alternate screen buffer (DEC 1049) is entered, causing raw escape
sequences to appear in the transcript as ghost characters.
Fix (two parts):
1. ConPTY fix: send DISABLE_MOUSE_TRACKING immediately after entering
alt screen when mouse tracking is off (AlternateScreen.tsx)
2. Runtime toggle: add /mouse [on|off|toggle] slash command with config
persistence (display.tui_mouse) so users can manage this at runtime
The env var HERMES_TUI_DISABLE_MOUSE continues to work as the initial
default, but can now be overridden via /mouse and persisted to config.
Closes: upstream ConPTY mouse injection issue
Credits: OutThisLife / PR #13716 for the toggle concept
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.
Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit 29b337bca7)
feat(web): add Chat tab with xterm.js terminal + Sessions resume button
(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
against current main: BUILTIN_ROUTES table + plugin slot layout)
fix(tui): replace OSC 52 jargon in /copy confirmation
When the user ran /copy successfully, Ink confirmed with:
sent OSC52 copy sequence (terminal support required)
That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.
Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:
copied 1482 chars
(cherry picked from commit a0701b1d5a)
docs: document the dashboard Chat tab
AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'
website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).
(cherry picked from commit 2c2e32cc45)
feat(tui-gateway): transport-aware dispatch + WebSocket sidecar
Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.
- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
+ module-level `StdioTransport` fallback. The stdio stream resolves
through a lambda so existing tests that monkey-patch `_real_stdout`
keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
- `write_json` routes via session transport (for async events) →
contextvar transport (for in-request writes) → stdio fallback.
- `dispatch(req, transport=None)` binds the transport for the request
lifetime and propagates it to pool workers via `contextvars.copy_context`
so async handlers don't lose their sink.
- `_init_session` and the manual-session create path stash the
request's transport so out-of-band events (subagent.complete, etc.)
fan out to the right peer.
`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.
feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal
Composes the two transports into a single Chat tab:
┌─────────────────────────────────────────┬──────────────┐
│ xterm.js / PTY (emozilla #13379) │ ChatSidebar │
│ the literal hermes --tui process │ /api/ws │
└─────────────────────────────────────────┴──────────────┘
terminal bytes structured events
The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:
• model + provider badge with connection state (click → switch)
• running tool-call list (driven by tool.start / tool.progress /
tool.complete events)
• model picker dialog (gateway-driven, reuses ModelPickerDialog)
The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.
- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
Owns its GatewayClient, drives the model picker through
`slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
(`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.
Co-authored-by: emozilla <emozilla@nousresearch.com>
refactor(web): /clean pass on ChatSidebar + ChatPage lint debt
- ChatSidebar: lift gw out of useRef into a useMemo derived from a
reconnect counter. React 19's react-hooks/refs and react-hooks/
set-state-in-effect rules both fire when you touch a ref during
render or call setState from inside a useEffect body. The
counter-derived gw is the canonical pattern for "external resource
that needs to be replaceable on user action" — re-creating the
client comes from bumping `version`, the effect just wires + tears
down. Drops the imperative `gwRef.current = …` reassign in
reconnect, drops the truthy ref guard in JSX. modelLabel +
banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
so the effect body doesn't have to setState on first run. Drops
the unused react-hooks/exhaustive-deps eslint-disable. Adds a
scoped no-control-regex disable on the SGR mouse parser regex
(the \\x1b is intentional for xterm escape sequences).
All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.
Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.
chore: uptick
fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary
The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.
Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.
feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast
The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.
Cross-process forwarding:
- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
(event_publisher.py, sync websockets client). entry.py installs the
tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
every dispatcher emit to a back-WS without disturbing Ink's stdio
handshake.
- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
(subscriber) endpoints with a per-channel registry. /api/pty now
accepts ?channel= and propagates the sidecar URL via env. start_server
also stashes app.state.bound_port so the URL is constructable.
- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
passes it to /api/pty and as a prop to ChatSidebar.
- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
tool.start/progress/complete back into the ToolCall list. Restores
the ToolCall primitive.
Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).
fix(web): address Copilot review on #14890
Five threads, all real:
- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
the open handshake. Server emits `gateway.ready` immediately after
accept, so a listener attached after the open promise could race past
the initial skin payload and lose it.
- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
WS into the existing error banner. 4401/4403 (auth/loopback reject)
surface as a "reload the page" message; mid-stream drops surface as
"events feed disconnected" with the existing reconnect button. Clean
unmount closes (1000/1001) stay silent.
- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
ptyprocess lives in the `pty` extra, not `web`. Switch to
`hermes-agent[web,pty]` in both prerequisite blocks.
- AGENTS.md: previous "never add a parallel React chat surface" guidance
was overbroad and contradicted this PR's sidebar. Tightened to forbid
re-implementing the transcript/composer/PTY terminal while explicitly
allowing structured supporting widgets (sidebar / model picker /
inspectors), matching the actual architecture.
- web/package-lock.json: regenerated cleanly so the wterm sibling
workspace paths (extraneous machine-local entries) stop polluting CI.
Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.
refactor(web): /clean pass on ChatSidebar events handler
Spotted in the round-2 review:
- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
hit the "unexpected drop" branch. Track `unmounting` in the effect
scope and gate the banner through a `surface()` helper so cleanup
closes stay silent.
- DRY the duplicated "events feed disconnected" string into a local
const used by both the error and close handlers.
- Drop the `opened` flag (no longer needed once the unmount guard is
the source of truth for "is this an expected close?").
Local llama.cpp servers (e.g. ggml-org/llama.cpp:full-cuda) fail the entire
request with HTTP 400 'Unable to generate parser for this template. ...
Unrecognized schema: "object"' when any tool schema contains shapes its
json-schema-to-grammar converter can't handle:
* 'type': 'object' without 'properties'
* bare string schema values ('additionalProperties: "object"')
* 'type': ['X', 'null'] arrays (nullable form)
Cloud providers accept these silently, so they ship from external MCP
servers (Atlassian, GCloud, Datadog) and from a couple of our own tools.
Changes
- tools/schema_sanitizer.py: walks the finalized tool list right before it
leaves get_tool_definitions() and repairs the hostile shapes in a deep
copy. No-op on well-formed schemas. Recurses into properties, items,
additionalProperties, anyOf/oneOf/allOf, and $defs.
- model_tools.get_tool_definitions(): invoke the sanitizer as the last
step so all paths (built-in, MCP, plugin, dynamically-rebuilt) get
covered uniformly.
- tools/browser_cdp_tool.py, tools/mcp_tool.py: fix our own bare-object
schemas so sanitization isn't load-bearing for in-repo tools.
- tui_gateway/server.py: _load_enabled_toolsets() was passing
include_default_mcp_servers=False at runtime. That's the config-editing
variant (see PR #3252) — it silently drops every default MCP server
from the TUI's enabled_toolsets, which is why the TUI didn't hit the
llama.cpp crash (no MCP tools sent at all). Switch to True so TUI
matches CLI behavior.
Tests
tests/tools/test_schema_sanitizer.py (17 tests) covers the individual
failure modes, well-formed pass-through, deep-copy isolation, and
required-field pruning.
E2E: loaded the default 'hermes-cli' toolset with MCP discovery and
confirmed all 27 resolved tool schemas pass a llama.cpp-compatibility
walk (no 'object' node missing 'properties', no bare-string schema
values).
Two fixes on top of the fuzzy-@ branch:
(1) Rebase artefact: re-apply only the fuzzy additions on top of
fresh `tui_gateway/server.py`. The earlier commit was cut from a
base 58 commits behind main and clobbered ~170 lines of
voice.toggle / voice.record handlers and the gateway crash hooks
(`_panic_hook`, `_thread_panic_hook`). Reset server.py to
origin/main and re-add only:
- `_FUZZY_*` constants + `_list_repo_files` + `_fuzzy_basename_rank`
- the new fuzzy branch in the `complete.path` handler
(2) Path scoping (Copilot review): `git ls-files` returns repo-root-
relative paths, but completions need to resolve under the gateway's
cwd. When hermes is launched from a subdirectory, the previous
code surfaced `@file:apps/web/src/foo.tsx` even though the agent
would resolve that relative to `apps/web/` and miss. Fix:
- `git -C root rev-parse --show-toplevel` to get repo top
- `git -C top ls-files …` for the listing
- `os.path.relpath(top + p, root)` per result, dropping anything
starting with `../` so the picker stays scoped to cwd-and-below
(matches Cmd-P workspace semantics)
`apps/web/src/foo.tsx` ends up as `@file:src/foo.tsx` from inside
`apps/web/`, and sibling subtrees + parent-of-cwd files don't leak.
New test `test_fuzzy_paths_relative_to_cwd_inside_subdir` builds a
3-package mono-repo, runs from `apps/web/`, and verifies completion
paths are subtree-relative + outside-of-cwd files don't appear.
Copilot review threads addressed: #3134675504 (path scoping),
#3134675532 (`voice.toggle` regression), #3134675541 (`voice.record`
regression — both were stale-base artefacts, not behavioural changes).
Typing `@appChrome` in the composer should surface
`ui-tui/src/components/appChrome.tsx` without requiring the user to
first type the full directory path — matches the Cmd-P behaviour
users expect from modern editors.
The gateway's `complete.path` handler was doing a plain
`os.listdir(".")` + `startswith` prefix match, so basenames only
resolved inside the current working directory. This reworks it to:
- enumerate repo files via `git ls-files -z --cached --others
--exclude-standard` (fast, honours `.gitignore`); fall back to a
bounded `os.walk` that skips common vendor / build dirs when the
working dir isn't a git repo. Results cached per-root with a 5s
TTL so rapid keystrokes don't respawn git processes.
- rank basenames with a 5-tier scorer: exact → prefix → camelCase
/ word-boundary → substring → subsequence. Shorter basenames win
ties; shorter rel paths break basename-length ties.
- only take the fuzzy branch when the query is bare (no `/`), is a
context reference (`@...`), and isn't `@folder:` — path-ish
queries and folder tags fall through to the existing
directory-listing path so explicit navigation intent is
preserved.
Completion rows now carry `display = basename`,
`meta = directory`, so the picker renders
`appChrome.tsx ui-tui/src/components` on one row (basename bold,
directory dim) — the meta column was previously "dir" / "" and is
a more useful signal for fuzzy hits.
Reported by Ben Barclay during the TUI v2 blitz test.
The voice.toggle handler was persisting display.voice_enabled /
display.voice_tts to config.yaml, so a TUI session that ever turned
voice on would re-open with it already on (and the mic badge lit) on
every subsequent launch. cli.py treats voice strictly as runtime
state: _voice_mode = False at __init__, only /voice on flips it, and
nothing writes it back to disk.
Drop the _write_config_key calls in voice.toggle on/off/tts and the
config.yaml fallback in _voice_mode_enabled / _voice_tts_enabled.
State is now env-var-only (HERMES_VOICE / HERMES_VOICE_TTS), scoped to
the live gateway subprocess — the next launch starts clean.
Crash-log stack trace (tui_gateway_crash.log) from the user's session
pinned the regression: SIGPIPE arrived while main thread was blocked on
for-raw-in-sys.stdin — i.e., a background thread (debug print to stderr,
most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the
TUI hadn't drained yet, and SIG_DFL promptly killed the process.
Two fixes that together restore CLI parity:
- entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that
then exited. With SIG_IGN, Python raises BrokenPipeError on the
offending write, which write_json already handles with a clean exit
via _log_exit. SIGTERM / SIGHUP still route through _log_signal so
real termination signals remain diagnosable.
- hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError
/ OSError try/except. This runs from daemon threads (silence callback,
TTS playback, beep), so a broken stderr must not escape and ride up
into the main event loop.
Verified by spawning the gateway subprocess locally:
voice.toggle status → 200 OK, process stays alive, clean exit on
stdin close logs "reason=stdin EOF" instead of a silent reap.
SIG_DFL for SIGPIPE means the kernel reaps the gateway subprocess the
instant a background thread (TTS playback, silence callback, voice
status emitter) writes to a stdout the TUI stopped reading — before
the Python interpreter can run excepthook, threading.excepthook,
atexit, or the entry.py post-loop _log_exit.
Replace the three SIG_DFL / SIG_IGN bindings with a _log_signal
handler that:
- records which signal (SIGPIPE / SIGTERM / SIGHUP) fired and when;
- dumps the main-thread stack at signal delivery AND every live
thread's stack via sys._current_frames — the background-thread
write that provoked SIGPIPE is almost always visible here;
- writes everything to ~/.hermes/logs/tui_gateway_crash.log and prints
a [gateway-signal] breadcrumb to stderr so the TUI Activity surfaces
it as well.
SIGINT stays ignored (TUI handles Ctrl+C for the user).
Gateway exits weren't reaching the panic hook because entry.py calls
sys.exit(0) on broken stdout — clean termination, no exception. That
left "gateway exited" in the TUI with zero forensic trail when pipe
breaks happened mid-turn.
Entry.py now tags each exit path — startup-write failure, parse-error-
response write failure, per-method response write failure, stdin EOF —
with a one-line entry in ~/.hermes/logs/tui_gateway_crash.log and a
gateway.stderr breadcrumb. Includes the JSON-RPC method name on the
dispatch path, which is the only way to tell "died right after handling
voice.toggle on" from "died emitting the second message.complete".
When the gateway subprocess raises an unhandled exception during a
voice-mode turn, nothing survives: stdout is the JSON-RPC pipe, stderr
flushes but the process is already exiting, and no log file catches
Python's default traceback print. The user is left with an
undiagnosable "gateway exited" banner.
Install:
- sys.excepthook → write full traceback to tui_gateway_crash.log +
echo the first line to stderr (which the TUI pumps into
Activity as a gateway.stderr event). Chains to the default hook so
the process still terminates.
- threading.excepthook → same, tagged with the thread name so it's
clear when the crash came from a daemon thread (beep playback, TTS,
silence callback, etc.).
- Turn-dispatcher except block now also appends a traceback to the
crash log before emitting the user-visible error event — str(e)
alone was too terse to identify where in the voice pipeline the
failure happened.
Zero behavioural change on the happy path; purely forensics.
Three issues surfaced during end-to-end testing of the CLI-parity voice
loop and are fixed together because they all blocked "speak → agent
responds → TTS reads it back" from working at all:
1. Wrong result key (hermes_cli/voice.py)
transcribe_recording() returns {"success": bool, "transcript": str},
matching cli.py:_voice_stop_and_transcribe. The wrapper was reading
result.get("text"), which is None, so every successful Groq / local
STT response was thrown away and the 3-strikes halt fired after
three silent-looking cycles. Fixed by reading "transcript" and also
honouring "success" like the CLI does. Updated the loop simulation
tests to return the correct shape.
2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)
The TUI had a voice.toggle "tts" subcommand but nothing downstream
actually read the flag — agent replies never spoke. Mirrored
cli.py:8747-8754's dispatch: on message.complete with status ==
"complete", if _voice_tts_enabled() is true, spawn a daemon thread
running speak_text(response). Rewrote speak_text as a full port of
cli.py:_voice_speak_response — same markdown-strip regex pipeline
(code blocks, links, bold/italic, inline code, headers, list bullets,
horizontal rules, excessive newlines), same 4000-char cap, same
explicit mp3 output path, same MP3-over-OGG playback choice (afplay
misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS
audible output byte-for-byte identical to the classic CLI.
3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)
The voice.transcript handler branched on prev input via a setInput
updater and fired submitRef.current inside the updater when prev was
empty. React strict mode double-invokes state updaters, which would
queue the submit twice; and when the composer had any content the
transcript was merely appended — the agent never saw it. CLI
_pending_input.put(transcript) unconditionally feeds the transcript
as the next turn, so match that: always clear the composer and
setTimeout(() => submitRef.current(text), 0) outside any updater.
Side effect can't run twice this way, and a half-typed draft on the
rare occasion is a fair trade vs. silently dropping the turn.
Also added peak_rms to the rec.stop debug line so "recording too quiet"
is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
The TUI had drifted from the CLI's voice model in two ways:
- /voice on was lighting up the microphone immediately and Ctrl+B was
interpreted as a mode toggle. The CLI separates the two: /voice on
just flips the umbrella bit, recording only starts once the user
presses Ctrl+B, which also sets _voice_continuous so the VAD loop
auto-restarts until the user presses Ctrl+B again or three silent
cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
speech on/off from inside the TUI.
This commit brings the TUI to parity.
Python
- hermes_cli/voice.py: continuous-mode API (start_continuous,
stop_continuous, is_continuous_active) layered on the existing PTT
wrappers. The silence callback transcribes, fires on_transcript,
tracks consecutive no-speech cycles, and auto-restarts — mirroring
cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
- voice.toggle now supports on / off / tts / status. The umbrella
bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down
any active continuous loop so a toggle-off really releases the
microphone.
- voice.record start/stop now drives start_continuous/stop_continuous.
start is refused with a clear error when the mode is off, matching
cli.py:handle_voice_record's early return on `not _voice_mode`.
- New voice.transcript / voice.status events emit through
_voice_emit (remembers the sid that last enabled the mode so
events land in the right session).
TypeScript
- gatewayTypes.ts: voice.status + voice.transcript event
discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
submission.submitRef + voice.{setRecording, setProcessing,
setVoiceEnabled}; InputHandlerContext.voice gains enabled +
setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
voice.transcript auto-submits when the composer is empty (CLI
_pending_input.put parity) and appends when a draft is in flight.
no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
not voice.toggle, and nudges the user with a sys line when the
mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
with CLI-matching output ("voice: mode on · tts off").
Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
- normalizeStatusBar: replace Set + early-returns + cast with a single
alias lookup table. Handles legacy `false`, trims/lowercases strings,
maps `on` → `top` in one pass. One expression, no `as` hacks.
- Tab title block: drop the narrative comment, fold
blockedOnInput/titleStatus/cwdTag/terminalTitle into inline expressions
inside useTerminalTitle. Avoids shadowing the outer `cwd`.
- tui_gateway statusbar set branch: read `display` once instead of
`cfg0.get("display")` twice.
- normalizeStatusBar: trim/lowercase + 'on' → 'top' alias so user-edited
YAML variants (Top, " bottom ", on) coerce correctly
- shift-tab yolo: no-op with sys note when no live session; success-gated
echo and catch fallback so RPC failures don't report as 'yolo off'
- tui_gateway config.set/get statusbar: isinstance(display, dict) guards
mirroring the compact branch so a malformed display scalar in config.yaml
can't raise
Tests: +1 vitest for trim/case/on, +2 pytest for non-dict display survival.
- normalizeStatusBar collapses to one ternary expression
- /statusbar slash hoists the toggle value and flattens the branch tree
- shift-tab yolo comment reduced to one line
- cursorLayout/offsetFromPosition lose paragraph-length comments
- appLayout collapses the three {!overlay.agents && …} into one fragment
- StatusRule drops redundant flexShrink={0} (Yoga default)
- server.py uses a walrus + frozenset and trims the compat helper
Net -43 LoC. 237 vitest + 46 pytest green, layouts unchanged.
'top' and 'bottom' are positions relative to the input row, not the alt
screen viewport:
- top (default) → inline above the input, where the bar originally lived
(what 'on' used to mean)
- bottom → below the input, pinned to the last row
- off → hidden
Drops the literal top-of-screen placement; 'on' is kept as a backward-
compat alias that resolves to 'top' at both the config layer
(normalizeStatusBar, _coerce_statusbar) and the slash command.
Default is back to 'on' (inline, above the input) — bottom was too far
from the input and felt disconnected. Users who want it pinned can
opt in explicitly.
- UiState.statusBar: boolean → 'on' | 'off' | 'bottom' | 'top'
- /statusbar [on|off|bottom|top|toggle]; no-arg still binary-toggles
between off and on (preserves muscle memory)
- appLayout renders StatusRulePane in three slots (inline inside
ComposerPane for 'on', above transcript row for 'top', after
ComposerPane for 'bottom'); only the slot matching ui.statusBar
actually mounts
- drop the input's marginBottom when 'bottom' so the rule sits tight
against the input instead of floating a row below
- useConfigSync.normalizeStatusBar coerces legacy bool (true→on,
false→off) and unknown shapes to 'on' for forward-compat reads
- tui_gateway: split compact from statusbar config handlers; persist
string enum with _coerce_statusbar helper for legacy bool configs