Brings the memory-window branch current with the base PR: multi-click already
in; adds OSC window title/notifications, +10 tree-sitter languages, /sessions
this-directory grouping + TUI cwd persistence, and the node26-fnm-discovery +
launch-cwd fix. All ui-opentui/gateway/launcher additions; windowing files are
ours alone.
# Conflicts:
# ui-opentui/src/entry/main.tsx
Two reasons the local TUI stopped running OpenTUI / showed the wrong directory:
1. Node resolution. OpenTUI needs Node >= 26.3 (node:ffi floor), but
_node26_bin_or_none only checked HERMES_NODE + `which node`. When fnm's
default flips to an older line (e.g. v25.9) the active node fails the gate
and the engine silently falls back to Ink even though a usable v26.3 sits
installed. _fnm_node26_candidates now discovers fnm's installed versions
(FNM_DIR / XDG_DATA_HOME/fnm / ~/.local/share/fnm / macOS Library path),
newest first, version-probed — so the engine launches without the user
re-aliasing their global default.
2. Launch cwd. The launcher runs the engine with cwd=<engine package dir> so
its build/resolution works; the gateway it spawns then auto-detected THAT
dir as the workspace (chrome bar showed 'ui-opentui (feat/opentui-native-
engine)' regardless of where you ran hermes). TERMINAL_CWD — the gateway's
canonical launch-dir channel — was only exported in worktree mode; now it's
set to the real cwd for every launch (worktree mode still overrides to the
worktree path). The TUI's session.create no longer sends process.cwd() (the
engine dir) — a new launchCwd() reads the launcher's HERMES_CWD/TERMINAL_CWD,
falling back to process.cwd() only for standalone smokes.
Together: session cwd, chrome bar, terminal-tool cwd, and /sessions grouping
all anchor to where you actually ran hermes. Verified live — chrome bar shows
'/tmp/cwd-probe (my-feature)' launched from there with fnm default on v25.9.
8 new tests (fnm discovery order/precedence/empty-safety; launchCwd env
precedence).
The resume picker never had cwd grouping — deliberately deferred in the v1
spec because TUI session rows had no cwd to group by: the TUI's
session.create sent only {cols}, so explicit_cwd stayed false and
_ensure_session_db_row skipped cwd stamping by design (the desktop's launch
dir is meaningless — 'No workspace' grouping is its desired default).
In a terminal the launch directory IS the workspace choice, so the entry now
passes cwd: process.cwd() at session.create — the existing explicit-workspace
machinery persists it to the session row on first message (covered by
test_ensure_session_db_row_persists_explicit_cwd; zero gateway changes).
Picker: while browsing (no search), sessions whose cwd matches the TUI's
current directory order first under a '▾ this directory (N)' caption, the
rest under '▾ other directories' — one flat reordered list, so selection/
windowing/load-more math is untouched, and captions are pure render
decoration keyed off hereCount. During search the fuzzy score keeps owning
the order. Trailing-slash-normalized comparison, no fs calls.
Old sessions can't be backfilled (their cwd was never recorded); coverage
accumulates from here. 6 new tests (pure ordering edges + grouped frames,
search-drops-grouping, no-cwd passthrough).
Instead of an external watcher chasing 5-10 concurrent session pids,
every TUI samples ITSELF at 1Hz (rss/heap/external + windowing
mounted/peak-mounted counters) into ~/.hermes/logs/memwatch/, gated by
HERMES_TUI_MEMLOG (defaults to the HERMES_TUI_DIAGNOSTICS master
switch) — one shell-rc export covers every session a dev ever starts.
Unref'd timer, every failure path silently disables, 14-day retention.
bench/memwatch-report.mjs aggregates the fleet: per-session
baseline/peak/last, steady-state MB/h slope, peak mounted rows, and
SLOPE/PEAK/MOUNTED anomaly flags. Verified live: two fake-gateway
smoke sessions logged and aggregated (102MB base, mounted ≤60).
@opentui/core@0.4.0 bundles only 5 grammars (ts/js/markdown/markdown_inline/
zig) and Hermes registered none of its own — Python/Rust/Go/bash/JSON/C/HTML/
CSS/YAML/TOML tool bodies and fences rendered plain text (never a regression:
no addDefaultParsers existed anywhere in branch history).
Now: parsers/manifest.json curates the 10 grammars (cpp deliberately dropped —
3.28MB alone); scripts/update-parsers.mjs vendors wasm+highlights.scm with
magic/content validation (plain Node fetch — core's update-assets generator is
Bun-flavored and its import-module won't bundle under esbuild, so registration
skips it and points at the vendored files by runtime-resolved path instead);
boundary/parsers.ts registers via the public addDefaultParsers() at entry
module load, before the first <code>/<markdown> mount initializes the global
tree-sitter client. ~4MB vendored, committed (build inputs, offline-safe).
Markdown fence injections need no infoStringMap: fence labels resolve as
filetype ids and core's ext maps already normalize py→python, zsh→bash, h→c.
Live-smoked in a real renderer: python tool body draws 6 distinct token
colors; ```python and ```yaml fences inside markdown highlight too. 6 new
tests pin the wiring (vendored assets valid, registration set, filetype
routing); visuals stay live-smoke territory per codeBlock.tsx.
Window title: a render-nothing <TerminalChrome> tracks session.info — the
native renderer.setTerminalTitle (frame-safe, zig-side OSC emit) shows
'{session title} — Hermes' once the session is titled, 'Hermes Agent' until
then. The user's previous title is bracketed with the XTWINOPS title stack
(save on boot, best-effort restore on quit). Gateway: _session_info now
carries the live title (DB row, pending_title fallback) and a session.info
refresh follows every title change — pending-title application, the
auto-title worker landing (via maybe_auto_title's title_callback), and
session.title renames — so the window retitles without waiting for the
next turn.
Notifications: when the TUI starts waiting on the user — any blocking
prompt (clarify/approval/sudo/secret/confirm) or turn completion — three
dialect sequences go out through renderer.writeOut: OSC 9 (iTerm2/wezterm),
OSC 99 (kitty), OSC 777 (urxvt/foot); terminals ignore what they don't
speak. Suppressed while the terminal reports focused (core's mode-1004
focus/blur events; until a first blur proves reporting works, notify
unconditionally). HERMES_TUI_NOTIFY=0/false/off kills notifications; the
title is not gated. All text is OSC-sanitized (control chars stripped,
777's semicolon fields spliced-proof, length-capped).
13 new TUI tests (pure shaping/sequences/env gate + store-edge wiring via
an injected seam) and 2 gateway tests (title resolution order, thread-safe
refresh emitter). Live-smoked: tmux pane_title shows 'Hermes Agent' from
the native title path.
Brings the memory-window branch up to current main + the multi-click
selection feature, via the base PR branch's merge commits (history
preserved, no rewrites). No conflicts: the 13 windowing/diagnostics
commits touch ui-opentui/bench/docs only.
Main's rewritten test_tui_npm_install.py tests call _make_tui_argv expecting
the Ink/npm flow unconditionally; with the dual-engine dispatch merged in,
_resolve_tui_engine() auto-selects opentui whenever ui-opentui/dist is built
in the repo, routing the call away from the path under test (first subprocess
became 'node --version' instead of 'npm run build'). Pin the engine to ink
via an autouse fixture, mirroring the existing pinning precedent in
test_tui_resume_flow.py.
body sets user-select:none for native feel and opts text back in only via
[data-selectable-text='true']; the preview's source and rendered-markdown
panes never set it, so code couldn't be selected or copied. Tag the Shiki
code column and the markdown root. The attribute stays off the SourceView
grid root so the gutter keeps its select-none and line numbers don't bleed
into copied text.
The terminal listed JetBrains Mono only as a late fallback and shipped no
webfont, so on machines without SF Mono/Menlo xterm measured the grid on the
regular system face while styled SGR spans fell back to a font with different
advances — glyphs squeezed and overlapped.
Bundle the regular/bold/italic woff2 (Apache-2.0, the faces the dashboard
already ships), put the family first in the xterm stack, pin the weights, and
warm every face before mount (fonts.ready only settles already-requested
faces; bold/italic aren't asked for until styled output paints, past atlas
init). Vite emits them as hashed assets under dist/** with base './', so the
fonts ship in the asar and every install path inherits them.
439 main commits in; 144 branch commits preserved (no rewrite — merge over
rebase per glitch's call to keep history). Conflict resolutions:
- Dockerfile: ui-opentui build folded into main's new cached frontend-build
layer (COPY ui-opentui/ + install/build/prune beside web+ui-tui); node:26
base from our side kept.
- gateway/run.py: took main's extraction (slash handlers moved to
gateway/slash_commands.py); re-applied our /usage real-cost block
(real_session_cost_usd / resolve_billing_route, no estimation) at its new
home in slash_commands.py.
- tui_gateway/server.py: _LONG_HANDLERS union (our model.options + main's
plugins.manage).
- tests/test_tui_gateway_server.py: both sides' appended tests kept.
Verified during resolution: cli.py worktree prune/cleanup lock fix survived
auto-merge intact and main's new prune call-site (hermes_cli/main.py:2139)
inherits its clean/locked/dirty/unpushed skip semantics; HERMES_TUI_ENGINE
selection in hermes_cli/main.py intact.
Regular users get zero diagnostic surface by default: /mem and /heapdump
disappear from /help and completion, and invoking them prints the
one-line enable hint (relaunch with HERMES_TUI_DIAGNOSTICS=1) instead of
executing — an enable switch, not a secret. With the switch on, the
commands work as before and HERMES_TUI_WINDOW_STATS defaults on (still
individually settable either way). Full env-flag ledger (master switch /
user config / dev tuning / internal plumbing) in docs/opentui-env-flags.md.
672 tests exit 0.
The per-row copy control lived in the header's trailing slot as a 24px
button that depended on a `group-hover/tool-row` group that exists nowhere
in the tree. It therefore stayed `opacity-0` yet remained clickable — an
invisible hit-target straddling the disclosure caret and duration, making
the caret hard to click without firing a copy.
Move copy into the expanded body's top-right (matching the code-block
convention) where it can't fight the caret for the right edge, and make it
actually visible (subtle at rest, full on hover/focus). The header right
edge now belongs solely to the duration label + caret.
Tradeoff: copy is only reachable once a row is expanded; rows with no
expandable body no longer surface a copy control.
Rebasing onto fcf49f313 (multi-click selection) collided in the test
probe (both sides added 'mouse' — deduped) and surfaced two test debts
the cap-restore commit (3cc56517a) had shipped masked by a piped exit
code: store cap tests still asserted the 1000 default (now: 3000
windowed, 1000 with HERMES_TUI_WINDOWING=0, both covered), and the
burst-interplay test relied on the old cap trimming a 1500-row burst
(now pins HERMES_TUI_MAX_MESSAGES=1000 explicitly for both stores).
Also: .demo/ build artifacts excluded from typed linting. 669 tests
exit 0 (verified unpiped). Multi-click selections flow through the
same renderer selection seam, so windowing's drag-freeze + row-pinning
covers them with no changes.
Maintainer signals (native yoga next release, 2x layout, opencode's
100-cap was a legacy perf workaround): what changes for us (WASM ratchet
dies), what doesn't (the 65k handle table still makes windowing
load-bearing at 3000 rows), the boundary/ shim ledger with
delete-on-upstream-fix criteria, and the per-release upgrade playbook
that uses the bench suite as the acceptance contract.
Shareable explainer: every primitive at play (native handle table, Yoga
WASM grow-only memory, renderables, Solid surgical unmount, V8 GC
laziness, scrollbox draw-only culling) and every decision (windowing vs
store-cull, exact-heights-at-unmount vs Ink's estimate-correct, the
correctionIsLegal zero-jank law, append-time adjudication, never-window
rules, windowing-aware cap restore, heap right-sizing), with the
measured scoreboard and honest open items.
With transcript windowing (S1+S2) the mounted set no longer scales with
the store (peak 31 rows over a 1500-row burst), so the handle-table
clamp that forced 1000 rows is unnecessary when windowing is on. The
ceiling is now windowing-aware: 3000 rows (the originally-shipped
default, regression documented in opentui-fixes-audit.md §2) with
windowing, 1000 with HERMES_TUI_WINDOWING=0 (every row mounts again).
Measured at the restored cap (full 3000-msg store): mem3000 360MB peak
styled end-to-end (pre-campaign: ~870MB + unstyled past ~1,400 rows;
before that: crash). scroll3000 p50=2 p90=3 p99=8 max=17ms (Ink same
workload: p90=35 p99=96). Gate digest unchanged.
Pins Node 26.3 to ui-opentui/ only (fnm/mise auto-switch on cd; leaving
the directory restores whatever the dev had — no global default change).
engines.node >= 26.3 makes a wrong-Node npm ci warn. README covers
install paths (fnm/mise/nvm/absolute-binary), the ABI-locked
node_modules gotcha, and build/run commands.
Adversarial-review follow-up to the S2 slice. The S1 rule froze ALL window
recomputes while renderer.getSelection()?.isActive — but a finished mouse
selection persists by design (boundary/renderer.ts keeps the highlight so
Ctrl+C can re-copy), so a long streaming turn behind a lingering highlight
ballooned the mounted set exactly like pre-windowing (and permanently
ratcheted the Yoga-WASM high-water).
Refinement:
- full freeze only while selection.isDragging (the native walk touches the
live tree on every drag update — destroying a row mid-walk corrupts the
highlight; unchanged from S1 where it matters),
- a finished highlight instead PINS the rows containing
selection.selectedRenderables (parent-climb to the row wrapper via a
WeakMap) as neverWindow — the highlight and a later Ctrl+C copy stay
byte-exact while everything else keeps windowing,
- an active highlight counts as activity (no idle measure churn under it).
Test (headless mock-mouse drag): finished selection persists (isActive,
!isDragging) → 300-row burst keeps peakMounted < 120 AND
getSelectedText() returns the identical text afterward, the selected rows
having been pinned while long scrolled past the margin.
Verified on this build: gate digest otui-capped d5e9558583159eac… (2/2),
mem2000 otui-capped windowing-ON vmhwm 312MB (target ≤ 350), scroll2000
otui-capped p50 2.0ms / p99 6.0ms (gate ≤ 17ms). check exit 0 (648 tests).
Review verdicts on the remaining findings (verified against core source):
- "scrollTop compensation race": rejected — scrollTop is an imperative
scrollbar property (no signal staleness); records fire in document order,
each compensation immediately visible to the next.
- "heights map leak on /new": rejected — the countChanged cleanup prunes
every per-key map against the live key set (test-verified).
- remount-in-viewport estimate shift: only reachable when one frame jumps
past the margin (> 1 viewport); the design's accepted "remounted for
view" path — documented in the header.
- expanded tool/reasoning re-collapse on far-remount: S1-accepted,
deferred (component-local state; out of S2 file ownership).
S2 of docs/plans/opentui-transcript-windowing.md (#27), behind
HERMES_TUI_WINDOWING (OFF path renders the byte-identical legacy tree).
Append-time adjudication: the window now recomputes on transcript GROWTH,
not just scroll — a createComputed on messages.length re-windows
synchronously per append, and while pinned at the bottom computeWindow
anchors to the cumulative content BOTTOM (pinnedBottom) instead of the
stale pre-layout scrollTop, so burst-appended rows are spacer-swapped the
moment they pass the margin. The frame driver additionally treats a
≥ ¼-viewport scrollHeight change (streaming growth) like scroll movement.
Unseen-row default changed from "always mounted" to "mounted iff created
streaming or within the bottom-30" — live rows still paint instantly with
zero added latency; a bulk commitSnapshot (resume) mounts ONLY the bottom
window and everything above starts as line-count-estimate spacers (chip-
and-spacing-aware estimateMessageHeight).
Spacer corrections (zero-jank rule): when a measure lands a height
different from what the spacer occupied, the wrapper's onSizeChange fires
inside the layout traversal, pre-paint. Pinned at bottom the scrollbox's
own sticky re-pin (content onSizeChange runs before the row wrappers')
already compensated — verified by test; otherwise scrollTop is compensated
same-frame for rows fully above the viewport (correctionIsLegal). Frames
stay byte-stable across corrections in both pinned and mid-history tests.
Lazy exact-measure (design §4 — the simple choice, documented): no true
offscreen layout exists in @opentui/core, so an idle pulse (no appends,
no scroll, no turn, no selection for HERMES_TUI_WINDOW_IDLE_MS≈1s) mounts
MEASURE_BATCH_ROWS=10 never-measured rows nearest the bottom window edge
(edgeMeasureBatch), records exact heights (incl. a direct post-layout pull
for rows whose mount changed nothing — no onSizeChange fires), and the
next recompute swaps them back to now-exact spacers. Scrolling itself
still measures the margin band.
DEV counter: windowRowStats (current/peak simultaneously-mounted rows),
exposed on globalThis behind HERMES_TUI_WINDOW_STATS; tests assert it.
Measured (this build, 39f9f433e+S2):
- check: exit 0 (647 tests / 39 files; +11 pure window cases, +4 headless)
- peak mounted: 31 rows over a 1500-row burst; 30 rows on a 600-row
resume snapshot (bound asserted < 120)
- gate digest: otui-capped d5e9558583159eac… — byte-identical, 2/2 reps
- mem2000 (otui-capped, windowing ON, 8GB heap): vmhwm 300MB
(S1 same-heap 518MB; S1 right-sized-heap 427MB; Ink 229-239MB;
target ≤ 350MB)
- scroll2000 otui-capped: p50 2.0ms / p99 5.0ms / max 18ms
(gate ≤ 17ms p99; S1 baseline p99 15ms)
Known S2 limits (deferred to S3, design §5): /compact·/details toggles and
width resizes leave out-of-window spacer heights stale until remount or
the idle march; expanded-body state above the window may re-collapse on
remount (S1-accepted).
Core machinery of docs/plans/opentui-transcript-windowing.md (#27): rows
outside [scrollTop − viewport, scrollTop + 2·viewport) swap to an exact-height
empty <box> (1 yoga node, no text buffers / native handles), so the mounted
set stays ~3 viewports regardless of transcript length.
Flag: HERMES_TUI_WINDOWING — unset → ON; 0/false/no/off → OFF (envFlag
semantics, the bench A/B + one-env escape hatch). OFF renders the exact
legacy tree (no wrapper boxes).
Pieces:
- logic/window.ts (pure, table-tested): computeWindow (viewport ± 1-viewport
margin intersection over cumulative exact heights; null heights fall back
to a per-row line-count estimate), hysteresisFor/shouldRecompute (≥ ¼
viewport between recomputes), correctionIsLegal (the jank rule: corrections
only fully above the viewport with same-frame scrollTop compensation, or
fully below it), estimateMessageHeight (line-count estimate; wrong values
are fixed by remount only — S1 never corrects a spacer in place).
- view/transcript.tsx: per-row measuring wrapper records exact heights via
onSizeChange (only while the real row is mounted); window driver is a
renderer frame callback (setFrameCallback — scroll always renders, so no
extra timer) publishing the mounted set through one signal + createSelector
so only flipped rows re-render. Stable row keys via WeakMap<Message, n>
(messages have no id; store proxies are reference-stable). Solid <Show>
unmount destroys the row's renderables (@opentui/solid _removeNode →
destroyRecursively).
Never-window rules:
- streaming rows (remount would restart native markdown streaming),
- the last row while a turn is running (deltas land there),
- the bottom 30 rows (fixed K — sticky-bottom region; rows under
viewport+margin are mounted by the window calc anyway),
- rows the window has never adjudicated default to MOUNTED (new live rows
paint instantly),
- the whole window FREEZES while a mouse selection is active
(renderer.getSelection()?.isActive — a swap would destroy highlighted
renderables under the native selection walk).
Tests: 30 pure window.test.ts cases + 2 headless integration cases
(transcriptWindow.test.tsx) pinning the zero-jank invariant (scrollHeight
identical ON vs OFF), the renderable shedding, and remount-on-scroll-back.
Arabic/Hebrew/Persian/Urdu chat text rendered left-to-right and
left-aligned, and mixed RTL/English technical messages (the common case)
read backwards. Resolve each chat block's base direction from its own
first strong character (UAX#9) with pure CSS, scoped to the chat
surfaces only:
- `unicode-bidi: plaintext` + `text-align: start` on assistant prose
blocks (p, h1-h6, li, blockquote), the user bubble's text lines, and
both composers (main + edit share the composer-rich-input slot). RTL
blocks read and right-align RTL; English stays LTR; mixed
conversations resolve per block. `text-align: start` is required
because the user bubble hardcodes `text-left`.
- Inline `code` and KaTeX are pinned `direction: ltr; unicode-bidi:
isolate`, so the bidi first-strong heuristic skips them: a sentence
that *starts* with a command (`./run.sh ...`) followed by Arabic
still resolves RTL, and the command's own neutrals keep their order.
- Fenced code surfaces (code-card, user fences) are pinned LTR so they
never mirror or right-align inside an RTL list item or blockquote.
`direction` is never forced, so app chrome, layout, and list indent
stay LTR per the issue's request not to flip the whole UI. English-only
content is byte-for-byte unchanged.
Salvaged and unified from #44065 and #44169; verified in Chromium that
isolate removes inline code from the paragraph direction vote (the
code-first case), making the JS dir-resolution in #44065 unnecessary.
Fixes#44150
Co-authored-by: Adolanium <Adolanium@users.noreply.github.com>
Co-authored-by: Adalsteinn Helgason <AIalliAI@users.noreply.github.com>
Tell coding agents to activate shell setup once per session instead of re-sourcing it before every command, and pin the existing LocalEnvironment env-snapshot behavior with regression tests.
Port from anomalyco/opencode#31271: only call tools/list when the server
advertises the 'tools' capability in InitializeResult.capabilities.
Previously, _discover_tools() unconditionally called session.list_tools()
right after initialize. Prompt-only / resource-only servers (which omit
the tools capability per the MCP spec) raise McpError(-32601 Method not
found), which aborted the connection — burning all 3 initial-connect
retries and permanently failing the server even though its prompts and
resources were perfectly usable. The 180s keepalive had the same problem:
it probed with list_tools(), so even a successfully connected prompt-only
server would be torn down on the first keepalive cycle.
Changes:
- MCPServerTask._advertises_tools(): capability check with a legacy
fallback (no captured InitializeResult -> behave as before)
- _discover_tools(): skip tools/list for non-tool servers
- keepalive: use the universal ping request for non-tool servers
- _refresh_tools(): guard against tools/list_changed from non-tool servers
E2E verified with a real stdio prompt-only FastMCP-style server: on main
it fails all 3 connection attempts with Method-not-found; with this fix
it connects, lists prompts, answers ping keepalives, and shuts down
cleanly.
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.
Cherry-picked from #39840 by @flyinhigh and rebased cleanly on main.
- Defer config fetch in createGatewayEventHandler until gateway.ready to
avoid render-phase RPC that can mutate transcript state and trigger
React error 301 in embedded dashboard PTYs.
- Use undici WebSocket fallback when globalThis.WebSocket is unavailable
(Node attach mode and sidecar mirror sockets).
- Add regression tests for both fixes.
Co-authored-by: flyinhigh <flyinhigh@users.noreply.github.com>
Collapse the verbose multi-line rationale comments across the TUI/desktop/
backend approval surfaces into single-line "why" notes, and derive
APPROVAL_OPTS_NO_ALWAYS from APPROVAL_OPTS instead of re-listing it.
No behavior change.
Node >=18 / Electron 40 ship fetch; the hand-rolled http/https.request
plumbing buys nothing. AbortSignal.timeout replaces the socket timeout,
protocol guard and >=400 rejection semantics preserved. 13/13 unit
tests and the live web_server.py repro both green over the new
transport.
Both spawn paths (startHermes, spawnPoolBackend) duplicated the same
resolve -> log-fallback -> foreign-check -> throw dance. Collapse it into
adoptServedDashboardToken(baseUrl, spawnToken, {childAlive, label}) in
dashboard-token.cjs; childAlive is a thunk so liveness is sampled after
the fetch. Drop the redundant backendPool.delete in the pool's throw
path (the child exit/error handlers already own pool eviction).
Validated end-to-end against a real web_server.py backend, not just
units: token-injection regex vs the actual served index.html, foreign
refusal (dead child + live squatter), benign drift adoption, and the
401-vs-200 token auth split on /api/sessions.
When a tirith content-security warning is present the approval backend
forces allow_permanent=False and silently downgrades an "always" choice to
session scope (the persistence loop in check_all_command_guards only honors
"always" → permanent when no tirith finding exists). But the gateway notify
payload that drives the TUI and the Electron desktop app never carried that
flag, so both surfaces always rendered "Always allow" — offering a permanent
allow the backend would quietly refuse to persist.
Plumb allow_permanent end-to-end:
- tools/approval.py: include `allow_permanent: not has_tirith` in the gateway
approval_data the notify callback emits as `approval.request`.
- ui-tui: thread `allowPermanent` through the event handler, gateway types,
and ApprovalReq; ApprovalPrompt drops the "always" option (and renumbers the
quick-pick keys) when it's false.
- apps/desktop: thread `allow_permanent` through the gateway payload type, the
per-session approval store, and the inline ApprovalBar, which now hides the
"Always allow…" dropdown item when permanent allow is disallowed — reusing
the existing DropdownMenu / confirm-Dialog UI.
The desktop/TUI render path for approvals already landed in #38578 (the root
cause of approvals not surfacing in the GUI); this completes the salvage of
#37856 by carrying allow_permanent across both surfaces. #37856's original
thread-local _block() approach is dropped: desktop/TUI approvals resolve via
approval.respond → resolve_gateway_approval (the per-session queue), not the
_block()/request_id correlation, so a worker-thread callback waiting on _block
would never be released by the real UI.
Tests: gateway notify payload carries allow_permanent (True without tirith,
False with a tirith warning); ui-tui approvalAction reduced option set +
event-handler allowPermanent propagation; desktop store round-trip + the
ApprovalBar showing/hiding "Always allow".
Supersedes #37856Closes#37812
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Two fixes to the Electron desktop launch path, with the port-reservation logic extracted into a unit-tested module:
1. hermes:bootstrap:reset ("Reload and retry") only cleared connectionPromise, leaving the live backend alive; the orphan kept binding PORT_FLOOR (9120) so the next startHermes() hit EADDRINUSE / "Object has been destroyed" and the window looped. Await teardownPrimaryBackendAndWait() so the reset stops the old backend before restarting.
2. pickPort() probes-then-closes a socket before the real bind happens in a separate Python child, so two concurrent spawns (primary + pool backend) could both be handed PORT_FLOOR and one died with EADDRINUSE. The reservation bookkeeping is extracted into electron/port-pool.cjs (PortPool): pickPort() reserves the chosen port until the child exits and releases it on every exit/error/throw-before-spawn path, closing the TOCTOU window.
PortPool is dependency-injected (probe passed in) and socket-free, unit-tested in electron/port-pool.test.cjs (8 cases) and wired into the test:desktop:platforms script.
(cherry picked from commit d4133945b9)
The served-token fallback adopts whatever token the dashboard HTML
injects. That is correct when our own child regenerated the token (env
pin lost across a shell-wrapped spawn), but wrong when the readiness
probe answered from a process we did not spawn: /api/status is public,
so an orphaned dashboard squatting the port passes waitForHermes while
our child dies on the bind conflict. Silently adopting that process's
token would authenticate the renderer against a foreign backend,
possibly on the wrong profile.
Discriminate on child liveness: the desktop pins
HERMES_DASHBOARD_SESSION_TOKEN on every spawn, so a live child always
serves our token. Served-token mismatch + dead child = foreign backend;
fail the boot loudly instead of connecting. Mismatch + live child keeps
the adopt-served-token salvage from #43720.
`@assistant-ui/store`'s index-keyed child-scope lookup (`tapClientLookup`)
throws — rather than returning undefined — when a subscriber reads an index
the message/parts list no longer has. During high-frequency store replacement
(switching sessions mid-stream, gateway reconnect replay) a subscriber from
the previous, longer list is still in React's notification queue and reads one
slot past the new, shorter array before it can unmount. The throw
(`Index N out of bounds (length: N)`, the classic index === length off-by-one)
unwinds all the way to the root error boundary and blanks the entire window,
even though the store self-heals on the very next consistent snapshot.
Wrap each virtualized message group in a tiny boundary that swallows ONLY this
transient lookup race and auto-recovers when the message signature changes
(the existing list-mutation key). Any other error re-throws to the root
boundary, so genuine bugs still surface.
Upstream-tracked and unresolved: assistant-ui/assistant-ui#4051, #3652.
Co-authored-by: mollusk <mollusk@users.noreply.github.com>
Desktop spawns its dashboard backend with `--profile <name>` and
`HERMES_DESKTOP=1`. cmd_dashboard's unified-launch routing treats any
named profile as a request for the shared machine dashboard: it re-execs
as the default profile (dropping HERMES_HOME) or, when one is already
listening, prints "Machine dashboard already running ... Managing profile
'<name>'" and exits 0. Either way the desktop-spawned child exits before
the app sees a ready backend, so Desktop retries forever — the Windows
named-profile boot loop in the post-mortem.
Skip the machine-dashboard reroute when HERMES_DESKTOP=1 so desktop pool
backends stay per-profile (which is what the pool expects). Carved out of
#44478.
Co-authored-by: AJ <yspdev@gmail.com>
The desktop chat surface talks to the dashboard's in-process /api/ws
gateway, which builds agents through tui_gateway.server._make_agent. That
path only snapshots the existing tool registry — MCP discovery is started
by tui_gateway/entry.py (the stdio TUI), which the dashboard process never
runs. So a profile's configured MCP servers never connect under the
desktop app and sessions show no MCP tools.
Start a shared background MCP discovery thread at dashboard startup (via
hermes_cli.mcp_startup, bounded so a slow/dead server can't block boot),
and have _make_agent briefly join that thread in addition to the existing
entry-owned TUI thread before snapshotting tools.
Carved out of #44478.
Co-authored-by: AJ <yspdev@gmail.com>
The deploy-site skills index crawl was capped at ~3k ClawHub entries
because CATALOG_WALK_BUDGET_SECONDS applied to max_items=0 walks too.
Only enforce the wall-clock budget for bounded browse requests and pass
limit=0 from build_skills_index so CI walks the full catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: finish Automation Blueprints terminology rebrand
Replace leftover "Automation Templates" wording from the Cron Recipes
rebrand, rename the copy-paste cookbook guide to Automation Recipes, and
point the marketing gallery link at the blueprints catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: use Automation Blueprints instead of Recipes in guide
Rename the cookbook guide from automation-recipes to
automation-blueprints so sidebar and copy match the product term.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: rename automation-blueprints-catalog to automation-blueprints
Drop the -catalog suffix from the reference page slug and title, and
move the copy-paste cookbook to automation-blueprint-examples so the
main Automation Blueprints doc is unambiguous.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Revert "docs: rename automation-blueprints-catalog to automation-blueprints"
This reverts commit 605f1eeab5.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Legibility pass on the consolidated prefix: collapse the topic-overlap rule
from three overlapping sentences into one WINS sentence + one discard/no-wrap-up
sentence (same constraints, less dilution), fix the module docstring to
describe the headings that actually shipped, and correct the #10896 comment's
heading name (Historical Pending User Asks).
The prompt consolidation above retires the carveout-era prefix. Without a
frozen copy in _HISTORICAL_SUMMARY_PREFIXES, summaries persisted by
pre-upgrade builds would lose detection (_is_context_summary_content) and
renormalization (_strip_summary_prefix) — the exact regression class the
tuple exists to prevent. Adds contract tests covering every frozen prefix.
Refs #41607#38364#42812
A WSL2 user reported the top two left-sidebar items being unclickable
while the rest of the UI works. That symptom shape matches an
-webkit-app-region:drag hit-test band eating clicks, not GPU/compositing:
the shell's titlebar drag strips (app-shell.tsx) span the top 34px and
the nav group clears them by only 6px, and drag regions win hit-testing
over DOM regardless of pointer-events. Linux WCO (Electron >=32) is the
newest implementation and has known region quirks (electron#43030).
Apply the same no-drag carve-out the codebase already uses for sticky
user bubbles (USER_BUBBLE_BASE_CLASS in thread.tsx) to the sidebar nav
buttons. Harmless on every platform: the rows were never meant to be
draggable surface.
Root-cause hardening for the stranded-empty-registry failure behind
'No web search/extract provider configured': discover_and_load() set
_discovered=True before scanning, so a sweep that raised partway was
swallowed by callers as a warning and every later call early-returned
against an empty registry for the process lifetime. The flag now acts
only as a re-entrancy guard and is reset when the sweep raises, so the
next call retries discovery.
Pins the invariant that _ensure_web_plugins_loaded registers the keyless
Parallel default (and the wider bundled set) even when the general plugin
discovery raises, that the direct-registration fallback honors plugins.disabled,
and that it stays a no-op on the healthy path.
web_search/web_extract are documented to work with zero setup via the bundled
keyless Parallel free-MCP backend, but that only holds when the bundled
plugins/web/* providers are registered. The dispatch relied entirely on the
general plugin sweep to do that; when the sweep finishes without registering
them (its exception swallowed as a warning, a packaged layout where it ran
before the bundled tree was importable, or a stale empty-discovery cache), the
registry is empty and BOTH tools dead-end on "No web {search,extract} provider
configured" — despite needing no setup at all.
_ensure_web_plugins_loaded now verifies the keyless default landed after the
sweep and, if not, registers the bundled web providers directly against the
registry. Idempotent, a no-op on the healthy path (one dict lookup), and honors
an explicit plugins.disabled entry.
* fix(discord): recover from runtime gateway task exits
Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit
recovery; the original PR was 1081 commits behind with 28 unrelated
commits.
A post-ready discord.py WebSocket crash left the gateway split-brained:
producers stayed active while Discord stopped responding. After this fix
the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error()
so the existing GatewayRunner reconnect watcher replaces the dead adapter.
Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy
errors, invalid tokens) surface fast instead of burning the full ready
timeout. Because connect() no longer waits via asyncio.wait_for on that
path, test_connect_releases_token_lock_on_timeout is updated to trigger
the timeout through the new helper (same lock-release contract).
3 tests pass (2 new runtime-failure tests + the updated timeout test);
test_discord_connect.py and test_discord_slash_commands.py green.
Co-Authored-By: ameobius <ameobius@local.host>
* fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test
connect() no longer uses asyncio.wait_for for the ready handshake, so
test_connect_timeout_cancels_bot_task was hanging for 30s in CI.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: ameobius <ameobius@local.host>
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).
Also adds deaneeth to AUTHOR_MAP.
When connect() times out waiting for the Discord ready event, the background
asyncio.Task running client.start() was not cancelled. discord.py's internal
reconnect loop can ignore client.close() while a WebSocket handshake is in
flight, so the orphaned task eventually completes and fires on_ready.
A later successful reconnect then leaves two live Discord clients in the same
process — each with its own on_message handler and MessageDeduplicator instance
— so every @mention creates two threads because the per-adapter dedup caches
cannot catch cross-client duplicates.
Fix: explicitly cancel and await _bot_task in two places:
1. The asyncio.TimeoutError handler inside connect() — catches the case where
the adapter's own inner wait_for fires before the gateway's outer timeout.
2. The start of disconnect() — the load-bearing path, always reached via
_dispose_unused_adapter regardless of which timeout fired first.
Root cause confirmed from production logs: a Jun 8 network outage caused three
consecutive connect() timeouts. The first attempt's bot_task completed its
handshake 4 minutes later ("Connected as") with no preceding watcher line,
then the watcher's real reconnect also connected 90 seconds after that. The two
clients ran continuously for 41+ hours, confirmed by the same user message
appearing as two separate inbound events in two different thread IDs 357ms apart.
Regression tests added to tests/gateway/test_discord_connect.py:
- test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a
NeverReadyBot and asserts _bot_task is None afterward
- test_disconnect_cancels_running_bot_task: injects a live zombie task, calls
disconnect(), and asserts the task is cancelled and the attribute cleared
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
Pin the contract for _build_document_context_note: text documents confirm the
inlined content and record the path; binary documents (PDF/DOCX/XLSX/octet-
stream) tell the agent to extract the text itself and never instruct it to ask
the user to paste the contents.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.
Set CI=1 in _run_npm_install_deterministic so the package's /dev/tty
postinstall demo is skipped during hermes dashboard web UI builds.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): stop file tree throwing "two HTML5 backends" on remount
The Agent Workspace file tree (react-arborist) shows a permanent "TREE ERROR"
with `[error-boundary:file-tree] Cannot have two HTML5 backends at the same
time.` react-arborist mounts its own react-dnd DndProvider + HTML5Backend per
<Tree>. react-dnd v14 keeps that manager on a global, ref-counted singleton
context and nulls it when the count reaches 0. The tree is keyed on
`${cwd}:${collapseNonce}`, so changing folder / collapsing forces a fresh
<Tree>; during the remount the singleton can be torn down and recreated while
the previous HTML5Backend still owns `window.__isReactDndHtml5Backend`, so the
new backend's setup() throws. The error boundary then sticks, because "Try
again" just remounts into the same race.
Pass arborist a stable, app-lifetime `dndManager` (new getFileTreeDndManager
singleton) so it reuses one backend for the life of the app and never
double-claims the window flag. Drag/drop is already disabled on this tree;
this only changes how the (unused) dnd backend is provisioned.
Promotes dnd-core and react-dnd-html5-backend to explicit deps (already present
transitively via react-arborist's react-dnd 14.x line, so they dedupe to one
instance).
* fix(nix): bump npmDepsHash for desktop dnd deps
Adding dnd-core / react-dnd-html5-backend changed the workspace
package-lock.json, so the single workspace-root npmDepsHash in
nix/lib.nix was stale and the nix build failed. Regenerate it
(hash from the failing nix CI job's 'got:' value).
* fix(nix): update npmDepsHash for merged lockfile
After merging main, the workspace lockfile combined main's dep
changes with the desktop dnd additions, so the npmDepsHash needed
recomputing again. Hash from the nix lockfile-check job.
* fix(nix): use fetchNpmDeps hash for desktop dnd lockfile
prefetch-npm-deps reported sha256-lVnybH9RE/... but fetchNpmDeps
wants sha256-mYgKXE/FL4hnkrEvpVv+ULM/oeyIfO2AM9Ol8OrfWm0= for the
merged workspace lockfile. Use the nix build 'got:' hash so CI passes.
---------
Co-authored-by: Brooklyn Nicholson <brooklyn.bb.nicholson@gmail.com>
When a user toggles off the last visible model for a provider group, the
effectiveVisibleKeys() function treated the missing provider prefix as
'never customized' and re-added the default models on the next render,
causing all models to snap back to enabled.
Fix: store a sentinel key (e.g. 'provider::') when the last model for a
provider is toggled off. The sentinel distinguishes 'user hid everything'
from 'user never customized', preventing the default-fallback path from
re-adding models the user explicitly chose to hide.
Fixes#43485
Turn off browser spellcheck, autocorrect, and autocomplete on the main chat composer and message-edit composer so code, paths, and slash commands are not flagged or altered.
The coding-posture brief told GPT/Codex models to use patch mode='patch'
(V4A) for structured/multi-file changes but mode='replace' "for a single
small swap". That second nudge points those models at a format their
first-party harness never taught them.
Verified against openai/codex (current main): apply_patch is the ONLY file
editor in codex-rs — zero occurrences of str_replace/old_string anywhere in
the repo; the grammar (core/src/tools/handlers/apply_patch.lark) is exactly
the V4A dialect our patch_parser implements; the shipped model prompts
(gpt_5_codex, gpt-5.2-codex, gpt-5.1-codex-max + instruction templates)
explicitly say to use apply_patch "for single file edits"; and the tool is
gated per model via ModelInfo.apply_patch_tool_type, i.e. OpenAI ships
V4A-for-everything as model metadata.
The GPT-family line now steers to mode='patch' for all edits, single-file
included. The replace-family line (Claude + open-weight) is unchanged —
Claude Code's FileEdit is old_string/new_string/replace_all exact string
replacement (confirmed from Anthropic's shipped sdk-tools.d.ts, the only
file editor in its tool union), matching our mode='replace'.
CI tests the PR merged with current main, where the new /memory canonical
command filled Slack's 50-slash cap: with btw/bg/reset all pinned ahead of
canonicals, the last canonical (/debug) got clamped and the Telegram-parity
test failed. Canonical commands must win slots over alias spellings — /new
keeps its native slot and 'reset' stays reachable via /hermes reset.
Also updates test_includes_aliases_as_first_class_slashes to assert the
pinned-alias contract (_SLACK_PRIORITY_ALIASES survive) instead of a
specific unpinned alias's survival, which was the same change-detector
pattern the docstring already warned about.
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
Reworks the chat-line UX: pick a recipe by name and the agent asks you for
what it needs, one question at a time, instead of forcing you to hand-type a
slot=val command line.
- /cron-recipe -> lists the catalog
- /cron-recipe <name> -> forgiving name match (exact/prefix/substring/
fuzzy; ambiguous lists candidates), then seeds
the agent with a natural-language fill request
built from the recipe's typed slots + schedule
and prompt templates. The agent asks for each
value one at a time and calls the EXISTING
cronjob tool. No new tool.
- /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe ->
create_job) for the dashboard/docs/power user.
Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user
turn, never a synthetic injection):
- shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe()
and build_recipe_seed() are the new shared pieces.
- gateway: dispatch rewrites event.text to the seed and falls through to the
agent (the same pattern /steer uses).
- CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop
consumes it right after process_command() and runs it as the next turn.
The typed-slot schema stays the single source of truth (still validates the
form/inline path via fill_recipe); the agent path just renders those slots into
the questions to ask. Docs updated to lead with the name-then-ask flow.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Hermes can propose automations and let the user accept them with one tap
via /suggestions, instead of making them assemble cron jobs by hand. Every
proposal — wherever it originates — flows through one surface.
Sources (the 'where suggestions come from'):
- catalog: curated starter automations (daily briefing, important-mail
monitor, weekly review, workday-start reminder) via /suggestions catalog
- recipe: installing a skill that carries a metadata.hermes.recipe block
registers a suggestion instead of auto-scheduling
- usage / integration: reserved for the background-review detector and
account-connect triggers (sources defined; emitters land next)
Pieces:
- cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by
key (dismissed proposals never re-offered), pending cap so it can't become
a nag wall. Accepting calls the existing cron.jobs.create_job — there is
NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600).
- cron/suggestion_catalog.py — the curated set. The important-mail monitor
entry is where the old proactive-monitor poll->classify->surface engine
lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE
catalog automation rather than a standalone feature.
- tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes
a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single
translation both the direct and suggestion paths share.
- hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway
never drift); /suggestions [accept N|dismiss N|catalog|clear].
- Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py)
+ aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py).
Consent-first throughout: nothing auto-schedules; acceptance is always
explicit; dismissals latch.
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here
as a catalog entry and a suggestion source respectively.
Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency,
recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md
-> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.
The coding posture's names-only demotion of non-coding skill categories
(#44342) applied under the default auto mode, silently changing the skill
index for every user in a git repo. Index changes must be opt-in: demotion
now only fires under agent.coding_context=focus, alongside the toolset
collapse. auto/on leave the skill index untouched; focus semantics are
unchanged (demoted, never hidden; deny-list keeps coding-adjacent and
custom categories at full entries).
The Config page read config_path from /api/status, which is machine-global
and always reports the profile the dashboard process was started under.
After switching profiles with the global switcher, the header kept showing
the old profile's path (e.g. /root/.hermes/profiles/worker_1/config.yaml)
even though reads/writes correctly targeted the new profile.
Fix: /api/config/raw now returns the resolved path alongside the YAML
(resolved inside _profile_scope, so it follows ?profile=). ConfigPage
prefers that scoped path and only falls back to /api/status for old
servers. ProfileKeyedRoutes already remounts the page on switch, so the
header refreshes immediately.
Editor-grade mouse selection parity with the Ink TUI (hermes-ink selection.ts):
a second click in the 500ms/1-cell chain selects the same-class character run
under the cursor (iTerm2 word set, wide-glyph aware), a third selects the line,
and dragging with the button held extends word-by-word / line-by-line while the
clicked span stays selected — anchor flips across the span on direction change.
Core knows only press-drag char selection, so this is a boundary shim
(multiClickSelect.ts) wrapping the renderer's startSelection/updateSelection
seam; word bounds read the presented frame's char grid. Native quirks probed
and pinned: per-renderable selection anchors are fixed at set time (anchor
flips restart the selection) and forward selections exclude the focus cell
(inclusive spans seed focus at hi+1). Pure scanning logic in logic/multiClick.ts;
20 new tests (pure + real-mouse-path frames); demo.tsx installs the seam for
tmux smokes.
Pipeline cell re-run with external VmRSS/VmHWM sampling on the dedicated
tmux servers: ink 5.07MB peak, otui 4.94MB peak, zero growth across the
800-msg stream (alt-screen = fixed grid, no scrollback accrual). The
emulator leg is a tie on memory as well as CPU.
Chaos (5 scenarios x 2 UIs): every gateway-death/hang scenario fully
recovers on BOTH UIs — Ink respawns immediately (~80ms, no backoff),
OpenTUI after its 1s backoff; transcripts converge byte-identical to
the never-killed digest; zero orphans. PTY EOF: both exit and reap the
gateway in ~100ms (Ink takes 4.1s to die vs OpenTUI 0.2s).
Pipeline (800 msgs @30ev/s inside a dedicated tmux server): UI CPU
82.4s (ink) vs 79.1s (otui); tmux-server leg ~0.4s BOTH — the
'Ink costs more in the emulator' hypothesis is not supported at this
workload. Frame pacing: otui 22.3fps vs ink 15.8fps, interframe p95
103ms vs 209ms. Echo latency: both excellent (p50 1-2ms); submit to
first-token-paint 44ms (ink) vs 107ms (otui).
Reconstructs what killed gateways and TUI sessions: merges
~/.hermes/logs, journalctl/dmesg OOM kills, sessions-DB abnormal ends,
and git worktree state into one timestamp-sorted timeline plus a
summary (OOM victims, tui_gateway exit-reason histogram, orphaned
sessions). Findings from the 2026-06-04..11 window: gateway deaths were
kernel OOM kills selecting hermes via OOMScoreAdjust=200 under pressure
from OTHER tools' multi-GB leaks; TUI deaths were top-down SIGHUP/EIO
cascades from unit teardown; tui_gateway itself crashed in 0/152 exits.
Real sessions DB (~/.hermes/state.db, 444 tui+cli sessions): p50=20,
p75=53, p90=182, p95=340, p99=1941 msgs. The assumed 200-300 'realistic
band' is actually the p90-p95 region; the typical session is ~20 msgs
and the p99 tail reaches ~1940 (real ~2k-msg sessions exist).
New cells at those anchors (2 reps, ink vs otui-capped, 2G scope), VmHWM
medians: 100 msgs 163 vs 222MB; 300 msgs 180 vs 268MB; 2000 msgs 234 vs
671MB (2.9x). session-distribution.mjs regenerates the JSON; run.mjs
gains --configs to skip redundant configs (otui-uncapped == capped below
the 3000-row cap).
Three behavior changes to the hermes -w worktree lifecycle:
1. Git-native locks. _setup_worktree now locks its worktree
(git worktree lock --reason "hermes session pid=<pid>"), and
_prune_stale_worktrees skips locked worktrees at ANY age — a lock
from a live or crashed session means "do not touch". New helpers
_lock_worktree / _unlock_worktree / _worktree_is_locked (fail-safe:
any error reads as locked) / _worktree_is_dirty (fail-safe: any
error reads as dirty).
2. Dirty trees are preserved. _cleanup_worktree previously destroyed
worktrees with uncommitted changes if there were no unpushed
commits; it now keeps the worktree, branch, and lock when the tree
is dirty OR has unpushed commits, and prints manual cleanup hints
(git worktree unlock + remove --force). The >72h "force remove
regardless" prune tier is removed: pruning may only ever delete
clean, unlocked, fully-pushed worktrees.
3. Branch deletion is gated on removal success. Both cleanup and
prune previously deleted the branch without checking the
git worktree remove returncode, dropping easy reachability of the
commits even when removal failed; the branch is now only deleted
after a successful remove.
Re-ran the cells that crashed, against the fixed binary (a939c9a):
- mem3000 otui-capped: crashed_after_stream exit 7 @ ~900MB
→ completed exit 0, vmhwm 859MB
- mem3000 otui-uncapped: crashed_after_stream exit 7
→ completed exit 0, vmhwm 834MB
- slope10000 otui-uncapped: died exit 7 at 3,000 msgs (197d499 result)
→ completed ALL 10,000 msgs, exit 0, vmhwm 1.57GB —
under the 2GB cgroup cap, no cap-hit, no crash
- fresh ink baselines for both cells (mem3000 257MB / slope10k 328MB).
No "Failed to create SyntaxStyle" anywhere; the store cap now binds at the
handle-safe ceiling (1000 rows) long before the native 65,534-slot handle
table exhausts. report.html + report-assets regenerated via render.mjs
(results are append-only; pre-fix runs remain as the baseline).
Root cause of the bench-suite crash (every otui mem3000/slope cell died at
~3000 lumpy fixture msgs, exit 7, ~880MB RSS — not a cgroup kill):
- @opentui/core 0.4.0 routes EVERY native object through ONE global handle
registry with 16-bit slot indices (core src/zig/handles.zig: INDEX_BITS=16,
MAX_SLOTS=65535, slot 0 reserved). Measured on this install: exactly 65,534
live handles; the next createSyntaxStyle() fails. destroy() DOES recycle
slots — exhaustion means LIVE objects.
- Every TextBufferRenderable burns THREE slots in its constructor
(TextBufferRenderable.ts:77-80: TextBuffer + TextBufferView + SyntaxStyle),
so the mount-everything transcript hits the wall at ~1,400 store rows
(~16 text renderables/row x 3 ~ 47 handles/row): "Failed to create
SyntaxStyle" (zig.ts:4554) throws out of a Solid mount effect.
- The crash was MASKED: CliRenderer's own uncaughtException handler
(handleError -> console.show()) allocates the console-overlay
OptimizedBuffer — another handle — so the handler itself threw "Failed to
create optimized buffer: WxH" and Node died with exit 7 (fatal error in
the uncaughtException handler), hiding the real error.
Why not share one SyntaxStyle (the obvious 3->2): the per-buffer style is
load-bearing — native setStyledText (text-buffer.zig) registers each chunk's
color by NAME ("chunk{i}") into the buffer's OWN style, and registration is
name-keyed-overwrite (syntax-style.zig putStyle), so a shared style would
cross-corrupt chunk colors between every styled <text>. Pooling is unsound
at our layer in core 0.4.0.
The fix, at the seams that are ours:
- boundary/nativeHandles.ts (ffiSafe.ts sibling): SyntaxStyle.create() on a
full table DEGRADES to a detached style (native handle 0) instead of
throwing — JS-side styleDefs/mergeStyles (what markdown/code chunk colors
actually use) keep working; all native calls on handle 0 are inert no-ops.
- boundary/renderer.ts: guard the process error listeners createCliRenderer
installs so an exception INSIDE the handler can never exit-7-mask the
original error again (logged honestly; original error stays the story).
- logic/store.ts: HERMES_TUI_MAX_MESSAGES clamped to a handle-safe ceiling
(1000 rows ~ 47k handles ~ 72% of the table on the realistic fixture).
The old default of 3000 was unreachable — the TUI crashed at ~1,400 rows,
before the cap ever bound. Renderable-weight-aware capping is #27's
(virtualization) to do properly; until then the degrade shim backstops
pathological rows.
TODO(upstream) — issue-shaped, for the OpenTUI repo:
(a) a global 64k handle table with a 3-slot cost per text renderable is
too small for transcript-style TUIs (61k renderables ~ 3k messages);
(b) native allocation failures throw out of the render loop with no
degrade path;
(c) handleError allocates (console overlay buffer) and so crashes on the
very condition it is reporting, masking the root cause with exit 7.
Also: eslint now ignores ui-opentui/.bench/** (bench `nodes`-cell build
artifact broke the lint gate) and .gitignore covers it.
Gate: npm run check green, 599 tests (595 baseline + 3 degrade-path tests
+ 1 cap-clamp test).
The OpenTUI /usage went through the slash-worker subprocess, which
resumes the session WITHOUT a live agent — so it could never show
current-session tokens or costs, and what it did show landed as a
full-screen page.
- slash.exec now answers /usage in-process from the live agent:
per-model rows (requests, tokens in/out, cache, provider-reported
cost when present), session totals/context, a one-line 30-day
summary (SessionDB.usage_totals, real costs only) and a one-line
Nous credits gauge (nous_credits_compact_line, refactored out of
nous_credits_lines). ~8 lines instead of a page.
- Unreported costs render as 'not reported by provider' — never
$0.00 — and the 30d summary omits cost when no session in the
window has a provider-reported figure.
- /usage full keeps the detailed legacy CLI page via the worker.
Cost displays were estimates from a pricing table; on OpenRouter the
status bar never reflected what was actually charged. Now cost is
provider-REPORTED only, end to end:
- OpenRouter requests carry usage:{include:true} (profile + legacy
transport paths); the response usage.cost field (credits, 1:1 USD)
is captured per call into agent.session_actual_cost_usd and
persisted to the sessions DB actual_cost_usd column (NULL-safe:
unreported calls never touch the stored value).
- Nous keeps its x-nous-credits-* header capture; the header delta
now surfaces as the session's real cost via real_session_cost_usd.
- Providers that report nothing accumulate NOTHING: cost fields stay
absent/None (the TUI hides its cost segment), never a fabricated
$0.00 and never an estimate. _get_usage, gateway /usage and the
CLI usage page all switched off estimate_usage_cost for display.
- Per-model session accumulator (session_model_usage) records real
per-call counts and provider-reported cost per model.
User feedback: tool/thinking rows did a "v small quick lil jump up and
down" when toggled, worst on the bottom rows.
Root cause (verified live with 10ms tmux capture sampling): the
transcript scrollbox's sticky-bottom re-pin and the scroll anchor fought
AFTER paint. On a toggle near the bottom, the content-height change runs
ScrollBox.recalculateBarProps -> applyStickyStart("bottom") (the user is
at the sticky position, so _hasManualScroll is false), which paints a
fully bottom-pinned frame; the anchor's 4x16ms scrollTo re-asserts then
yanked the viewport back up. The capture burst shows the transient
pinned frame between two anchored ones on every expand — the visible
down-up flick.
Fix at the cause instead of correcting after the effect: suspend
stickyScroll (a runtime get/set property on ScrollBoxRenderable) BEFORE
running the toggle and restore it ~100ms later, once the content height
has settled. With sticky off, the toggle's layout pass leaves scrollTop
untouched — the clicked header's document position is unchanged (content
grows/shrinks below it), so nothing moves and there is nothing left to
flicker; a collapse past the new bottom clamps naturally via the
ScrollBar scrollSize setter. Restoring recomputes the manual-scroll
state from the actual position: still at the bottom -> keeps pinning for
new content; mid-content -> manual-scroll semantics until the user
returns (the same end state the old anchor produced). Rapid re-toggles
inside the window keep the ORIGINAL saved value.
The far-from-bottom anchor guarantee is unchanged (scrollTop is simply
never touched), pinned headlessly in scrollAnchor.test.tsx along with
the suspension sequencing, the clamp-then-re-pin collapse path, and the
double-toggle restore. ffiSafe's tall-diff scroll-cut regression now
drives the negative-y condition explicitly via wheel scrolls (the old
anchor exercised it through the very transient sticky-bottom frames this
fix removes).
Verified live (tmux, real gateway): before — toggling the bottom rows
painted a transient bottom-pinned frame (f141 of a 10ms burst); after —
three toggle bursts produce ONLY the clean before/after states (4
distinct frames in 458 samples), headers hold their row, including the
bottom-most rows.
User feedback: "for all tools, i'd want all their output viewing enabled
to be infinite by default."
Flip envOutputLines (HERMES_TUI_TOOL_OUTPUT_LINES): unset -> Infinity
(was 200); a positive integer RESTORES a cap (e.g. =200); 0 stays
Infinity for back-compat with the old opt-in-unlimited value; garbage ->
Infinity (unrecognized = no cap asked for). The semantic is now "cap
only when the user asked for one".
The store's raw-result preference follows the same rule: envOutputLinesSet
becomes envOutputUnlimited — whenever the cap is unlimited (the default
now) and a gateway tail-capped result_text (omittedNote) arrives with the
always-full raw result on the wire, the raw result wins, since an
uncapped view of a tail would silently miss the head. With an explicit
finite cap the gateway tail + honest omitted note are kept.
Memory safety is unchanged: tool bodies mount only while EXPANDED (rows
default collapsed and free their Yoga nodes on collapse/unmount), and the
rolling HERMES_TUI_MAX_MESSAGES cap bounds the transcript's high-water
mark.
Tests: env.test.ts expectations flipped (unset/garbage -> Infinity, 0
documented as back-compat); tools.test.tsx "flag unset caps at 200"
becomes "unset renders all 250 lines", plus an explicit =50 cap (+note)
test and =200 restored-cap test; the store preference matrix covers
unset/0 (raw wins), =50 (tail+note kept), and no-raw (tail+note, no
crash). Verified live: seq 1 220 expanded renders rows 201-220 with no
"+N more lines" note.
The native TUI prefetches model.options right after session.create (91df32545,
picker instant-open). The handler is network-bound (~3.7s: pricing fetch + Nous
tier check in build_models_payload) and ran on the gateway's main dispatcher
thread, so every fast-path RPC issued in the first seconds after launch —
complete.slash for the '/' dropdown, session.list, config.get — sat unread
behind it. Measured: first '/' dropdown 1718ms at HEAD vs 53ms at 394f45a3d
(pre-prefetch baseline); 52ms after routing model.options onto the existing
RPC thread pool (_LONG_HANDLERS). The /model picker keeps its 29ms cached open.
A patch tool's result is a JSON record whose payload IS the diff. In a verbose
session the gateway redacts + TAIL-caps result_text (_cap_tui_verbose_text),
so the echo arrived under the native diff in two broken shapes: truncated
mid-JSON (unparseable, so the old JSON.parse check failed open), or — for tall
edits — capped PAST the JSON head, which the store's normalizeOutput then
un-escapes into plain lines that duplicate the diff. North star: no raw JSON
in the transcript, ever.
Three layers:
- gateway: when diff_unified ships, result_text drops the in-JSON diff echo
(_result_sans_diff_echo) — small, parseable, carries only the non-diff
signal (success/files_modified/warnings/lsp_diagnostics).
- fileTool diffOutputPlan: anything starting with '{' under a rendered diff is
suppressed regardless of parseability; parseable JSON with real non-diff
signal (error/warning/lsp_diagnostics) renders JUST those as labeled notes;
a non-JSON fragment whose lines echo the rendered diff is suppressed too
(guards older emitters). Plain-text results (lint tails) still render.
Expanding a tall <diff showLineNumbers> pinned to the scrollbox bottom froze
the TUI with ERR_INVALID_ARG_VALUE looping out of CliRenderer.loop every
frame. Root cause: @opentui/core 0.4.0 marshals OptimizedBuffer
fillRect/drawText/setCell* coordinates as u32 in the FFI table while
LineNumberRenderable.renderSelf passes raw screen coordinates — NEGATIVE when
the diff is partially scrolled above the viewport. Bun's FFI silently wraps
negatives (native side bounds-checks them into a no-op); Node's experimental
node:ffi rejects them. bufferDrawBox already uses i32, which is why ordinary
boxes/text scroll fine and only the diff line-background path crashed.
Fix at the seam we own: boundary/ffiSafe.ts patches OptimizedBuffer to clip
fillRect to the non-negative quadrant and skip negative-origin
drawText/setCell*/drawChar before the FFI call (Bun parity). Installed from
boundary/renderer.ts (live) and test/lib/render.ts (headless). TODO(upstream):
widen those FFI params to i32 so this shim can be deleted.
Ports the engine off the second JS runtime onto Node 26.3 (node:ffi) so the
repo ships a single JavaScript runtime: child_process for the gateway, vitest
for tests, an esbuild + Solid build step. Mouse selection copies the rendered
text you highlight, and the clipboard path is crash-proofed (a broken copy
pipe no longer quits the UI). Renames the engine dir ui-tui-opentui-v2/ ->
ui-opentui/ and updates the launcher/installer/Docker references.
Dev demo (not a test): seeds the bench fixture into the store via the resume path
and renders <App> under a real CliRenderer (no gateway) so you can attach over
tmux, scroll, and eyeball the transcript + the rolling-cap truncation notice.
Run: DEMO_TOTAL=240 HERMES_TUI_MAX_MESSAGES=80 bun scripts/demo.tsx
Bench (realistic fat-turn fixture) put numbers on the cap tradeoff: ~0.65 MB/msg,
~20.4 renderables/msg → 3000 ≈ 2 GB steady RSS, the highest cap within a sane TUI
budget (that ceiling only hit by marathon 3000+-msg sessions; typical cost a
fraction). 1500 was too little scrollback. Tunable via HERMES_TUI_MAX_MESSAGES.
Adds a store `dropped` counter (live overflow in capMessages + the resume slice in
commitSnapshot; reset on clearTranscript) and a dim, selectable=false top-of-
transcript notice — '⤒ N earlier messages — scroll-back capped; full transcript on
the dashboard · session <id>' — so display truncation is visible + points to the
deep-history surface. Display-only: never touches the model's gateway-side context.
Replaces the synthetic ~5.5-node/msg pushes with a deterministic generator
(scripts/fixture.ts): lorem-ipsum user turns + fat assistant turns (markdown +
reasoning + 1-15 tool parts with multi-line results) driven through the real
apply()/commitSnapshot paths. mem-bench.tsx pumps it + checks the resume path.
Realistic cost is ~20.4 renderables/msg (3.7x synthetic); informed the cap tune.
commitSnapshot set the full fetched history then trimmed — briefly handing the
whole transcript to <For>. Since Yoga (WASM) layout memory is grow-only, even a
transient over-cap mount permanently ratchets the high-water mark, partly
defeating the cap when resuming a large session (a real one has ~1980 messages).
Slice to MESSAGE_CAP BEFORE the first setState so resume mounts at most the cap.
Dev bench (not a test, not in the gate suite): mounts <App> under the Solid
test renderer, pushes N streamed turns, samples RSS + mounted-renderable count
with Bun.gc. Demonstrates HERMES_TUI_MAX_MESSAGES=400 pins mounted renderables
at ~2218 vs an unbounded climb to ~55k at 10k messages (RSS flat ~350MB vs
1.3GB). Run: bun scripts/mem-bench.tsx (MEM_BENCH_TOTAL/SAMPLE tunable).
Note in the README CLI section that the terminal UI defaults to the native
OpenTUI engine on Linux/macOS with Bun (provisioned by the installer), and
that the legacy Ink engine remains the automatic fallback (Windows, Termux,
no Bun) and can be selected explicitly with HERMES_TUI_ENGINE=ink. Ink is
not removed — it's the kept fallback.
No in-repo config example documents display.tui_engine (the published config
reference lives on the docs site, not the repo), so there was nothing to
annotate there.
Add an opt-in-safe `install_opentui` stage that provisions the native
OpenTUI TUI engine: it resolves/installs Bun (~/.bun/bin/bun) and runs
`bun install` in ui-tui-opentui-v2 so the launcher's _opentui_available()
probe (Bun + node_modules/@opentui) passes and OpenTUI becomes the default.
Strictly best-effort: skipped on Windows/Termux/Android and when the v2
package is absent; any sub-step failure (no network, Bun install fails,
`bun install` fails) logs a warning via log_warn and returns 0. The stage
never `exit`s and never returns non-zero, so it can't abort the install — a
failed/skipped setup simply leaves the user on the kept Ink fallback.
Registered after node-deps in all three drivers: the monolithic main()
flow, the run_stage_body case dispatcher (opentui-engine), and the
emit_manifest staged-installer JSON.
Flip the default engine: with no explicit HERMES_TUI_ENGINE env / display.tui_engine
config, resolve to 'opentui' when this host is genuinely set up for it (Bun resolves +
the v2 package's entry + node_modules present + not Windows/Termux), else 'ink'. An
explicit env/config choice still wins, and 'ink' remains the universal opt-out. Hosts
without the OpenTUI setup are unaffected (stay on Ink), so nothing strands a user.
- _config_tui_engine_early() now returns None (not 'ink') when unset, so the caller
distinguishes 'explicitly ink' from 'unset' and applies the availability-gated default.
- _bun_bin() split: _bun_bin_or_none() is the non-fatal probe; _bun_bin() still exit(1)s
on the explicit launch path. New _opentui_available() gates the default.
- Verified the full resolution matrix (7 cases) + that the platform/availability gates hold.
Replace the ~5 near-identical `as*()` accessors in
view/prompts/promptOverlay.tsx (one per ActivePrompt kind) with one generic
`narrow(kind)` helper that narrows the discriminated union via a typed type
guard (`p is Extract<ActivePrompt, { kind: K }>`) — no `as`. Each <Match>
branch keeps its precise typed payload. Behavior is identical.
Extract the repeated `setTimeout(() => …close…, 0)` overlay/prompt-close
pattern into a single `deferClose(fn)` helper (src/logic/defer.ts) so the
"why deferred" rationale (let the closing keystroke finish dispatching before
the composer remounts/refocuses) lives in one place.
Rewires the 5 close-defer sites: closePager/closeDashboard/closeSwitcher/
closePicker in view/App.tsx and clearSoon in view/prompts/promptOverlay.tsx.
Timing is unchanged (0ms). Other setTimeout uses (quit window, flashHint,
scroll re-anchor, resize debounce, transport) are NOT close-defers and are
left untouched.
Production boundary/logic .ts is clean of the no-unsafe-* family (gateway
payloads are Schema-decoded), so promote it from warn to error. *.tsx is
exempted: @opentui/solid's JSX namespace types every component return as
error/unknown — a framework limitation, not unsafe app code. Test helpers/
mocks (loose fixtures + async signatures) are exempted too. Remaining warns
are no-unnecessary-condition: intentional defensive guards on untrusted
runtime/gateway data that TS's narrowing can't model.
Replace the two ad-hoc as-cast loose readers in src/logic/store.ts with
effect Schema decode-at-boundary. New src/boundary/schema/SessionInfo.ts
defines SessionInfoPatchSchema + CatalogSchema (decodeUnknownOption),
mirroring GatewayEvent.ts. readInfoPatch + setCatalog now decode once and
build the typed patch/Catalog from the result (Option.none → empty
patch / catalog unset, never crashes). Wire field names verified against
tui_gateway/server.py. Removed the now-dead readOptBool helper. Tests
extended for nested-usage vs top-level context fallback, malformed/partial
payloads, and a garbage catalog.
The ring buffer is bounded (2000) but the NDJSON file was append-only and grew
forever. Add size-based rotation mirroring opencode's keep-N model: track bytes
written in-process (seeded from statSync on open, so we avoid a statSync on every
write) and, when the next line would cross LOG_MAX_BYTES (5 MiB), shift
.log -> .log.1 -> ... -> .log.5 (LOG_KEEP=5, oldest dropped) and resume on a
fresh file. Rotation is best-effort and fully try/catch-wrapped: any fs failure
leaves us appending to the existing file rather than crashing logging. Adds a
temp-dir rotation test (seeds >5 MiB to force a rotation on next write).
A caller-supplied `data` with a circular reference or BigInt makes plain
JSON.stringify throw inside the file-write catch, flipping `fileBroken` and
killing ALL file logging for the session. Add `safeStringify` (WeakSet circular
guard, BigInt -> `${n}n`, wrapped to never throw) and use it for entry
serialization, so a bad payload degrades to a placeholder instead of breaking
the sink. Also model LogLevel schema-first via Schema.Literals + inferred type
(matches boundary/schema/GatewayEvent.ts), and add focused safeStringify +
poison-payload tests.
Add a [1/4] format step to scripts/check.sh running
`bunx prettier --check src` (matching how the script invokes the other
tools), renumbering the existing steps to 2-4. Future formatting drift
now fails the gate.
The unused-imports/no-unused-vars warn → error promotion shipped in the
no-non-null-assertion commit (where the eslint rule changes live).
Run `prettier --write src` over ui-tui-opentui-v2 to normalize formatting
to the repo .prettierrc (no semicolons, single quotes, width 120,
arrowParens avoid, trailingComma none). This worktree had pre-existing
prettier-version divergences across 19 files; normalizing is correct.
No behavior changes — formatting only. The gate (type-check → lint →
bun test) stays green.
Promote strictness in ui-tui-opentui-v2:
- eslint: @typescript-eslint/no-non-null-assertion: error (with a test
override block keeping `!` in *.test.ts/tsx fixtures), and promote
unused-imports/no-unused-vars warn → error.
- tsconfig: add noUnusedLocals + noImplicitReturns.
Remove all 24 production `!` non-null assertions by replacing each with
a real guard / default / early-return, preserving rendered behavior:
- gateway/client.ts: read the pending entry once and guard (vs has()+get()!).
- logic/theme.ts: guard the parseHex regex match; `?? 0` on the always-
in-bounds XTERM_6_LEVELS lookups; restructure backgroundLuminance to
branch into a typed tuple and use charAt() for the 3-digit hex expand.
- view/homeHint.tsx: use Solid's <Show>{value => …} callback form to
narrow info().model / info().cwd instead of `!`.
- view/reasoningPart.tsx: guard the regex match before slicing m[0].
- view/statusBar.tsx: read model/cwd/pct into locals + guard; `?? 0` on
the showBar()-guarded context-bar percentage.
- view/toolPart.tsx: guard the single-arg entry; `?? 0` on the
duration-guarded fmtDuration.
Enable type-aware linting in ui-tui-opentui-v2: add projectService +
tsconfigRootDir to the TS files block and switch the preset to
recommendedTypeChecked. Turn ON as ERROR the high-value promise rules
(no-floating-promises, no-misused-promises, await-thenable) and fix the
3 real floating-promise sites in the gateway client (FileSink
write/flush/end are fire-and-forget on a piped child stdin — marked
with explicit `void`).
Defer the cast/unknown family + the noisy type-checked rules to 'warn'
(gate stays green; eslint exits 0 on warnings) for Phase 2, which will
replace the `as`/`unknown` boundary casts with Schema decoding:
no-unsafe-{assignment,member-access,argument,return,call},
no-unnecessary-condition, no-base-to-string, restrict-template-
expressions, no-unnecessary-type-assertion, require-await.
Bun runs the TS entry directly (no build step), so a missing `bun install`
otherwise surfaces as a cryptic '@opentui' resolve crash + blank UI. Fail
loudly with the fix instead. Part of the gateway build/run hardening.
Arm a startup watchdog after spawning the gateway child: if the unsolicited
gateway.ready handshake never arrives within HERMES_TUI_STARTUP_TIMEOUT_MS
(floor 2s, default 20s), emit a gateway.start_timeout event so the store can
surface a failure line + the captured stderr tail instead of a silent blank UI.
Cleared on ready (dispatch), on stop(); re-arms per recovery respawn.
Apply the existing theme selectionBg token to the plain <text> content
renderables (TextBufferRenderable supports selectionBg/selectionFg) so a
selection draws a clean solid bar that PRESERVES the text fg (no selectionFg →
no SGR-inverse fragmenting). Applied to:
- messageLine: the flat settled/user/system message text.
- toolPart: the args value lines + the output body lines.
Limitation: assistant answers rendered via the native <markdown> renderable
(MarkdownRenderable extends Renderable, not TextBufferRenderable, and
MarkdownOptions has no selectionBg/selectionFg) cannot take the themed highlight
— they fall back to the renderer's default selection style.
Subscribe to the renderer's "selection" event (fires once when a free-form
mouse selection completes) and auto-copy the spanned selectable text via the
existing onCopySelection callback. Unlike the Ctrl+C path, this does NOT
clearSelection() — the highlight persists so the user sees what was copied and
Ctrl+C still works. writeClipboard is idempotent so both paths are harmless.
Subagent hardening pass over boundary/logic/view, findings triaged (most were
false positives or app-lifetime-moot). The 3 genuine fixes:
- liveGateway.stop() now clears the pending 16ms coalesce timer before
client.stop() — a queued flush() could otherwise fire batch()/handlers into a
torn-down store after the layer scope releases.
- store.findToolPart now scans only the LIVE (last) assistant turn, not every
message — a tool.complete pairs with a tool.start in the current turn, so this
avoids matching a same-id tool in an older/resumed turn (and is O(parts)).
- store message.complete with text but NO prior start/delta now creates the turn
(complete-only gateways) instead of dropping the final text; still no empty
bubble when there's no text. +2 regression tests.
Triaged as NOT-a-bug / accepted-risk (documented so they're not relitigated):
@opentui/solid useKeyboard DOES auto-cleanup (onCleanup keyHandler.off, index.js:59);
dimensions/scrollAnchor timers are app-lifetime / try-catch-safe; unbounded-growth,
duplicate-dedup, and split-frame are theoretical for a trusted local newline-framed
subprocess; clipboard spawn timeout + atomic active-session write are minor follow-ups.
93 pass.
The Solid render-test harness (src/test/lib/render.ts + effect.ts) was never
committed — a global ~/.gitignore_global `lib/` rule silently excluded it, so the
opentui-v2 test suite wasn't reproducible from a clean checkout (render.test.tsx
imports ./lib/render). Force-add both + add a repo .gitignore negation
(!src/test/lib/). render.ts also carries the withKeymap() wrapper the keymap
migration needs (view tests mount under a KeymapProvider). 91 pass.
@opentui/keymap@0.3.2 was installed but unused (the spec said we'd use it). Wire
it natively: createDefaultOpenTuiKeymap(renderer) + <KeymapProvider> at the render
root, and a useCloseLayer(target,onClose) helper that registers a focus-within
Esc/Ctrl+C → close layer. Migrated the close handling of sessionSwitcher, picker,
approvalPrompt (close-only), confirmPrompt (y/n via confirm/cancel commands), and
pager + agentsDashboard (close via keymap; scroll/select stay raw — not cleanly
focus-gated). Overlays gain a root ref + focus-on-mount so the focus-within layer
activates. q-close re-added to pager/dashboard (footer advertises it).
Composer history/refocus + masked prompt + the Ctrl+C quit machine stay raw by
design (need the in-flight keystroke / careful state). Test harness gains a
withKeymap() wrapper so view tests mount under a provider. 91 pass; live-verified
/sessions + /tools Esc-close and composer focus recovery after.
Large bracketed pastes no longer flood the composer. On paste, if the text is
≥4 lines or >400 chars, insert a compact `[Pasted text #N +M lines]` chip and
hold the real content in a PasteStore; on submit, expand the chip back to the
full text before sending (free-code model). Single-pass String.replace keeps a
pasted block that itself contains a `[Pasted text #k]` literal safe.
The store is created ONCE in main.tsx and passed App→Composer (NOT per-composer)
so it survives the composer remounting on overlay open/close — a per-composer
store would lose a pending paste mid-compose. +6 unit tests (91 pass). Verified
live: paste 10 lines → chip; submit → transcript shows the full expanded code;
composer cleared.
The composer was a fixed height:3. Match free-code/opencode: native textarea
auto-grow via direct minHeight={1} maxHeight={max(6,⌊rows/3⌋)} props (opencode's
prompt sizing) — 1 row when empty, grows with wrapped/multiline content up to ~a
third of the screen, then scrolls internally. maxHeight is a DIRECT reactive prop
(not in style) so the cap tracks terminal resize via useDimensions. 85 pass;
verified live (a long wrapping line grew the box to 3 rows).
Typing while the textarea was unfocused doubled the FIRST letter: the always-active
handler did ta.focus() AND ta.insertText(key.sequence), but the renderer runs the
global useKeyboard handler BEFORE routing the key to the focused renderable — so
after focus() the same keystroke was also delivered to the now-focused textarea,
inserting it twice. Subsequent keys were fine (textarea already focused → block
skipped). Fix: focus() only; let the textarea insert the char it now receives.
Verified live: typing 'x' then 'y' while blurred yields '❯ xy' (no dup). 85 pass.
Design-judge top nit: Ink's bordered-box-around-the-session-info is the single
biggest 'designed home screen vs log output' signal, and the flat left-aligned
version lacked it. Wrap the model/dir/session block + Tools/Skills/MCP sections +
summary in a full border box (theme border token); banner+tagline stay above,
tips below. 85 pass.
Rebuilt the home screen to match hermes --tui: the HERMES-AGENT banner + tagline,
then a session info block (model · Nous Research / dir (branch) / Session: <id>),
then SEPARATE collapsible sections — Available Tools (enabled toolsets each as
'name: tool1, tool2', capped + '(and N more toolsets…)'), Available Skills (N) in
M categories, MCP Servers (N) connected — and a '… /help for commands' summary.
Previously it was one combined '▶ N tools · M skills · K MCP' dropdown that only
listed tools and showed no model/dir/session.
- gateway startup.catalog now returns per-toolset {enabled, tools} (resolved_tools,
session-aware enabled set — mirrors tools.list); py_compile OK.
- store Catalog gains toolset.enabled/tools; new sessionId field + setSessionId,
set on session create/resume (alongside the active-session-file write).
- homeHint takes the store, reads info (model/cwd/branch) + sessionId + catalog.
85 pass; verified live (model·Nous·dir·session + enabled toolsets w/ tools).
The transcript read as one undifferentiated gold blob (user/assistant/tool all
the same color). Adopt the Ink model where color IS the hierarchy, in 3 brightness
tiers:
- USER input → label (gold) — the human's turn stands out.
- ASSISTANT answer → text (bright) — the primary content, brightest.
- TOOL / REASONING → muted (dim) with an ACCENT glyph (⚡/▶/▼ amber) that marks
the block — clearly the secondary 'working area' below the answer.
Spacing: one blank line above every turn (was cramped: user had a blank, the
reply didn't) + the existing gap:1 between parts. Dropped the transcript's extra
marginTop (turns own their spacing now). 85 pass; verified live — gold ask, dim
tool, white answer read as three distinct things.
The launcher (hermes_cli/main.py _print_tui_exit_summary) reads
HERMES_TUI_ACTIVE_SESSION_FILE to print 'Resume this session with…' on exit. The
Ink TUI writes the current session id there on every session change
(useSessionLifecycle.writeActiveSessionFile); the native engine never did, so
after a /session switch the launcher fell back to the INITIAL launch session and
showed resume info for the wrong session (the reported leak).
Now writeActiveSession() writes {session_id} on session.create AND inside
resumeInto (every /session switch), mirroring Ink. Verified live: file shows the
created session, then updates to the switched-to session. 85 pass.
The transcript scrollbox (stickyScroll+stickyStart=bottom) re-pins to the bottom
on any content-height change when the user is at the bottom (@opentui/core
ScrollBox: `if (stickyStart && !_hasManualScroll) applyStickyStart`). So expanding
a tool/thinking block scrolled the clicked header up off-screen. A
ScrollAnchorProvider (transcript owns the scrollbox ref) lets toolPart/reasoningPart
wrap their toggle so scrollTop is held constant across the height change (re-asserted
over a few frames as layout settles) — the clicked header stays put and the
expansion reveals beneath it. 85 pass.
#2 (flicker regression): my item-7 AssistantText wrapped text in
<For each={segmentMarkdown(text)}> — segmentMarkdown returns NEW objects per
delta, so <For> (keyed by reference) DISPOSED and re-created the markdown
renderable on EVERY streamed delta. Each remount re-measured from zero → content
height oscillated → the scrollbar grew/shrank (exactly the reported symptom).
Fix (deep opencode parity): render assistant text as ONE stable native
<markdown> (MarkdownRenderable) fed the growing content in place, with
internalBlockMode="top-level" — opencode's anti-flicker mode where settled
top-level blocks aren't re-rendered per delta (_stableBlockCount, managed
internally). This is opencode's TextPart verbatim (routes/session/index.tsx:1687).
#3 (table inline formatting): the native <markdown tableOptions={{style:grid}}>
renders GFM tables as a grid WITH inline bold/italic/code in cells — so the
hand-rolled segmentMarkdown + MdTable grid are deleted (obsolete). Switched from
<code filetype=markdown> to <markdown> (the former re-measured the whole buffer
each delta and never aligned tables). 85 pass; verified live (smooth stream,
boxed table, concealed **/* markers styled).
Raw useTerminalDimensions fires on every SIGWINCH tick; during a drag that's a
recompute/reflow storm across every width-sensitive component (tool bodies,
tables, status bar, banner). Add a DimensionsProvider that runs the raw hook
ONCE and feeds a single leading+trailing-debounced (40ms) signal — mirroring the
gateway's 16ms event coalescing / opencode's createLeadingTrailingSignal — that
every consumer shares via useDimensions(). They now reflow together (no tearing)
and at most once per window. Falls back to the raw hook outside a provider
(headless tests). Verified: single resizes converge clean (wide banner ⇄ compact
brand at the 102-col threshold); rapid bursts coalesce. 90 pass.
Home screen now shows the canonical HERMES-AGENT block logo (hermes_cli/banner.py,
gold->amber->bronze via primary/accent/border tokens; width-guarded to a compact
brand line under 102 cols) plus a collapsible '▶ N tools · M skills · K MCP' panel
that expands to per-toolset / per-category / per-server detail.
Data comes from a new opt-in gateway RPC 'startup.catalog' (aggregates
get_all_toolsets + banner.get_available_skills + config mcp_servers); the native
engine fetches it best-effort on session start (Effect.catchCause swallows it on
old gateways). Opt-in => Ink path untouched. py_compile OK. Store gains a typed
Catalog + defensive setCatalog mapper. +2 tests (90 pass); verified live
(1185 tools / 196 skills / 2 MCP, expand shows the full lists).
Visual-hierarchy pass (design-reviewed against free-code/opencode):
- header: brand glyph in accent + name in primary/bold + a bottom rule, so it
reads as chrome and bookends the transcript with the status bar's top rule
(fixes 'nothing differentiates the header from the text stream').
- status bar: a dim │ divider segments model·effort from the context meter.
- user/assistant turn glyphs bold + the user ❯ in accent so turns are scannable.
- reasoning 'Thought' label uses label (not warn) so it matches tool headers —
warn is reserved for warnings; reasoning/tool now read as one aside family.
- home screen: brand in primary/bold, command names in accent vs muted descs,
wider column.
- FIX (load-bearing): strip ANSI/SGR escape sequences from slash/notice text
(pushSystem + openPager) — the gateway colors them for Ink, which interprets
them; the native <text> rendered them as literal glyphs. +stripAnsi
+ 3 tests. All tokens themed (no hardcoded colors). 88 pass.
The native <code filetype=markdown> colorizes pipes but never aligns tables.
Add a pure segmenter (segmentMarkdown) that splits assistant text into prose
runs (native renderable) and GFM table blocks, plus an MdTable grid renderer:
per-column widths (free-code's stringWidth+padAligned), :--- / :--: / ---:
alignment, bold header, dim │ separators, a ┼ header rule, width-aware column
shrink on resize. Incomplete tables (no separator yet, e.g. mid-stream) stay
prose until they close. +5 unit tests (85 pass); verified live with a 3-col table.
Blank lines between reasoning/tool/text grew and shrank mid-stream because
spacing was ad-hoc: tools carried marginTop:1, text/reasoning none, and the
markdown text part rendered the model's leading/trailing newlines as transient
blank lines that filled in as deltas arrived.
Now the parts column owns ALL spacing via gap:1 (uniform 1 line between any two
parts regardless of type/order), per-part marginTop is dropped, and text parts
are stripped of leading/trailing blank lines so the gap is the sole source —
no double gaps, no popping. Verified live: Thought/tool/tool/answer all spaced
by exactly one line. (80 pass.)
Reasoning rendered as an always-expanded plain muted blob. Now it's a proper
collapsible part (opencode ReasoningPart): auto-EXPANDED while the turn streams
(watch it think), then collapses to a one-line `▶ Thought: <title>` when settled;
click toggles. Title is the model's leading `**bold**` line (reasoningSummary).
Body renders as DIM markdown in a left-`│`-border block (Markdown gained an
optional `fg` so reasoning is muted vs the answer). +1 render test (80 pass).
Resumed tool calls were flat `⚡name arg` rows — no output, not collapsible —
because the resume snapshot (_history_to_messages) dropped each tool's result.
Now the native engine passes `with_tool_output: true` on session.resume and the
gateway folds the tool's redacted+capped result + args into its row, so resumed
turns show `▶ name arg (N lines)` collapsible blocks identical to a live turn.
The flag is OPT-IN: Ink doesn't pass it, so _history_to_messages stays byte-for-
byte unchanged for the Ink path (its expanded verbose-trail render OOM'd on big
output, #34095; the native engine renders tools collapsed, so the capped tail is
safe there). resume.ts maps context→argsPreview, result_text→resultText (label
peeled + envelope stripped), args→argsText — same shape as the live tool part.
py_compile OK. resume tests updated + 1 added (79 pass).
The gateway already ships per-tool arg metadata the client was discarding:
`context` (build_tool_preview's primary-arg line, always sent), `args` (full
dict on complete), `args_text` (redacted JSON, verbose), `duration_s`. Capture
them on the tool part and render free-code style:
- collapsed header: `▶ name <arg-preview> · <duration> (N lines)` — args are
finally visible without expanding (the core item-2 complaint).
- expanded: a single left-bordered (`│`) column with a key:value args block
(suppressed when the lone arg is already the header preview — judge nit) then
the output block.
- strip the gateway's `[showing verbose tail; omitted N chars]` banner into a
tidy `… omitted N chars` note; unwrap tail-capped `{"output":…}` envelope
fragments so the last line isn't a dangling JSON tail.
Left bar is a border glyph (opencode BlockTool style), not a bg fill — cleaner
and renders faithfully. +4 unit tests, +1 render test (78 pass).
The root box used padding:1 (all edges), reserving a blank row BELOW the
status-bar+composer block. Switch to paddingTop/Left/Right only so the input
hugs the last terminal row. Transcript stays flexGrow:1 minHeight:0; the
bottom block is the flexShrink:0 last child. StatusLine already renders
zero-height when idle, so no other change is needed.
Item 1 — copy/paste:
- boundary/clipboard.ts (ported/trimmed from opencode): writeClipboard = OSC 52
(SSH/tmux-safe) + a native command (pbcopy/wl-copy/xclip/xsel/clip);
readClipboardImage = clipboard PNG via wl-paste/xclip/pngpaste/powershell.
- Ctrl+C copies a live MOUSE SELECTION (renderer.getSelection) before the
interrupt/quit machine runs (opencode's selection-key precedence), with a
"Copied to clipboard" hint; falls through to interrupt/quit when there's no
selection.
- text paste inserts natively (textarea handlePaste); the composer's onPaste only
intercepts an EMPTY bracketed paste (image-only clipboard) → readClipboardImage
→ image.attach_bytes (the next prompt.submit picks it up).
Item 4 — mouse selection now ignores decorative glyphs: selectable={false} on the
message/tool gutter glyphs and all chrome (header, status bar, status line,
composer prompt glyph), so a drag copies the message text, not ❯/⚕/▶/⚡.
Live-smoked (this env has no clipboard tools/DISPLAY, so native copy + image read
can't be confirmed here, but): drag-select + Ctrl+C → "Copied to clipboard" (not
quit); no-selection Ctrl+C still arms quit; bracketed text paste lands in the
composer. 71 pass.
A just-started assistant turn (message.start, no deltas yet) rendered an EMPTY
fallback <text> on the glyph's line plus the `▍` caret on a SEPARATE line below —
so `⚕` sat alone with the caret dangling beneath it, indented. Folded the caret
into the no-parts fallback so it renders inline with the glyph (` ⚕ ▍`); a settled
row still shows its flat text, a turn with parts renders the parts. 71 pass.
Live-smoked: streaming start now shows `⚕ ▍` on one line; the reply text then
aligns with the glyph.
Item 15 — "/agents doesn't let me look into an agent trace live":
- store accumulates a concise per-subagent trace from the subagent.* stream
(▶ start / ⚡ tool — preview / progress text / ✓ summary), capped at 200 lines;
thinking deltas update a transient `thought` (not appended — they'd flood).
- AgentsDashboard is now master-detail: ↑/↓ select a subagent (▸ + accent), and
the bottom pane shows the selected agent's goal · status · model, its latest
thought, and a sticky-bottom (live) trace scrollbox. PgUp/PgDn scroll the trace.
Item 9 — /tools wired to a deliberate navigable overlay (fetch the roster via
slash.exec → pager) instead of incidental fallthrough; /skills already opens the
native picker.
Tests: store trace accumulation + dashboard render (trace line + footer). 71 pass.
Live-smoked: /tools → tool roster pager; /skills → picker; a real delegation
(spawn a subagent → reply PURPLE) → /agents showed the subagent with its goal ·
completed · model, 🧠 PURPLE thought, and ▶/✓ trace lines.
Item 7 — tools were non-collapsible and "ugly-interlaced":
- ToolPart now renders COLLAPSED by default as one line: `▶ name summary (N
lines)` (summary = explicit summary / first output line / error). A ▶/▼ glyph
marks expandable tools; clicking the header toggles a left-bar block of the
full (capped) output. Running tools show `name …`; single-line/erroring tools
render inline. Compact by default → far less interlacing clutter.
- toolOutput.normalizeOutput: un-double-escapes literal \n/\t when they dominate
over real newlines (some gateway tool tails are repr'd, so newlines arrived as
backslash-n and rendered as one ugly line). Conservative — genuine multi-line
output and legit `\n`-in-code are left alone. Applied in stripToolEnvelope.
Item 3 — the input "blue tint": dropped the textarea's blue focusedBackgroundColor
and added a `❯` prompt glyph. The composer is now distinguished by structure (the
glyph + the status-bar rule above it), not a background tint.
Tests: normalizeOutput (dominant-literal vs genuine-multiline). 70 pass.
Live-smoked: `ls -la` tool → collapsed `▶ terminal total 3460 (N lines)`;
SGR-click → `▼` + clean per-line output; composer shows `❯` with no blue tint.
onType used to fire complete.slash only for an argless `/command`, and Tab
replaced the whole line. Now:
- planCompletion(text) (pure, in slash.ts) routes: a `/command [args]` line →
complete.slash (the gateway completes names AND args, e.g. /details section
names); a trailing path-like word (@…, ~/…, ./…, /…, or anything with /) →
complete.path for file/dir tagging; else nothing.
- the accepted item splices ONLY its token: store tracks completionFrom (gateway
replace_from via readReplaceFrom, or the path-token start), and the composer's
Tab handler keeps the text before `from` and appends the candidate.
Tests: planCompletion (slash/path/prose/multiline) + readReplaceFrom. 69 pass.
Live-smoked: `/details ` → section dropdown (hidden/collapsed/.../activity), Tab
→ `/details hidden` (arg-only splice); `tui_gateway/` → its .py files;
`@hermes_cli/m` → m-prefixed files.
New logic/history.ts: createPromptHistory (pure cursor cycling — Up walks older,
Down walks newer back to the stashed draft, push dedupes a consecutive duplicate
+ resets) plus best-effort per-dir JSONL persistence under
$HERMES_HOME/tui-history/<sha1(cwd)>.jsonl (one JSON-encoded prompt per line,
multiline-safe).
Scoping matches the ask: prior prompts from the SAME launch dir are loaded on
start (recallable across relaunches), but a different dir keeps its own list — no
cross-dir/cross-session bleed.
Composer: Up at the first line → older prompt; Down at the last line → newer/draft
(at the boundary the textarea's own up/down is a no-op, so no conflict; mid-buffer
it still moves the cursor). setText + cursor-to-end on recall; any edit resets the
recall cursor. submit() pushes the prompt. Threaded entry → App → Composer; cwd =
process.cwd() (the launch dir under the real launcher).
Tests: 5 pure cursor-cycling cases. Live-smoked: seeded a dir file → Up/Up/Down
cycled two→one→two; a freshly submitted prompt was recalled via Up. 65 pass.
The textarea focuses on mount and when an overlay closes (remount), but focus
could drift to the transcript scrollbox on a mouse-scroll, dropping keystrokes.
Now (opencode's keep-the-prompt-focused idea, adapted):
- onMouseDown → focus the textarea (click-to-focus).
- a global keystroke net: a PRINTABLE, unmodified key while the textarea is
unfocused reclaims focus AND recovers the char (the in-flight event went to
the global handler, not the unfocused textarea, so insert it). Nav/scroll keys
(arrows/page/home/end/…) are deliberately left alone so keyboard transcript
scroll still works; kitty `release` events are skipped to avoid double-insert.
Completion accept/dismiss handler folded into the same useKeyboard with early
returns.
Live-smoked: type → text lands; `/` → completions; Esc → dismiss; type again →
lands; clean quit. 60 pass.
Item 11 — "stopping the agent doesn't work". Ctrl+C used to immediately destroy
the renderer. Now a turn-aware state machine (opencode's double-press model, the
user's preferred behaviour):
- While a turn runs (store.info.running): first Ctrl+C → session.interrupt
{session_id} (STOP the agent), and arms a 3s quit window with a warn hint
"⏹ stopped — Ctrl+C again to quit".
- Idle: first Ctrl+C arms the window ("Ctrl+C again to quit"); a stray single
press never nukes the session.
- A second Ctrl+C within the window KILLS the TUI (renderer.destroy → clean
scope teardown → gateway child EOF).
- A blocking prompt still owns Ctrl+C (deny/cancel) — unchanged.
Wiring: renderer.ts gains an `onCtrlC` hook (owns Ctrl+C when not blocked);
entry builds the machine (gateway yielded before the renderer so it can read
`running` + send interrupt). store gains a transient `hint` slice; StatusLine
shows hint (warn, priority) or the busy face (dim).
Live-smoked: long turn → Ctrl+C shows "stopped" + idle dot; second press exits
cleanly with no orphaned gateway child (the user's installed-venv sessions
untouched). 60 pass.
Item 14: a persistent bottom-chrome status bar, ported from Ink's appChrome
StatusRule. Sourced from the session.info event (model / reasoning_effort /
fast / cwd / branch / running / usage.context_*) which was decoded but dropped
until now; also folded session.create/resume result.info and message.complete
usage into a new store `info` slice.
- store: SessionInfo slice + applyInfo(); session.info handler; message.start/
complete flip `running` (the flag the Ctrl-C interrupt will read); refresh
usage on complete.
- schema: MessageComplete.payload gains loose `usage` so it survives decode.
- view/statusBar.tsx: width-aware (Ink progressive disclosure) — context bar
drops on narrow terminals, cwd compacts to last two segments + left-truncates
so the row never wraps. Turn/connection dot ◐/●/○.
- App: status bar sits ABOVE the composer; a top-edge rule (border:['top'])
visually separates the status bar + textbox input region from the transcript.
- tests: store info slice (3) + headless status-bar render (1); bumped the
approval-prompt capture height for the taller input region. 59 pass.
Live-smoked: bar shows model·effort·context%·dir; context updates 0→4% across a
turn; running dot flips; separator divides input region from transcript.
Live-usage issue 3/5: the kaomoji faces ("(¬_¬) processing…") lingered in the
transcript. Traced (instrumented capture): they arrive via `thinking.delta` —
Hermes's transient kaomoji busy *indicator* (_INDICATOR_DEFAULT=kaomoji), which I
was rendering as a persistent reasoning part.
- store: new transient `status` field. thinking.delta / status.update → `status`
(not a part); message.start + message.complete clear it. Only the real
`reasoning.delta` still becomes a (dim) transcript part.
- view/statusLine.tsx: a dim busy line above the composer shown while `status` is
set (Ink's FaceTicker analog), rendering nothing when idle; wired into App
between the transcript and the input zone.
Verified: bun run check green (55 tests / 7 files) — store tests assert
thinking.delta → status (no transcript part) + cleared on complete; status.update
→ status. Live tmux: a turn showed "٩(๑❛ᴗ❛๑)۶ cogitating…" on the transient status
line (cleared on completion) with NO face left in the transcript.
From live-usage feedback (driving the real TUI):
- Mouse ON by default (opencode parity; HERMES_TUI_MOUSE=0 opts out). Was hardcoded
off, which is why transcript wheel-scroll, scrollbar drag, and click-to-expand
tools didn't work and the terminal's native region-select polluted copy. With
useMouse the scrollbox handles the wheel + scrollbar and tools are click-expandable;
selection becomes OpenTUI's text-aware select. (Mouse can't be driven via tmux
send-keys — verify wheel/drag/click interactively.)
- Streaming markdown: match opencode's v2 text path —
<code filetype="markdown" streaming drawUnstyledText={false}>. The previous
drawUnstyledText:true drew raw text then overlaid styling each delta (a flash);
false avoids that and re-tokenizes incrementally for smoother streaming. (The
native renderable's tree-sitter doesn't settle in the headless test renderer with
drawUnstyledText:false, so the two markdown frame tests now assert the assistant
text via the store — paint is verified in the live smoke; render.ts also settles
to waitForVisualIdle.)
Verified: bun run check green (53 tests / 7 files). Live mouse + streaming
smoothness for glitch to confirm. Part of the live-feedback polish goal.
Repoint hermes_cli/main.py `_make_opentui_argv` from the superseded React entry
to the v4 Solid + Effect-at-boundary entry: it now prefers
`ui-tui-opentui-v2/src/entry/main.tsx` (cwd ui-tui-opentui-v2) and falls back to
`ui-tui-opentui/src/entry.real.tsx` only if the v2 package is absent (graceful
during coexistence). The engine gate (_resolve_tui_engine: HERMES_TUI_ENGINE /
display.tui_engine → opentui; Windows/Termux → Ink fallback) and the dual-engine
dispatch in _make_tui_argv are unchanged; Ink (ui-tui/) is untouched. The spawned
tui_gateway's source-root default lands on PROJECT_ROOT (package at
<root>/ui-tui-opentui-v2), so it loads Python from the same checkout, no extra env.
So `HERMES_TUI_ENGINE=opentui hermes --tui` now launches the v4 engine — the exact
`bun …/v2/src/entry/main.tsx` invocation live-smoked across P1–P5e, making every
first-class surface reachable from the real CLI.
Also: a consolidated 3-way acceptance summary (Ink ↔ opencode ↔ build) at the top
of opentui-feature-map.md covering all 7 first-class surfaces + the foundation +
the launcher, each ✅ + tested + smoked.
Verified: py_compile main.py OK (dev-skill rule for the 4k-line file); imported
the worktree CLI with HERMES_TUI_ENGINE=opentui → _resolve_tui_engine()='opentui',
_make_opentui_argv() → [bun, …/ui-tui-opentui-v2/src/entry/main.tsx] (cwd
ui-tui-opentui-v2, --watch in dev). v2 `bun run check` green (53 tests / 7 files).
Smoke P8 + matrix updated. Remaining: header chrome detail (5b), agent-feature
trail (5d), distribution (§10) — polish, not first-class blockers.
The agents dashboard (spec §2b; Ink agentsOverlay) — the last first-class
interactive surface. Subagent delegations are tracked from the `subagent.*`
event stream and shown in a full-height overlay.
- store: subagents[] built from subagent.{spawn_requested,start,thinking,tool,
progress,complete} by subagent_id (status·goal·model·depth·lastTool·summary);
clearTranscript clears them. dashboard flag + openDashboard/closeDashboard.
- view/overlays/agentsDashboard.tsx: full-height overlay (replaces transcript+
composer), depth-indented subagent rows colored by status, scroll via
scrollBy/scrollTo, Esc/q close. Empty state prompts to delegate.
- view/App.tsx: content zone is now a <Switch> — pager / agents dashboard /
(transcript + input zone).
- logic/slash.ts: /agents, /tasks → openDashboard (SlashContext.openDashboard).
Verified: bun run check green (53 tests / 7 files) — subagent reducer + a
dashboard frame test (seeded tree renders, transcript replaced) + /agents
dispatch. LIVE tmux: /agents opened empty; then a REAL delegation spawned a
subagent → /agents showed "⛓ Agents · 1 subagent · ● completed <goal>
(model) ⚡terminal". ALL 7 first-class surfaces are now ✅+tested+smoked
(blocking prompts, pager, session switcher, model picker, skills hub,
completions, agents dashboard). Smoke P5e + matrix updated. Remaining: chrome
(5b), agent-feature polish (5d), launcher (8).
A live slash-completion dropdown renders above the composer as you type `/…`
(spec §1 autocomplete) — the 6th and final first-class overlay surface.
- view/composer.tsx: onContentChange → onType (reads ta.plainText); a dropdown
of candidates (display + meta) renders above the textarea when completions are
set. The textarea owns key input (live refine-by-typing), so Tab accepts the
top match (ta.clear()+insertText) and Esc dismisses; arrow-nav would fight the
cursor (noted polish).
- store: completions state + setCompletions/clearCompletions; CompletionItem.
- logic/slash.ts: mapCompletions(complete.slash result) → candidates.
- entry: onType queries complete.slash for `/word` (no space) and sets/clears the
store completions; cleared on submit / non-slash / space.
Verified: bun run check green (49 tests / 7 files) — mapCompletions + a
composer-dropdown frame test. LIVE tmux: typing `/comp` showed /compress,
/composio, /compact (with descriptions); Tab accepted the top + cleared the
dropdown. ALL 6 first-class overlays are now ✅+tested+smoked (blocking prompts,
pager, session switcher, model picker, skills hub, completions). Smoke P5a +
matrix updated. Remaining: chrome (5b), agent features (5d), agents dashboard (5e).
A full-height scrollable pager (the FloatBox analog) — porting it unlocks the
long-output slash commands (/status /logs /history /tools) at once (spec §2b).
- view/overlays/pager.tsx: bordered full-height overlay (title + scrollbox +
footer), scrolling driven explicitly via useKeyboard → scrollBy/scrollTo (no
reliance on scrollbox auto-focus), Esc/q/Ctrl+C close. §8 #2 scrollbox gotchas.
- store: pager state + openPager/closePager.
- view/App.tsx: content zone swaps to the Pager (replacing transcript+composer)
when store.state.pager is set; the close is deferred a tick so the closing key
can't leak into the remounting composer.
- logic/slash.ts: present() routes output to the pager when long (>180 chars or
>2 non-empty lines, Ink parity) else a system line; titled by command; /logs
always pages. New openPager on SlashContext.
Verified: bun run check green (41 tests / 7 files) — present() routing
(short→system, long→pager) + a pager frame test (renders title/content, replaces
the transcript/composer). LIVE tmux: /logs → pager (title "Logs", scroll via
PageDown, Esc closed → composer refocused, no key-leak); /version (5-line output)
→ pager titled "Version". Smoke P5a + parity matrix updated. Completions dropdown
+ pickers + chrome are the next slices.
HERMES_TUI_RESUME=<id|recent> resumes a session instead of creating one:
session.most_recent (for "recent") → session.resume {cols, session_id} →
commitSnapshot(mapResumeHistory(messages)), buffering live events across the RPC.
- logic/resume.ts: maps the session.resume history into Message[]. Resumed tool
rows arrive as {role:'tool', name, context} (NO text — gotcha §8 #5); they're
FOLDED into the preceding assistant turn's ordered parts (state:'complete',
summary=context) so a resumed transcript renders the tools INLINE like a live
one. Assistant text gets a text part (renders via native markdown). User/system
stay flat. Unknown roles / non-arrays are ignored.
- logic/store.ts: hydrate split into beginBuffer() + commitSnapshot() so the live
event buffer spans the async resume RPC (events that arrive during resume are
replayed after the snapshot, in order).
- entry/main.tsx: bootstrap branches create vs resume; the resume path is timed
(rpc_ms / hydrate_ms) for profiling.
Verified: bun run check green (40 tests / 7 files) — resume mapper (fold tool
rows, standalone holder, ignore junk) + beginBuffer/commitSnapshot replay. LIVE
tmux: Launch A created a session with a ⚡terminal tool call; Launch B
(HERMES_TUI_RESUME=recent) hydrated user + assistant + the tool row inline.
STRESS+PROFILE on a real 103-message session (~/.hermes/sessions): client hydrate
= 76ms, bun RSS = 214MB STABLE (no leak), tool rows hydrated, PageUp scroll works;
the 1.6s cost is the server-side session.resume RPC, not the TUI. Smoke P4 +
matrix updated. Note: rows instantiate for the full history (scrollbox culls
render only) → RSS ~linear in turns; list virtualization is the lever if
multi-thousand-turn sessions become a target.
The composer now routes `/command` through the Ink-parity dispatch ladder
instead of submitting it as a prompt (spec §1):
- logic/slash.ts: parseSlash + dispatchSlash — client-local command →
slash.exec {command, session_id} (output → system line) → on reject
command.dispatch {arg, name, session_id} with typed handling
(exec/plugin→system · alias→re-dispatch · skill/send→submit a turn ·
prefill→notice). 6 client commands: help/quit/exit/clear/new/logs.
- /help renders the live `commands.catalog` (reads the `pairs` shape).
- view/prompts/confirmPrompt.tsx + store.setConfirm: a LOCAL (non-gateway) Y/N
dialog for /clear and /new; store gains pushSystem + clearTranscript.
- entry: a Promise-returning `request` adapter + the SlashContext wiring (quit →
renderer.destroy, confirm, clearTranscript, logTail, submit).
Also fixes a keystroke-leak: the key that ANSWERED a prompt was bleeding into the
freshly-refocused composer (`/clear`→y left "y" in the input, breaking the next
`/quit`). PromptOverlay now defers the prompt-clear (composer remount) past the
current keystroke — this hardens every Phase 3 prompt too.
Verified: bun run check green (36 tests / 6 files) — slash.test covers parse + the
full ladder against a fake context. LIVE tmux: /help → full gateway catalog;
/version → slash.exec output; /clear → confirm → cleared, no key-leak (typed "hi"
not "yhi"); /quit → clean quit, child reaped. Remaining TUI-only commands,
completions, pager routing, and session resume are 4b/4c. Smoke P4 + matrix updated.
The 4 gateway *.request events now drive a blocking-prompt overlay instead of
deadlocking the agent (spec §8 #6). Native OpenTUI paradigm (per glitch's steer):
- view/prompts/approvalPrompt.tsx: native <select> (once/session/always/deny)
→ approval.respond {choice, session_id}.
- view/prompts/clarifyPrompt.tsx: native <select> over choices + an "✎ Other…"
option that swaps to a native <input> for free-text → clarify.respond
{answer, request_id}.
- view/prompts/maskedPrompt.tsx: sudo (🔐) / secret (🔑) — native <input> has no
mask, so we own a buffer via useKeyboard and render '*' per char →
sudo/secret.respond {password|value, request_id}.
- view/prompts/promptOverlay.tsx: dispatches by prompt kind, binds each
answer/cancel to the matching *.respond; Esc/Ctrl+C → deny/empty so the agent
always unblocks.
Wiring: store gains ActivePrompt state + the 4 reducer cases + clearPrompt;
App swaps Composer↔PromptOverlay on store.state.prompt (so the composer textarea
stops capturing keys while blocked); renderer.ts gates the global Ctrl+C-quit on
isBlocked() so a prompt owns Ctrl+C (→ cancel); entry adds a generic `respond`
runFork callback + passes sessionId.
Verified: bun run check green (28 tests / 5 files) — reducer set/clear for all 4,
+ a frame test (approval overlay renders the command + all options as a bordered
modal, composer hidden while blocked). LIVE tmux: a real `rm -rf` approval fired;
Approve-once → command ran → unblocked; Esc → deny → "BLOCKED by user" →
unblocked; Ctrl+C-while-blocked cancelled WITHOUT quitting; Ctrl+C-unblocked quit
clean, no orphan. Smoke P3 + parity matrix updated. confirm (local) → Phase 4.
Assistant text parts now render through the NATIVE markdown renderable instead of
plain spans — bold/headings/lists/fences render, raw `**`/backtick markup is
concealed (spec §7; never hand-roll a parser).
- view/markdown.tsx: `<code filetype="markdown" streaming conceal drawUnstyledText>`
(CodeRenderable — opencode's v2 AssistantText path; `<markdown>` +
internalBlockMode="top-level" deferred paint headlessly). SyntaxStyle.fromStyles
is derived from the theme (markup.* → theme.color.*, non-hex colors guarded) and
cached by theme-object identity so all text parts share one instance, rebuilt
only on skin change. drawUnstyledText paints raw text immediately while
Tree-sitter highlighting settles (and makes it headless-capturable).
- view/messageLine.tsx: text-part Match renders <Markdown> instead of <text>.
- test/lib/render.ts: settle async markdown via flush(); captureFrame gains an
`until` option (waitForFrame) for content that paints after the first pass.
Verified: bun run check green (23 tests / 5 files). Live tmux: a markdown reply
(heading + bold word + 2-item list) rendered with `**` concealed (grep -c '**' = 0);
Ctrl+C clean, no orphan. Phase 2 complete (2a shell + 2b-i parts/tools + 2b-ii
markdown) — smoke steps 1–4 run live. Next: Phase 3 blocking prompts.
An assistant turn is now ONE ordered parts[] (text/reasoning/tool) instead of a
flat string, so tool calls render INLINE between text blocks rather than dumped
as separate rows below (spec §7 — the "dump-below" bug opencode's sync-v2 avoids).
- logic/store.ts: Part discriminated union + reducer rework. message.delta
appends to the open text part (or opens one); tool.start pushes a running tool
part; tool.complete matches by tool_id and updates that part IN PLACE (state,
envelope-stripped resultText, summary, error, lineCount); reasoning.delta
accumulates a reasoning part. User/system rows stay flat text; settled/resumed
assistant rows fall back to text.
- logic/toolOutput.ts: ported pure helpers — stripToolEnvelope (unwrap
{output,exit_code}, append [exit N]/[error] suffix) + collapseToolOutput +
truncate.
- view/messageLine.tsx: <For>+<Switch> dispatch by part.type with stable id keys.
- view/toolPart.tsx: two-tier render — inline one-liner (≤1 output line) or a
capped left-bar block (TOOL_MAX_LINES, "… +N more", click-to-expand) keyed off
the theme; reactive width via useTerminalDimensions.
Verified: bun run check green (23 tests / 5 files / 64 expects) — store
interleave/in-place/reasoning, a frame test asserting the tool renders inline +
envelope stripped, and toolOutput unit tests. Live tmux: a terminal-tool prompt
rendered "⚡ terminal" with its alpha/beta output inline between the assistant's
text parts; Ctrl+C clean, no orphan. Smoke P2b + parity matrix updated. Native
<markdown> for text parts is the next slice (2b-ii).
Turns the read-only Phase-1 view into an interactive shell, split into focused
view components (spec v4 §2 layout):
- view/transcript.tsx: ONE full-height <scrollbox> with a reactive <For>
(opencode's no-scrollback model). Applies the §8 #2 gotchas exactly:
minHeight:0 on the wrapper AND the scrollbox, NO flexDirection on the
scrollbox root, stickyScroll + stickyStart="bottom".
- view/composer.tsx: a native <textarea> captured by ref — flexShrink:0,
focus-on-mount, Enter->submit via keyBindings, imperative .clear() on submit,
and a `submitting` re-entrancy guard. Wired by the entry to fire prompt.submit
(Effect.runFork on the in-hand service value); it's now the PRIMARY input, with
the HERMES_TUI_PROMPT stand-in kept only for launch-with-prompt.
- view/header.tsx + view/messageLine.tsx: extracted, themed (no hardcoded
styles). MessageLine stays flat-text this slice; ordered parts (§7) land in 2b.
test/lib/render.ts now flushes 3 renderOnce passes before capture — a <scrollbox>
needs more than one pass to measure content + apply sticky, else the transcript
row paints blank.
Verified: bun run check green (12 tests / 4 files / 31 expects). Live tmux drive:
typed into the composer -> cleared -> user row -> streamed reply ("Here are three
words"); Ctrl+C quits cleanly even with the textarea focused, no orphan child.
Composer placeholder rendered the live skin's welcome string (skin->theme live).
Smoke P2a + parity matrix updated. Phase 2b (ordered parts/tool render/markdown)
is the next slice.
GatewayService/liveGateway over the real Python tui_gateway: JSON-RPC stdio
framing (Bun.spawn), 16ms event coalescing flushed inside Solid batch(), typed
GatewayError, and a decode-once GatewayEvent Schema (~35-member tagged union;
unknown/malformed events skip via Option.none, never crash the stream).
The Solid sync-v2-style store grows to: streaming text concat (prefer
payload.text), gateway.ready{skin}/skin.changed -> fromSkin reactive re-theme,
LRU id-dedup, and hydrate-while-buffering (resume scaffold). Theming is a 1:1
port of Ink's theme.ts (DARK/LIGHT, detectLightMode, ANSI-256 normalization,
fromSkin) behind a Solid ThemeProvider so existing skins work unchanged and the
view carries NO hardcoded styles. A console-safe diagnostics log (in-memory
ring + NDJSON file) is the single logging path.
Entry gains a live launch path (default; HERMES_TUI_FAKE=1 -> scripted hello)
with an initial-prompt bootstrap (session.create -> prompt.submit) as the
Phase-2-composer stand-in, plus a minimal Ctrl+C graceful quit
(renderer.destroy -> shutdown Deferred -> scope finalizers -> client.stop) so
the engine reaps its own gateway child instead of orphaning it.
Verified: bun run check green (tsc + eslint + 12 tests / 4 files); live tmux
drive connect -> gateway.ready -> prompt -> streamed reply ("pong") -> clean
teardown with no orphan bun/python. Parity matrix + smoke P1 run log updated.
hermes --tui launches the native OpenTUI engine (Bun) when
HERMES_TUI_ENGINE=opentui (env) or display.tui_engine=opentui (config);
Ink stays the default and the shipping path is untouched.
- _resolve_tui_engine() (env > config > ink); refuses opentui on
Windows/Termux (no Bun) -> falls back to ink with a notice.
- _make_opentui_argv() -> [bun, src/entry.real.tsx] (no build step).
- _bun_bin() with HERMES_BUN override.
- Branch at top of _make_tui_argv BEFORE _ensure_tui_node (Bun-only host
must not bootstrap Node).
- Gate _launch_tui NODE_OPTIONS/--max-old-space-size on engine==ink (Bun
is JSC; the V8 flag errors/ignores).
Verified end-to-end via tmux: real hermes --tui -> Bun -> OpenTUI ->
real Python gateway streamed a real reply. No-flag default still ink.
2026-06-08 11:11:54 +00:00
550 changed files with 274931 additions and 697 deletions
@@ -107,6 +107,8 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
> **TUI engine:** On supported hosts (Linux/macOS with Node 26.3+), the terminal UI defaults to the native **OpenTUI** engine, which the installer provisions for you. The legacy **Ink** engine remains the fallback — it's used automatically on Windows, Termux, or when the native engine can't run, and you can select it explicitly with `HERMES_TUI_ENGINE=ink hermes`. Ink is not going away; it's the kept fallback.
@@ -1172,7 +1205,7 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
{active_task}
## Blocked
@@ -1184,13 +1217,13 @@ None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{HISTORICAL_PENDING_ASKS_HEADING}
{active_task}
## Relevant Files
{_bullets(relevant_files,limit=12)}
## Remaining Work
{HISTORICAL_REMAINING_WORK_HEADING}
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
@@ -1312,7 +1345,7 @@ Summary generation was unavailable, so this is a best-effort deterministic fallb
_temporal_anchoring_rule=""
# Shared structured template (used by both paths).
_template_sections=f"""## Active Task
_template_sections=f"""{HISTORICAL_TASK_HEADING}
[THE SINGLE MOST IMPORTANT FIELD. Capture the user's most recent unfulfilled
input verbatim — the exact words they used. This includes:
- Explicit task assignments ("refactor the auth module")
@@ -1359,7 +1392,7 @@ Be specific with file paths, commands, line numbers, and results.]
- Any running processes or servers
- Environment details that matter]
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
[Work currently underway — what was being done when compaction fired]
## Blocked
@@ -1371,14 +1404,14 @@ Be specific with file paths, commands, line numbers, and results.]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so it is not repeated]
## Pending User Asks
[Questions or requests from the user that have NOT yet been answered or fulfilled. If none, write "None."]
{HISTORICAL_PENDING_ASKS_HEADING}
[Questions or requests from the user that have NOT yet been answered or fulfilled. These are STALE — they were from the compacted turns. Write them here for reference only. The agent must NOT act on them unless the latest user message explicitly requests it. If none, write "None."]
## Relevant Files
[Files read, modified, or created — with brief note on each]
## Remaining Work
[What remains to be done — framed as context, not instructions]
{HISTORICAL_REMAINING_WORK_HEADING}
[What remains to be done — framed as STALE context for reference only. The agent must NOT resume this work unless the latest user message explicitly asks for it.]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation. NEVER include API keys, tokens, passwords, or credentials — write [REDACTED] instead.]
@@ -1753,7 +1786,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
Context compressor bug (#10896): ``_align_boundary_backward`` can pull
``cut_idx`` past a user message when it tries to keep tool_call/result
groups together. If the last user message ends up in the *compressed*
middle region the LLM summariser writes it into "Pending User Asks",
middle region the LLM summariser writes it into "Historical Pending User Asks",
but ``SUMMARY_PREFIX`` tells the next model to respond only to user
messages *after* the summary — so the task effectively disappears from
the active context, causing the agent to stall, repeat completed work,
# TUI benchmark suite — Ink (`ui-tui`) vs OpenTUI (`ui-opentui`)
Methodology (settled, binding): `docs/plans/opentui-bench-suite.md`. This
directory is the implementation: real binaries over a real node-pty PTY
(120×40, xterm-256color), a fake gateway substituted via `HERMES_PYTHON`
(ZERO changes to either UI), external `/proc` sampling, cgroup-v2 memory caps.
No tmux anywhere in measurement — except the `pipeline` cell, whose entire
point is measuring the tmux emulator leg (see its note below).
## Pieces
| file | role |
|---|---|
| `fake-gateway.mjs` | NDJSON JSON-RPC gateway stand-in. Both UIs spawn it as `$HERMES_PYTHON -m tui_gateway.entry`. Answers every startup RPC with canned results, then streams the fixture (burst / paced / load-then-idle). Never writes stderr (the UIs render gateway stderr). |
| `fixture-stream.mjs` | Serializes the deterministic lumpy-turn fixture (`ui-opentui/scripts/fixture.ts`, imported directly via Node ≥26 type stripping — no port) to NDJSON. Cached under `.cache/`, sha256-stamped. |
"description":"TUI benchmark suite: Ink (ui-tui) vs OpenTUI (ui-opentui) over a real PTY with a fake gateway. Methodology: docs/plans/opentui-bench-suite.md.",
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.