Brings the memory-window branch up to current main + the multi-click
selection feature, via the base PR branch's merge commits (history
preserved, no rewrites). No conflicts: the 13 windowing/diagnostics
commits touch ui-opentui/bench/docs only.
Main's rewritten test_tui_npm_install.py tests call _make_tui_argv expecting
the Ink/npm flow unconditionally; with the dual-engine dispatch merged in,
_resolve_tui_engine() auto-selects opentui whenever ui-opentui/dist is built
in the repo, routing the call away from the path under test (first subprocess
became 'node --version' instead of 'npm run build'). Pin the engine to ink
via an autouse fixture, mirroring the existing pinning precedent in
test_tui_resume_flow.py.
body sets user-select:none for native feel and opts text back in only via
[data-selectable-text='true']; the preview's source and rendered-markdown
panes never set it, so code couldn't be selected or copied. Tag the Shiki
code column and the markdown root. The attribute stays off the SourceView
grid root so the gutter keeps its select-none and line numbers don't bleed
into copied text.
The terminal listed JetBrains Mono only as a late fallback and shipped no
webfont, so on machines without SF Mono/Menlo xterm measured the grid on the
regular system face while styled SGR spans fell back to a font with different
advances — glyphs squeezed and overlapped.
Bundle the regular/bold/italic woff2 (Apache-2.0, the faces the dashboard
already ships), put the family first in the xterm stack, pin the weights, and
warm every face before mount (fonts.ready only settles already-requested
faces; bold/italic aren't asked for until styled output paints, past atlas
init). Vite emits them as hashed assets under dist/** with base './', so the
fonts ship in the asar and every install path inherits them.
439 main commits in; 144 branch commits preserved (no rewrite — merge over
rebase per glitch's call to keep history). Conflict resolutions:
- Dockerfile: ui-opentui build folded into main's new cached frontend-build
layer (COPY ui-opentui/ + install/build/prune beside web+ui-tui); node:26
base from our side kept.
- gateway/run.py: took main's extraction (slash handlers moved to
gateway/slash_commands.py); re-applied our /usage real-cost block
(real_session_cost_usd / resolve_billing_route, no estimation) at its new
home in slash_commands.py.
- tui_gateway/server.py: _LONG_HANDLERS union (our model.options + main's
plugins.manage).
- tests/test_tui_gateway_server.py: both sides' appended tests kept.
Verified during resolution: cli.py worktree prune/cleanup lock fix survived
auto-merge intact and main's new prune call-site (hermes_cli/main.py:2139)
inherits its clean/locked/dirty/unpushed skip semantics; HERMES_TUI_ENGINE
selection in hermes_cli/main.py intact.
Regular users get zero diagnostic surface by default: /mem and /heapdump
disappear from /help and completion, and invoking them prints the
one-line enable hint (relaunch with HERMES_TUI_DIAGNOSTICS=1) instead of
executing — an enable switch, not a secret. With the switch on, the
commands work as before and HERMES_TUI_WINDOW_STATS defaults on (still
individually settable either way). Full env-flag ledger (master switch /
user config / dev tuning / internal plumbing) in docs/opentui-env-flags.md.
672 tests exit 0.
The per-row copy control lived in the header's trailing slot as a 24px
button that depended on a `group-hover/tool-row` group that exists nowhere
in the tree. It therefore stayed `opacity-0` yet remained clickable — an
invisible hit-target straddling the disclosure caret and duration, making
the caret hard to click without firing a copy.
Move copy into the expanded body's top-right (matching the code-block
convention) where it can't fight the caret for the right edge, and make it
actually visible (subtle at rest, full on hover/focus). The header right
edge now belongs solely to the duration label + caret.
Tradeoff: copy is only reachable once a row is expanded; rows with no
expandable body no longer surface a copy control.
Rebasing onto fcf49f313 (multi-click selection) collided in the test
probe (both sides added 'mouse' — deduped) and surfaced two test debts
the cap-restore commit (3cc56517a) had shipped masked by a piped exit
code: store cap tests still asserted the 1000 default (now: 3000
windowed, 1000 with HERMES_TUI_WINDOWING=0, both covered), and the
burst-interplay test relied on the old cap trimming a 1500-row burst
(now pins HERMES_TUI_MAX_MESSAGES=1000 explicitly for both stores).
Also: .demo/ build artifacts excluded from typed linting. 669 tests
exit 0 (verified unpiped). Multi-click selections flow through the
same renderer selection seam, so windowing's drag-freeze + row-pinning
covers them with no changes.
Maintainer signals (native yoga next release, 2x layout, opencode's
100-cap was a legacy perf workaround): what changes for us (WASM ratchet
dies), what doesn't (the 65k handle table still makes windowing
load-bearing at 3000 rows), the boundary/ shim ledger with
delete-on-upstream-fix criteria, and the per-release upgrade playbook
that uses the bench suite as the acceptance contract.
Shareable explainer: every primitive at play (native handle table, Yoga
WASM grow-only memory, renderables, Solid surgical unmount, V8 GC
laziness, scrollbox draw-only culling) and every decision (windowing vs
store-cull, exact-heights-at-unmount vs Ink's estimate-correct, the
correctionIsLegal zero-jank law, append-time adjudication, never-window
rules, windowing-aware cap restore, heap right-sizing), with the
measured scoreboard and honest open items.
With transcript windowing (S1+S2) the mounted set no longer scales with
the store (peak 31 rows over a 1500-row burst), so the handle-table
clamp that forced 1000 rows is unnecessary when windowing is on. The
ceiling is now windowing-aware: 3000 rows (the originally-shipped
default, regression documented in opentui-fixes-audit.md §2) with
windowing, 1000 with HERMES_TUI_WINDOWING=0 (every row mounts again).
Measured at the restored cap (full 3000-msg store): mem3000 360MB peak
styled end-to-end (pre-campaign: ~870MB + unstyled past ~1,400 rows;
before that: crash). scroll3000 p50=2 p90=3 p99=8 max=17ms (Ink same
workload: p90=35 p99=96). Gate digest unchanged.
Pins Node 26.3 to ui-opentui/ only (fnm/mise auto-switch on cd; leaving
the directory restores whatever the dev had — no global default change).
engines.node >= 26.3 makes a wrong-Node npm ci warn. README covers
install paths (fnm/mise/nvm/absolute-binary), the ABI-locked
node_modules gotcha, and build/run commands.
Adversarial-review follow-up to the S2 slice. The S1 rule froze ALL window
recomputes while renderer.getSelection()?.isActive — but a finished mouse
selection persists by design (boundary/renderer.ts keeps the highlight so
Ctrl+C can re-copy), so a long streaming turn behind a lingering highlight
ballooned the mounted set exactly like pre-windowing (and permanently
ratcheted the Yoga-WASM high-water).
Refinement:
- full freeze only while selection.isDragging (the native walk touches the
live tree on every drag update — destroying a row mid-walk corrupts the
highlight; unchanged from S1 where it matters),
- a finished highlight instead PINS the rows containing
selection.selectedRenderables (parent-climb to the row wrapper via a
WeakMap) as neverWindow — the highlight and a later Ctrl+C copy stay
byte-exact while everything else keeps windowing,
- an active highlight counts as activity (no idle measure churn under it).
Test (headless mock-mouse drag): finished selection persists (isActive,
!isDragging) → 300-row burst keeps peakMounted < 120 AND
getSelectedText() returns the identical text afterward, the selected rows
having been pinned while long scrolled past the margin.
Verified on this build: gate digest otui-capped d5e9558583159eac… (2/2),
mem2000 otui-capped windowing-ON vmhwm 312MB (target ≤ 350), scroll2000
otui-capped p50 2.0ms / p99 6.0ms (gate ≤ 17ms). check exit 0 (648 tests).
Review verdicts on the remaining findings (verified against core source):
- "scrollTop compensation race": rejected — scrollTop is an imperative
scrollbar property (no signal staleness); records fire in document order,
each compensation immediately visible to the next.
- "heights map leak on /new": rejected — the countChanged cleanup prunes
every per-key map against the live key set (test-verified).
- remount-in-viewport estimate shift: only reachable when one frame jumps
past the margin (> 1 viewport); the design's accepted "remounted for
view" path — documented in the header.
- expanded tool/reasoning re-collapse on far-remount: S1-accepted,
deferred (component-local state; out of S2 file ownership).
S2 of docs/plans/opentui-transcript-windowing.md (#27), behind
HERMES_TUI_WINDOWING (OFF path renders the byte-identical legacy tree).
Append-time adjudication: the window now recomputes on transcript GROWTH,
not just scroll — a createComputed on messages.length re-windows
synchronously per append, and while pinned at the bottom computeWindow
anchors to the cumulative content BOTTOM (pinnedBottom) instead of the
stale pre-layout scrollTop, so burst-appended rows are spacer-swapped the
moment they pass the margin. The frame driver additionally treats a
≥ ¼-viewport scrollHeight change (streaming growth) like scroll movement.
Unseen-row default changed from "always mounted" to "mounted iff created
streaming or within the bottom-30" — live rows still paint instantly with
zero added latency; a bulk commitSnapshot (resume) mounts ONLY the bottom
window and everything above starts as line-count-estimate spacers (chip-
and-spacing-aware estimateMessageHeight).
Spacer corrections (zero-jank rule): when a measure lands a height
different from what the spacer occupied, the wrapper's onSizeChange fires
inside the layout traversal, pre-paint. Pinned at bottom the scrollbox's
own sticky re-pin (content onSizeChange runs before the row wrappers')
already compensated — verified by test; otherwise scrollTop is compensated
same-frame for rows fully above the viewport (correctionIsLegal). Frames
stay byte-stable across corrections in both pinned and mid-history tests.
Lazy exact-measure (design §4 — the simple choice, documented): no true
offscreen layout exists in @opentui/core, so an idle pulse (no appends,
no scroll, no turn, no selection for HERMES_TUI_WINDOW_IDLE_MS≈1s) mounts
MEASURE_BATCH_ROWS=10 never-measured rows nearest the bottom window edge
(edgeMeasureBatch), records exact heights (incl. a direct post-layout pull
for rows whose mount changed nothing — no onSizeChange fires), and the
next recompute swaps them back to now-exact spacers. Scrolling itself
still measures the margin band.
DEV counter: windowRowStats (current/peak simultaneously-mounted rows),
exposed on globalThis behind HERMES_TUI_WINDOW_STATS; tests assert it.
Measured (this build, 39f9f433e+S2):
- check: exit 0 (647 tests / 39 files; +11 pure window cases, +4 headless)
- peak mounted: 31 rows over a 1500-row burst; 30 rows on a 600-row
resume snapshot (bound asserted < 120)
- gate digest: otui-capped d5e9558583159eac… — byte-identical, 2/2 reps
- mem2000 (otui-capped, windowing ON, 8GB heap): vmhwm 300MB
(S1 same-heap 518MB; S1 right-sized-heap 427MB; Ink 229-239MB;
target ≤ 350MB)
- scroll2000 otui-capped: p50 2.0ms / p99 5.0ms / max 18ms
(gate ≤ 17ms p99; S1 baseline p99 15ms)
Known S2 limits (deferred to S3, design §5): /compact·/details toggles and
width resizes leave out-of-window spacer heights stale until remount or
the idle march; expanded-body state above the window may re-collapse on
remount (S1-accepted).
Core machinery of docs/plans/opentui-transcript-windowing.md (#27): rows
outside [scrollTop − viewport, scrollTop + 2·viewport) swap to an exact-height
empty <box> (1 yoga node, no text buffers / native handles), so the mounted
set stays ~3 viewports regardless of transcript length.
Flag: HERMES_TUI_WINDOWING — unset → ON; 0/false/no/off → OFF (envFlag
semantics, the bench A/B + one-env escape hatch). OFF renders the exact
legacy tree (no wrapper boxes).
Pieces:
- logic/window.ts (pure, table-tested): computeWindow (viewport ± 1-viewport
margin intersection over cumulative exact heights; null heights fall back
to a per-row line-count estimate), hysteresisFor/shouldRecompute (≥ ¼
viewport between recomputes), correctionIsLegal (the jank rule: corrections
only fully above the viewport with same-frame scrollTop compensation, or
fully below it), estimateMessageHeight (line-count estimate; wrong values
are fixed by remount only — S1 never corrects a spacer in place).
- view/transcript.tsx: per-row measuring wrapper records exact heights via
onSizeChange (only while the real row is mounted); window driver is a
renderer frame callback (setFrameCallback — scroll always renders, so no
extra timer) publishing the mounted set through one signal + createSelector
so only flipped rows re-render. Stable row keys via WeakMap<Message, n>
(messages have no id; store proxies are reference-stable). Solid <Show>
unmount destroys the row's renderables (@opentui/solid _removeNode →
destroyRecursively).
Never-window rules:
- streaming rows (remount would restart native markdown streaming),
- the last row while a turn is running (deltas land there),
- the bottom 30 rows (fixed K — sticky-bottom region; rows under
viewport+margin are mounted by the window calc anyway),
- rows the window has never adjudicated default to MOUNTED (new live rows
paint instantly),
- the whole window FREEZES while a mouse selection is active
(renderer.getSelection()?.isActive — a swap would destroy highlighted
renderables under the native selection walk).
Tests: 30 pure window.test.ts cases + 2 headless integration cases
(transcriptWindow.test.tsx) pinning the zero-jank invariant (scrollHeight
identical ON vs OFF), the renderable shedding, and remount-on-scroll-back.
Arabic/Hebrew/Persian/Urdu chat text rendered left-to-right and
left-aligned, and mixed RTL/English technical messages (the common case)
read backwards. Resolve each chat block's base direction from its own
first strong character (UAX#9) with pure CSS, scoped to the chat
surfaces only:
- `unicode-bidi: plaintext` + `text-align: start` on assistant prose
blocks (p, h1-h6, li, blockquote), the user bubble's text lines, and
both composers (main + edit share the composer-rich-input slot). RTL
blocks read and right-align RTL; English stays LTR; mixed
conversations resolve per block. `text-align: start` is required
because the user bubble hardcodes `text-left`.
- Inline `code` and KaTeX are pinned `direction: ltr; unicode-bidi:
isolate`, so the bidi first-strong heuristic skips them: a sentence
that *starts* with a command (`./run.sh ...`) followed by Arabic
still resolves RTL, and the command's own neutrals keep their order.
- Fenced code surfaces (code-card, user fences) are pinned LTR so they
never mirror or right-align inside an RTL list item or blockquote.
`direction` is never forced, so app chrome, layout, and list indent
stay LTR per the issue's request not to flip the whole UI. English-only
content is byte-for-byte unchanged.
Salvaged and unified from #44065 and #44169; verified in Chromium that
isolate removes inline code from the paragraph direction vote (the
code-first case), making the JS dir-resolution in #44065 unnecessary.
Fixes#44150
Co-authored-by: Adolanium <Adolanium@users.noreply.github.com>
Co-authored-by: Adalsteinn Helgason <AIalliAI@users.noreply.github.com>
Tell coding agents to activate shell setup once per session instead of re-sourcing it before every command, and pin the existing LocalEnvironment env-snapshot behavior with regression tests.
Port from anomalyco/opencode#31271: only call tools/list when the server
advertises the 'tools' capability in InitializeResult.capabilities.
Previously, _discover_tools() unconditionally called session.list_tools()
right after initialize. Prompt-only / resource-only servers (which omit
the tools capability per the MCP spec) raise McpError(-32601 Method not
found), which aborted the connection — burning all 3 initial-connect
retries and permanently failing the server even though its prompts and
resources were perfectly usable. The 180s keepalive had the same problem:
it probed with list_tools(), so even a successfully connected prompt-only
server would be torn down on the first keepalive cycle.
Changes:
- MCPServerTask._advertises_tools(): capability check with a legacy
fallback (no captured InitializeResult -> behave as before)
- _discover_tools(): skip tools/list for non-tool servers
- keepalive: use the universal ping request for non-tool servers
- _refresh_tools(): guard against tools/list_changed from non-tool servers
E2E verified with a real stdio prompt-only FastMCP-style server: on main
it fails all 3 connection attempts with Method-not-found; with this fix
it connects, lists prompts, answers ping keepalives, and shuts down
cleanly.
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.
Cherry-picked from #39840 by @flyinhigh and rebased cleanly on main.
- Defer config fetch in createGatewayEventHandler until gateway.ready to
avoid render-phase RPC that can mutate transcript state and trigger
React error 301 in embedded dashboard PTYs.
- Use undici WebSocket fallback when globalThis.WebSocket is unavailable
(Node attach mode and sidecar mirror sockets).
- Add regression tests for both fixes.
Co-authored-by: flyinhigh <flyinhigh@users.noreply.github.com>
Collapse the verbose multi-line rationale comments across the TUI/desktop/
backend approval surfaces into single-line "why" notes, and derive
APPROVAL_OPTS_NO_ALWAYS from APPROVAL_OPTS instead of re-listing it.
No behavior change.
Node >=18 / Electron 40 ship fetch; the hand-rolled http/https.request
plumbing buys nothing. AbortSignal.timeout replaces the socket timeout,
protocol guard and >=400 rejection semantics preserved. 13/13 unit
tests and the live web_server.py repro both green over the new
transport.
Both spawn paths (startHermes, spawnPoolBackend) duplicated the same
resolve -> log-fallback -> foreign-check -> throw dance. Collapse it into
adoptServedDashboardToken(baseUrl, spawnToken, {childAlive, label}) in
dashboard-token.cjs; childAlive is a thunk so liveness is sampled after
the fetch. Drop the redundant backendPool.delete in the pool's throw
path (the child exit/error handlers already own pool eviction).
Validated end-to-end against a real web_server.py backend, not just
units: token-injection regex vs the actual served index.html, foreign
refusal (dead child + live squatter), benign drift adoption, and the
401-vs-200 token auth split on /api/sessions.
When a tirith content-security warning is present the approval backend
forces allow_permanent=False and silently downgrades an "always" choice to
session scope (the persistence loop in check_all_command_guards only honors
"always" → permanent when no tirith finding exists). But the gateway notify
payload that drives the TUI and the Electron desktop app never carried that
flag, so both surfaces always rendered "Always allow" — offering a permanent
allow the backend would quietly refuse to persist.
Plumb allow_permanent end-to-end:
- tools/approval.py: include `allow_permanent: not has_tirith` in the gateway
approval_data the notify callback emits as `approval.request`.
- ui-tui: thread `allowPermanent` through the event handler, gateway types,
and ApprovalReq; ApprovalPrompt drops the "always" option (and renumbers the
quick-pick keys) when it's false.
- apps/desktop: thread `allow_permanent` through the gateway payload type, the
per-session approval store, and the inline ApprovalBar, which now hides the
"Always allow…" dropdown item when permanent allow is disallowed — reusing
the existing DropdownMenu / confirm-Dialog UI.
The desktop/TUI render path for approvals already landed in #38578 (the root
cause of approvals not surfacing in the GUI); this completes the salvage of
#37856 by carrying allow_permanent across both surfaces. #37856's original
thread-local _block() approach is dropped: desktop/TUI approvals resolve via
approval.respond → resolve_gateway_approval (the per-session queue), not the
_block()/request_id correlation, so a worker-thread callback waiting on _block
would never be released by the real UI.
Tests: gateway notify payload carries allow_permanent (True without tirith,
False with a tirith warning); ui-tui approvalAction reduced option set +
event-handler allowPermanent propagation; desktop store round-trip + the
ApprovalBar showing/hiding "Always allow".
Supersedes #37856Closes#37812
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Two fixes to the Electron desktop launch path, with the port-reservation logic extracted into a unit-tested module:
1. hermes:bootstrap:reset ("Reload and retry") only cleared connectionPromise, leaving the live backend alive; the orphan kept binding PORT_FLOOR (9120) so the next startHermes() hit EADDRINUSE / "Object has been destroyed" and the window looped. Await teardownPrimaryBackendAndWait() so the reset stops the old backend before restarting.
2. pickPort() probes-then-closes a socket before the real bind happens in a separate Python child, so two concurrent spawns (primary + pool backend) could both be handed PORT_FLOOR and one died with EADDRINUSE. The reservation bookkeeping is extracted into electron/port-pool.cjs (PortPool): pickPort() reserves the chosen port until the child exits and releases it on every exit/error/throw-before-spawn path, closing the TOCTOU window.
PortPool is dependency-injected (probe passed in) and socket-free, unit-tested in electron/port-pool.test.cjs (8 cases) and wired into the test:desktop:platforms script.
(cherry picked from commit d4133945b9)
The served-token fallback adopts whatever token the dashboard HTML
injects. That is correct when our own child regenerated the token (env
pin lost across a shell-wrapped spawn), but wrong when the readiness
probe answered from a process we did not spawn: /api/status is public,
so an orphaned dashboard squatting the port passes waitForHermes while
our child dies on the bind conflict. Silently adopting that process's
token would authenticate the renderer against a foreign backend,
possibly on the wrong profile.
Discriminate on child liveness: the desktop pins
HERMES_DASHBOARD_SESSION_TOKEN on every spawn, so a live child always
serves our token. Served-token mismatch + dead child = foreign backend;
fail the boot loudly instead of connecting. Mismatch + live child keeps
the adopt-served-token salvage from #43720.
`@assistant-ui/store`'s index-keyed child-scope lookup (`tapClientLookup`)
throws — rather than returning undefined — when a subscriber reads an index
the message/parts list no longer has. During high-frequency store replacement
(switching sessions mid-stream, gateway reconnect replay) a subscriber from
the previous, longer list is still in React's notification queue and reads one
slot past the new, shorter array before it can unmount. The throw
(`Index N out of bounds (length: N)`, the classic index === length off-by-one)
unwinds all the way to the root error boundary and blanks the entire window,
even though the store self-heals on the very next consistent snapshot.
Wrap each virtualized message group in a tiny boundary that swallows ONLY this
transient lookup race and auto-recovers when the message signature changes
(the existing list-mutation key). Any other error re-throws to the root
boundary, so genuine bugs still surface.
Upstream-tracked and unresolved: assistant-ui/assistant-ui#4051, #3652.
Co-authored-by: mollusk <mollusk@users.noreply.github.com>
Desktop spawns its dashboard backend with `--profile <name>` and
`HERMES_DESKTOP=1`. cmd_dashboard's unified-launch routing treats any
named profile as a request for the shared machine dashboard: it re-execs
as the default profile (dropping HERMES_HOME) or, when one is already
listening, prints "Machine dashboard already running ... Managing profile
'<name>'" and exits 0. Either way the desktop-spawned child exits before
the app sees a ready backend, so Desktop retries forever — the Windows
named-profile boot loop in the post-mortem.
Skip the machine-dashboard reroute when HERMES_DESKTOP=1 so desktop pool
backends stay per-profile (which is what the pool expects). Carved out of
#44478.
Co-authored-by: AJ <yspdev@gmail.com>
The desktop chat surface talks to the dashboard's in-process /api/ws
gateway, which builds agents through tui_gateway.server._make_agent. That
path only snapshots the existing tool registry — MCP discovery is started
by tui_gateway/entry.py (the stdio TUI), which the dashboard process never
runs. So a profile's configured MCP servers never connect under the
desktop app and sessions show no MCP tools.
Start a shared background MCP discovery thread at dashboard startup (via
hermes_cli.mcp_startup, bounded so a slow/dead server can't block boot),
and have _make_agent briefly join that thread in addition to the existing
entry-owned TUI thread before snapshotting tools.
Carved out of #44478.
Co-authored-by: AJ <yspdev@gmail.com>
The deploy-site skills index crawl was capped at ~3k ClawHub entries
because CATALOG_WALK_BUDGET_SECONDS applied to max_items=0 walks too.
Only enforce the wall-clock budget for bounded browse requests and pass
limit=0 from build_skills_index so CI walks the full catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: finish Automation Blueprints terminology rebrand
Replace leftover "Automation Templates" wording from the Cron Recipes
rebrand, rename the copy-paste cookbook guide to Automation Recipes, and
point the marketing gallery link at the blueprints catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: use Automation Blueprints instead of Recipes in guide
Rename the cookbook guide from automation-recipes to
automation-blueprints so sidebar and copy match the product term.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: rename automation-blueprints-catalog to automation-blueprints
Drop the -catalog suffix from the reference page slug and title, and
move the copy-paste cookbook to automation-blueprint-examples so the
main Automation Blueprints doc is unambiguous.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Revert "docs: rename automation-blueprints-catalog to automation-blueprints"
This reverts commit 605f1eeab5.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Legibility pass on the consolidated prefix: collapse the topic-overlap rule
from three overlapping sentences into one WINS sentence + one discard/no-wrap-up
sentence (same constraints, less dilution), fix the module docstring to
describe the headings that actually shipped, and correct the #10896 comment's
heading name (Historical Pending User Asks).
The prompt consolidation above retires the carveout-era prefix. Without a
frozen copy in _HISTORICAL_SUMMARY_PREFIXES, summaries persisted by
pre-upgrade builds would lose detection (_is_context_summary_content) and
renormalization (_strip_summary_prefix) — the exact regression class the
tuple exists to prevent. Adds contract tests covering every frozen prefix.
Refs #41607#38364#42812
A WSL2 user reported the top two left-sidebar items being unclickable
while the rest of the UI works. That symptom shape matches an
-webkit-app-region:drag hit-test band eating clicks, not GPU/compositing:
the shell's titlebar drag strips (app-shell.tsx) span the top 34px and
the nav group clears them by only 6px, and drag regions win hit-testing
over DOM regardless of pointer-events. Linux WCO (Electron >=32) is the
newest implementation and has known region quirks (electron#43030).
Apply the same no-drag carve-out the codebase already uses for sticky
user bubbles (USER_BUBBLE_BASE_CLASS in thread.tsx) to the sidebar nav
buttons. Harmless on every platform: the rows were never meant to be
draggable surface.
Root-cause hardening for the stranded-empty-registry failure behind
'No web search/extract provider configured': discover_and_load() set
_discovered=True before scanning, so a sweep that raised partway was
swallowed by callers as a warning and every later call early-returned
against an empty registry for the process lifetime. The flag now acts
only as a re-entrancy guard and is reset when the sweep raises, so the
next call retries discovery.
Pins the invariant that _ensure_web_plugins_loaded registers the keyless
Parallel default (and the wider bundled set) even when the general plugin
discovery raises, that the direct-registration fallback honors plugins.disabled,
and that it stays a no-op on the healthy path.
web_search/web_extract are documented to work with zero setup via the bundled
keyless Parallel free-MCP backend, but that only holds when the bundled
plugins/web/* providers are registered. The dispatch relied entirely on the
general plugin sweep to do that; when the sweep finishes without registering
them (its exception swallowed as a warning, a packaged layout where it ran
before the bundled tree was importable, or a stale empty-discovery cache), the
registry is empty and BOTH tools dead-end on "No web {search,extract} provider
configured" — despite needing no setup at all.
_ensure_web_plugins_loaded now verifies the keyless default landed after the
sweep and, if not, registers the bundled web providers directly against the
registry. Idempotent, a no-op on the healthy path (one dict lookup), and honors
an explicit plugins.disabled entry.
* fix(discord): recover from runtime gateway task exits
Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit
recovery; the original PR was 1081 commits behind with 28 unrelated
commits.
A post-ready discord.py WebSocket crash left the gateway split-brained:
producers stayed active while Discord stopped responding. After this fix
the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error()
so the existing GatewayRunner reconnect watcher replaces the dead adapter.
Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy
errors, invalid tokens) surface fast instead of burning the full ready
timeout. Because connect() no longer waits via asyncio.wait_for on that
path, test_connect_releases_token_lock_on_timeout is updated to trigger
the timeout through the new helper (same lock-release contract).
3 tests pass (2 new runtime-failure tests + the updated timeout test);
test_discord_connect.py and test_discord_slash_commands.py green.
Co-Authored-By: ameobius <ameobius@local.host>
* fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test
connect() no longer uses asyncio.wait_for for the ready handshake, so
test_connect_timeout_cancels_bot_task was hanging for 30s in CI.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: ameobius <ameobius@local.host>
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).
Also adds deaneeth to AUTHOR_MAP.
When connect() times out waiting for the Discord ready event, the background
asyncio.Task running client.start() was not cancelled. discord.py's internal
reconnect loop can ignore client.close() while a WebSocket handshake is in
flight, so the orphaned task eventually completes and fires on_ready.
A later successful reconnect then leaves two live Discord clients in the same
process — each with its own on_message handler and MessageDeduplicator instance
— so every @mention creates two threads because the per-adapter dedup caches
cannot catch cross-client duplicates.
Fix: explicitly cancel and await _bot_task in two places:
1. The asyncio.TimeoutError handler inside connect() — catches the case where
the adapter's own inner wait_for fires before the gateway's outer timeout.
2. The start of disconnect() — the load-bearing path, always reached via
_dispose_unused_adapter regardless of which timeout fired first.
Root cause confirmed from production logs: a Jun 8 network outage caused three
consecutive connect() timeouts. The first attempt's bot_task completed its
handshake 4 minutes later ("Connected as") with no preceding watcher line,
then the watcher's real reconnect also connected 90 seconds after that. The two
clients ran continuously for 41+ hours, confirmed by the same user message
appearing as two separate inbound events in two different thread IDs 357ms apart.
Regression tests added to tests/gateway/test_discord_connect.py:
- test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a
NeverReadyBot and asserts _bot_task is None afterward
- test_disconnect_cancels_running_bot_task: injects a live zombie task, calls
disconnect(), and asserts the task is cancelled and the attribute cleared
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
Pin the contract for _build_document_context_note: text documents confirm the
inlined content and record the path; binary documents (PDF/DOCX/XLSX/octet-
stream) tell the agent to extract the text itself and never instruct it to ask
the user to paste the contents.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.
Set CI=1 in _run_npm_install_deterministic so the package's /dev/tty
postinstall demo is skipped during hermes dashboard web UI builds.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): stop file tree throwing "two HTML5 backends" on remount
The Agent Workspace file tree (react-arborist) shows a permanent "TREE ERROR"
with `[error-boundary:file-tree] Cannot have two HTML5 backends at the same
time.` react-arborist mounts its own react-dnd DndProvider + HTML5Backend per
<Tree>. react-dnd v14 keeps that manager on a global, ref-counted singleton
context and nulls it when the count reaches 0. The tree is keyed on
`${cwd}:${collapseNonce}`, so changing folder / collapsing forces a fresh
<Tree>; during the remount the singleton can be torn down and recreated while
the previous HTML5Backend still owns `window.__isReactDndHtml5Backend`, so the
new backend's setup() throws. The error boundary then sticks, because "Try
again" just remounts into the same race.
Pass arborist a stable, app-lifetime `dndManager` (new getFileTreeDndManager
singleton) so it reuses one backend for the life of the app and never
double-claims the window flag. Drag/drop is already disabled on this tree;
this only changes how the (unused) dnd backend is provisioned.
Promotes dnd-core and react-dnd-html5-backend to explicit deps (already present
transitively via react-arborist's react-dnd 14.x line, so they dedupe to one
instance).
* fix(nix): bump npmDepsHash for desktop dnd deps
Adding dnd-core / react-dnd-html5-backend changed the workspace
package-lock.json, so the single workspace-root npmDepsHash in
nix/lib.nix was stale and the nix build failed. Regenerate it
(hash from the failing nix CI job's 'got:' value).
* fix(nix): update npmDepsHash for merged lockfile
After merging main, the workspace lockfile combined main's dep
changes with the desktop dnd additions, so the npmDepsHash needed
recomputing again. Hash from the nix lockfile-check job.
* fix(nix): use fetchNpmDeps hash for desktop dnd lockfile
prefetch-npm-deps reported sha256-lVnybH9RE/... but fetchNpmDeps
wants sha256-mYgKXE/FL4hnkrEvpVv+ULM/oeyIfO2AM9Ol8OrfWm0= for the
merged workspace lockfile. Use the nix build 'got:' hash so CI passes.
---------
Co-authored-by: Brooklyn Nicholson <brooklyn.bb.nicholson@gmail.com>
When a user toggles off the last visible model for a provider group, the
effectiveVisibleKeys() function treated the missing provider prefix as
'never customized' and re-added the default models on the next render,
causing all models to snap back to enabled.
Fix: store a sentinel key (e.g. 'provider::') when the last model for a
provider is toggled off. The sentinel distinguishes 'user hid everything'
from 'user never customized', preventing the default-fallback path from
re-adding models the user explicitly chose to hide.
Fixes#43485
Turn off browser spellcheck, autocorrect, and autocomplete on the main chat composer and message-edit composer so code, paths, and slash commands are not flagged or altered.
The coding-posture brief told GPT/Codex models to use patch mode='patch'
(V4A) for structured/multi-file changes but mode='replace' "for a single
small swap". That second nudge points those models at a format their
first-party harness never taught them.
Verified against openai/codex (current main): apply_patch is the ONLY file
editor in codex-rs — zero occurrences of str_replace/old_string anywhere in
the repo; the grammar (core/src/tools/handlers/apply_patch.lark) is exactly
the V4A dialect our patch_parser implements; the shipped model prompts
(gpt_5_codex, gpt-5.2-codex, gpt-5.1-codex-max + instruction templates)
explicitly say to use apply_patch "for single file edits"; and the tool is
gated per model via ModelInfo.apply_patch_tool_type, i.e. OpenAI ships
V4A-for-everything as model metadata.
The GPT-family line now steers to mode='patch' for all edits, single-file
included. The replace-family line (Claude + open-weight) is unchanged —
Claude Code's FileEdit is old_string/new_string/replace_all exact string
replacement (confirmed from Anthropic's shipped sdk-tools.d.ts, the only
file editor in its tool union), matching our mode='replace'.
CI tests the PR merged with current main, where the new /memory canonical
command filled Slack's 50-slash cap: with btw/bg/reset all pinned ahead of
canonicals, the last canonical (/debug) got clamped and the Telegram-parity
test failed. Canonical commands must win slots over alias spellings — /new
keeps its native slot and 'reset' stays reachable via /hermes reset.
Also updates test_includes_aliases_as_first_class_slashes to assert the
pinned-alias contract (_SLACK_PRIORITY_ALIASES survive) instead of a
specific unpinned alias's survival, which was the same change-detector
pattern the docstring already warned about.
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
Reworks the chat-line UX: pick a recipe by name and the agent asks you for
what it needs, one question at a time, instead of forcing you to hand-type a
slot=val command line.
- /cron-recipe -> lists the catalog
- /cron-recipe <name> -> forgiving name match (exact/prefix/substring/
fuzzy; ambiguous lists candidates), then seeds
the agent with a natural-language fill request
built from the recipe's typed slots + schedule
and prompt templates. The agent asks for each
value one at a time and calls the EXISTING
cronjob tool. No new tool.
- /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe ->
create_job) for the dashboard/docs/power user.
Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user
turn, never a synthetic injection):
- shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe()
and build_recipe_seed() are the new shared pieces.
- gateway: dispatch rewrites event.text to the seed and falls through to the
agent (the same pattern /steer uses).
- CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop
consumes it right after process_command() and runs it as the next turn.
The typed-slot schema stays the single source of truth (still validates the
form/inline path via fill_recipe); the agent path just renders those slots into
the questions to ask. Docs updated to lead with the name-then-ask flow.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Hermes can propose automations and let the user accept them with one tap
via /suggestions, instead of making them assemble cron jobs by hand. Every
proposal — wherever it originates — flows through one surface.
Sources (the 'where suggestions come from'):
- catalog: curated starter automations (daily briefing, important-mail
monitor, weekly review, workday-start reminder) via /suggestions catalog
- recipe: installing a skill that carries a metadata.hermes.recipe block
registers a suggestion instead of auto-scheduling
- usage / integration: reserved for the background-review detector and
account-connect triggers (sources defined; emitters land next)
Pieces:
- cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by
key (dismissed proposals never re-offered), pending cap so it can't become
a nag wall. Accepting calls the existing cron.jobs.create_job — there is
NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600).
- cron/suggestion_catalog.py — the curated set. The important-mail monitor
entry is where the old proactive-monitor poll->classify->surface engine
lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE
catalog automation rather than a standalone feature.
- tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes
a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single
translation both the direct and suggestion paths share.
- hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway
never drift); /suggestions [accept N|dismiss N|catalog|clear].
- Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py)
+ aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py).
Consent-first throughout: nothing auto-schedules; acceptance is always
explicit; dismissals latch.
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here
as a catalog entry and a suggestion source respectively.
Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency,
recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md
-> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.
The coding posture's names-only demotion of non-coding skill categories
(#44342) applied under the default auto mode, silently changing the skill
index for every user in a git repo. Index changes must be opt-in: demotion
now only fires under agent.coding_context=focus, alongside the toolset
collapse. auto/on leave the skill index untouched; focus semantics are
unchanged (demoted, never hidden; deny-list keeps coding-adjacent and
custom categories at full entries).
The Config page read config_path from /api/status, which is machine-global
and always reports the profile the dashboard process was started under.
After switching profiles with the global switcher, the header kept showing
the old profile's path (e.g. /root/.hermes/profiles/worker_1/config.yaml)
even though reads/writes correctly targeted the new profile.
Fix: /api/config/raw now returns the resolved path alongside the YAML
(resolved inside _profile_scope, so it follows ?profile=). ConfigPage
prefers that scoped path and only falls back to /api/status for old
servers. ProfileKeyedRoutes already remounts the page on switch, so the
header refreshes immediately.
Real-world failure with the original index pruning: under the default auto
posture, an agent-created ops skill in a demoted category vanished from the
prompt's skill index mid-project, and the agent silently fell back to a
stale sibling skill instead. The "discovery-only" premise didn't hold —
models do not reach for skills_list to rediscover what the index stops
showing them, and agent-created skills are the model's accumulated project
memory (runbooks, pitfalls, operating rules).
Gating pruning behind the opt-in focus mode was the wrong fix too: users
opening a worktree don't know the config exists, so the index-noise win
would effectively never ship.
Instead, the coding posture now DEMOTES non-coding categories rather than
hiding them: each demoted category renders as a single names-only line
("gaming [names only]: allthemons10-ops, mc-backup") with a footer note
explaining the omitted descriptions. Every skill name stays in the prompt,
so memory-anchored recall ("load <name>") keeps working in every mode,
while the description noise is still cut. Applies in auto/on/focus alike;
the general posture demotes nothing. Deny-list semantics unchanged —
unknown/custom categories and coding-adjacent ones keep full entries.
API renamed to match the honest semantics: hidden_skill_categories →
compact_skill_categories, build_skills_system_prompt(hidden_categories=) →
compact_categories=.
Two dashboard fixes:
1. The 'Anthropic API Key' OAuth catalog entry's status fn read
~/.claude/.credentials.json (which has its own dedicated claude-code
entry) and never checked ANTHROPIC_API_KEY at all. It now checks the
Hermes PKCE file, then the registry env-var order (ANTHROPIC_API_KEY
-> ANTHROPIC_TOKEN -> CLAUDE_CODE_OAUTH_TOKEN) via get_env_value, so
keys from .env, the shell, or Bitwarden (injected into the process
env by load_hermes_dotenv) are all reported, with a '(from Bitwarden)'
source suffix when applicable.
2. Deprecated HERMES_TOOL_PROGRESS / HERMES_TOOL_PROGRESS_MODE removed
from OPTIONAL_ENV_VARS so the keys page and setup checklists stop
offering them. Moved to _EXTRA_ENV_KEYS so .env sanitization and
reload_env still recognize them for existing users (gateway back-compat
fallback unchanged).
IAM policies scoped to bedrock:InvokeModel only (a common least-privilege
setup) reject converse_stream() with AccessDeniedException. The agent loop
hard-prefers streaming and the denial never matched the 'stream not
supported' auto-fallback, so InvokeModel-only users looped on AccessDenied
forever.
- agent/bedrock_adapter.py: new is_streaming_access_denied_error()
detector (ClientError code check + wrapped-SDK message match);
call_converse_stream() falls back to converse() on denial.
- agent/chat_completion_helpers.py: bedrock_converse streaming branch
retries inline via converse() and sets _disable_streaming so later
turns skip the doomed stream attempt; the chat-completions retry
block also recognizes the denial for the AnthropicBedrock SDK path
(message pre-check avoids importing bedrock_adapter — and its lazy
boto3 install — for unrelated providers).
Both paths print a one-line notice telling the user which IAM action
restores streaming.
* fix(desktop): Harden local file tree paths
Normalize Electron local path handling across file tree, preview, media, and git-root flows. Reject malformed and Windows device paths, recheck sensitive files after realpath resolution, and preserve external symlink traversal with stable renderer errors.
* fix(desktop): Address file tree review feedback
* fix(gateway): gate oversized Telegram voice/audio before download
Adds a pre-download size check to the Telegram voice and audio inbound
paths. Files that exceed _max_doc_bytes (default 20 MB) are rejected
before get_file() is called, preventing silent OOM-style stalls on large
uploads. A human-readable note is appended to the event text so the
model can explain the limit to the user.
Also extends 403 entitlement detection in recover_with_credential_pool
to cover two additional cases: 'oauth authentication is currently not
allowed for this organization' and Anthropic anthropic_messages-mode 403s,
both of which should be treated as entitlement failures rather than
transient errors.
Tests: 7 new cases in test_telegram_voice_v0_regressions.py covering
the size gate (accept, reject, note text) and the STT-failure notice path.
Salvaged from #40487 (cryptopafi) — cherry-picked the Telegram voice
policy and 403 entitlement fixes; LiveKit/Discord/uv.lock workstreams
left for separate PRs.
* test(gateway): drop orphaned voice tests not backed by this PR
The cherry-picked test file from #40487 included 3 tests for STT-failure
notice and voice-mode (_handle_voice_command 'on' -> voice_only) behavior
that this PR intentionally does NOT salvage (those belong to the LiveKit/
voice-policy workstreams left in #40487). They fail on both this branch
and clean main because the feature code isn't present.
Keep only the 2 tests backed by code actually in this PR:
- test_telegram_audio_size_gate_rejects_oversized_media_before_download
(covers the _telegram_media_size_allowed guard this PR adds)
- test_voice_tts_is_explicit_audio_reply_opt_in (matches current main)
Removed now-unused imports (MessageEvent, MessageType, AsyncMock).
Headless/VPS users (dashboard-over-Tailscale, no comfortable SSH) could
list/toggle/install skills and create/edit cron jobs, but not author a
custom skill or link one to a cron job — the UI set WHEN a job runs, but
not WHICH skill it uses.
- Skills page: 'New skill' button + per-row edit pencil open a SKILL.md
editor dialog (frontmatter + body, server-side validation via the same
_create_skill/_edit_skill path as the agent's skill_manage tool).
- New endpoints: GET /api/skills/content, POST /api/skills,
PUT /api/skills/content — all profile-scoped via _profile_scope(),
which now also retargets tools.skill_manager_tool's import-time
SKILLS_DIR binding.
- Cron page: skills multi-select in both create and edit modals (parity
with hermes cron --skill / edit --add-skill); CronJobCreate gains a
skills field; job cards show an attached-skills badge. update_job
already accepted skills in updates.
- Tests: 17 new endpoint tests (content read, create/edit validation +
profile scoping + auth gate, cron skills round-trip).
* fix(gateway): refuse to write service definitions with a temp-dir HERMES_HOME
A test/E2E harness that exports HERMES_HOME=/tmp/... and touches any
gateway service write path (install, start self-heal, restart's
refresh_systemd_unit_if_needed) bakes the throwaway home into the
production systemd unit / launchd plist. The gateway then restarts
'healthy' but pointed at an empty temp home — no platforms enabled,
deaf to every message (live incident 2026-06-11: /tmp/hermes-e2e-41264
poisoned the unit during a PR-review E2E probe; the post-update restart
produced a 7-hour zombie gateway).
The existing safety belt only sniffed pytest-shaped markers
(/pytest-of-, /hermes_test). Add a structural guard:
_temp_home_in_service_definition() extracts HERMES_HOME from the
generated systemd unit or launchd plist and refuses the write (with
actionable guidance) when it resolves under tempfile.gettempdir(),
/tmp, /var/tmp, or the macOS /private variants. Wired into all five
write sites: systemd refresh + install, launchd refresh + install +
start self-heal.
* test: patch unit generator in install tests tripped by temp-home guard
CI runs hermetic with HERMES_HOME under a tmp dir, so the real
generate_systemd_unit() output now (correctly) trips the new temp-home
write guard in three install tests. Patch the generator with synthetic
non-temp content — same pattern the existing pytest-marker guard tests
use.
Adds an idle clock to the context/status bar in both the prompt_toolkit CLI
and the Ink TUI: once a turn completes, a dim '✓ <elapsed>' segment shows how
long the session has been idle since the last final agent response. Hidden
while a turn is live (the per-prompt elapsed timer covers that) and before
the first turn completes.
- cli.py: track _last_turn_finished_at when the agent thread exits, surface
it via _format_idle_since() in the snapshot, render in both the wide
fragments path and the plain-text fallback.
- ui-tui: stamp lastTurnEndedAt when busy flips false after a live turn,
thread it through appStatus -> StatusRule, render via a ticking IdleSince
segment sharing the duration breakpoint/width budget.
Commit 550b72dd8 changed the concurrent-path tool-result rendering gate
from 'not agent.quiet_mode' to 'tool_progress_mode != off'. Subagents are
constructed with quiet_mode=True but inherit the default
tool_progress_mode='all', so every child tool call during delegate_task
started printing raw '✅ Tool N completed in Xs - {json...}' lines into
the parent's display, bypassing the curated tree-view relay in
_build_child_progress_callback.
Fix: require BOTH gates — quiet_mode must be off AND tool_progress_mode
must not be 'off' — restoring subagent silence while preserving the
#33860 fix (CLI verbose + tool-progress off stays suppressed). The same
combined gate is applied to the three sibling print sites in
tool_executor.py (concurrent header/args, sequential args, sequential
completion) so the whole class is consistent.
Two beta-reported dashboard bugs:
1. Models page: 'Use as -> Main model' on an analytics card sends
entry.provider, which falls back to the model's VENDOR prefix
(modelVendor('anthropic/claude-opus-4.6') == 'anthropic') when the
session row has no billing_provider. That persisted
provider: anthropic + default: anthropic/claude-opus-4.6 — a
vendor-prefixed OpenRouter slug on the NATIVE Anthropic provider.
New sessions then 400 against api.anthropic.com and the user reads
it as 'changing models does nothing'. Unknown vendors (moonshotai,
poolside, ...) were worse: a provider that can never resolve
credentials.
Fix: _normalize_main_model_assignment() at the single write
chokepoint — maps non-provider vendor names back to the user's
current aggregator (else openrouter), and runs the model through
normalize_model_for_provider() so the persisted name matches the
target provider's API format. Wired into both /api/model/set and
the profile-scoped _write_profile_model.
2. System page: 'Restore from backup' spawns hermes import with
stdin=DEVNULL, so the CLI's interactive 'Continue? [y/N]' overwrite
prompt hits EOF and auto-aborts whenever a config already exists
(always, when the dashboard is running). Fix: ConfirmDialog in the
dashboard owns the consent, then the endpoint passes --force so the
restore runs non-interactively.
Validated live: dashboard on a temp HERMES_HOME, repro'd both failure
modes pre-fix (vendor-slug write verified via config.yaml + tui
session.create; import 'Aborted.' in action-import.log), then verified
post-fix (normalized writes, modal -> --force -> restored marker file).
* fix(matrix): isolate room context and inbound dispatch
* test(matrix): cover room isolation and dispatch regressions
* docs(matrix): document room isolation and session scope
* fix(matrix): stabilize CI requirement checks
* test(matrix): isolate mautrix stubs in requirements tests
* fix(matrix): port room-scoped status and resume to slash commands mixin
Move Matrix /status scope output and /resume same-room guards from the
pre-refactor gateway/run.py into gateway/slash_commands.py so PR #18505
foundation behavior survives the upstream god-file decomposition.
Uses i18n keys for Matrix resume/status messages. Preserves upstream
session.py fixes (role_authorized, DM user_id isolation).
* docs(matrix): explain inbound dispatch via handle_sync loop
Document why Hermes uses an explicit sync loop with handle_sync() rather than
client.start(), aligning with upstream #7914 diagnostics while preserving
Hermes background maintenance tasks.
* fix(i18n): add Matrix resume/status keys to all locale catalogs
The Matrix /resume and /status slash-command keys added in the foundation
PR must exist in every supported locale file. tests/agent/test_i18n.py
asserts key and placeholder parity across catalogs.
Non-English locales use English strings as interim placeholders until
community translators can localize them.
* fix(matrix): restore gateway authz for allowed_users; honor config require_mention
Revert the early MATRIX_ALLOWED_USERS gate in _on_room_message so inbound
sender authorization stays in gateway authz like main. Parse require_mention
from config.extra (platforms.matrix / top-level matrix yaml) with env fallback,
matching thread_require_mention and fixing Forge when require_mention is set
only in profile config.yaml.
* fix(matrix): harden status scope and allowlisted DMs
* fix(matrix): use session store lookup for resume scope
* fix(mcp): propagate HERMES_HOME override onto the MCP event loop
Closes the known limit documented in #44007: tasks scheduled via
run_coroutine_threadsafe are created INSIDE the MCP loop thread, so they
copy that thread's context — a per-request profile scope (dashboard
?profile= endpoints, e.g. the MCP 'Test server' probe) silently vanished
for anything resolving get_hermes_home() inside the coroutine. Most
visible symptom: OAuth token-store paths (HERMES_HOME/mcp-tokens/)
resolved against the process home instead of the selected profile, so
testing an OAuth MCP cross-profile read the wrong tokens.
_run_on_mcp_loop now wraps scheduled coroutines with the caller's
context-local override (_wrap_with_home_override): set inside the task's
own context on the loop, reset on completion — task-local, so concurrent
calls carrying different scopes don't interfere, and the loop thread's
default context stays untouched. No-op (coroutine passes through
unwrapped) when no override is active, i.e. every non-dashboard caller.
web_server's probe comment updated from 'known limit' to 'covered'.
Tests: override propagation (direct + factory form), OAuth token-path
resolution on the loop, loop-context cleanliness after scoped calls,
no-op passthrough. 225 green across mcp_tool + unification suites.
* test(mcp): concurrent different-scope calls don't interfere
A long-lived Baileys bridge survives gateway restarts AND hermes update:
connect() adopted any bridge already listening with status connected, and
disconnect() only kills bridges the adapter spawned itself. Users who
updated to get inbound media support kept talking to a bridge process
serving months-old bridge.js — images and voice notes still arrived as
placeholders with no cached file path (refs #19105 follow-up reports).
Three fixes in the same stale-bridge class:
- Staleness handshake: bridge.js reports a sha256 self-hash in /health
(scriptHash); connect() compares it against bridge.js on disk and
restarts the bridge on mismatch. Pre-handshake bridges report no hash
and are treated as stale, so every existing stale bridge gets recycled
exactly once on the next gateway start.
- npm dep refresh: deps reinstall when package.json changes (stamp file
in node_modules), not only when node_modules is missing — a Baileys
pin bump now actually lands.
- Cache-dir passthrough: the gateway passes profile-aware
HERMES_{IMAGE,AUDIO,DOCUMENT}_CACHE_DIR to the bridge instead of the
bridge hardcoding ~/.hermes/image_cache etc., fixing media paths under
HERMES_HOME overrides, profiles, and the new cache/ layout.
* feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher
The dashboard becomes a machine-level management surface with one
write-target selector, replacing per-profile dashboard fragmentation.
Backend:
- profile param (query or body) on /api/config (get/put/raw), /api/env
(get/put/delete/reveal), /api/mcp/servers (list/add/remove/test/enabled),
/api/mcp/catalog (list/install), /api/model/info, /api/model/set —
all scoped through the existing _profile_scope() context manager
- model/set restructured: expensive-model warning (await) runs before the
scope; the config write runs sync inside the scope in a worker thread
- MCP catalog installs + git-bootstrap entries spawn 'hermes -p <profile>'
- chat PTY: ?profile= on /api/pty points the child's HERMES_HOME at the
profile dir (its own gateway subprocess, config/skills/memory/state.db
all profile-bound); in-process gateway attach skipped when scoped
CLI launch unification:
- '<profile> dashboard' routes to the machine dashboard: attach (open
browser at ?profile=) when one is listening, else re-exec pinned to the
default profile with --open-profile preselecting the launcher
- --isolated preserves the old dedicated per-profile server behavior
- start_server(initial_profile=...) appends ?profile= to the auto-open URL
Frontend:
- ProfileProvider + sidebar ProfileSwitcher: ONE global selector, URL-
persisted (?profile=), mirrored into fetchJSON which auto-appends the
param to the scoped endpoint families (explicit params win)
- app-wide amber banner names the managed profile
- SkillsPage's page-local selector (from the skills-scoping PR) folded
into the global context — single source of truth
- ChatPage threads the scope into the PTY WS URL; switching profiles
remounts the terminal into a fresh scoped session
Omitted profile keeps legacy behavior everywhere.
* docs(dashboard): document machine-level multi-profile management
- web-dashboard.md: 'Managing multiple profiles' section (switcher, URL
deep-links, unified launch, --isolated, scoped Chat, what stays
per-profile) + --isolated in the options table
- profiles.md: 'From the dashboard' subsection + set-as-active vs
switcher clarification
- cli-commands.md: --isolated flag + profile-alias launch example
* fix(dashboard): address profile-unification review findings
Review findings (dev review on PR #44007):
1. HIGH — stale page state on profile switch: pages load data on mount
and didn't consume the profile scope, so a page opened under profile A
kept showing A's state while writes silently targeted the newly
selected B. Fixed structurally: ProfileKeyedRoutes wraps the routed
page tree and keys it by the selected profile, remounting every page
(fresh state + refetch) on switch. ChatPage keeps its own remount
(channel keyed on scopedProfile).
2. HIGH — /api/model/auxiliary read was unscoped while /api/model/set
wrote scoped (Models page could show default's aux pins while editing
worker's). Endpoint now takes profile + _profile_scope, added to
PROFILE_SCOPED_PREFIXES, HTTPException re-raise so ghost profiles 404
instead of 500. Regression test asserts read/write symmetry with
differing worker/default aux config.
3. MEDIUM — tools post-setup spawned unscoped from the profile-aware
drawer. Now spawns 'hermes -p <profile> tools post-setup <key>'
(same mechanism as hub installs); drawer threads its profile prop.
Most hooks install machine-level artifacts where the scope is inert,
but hooks reading config/env now see the drawer's HERMES_HOME.
4. LOW — ty warnings: env Optional asserts before subscript/membership,
fastapi import replaced with web_server.HTTPException re-use.
298 tests green across the four affected suites; tsc -b + vite build
green; aux scoping E2E-verified with real imports.
* fix(dashboard): address second profile-unification review (gille)
1. BLOCKER — profile scope dropped on sidebar navigation: ProfileProvider
derived the selection from the current URL, and nav links are bare
paths, so clicking Config from /skills?profile=worker silently reset
the write target. State is now the source of truth; an effect
re-asserts ?profile= onto the new location after every navigation
(URL stays a synchronized projection for deep links/refresh), and an
incoming URL param (e.g. 'Manage skills & tools' links) still wins.
2. BLOCKER — /api/model/options unscoped while model/set wrote scoped:
the picker context (current model/provider, custom providers,
per-profile .env auth state) now loads inside _profile_scope; added
to PROFILE_SCOPED_PREFIXES. Test: a worker-only current-model pin
appears in the scoped payload and not the unscoped one.
3. BLOCKER — MCP test-server probe escaped the scope after the config
read: the probe now re-enters _profile_scope inside the worker thread
so env-placeholder expansion resolves against the selected profile's
.env. Known limit (documented): the probe's dedicated MCP event-loop
thread doesn't inherit the contextvar (OAuth token paths). Test
asserts get_hermes_home() inside the probe == the worker profile dir.
4. BLOCKER — broad excepts swallowed unknown-profile 404s: /api/model/info
degraded to 200-with-empty-model-info and /api/mcp/catalog to a
silently-empty catalog. Both re-raise HTTPException; 404 regression
tests added for info/options/catalog.
Polish: scope banner clears the fixed mobile header (mt-14 lg:mt-0);
--open-profile hidden via argparse.SUPPRESS (internal re-exec flag);
attach-path test now asserts the opened ?profile= URL.
(Stale-page-state + /api/model/auxiliary findings from this review were
already fixed in 92bcd1568 — the review ran against e600f6951.)
35 tests in the two new suites + 274 in the adjacent ones, all green;
tsc -b + vite build green; scoping E2E-verified with real imports.
* docs(dashboard)+fix: self-review pass — Profiles page section, REST profile-param tip, body-beats-query precedence
Docs:
- web-dashboard.md: add the missing 'Profiles' subsection to Pages
(cards, create/builder, manage-skills jump, set-as-active vs switcher
distinction, editors); REST API section gets a profile-scoped-endpoints
tip documenting ?profile= / body profile / 404 semantics / /api/pty
- (profiles.md + cli-commands.md were already updated in e600f6951)
Precedence fix: scoped endpoints taking BOTH a query param and a body
field now resolve body.profile first. The SPA's fetchJSON injects the
query param from the GLOBAL switcher; an explicit body.profile (e.g.
Profile Builder flows writing into a specific new profile) is the more
specific intent and must not be overridden by whatever the sidebar
happens to be set to. Matches the documented 'explicit beats global'
contract in api.ts.
Verified: 304 tests green across the four suites; tsc -b + vite build
green; docusaurus build green (only pre-existing broken-link warnings,
none from this PR's pages).
Editor-grade mouse selection parity with the Ink TUI (hermes-ink selection.ts):
a second click in the 500ms/1-cell chain selects the same-class character run
under the cursor (iTerm2 word set, wide-glyph aware), a third selects the line,
and dragging with the button held extends word-by-word / line-by-line while the
clicked span stays selected — anchor flips across the span on direction change.
Core knows only press-drag char selection, so this is a boundary shim
(multiClickSelect.ts) wrapping the renderer's startSelection/updateSelection
seam; word bounds read the presented frame's char grid. Native quirks probed
and pinned: per-renderable selection anchors are fixed at set time (anchor
flips restart the selection) and forward selections exclude the focus cell
(inclusive spans seed focus at hi+1). Pure scanning logic in logic/multiClick.ts;
20 new tests (pure + real-mouse-path frames); demo.tsx installs the seam for
tmux smokes.
When auto-compression rotates the session tip (old #4 → new #5), the
incoming page carries the new tip but the previous list still holds the
old one. The old tip's id differs from the new tip's id, so the existing
id-only dedup in mergeSessionPage() preserves both as separate sidebar
rows.
Add lineage-level dedup: build a set of incoming lineage keys
(`_lineage_root_id ?? id`) and filter survivors whose lineage key
matches any incoming row. This mirrors the existing sessionPinId()
logic used for pin stability.
Fixes#43483
Extract the remote-detection helpers (canonicalGitHubRemote, isSshRemote,
isOfficialSshRemote) from main.cjs into a testable update-remote.cjs sibling
module and add a node:test suite, wired into test:desktop:platforms.
main.cjs requires('electron') at load, so its inline helpers weren't unit
testable. The Python side of #43754 shipped a regression test; this gives the
desktop side the same coverage for the security-critical detection that keeps
passive update checks off the SSH origin (avoiding FIDO2/passkey touch
prompts). Tests assert SSH/HTTPS forms canonicalize equal, official SSH is
detected case-insensitively, and forks / other hosts / the HTTPS remote are
NOT misclassified.
Follow-up to the winget stale-registration fix. Update-ProcessPathForPackages
rebuilt $env:Path wholesale from the persisted User+Machine hives (plus winget's
Links dir), discarding any process-only PATH entries added earlier in the
installer run. Since the helper now runs after every package manager, that
wholesale replace is more likely to clobber a process-local entry than the
original winget-branch-only version was.
Merge instead: seed from the current process PATH, then append hive and
winget-Links entries not already present, with a case-insensitive,
order-preserving dedupe. Behaviour on a clean box is unchanged (the hive entries
are simply appended); the difference is that pre-existing process-only entries
now survive the refresh.
When ripgrep/ffmpeg is missing, `winget install <id>` on a package winget
already has registered is treated as an upgrade: it finds no newer version and
exits 0x8A15002B (-1978335189, APPINSTALLER_CLI_ERROR_UPDATE_NOT_APPLICABLE)
without ensuring the binary is actually present. The installer only logged that
code and judged success by `Get-Command rg`, so a stale registration (files
removed outside winget, or a missing alias shim) became a permanent dead-end —
winget kept reporting "already installed" and the user could never reinstall.
Detect that exit code and retry once with `--force` to repair the registration
so the shim reappears.
Also refresh the process PATH after the choco and scoop fallbacks (not just
winget) via a shared helper, so a successful fallback install — or any install
on a box without winget — is no longer misreported as "not installed".
The pre-update / pre-migration backup path (_write_full_zip_backup) had the
same /tmp staging bug as run_backup: a small tmpfs at the default tempfile
location silently drops large *.db files from the archive. Route its SQLite
staging temp files to the output zip's directory as well, and add regression
tests (mutation-verified) for both staging paths.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Two bugs in the backup routine:
1. SQLite safe-copy used tempfile.NamedTemporaryFile() which defaults to
the system temp directory (/tmp). When /tmp is a small tmpfs and the
database is large, the copy silently fails and the resulting zip is
missing state.db, kanban.db, and response_store.db.
Fix: pass dir=out_path.parent so the temp file is staged alongside the
output zip on the same filesystem.
2. _EXCLUDED_DIRS contained "hermes-agent" which matched at ANY path
depth, accidentally excluding the Hermes Agent skill directory at
skills/autonomous-ai-agents/hermes-agent/.
Fix: special-case "hermes-agent" to only match when it is the first
path component (the root-level code checkout). All other excluded dir
names continue to match at any depth.
Regression tests added for both fixes.
Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached
restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so
the CLI's self-restart loop guard silently refuses 'hermes gateway restart'
and the gateway never comes back. Scrub the marker from the watcher env on
the POSIX branch as well, and extend the setsid test to assert it.
Desktop already recovers from a stale runtime session id when
`prompt.submit` returns `session not found` after a gateway restart or
sleep/wake. The stop path did not have the same recovery: `cancelRun`
called `session.interrupt` once with the stale runtime id, then surfaced
`Stop failed / session not found`.
This makes stop/cancel mirror the prompt recovery path. If
`session.interrupt` reports `session not found` and the selected stored
session id is available, Desktop resumes that durable session, updates
the active runtime ref with the recovered id, and retries
`session.interrupt` once against the recovered runtime id.
Salvaged from #43941 — rebased onto current main, dropping the unrelated
`package-lock.json` (@types/node 24.13.1->24.13.2) and `nix/lib.nix`
hash churn. That bump is a local npm 11 re-resolution artifact, not a CI
requirement: repo CI runs node 22 (npm 10) and main is green at
@types/node 24.13.1, so the lockfile and nix hash do not need to change.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
When the agent runs several terminal commands back-to-back, each
progress line repeated the '💻 terminal' header above its fenced code
block, cluttering the progress bubble. Now only the first terminal call
in a streak emits the header; subsequent consecutive terminal calls
render adjacent code blocks. Any other tool (or non-block preview)
resets the streak so the next terminal call gets a fresh header.
* fix(cli): omit --workspace when subpackage has its own package-lock.json
When ui-tui/ (or web/) contains its own package-lock.json, _workspace_root()
returns the subpackage directory itself. Passing --workspace ui-tui in that
case fails because npm cannot find a workspace named 'ui-tui' inside ui-tui/.
Fix: skip the --workspace flag when npm_cwd equals the target directory,
running a plain 'npm install' from the standalone project root instead.
Applies the same fix to both _make_tui_argv (TUI) and _build_web_ui (web).
Fixes#42973
* test(cli): fix web workspace-scope fixture + cover own-lockfile fallback (#42973)
The web half of the #42977 fix broke test_npm_install_uses_workspace_web_scope,
which built its fixture with no lockfile anywhere. Without a root lockfile,
_workspace_root(web_dir) already returns web_dir, so the new
"() if npm_cwd == web_dir" branch correctly drops --workspace and the
assertion failed. Model a real workspace checkout instead: the single
package-lock.json lives at the root, so --workspace web scopes the install.
Also add the symmetric web regression test (web/ carrying its own lockfile =>
--workspace must be dropped and the install runs plainly from web_dir via
npm ci), matching the TUI coverage already in test_tui_npm_install.py.
---------
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Keyed draft stash (Map + localStorage mirror) behind the live composer:
switching threads stashes the departing draft and restores the entering
one; empty threads show an empty box. Session lifecycle never clears
composer state — the scope swap is the only coupling.
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Pipeline cell re-run with external VmRSS/VmHWM sampling on the dedicated
tmux servers: ink 5.07MB peak, otui 4.94MB peak, zero growth across the
800-msg stream (alt-screen = fixed grid, no scrollback accrual). The
emulator leg is a tie on memory as well as CPU.
- DRY the duplicated submit-restore blocks into dispatchSubmit()
- inline localStorage access (drop browserStorage indirection);
clearPersistedComposerDraft delegates to write('')
- drop stale per-scope-stash comment in use-session-actions
The composer is a single global surface that sits ABOVE the thread: its
contents follow the user across session switches and are never touched
by session lifecycle. Switching threads doesn't change the render.
Replaces the per-scope draft choreography (scoped storage keys, attachment
stash map, skip-sentinel, restore-on-scope-change effect) with:
- one global localStorage key so an unsent draft survives app reloads
- a one-shot restore on mount
- nothing else — session switches simply don't touch the composer
Verified E2E via CDP with real sidebar clicks + real keystrokes:
typed draft survives A->B->A switching and a full page reload.
* feat(agent): coding-context posture with per-model edit-format tuning
Hermes detects when it's running in a coding context — an interactive
surface (CLI, TUI, ACP, desktop) sitting in a code workspace (git repo or
recognised project root) — and shifts into a coding posture. Outside that
(chat platforms, non-workspaces) nothing changes.
The posture is modelled as a frozen RuntimeMode selected from a small
ContextProfile registry (coding/general). A profile is data: the toolset to
collapse to, the operating brief to inject, and seams for model routing and
memory. Every domain reads the same resolved object instead of re-probing
git/config on its own:
- System prompt — RuntimeMode.system_blocks(): an operating brief (gather
context before editing, edit through tools not chat, verify with terminal,
cap retry loops) plus a live git/workspace snapshot, built once and baked
into the stable prompt tier so per-conversation caching is preserved.
- Per-model edit-format tuning — the brief nudges each model family toward
the patch mode it handles best: OpenAI/Codex toward mode='patch' (V4A
multi-file diffs), Anthropic toward mode='replace' (string replacement).
The model id rides on RuntimeMode; unknown families keep neutral wording.
- Skill index — non-coding skill categories are pruned from the prompt's
skill index (discovery-only; skills_list/skill_view still reach the full
catalog, with a disclosure note).
- Toolset — only under the opt-in 'focus' mode does the posture collapse to
the coding toolset + enabled MCP servers; the default posture is
prompt-only and never overrides configured toolsets.
Activation via agent.coding_context: auto (default), focus, on, off.
Subagents inherit the posture for free via toolset inheritance + the shared
prompt builder. Detection is not memoized so a long-lived gateway/TUI
process can't pin a stale posture across working directories.
* feat(agent): cover new-file authoring in the coding edit-format nudge
The per-model edit-format guidance only addressed editing existing code
(patch mode='patch' vs 'replace'), but authoring a brand-new file —
write_file, not patch — is a large fraction of real coding work and the
nudge was silent on it. Surfaced when building a single-file artifact where
the dominant operation was write_file and the steering offered no guidance.
Both family lines now lead with "author new files with write_file; for
edits to existing code prefer ...". Tests assert write_file appears in each
family's brief; unknown families still get neutral wording.
* docs(agent): correct memoization docstring + clarify TUI config-load asymmetry
* feat(agent): sharpen the coding posture — verify-loop facts, wider edit steering, $HOME guard
Tuning pass on the coding posture from dogfooding it as a harness:
- Workspace snapshot now hands the model its verify loop up front:
detected manifests + package manager (lockfile sniff), the exact
verify commands (package.json scripts, Makefile targets,
scripts/run_tests.sh, pytest config), and which context files
(AGENTS.md / CLAUDE.md / .cursorrules) exist at the root. Marker-only
(non-git) projects get the snapshot too instead of nothing. The
"verify before claiming done" brief line was the highest-value piece
in evals — this turns it from advice into an executable loop instead
of making the model rediscover the test command every session. Still
stat-cheap, size-guarded reads, built once at prompt time.
- Edit-format steering covers the families Hermes actually serves:
Gemini and open-weight coding models (DeepSeek, Qwen, Kimi, GLM,
Grok, Hermes, Llama, Mistral, Devstral, MiniMax) steer to
mode='replace' — their RL scaffolds use str_replace-style editors.
Previously only GPT/Codex and Claude families got steering; the
models Hermes users disproportionately run all fell to neutral.
- Operating brief gains four behaviors elite harnesses encode: batch
independent reads/searches in one turn; fix root causes and the bug
class (sibling call paths), not the reported site; no drive-by
refactors/renames/reformatting; never read, print, or commit secrets.
Plus a patch-failure escalation ladder: after the same region fails
twice, rewrite the enclosing function/file with write_file instead of
a third patch attempt.
- $HOME dotfiles guard: a git repo rooted exactly at the home directory
(or a marker sitting in it, e.g. a global ~/AGENTS.md) is user config,
not a code workspace — without the guard, every session anywhere under
a dotfiles-managed home silently flipped to the coding posture. Real
projects under such a home still detect via their own markers/repos;
'on' mode bypasses the guard.
Manual testing of the salvaged draft persistence showed none of it worked
end-to-end. Three distinct bugs, all invisible to the store-level unit
tests:
1. New-chat drafts were never written. The skip-one-persist sentinel was
reset to null after consuming, but null IS a real scope (the unsaved
new-session draft) — so in a new chat every persist run matched the
"consumed" sentinel and bailed. This silently killed the headline
#38498 fix. Use undefined as the no-skip sentinel, which can never
collide with a scope.
2. Cmd+R inside the debounce window dropped the trailing text. React does
not run effect cleanups on a page reload, so the flush-on-unmount
never fired; with the 400ms debounce that meant type-then-reload lost
the draft every time. Flush pending writes on pagehide.
3. Session switch/new/resume/branch paths in use-session-actions cleared
the composer stores synchronously with the session-id updates. React
batches those, so by the time ChatBar's scope-change cleanup ran to
stash the departing session's attachments, the store was already
empty — the stash recorded [] and the chips were lost anyway. The
composer's per-scope restore now owns composer contents wholesale on
scope change, so drop the upstream clears (clearComposerDraft only
touched the vestigial $composerDraft atom nothing reads).
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Chaos (5 scenarios x 2 UIs): every gateway-death/hang scenario fully
recovers on BOTH UIs — Ink respawns immediately (~80ms, no backoff),
OpenTUI after its 1s backoff; transcripts converge byte-identical to
the never-killed digest; zero orphans. PTY EOF: both exit and reap the
gateway in ~100ms (Ink takes 4.1s to die vs OpenTUI 0.2s).
Pipeline (800 msgs @30ev/s inside a dedicated tmux server): UI CPU
82.4s (ink) vs 79.1s (otui); tmux-server leg ~0.4s BOTH — the
'Ink costs more in the emulator' hypothesis is not supported at this
workload. Frame pacing: otui 22.3fps vs ink 15.8fps, interframe p95
103ms vs 209ms. Echo latency: both excellent (p50 1-2ms); submit to
first-token-paint 44ms (ink) vs 107ms (otui).
Fully removes the cron per-job 'profile' arg added in #28124: the
cronjob tool schema field, CLI --profile flags on cron create/edit,
job-record storage/validation, the scheduler's _job_profile_context
wrapper, and the script-runner env override. Sequential-partition
logic reverts to workdir-only.
The context-local HERMES_HOME override in hermes_constants and the
subprocess bridging in tools/environments/local.py are kept — they
now have other consumers (dashboard multi-profile, TUI gateway).
Drop the hermes_state.py column + persistence plumbing from the salvaged
interleaved-thinking fix. The ordered-block channel covers the failure
window in-memory (turn replayed within the live conversation loop). A
session reloaded from disk after a crash falls back to reconstruction;
if that replay 400s, the thinking-signature recovery (#43667) strips
reasoning_details and retries — one degraded call in a rare resume path
instead of a schema column. Replaces the DB-roundtrip test with a
fallback-shape test.
Two additive hardening changes on the interleaved-thinking replay path
introduced by this PR's anthropic_content_blocks channel. Both are scoped
to that channel's blast radius; neither changes correct behavior.
1. Replay-time tool-input re-sourcing (credential safety).
The ordered-block channel captures each tool_use `input` from the RAW
API response in normalize_response, which is NOT credential-redacted.
The parallel tool_calls[].function.arguments IS redacted at storage
time (build_assistant_message, #19798). The verbatim-replay fast path
in _convert_assistant_message replayed the raw block input, so a secret
a model inlined into a tool call (e.g. an Authorization header value
passed inside a terminal command) would ride back onto the wire even
though it is redacted everywhere else in history. Re-source tool_use
input from the redacted tool_calls map by
sanitized id; interleave order (the reason this channel exists) is
unaffected. Adapted from #36071, which re-sources tool inputs the same
way on its replay path.
2. Broaden the thinking-replay 400 classifier (defense-in-depth).
error_classifier only matched "signature" + "thinking", so the
frozen-block variant — "thinking ... blocks in the latest assistant
message cannot be modified. These blocks must remain as they were in
the original response." — carried no "signature" token and fell through
to a non-retryable abort. The anthropic_content_blocks channel prevents
the reorder that triggers this 400 at the source, but if any future
mutator reintroduces it, the turn now self-heals via the existing
strip-reasoning-and-retry recovery instead of crash-looping. A negative
case ensures an unrelated "cannot be modified" 400 (no "thinking") is
not swept in. Mirrors the classifier broadening in #36087 and #36071.
Tests
- tests/agent/test_anthropic_thinking_block_order.py: a replay test
asserting an inlined secret is redacted on the wire while interleave
order is preserved.
- tests/agent/test_error_classifier.py: three cases — frozen-block 400
native and via OpenRouter route to thinking_signature/retryable; an
unrelated "cannot be modified" 400 does not.
Both grafts verified RED (tests fail with the change reverted) then GREEN.
Full adapter, transport, classifier and output-field-leak suites pass.
Co-authored-by: AlexanderBFoley <92330381+AlexanderBFoley@users.noreply.github.com>
HTTP 400 "messages.N.content.M.text.parsed_output: Extra inputs are not
permitted" on the native Anthropic transport. Anthropic SDK 0.87.0 response
blocks carry output-only attributes the Messages *input* schema forbids: text
blocks get `parsed_output` and `citations=None`, tool_use blocks get `caller`.
normalize_response captured blocks verbatim via _to_plain_data and replayed
them as request input on the next turn, so the forbidden fields leaked back ->
400. Like the earlier thinking-block bug, one poisoned turn wedges every
subsequent request in the session (even the diagnostic turn), recoverable only
by switching models or deleting the session.
This is a defect in the anthropic_content_blocks channel added for the
interleaved-thinking fix: it preserved block ORDER correctly but copied every
SDK attribute, including output-only ones.
Fix — whitelist input-permitted fields per block type at all three leak points:
- agent/transports/anthropic.py normalize_response: sanitize at CAPTURE so the
poison never persists to state.db (defence-in-depth).
- agent/anthropic_adapter.py _sanitize_replay_block (new): whitelist used on the
ordered-blocks replay path; also recovers already-poisoned stored sessions.
- agent/anthropic_adapter.py _convert_content_part_to_anthropic: a stored
`text` part is rebuilt from whitelisted fields instead of dict(part) verbatim
(this was the exact content.N.text.parsed_output failure locus).
Whitelist not blacklist, so future SDK output-only fields can't reintroduce it.
Block order and thinking-block signatures are preserved (the reason the channel
exists). Adds tests/agent/test_anthropic_output_field_leak.py; full adapter
suite green (163 tests). Existing poisoned state.db rows scrubbed out-of-band.
Interleaved-thinking turns (adaptive thinking, Claude 4.6+/Opus 4.8) emit
content blocks like:
thinking_1(signed) tool_use_1 thinking_2(signed) tool_use_2
Anthropic signs each thinking block against the turn content preceding it
at its position. normalize_response split the turn into two parallel lists
(reasoning_details + tool_calls), discarding cross-type order, and
_convert_assistant_message rebuilt it as [all thinking][text][all tool_use].
That moved thinking_2 ahead of tool_use_1, invalidating its signature, so
Anthropic rejected the latest assistant message with HTTP 400:
messages.N.content.M: `thinking` or `redacted_thinking` blocks in the
latest assistant message cannot be modified.
Observed repeatedly in agent.conversation_loop against api.anthropic.com /
claude-opus-4-8, recurring across sessions on multi-thinking-block turns.
Fix: carry a verbatim, order-preserving copy of the turn's content blocks
(anthropic_content_blocks) end-to-end - capture in normalize_response,
persist/restore through state.db, and replay unchanged for the latest
assistant message. Gated to turns that actually interleave signed thinking
with tool_use, so normal turns are unaffected.
Adds 3 regression tests including a SQLite round-trip covering the
crash-recovery reload path.
The salvaged draft persistence scoped text per session but reset the
composer's attachments to [] on every scope change, so a staged image or
file was silently dropped when you switched sessions and never restored on
return — inconsistent with the "drafts survive session switches" promise
and a real paper-cut given remote staging cost.
Retain attachments per scope in an in-memory map (keyed by the same scope
as the text draft) since blobs / object URLs / live upload state can't be
serialized to localStorage. Entering a scope restores its stashed chips;
leaving stashes the current ones; an accepted submit clears the scope.
This survives session switches (the case users hit) without pretending to
survive a full reload, which attachments fundamentally can't.
Also guard the debounced text write so browsing sent-message history or
editing a queued prompt (both swap the composer to recalled text via
loadIntoComposer) no longer clobbers the genuine in-progress draft in
storage.
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
'Set as active' on the Profiles page only flips the sticky active_profile
file (future CLI/gateway runs) — it never retargets the running dashboard
process. The skills/toolsets endpoints called bare load_config()/
save_config(), so after 'activating' a profile in the web UI, deactivating
a skill silently wrote into the dashboard's own profile and the activated
profile was untouched.
Backend:
- _profile_scope() context manager on the skills/toolsets endpoints:
context-local HERMES_HOME override for call-time config resolution +
cron-style locked swap of tools.skills_tool's import-time SKILLS_DIR
- profile param on /api/skills, /api/skills/toggle, /api/tools/toolsets*
(list/toggle/config/provider/env), hub sources/search installed-state
- hub install/uninstall/update spawn 'hermes -p <profile> skills ...' so
the child rebinds skills_hub.SKILLS_DIR at import (the override cannot
reach import-time globals); profile validated -> 404/400 before spawn
Frontend:
- Skills page: profile selector (deep-linkable /skills?profile=<name>),
amber banner naming the managed profile, threaded through skill toggles,
toolset drawer, and hub browser
- Profiles page: 'Manage skills & tools' action per card; 'Set as active'
toast now says it applies to new CLI/gateway runs only
Omitted profile keeps legacy behavior (dashboard's own profile).
The salvaged draft-persistence effect wrote to localStorage on every
keystroke — the composer's per-keystroke path was deliberately slimmed
down previously, so debounce the write (400ms) and flush pending text on
scope change/unmount so a fast session switch can't drop trailing
keystrokes. Also add AUTHOR_MAP entry for the salvaged commit.
Save in-progress composer text to browser localStorage per chat session and restore it when the desktop composer remounts. Keep the draft when submit is rejected or throws, and clear it only after the prompt is accepted.
The memory/skill write-approval gate (#38199, #43354, #43452) was only
documented inside features/memory.md. Surface it everywhere users will
actually look:
- features/skills.md: new 'Gating agent skill writes' section under
skill_manage, with the staging semantics, review commands, and the
distinction from skills.guard_agent_created
- configuration.md: memory.write_approval added to the Memory
Configuration block; new 'Write approval for skill writes' subsection
next to the guard_agent_created scanner
- reference/slash-commands.md: /memory and /skills review subcommands in
both the CLI and messaging tables; Notes updated since /skills
pending/approve/reject/diff/approval now works on the gateway
- features/memory.md: cross-link to the new skills section
Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on
the keyless Parallel Search MCP path with a neutral 'mcp-web-client'
identity. Project policy forbids third-party usage attribution without an
explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo,
so a generic one satisfies the spec without attributing traffic.
Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against
current main (parallel-web 0.6.0).
Make Parallel the web search/extract backend with a zero-setup free tier:
- Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via
Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel
becomes the default backend when no other web credentials are configured
(ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP
JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing
web_search/web_extract tools are the only tools registered.
- Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints
(client.search / client.extract with advanced_settings.full_content) — no beta.
Bumps parallel-web 0.4.2 -> 0.6.0.
- Attribution: on the free path only, results carry provider/attribution and the
CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is
unbranded.
- Selection/registration: web tools register unconditionally (free MCP backstop)
while check_web_api_key remains a real usability probe; explicit per-capability
backends are honored (so misconfig surfaces) rather than masked by the fallback.
Tested: live web_search/web_extract against search.parallel.ai in keyless and
keyed modes; unit suites for the MCP client, backend selection, and display
labeling; full agent run shows the "Parallel search" label on the free path.
Reconstructs what killed gateways and TUI sessions: merges
~/.hermes/logs, journalctl/dmesg OOM kills, sessions-DB abnormal ends,
and git worktree state into one timestamp-sorted timeline plus a
summary (OOM victims, tui_gateway exit-reason histogram, orphaned
sessions). Findings from the 2026-06-04..11 window: gateway deaths were
kernel OOM kills selecting hermes via OOMScoreAdjust=200 under pressure
from OTHER tools' multi-GB leaks; TUI deaths were top-down SIGHUP/EIO
cascades from unit teardown; tui_gateway itself crashed in 0/152 exits.
Real sessions DB (~/.hermes/state.db, 444 tui+cli sessions): p50=20,
p75=53, p90=182, p95=340, p99=1941 msgs. The assumed 200-300 'realistic
band' is actually the p90-p95 region; the typical session is ~20 msgs
and the p99 tail reaches ~1940 (real ~2k-msg sessions exist).
New cells at those anchors (2 reps, ink vs otui-capped, 2G scope), VmHWM
medians: 100 msgs 163 vs 222MB; 300 msgs 180 vs 268MB; 2000 msgs 234 vs
671MB (2.9x). session-distribution.mjs regenerates the JSON; run.mjs
gains --configs to skip redundant configs (otui-uncapped == capped below
the 3000-row cap).
Three behavior changes to the hermes -w worktree lifecycle:
1. Git-native locks. _setup_worktree now locks its worktree
(git worktree lock --reason "hermes session pid=<pid>"), and
_prune_stale_worktrees skips locked worktrees at ANY age — a lock
from a live or crashed session means "do not touch". New helpers
_lock_worktree / _unlock_worktree / _worktree_is_locked (fail-safe:
any error reads as locked) / _worktree_is_dirty (fail-safe: any
error reads as dirty).
2. Dirty trees are preserved. _cleanup_worktree previously destroyed
worktrees with uncommitted changes if there were no unpushed
commits; it now keeps the worktree, branch, and lock when the tree
is dirty OR has unpushed commits, and prints manual cleanup hints
(git worktree unlock + remove --force). The >72h "force remove
regardless" prune tier is removed: pruning may only ever delete
clean, unlocked, fully-pushed worktrees.
3. Branch deletion is gated on removal success. Both cleanup and
prune previously deleted the branch without checking the
git worktree remove returncode, dropping easy reachability of the
commits even when removal failed; the branch is now only deleted
after a successful remove.
Follow-up to #42351. Slash command rows render the command label and
description with `truncate`, so skill commands and longer blurbs were
clipped with no way to read the full text. Rather than add a floating
tooltip (which overlaps the popover and only helps the mouse), the active
row — the one reached by keyboard arrows or hover, since onMouseEnter
already sets activeIndex — now drops truncation and wraps inline
(whitespace-normal break-words). Idle rows stay single-line/truncated so
the list reads compact.
* desktop: surface /tools, /save, /personality and fix /help skill count
Move /tools and /save out of TERMINAL_ONLY_COMMANDS and /personality out of
ADVANCED_COMMANDS so they appear in the desktop slash palette and execute via
the existing slash.exec → command.dispatch fallback. The backend gateway already
accepts these through slash.exec (none are in _PENDING_INPUT_COMMANDS or the
skill list), so no backend change is required.
Recompute skill_count in filterDesktopCommandsCatalog from the filtered pairs.
Previously the /help footer echoed the unfiltered backend total — e.g. "60
skill commands available" while only ~29 actually appeared in the rendered
list, because the desktop hides terminal-only, picker-owned, and advanced
commands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* desktop: keep slash popover live while typing args
The trigger regex `(?:^|[\s])([@/])([^\s@/]*)$` stopped matching the moment
the user typed a space after a slash command, so the popover never showed arg
completions for `/personality`, `/tools`, etc. — even though the backend's
`complete.slash` already returns them with a `replace_from` indicator.
Split the trigger detection so `/` allows args (`/cmd arg1 arg2`) while `@`
keeps the strict no-space behavior. Restrict the slash command name to
`[a-zA-Z][\w-]*` so file paths like `src/foo/bar` don't accidentally trigger
the popover.
Rewrite arg-completion items in useSlashCompletions to insert the full
`/personality alice` token instead of stranding `/alice`: when `replace_from`
is past the command base, prepend the existing prefix to each item's text so
the chip serializer produces a coherent replacement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* cli: complete toolset names after /tools enable|disable
SlashCommandCompleter previously only auto-derived the first subcommand level
from args_hint, so `/tools enable <tab>` yielded nothing — the user had to
remember every toolset key (web, file, spotify, …) and every MCP server prefix.
Add `_tools_completions` that handles both stages: subcommand (list|disable|enable)
and tool name. Filter by current enable state so `/tools enable <tab>` only
offers disabled toolsets and `/tools disable <tab>` only offers enabled ones —
no point suggesting a no-op. MCP server prefixes (server:) come from the
saved mcp_servers config; per-tool completion under a server would require
runtime MCP introspection and is left as follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* desktop: registry-driven slash commands with first-class pickers
Collapse the if/else slash dispatch into one DESKTOP_COMMAND_SPECS table
that drives popover suggestions, per-type composer pills, and execution.
- /resume, /sessions, /switch: inline session completions (like /skin) plus
a "Browse all sessions…" entry that opens a dedicated session picker overlay
- /handoff: inline platform completion + handoff.request/handoff.state
gateway bridge so desktop reaches CLI parity
- colored per-type pills (command/skill/theme) in the composer
- strip ANSI and fix width/alignment of slash output in the chat panel
* desktop: fold repeated slash session/output boilerplate into one helper
runExec, /title, /help and the unavailable case each re-derived the same
ensure-session → bail-with-notify → build-renderSlashOutput dance.
withSlashOutput() returns {sessionId, render} or null, so each handler is
a two-line resolve instead of an eight-line preamble.
* desktop: keep backend meta on slash arg completions
Arg suggestions (/personality <name>, /tools enable <toolset>, /handoff
<platform>) were having their meta overwritten with the parent command's
registry description: desktopSlashDescription("/personality none") canonicalizes
back to /personality and returns its blurb. Skip the lookup for arg rows so the
backend's own display_meta ("clear personality overlay", etc.) survives.
* cli: list real personalities in /personality completion
_personality_completions resolved load_config().agent.personalities — but that
schema has no agent.personalities key, so completion always returned just
`none` even though the runtime (load_cli_config().agent.personalities) ships a
dozen built-ins (helpful, kawaii, pirate, …). Read from the same source the
command actually applies, so `/personality ` surfaces the real options.
* desktop: expand bare arg-commands to their options on pick
Picking a command like /personality from the slash popover committed it
immediately instead of advancing to its argument list. Mark arg-taking
commands (/skin, /resume, /handoff, /personality, /tools) in the registry
and, when one is picked bare, insert "/cmd " as plain text and re-open the
popover on its inline options — mirroring typing "/cmd " by hand. Arg picks
(serialized text already contains a space) still commit a single pill.
Also realign trigger-popover loading test with the redesigned popover (the
/help empty-state hint shows when resolved, not while the spinner is up);
the merge from main reintroduced the pre-redesign expectation.
* tui_gateway: fold session-db close into a context manager
Both handoff RPCs repeated the same `db, close_db = _session_db_handle()`
+ `finally: if close_db: db.close()` dance. Turn the helper into a
`_session_db` contextmanager that owns the close, so callers just
`with _session_db(session) as db:`.
* desktop: unblock handoff retries and exact resume ids
Clear timed-out desktop handoffs through the gateway so retries are not stuck behind a pending row, and let typed /resume session ids bypass the loaded sidebar cache.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(streaming): stop socket read timeout from preempting stale-stream detector
The stale-stream detector is deliberately scaled to 180-300s so reasoning
models (e.g. Opus) can pause mid-stream during extended thinking. But the
httpx socket read timeout stayed at a flat 120s for cloud providers and fired
first, tearing down healthy reasoning streams before the detector (which owns
retry + diagnostics) could act. Symptom: every Copilot/Opus turn dies with
ReadTimeout at a consistent ~125s and never completes.
Floor the cloud socket read timeout at the stale-stream timeout so it can no
longer fire before the detector. Local providers and explicit
HERMES_STREAM_READ_TIMEOUT / request_timeout_seconds overrides are unchanged.
* test(streaming): pin read-timeout >= stale-stream invariant for cloud reasoning streams
Cover the contract that the httpx socket read timeout is never shorter than
the stale-stream detector for cloud providers on the default: small contexts
floor to 180s, >=50K to 240s, >=100K to 300s; explicit overrides win; local
providers and the unresolved-value fallback are unaffected.
CI caught tests/cli/test_cli_new_session.py asserting that /new keeps
the old session row when conversation history exists in memory. The
live transcript is authoritative: a session whose messages haven't
flushed to the DB yet (or whose flush failed) must not be pruned.
Guard _discard_session_if_empty on self.conversation_history and pin
the behavior with a test.
Port from google-gemini/gemini-cli#27770: starting the CLI and
immediately quitting (or rotating with /new, /clear) left an empty
untitled session row behind. These ghost rows pile up in /resume,
`hermes sessions list`, and the in-chat recent-sessions browser.
- SessionDB.delete_session_if_empty(): transactional check-and-delete
that only removes rows with no messages, no title, and no child
sessions (delegate subagent parents are preserved). Also removes
on-disk transcript files via the existing _remove_session_files.
- HermesCLI._discard_session_if_empty(): thin wrapper, wired into the
cli_close shutdown path and the new_session() rotation path.
Skipped when /exit --delete already handles removal.
Unlike the one-shot prune_empty_ghost_sessions migration (TUI-only,
24h-old rows), this prevents new ghost rows from accumulating at the
moment they would be created.
Re-ran the cells that crashed, against the fixed binary (a939c9a):
- mem3000 otui-capped: crashed_after_stream exit 7 @ ~900MB
→ completed exit 0, vmhwm 859MB
- mem3000 otui-uncapped: crashed_after_stream exit 7
→ completed exit 0, vmhwm 834MB
- slope10000 otui-uncapped: died exit 7 at 3,000 msgs (197d499 result)
→ completed ALL 10,000 msgs, exit 0, vmhwm 1.57GB —
under the 2GB cgroup cap, no cap-hit, no crash
- fresh ink baselines for both cells (mem3000 257MB / slope10k 328MB).
No "Failed to create SyntaxStyle" anywhere; the store cap now binds at the
handle-safe ceiling (1000 rows) long before the native 65,534-slot handle
table exhausts. report.html + report-assets regenerated via render.mjs
(results are append-only; pre-fix runs remain as the baseline).
Root cause of the bench-suite crash (every otui mem3000/slope cell died at
~3000 lumpy fixture msgs, exit 7, ~880MB RSS — not a cgroup kill):
- @opentui/core 0.4.0 routes EVERY native object through ONE global handle
registry with 16-bit slot indices (core src/zig/handles.zig: INDEX_BITS=16,
MAX_SLOTS=65535, slot 0 reserved). Measured on this install: exactly 65,534
live handles; the next createSyntaxStyle() fails. destroy() DOES recycle
slots — exhaustion means LIVE objects.
- Every TextBufferRenderable burns THREE slots in its constructor
(TextBufferRenderable.ts:77-80: TextBuffer + TextBufferView + SyntaxStyle),
so the mount-everything transcript hits the wall at ~1,400 store rows
(~16 text renderables/row x 3 ~ 47 handles/row): "Failed to create
SyntaxStyle" (zig.ts:4554) throws out of a Solid mount effect.
- The crash was MASKED: CliRenderer's own uncaughtException handler
(handleError -> console.show()) allocates the console-overlay
OptimizedBuffer — another handle — so the handler itself threw "Failed to
create optimized buffer: WxH" and Node died with exit 7 (fatal error in
the uncaughtException handler), hiding the real error.
Why not share one SyntaxStyle (the obvious 3->2): the per-buffer style is
load-bearing — native setStyledText (text-buffer.zig) registers each chunk's
color by NAME ("chunk{i}") into the buffer's OWN style, and registration is
name-keyed-overwrite (syntax-style.zig putStyle), so a shared style would
cross-corrupt chunk colors between every styled <text>. Pooling is unsound
at our layer in core 0.4.0.
The fix, at the seams that are ours:
- boundary/nativeHandles.ts (ffiSafe.ts sibling): SyntaxStyle.create() on a
full table DEGRADES to a detached style (native handle 0) instead of
throwing — JS-side styleDefs/mergeStyles (what markdown/code chunk colors
actually use) keep working; all native calls on handle 0 are inert no-ops.
- boundary/renderer.ts: guard the process error listeners createCliRenderer
installs so an exception INSIDE the handler can never exit-7-mask the
original error again (logged honestly; original error stays the story).
- logic/store.ts: HERMES_TUI_MAX_MESSAGES clamped to a handle-safe ceiling
(1000 rows ~ 47k handles ~ 72% of the table on the realistic fixture).
The old default of 3000 was unreachable — the TUI crashed at ~1,400 rows,
before the cap ever bound. Renderable-weight-aware capping is #27's
(virtualization) to do properly; until then the degrade shim backstops
pathological rows.
TODO(upstream) — issue-shaped, for the OpenTUI repo:
(a) a global 64k handle table with a 3-slot cost per text renderable is
too small for transcript-style TUIs (61k renderables ~ 3k messages);
(b) native allocation failures throw out of the render loop with no
degrade path;
(c) handleError allocates (console overlay buffer) and so crashes on the
very condition it is reporting, masking the root cause with exit 7.
Also: eslint now ignores ui-opentui/.bench/** (bench `nodes`-cell build
artifact broke the lint gate) and .gitignore covers it.
Gate: npm run check green, 599 tests (595 baseline + 3 degrade-path tests
+ 1 cap-clamp test).
Follow-up to the GitHub-skills fix: the same hardcoded ~/.hermes/.env
pattern existed across other bundled and optional skills. Under the
official Docker setup (HERMES_HOME=/opt/data, subprocess HOME=/opt/data/home)
those paths point at a nonexistent file.
- kanban-video-orchestrator setup.sh.tmpl + docs: resolve via
${HERMES_HOME:-$HOME/.hermes}/.env in check_key()
- telephony.py / canvas_api.py / hyperliquid_client.py: error and
save messages now report the real resolved env path instead of a
hardcoded literal (path resolution itself was already correct)
- godmode SKILL.md: load_dotenv snippet resolves via HERMES_HOME
- watch_github.py + ~20 SKILL.md prose mentions: document the env file
as ${HERMES_HOME:-~/.hermes}/.env so Docker users edit the right file
The GitHub skills' auth-detection fell back to reading GITHUB_TOKEN from a
hardcoded ~/.hermes/.env. In the official Docker layout HERMES_HOME=/opt/data
while tool subprocesses run with HOME=/opt/data/home, so `~/.hermes/.env`
expands to /opt/data/home/.hermes/.env — a path that does not exist — while the
real secrets file is /opt/data/.env. Result: the agent reports GITHUB_TOKEN as
"not set" even though it is present and the dashboard Keys page shows it.
Resolve the file as ${HERMES_HOME:-$HOME/.hermes}/.env (HERMES_HOME is bridged
into tool subprocess env, falling back to ~/.hermes when unset) across all six
auth-detection sites: github-auth (SKILL.md + scripts/gh-env.sh), github-issues,
github-repo-management, github-pr-workflow, github-code-review.
Follow-ups on top of the two salvaged GodsBoy commits, all live-validated
against the real Telegram Bot API:
- _edit_overflow_split finalize fallbacks degrade to _strip_mdv2() clean
text instead of putting raw **markdown** markers on screen (salvaged
from PR #43463 minus its format-first sizing — live probes show
Telegram's 4096 limit counts PARSED text, so MarkdownV2 escape
inflation cannot cause MESSAGE_TOO_LONG and sizing against formatted
wire length only causes premature splits and fragment messages).
- Skip the redundant requires-finalize edit after a got_done edit that
split-and-delivered (salvaged from PR #43463): re-finalizing re-splits
the full text into the adopted continuation and duplicates chunks.
- _send_fallback_final only deletes the stale partial message when the
fallback re-sent the COMPLETE final text. When the prefix dedup sent
only the missing tail, the partial IS the head of the answer; deleting
it left users with only the second half of long responses (live-
reproduced: flood-control during a long stream -> head deleted,
ratio 0.54 of content visible). This is the third bug behind the
'Telegram cut messages' reports and was present on main and both PRs.
The "Persistent Memory" callout said "when memory is full, the agent
consolidates or replaces entries to make room," which reads as if the
store self-compacts automatically. It does not: the `memory` tool
returns an overflow error and the agent does the consolidation in-turn
(the design from #41755). Also note that `replace` is bound by the same
limit — swapping in a longer entry can still overflow — which is the
exact case that confused a user (replace rejected near the cap even
though the math was correct).
Add execution-time coverage that bare `provider="custom"` resolves a literal
providers.custom endpoint (and still falls through when none exists), plus
creation-time coverage that `_resolve_model_override` keeps a resolvable
"custom" and only pins the main provider when it is unresolvable.
A cron job stored with `provider: "custom"` and a matching `providers.custom`
entry in config failed at execution with `auth_unavailable: providers=codex`.
Two layers conspired:
- `_get_named_custom_provider` returned None for bare "custom" *before*
scanning config, so a literal `providers.custom` entry was never matched and
resolution fell through to the global default (codex). Now it scans config
for an entry literally named "custom"; with none it still returns None,
preserving the legacy model.base_url trust path.
- `_resolve_model_override` blindly stripped bare "custom" at job creation and
pinned `model.provider` (e.g. codex). It now keeps "custom" when a configured
custom endpoint resolves, pinning the main provider only when it doesn't.
The previous fix for #23387 changed _launchd_domain() from gui/<uid> to
user/<uid> to support Background/SSH sessions on macOS 26+. However, this
broke Aqua sessions where gui/<uid> is the only working domain and
user/<uid> cannot bootstrap or manage the service.
Now _launchd_domain() probes which domain actually contains the loaded
service:
1. Try gui/<uid> first (Aqua sessions)
2. Fall back to user/<uid> (Background/SSH sessions)
3. Use launchctl managername as heuristic when neither has the service
4. Cache the result for the process lifetime
Regression tests cover all four paths plus caching behavior.
The thinking-signature recovery in agent/conversation_loop.py popped
reasoning_details from messages, then continued to retry. That had two
defects.
First, the strip never reached the wire payload. api_messages is built
once at the start of the turn by shallow-copying every entry in messages
(line 919 area). Each api_messages entry has its own reference to the
same reasoning_details list. When build_api_kwargs runs on every retry
iteration of the inner while-loop, it consumes api_messages, not
messages. Popping reasoning_details from messages left api_messages
untouched, so the retry's request still carried the same thinking
blocks Anthropic had just rejected. The classifier latched
thinking_sig_retry_attempted = True after the first attempt, and the
loop terminated with max_retries_exhausted on the same 400.
Second, the pop mutated the canonical message list. messages is the
same list _persist_session writes to state.db and the session
transcript, so a single recovery permanently wiped every signed
thinking block from the stored conversation. Subsequent turns reloaded
the stripped state, hit the same 400 ('invalid signature' or 'cannot
be modified', see #24107), and the agent stopped responding entirely.
Cascading compaction-ended sessions then chained off the corrupted
parent and the affected chat could not produce a response on any
future turn.
Move the strip onto api_messages, which is the API-call-time list
rebuilt into kwargs on every retry. messages is no longer touched, so
disk I/O stays clean and the recovery actually reaches the wire.
Observed against the native Anthropic Messages API on claude-opus-4-7
and claude-opus-4-8 with the interleaved-thinking-2025-05-14 beta on
hermes-agent 0.12.0 and 0.14.0. PR #24107 narrows the trigger; this
change makes the recovery do what it always claimed to do, and
prevents the destructive aftermath.
Tests cover the api_messages strip in isolation: pop on a shallow copy
does not affect the source, the canonical messages list survives the
strip, idempotency on a duplicate firing path, and a no-op when no
reasoning_details exist on the messages.
Related: #24107, #26959, #17861.
Anthropic returns a 400 when the thinking/redacted_thinking blocks in the
latest assistant message are mutated upstream: 'thinking or redacted_thinking
blocks in the latest assistant message cannot be modified. These blocks must
remain as they were in the original response.'
The classifier's thinking_signature branch only matched on the substring
'signature', so this variant fell through to a non-retryable client error
and hard-aborted the turn -- even though the existing strip-reasoning_details
-and-retry recovery would have healed it.
Broaden the 400 match to also catch 'cannot be modified' / 'must remain as
they were' (still gated on 'thinking'), routing it to the same recovery.
Adds a negative-case test so unrelated 'cannot be modified' 400s are not
swept in.
Defense-in-depth, orthogonal to the root-cause work in #35975 / #17861
(which prevent the block mutation in the first place). Only changes a
terminal-failure into a one-shot recovery.
Signed-off-by: Ian Culling <ian@culling.ca>
The OpenTUI /usage went through the slash-worker subprocess, which
resumes the session WITHOUT a live agent — so it could never show
current-session tokens or costs, and what it did show landed as a
full-screen page.
- slash.exec now answers /usage in-process from the live agent:
per-model rows (requests, tokens in/out, cache, provider-reported
cost when present), session totals/context, a one-line 30-day
summary (SessionDB.usage_totals, real costs only) and a one-line
Nous credits gauge (nous_credits_compact_line, refactored out of
nous_credits_lines). ~8 lines instead of a page.
- Unreported costs render as 'not reported by provider' — never
$0.00 — and the 30d summary omits cost when no session in the
window has a provider-reported figure.
- /usage full keeps the detailed legacy CLI page via the worker.
Cost displays were estimates from a pricing table; on OpenRouter the
status bar never reflected what was actually charged. Now cost is
provider-REPORTED only, end to end:
- OpenRouter requests carry usage:{include:true} (profile + legacy
transport paths); the response usage.cost field (credits, 1:1 USD)
is captured per call into agent.session_actual_cost_usd and
persisted to the sessions DB actual_cost_usd column (NULL-safe:
unreported calls never touch the stored value).
- Nous keeps its x-nous-credits-* header capture; the header delta
now surfaces as the session's real cost via real_session_cost_usd.
- Providers that report nothing accumulate NOTHING: cost fields stay
absent/None (the TUI hides its cost segment), never a fabricated
$0.00 and never an estimate. _get_usage, gateway /usage and the
CLI usage page all switched off estimate_usage_cost for display.
- Per-model session accumulator (session_model_usage) records real
per-call counts and provider-reported cost per model.
* fix(desktop): keep model runtime state per session
(cherry picked from commit f72ee87d99ee38cb7b5badeb9a8af869bb92073a)
* fix(desktop): keep footer model state scoped to active session
(cherry picked from commit d91942ebd4671ff857b5c8526dbf133f04782ecb)
* fix(desktop): restore stored runtime when resuming sessions
(cherry picked from commit 32b3793418257617b8da57e26151f079c2620d00)
* fix(desktop): persist live runtime changes for resume
(cherry picked from commit c58467779436dcef44a80ad55b52664752dc0837)
* fix(desktop): persist resumed endpoint runtime
* chore(attribution): map pinguarmy's commit email in AUTHOR_MAP
The salvaged commits on this branch preserve @pinguarmy's authorship
(郝鹏宇 / peterhao@Peters-MacBook-Air.local). Add the mapping so the
check-attribution CI gate resolves the email to the GitHub username.
---------
Co-authored-by: 郝鹏宇 <peterhao@Peters-MacBook-Air.local>
User feedback: tool/thinking rows did a "v small quick lil jump up and
down" when toggled, worst on the bottom rows.
Root cause (verified live with 10ms tmux capture sampling): the
transcript scrollbox's sticky-bottom re-pin and the scroll anchor fought
AFTER paint. On a toggle near the bottom, the content-height change runs
ScrollBox.recalculateBarProps -> applyStickyStart("bottom") (the user is
at the sticky position, so _hasManualScroll is false), which paints a
fully bottom-pinned frame; the anchor's 4x16ms scrollTo re-asserts then
yanked the viewport back up. The capture burst shows the transient
pinned frame between two anchored ones on every expand — the visible
down-up flick.
Fix at the cause instead of correcting after the effect: suspend
stickyScroll (a runtime get/set property on ScrollBoxRenderable) BEFORE
running the toggle and restore it ~100ms later, once the content height
has settled. With sticky off, the toggle's layout pass leaves scrollTop
untouched — the clicked header's document position is unchanged (content
grows/shrinks below it), so nothing moves and there is nothing left to
flicker; a collapse past the new bottom clamps naturally via the
ScrollBar scrollSize setter. Restoring recomputes the manual-scroll
state from the actual position: still at the bottom -> keeps pinning for
new content; mid-content -> manual-scroll semantics until the user
returns (the same end state the old anchor produced). Rapid re-toggles
inside the window keep the ORIGINAL saved value.
The far-from-bottom anchor guarantee is unchanged (scrollTop is simply
never touched), pinned headlessly in scrollAnchor.test.tsx along with
the suspension sequencing, the clamp-then-re-pin collapse path, and the
double-toggle restore. ffiSafe's tall-diff scroll-cut regression now
drives the negative-y condition explicitly via wheel scrolls (the old
anchor exercised it through the very transient sticky-bottom frames this
fix removes).
Verified live (tmux, real gateway): before — toggling the bottom rows
painted a transient bottom-pinned frame (f141 of a 10ms burst); after —
three toggle bursts produce ONLY the clean before/after states (4
distinct frames in 458 samples), headers hold their row, including the
bottom-most rows.
User feedback: "for all tools, i'd want all their output viewing enabled
to be infinite by default."
Flip envOutputLines (HERMES_TUI_TOOL_OUTPUT_LINES): unset -> Infinity
(was 200); a positive integer RESTORES a cap (e.g. =200); 0 stays
Infinity for back-compat with the old opt-in-unlimited value; garbage ->
Infinity (unrecognized = no cap asked for). The semantic is now "cap
only when the user asked for one".
The store's raw-result preference follows the same rule: envOutputLinesSet
becomes envOutputUnlimited — whenever the cap is unlimited (the default
now) and a gateway tail-capped result_text (omittedNote) arrives with the
always-full raw result on the wire, the raw result wins, since an
uncapped view of a tail would silently miss the head. With an explicit
finite cap the gateway tail + honest omitted note are kept.
Memory safety is unchanged: tool bodies mount only while EXPANDED (rows
default collapsed and free their Yoga nodes on collapse/unmount), and the
rolling HERMES_TUI_MAX_MESSAGES cap bounds the transcript's high-water
mark.
Tests: env.test.ts expectations flipped (unset/garbage -> Infinity, 0
documented as back-compat); tools.test.tsx "flag unset caps at 200"
becomes "unset renders all 250 lines", plus an explicit =50 cap (+note)
test and =200 restored-cap test; the store preference matrix covers
unset/0 (raw wins), =50 (tail+note kept), and no-raw (tail+note, no
crash). Verified live: seq 1 220 expanded renders rows 201-220 with no
"+N more lines" note.
* fix(ci): append filesystem forensics when a per-file pytest run exhausts exit-4 retries
A PR-added test file (tests/test_iron_proxy.py, PR #30179) repeatedly
failed exactly one CI shard with 'ERROR: file or directory not found'
across 4 runs (including a fresh merge SHA on fresh runners), while the
identical slice passes locally against the same merge commit and a
tree-integrity watcher confirms no sibling test mutates the repo. Three
unrelated branches showed the same one-shard signature the same day.
We currently cannot attribute these because the log only carries
pytest's exit-4 line. This adds a forensics block to the captured
output when exit-4 survives the retry loop:
- does the file exist NOW (post-retries)
- parent dir entry count + similarly-named entries
- git status --porcelain dirty-entry count + first 10 entries
Zero behavior change: rc stays 4, retries unchanged, forensics wrapped
in a broad try/except so they can never mask the failure.
Two new tests cover the exhausted-retries and genuinely-missing paths.
* chore: drop the two forensics tests — ship the runner change only
The native TUI prefetches model.options right after session.create (91df32545,
picker instant-open). The handler is network-bound (~3.7s: pricing fetch + Nous
tier check in build_models_payload) and ran on the gateway's main dispatcher
thread, so every fast-path RPC issued in the first seconds after launch —
complete.slash for the '/' dropdown, session.list, config.get — sat unread
behind it. Measured: first '/' dropdown 1718ms at HEAD vs 53ms at 394f45a3d
(pre-prefetch baseline); 52ms after routing model.options onto the existing
RPC thread pool (_LONG_HANDLERS). The /model picker keeps its 29ms cached open.
* feat(profiles): extend create endpoint for full profile-builder (model + MCPs + skills)
Backend foundation for the dashboard profile builder. Extends POST /api/profiles
to accept, in one call, everything a profile needs beyond name/clone:
- mcp_servers[] -> written into the new profile's config.yaml
- keep_skills[] -> replace-semantics: disable every seeded skill not kept
- hub_skills[] -> async install via 'hermes -p <name> skills install <id>'
All applied best-effort AFTER the profile dir exists, so a hiccup in any one
never 500s the create. Model/MCP/keep-skills writes are profile-scoped via the
HERMES_HOME context override (same mechanism as the existing _write_profile_model).
Hub installs go through a subprocess scoped with -p because skills_hub.SKILLS_DIR
is import-time-bound and the runtime override can't redirect it.
Adds two helpers (_write_profile_mcp_servers, _disable_unselected_skills) and a
TestClient test asserting all four paths land in the NEW profile's config and
the hub spawn is scoped to it. Design doc at docs/design/profile-builder.md.
* feat(dashboard): full-featured profile builder page
Adds a dedicated /profiles/new builder that composes everything a profile
needs into one stepped create flow, reusing the existing Models/Skills/MCP
data paths instead of duplicating them:
- Identity name + description
- Model provider+model picker (api.getModelOptions)
- Skills keep-which-built-in/optional (replace semantics, default = full
bundle) + skills-hub search/add (api.getSkills, searchSkillsHub)
- MCPs add HTTP/stdio servers inline
- Review blueprint -> single POST /api/profiles create
Nothing writes until Create; the one call commits model+MCPs+skill selection
and spawns hub-skill installs (reported in the success toast). ProfilesPage
header gets a 'Build' button (full builder) alongside 'Create' (quick modal).
Route is page-only (not in the sidebar nav). Verified with vite build (2258
modules, green).
A patch tool's result is a JSON record whose payload IS the diff. In a verbose
session the gateway redacts + TAIL-caps result_text (_cap_tui_verbose_text),
so the echo arrived under the native diff in two broken shapes: truncated
mid-JSON (unparseable, so the old JSON.parse check failed open), or — for tall
edits — capped PAST the JSON head, which the store's normalizeOutput then
un-escapes into plain lines that duplicate the diff. North star: no raw JSON
in the transcript, ever.
Three layers:
- gateway: when diff_unified ships, result_text drops the in-JSON diff echo
(_result_sans_diff_echo) — small, parseable, carries only the non-diff
signal (success/files_modified/warnings/lsp_diagnostics).
- fileTool diffOutputPlan: anything starting with '{' under a rendered diff is
suppressed regardless of parseability; parseable JSON with real non-diff
signal (error/warning/lsp_diagnostics) renders JUST those as labeled notes;
a non-JSON fragment whose lines echo the rendered diff is suppressed too
(guards older emitters). Plain-text results (lint tails) still render.
Expanding a tall <diff showLineNumbers> pinned to the scrollbox bottom froze
the TUI with ERR_INVALID_ARG_VALUE looping out of CliRenderer.loop every
frame. Root cause: @opentui/core 0.4.0 marshals OptimizedBuffer
fillRect/drawText/setCell* coordinates as u32 in the FFI table while
LineNumberRenderable.renderSelf passes raw screen coordinates — NEGATIVE when
the diff is partially scrolled above the viewport. Bun's FFI silently wraps
negatives (native side bounds-checks them into a no-op); Node's experimental
node:ffi rejects them. bufferDrawBox already uses i32, which is why ordinary
boxes/text scroll fine and only the diff line-background path crashed.
Fix at the seam we own: boundary/ffiSafe.ts patches OptimizedBuffer to clip
fillRect to the non-negative quadrant and skip negative-origin
drawText/setCell*/drawChar before the FFI call (Bun parity). Installed from
boundary/renderer.ts (live) and test/lib/render.ts (headless). TODO(upstream):
widen those FFI params to i32 so this shim can be deleted.
Ports the engine off the second JS runtime onto Node 26.3 (node:ffi) so the
repo ships a single JavaScript runtime: child_process for the gateway, vitest
for tests, an esbuild + Solid build step. Mouse selection copies the rendered
text you highlight, and the clipboard path is crash-proofed (a broken copy
pipe no longer quits the UI). Renames the engine dir ui-tui-opentui-v2/ ->
ui-opentui/ and updates the launcher/installer/Docker references.
Dev demo (not a test): seeds the bench fixture into the store via the resume path
and renders <App> under a real CliRenderer (no gateway) so you can attach over
tmux, scroll, and eyeball the transcript + the rolling-cap truncation notice.
Run: DEMO_TOTAL=240 HERMES_TUI_MAX_MESSAGES=80 bun scripts/demo.tsx
Bench (realistic fat-turn fixture) put numbers on the cap tradeoff: ~0.65 MB/msg,
~20.4 renderables/msg → 3000 ≈ 2 GB steady RSS, the highest cap within a sane TUI
budget (that ceiling only hit by marathon 3000+-msg sessions; typical cost a
fraction). 1500 was too little scrollback. Tunable via HERMES_TUI_MAX_MESSAGES.
Adds a store `dropped` counter (live overflow in capMessages + the resume slice in
commitSnapshot; reset on clearTranscript) and a dim, selectable=false top-of-
transcript notice — '⤒ N earlier messages — scroll-back capped; full transcript on
the dashboard · session <id>' — so display truncation is visible + points to the
deep-history surface. Display-only: never touches the model's gateway-side context.
Replaces the synthetic ~5.5-node/msg pushes with a deterministic generator
(scripts/fixture.ts): lorem-ipsum user turns + fat assistant turns (markdown +
reasoning + 1-15 tool parts with multi-line results) driven through the real
apply()/commitSnapshot paths. mem-bench.tsx pumps it + checks the resume path.
Realistic cost is ~20.4 renderables/msg (3.7x synthetic); informed the cap tune.
commitSnapshot set the full fetched history then trimmed — briefly handing the
whole transcript to <For>. Since Yoga (WASM) layout memory is grow-only, even a
transient over-cap mount permanently ratchets the high-water mark, partly
defeating the cap when resuming a large session (a real one has ~1980 messages).
Slice to MESSAGE_CAP BEFORE the first setState so resume mounts at most the cap.
Dev bench (not a test, not in the gate suite): mounts <App> under the Solid
test renderer, pushes N streamed turns, samples RSS + mounted-renderable count
with Bun.gc. Demonstrates HERMES_TUI_MAX_MESSAGES=400 pins mounted renderables
at ~2218 vs an unbounded climb to ~55k at 10k messages (RSS flat ~350MB vs
1.3GB). Run: bun scripts/mem-bench.tsx (MEM_BENCH_TOTAL/SAMPLE tunable).
Note in the README CLI section that the terminal UI defaults to the native
OpenTUI engine on Linux/macOS with Bun (provisioned by the installer), and
that the legacy Ink engine remains the automatic fallback (Windows, Termux,
no Bun) and can be selected explicitly with HERMES_TUI_ENGINE=ink. Ink is
not removed — it's the kept fallback.
No in-repo config example documents display.tui_engine (the published config
reference lives on the docs site, not the repo), so there was nothing to
annotate there.
Add an opt-in-safe `install_opentui` stage that provisions the native
OpenTUI TUI engine: it resolves/installs Bun (~/.bun/bin/bun) and runs
`bun install` in ui-tui-opentui-v2 so the launcher's _opentui_available()
probe (Bun + node_modules/@opentui) passes and OpenTUI becomes the default.
Strictly best-effort: skipped on Windows/Termux/Android and when the v2
package is absent; any sub-step failure (no network, Bun install fails,
`bun install` fails) logs a warning via log_warn and returns 0. The stage
never `exit`s and never returns non-zero, so it can't abort the install — a
failed/skipped setup simply leaves the user on the kept Ink fallback.
Registered after node-deps in all three drivers: the monolithic main()
flow, the run_stage_body case dispatcher (opentui-engine), and the
emit_manifest staged-installer JSON.
Flip the default engine: with no explicit HERMES_TUI_ENGINE env / display.tui_engine
config, resolve to 'opentui' when this host is genuinely set up for it (Bun resolves +
the v2 package's entry + node_modules present + not Windows/Termux), else 'ink'. An
explicit env/config choice still wins, and 'ink' remains the universal opt-out. Hosts
without the OpenTUI setup are unaffected (stay on Ink), so nothing strands a user.
- _config_tui_engine_early() now returns None (not 'ink') when unset, so the caller
distinguishes 'explicitly ink' from 'unset' and applies the availability-gated default.
- _bun_bin() split: _bun_bin_or_none() is the non-fatal probe; _bun_bin() still exit(1)s
on the explicit launch path. New _opentui_available() gates the default.
- Verified the full resolution matrix (7 cases) + that the platform/availability gates hold.
Replace the ~5 near-identical `as*()` accessors in
view/prompts/promptOverlay.tsx (one per ActivePrompt kind) with one generic
`narrow(kind)` helper that narrows the discriminated union via a typed type
guard (`p is Extract<ActivePrompt, { kind: K }>`) — no `as`. Each <Match>
branch keeps its precise typed payload. Behavior is identical.
Extract the repeated `setTimeout(() => …close…, 0)` overlay/prompt-close
pattern into a single `deferClose(fn)` helper (src/logic/defer.ts) so the
"why deferred" rationale (let the closing keystroke finish dispatching before
the composer remounts/refocuses) lives in one place.
Rewires the 5 close-defer sites: closePager/closeDashboard/closeSwitcher/
closePicker in view/App.tsx and clearSoon in view/prompts/promptOverlay.tsx.
Timing is unchanged (0ms). Other setTimeout uses (quit window, flashHint,
scroll re-anchor, resize debounce, transport) are NOT close-defers and are
left untouched.
Production boundary/logic .ts is clean of the no-unsafe-* family (gateway
payloads are Schema-decoded), so promote it from warn to error. *.tsx is
exempted: @opentui/solid's JSX namespace types every component return as
error/unknown — a framework limitation, not unsafe app code. Test helpers/
mocks (loose fixtures + async signatures) are exempted too. Remaining warns
are no-unnecessary-condition: intentional defensive guards on untrusted
runtime/gateway data that TS's narrowing can't model.
Replace the two ad-hoc as-cast loose readers in src/logic/store.ts with
effect Schema decode-at-boundary. New src/boundary/schema/SessionInfo.ts
defines SessionInfoPatchSchema + CatalogSchema (decodeUnknownOption),
mirroring GatewayEvent.ts. readInfoPatch + setCatalog now decode once and
build the typed patch/Catalog from the result (Option.none → empty
patch / catalog unset, never crashes). Wire field names verified against
tui_gateway/server.py. Removed the now-dead readOptBool helper. Tests
extended for nested-usage vs top-level context fallback, malformed/partial
payloads, and a garbage catalog.
The ring buffer is bounded (2000) but the NDJSON file was append-only and grew
forever. Add size-based rotation mirroring opencode's keep-N model: track bytes
written in-process (seeded from statSync on open, so we avoid a statSync on every
write) and, when the next line would cross LOG_MAX_BYTES (5 MiB), shift
.log -> .log.1 -> ... -> .log.5 (LOG_KEEP=5, oldest dropped) and resume on a
fresh file. Rotation is best-effort and fully try/catch-wrapped: any fs failure
leaves us appending to the existing file rather than crashing logging. Adds a
temp-dir rotation test (seeds >5 MiB to force a rotation on next write).
A caller-supplied `data` with a circular reference or BigInt makes plain
JSON.stringify throw inside the file-write catch, flipping `fileBroken` and
killing ALL file logging for the session. Add `safeStringify` (WeakSet circular
guard, BigInt -> `${n}n`, wrapped to never throw) and use it for entry
serialization, so a bad payload degrades to a placeholder instead of breaking
the sink. Also model LogLevel schema-first via Schema.Literals + inferred type
(matches boundary/schema/GatewayEvent.ts), and add focused safeStringify +
poison-payload tests.
Add a [1/4] format step to scripts/check.sh running
`bunx prettier --check src` (matching how the script invokes the other
tools), renumbering the existing steps to 2-4. Future formatting drift
now fails the gate.
The unused-imports/no-unused-vars warn → error promotion shipped in the
no-non-null-assertion commit (where the eslint rule changes live).
Run `prettier --write src` over ui-tui-opentui-v2 to normalize formatting
to the repo .prettierrc (no semicolons, single quotes, width 120,
arrowParens avoid, trailingComma none). This worktree had pre-existing
prettier-version divergences across 19 files; normalizing is correct.
No behavior changes — formatting only. The gate (type-check → lint →
bun test) stays green.
Promote strictness in ui-tui-opentui-v2:
- eslint: @typescript-eslint/no-non-null-assertion: error (with a test
override block keeping `!` in *.test.ts/tsx fixtures), and promote
unused-imports/no-unused-vars warn → error.
- tsconfig: add noUnusedLocals + noImplicitReturns.
Remove all 24 production `!` non-null assertions by replacing each with
a real guard / default / early-return, preserving rendered behavior:
- gateway/client.ts: read the pending entry once and guard (vs has()+get()!).
- logic/theme.ts: guard the parseHex regex match; `?? 0` on the always-
in-bounds XTERM_6_LEVELS lookups; restructure backgroundLuminance to
branch into a typed tuple and use charAt() for the 3-digit hex expand.
- view/homeHint.tsx: use Solid's <Show>{value => …} callback form to
narrow info().model / info().cwd instead of `!`.
- view/reasoningPart.tsx: guard the regex match before slicing m[0].
- view/statusBar.tsx: read model/cwd/pct into locals + guard; `?? 0` on
the showBar()-guarded context-bar percentage.
- view/toolPart.tsx: guard the single-arg entry; `?? 0` on the
duration-guarded fmtDuration.
Enable type-aware linting in ui-tui-opentui-v2: add projectService +
tsconfigRootDir to the TS files block and switch the preset to
recommendedTypeChecked. Turn ON as ERROR the high-value promise rules
(no-floating-promises, no-misused-promises, await-thenable) and fix the
3 real floating-promise sites in the gateway client (FileSink
write/flush/end are fire-and-forget on a piped child stdin — marked
with explicit `void`).
Defer the cast/unknown family + the noisy type-checked rules to 'warn'
(gate stays green; eslint exits 0 on warnings) for Phase 2, which will
replace the `as`/`unknown` boundary casts with Schema decoding:
no-unsafe-{assignment,member-access,argument,return,call},
no-unnecessary-condition, no-base-to-string, restrict-template-
expressions, no-unnecessary-type-assertion, require-await.
Bun runs the TS entry directly (no build step), so a missing `bun install`
otherwise surfaces as a cryptic '@opentui' resolve crash + blank UI. Fail
loudly with the fix instead. Part of the gateway build/run hardening.
Arm a startup watchdog after spawning the gateway child: if the unsolicited
gateway.ready handshake never arrives within HERMES_TUI_STARTUP_TIMEOUT_MS
(floor 2s, default 20s), emit a gateway.start_timeout event so the store can
surface a failure line + the captured stderr tail instead of a silent blank UI.
Cleared on ready (dispatch), on stop(); re-arms per recovery respawn.
Apply the existing theme selectionBg token to the plain <text> content
renderables (TextBufferRenderable supports selectionBg/selectionFg) so a
selection draws a clean solid bar that PRESERVES the text fg (no selectionFg →
no SGR-inverse fragmenting). Applied to:
- messageLine: the flat settled/user/system message text.
- toolPart: the args value lines + the output body lines.
Limitation: assistant answers rendered via the native <markdown> renderable
(MarkdownRenderable extends Renderable, not TextBufferRenderable, and
MarkdownOptions has no selectionBg/selectionFg) cannot take the themed highlight
— they fall back to the renderer's default selection style.
Subscribe to the renderer's "selection" event (fires once when a free-form
mouse selection completes) and auto-copy the spanned selectable text via the
existing onCopySelection callback. Unlike the Ctrl+C path, this does NOT
clearSelection() — the highlight persists so the user sees what was copied and
Ctrl+C still works. writeClipboard is idempotent so both paths are harmless.
Subagent hardening pass over boundary/logic/view, findings triaged (most were
false positives or app-lifetime-moot). The 3 genuine fixes:
- liveGateway.stop() now clears the pending 16ms coalesce timer before
client.stop() — a queued flush() could otherwise fire batch()/handlers into a
torn-down store after the layer scope releases.
- store.findToolPart now scans only the LIVE (last) assistant turn, not every
message — a tool.complete pairs with a tool.start in the current turn, so this
avoids matching a same-id tool in an older/resumed turn (and is O(parts)).
- store message.complete with text but NO prior start/delta now creates the turn
(complete-only gateways) instead of dropping the final text; still no empty
bubble when there's no text. +2 regression tests.
Triaged as NOT-a-bug / accepted-risk (documented so they're not relitigated):
@opentui/solid useKeyboard DOES auto-cleanup (onCleanup keyHandler.off, index.js:59);
dimensions/scrollAnchor timers are app-lifetime / try-catch-safe; unbounded-growth,
duplicate-dedup, and split-frame are theoretical for a trusted local newline-framed
subprocess; clipboard spawn timeout + atomic active-session write are minor follow-ups.
93 pass.
The Solid render-test harness (src/test/lib/render.ts + effect.ts) was never
committed — a global ~/.gitignore_global `lib/` rule silently excluded it, so the
opentui-v2 test suite wasn't reproducible from a clean checkout (render.test.tsx
imports ./lib/render). Force-add both + add a repo .gitignore negation
(!src/test/lib/). render.ts also carries the withKeymap() wrapper the keymap
migration needs (view tests mount under a KeymapProvider). 91 pass.
@opentui/keymap@0.3.2 was installed but unused (the spec said we'd use it). Wire
it natively: createDefaultOpenTuiKeymap(renderer) + <KeymapProvider> at the render
root, and a useCloseLayer(target,onClose) helper that registers a focus-within
Esc/Ctrl+C → close layer. Migrated the close handling of sessionSwitcher, picker,
approvalPrompt (close-only), confirmPrompt (y/n via confirm/cancel commands), and
pager + agentsDashboard (close via keymap; scroll/select stay raw — not cleanly
focus-gated). Overlays gain a root ref + focus-on-mount so the focus-within layer
activates. q-close re-added to pager/dashboard (footer advertises it).
Composer history/refocus + masked prompt + the Ctrl+C quit machine stay raw by
design (need the in-flight keystroke / careful state). Test harness gains a
withKeymap() wrapper so view tests mount under a provider. 91 pass; live-verified
/sessions + /tools Esc-close and composer focus recovery after.
Large bracketed pastes no longer flood the composer. On paste, if the text is
≥4 lines or >400 chars, insert a compact `[Pasted text #N +M lines]` chip and
hold the real content in a PasteStore; on submit, expand the chip back to the
full text before sending (free-code model). Single-pass String.replace keeps a
pasted block that itself contains a `[Pasted text #k]` literal safe.
The store is created ONCE in main.tsx and passed App→Composer (NOT per-composer)
so it survives the composer remounting on overlay open/close — a per-composer
store would lose a pending paste mid-compose. +6 unit tests (91 pass). Verified
live: paste 10 lines → chip; submit → transcript shows the full expanded code;
composer cleared.
The composer was a fixed height:3. Match free-code/opencode: native textarea
auto-grow via direct minHeight={1} maxHeight={max(6,⌊rows/3⌋)} props (opencode's
prompt sizing) — 1 row when empty, grows with wrapped/multiline content up to ~a
third of the screen, then scrolls internally. maxHeight is a DIRECT reactive prop
(not in style) so the cap tracks terminal resize via useDimensions. 85 pass;
verified live (a long wrapping line grew the box to 3 rows).
Typing while the textarea was unfocused doubled the FIRST letter: the always-active
handler did ta.focus() AND ta.insertText(key.sequence), but the renderer runs the
global useKeyboard handler BEFORE routing the key to the focused renderable — so
after focus() the same keystroke was also delivered to the now-focused textarea,
inserting it twice. Subsequent keys were fine (textarea already focused → block
skipped). Fix: focus() only; let the textarea insert the char it now receives.
Verified live: typing 'x' then 'y' while blurred yields '❯ xy' (no dup). 85 pass.
Design-judge top nit: Ink's bordered-box-around-the-session-info is the single
biggest 'designed home screen vs log output' signal, and the flat left-aligned
version lacked it. Wrap the model/dir/session block + Tools/Skills/MCP sections +
summary in a full border box (theme border token); banner+tagline stay above,
tips below. 85 pass.
Rebuilt the home screen to match hermes --tui: the HERMES-AGENT banner + tagline,
then a session info block (model · Nous Research / dir (branch) / Session: <id>),
then SEPARATE collapsible sections — Available Tools (enabled toolsets each as
'name: tool1, tool2', capped + '(and N more toolsets…)'), Available Skills (N) in
M categories, MCP Servers (N) connected — and a '… /help for commands' summary.
Previously it was one combined '▶ N tools · M skills · K MCP' dropdown that only
listed tools and showed no model/dir/session.
- gateway startup.catalog now returns per-toolset {enabled, tools} (resolved_tools,
session-aware enabled set — mirrors tools.list); py_compile OK.
- store Catalog gains toolset.enabled/tools; new sessionId field + setSessionId,
set on session create/resume (alongside the active-session-file write).
- homeHint takes the store, reads info (model/cwd/branch) + sessionId + catalog.
85 pass; verified live (model·Nous·dir·session + enabled toolsets w/ tools).
The transcript read as one undifferentiated gold blob (user/assistant/tool all
the same color). Adopt the Ink model where color IS the hierarchy, in 3 brightness
tiers:
- USER input → label (gold) — the human's turn stands out.
- ASSISTANT answer → text (bright) — the primary content, brightest.
- TOOL / REASONING → muted (dim) with an ACCENT glyph (⚡/▶/▼ amber) that marks
the block — clearly the secondary 'working area' below the answer.
Spacing: one blank line above every turn (was cramped: user had a blank, the
reply didn't) + the existing gap:1 between parts. Dropped the transcript's extra
marginTop (turns own their spacing now). 85 pass; verified live — gold ask, dim
tool, white answer read as three distinct things.
The launcher (hermes_cli/main.py _print_tui_exit_summary) reads
HERMES_TUI_ACTIVE_SESSION_FILE to print 'Resume this session with…' on exit. The
Ink TUI writes the current session id there on every session change
(useSessionLifecycle.writeActiveSessionFile); the native engine never did, so
after a /session switch the launcher fell back to the INITIAL launch session and
showed resume info for the wrong session (the reported leak).
Now writeActiveSession() writes {session_id} on session.create AND inside
resumeInto (every /session switch), mirroring Ink. Verified live: file shows the
created session, then updates to the switched-to session. 85 pass.
The transcript scrollbox (stickyScroll+stickyStart=bottom) re-pins to the bottom
on any content-height change when the user is at the bottom (@opentui/core
ScrollBox: `if (stickyStart && !_hasManualScroll) applyStickyStart`). So expanding
a tool/thinking block scrolled the clicked header up off-screen. A
ScrollAnchorProvider (transcript owns the scrollbox ref) lets toolPart/reasoningPart
wrap their toggle so scrollTop is held constant across the height change (re-asserted
over a few frames as layout settles) — the clicked header stays put and the
expansion reveals beneath it. 85 pass.
#2 (flicker regression): my item-7 AssistantText wrapped text in
<For each={segmentMarkdown(text)}> — segmentMarkdown returns NEW objects per
delta, so <For> (keyed by reference) DISPOSED and re-created the markdown
renderable on EVERY streamed delta. Each remount re-measured from zero → content
height oscillated → the scrollbar grew/shrank (exactly the reported symptom).
Fix (deep opencode parity): render assistant text as ONE stable native
<markdown> (MarkdownRenderable) fed the growing content in place, with
internalBlockMode="top-level" — opencode's anti-flicker mode where settled
top-level blocks aren't re-rendered per delta (_stableBlockCount, managed
internally). This is opencode's TextPart verbatim (routes/session/index.tsx:1687).
#3 (table inline formatting): the native <markdown tableOptions={{style:grid}}>
renders GFM tables as a grid WITH inline bold/italic/code in cells — so the
hand-rolled segmentMarkdown + MdTable grid are deleted (obsolete). Switched from
<code filetype=markdown> to <markdown> (the former re-measured the whole buffer
each delta and never aligned tables). 85 pass; verified live (smooth stream,
boxed table, concealed **/* markers styled).
Raw useTerminalDimensions fires on every SIGWINCH tick; during a drag that's a
recompute/reflow storm across every width-sensitive component (tool bodies,
tables, status bar, banner). Add a DimensionsProvider that runs the raw hook
ONCE and feeds a single leading+trailing-debounced (40ms) signal — mirroring the
gateway's 16ms event coalescing / opencode's createLeadingTrailingSignal — that
every consumer shares via useDimensions(). They now reflow together (no tearing)
and at most once per window. Falls back to the raw hook outside a provider
(headless tests). Verified: single resizes converge clean (wide banner ⇄ compact
brand at the 102-col threshold); rapid bursts coalesce. 90 pass.
Home screen now shows the canonical HERMES-AGENT block logo (hermes_cli/banner.py,
gold->amber->bronze via primary/accent/border tokens; width-guarded to a compact
brand line under 102 cols) plus a collapsible '▶ N tools · M skills · K MCP' panel
that expands to per-toolset / per-category / per-server detail.
Data comes from a new opt-in gateway RPC 'startup.catalog' (aggregates
get_all_toolsets + banner.get_available_skills + config mcp_servers); the native
engine fetches it best-effort on session start (Effect.catchCause swallows it on
old gateways). Opt-in => Ink path untouched. py_compile OK. Store gains a typed
Catalog + defensive setCatalog mapper. +2 tests (90 pass); verified live
(1185 tools / 196 skills / 2 MCP, expand shows the full lists).
Visual-hierarchy pass (design-reviewed against free-code/opencode):
- header: brand glyph in accent + name in primary/bold + a bottom rule, so it
reads as chrome and bookends the transcript with the status bar's top rule
(fixes 'nothing differentiates the header from the text stream').
- status bar: a dim │ divider segments model·effort from the context meter.
- user/assistant turn glyphs bold + the user ❯ in accent so turns are scannable.
- reasoning 'Thought' label uses label (not warn) so it matches tool headers —
warn is reserved for warnings; reasoning/tool now read as one aside family.
- home screen: brand in primary/bold, command names in accent vs muted descs,
wider column.
- FIX (load-bearing): strip ANSI/SGR escape sequences from slash/notice text
(pushSystem + openPager) — the gateway colors them for Ink, which interprets
them; the native <text> rendered them as literal glyphs. +stripAnsi
+ 3 tests. All tokens themed (no hardcoded colors). 88 pass.
The native <code filetype=markdown> colorizes pipes but never aligns tables.
Add a pure segmenter (segmentMarkdown) that splits assistant text into prose
runs (native renderable) and GFM table blocks, plus an MdTable grid renderer:
per-column widths (free-code's stringWidth+padAligned), :--- / :--: / ---:
alignment, bold header, dim │ separators, a ┼ header rule, width-aware column
shrink on resize. Incomplete tables (no separator yet, e.g. mid-stream) stay
prose until they close. +5 unit tests (85 pass); verified live with a 3-col table.
Blank lines between reasoning/tool/text grew and shrank mid-stream because
spacing was ad-hoc: tools carried marginTop:1, text/reasoning none, and the
markdown text part rendered the model's leading/trailing newlines as transient
blank lines that filled in as deltas arrived.
Now the parts column owns ALL spacing via gap:1 (uniform 1 line between any two
parts regardless of type/order), per-part marginTop is dropped, and text parts
are stripped of leading/trailing blank lines so the gap is the sole source —
no double gaps, no popping. Verified live: Thought/tool/tool/answer all spaced
by exactly one line. (80 pass.)
Reasoning rendered as an always-expanded plain muted blob. Now it's a proper
collapsible part (opencode ReasoningPart): auto-EXPANDED while the turn streams
(watch it think), then collapses to a one-line `▶ Thought: <title>` when settled;
click toggles. Title is the model's leading `**bold**` line (reasoningSummary).
Body renders as DIM markdown in a left-`│`-border block (Markdown gained an
optional `fg` so reasoning is muted vs the answer). +1 render test (80 pass).
Resumed tool calls were flat `⚡name arg` rows — no output, not collapsible —
because the resume snapshot (_history_to_messages) dropped each tool's result.
Now the native engine passes `with_tool_output: true` on session.resume and the
gateway folds the tool's redacted+capped result + args into its row, so resumed
turns show `▶ name arg (N lines)` collapsible blocks identical to a live turn.
The flag is OPT-IN: Ink doesn't pass it, so _history_to_messages stays byte-for-
byte unchanged for the Ink path (its expanded verbose-trail render OOM'd on big
output, #34095; the native engine renders tools collapsed, so the capped tail is
safe there). resume.ts maps context→argsPreview, result_text→resultText (label
peeled + envelope stripped), args→argsText — same shape as the live tool part.
py_compile OK. resume tests updated + 1 added (79 pass).
The gateway already ships per-tool arg metadata the client was discarding:
`context` (build_tool_preview's primary-arg line, always sent), `args` (full
dict on complete), `args_text` (redacted JSON, verbose), `duration_s`. Capture
them on the tool part and render free-code style:
- collapsed header: `▶ name <arg-preview> · <duration> (N lines)` — args are
finally visible without expanding (the core item-2 complaint).
- expanded: a single left-bordered (`│`) column with a key:value args block
(suppressed when the lone arg is already the header preview — judge nit) then
the output block.
- strip the gateway's `[showing verbose tail; omitted N chars]` banner into a
tidy `… omitted N chars` note; unwrap tail-capped `{"output":…}` envelope
fragments so the last line isn't a dangling JSON tail.
Left bar is a border glyph (opencode BlockTool style), not a bg fill — cleaner
and renders faithfully. +4 unit tests, +1 render test (78 pass).
The root box used padding:1 (all edges), reserving a blank row BELOW the
status-bar+composer block. Switch to paddingTop/Left/Right only so the input
hugs the last terminal row. Transcript stays flexGrow:1 minHeight:0; the
bottom block is the flexShrink:0 last child. StatusLine already renders
zero-height when idle, so no other change is needed.
Item 1 — copy/paste:
- boundary/clipboard.ts (ported/trimmed from opencode): writeClipboard = OSC 52
(SSH/tmux-safe) + a native command (pbcopy/wl-copy/xclip/xsel/clip);
readClipboardImage = clipboard PNG via wl-paste/xclip/pngpaste/powershell.
- Ctrl+C copies a live MOUSE SELECTION (renderer.getSelection) before the
interrupt/quit machine runs (opencode's selection-key precedence), with a
"Copied to clipboard" hint; falls through to interrupt/quit when there's no
selection.
- text paste inserts natively (textarea handlePaste); the composer's onPaste only
intercepts an EMPTY bracketed paste (image-only clipboard) → readClipboardImage
→ image.attach_bytes (the next prompt.submit picks it up).
Item 4 — mouse selection now ignores decorative glyphs: selectable={false} on the
message/tool gutter glyphs and all chrome (header, status bar, status line,
composer prompt glyph), so a drag copies the message text, not ❯/⚕/▶/⚡.
Live-smoked (this env has no clipboard tools/DISPLAY, so native copy + image read
can't be confirmed here, but): drag-select + Ctrl+C → "Copied to clipboard" (not
quit); no-selection Ctrl+C still arms quit; bracketed text paste lands in the
composer. 71 pass.
A just-started assistant turn (message.start, no deltas yet) rendered an EMPTY
fallback <text> on the glyph's line plus the `▍` caret on a SEPARATE line below —
so `⚕` sat alone with the caret dangling beneath it, indented. Folded the caret
into the no-parts fallback so it renders inline with the glyph (` ⚕ ▍`); a settled
row still shows its flat text, a turn with parts renders the parts. 71 pass.
Live-smoked: streaming start now shows `⚕ ▍` on one line; the reply text then
aligns with the glyph.
Item 15 — "/agents doesn't let me look into an agent trace live":
- store accumulates a concise per-subagent trace from the subagent.* stream
(▶ start / ⚡ tool — preview / progress text / ✓ summary), capped at 200 lines;
thinking deltas update a transient `thought` (not appended — they'd flood).
- AgentsDashboard is now master-detail: ↑/↓ select a subagent (▸ + accent), and
the bottom pane shows the selected agent's goal · status · model, its latest
thought, and a sticky-bottom (live) trace scrollbox. PgUp/PgDn scroll the trace.
Item 9 — /tools wired to a deliberate navigable overlay (fetch the roster via
slash.exec → pager) instead of incidental fallthrough; /skills already opens the
native picker.
Tests: store trace accumulation + dashboard render (trace line + footer). 71 pass.
Live-smoked: /tools → tool roster pager; /skills → picker; a real delegation
(spawn a subagent → reply PURPLE) → /agents showed the subagent with its goal ·
completed · model, 🧠 PURPLE thought, and ▶/✓ trace lines.
Item 7 — tools were non-collapsible and "ugly-interlaced":
- ToolPart now renders COLLAPSED by default as one line: `▶ name summary (N
lines)` (summary = explicit summary / first output line / error). A ▶/▼ glyph
marks expandable tools; clicking the header toggles a left-bar block of the
full (capped) output. Running tools show `name …`; single-line/erroring tools
render inline. Compact by default → far less interlacing clutter.
- toolOutput.normalizeOutput: un-double-escapes literal \n/\t when they dominate
over real newlines (some gateway tool tails are repr'd, so newlines arrived as
backslash-n and rendered as one ugly line). Conservative — genuine multi-line
output and legit `\n`-in-code are left alone. Applied in stripToolEnvelope.
Item 3 — the input "blue tint": dropped the textarea's blue focusedBackgroundColor
and added a `❯` prompt glyph. The composer is now distinguished by structure (the
glyph + the status-bar rule above it), not a background tint.
Tests: normalizeOutput (dominant-literal vs genuine-multiline). 70 pass.
Live-smoked: `ls -la` tool → collapsed `▶ terminal total 3460 (N lines)`;
SGR-click → `▼` + clean per-line output; composer shows `❯` with no blue tint.
onType used to fire complete.slash only for an argless `/command`, and Tab
replaced the whole line. Now:
- planCompletion(text) (pure, in slash.ts) routes: a `/command [args]` line →
complete.slash (the gateway completes names AND args, e.g. /details section
names); a trailing path-like word (@…, ~/…, ./…, /…, or anything with /) →
complete.path for file/dir tagging; else nothing.
- the accepted item splices ONLY its token: store tracks completionFrom (gateway
replace_from via readReplaceFrom, or the path-token start), and the composer's
Tab handler keeps the text before `from` and appends the candidate.
Tests: planCompletion (slash/path/prose/multiline) + readReplaceFrom. 69 pass.
Live-smoked: `/details ` → section dropdown (hidden/collapsed/.../activity), Tab
→ `/details hidden` (arg-only splice); `tui_gateway/` → its .py files;
`@hermes_cli/m` → m-prefixed files.
New logic/history.ts: createPromptHistory (pure cursor cycling — Up walks older,
Down walks newer back to the stashed draft, push dedupes a consecutive duplicate
+ resets) plus best-effort per-dir JSONL persistence under
$HERMES_HOME/tui-history/<sha1(cwd)>.jsonl (one JSON-encoded prompt per line,
multiline-safe).
Scoping matches the ask: prior prompts from the SAME launch dir are loaded on
start (recallable across relaunches), but a different dir keeps its own list — no
cross-dir/cross-session bleed.
Composer: Up at the first line → older prompt; Down at the last line → newer/draft
(at the boundary the textarea's own up/down is a no-op, so no conflict; mid-buffer
it still moves the cursor). setText + cursor-to-end on recall; any edit resets the
recall cursor. submit() pushes the prompt. Threaded entry → App → Composer; cwd =
process.cwd() (the launch dir under the real launcher).
Tests: 5 pure cursor-cycling cases. Live-smoked: seeded a dir file → Up/Up/Down
cycled two→one→two; a freshly submitted prompt was recalled via Up. 65 pass.
The textarea focuses on mount and when an overlay closes (remount), but focus
could drift to the transcript scrollbox on a mouse-scroll, dropping keystrokes.
Now (opencode's keep-the-prompt-focused idea, adapted):
- onMouseDown → focus the textarea (click-to-focus).
- a global keystroke net: a PRINTABLE, unmodified key while the textarea is
unfocused reclaims focus AND recovers the char (the in-flight event went to
the global handler, not the unfocused textarea, so insert it). Nav/scroll keys
(arrows/page/home/end/…) are deliberately left alone so keyboard transcript
scroll still works; kitty `release` events are skipped to avoid double-insert.
Completion accept/dismiss handler folded into the same useKeyboard with early
returns.
Live-smoked: type → text lands; `/` → completions; Esc → dismiss; type again →
lands; clean quit. 60 pass.
Item 11 — "stopping the agent doesn't work". Ctrl+C used to immediately destroy
the renderer. Now a turn-aware state machine (opencode's double-press model, the
user's preferred behaviour):
- While a turn runs (store.info.running): first Ctrl+C → session.interrupt
{session_id} (STOP the agent), and arms a 3s quit window with a warn hint
"⏹ stopped — Ctrl+C again to quit".
- Idle: first Ctrl+C arms the window ("Ctrl+C again to quit"); a stray single
press never nukes the session.
- A second Ctrl+C within the window KILLS the TUI (renderer.destroy → clean
scope teardown → gateway child EOF).
- A blocking prompt still owns Ctrl+C (deny/cancel) — unchanged.
Wiring: renderer.ts gains an `onCtrlC` hook (owns Ctrl+C when not blocked);
entry builds the machine (gateway yielded before the renderer so it can read
`running` + send interrupt). store gains a transient `hint` slice; StatusLine
shows hint (warn, priority) or the busy face (dim).
Live-smoked: long turn → Ctrl+C shows "stopped" + idle dot; second press exits
cleanly with no orphaned gateway child (the user's installed-venv sessions
untouched). 60 pass.
Item 14: a persistent bottom-chrome status bar, ported from Ink's appChrome
StatusRule. Sourced from the session.info event (model / reasoning_effort /
fast / cwd / branch / running / usage.context_*) which was decoded but dropped
until now; also folded session.create/resume result.info and message.complete
usage into a new store `info` slice.
- store: SessionInfo slice + applyInfo(); session.info handler; message.start/
complete flip `running` (the flag the Ctrl-C interrupt will read); refresh
usage on complete.
- schema: MessageComplete.payload gains loose `usage` so it survives decode.
- view/statusBar.tsx: width-aware (Ink progressive disclosure) — context bar
drops on narrow terminals, cwd compacts to last two segments + left-truncates
so the row never wraps. Turn/connection dot ◐/●/○.
- App: status bar sits ABOVE the composer; a top-edge rule (border:['top'])
visually separates the status bar + textbox input region from the transcript.
- tests: store info slice (3) + headless status-bar render (1); bumped the
approval-prompt capture height for the taller input region. 59 pass.
Live-smoked: bar shows model·effort·context%·dir; context updates 0→4% across a
turn; running dot flips; separator divides input region from transcript.
Live-usage issue 3/5: the kaomoji faces ("(¬_¬) processing…") lingered in the
transcript. Traced (instrumented capture): they arrive via `thinking.delta` —
Hermes's transient kaomoji busy *indicator* (_INDICATOR_DEFAULT=kaomoji), which I
was rendering as a persistent reasoning part.
- store: new transient `status` field. thinking.delta / status.update → `status`
(not a part); message.start + message.complete clear it. Only the real
`reasoning.delta` still becomes a (dim) transcript part.
- view/statusLine.tsx: a dim busy line above the composer shown while `status` is
set (Ink's FaceTicker analog), rendering nothing when idle; wired into App
between the transcript and the input zone.
Verified: bun run check green (55 tests / 7 files) — store tests assert
thinking.delta → status (no transcript part) + cleared on complete; status.update
→ status. Live tmux: a turn showed "٩(๑❛ᴗ❛๑)۶ cogitating…" on the transient status
line (cleared on completion) with NO face left in the transcript.
From live-usage feedback (driving the real TUI):
- Mouse ON by default (opencode parity; HERMES_TUI_MOUSE=0 opts out). Was hardcoded
off, which is why transcript wheel-scroll, scrollbar drag, and click-to-expand
tools didn't work and the terminal's native region-select polluted copy. With
useMouse the scrollbox handles the wheel + scrollbar and tools are click-expandable;
selection becomes OpenTUI's text-aware select. (Mouse can't be driven via tmux
send-keys — verify wheel/drag/click interactively.)
- Streaming markdown: match opencode's v2 text path —
<code filetype="markdown" streaming drawUnstyledText={false}>. The previous
drawUnstyledText:true drew raw text then overlaid styling each delta (a flash);
false avoids that and re-tokenizes incrementally for smoother streaming. (The
native renderable's tree-sitter doesn't settle in the headless test renderer with
drawUnstyledText:false, so the two markdown frame tests now assert the assistant
text via the store — paint is verified in the live smoke; render.ts also settles
to waitForVisualIdle.)
Verified: bun run check green (53 tests / 7 files). Live mouse + streaming
smoothness for glitch to confirm. Part of the live-feedback polish goal.
Repoint hermes_cli/main.py `_make_opentui_argv` from the superseded React entry
to the v4 Solid + Effect-at-boundary entry: it now prefers
`ui-tui-opentui-v2/src/entry/main.tsx` (cwd ui-tui-opentui-v2) and falls back to
`ui-tui-opentui/src/entry.real.tsx` only if the v2 package is absent (graceful
during coexistence). The engine gate (_resolve_tui_engine: HERMES_TUI_ENGINE /
display.tui_engine → opentui; Windows/Termux → Ink fallback) and the dual-engine
dispatch in _make_tui_argv are unchanged; Ink (ui-tui/) is untouched. The spawned
tui_gateway's source-root default lands on PROJECT_ROOT (package at
<root>/ui-tui-opentui-v2), so it loads Python from the same checkout, no extra env.
So `HERMES_TUI_ENGINE=opentui hermes --tui` now launches the v4 engine — the exact
`bun …/v2/src/entry/main.tsx` invocation live-smoked across P1–P5e, making every
first-class surface reachable from the real CLI.
Also: a consolidated 3-way acceptance summary (Ink ↔ opencode ↔ build) at the top
of opentui-feature-map.md covering all 7 first-class surfaces + the foundation +
the launcher, each ✅ + tested + smoked.
Verified: py_compile main.py OK (dev-skill rule for the 4k-line file); imported
the worktree CLI with HERMES_TUI_ENGINE=opentui → _resolve_tui_engine()='opentui',
_make_opentui_argv() → [bun, …/ui-tui-opentui-v2/src/entry/main.tsx] (cwd
ui-tui-opentui-v2, --watch in dev). v2 `bun run check` green (53 tests / 7 files).
Smoke P8 + matrix updated. Remaining: header chrome detail (5b), agent-feature
trail (5d), distribution (§10) — polish, not first-class blockers.
The agents dashboard (spec §2b; Ink agentsOverlay) — the last first-class
interactive surface. Subagent delegations are tracked from the `subagent.*`
event stream and shown in a full-height overlay.
- store: subagents[] built from subagent.{spawn_requested,start,thinking,tool,
progress,complete} by subagent_id (status·goal·model·depth·lastTool·summary);
clearTranscript clears them. dashboard flag + openDashboard/closeDashboard.
- view/overlays/agentsDashboard.tsx: full-height overlay (replaces transcript+
composer), depth-indented subagent rows colored by status, scroll via
scrollBy/scrollTo, Esc/q close. Empty state prompts to delegate.
- view/App.tsx: content zone is now a <Switch> — pager / agents dashboard /
(transcript + input zone).
- logic/slash.ts: /agents, /tasks → openDashboard (SlashContext.openDashboard).
Verified: bun run check green (53 tests / 7 files) — subagent reducer + a
dashboard frame test (seeded tree renders, transcript replaced) + /agents
dispatch. LIVE tmux: /agents opened empty; then a REAL delegation spawned a
subagent → /agents showed "⛓ Agents · 1 subagent · ● completed <goal>
(model) ⚡terminal". ALL 7 first-class surfaces are now ✅+tested+smoked
(blocking prompts, pager, session switcher, model picker, skills hub,
completions, agents dashboard). Smoke P5e + matrix updated. Remaining: chrome
(5b), agent-feature polish (5d), launcher (8).
A live slash-completion dropdown renders above the composer as you type `/…`
(spec §1 autocomplete) — the 6th and final first-class overlay surface.
- view/composer.tsx: onContentChange → onType (reads ta.plainText); a dropdown
of candidates (display + meta) renders above the textarea when completions are
set. The textarea owns key input (live refine-by-typing), so Tab accepts the
top match (ta.clear()+insertText) and Esc dismisses; arrow-nav would fight the
cursor (noted polish).
- store: completions state + setCompletions/clearCompletions; CompletionItem.
- logic/slash.ts: mapCompletions(complete.slash result) → candidates.
- entry: onType queries complete.slash for `/word` (no space) and sets/clears the
store completions; cleared on submit / non-slash / space.
Verified: bun run check green (49 tests / 7 files) — mapCompletions + a
composer-dropdown frame test. LIVE tmux: typing `/comp` showed /compress,
/composio, /compact (with descriptions); Tab accepted the top + cleared the
dropdown. ALL 6 first-class overlays are now ✅+tested+smoked (blocking prompts,
pager, session switcher, model picker, skills hub, completions). Smoke P5a +
matrix updated. Remaining: chrome (5b), agent features (5d), agents dashboard (5e).
A full-height scrollable pager (the FloatBox analog) — porting it unlocks the
long-output slash commands (/status /logs /history /tools) at once (spec §2b).
- view/overlays/pager.tsx: bordered full-height overlay (title + scrollbox +
footer), scrolling driven explicitly via useKeyboard → scrollBy/scrollTo (no
reliance on scrollbox auto-focus), Esc/q/Ctrl+C close. §8 #2 scrollbox gotchas.
- store: pager state + openPager/closePager.
- view/App.tsx: content zone swaps to the Pager (replacing transcript+composer)
when store.state.pager is set; the close is deferred a tick so the closing key
can't leak into the remounting composer.
- logic/slash.ts: present() routes output to the pager when long (>180 chars or
>2 non-empty lines, Ink parity) else a system line; titled by command; /logs
always pages. New openPager on SlashContext.
Verified: bun run check green (41 tests / 7 files) — present() routing
(short→system, long→pager) + a pager frame test (renders title/content, replaces
the transcript/composer). LIVE tmux: /logs → pager (title "Logs", scroll via
PageDown, Esc closed → composer refocused, no key-leak); /version (5-line output)
→ pager titled "Version". Smoke P5a + parity matrix updated. Completions dropdown
+ pickers + chrome are the next slices.
HERMES_TUI_RESUME=<id|recent> resumes a session instead of creating one:
session.most_recent (for "recent") → session.resume {cols, session_id} →
commitSnapshot(mapResumeHistory(messages)), buffering live events across the RPC.
- logic/resume.ts: maps the session.resume history into Message[]. Resumed tool
rows arrive as {role:'tool', name, context} (NO text — gotcha §8 #5); they're
FOLDED into the preceding assistant turn's ordered parts (state:'complete',
summary=context) so a resumed transcript renders the tools INLINE like a live
one. Assistant text gets a text part (renders via native markdown). User/system
stay flat. Unknown roles / non-arrays are ignored.
- logic/store.ts: hydrate split into beginBuffer() + commitSnapshot() so the live
event buffer spans the async resume RPC (events that arrive during resume are
replayed after the snapshot, in order).
- entry/main.tsx: bootstrap branches create vs resume; the resume path is timed
(rpc_ms / hydrate_ms) for profiling.
Verified: bun run check green (40 tests / 7 files) — resume mapper (fold tool
rows, standalone holder, ignore junk) + beginBuffer/commitSnapshot replay. LIVE
tmux: Launch A created a session with a ⚡terminal tool call; Launch B
(HERMES_TUI_RESUME=recent) hydrated user + assistant + the tool row inline.
STRESS+PROFILE on a real 103-message session (~/.hermes/sessions): client hydrate
= 76ms, bun RSS = 214MB STABLE (no leak), tool rows hydrated, PageUp scroll works;
the 1.6s cost is the server-side session.resume RPC, not the TUI. Smoke P4 +
matrix updated. Note: rows instantiate for the full history (scrollbox culls
render only) → RSS ~linear in turns; list virtualization is the lever if
multi-thousand-turn sessions become a target.
The composer now routes `/command` through the Ink-parity dispatch ladder
instead of submitting it as a prompt (spec §1):
- logic/slash.ts: parseSlash + dispatchSlash — client-local command →
slash.exec {command, session_id} (output → system line) → on reject
command.dispatch {arg, name, session_id} with typed handling
(exec/plugin→system · alias→re-dispatch · skill/send→submit a turn ·
prefill→notice). 6 client commands: help/quit/exit/clear/new/logs.
- /help renders the live `commands.catalog` (reads the `pairs` shape).
- view/prompts/confirmPrompt.tsx + store.setConfirm: a LOCAL (non-gateway) Y/N
dialog for /clear and /new; store gains pushSystem + clearTranscript.
- entry: a Promise-returning `request` adapter + the SlashContext wiring (quit →
renderer.destroy, confirm, clearTranscript, logTail, submit).
Also fixes a keystroke-leak: the key that ANSWERED a prompt was bleeding into the
freshly-refocused composer (`/clear`→y left "y" in the input, breaking the next
`/quit`). PromptOverlay now defers the prompt-clear (composer remount) past the
current keystroke — this hardens every Phase 3 prompt too.
Verified: bun run check green (36 tests / 6 files) — slash.test covers parse + the
full ladder against a fake context. LIVE tmux: /help → full gateway catalog;
/version → slash.exec output; /clear → confirm → cleared, no key-leak (typed "hi"
not "yhi"); /quit → clean quit, child reaped. Remaining TUI-only commands,
completions, pager routing, and session resume are 4b/4c. Smoke P4 + matrix updated.
The 4 gateway *.request events now drive a blocking-prompt overlay instead of
deadlocking the agent (spec §8 #6). Native OpenTUI paradigm (per glitch's steer):
- view/prompts/approvalPrompt.tsx: native <select> (once/session/always/deny)
→ approval.respond {choice, session_id}.
- view/prompts/clarifyPrompt.tsx: native <select> over choices + an "✎ Other…"
option that swaps to a native <input> for free-text → clarify.respond
{answer, request_id}.
- view/prompts/maskedPrompt.tsx: sudo (🔐) / secret (🔑) — native <input> has no
mask, so we own a buffer via useKeyboard and render '*' per char →
sudo/secret.respond {password|value, request_id}.
- view/prompts/promptOverlay.tsx: dispatches by prompt kind, binds each
answer/cancel to the matching *.respond; Esc/Ctrl+C → deny/empty so the agent
always unblocks.
Wiring: store gains ActivePrompt state + the 4 reducer cases + clearPrompt;
App swaps Composer↔PromptOverlay on store.state.prompt (so the composer textarea
stops capturing keys while blocked); renderer.ts gates the global Ctrl+C-quit on
isBlocked() so a prompt owns Ctrl+C (→ cancel); entry adds a generic `respond`
runFork callback + passes sessionId.
Verified: bun run check green (28 tests / 5 files) — reducer set/clear for all 4,
+ a frame test (approval overlay renders the command + all options as a bordered
modal, composer hidden while blocked). LIVE tmux: a real `rm -rf` approval fired;
Approve-once → command ran → unblocked; Esc → deny → "BLOCKED by user" →
unblocked; Ctrl+C-while-blocked cancelled WITHOUT quitting; Ctrl+C-unblocked quit
clean, no orphan. Smoke P3 + parity matrix updated. confirm (local) → Phase 4.
Assistant text parts now render through the NATIVE markdown renderable instead of
plain spans — bold/headings/lists/fences render, raw `**`/backtick markup is
concealed (spec §7; never hand-roll a parser).
- view/markdown.tsx: `<code filetype="markdown" streaming conceal drawUnstyledText>`
(CodeRenderable — opencode's v2 AssistantText path; `<markdown>` +
internalBlockMode="top-level" deferred paint headlessly). SyntaxStyle.fromStyles
is derived from the theme (markup.* → theme.color.*, non-hex colors guarded) and
cached by theme-object identity so all text parts share one instance, rebuilt
only on skin change. drawUnstyledText paints raw text immediately while
Tree-sitter highlighting settles (and makes it headless-capturable).
- view/messageLine.tsx: text-part Match renders <Markdown> instead of <text>.
- test/lib/render.ts: settle async markdown via flush(); captureFrame gains an
`until` option (waitForFrame) for content that paints after the first pass.
Verified: bun run check green (23 tests / 5 files). Live tmux: a markdown reply
(heading + bold word + 2-item list) rendered with `**` concealed (grep -c '**' = 0);
Ctrl+C clean, no orphan. Phase 2 complete (2a shell + 2b-i parts/tools + 2b-ii
markdown) — smoke steps 1–4 run live. Next: Phase 3 blocking prompts.
An assistant turn is now ONE ordered parts[] (text/reasoning/tool) instead of a
flat string, so tool calls render INLINE between text blocks rather than dumped
as separate rows below (spec §7 — the "dump-below" bug opencode's sync-v2 avoids).
- logic/store.ts: Part discriminated union + reducer rework. message.delta
appends to the open text part (or opens one); tool.start pushes a running tool
part; tool.complete matches by tool_id and updates that part IN PLACE (state,
envelope-stripped resultText, summary, error, lineCount); reasoning.delta
accumulates a reasoning part. User/system rows stay flat text; settled/resumed
assistant rows fall back to text.
- logic/toolOutput.ts: ported pure helpers — stripToolEnvelope (unwrap
{output,exit_code}, append [exit N]/[error] suffix) + collapseToolOutput +
truncate.
- view/messageLine.tsx: <For>+<Switch> dispatch by part.type with stable id keys.
- view/toolPart.tsx: two-tier render — inline one-liner (≤1 output line) or a
capped left-bar block (TOOL_MAX_LINES, "… +N more", click-to-expand) keyed off
the theme; reactive width via useTerminalDimensions.
Verified: bun run check green (23 tests / 5 files / 64 expects) — store
interleave/in-place/reasoning, a frame test asserting the tool renders inline +
envelope stripped, and toolOutput unit tests. Live tmux: a terminal-tool prompt
rendered "⚡ terminal" with its alpha/beta output inline between the assistant's
text parts; Ctrl+C clean, no orphan. Smoke P2b + parity matrix updated. Native
<markdown> for text parts is the next slice (2b-ii).
Turns the read-only Phase-1 view into an interactive shell, split into focused
view components (spec v4 §2 layout):
- view/transcript.tsx: ONE full-height <scrollbox> with a reactive <For>
(opencode's no-scrollback model). Applies the §8 #2 gotchas exactly:
minHeight:0 on the wrapper AND the scrollbox, NO flexDirection on the
scrollbox root, stickyScroll + stickyStart="bottom".
- view/composer.tsx: a native <textarea> captured by ref — flexShrink:0,
focus-on-mount, Enter->submit via keyBindings, imperative .clear() on submit,
and a `submitting` re-entrancy guard. Wired by the entry to fire prompt.submit
(Effect.runFork on the in-hand service value); it's now the PRIMARY input, with
the HERMES_TUI_PROMPT stand-in kept only for launch-with-prompt.
- view/header.tsx + view/messageLine.tsx: extracted, themed (no hardcoded
styles). MessageLine stays flat-text this slice; ordered parts (§7) land in 2b.
test/lib/render.ts now flushes 3 renderOnce passes before capture — a <scrollbox>
needs more than one pass to measure content + apply sticky, else the transcript
row paints blank.
Verified: bun run check green (12 tests / 4 files / 31 expects). Live tmux drive:
typed into the composer -> cleared -> user row -> streamed reply ("Here are three
words"); Ctrl+C quits cleanly even with the textarea focused, no orphan child.
Composer placeholder rendered the live skin's welcome string (skin->theme live).
Smoke P2a + parity matrix updated. Phase 2b (ordered parts/tool render/markdown)
is the next slice.
GatewayService/liveGateway over the real Python tui_gateway: JSON-RPC stdio
framing (Bun.spawn), 16ms event coalescing flushed inside Solid batch(), typed
GatewayError, and a decode-once GatewayEvent Schema (~35-member tagged union;
unknown/malformed events skip via Option.none, never crash the stream).
The Solid sync-v2-style store grows to: streaming text concat (prefer
payload.text), gateway.ready{skin}/skin.changed -> fromSkin reactive re-theme,
LRU id-dedup, and hydrate-while-buffering (resume scaffold). Theming is a 1:1
port of Ink's theme.ts (DARK/LIGHT, detectLightMode, ANSI-256 normalization,
fromSkin) behind a Solid ThemeProvider so existing skins work unchanged and the
view carries NO hardcoded styles. A console-safe diagnostics log (in-memory
ring + NDJSON file) is the single logging path.
Entry gains a live launch path (default; HERMES_TUI_FAKE=1 -> scripted hello)
with an initial-prompt bootstrap (session.create -> prompt.submit) as the
Phase-2-composer stand-in, plus a minimal Ctrl+C graceful quit
(renderer.destroy -> shutdown Deferred -> scope finalizers -> client.stop) so
the engine reaps its own gateway child instead of orphaning it.
Verified: bun run check green (tsc + eslint + 12 tests / 4 files); live tmux
drive connect -> gateway.ready -> prompt -> streamed reply ("pong") -> clean
teardown with no orphan bun/python. Parity matrix + smoke P1 run log updated.
hermes --tui launches the native OpenTUI engine (Bun) when
HERMES_TUI_ENGINE=opentui (env) or display.tui_engine=opentui (config);
Ink stays the default and the shipping path is untouched.
- _resolve_tui_engine() (env > config > ink); refuses opentui on
Windows/Termux (no Bun) -> falls back to ink with a notice.
- _make_opentui_argv() -> [bun, src/entry.real.tsx] (no build step).
- _bun_bin() with HERMES_BUN override.
- Branch at top of _make_tui_argv BEFORE _ensure_tui_node (Bun-only host
must not bootstrap Node).
- Gate _launch_tui NODE_OPTIONS/--max-old-space-size on engine==ink (Bun
is JSC; the V8 flag errors/ignores).
Verified end-to-end via tmux: real hermes --tui -> Bun -> OpenTUI ->
real Python gateway streamed a real reply. No-flag default still ink.
2026-06-08 11:11:54 +00:00
726 changed files with 276244 additions and 3476 deletions
@@ -107,6 +107,8 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
> **TUI engine:** On supported hosts (Linux/macOS with Node 26.3+), the terminal UI defaults to the native **OpenTUI** engine, which the installer provisions for you. The legacy **Ink** engine remains the fallback — it's used automatically on Windows, Termux, or when the native engine can't run, and you can select it explicitly with `HERMES_TUI_ENGINE=ink hermes`. Ink is not going away; it's the kept fallback.
@@ -1172,7 +1205,7 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
{active_task}
## Blocked
@@ -1184,13 +1217,13 @@ None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{HISTORICAL_PENDING_ASKS_HEADING}
{active_task}
## Relevant Files
{_bullets(relevant_files,limit=12)}
## Remaining Work
{HISTORICAL_REMAINING_WORK_HEADING}
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
@@ -1312,7 +1345,7 @@ Summary generation was unavailable, so this is a best-effort deterministic fallb
_temporal_anchoring_rule=""
# Shared structured template (used by both paths).
_template_sections=f"""## Active Task
_template_sections=f"""{HISTORICAL_TASK_HEADING}
[THE SINGLE MOST IMPORTANT FIELD. Capture the user's most recent unfulfilled
input verbatim — the exact words they used. This includes:
- Explicit task assignments ("refactor the auth module")
@@ -1359,7 +1392,7 @@ Be specific with file paths, commands, line numbers, and results.]
- Any running processes or servers
- Environment details that matter]
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
[Work currently underway — what was being done when compaction fired]
## Blocked
@@ -1371,14 +1404,14 @@ Be specific with file paths, commands, line numbers, and results.]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so it is not repeated]
## Pending User Asks
[Questions or requests from the user that have NOT yet been answered or fulfilled. If none, write "None."]
{HISTORICAL_PENDING_ASKS_HEADING}
[Questions or requests from the user that have NOT yet been answered or fulfilled. These are STALE — they were from the compacted turns. Write them here for reference only. The agent must NOT act on them unless the latest user message explicitly requests it. If none, write "None."]
## Relevant Files
[Files read, modified, or created — with brief note on each]
## Remaining Work
[What remains to be done — framed as context, not instructions]
{HISTORICAL_REMAINING_WORK_HEADING}
[What remains to be done — framed as STALE context for reference only. The agent must NOT resume this work unless the latest user message explicitly asks for it.]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation. NEVER include API keys, tokens, passwords, or credentials — write [REDACTED] instead.]
@@ -1753,7 +1786,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
Context compressor bug (#10896): ``_align_boundary_backward`` can pull
``cut_idx`` past a user message when it tries to keep tool_call/result
groups together. If the last user message ends up in the *compressed*
middle region the LLM summariser writes it into "Pending User Asks",
middle region the LLM summariser writes it into "Historical Pending User Asks",
but ``SUMMARY_PREFIX`` tells the next model to respond only to user
messages *after* the summary — so the task effectively disappears from
the active context, causing the agent to stall, repeat completed work,
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.