Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.
- drop unused TUI helpers, test-only layout scaffolding, and stale public debug exports
- remove an unused profiler import and trim test-only coverage for deleted helpers
- gateway handler: turnController always archives in recordMessageComplete,
so the post-complete archiveTodosAtTurnEnd().forEach is dead code. Drop
it and the now-unused import.
- turnController: collapse archive prepend into a single spread expression.
- gateway server: one-line comment for the tool.start todo skip.
Two bugs surfaced together while the model fired the todo tool:
1. Count flickered (e.g. 3 → 1 → 3) because tool.start echoed
args.todos as the live state. With merge=true (or any partial
replacement) args.todos is just the items being updated, not the
full list. Drop the early echo — tool.complete already carries the
canonical full list from the tool result.
2. After turn end the panel jumped from under the user prompt to below
thinking/tools because archiveDoneTodos() was pushed AFTER segments
in finalMessages. Prepend the archive trail msg so it sits right
after the user prompt — same visual slot the live panel occupied
during streaming.
Keep history metadata consistent with lineage replay, globally order replayed lineage messages, and make Ink cache eviction report post-eviction sizes. Also keys TUI config cache by path to avoid cross-home test leakage.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
hot strings stay resident under long sessions; was insertion-order
FIFO before
- virtualHeights: include todos, panel sections, and intro version in
messageHeightKey so height-cache reuse correctly invalidates when
todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2
collapsed) instead of the generic ~1-line fallback, so initial
virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
scrollFastPathStats; it was always 0 (the "hint missing" branch is
outside the diagnostics block)
- perfPane / hermes-ink.d.ts: follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
attribution that doesn't reference a developer-local checkout
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:
- memoryMonitor: half-prune on 'high', full drop on 'critical', before
the heap dump / auto-restart path. Gives long sessions a shot at
recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
with a half-warm pool and the prior session can resume cheaply.
Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:
Output.get() per-frame walk: 24% total
└─ sliceAnsi(line, from, to) per write: 18% total
stringWidth(line) chain (cached + JS): 14% total
All three were re-doing identical work every frame: same string → same
clipped slice → same width.
Fixes:
1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path
skips the cache (inline scan beats Map.get for short ASCII, the >90%
case). String.charCodeAt scan up to 64 chars is cheaper than the regex
fallback.
2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
is pure and the same content reflows identically every frame.
3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
end-defined hot path used by Output.get().
4. Skip the slice entirely in Output.get() when the line already fits the
clip box (startsBefore=false && endsAfter=false). Most transcript lines
never exceed their container width, and tokenizing them just to slice
(line, 0, width) was pure overhead. This single fast-path drops
sliceAnsi from 18% → ~0% in the profile.
Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.
Validation, real-user CPU profile, page-up scroll on 11k-line session:
Output.get() self-time: 24% → 0.3%
sliceAnsi total: 18% → not in top 25
stringWidth family: 14% → ~3%
idle: 60.7% → 77.3%
Frame timings (synthetic page-up profile harness):
dur p95: ~10ms → 4.87ms
dur p99: 25ms+ → 12.80ms
yoga p99: ~20ms → 1.87ms
The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
trackFrame is undefined when SHOW_FPS is off so ink's onFrame
short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
Usage:
HERMES_TUI_FPS=1 hermes --tui
# bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.
Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx. All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, the
trackpad-burst signature → all theirs. This file is a mechanical
port into our module structure.
What it does:
precision click (>500ms gap) 1 row/event (deliberate scan)
sustained mouse (40-200ms) 2-6 rows (decay curve)
detected wheel bounce ramps to 15 (sticky wheel-mode)
trackpad flick (5+ <5ms) 1 row/event (burst detect)
direction reversal reset to base
Two implementation paths:
* native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
window-ramp + optional wheel-mode curve triggered by detected
encoder bounce. SGR proportional reporting handled via the
burst-count guard.
* xterm.js (VS Code / Cursor / browser terminals) — pure
exponential-decay curve with fractional carry. Events arrive
1-per-notch with no pre-amplification, so the curve is more
aggressive.
Selected at construction via isXtermJs() from @hermes/ink (now
exported). Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).
13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.
Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks. Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.
Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.
Result: definitively NO. Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height. Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.
Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):
metric fullscreen inline delta
patches_total 28,864 1,111,574 +3751%
writeBytes_total 42 KB 1.6 MB +3881%
fps_throughput 15.8 fps 1.75 fps -89%
frames 179 18 -90%
gap_p50_ms 17 (~60fps) 726 (~1fps) +4170%
yoga_p99 34 ms 405 ms +1083%
renderer_p99 14 ms 169 ms +1062%
flickers 0 5 offscreen —
This is actually the cleanest data we've gotten so far:
* AlternateScreen is LOAD-BEARING for perf — its viewport height
constraint is what lets useVirtualHistory's culling work. No
constraint → ScrollBox grows unbounded → every fiber mounts.
* The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
frames. Our terminal-write hypothesis from last session was
wrong: the bottleneck is React + Yoga, not the wire.
* Doing proper inline mode (non-virtualized transcript in
scrollback, composer pinned below) is not a flag flip — it's a
different UI architecture. Leaving this flag in so anyone
re-running the experiment gets the same numbers, but not
building the architecture until we're sure the perf win is
worth the UX loss (it probably isn't — the fullscreen + virt
path is the one we should optimize, not replace).
Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
Adds four fields to FrameEvent.phases and the matching profile
summary:
optimizedPatches post-optimize patch count (what's actually
written to stdout; the .patches field is
pre-optimize)
writeBytes UTF-8 byte count of the write this frame
backpressure true when Node's stdout.write returned false
(Writable buffer full — outer terminal can't
keep up)
prevFrameDrainMs end-to-end drain time of the PREVIOUS frame's
write, captured from stdout.write's 2-arg
callback. Reported on the next frame so the
measurement reflects "time until OS flushed
the bytes to the terminal fd", not "time until
queued in Node".
writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback. Only attached on TTY with
diff; piped/non-TTY stdout bypasses flow control so the callback
would fire synchronously anyway.
Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):
patches total 28,888
optimized total 16,700 (ratio 0.58 — optimizer cuts ~42%)
writeBytes 42 KB / 10s = 4.2 KB/s throughput
drainMs p50 0.14 ms terminal accepts bytes instantly
drainMs p99 0.85 ms
backpressure 0% of frames
This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s. The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel.
The placeholder → microtask-upgrade pattern did not reduce renderer
p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse). Each fresh
row still pays the Md cost — just on a follow-up commit instead of
inline — and the follow-up commit shows up as a second heavy frame
a few ms later.
The real bottlenecks turned out to be:
1. wheel step too large (fixed in 7ca16eea)
2. outer terminal ANSI parse throughput (diagnosing next)
3. React commit frequency during hold-scroll (needs coalescing)
None of which DeferredMd addresses. Clearing the complexity so the
next experiments land on a simpler substrate.
User observation: "it doesn't scroll line by line/row by row."
Was right. Two places hardcoded big deltas:
1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
Each wheel event scrolled 6 rows. A mechanical wheel notch emits
3-5 events → 18-30 rows per click, which visually teleports past
content instead of smooth-scrolling it. Drop to 1. Trackpads
emit 50-100 events per flick — at step=1 that's still a fast flick
(a whole viewport in one flick) but each intermediate frame is
visible. Porting claude-code's wheel accel state machine is the
right next step if this feels sluggish on precision scrolls.
2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
Full-viewport jumps replace the entire screen — no visual
continuity, can't scan content — AND land right at Ink's fast-path
threshold (`delta < innerHeight`), which disqualifies the DECSTBM
blit on every press. Half-viewport keeps 50% continuity AND
drops well under the threshold. Two presses still cover the same
total distance.
Profiled against the 1106-msg session, holding the key at 30Hz for
6s:
wheel_up (step 6 → 1):
frames 142 → 163 (+15%)
throughput 10.7 → 15.8 fps (+48%)
patches tot 53018→ 36562 (-31%)
gap p50 5ms → 16ms (actual rendering ~60fps now)
<16ms frames 93 → 76
16-33ms 82 → 76
hitches 3 → 1
pageUp (viewport-2 → viewport/2):
throughput 10.7 → 9.5 fps (same ballpark — smaller delta × same
event rate = less total scroll)
Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing. With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.
Tests/type-check/build clean; 352 tests pass.
Adds DeferredMd — a wrapper around <Md> that renders a lightweight
<Text> placeholder on first mount and upgrades to the full markdown
subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine
mounts during PageUp hold run our markdown tokenizer + syntax
highlighter synchronously, producing the 63-112ms renderer spikes
profiled earlier. A plain <Text> placeholder only needs Yoga to wrap
the pre-stripped string (no tokenizer, no highlight), then the Md
subtree builds in a follow-up React commit.
Upgrade cache: once a (theme, compact, text) tuple has been upgraded,
a WeakMap-keyed Set remembers it so remounts (scroll-out then
scroll-back) mount straight into <Md> — no placeholder round-trip.
WeakMap on theme means palette swaps re-upgrade naturally.
Honesty note: profiling under hold-PageUp showed this didn't reduce
renderer p99 measurably — the upgrade commit just pays the Md cost on
a follow-up frame instead of inline. The bigger bottleneck turned out
to be React commit frequency (3.5 commits/sec during 30Hz scroll
input, with 200ms+ silent gaps between commits dominating perceived
FPS), which this change doesn't address. Keeping the deferred path
anyway because:
1. It's correct and tested — no regressions across 352 tests
2. Defensive for pathological fresh-mount cases (giant code blocks,
wide tables) that aren't in the current profile fixture
3. Pairs naturally with useVirtualHistory's useDeferredValue to keep
React's concurrent scheduler able to interrupt upgrade commits
If the follow-up perf investigation (terminal write throughput / patch
volume / commit frequency) shows DeferredMd is net-neutral-or-worse in
practice, this can be reverted with a one-line swap back to <Md> in
messageLine.tsx:115.
Companion to the streaming 2-column fix in 7242361a — these two
touched messageLine.tsx together so they land as a pair.
StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md>
children. Each <Md> returns a <Box flexDirection="column">, but its
parent in messageLine.tsx (line 169) is `<Box width={...}>` with no
flexDirection, which Ink defaults to 'row'. So during streaming the
two column boxes rendered side-by-side, producing the visible "tokens
jumble into two columns until it fixes itself" bug — the "fix" was
message.complete flipping isStreaming→false, which swaps the
StreamingMd subtree for a single DeferredMd/Md child (no siblings → row
direction is harmless).
Wrap the two <Md> siblings in a flexDirection="column" Box so they
stack. Localized fix so the non-streaming path (single-child, works
fine in a row parent) is untouched.
Reported by user:
> "tokens streaming... going into 2 columns randomly and jumbling
> together until it fixes itself"
No test changes — findStableBoundary tests still pass (the layout
change is parent-structural, not in the boundary logic). Build clean,
tsc clean, 352 tests pass.
Adds scrollFastPathStats counters to render-node-to-output.ts: captures
every time a ScrollBox's DECSTBM scroll hint is generated, records
whether the fast path took it (blit+shift from prevScreen) or declined,
and why. Exposed through hermes-ink's public exports and snapshotted on
every FrameEvent so the profiler harness can correlate decline reasons
with the actual patch/renderer cost per frame.
This is pure observation — no behaviour change. Preparing for the
virtual-history rewrite: the hypothesis was that our topSpacer/
bottomSpacer scheme disqualifies every scroll via heightDelta
mismatch, but the data shows the fast path is actually taken on most
scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the
remaining steady-state renderer cost is Yoga tree traversal, not
the per-frame full redraw I initially suspected.
Declines that do happen correlate with React commits that changed the
mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer
cases the virtualization rewrite still needs to address.
No test diffs — instrumentation-only. Build verified: `tsc --noEmit`
plus the full `npm run build` compiler post-pass pass cleanly.
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.
perfPane.tsx:
Wires ink's onFrame callback (already plumbed through the fork) into
the same perf.log as the React.Profiler samples. Captures per-phase
timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
optimize, stdout write) plus yoga counters (visited/measured/cache-
Hits/live) and patch counts per frame. Events are tagged
{src: 'react'|'frame'} so jq can split them. logFrameEvent is
undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
the callback.
entry.tsx:
Passes logFrameEvent into render().
types/hermes-ink.d.ts:
Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
type-checks against the plumbed-through ink option.
scripts/profile-tui.py:
New harness. Launches the built TUI under a PTY with the longest
session in state.db resumed, holds PageUp/PageDown/etc at a
configurable Hz for N seconds, then parses perf.log and prints
per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
beyond stdlib. Exit 2 if nothing was captured (wiring broken).
Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
- Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
- 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
- One pathological 97ms frame with yoga measuring 70,415 text cells
and Yoga visiting 225k nodes — the cold-unmeasured-region hit
- Ink's scroll fast-path (DECSTBM blit from prevScreen) is
disqualified because our spacer-based virtual history doesn't
keep heightDelta in sync with scroll.delta, so every PageUp step
falls through to a full 2000-4800 patch re-render instead of ~40
Split in-flight assistant text at the last stable block boundary so only
the unclosed tail re-tokenizes per stream delta. Previously the full
text was rendered as plain <Text> during streaming and only flipped to
<Md> at message.complete — cheap per delta but loses live markdown
formatting.
New StreamingMd component holds a monotonically-growing stablePrefix
in a ref (idempotent under StrictMode double-render), renders it as
one <Md> that memoizes across deltas, and renders the unstable suffix
as a second <Md> that re-parses on each delta. Cost per delta drops
from O(total length) to O(unstable length).
findStableBoundary walks back to the last "\n\n" outside an open
fenced code block — splitting inside an open fence would orphan the
opener and break highlighting in the prefix.
Adapted from claude-code's src/components/Markdown.tsx:186 but built
on our line-based tokenizer instead of marked.lexer. 9 new tests cover
fence balance, boundary walk, and empty input.
Part of the --tui perf audit (see audit #7).