Follow-up to #15960 — the provider-active detection in tools_config.py
also read use_gateway with raw truthiness (is False, not dict.get), so
quoted 'false' caused the FAL-direct row to show wrong active status in
the hermes tools picker. Route both sites through is_truthy_value().
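A minimal sketch of the failure mode. The body of is_truthy_value() below is an illustrative stand-in (the accepted spellings are an assumption), not the actual helper:

```python
def is_truthy_value(value, default=False):
    """Illustrative stand-in: coerce config values such as 'false' to a bool."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Assumed set of "true" spellings; everything else counts as False.
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


config = {"use_gateway": "false"}  # quoted in YAML, so it arrives as a str

raw = bool(config.get("use_gateway"))                    # non-empty str is truthy
normalized = is_truthy_value(config.get("use_gateway"))  # string is parsed first
```

With raw truthiness the quoted value reads as enabled; routing it through the helper reads it as disabled.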
PR #16013 plugged the leak in `/new`, but two sibling session-boundary
resets had the same bug:
1. Inactivity / suspended-session auto-reset (top of `_handle_message`)
previously cleared only reasoning. Now drops model override and the
queued "/model switched" note as well.
2. Compression-exhaustion auto-reset now also drops the pending note
alongside the existing model/reasoning cleanup.
All three session-boundary sites now use the identical cleanup idiom.
`npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`)
silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.),
leaving the working tree dirty after every `hermes update`. The next update
then detects the dirty lockfile and stashes it — producing a trail of
hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json,
and root package-lock.json.
Switch to `npm ci` (strict, lockfile-preserving) via a new
`_run_npm_install_deterministic` helper that falls back to `npm install`
when the lockfile is missing or out of sync (WIP forks).
Verified locally: all three lockfiles stay byte-identical after the real
_build_web_ui / _update_node_dependencies run twice back-to-back. Fallback
path tested with a deliberately out-of-sync lockfile and a no-lockfile case.
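The fallback described above could look roughly like this. It is a sketch only: the function name, the injectable `run` parameter, and the exact flags are assumptions, not the real `_run_npm_install_deterministic`:

```python
import subprocess
from pathlib import Path


def run_npm_install_deterministic(project_dir, run=subprocess.run):
    """Try `npm ci`; fall back to `npm install` when it cannot be used.

    Returns the npm subcommand that finally ran. `run` is injectable
    (hypothetical) so the logic can be exercised without npm installed.
    """
    project_dir = Path(project_dir)
    if (project_dir / "package-lock.json").exists():
        result = run(["npm", "ci"], cwd=project_dir)
        if result.returncode == 0:
            return "ci"
    # Missing lockfile, or `npm ci` refused an out-of-sync one.
    run(["npm", "install", "--silent"], cwd=project_dir, check=True)
    return "install"
```

Because `npm ci` only reads the lockfile, the happy path leaves package-lock.json untouched; only the fallback can rewrite it.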
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes #16294.
Expand the airtable skill from bare CRUD to a full Hermes-shaped
cookbook matching the linear/notion neighbors, and trim the
description to fit the 60-char system-prompt cutoff.
Hermes-specific additions:
- Explicit 'use the terminal tool with curl — not web_extract or
browser_navigate' guidance, matching the same note in linear.
- Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the
subprocess automatically via env_passthrough, so curl calls don't
need to re-export it.
- Prefer 'python3 -m json.tool' (always present) over jq (optional)
for pretty-printing, with -s on every curl to keep output clean.
- Read-before-write workflow that resolves record IDs via
filterByFormula instead of guessing.
Cookbook expansion (new vs original):
- Field-type reference table (text, select, multi-select, attachment,
linked record, user) with the exact write-shape Airtable expects.
- typecast flag for auto-coercing values / auto-creating select options.
- performUpsert PATCH for idempotent sync by merge field.
- Batch create/delete endpoints (10-record cap per call).
- Sort + fields query params with URL-encoding (%5B / %5D).
- Named-view query that applies saved filter/sort server-side.
- Full pagination loop template (while loop with offset).
- Common filterByFormula patterns (exact match, contains, AND/OR,
date comparison, NOT empty).
- Rate-limit backoff guidance (Retry-After header, per-base budget).
- Airtable error-code reference (AUTHENTICATION_REQUIRED,
INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND,
INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to
user-actionable fixes instead of just retrying.
Also: description trimmed from 183 chars (truncated to 60 in system
prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars
that render whole: 'Airtable REST API via curl. Records CRUD, filters,
upserts.' Catalog row updated to match.
SKILL.md grew from 115 to 228 lines — still under the 500-line soft
cap and below the linear skill (297 lines) which serves the same
role for GraphQL.
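For reference, the offset-driven pagination loop mentioned in the cookbook list can be sketched in Python as follows. The skill itself uses curl; `fetch_page` is a hypothetical callable standing in for one list-records request that returns the decoded JSON body:

```python
def iter_all_records(fetch_page, page_size=100):
    """Yield every record, following Airtable's `offset` cursor.

    `fetch_page(params)` is a hypothetical stand-in for one GET on the
    list-records endpoint; it must return the decoded JSON body.
    """
    offset = None
    while True:
        params = {"pageSize": page_size}
        if offset:
            params["offset"] = offset
        page = fetch_page(params)
        yield from page.get("records", [])
        # The response carries `offset` only while more pages remain.
        offset = page.get("offset")
        if not offset:
            break
```

The loop stops on the first response without an `offset` key, so a single-page table costs exactly one request.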
- scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the
check-attribution CI job and release notes credit Sonoyunchu correctly.
- website/docs/reference/skills-catalog.md: add the airtable row to
the productivity bundled-skills table.
Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY
to OPTIONAL_ENV_VARS so:
- They persist to ~/.hermes/.env via save_env_value like every other
key Hermes knows about, instead of being ad-hoc variables the user
has to hand-edit the dotfile for.
- load_env() / reload_env() populate os.environ from .env on every
startup — the user sets the key once, skills keep working across
restarts without losing access.
- hermes setup / hermes config show surface them as known optional
vars with the correct signup URL (linear.app/settings/api,
airtable.com/create/tokens, etc.).
These four entries use category="skill" (new) rather than "tool".
tools/environments/local.py auto-adds every category=tool/messaging
entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough
from leaking provider credentials into the execute_code sandbox
(GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the
point is for the agent's subprocess to see them so curl can read
Authorization headers — so they must be outside the blocklist. The
new category is inert for that check.
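A rough sketch of why the new category stays out of the blocklist. The registry entries and helper names here are hypothetical; only the category rule comes from the description above:

```python
# Trimmed, hypothetical registry; only `category` matters for this check.
OPTIONAL_ENV_VARS = {
    "EXAMPLE_PROVIDER_API_KEY": {"category": "tool"},
    "EXAMPLE_CHAT_BOT_TOKEN": {"category": "messaging"},
    "AIRTABLE_API_KEY": {"category": "skill", "advanced": True},
}

BLOCKED_CATEGORIES = {"tool", "messaging"}


def build_env_blocklist(registry):
    """Collect every variable whose category must not reach the sandbox."""
    return frozenset(
        name
        for name, meta in registry.items()
        if meta.get("category") in BLOCKED_CATEGORIES
    )


def sandbox_env(env, blocklist):
    """Return a copy of `env` with blocklisted variables removed."""
    return {key: value for key, value in env.items() if key not in blocklist}
```

Since "skill" is not in BLOCKED_CATEGORIES, AIRTABLE_API_KEY passes through to the subprocess while provider credentials are stripped.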
All four entries are advanced=True: they show up in 'hermes config'
and 'hermes status' displays, but do not nag users who have never
touched those skills during setup checklists.
E2E verified: save_env_value → reload_env → os.environ populated →
skill_view reports setup_needed=False → env_passthrough registers
the key for subprocess inheritance.
Convert the airtable skill from 'skills.config.airtable.api_key'
(config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars:
[AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled
skill that authenticates with an API token.
Why the original shape was wrong:
- metadata.hermes.config is for non-secret skill settings (paths,
preferences) per references/skill-config-interface.md. Storing a
bearer token under skills.config.* also triggered the documented
'hermes config migrate' nag-on-every-run problem.
- The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't
read skills.config.airtable.api_key anyway — it's a yaml path, not
an env var.
Follow-up polish on the same pass:
- Added version/author/license frontmatter to match notion/linear.
- Added prerequisites.commands: [curl].
- Setup section now specifies the PAT format (pat...) that replaced
legacy 'key...' API keys in Feb 2024, plus the three required scopes
(data.records:read/write, schema.bases:read) and the per-base Access
list requirement.
- Clarified PATCH vs PUT and pagination (100 records/page cap).
- Swapped verification from 'hermes -q ...' (non-deterministic) to a
curl /v0/meta/bases call that returns a verifiable HTTP status code.
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.
Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.
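The corrected sentinel path, as a simplified sketch. The real check may look at more than directory existence; that reduction is an assumption made here for brevity:

```python
from pathlib import Path


def web_ui_build_needed(web_dir):
    """Simplified: a build is needed only when the real output dir is absent."""
    web_dir = Path(web_dir)
    # vite.config.ts writes to ../hermes_cli/web_dist, not web/dist.
    dist_dir = web_dir.parent / "hermes_cli" / "web_dist"
    return not dist_dir.exists()
```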
Fixes #14898
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.
Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.
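The unblock path can be pictured like this. Function names and the location of .update_response are assumptions; only the "blank file reads as the default" behavior comes from the description above:

```python
from pathlib import Path


def cancel_pending_update(state_dir):
    """Write an empty response so the waiting update process stops blocking."""
    path = Path(state_dir) / ".update_response"
    path.write_text("")
    return path


def read_prompt_response(path, default):
    """An empty (or whitespace-only) response falls back to the default."""
    answer = Path(path).read_text().strip()
    return answer or default
```

With a default of 'n', the blank file resolves to a skip, and a real answer such as 'y' is still returned unchanged.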
Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
Slack Bolt posts are not editable like CLI spinners; the medium-tier 'new' mode still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for the off -> new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
Previously, setting SLACK_BOT_TOKEN in .env would unconditionally enable
the Slack gateway adapter regardless of `slack.enabled: false` in config.yaml.
This caused spurious "SLACK_APP_TOKEN not set" errors when the token was
used only by skills (e.g. cron jobs that send Slack messages) rather than
for the Hermes messaging gateway.
Now, enabled: false in config.yaml is respected — the token is stored so
skills can still use it, but the gateway adapter is not activated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
when sessions_dir is passed, preserves them when it isn't (backward-
compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
don't break when delete_session signature gains sessions_dir.
`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).
Changes:
- Add `_remove_session_files()` static helper that cleans up
`{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
existing behavior
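A sketch of the best-effort cleanup under the file-naming scheme listed above. The function shape and return value are illustrative, not the actual `_remove_session_files()`:

```python
from pathlib import Path


def remove_session_files(sessions_dir, session_id):
    """Best-effort removal of transcript files; returns how many were deleted."""
    if sessions_dir is None:
        return 0  # backward-compatible no-op
    base = Path(sessions_dir)
    candidates = [base / f"{session_id}.json", base / f"{session_id}.jsonl"]
    candidates.extend(base.glob(f"request_dump_{session_id}_*.json"))
    removed = 0
    for path in candidates:
        try:
            path.unlink()
            removed += 1
        except OSError:
            # Missing file, permissions, etc.: never block the DB operation.
            pass
    return removed
```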
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.
Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.
Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
(behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
auto_skill to MessageEvent. gateway/run.py already handles auto-skill
injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
covering DM/thread/parent resolution, single-vs-list skills, dedup,
malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.
Config example:
  slack:
    channel_skill_bindings:
      - id: "D0ATH9TQ0G6"
        skills: ["german-flashcards"]
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
MCP stdio servers are spawned via the SDK's stdio_client, which on
Linux uses start_new_session=True (setsid). When a cron job is
cancelled mid-way (timeout, agent finish, exception), the subprocess
often escapes the SDK's teardown and survives as a session leader.
Because setsid() detaches the child from the gateway's process group
/ cgroup tree, systemd does not reap it on service restart either —
so every cron tick that touches an MCP tool leaks a dangling server
process.
Fix:
* tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session
context in try/finally. On any exit path (clean, exception,
cancellation), PIDs still alive are moved from the active
_stdio_pids set into a new _orphan_stdio_pids set. Orphan
detection is done via os.kill(pid, 0) — a cheap liveness probe
that never signals the target.
* tools/mcp_tool.py — _kill_orphaned_mcp_children gains an
include_active=False flag. Default behaviour now only reaps the
orphan set so concurrent sessions (other parallel cron jobs or
live user chats) are never disrupted. The existing shutdown path
passes include_active=True to keep the previous "kill everything"
semantics after the MCP loop is stopped.
* cron/scheduler.py — the cleanup hook is moved from run_job()'s
finally (which would race with parallel siblings after #13021)
into tick() after the ThreadPoolExecutor has joined every future.
At that point there are no in-flight sessions from this tick, so
sweeping the orphan set is always safe.
Net effect: zero regression for healthy sessions, and orphan MCP
servers no longer accumulate between gateway restarts.
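The liveness probe and the active-to-orphan move can be sketched as below (POSIX only). Helper names are hypothetical, and dropping dead PIDs from the active set is an assumption:

```python
import os


def pid_alive(pid):
    """Probe a PID with signal 0: checks existence without signalling it."""
    try:
        os.kill(pid, 0)  # POSIX only; do not use this idiom on Windows
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def sweep_to_orphans(active_pids, orphan_pids):
    """Move still-alive PIDs from the active set into the orphan set."""
    for pid in list(active_pids):
        if pid_alive(pid):
            orphan_pids.add(pid)
        active_pids.discard(pid)
```

A later reaper can then kill only the orphan set, leaving PIDs owned by concurrent sessions alone.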
Made-with: Cursor
Multiple overlapping Slack attachment improvements:
1. Upload retry with backoff on transient errors (429, 5xx, connection
reset, rate_limited, service unavailable). New _is_retryable_upload_error
helper covers three upload paths: _upload_file, send_video,
send_document. Up to 3 attempts with 1.5s * attempt backoff.
2. Thread participation tracking: successful file uploads now add the
thread_ts to _bot_message_ts, mirroring how text replies are tracked.
This lets follow-up thread messages auto-trigger the bot (same
engagement rules as replied threads).
3. Thread metadata preservation in the image redirect-guard fallback
(send_image → send text fallback) and in two gateway/run.py send
paths (image + document fallback calls).
4. HTML response rejection in _download_slack_file_bytes. Parallels
the existing check in _download_slack_file. Guards against Slack
returning a sign-in / redirect page as document bytes when scopes
are missing, so the agent doesn't get HTML-as-a-PDF.
5. File lifecycle event acks (file_shared / file_created / file_change).
These events arrive around snippet uploads. Acking them silences the
slack_bolt 'Unhandled request' 404 warnings without changing behavior.
6. Post-loop message type classification so a mixed image+document upload
classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
Previously, the per-file classification in the inbound loop could be
overwritten unpredictably.
7. Expanded text-inject whitelist in inbound document handling to cover
.csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
snippets and config files are directly visible to the agent, not just
cached as opaque uploads. Paired with new MIME entries in
SUPPORTED_DOCUMENT_TYPES in base.py.
Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
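The retry policy from item 1 (up to 3 attempts, 1.5s * attempt backoff) can be sketched as follows. The `status` attribute and the string markers are assumptions about how errors surface, not the actual _is_retryable_upload_error:

```python
import time


def is_retryable_upload_error(exc):
    """Illustrative classifier for transient upload failures."""
    status = getattr(exc, "status", None)  # hypothetical attribute
    if isinstance(status, int) and (status == 429 or 500 <= status < 600):
        return True
    text = str(exc).lower()
    markers = ("rate_limited", "service unavailable", "connection reset")
    return any(marker in text for marker in markers)


def upload_with_retry(upload, max_attempts=3, base_delay=1.5, sleep=time.sleep):
    """Call `upload()` and retry transient errors with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable_upload_error(exc):
                raise
            sleep(base_delay * attempt)  # 1.5s, then 3.0s
```

Non-retryable errors propagate on the first failure, so a bad token or missing scope is not retried.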
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
hot strings stay resident under long sessions; was insertion-order
FIFO before
- virtualHeights: include todos, panel sections, and intro version in
messageHeightKey so height-cache reuse correctly invalidates when
todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2
collapsed) instead of the generic ~1-line fallback, so initial
virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
scrollFastPathStats; it was always 0 (the "hint missing" branch is
outside the diagnostics block)
- perfPane / hermes-ink.d.ts: follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
attribution that doesn't reference a developer-local checkout
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:
- memoryMonitor: half-prune on 'high', full drop on 'critical', before
the heap dump / auto-restart path. Gives long sessions a shot at
recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
with a half-warm pool and the prior session can resume cheaply.
Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
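The touch-on-read and half-eviction ideas translate to any insertion-ordered map. A Python sketch follows; the real code is TypeScript over Map, and the class, method names, and level strings here are hypothetical:

```python
class LruCache:
    """Insertion-ordered dict used as an LRU: oldest entries come first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = {}

    def get(self, key):
        if key not in self._data:
            return None
        value = self._data.pop(key)
        self._data[key] = value  # delete + set moves the key to "most recent"
        return value

    def set(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self.evict_half()

    def evict_half(self):
        """Drop the least recently used half, keep the rest warm."""
        for key in list(self._data)[: len(self._data) // 2]:
            del self._data[key]

    def clear(self):
        self._data.clear()

    def keys(self):
        return list(self._data)

    def __len__(self):
        return len(self._data)


def evict_caches(caches, level):
    """Hypothetical levels: 'half' prunes each cache, 'all' empties it."""
    for cache in caches:
        if level == "all":
            cache.clear()
        else:
            cache.evict_half()
```

Half-eviction on overflow avoids the cold-start cost of a full clear while still bounding memory.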
Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:
  Output.get() per-frame walk:               24% total
    └─ sliceAnsi(line, from, to) per write:  18% total
  stringWidth(line) chain (cached + JS):     14% total
All three were re-doing identical work every frame: same string → same
clipped slice → same width.
Fixes:
1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path
skips the cache (inline scan beats Map.get for short ASCII, the >90%
case). String.charCodeAt scan up to 64 chars is cheaper than the regex
fallback.
2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
is pure and the same content reflows identically every frame.
3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
end-defined hot path used by Output.get().
4. Skip the slice entirely in Output.get() when the line already fits the
clip box (startsBefore=false && endsAfter=false). Most transcript lines
never exceed their container width, and tokenizing them just to slice
(line, 0, width) was pure overhead. This single fast-path drops
sliceAnsi from 18% → ~0% in the profile.
Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.
Validation, real-user CPU profile, page-up scroll on 11k-line session:
  Output.get() self-time:  24%   → 0.3%
  sliceAnsi total:         18%   → not in top 25
  stringWidth family:      14%   → ~3%
  idle:                    60.7% → 77.3%
Frame timings (synthetic page-up profile harness):
  dur p95:   ~10ms → 4.87ms
  dur p99:   25ms+ → 12.80ms
  yoga p99:  ~20ms → 1.87ms
The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
Ports openclaw/openclaw#72038 to hermes-agent.
Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.
When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).
Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
checks it in `_send_or_edit` on `finalize=True`. New helpers
`_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
returning False by default. `TelegramAdapter` implements it via
`_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
other platforms ignore the setting (they don't have the stale-edit
timestamp problem or edit-then-read works cheaply).
- Fallback to normal edit on any fresh-send failure — no user-visible
regression if Telegram rate-limits a send or the message is gone.
Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.
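The finalize decision can be sketched like this. The standalone functions and callback parameters are hypothetical; the real logic lives in `_should_send_fresh_final` / `_try_fresh_final` on the consumer:

```python
import time


def should_send_fresh_final(created_ts, fresh_final_after_seconds, now=None):
    """True when the preview has been visible for at least the threshold."""
    if fresh_final_after_seconds <= 0 or created_ts is None:
        return False  # 0 keeps the legacy edit-in-place behavior
    now = time.monotonic() if now is None else now
    return (now - created_ts) >= fresh_final_after_seconds


def finalize_preview(text, created_ts, threshold, send, edit, delete, now=None):
    """Send a fresh final message for stale previews, else edit in place."""
    if should_send_fresh_final(created_ts, threshold, now):
        try:
            send(text)
        except Exception:
            pass  # fall through to the normal edit path
        else:
            try:
                delete()
            except Exception:
                pass  # deleting the stale preview is best-effort
            return "fresh"
    edit(text)
    return "edit"
```

Any failure on the fresh send degrades to the existing edit, so the user still gets a final message.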
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
trackFrame is undefined when SHOW_FPS is off so ink's onFrame
short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
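The ring-buffer arithmetic (30 timestamps span 29 frame intervals) is sketched below in Python. The real store is a TypeScript nanostore, and the class and function names are hypothetical:

```python
from collections import deque


class FpsTracker:
    """Keeps the last `window` frame timestamps and derives fps from them."""

    def __init__(self, window=30):
        self._stamps = deque(maxlen=window)

    def track_frame(self, now_ms):
        self._stamps.append(now_ms)

    def fps(self):
        if len(self._stamps) < 2:
            return 0.0
        elapsed_s = (self._stamps[-1] - self._stamps[0]) / 1000.0
        if elapsed_s <= 0:
            return 0.0
        # N timestamps cover N - 1 intervals, hence 29 / elapsed for N = 30.
        return (len(self._stamps) - 1) / elapsed_s


def fps_color(fps):
    """Thresholds from the description: green >= 50, yellow >= 30, else red."""
    if fps >= 50:
        return "green"
    if fps >= 30:
        return "yellow"
    return "red"
```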
Usage:
  HERMES_TUI_FPS=1 hermes --tui
  # bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.
Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx. All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, the
trackpad-burst signature → all theirs. This file is a mechanical
port into our module structure.
What it does:
  precision click (>500ms gap)   1 row/event    (deliberate scan)
  sustained mouse (40-200ms)     2-6 rows       (decay curve)
  detected wheel bounce          ramps to 15    (sticky wheel-mode)
  trackpad flick (5+ <5ms)       1 row/event    (burst detect)
  direction reversal             reset to base
Two implementation paths:
* native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
window-ramp + optional wheel-mode curve triggered by detected
encoder bounce. SGR proportional reporting handled via the
burst-count guard.
* xterm.js (VS Code / Cursor / browser terminals) — pure
exponential-decay curve with fractional carry. Events arrive
1-per-notch with no pre-amplification, so the curve is more
aggressive.
Selected at construction via isXtermJs() from @hermes/ink (now
exported). Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).
13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.
Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks. Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.
Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.
Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.
Result: definitively NO. Inline mode is roughly 10x to 40x worse on every
metric that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height. Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all ~1100
messages.
Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):
  metric            fullscreen     inline         delta
  patches_total     28,864         1,111,574      +3751%
  writeBytes_total  42 KB          1.6 MB         +3881%
  fps_throughput    15.8 fps       1.75 fps       -89%
  frames            179            18             -90%
  gap_p50_ms        17 (~60fps)    726 (~1fps)    +4170%
  yoga_p99          34 ms          405 ms         +1083%
  renderer_p99      14 ms          169 ms         +1062%
  flickers          0              5 offscreen    -
This is actually the cleanest data we've gotten so far:
* AlternateScreen is LOAD-BEARING for perf — its viewport height
constraint is what lets useVirtualHistory's culling work. No
constraint → ScrollBox grows unbounded → every fiber mounts.
* The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
frames. Our terminal-write hypothesis from last session was
wrong: the bottleneck is React + Yoga, not the wire.
* Doing proper inline mode (non-virtualized transcript in
scrollback, composer pinned below) is not a flag flip — it's a
different UI architecture. Leaving this flag in so anyone
re-running the experiment gets the same numbers, but not
building the architecture until we're sure the perf win is
worth the UX loss (it probably isn't — the fullscreen + virt
path is the one we should optimize, not replace).
Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:
- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
CDP scripting via chrome-remote-interface, attaching to running Node
processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
CPU profiles + heap snapshots.
- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
(with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
remote/attach, remote-pdb as the agent-friendly alternative to DAP,
recipes for tui_gateway/_SlashWorker/subprocess debugging.
Before: change code → build → run profile → manually compare to
mental model of last run. After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly. Ctrl+C to stop.
Also added:
  --save LABEL      saves metrics snapshot to /tmp/perf-<LABEL>.json
  --compare LABEL   diffs the current run vs that snapshot
  --extra-flag X    pass-through to node dist/entry.js (prepping for
                    --no-fullscreen below)
key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings. format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).
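A simplified sketch of the direction-aware diff. This is not the script's format_diff(): here the arrow shows which way the value moved and a separate word gives the verdict, and treating every metric outside the higher-is-better list as lower-is-better is an assumption:

```python
# Assumed rule: names containing these tags are higher-is-better;
# everything else (p99, max, patches, drain, ...) is lower-is-better.
HIGHER_IS_BETTER = ("fps", "gaps_under_16ms")


def is_lower_better(metric):
    return not any(tag in metric for tag in HIGHER_IS_BETTER)


def format_diff(before, after):
    """Return one formatted row per metric present in both snapshots."""
    rows = []
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if new == old:
            arrow, verdict = "=", "same"
        else:
            arrow = "↑" if new > old else "↓"
            improved = (new < old) if is_lower_better(name) else (new > old)
            verdict = "better" if improved else "worse"
        pct = (new - old) / old * 100 if old else 0.0
        rows.append(
            f"{name:<16} {old:>10.2f} {new:>10.2f} {arrow} {pct:+8.1f}%  {verdict}"
        )
    return rows
```

Keeping the verdict separate from the arrow means a falling p99 and a rising fps both read as "better" without the reader having to remember each metric's polarity.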
Run-to-run noise on static code is ~5-15% on most metrics — big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.
Usage during the next perf session:
  # one-shot with a baseline for later comparison
  scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel
  # after porting the wheel handler
  scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel
  # continuous iteration
  scripts/profile-tui.py --seconds 6 --hold wheel_up --loop