Multiple overlapping Slack attachment improvements:
1. Upload retry with backoff on transient errors (429, 5xx, connection
reset, rate_limited, service unavailable). New _is_retryable_upload_error
helper covers three upload paths: _upload_file, send_video,
send_document. Up to 3 attempts with 1.5s * attempt backoff (retry
shape sketched after this list).
2. Thread participation tracking: successful file uploads now add the
thread_ts to _bot_message_ts, mirroring how text replies are tracked.
This lets follow-up thread messages auto-trigger the bot (same
engagement rules as replied threads).
3. Thread metadata preservation in the image redirect-guard fallback
(send_image → send text fallback) and in two gateway/run.py send
paths (image + document fallback calls).
4. HTML response rejection in _download_slack_file_bytes. Parallels
the existing check in _download_slack_file. Guards against Slack
returning a sign-in / redirect page as document bytes when scopes
are missing, so the agent doesn't get HTML-as-a-PDF.
5. File lifecycle event acks (file_shared / file_created / file_change).
These events arrive around snippet uploads. Acking them silences the
slack_bolt 'Unhandled request' 404 warnings without changing behavior.
6. Post-loop message type classification: a mixed upload now classifies as
PHOTO when it includes an image (VOICE when it includes audio but no
image), falling back to DOCUMENT.
Previously, the per-file classification in the inbound loop could be
overwritten unpredictably.
7. Expanded text-inject whitelist in inbound document handling to cover
.csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
snippets and config files are directly visible to the agent, not just
cached as opaque uploads. Paired with new MIME entries in
SUPPORTED_DOCUMENT_TYPES in base.py.
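A minimal sketch of the retry shape from item 1 (illustrative only — the
actual helpers are Python in the Slack adapter, and the error matching
below is simplified):

```ts
// Up to 3 attempts, 1.5s * attempt backoff, retrying only transient errors.
const MAX_ATTEMPTS = 3;
const BACKOFF_BASE_S = 1.5;

function isRetryableUploadError(err: unknown): boolean {
  const msg = String(err).toLowerCase();
  return (
    msg.includes("429") ||
    msg.includes("rate_limited") ||
    msg.includes("service unavailable") ||
    msg.includes("connection reset") ||
    /\b5\d\d\b/.test(msg)              // 5xx status codes
  );
}

async function uploadWithRetry<T>(upload: () => Promise<T>): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await upload();
    } catch (err) {
      if (attempt >= MAX_ATTEMPTS || !isRetryableUploadError(err)) throw err;
      // Linear backoff: 1.5s after the first failure, 3.0s after the second.
      await new Promise((r) => setTimeout(r, BACKOFF_BASE_S * attempt * 1000));
    }
  }
}
```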
Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
hot strings stay resident under long sessions; was insertion-order
FIFO before (pattern sketched after this list)
- virtualHeights: include todos, panel sections, and intro version in
messageHeightKey so height-cache reuse correctly invalidates when
todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2 when
collapsed) instead of the generic ~1-line fallback, so initial
virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
scrollFastPathStats; it was always 0 (the "hint missing" branch is
outside the diagnostics block)
- perfPane / hermes-ink.d.ts: updated to follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
attribution that doesn't reference a developer-local checkout
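The touch-on-read pattern from the stringWidth bullet, as a minimal
sketch (cache size and helper names are illustrative, not the hermes-ink
module's):

```ts
// A Map iterates in insertion order, so delete+set on a hit moves the entry
// to the "newest" end; eviction then drops the oldest (least recently used).
const widthCache = new Map<string, number>();
const WIDTH_CACHE_MAX = 8192;

function cachedWidth(s: string, compute: (s: string) => number): number {
  const hit = widthCache.get(s);
  if (hit !== undefined) {
    widthCache.delete(s);       // touch-on-read: hot strings stay resident
    widthCache.set(s, hit);
    return hit;
  }
  const w = compute(s);
  if (widthCache.size >= WIDTH_CACHE_MAX) {
    // Drop the single oldest entry (first key in insertion order).
    widthCache.delete(widthCache.keys().next().value as string);
  }
  widthCache.set(s, w);
  return w;
}
```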
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:
- memoryMonitor: half-prune on 'high', full drop on 'critical', before
the heap dump / auto-restart path. Gives long sessions a shot at
recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
with a half-warm pool and the prior session can resume cheaply.
Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
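A minimal sketch of the two eviction levels (illustrative; the real API
walks the four named caches itself rather than taking them as an
argument):

```ts
// 'high' keeps the newest half of each cache (Map insertion order doubles as
// LRU order thanks to touch-on-read); 'critical' drops everything.
type EvictLevel = "high" | "critical";

function pruneCache(cache: Map<unknown, unknown>, level: EvictLevel): void {
  if (level === "critical") {
    cache.clear();
    return;
  }
  const drop = Math.floor(cache.size / 2);  // half-keep: delete the oldest half
  let i = 0;
  for (const key of cache.keys()) {
    if (i++ >= drop) break;
    cache.delete(key);
  }
}

function evictInkCaches(level: EvictLevel, caches: Map<unknown, unknown>[]): void {
  for (const cache of caches) pruneCache(cache, level);
}
```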
Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:
Output.get() per-frame walk: 24% total
└─ sliceAnsi(line, from, to) per write: 18% total
stringWidth(line) chain (cached + JS): 14% total
All three were re-doing identical work every frame: same string → same
clipped slice → same width.
Fixes:
1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; the ASCII
fast-path skips the cache (inline scan beats Map.get for short ASCII,
the >90% case). A String.charCodeAt scan of up to 64 chars is cheaper
than the regex fallback. (Sketched after this list.)
2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
is pure and the same content reflows identically every frame.
3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
end-defined hot path used by Output.get().
4. Skip the slice entirely in Output.get() when the line already fits the
clip box (startsBefore=false && endsAfter=false). Most transcript lines
never exceed their container width, and tokenizing them just to slice
(line, 0, width) was pure overhead. This single fast-path drops
sliceAnsi from 18% → ~0% in the profile.
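Items 1 and 4 in sketch form (constants simplified; assumes the
string-width and slice-ansi packages ink already depends on):

```ts
import stringWidth from "string-width";
import sliceAnsi from "slice-ansi";

const widthCache = new Map<string, number>();
const WIDTH_CACHE_MAX = 8192;

// Item 1: short pure-ASCII strings skip the cache — a charCodeAt scan is
// cheaper than Map.get, and width === length for printable ASCII.
export function fastStringWidth(s: string): number {
  if (s.length <= 64) {
    let ascii = true;
    for (let i = 0; i < s.length; i++) {
      const c = s.charCodeAt(i);
      if (c < 0x20 || c > 0x7e) { ascii = false; break; }
    }
    if (ascii) return s.length;
  }
  const hit = widthCache.get(s);
  if (hit !== undefined) return hit;
  const w = stringWidth(s);
  if (widthCache.size >= WIDTH_CACHE_MAX) widthCache.clear(); // eviction simplified
  widthCache.set(s, w);
  return w;
}

// Item 4: when the line neither starts before nor ends after the clip box it
// already fits — return it untouched instead of tokenizing it just to slice.
export function clipLine(line: string, from: number, to: number): string {
  const startsBefore = from > 0;
  const endsAfter = fastStringWidth(line) > to;
  if (!startsBefore && !endsAfter) return line;  // the 18% → ~0% fast path
  return sliceAnsi(line, from, to);
}
```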
Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.
Validation, real-user CPU profile, page-up scroll on 11k-line session:
Output.get() self-time: 24% → 0.3%
sliceAnsi total: 18% → not in top 25
stringWidth family: 14% → ~3%
idle: 60.7% → 77.3%
Frame timings (synthetic page-up profile harness):
dur p95: ~10ms → 4.87ms
dur p99: 25ms+ → 12.80ms
yoga p99: ~20ms → 1.87ms
The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
Ports openclaw/openclaw#72038 to hermes-agent.
Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.
When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).
Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
checks it in `_send_or_edit` on `finalize=True`. New helpers
`_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
returning False by default. `TelegramAdapter` implements it via
`_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
other platforms ignore the setting (they don't have the stale-edit
timestamp problem or edit-then-read works cheaply).
- Fallback to normal edit on any fresh-send failure — no user-visible
regression if Telegram rate-limits a send or the message is gone.
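The finalize decision in sketch form, in the shape of the OpenClaw
TypeScript original (names illustrative; the Hermes port is the Python
`_should_send_fresh_final` / `_try_fresh_final` pair described above):

```ts
interface StreamState {
  messageCreatedTs: number | null;  // epoch seconds of the first preview send
  freshFinalAfterSeconds: number;   // 0 = legacy edit-in-place
}

function shouldSendFreshFinal(state: StreamState, nowTs: number): boolean {
  if (state.freshFinalAfterSeconds <= 0 || state.messageCreatedTs === null) return false;
  return nowTs - state.messageCreatedTs >= state.freshFinalAfterSeconds;
}

async function finalize(
  state: StreamState,
  text: string,
  api: {
    edit: (text: string) => Promise<void>;
    send: (text: string) => Promise<void>;
    deleteMessage: () => Promise<boolean>;  // best-effort; false when unsupported
  },
): Promise<void> {
  if (shouldSendFreshFinal(state, Date.now() / 1000)) {
    try {
      await api.send(text);                         // fresh timestamp on the final reply
      await api.deleteMessage().catch(() => false); // best-effort removal of the stale preview
      return;
    } catch {
      // any fresh-send failure degrades to the normal edit path below
    }
  }
  await api.edit(text);
}
```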
Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed
(sketched after this list). trackFrame is undefined when SHOW_FPS is
off, so ink's onFrame short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
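The ring-buffer math from the fpsStore bullet, sketched (the real module
publishes through a nanostore atom; `publish` below stands in for it):

```ts
// 30 timestamps give 29 intervals, so fps = 29 / elapsed-seconds.
const FRAME_WINDOW = 30;
const frameTimes: number[] = [];
let totalFrames = 0;

function publish(sample: { fps: number; lastFrameMs: number; frames: number }): void {
  // atom.set(sample) in the real store
}

export function trackFrame(durationMs: number): void {
  const now = performance.now();
  frameTimes.push(now);
  if (frameTimes.length > FRAME_WINDOW) frameTimes.shift();
  totalFrames++;
  if (frameTimes.length < 2) return;               // need at least one interval
  const elapsedS = (now - frameTimes[0]) / 1000;
  publish({
    fps: (frameTimes.length - 1) / elapsedS,
    lastFrameMs: durationMs,
    frames: totalFrames,
  });
}
```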
Usage:
HERMES_TUI_FPS=1 hermes --tui
# bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.
Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx. All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, and the
trackpad-burst signature are theirs. This file is a mechanical
port into our module structure.
What it does:
precision click (>500ms gap) 1 row/event (deliberate scan)
sustained mouse (40-200ms) 2-6 rows (decay curve)
detected wheel bounce ramps to 15 (sticky wheel-mode)
trackpad flick (5+ <5ms) 1 row/event (burst detect)
direction reversal reset to base
Two implementation paths:
* native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
window-ramp + optional wheel-mode curve triggered by detected
encoder bounce. SGR proportional reporting handled via the
burst-count guard.
* xterm.js (VS Code / Cursor / browser terminals) — pure
exponential-decay curve with fractional carry. Events arrive
1-per-notch with no pre-amplification, so the curve is more
aggressive.
Selected at construction via isXtermJs() from @hermes/ink (now
exported). Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).
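A rough sketch of the xterm.js-side decay curve (constants here are
illustrative, not the ported tuning values):

```ts
// Exponential-decay accel with fractional carry: dense event trains earn a
// larger step, sparse clicks decay back toward 1 row, direction flips reset.
const BASE_ROWS = 1;
const MAX_ROWS = 6;
const DECAY_MS = 120;   // how quickly accel bleeds off between events

let accel = 0;          // accumulated boost above BASE_ROWS
let frac = 0;           // fractional rows carried between events
let lastTs = 0;
let lastDir = 0;

export function rowsForWheelEvent(direction: 1 | -1, nowMs: number): number {
  const gap = nowMs - lastTs;
  lastTs = nowMs;
  if (direction !== lastDir) {                     // reversal: reset to base
    accel = 0;
    frac = 0;
    lastDir = direction;
  }
  accel = accel * Math.exp(-gap / DECAY_MS) + 0.5; // decay, then bump
  const exact = Math.min(BASE_ROWS + accel, MAX_ROWS) + frac;
  const rows = Math.floor(exact);
  frac = exact - rows;                             // carry the fraction forward
  return rows * direction;
}
```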
13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.
Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks. Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.
Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.
Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.
Result: definitively NO. Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height. Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.
Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):
metric fullscreen inline delta
patches_total 28,864 1,111,574 +3751%
writeBytes_total 42 KB 1.6 MB +3881%
fps_throughput 15.8 fps 1.75 fps -89%
frames 179 18 -90%
gap_p50_ms 17 (~60fps) 726 (~1fps) +4170%
yoga_p99 34 ms 405 ms +1083%
renderer_p99 14 ms 169 ms +1062%
flickers 0 5 offscreen —
This is actually the cleanest data we've gotten so far:
* AlternateScreen is LOAD-BEARING for perf — its viewport height
constraint is what lets useVirtualHistory's culling work. No
constraint → ScrollBox grows unbounded → every fiber mounts.
* The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
frames. Our terminal-write hypothesis from last session was
wrong: the bottleneck is React + Yoga, not the wire.
* Doing proper inline mode (non-virtualized transcript in
scrollback, composer pinned below) is not a flag flip — it's a
different UI architecture. Leaving this flag in so anyone
re-running the experiment gets the same numbers, but not
building the architecture until we're sure the perf win is
worth the UX loss (it probably isn't — the fullscreen + virt
path is the one we should optimize, not replace).
Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:
- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
CDP scripting via chrome-remote-interface, attaching to running Node
processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
CPU profiles + heap snapshots.
- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
(with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
remote/attach, remote-pdb as the agent-friendly alternative to DAP,
recipes for tui_gateway/_SlashWorker/subprocess debugging.
Before: change code → build → run profile → manually compare to
mental model of last run. After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly. Ctrl+C to stop.
Also added:
--save LABEL saves metrics snapshot to /tmp/perf-<LABEL>.json
--compare LABEL diffs the current run vs that snapshot
--extra-flag X pass-through to node dist/entry.js (prepping for
--no-fullscreen below)
key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings. format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).
Run-to-run noise on static code is ~5-15% on most metrics — big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.
Usage during the next perf session:
# one-shot with a baseline for later comparison
scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel
# after porting the wheel handler
scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel
# continuous iteration
scripts/profile-tui.py --seconds 6 --hold wheel_up --loop
Adds four fields to FrameEvent.phases and the matching profile
summary:
optimizedPatches post-optimize patch count (what's actually
written to stdout; the .patches field is
pre-optimize)
writeBytes UTF-8 byte count of the write this frame
backpressure true when Node's stdout.write returned false
(Writable buffer full — outer terminal can't
keep up)
prevFrameDrainMs end-to-end drain time of the PREVIOUS frame's
write, captured from stdout.write's 2-arg
callback. Reported on the next frame so the
measurement reflects "time until OS flushed
the bytes to the terminal fd", not "time until
queued in Node".
writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback. Only attached on TTY with
diff; piped/non-TTY stdout bypasses flow control so the callback
would fire synchronously anyway.
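A sketch of the drain measurement (names illustrative; the real code lives
inside writeDiffToTerminal):

```ts
// stdout.write(chunk, cb) fires cb once the chunk has drained to the terminal
// fd, and its boolean return flags backpressure (Writable buffer full).
// The drain time is stashed and reported on the NEXT frame's FrameEvent.
let prevFrameDrainMs: number | undefined;

export function writeFrame(ansi: string): { bytes: number; backpressure: boolean } {
  const bytes = Buffer.byteLength(ansi, "utf8");
  const start = performance.now();
  const ok = process.stdout.write(ansi, () => {
    prevFrameDrainMs = performance.now() - start;  // picked up by the next frame
  });
  return { bytes, backpressure: !ok };
}

export function takePrevFrameDrainMs(): number | undefined {
  const value = prevFrameDrainMs;
  prevFrameDrainMs = undefined;
  return value;
}
```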
Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):
patches total 28,888
optimized total 16,700 (ratio 0.58 — optimizer cuts ~42%)
writeBytes 42 KB / 10s = 4.2 KB/s throughput
drainMs p50 0.14 ms terminal accepts bytes instantly
drainMs p99 0.85 ms
backpressure 0% of frames
This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s. The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel.
The placeholder → microtask-upgrade pattern did not reduce renderer
p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse). Each fresh
row still pays the Md cost — just on a follow-up commit instead of
inline — and the follow-up commit shows up as a second heavy frame
a few ms later.
The real bottlenecks turned out to be:
1. wheel step too large (fixed in 7ca16eea)
2. outer terminal ANSI parse throughput (diagnosing next)
3. React commit frequency during hold-scroll (needs coalescing)
None of which DeferredMd addresses. Clearing the complexity so the
next experiments land on a simpler substrate.
User observation: "it doesn't scroll line by line/row by row."
They were right. Two places hardcoded big deltas:
1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
Each wheel event scrolled 6 rows. A mechanical wheel notch emits
3-5 events → 18-30 rows per click, which visually teleports past
content instead of smooth-scrolling it. Drop to 1. Trackpads
emit 50-100 events per flick — at step=1 that's still a fast flick
(a whole viewport in one flick) but each intermediate frame is
visible. Porting claude-code's wheel accel state machine is the
right next step if this feels sluggish on precision scrolls.
2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
Full-viewport jumps replace the entire screen — no visual
continuity, can't scan content — AND land right at Ink's fast-path
threshold (`delta < innerHeight`), which disqualifies the DECSTBM
blit on every press. Half-viewport keeps 50% continuity AND
drops well under the threshold. Two presses still cover the same
total distance.
Profiled against the 1106-msg session, holding the key at 30Hz for
6s:
wheel_up (step 6 → 1):
frames 142 → 163 (+15%)
throughput 10.7 → 15.8 fps (+48%)
patches tot 53018 → 36562 (-31%)
gap p50 5ms → 16ms (actual rendering ~60fps now)
<16ms frames 93 → 76
16-33ms 82 → 76
hitches 3 → 1
pageUp (viewport-2 → viewport/2):
throughput 10.7 → 9.5 fps (same ballpark — smaller delta × same
event rate = less total scroll)
Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing. With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.
Tests/type-check/build clean; 352 tests pass.
Adds DeferredMd — a wrapper around <Md> that renders a lightweight
<Text> placeholder on first mount and upgrades to the full markdown
subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine
mounts during PageUp hold run our markdown tokenizer + syntax
highlighter synchronously, producing the 63-112ms renderer spikes
profiled earlier. A plain <Text> placeholder only needs Yoga to wrap
the pre-stripped string (no tokenizer, no highlight), then the Md
subtree builds in a follow-up React commit.
Upgrade cache: once a (theme, compact, text) tuple has been upgraded,
a WeakMap-keyed Set remembers it so remounts (scroll-out then
scroll-back) mount straight into <Md> — no placeholder round-trip.
WeakMap on theme means palette swaps re-upgrade naturally.
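The placeholder → upgrade shape, sketched (simplified — the real component
also keys the upgrade cache by compact mode and renders the pre-stripped
text in the placeholder):

```tsx
import React, { useEffect, useState } from "react";
import { Text } from "ink";

// WeakMap keyed by the theme object: palette swaps naturally re-upgrade.
const upgraded = new WeakMap<object, Set<string>>();

export function DeferredMd({ theme, text, Md }: {
  theme: object;
  text: string;
  Md: React.ComponentType<{ text: string }>;
}) {
  const seen = upgraded.get(theme) ?? new Set<string>();
  upgraded.set(theme, seen);
  // Remounts of already-upgraded content mount straight into <Md>.
  const [ready, setReady] = useState(seen.has(text));

  useEffect(() => {
    if (ready) return;
    // First paint is a cheap <Text>; the markdown tokenizer + highlighter run
    // in a follow-up commit queued on a microtask.
    queueMicrotask(() => {
      seen.add(text);
      setReady(true);
    });
  }, [ready, text, seen]);

  return ready ? <Md text={text} /> : <Text>{text}</Text>;
}
```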
Honesty note: profiling under hold-PageUp showed this didn't reduce
renderer p99 measurably — the upgrade commit just pays the Md cost on
a follow-up frame instead of inline. The bigger bottleneck turned out
to be React commit frequency (3.5 commits/sec during 30Hz scroll
input, with 200ms+ silent gaps between commits dominating perceived
FPS), which this change doesn't address. Keeping the deferred path
anyway because:
1. It's correct and tested — no regressions across 352 tests
2. Defensive for pathological fresh-mount cases (giant code blocks,
wide tables) that aren't in the current profile fixture
3. Pairs naturally with useVirtualHistory's useDeferredValue to keep
React's concurrent scheduler able to interrupt upgrade commits
If the follow-up perf investigation (terminal write throughput / patch
volume / commit frequency) shows DeferredMd is net-neutral-or-worse in
practice, this can be reverted with a one-line swap back to <Md> in
messageLine.tsx:115.
Companion to the streaming 2-column fix in 7242361a — these two
touched messageLine.tsx together so they land as a pair.
StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md>
children. Each <Md> returns a <Box flexDirection="column">, but its
parent in messageLine.tsx (line 169) is `<Box width={...}>` with no
flexDirection, which Ink defaults to 'row'. So during streaming the
two column boxes rendered side-by-side, producing the visible "tokens
jumble into two columns until it fixes itself" bug — the "fix" was
message.complete flipping isStreaming→false, which swaps the
StreamingMd subtree for a single DeferredMd/Md child (no siblings → row
direction is harmless).
Wrap the two <Md> siblings in a flexDirection="column" Box so they
stack. Localized fix so the non-streaming path (single-child, works
fine in a row parent) is untouched.
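The fix in sketch form (the `<Box width={...}>` parent in messageLine.tsx
stays untouched; only the streaming branch gains the column wrapper):

```tsx
import React from "react";
import { Box } from "ink";

// Before: <><Md .../><Md .../></> — two column Boxes laid out side-by-side by
// the row-direction parent. After: wrap the siblings so they stack.
export function StreamingMd({ stablePrefix, unstableTail, Md }: {
  stablePrefix: string;
  unstableTail: string;
  Md: React.ComponentType<{ text: string }>;
}) {
  return (
    <Box flexDirection="column">
      <Md text={stablePrefix} />
      <Md text={unstableTail} />
    </Box>
  );
}
```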
Reported by user:
> "tokens streaming... going into 2 columns randomly and jumbling
> together until it fixes itself"
No test changes — findStableBoundary tests still pass (the layout
change is parent-structural, not in the boundary logic). Build clean,
tsc clean, 352 tests pass.
Adds scrollFastPathStats counters to render-node-to-output.ts: captures
every time a ScrollBox's DECSTBM scroll hint is generated, records
whether the fast path took it (blit+shift from prevScreen) or declined,
and why. Exposed through hermes-ink's public exports and snapshotted on
every FrameEvent so the profiler harness can correlate decline reasons
with the actual patch/renderer cost per frame.
This is pure observation — no behaviour change. Preparing for the
virtual-history rewrite: the hypothesis was that our topSpacer/
bottomSpacer scheme disqualifies every scroll via heightDelta
mismatch, but the data shows the fast path is actually taken on most
scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the
remaining steady-state renderer cost is Yoga tree traversal, not
the per-frame full redraw I initially suspected.
Declines that do happen correlate with React commits that changed the
mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer
cases the virtualization rewrite still needs to address.
No test diffs — instrumentation-only. Build verified: `tsc --noEmit`
and the full `npm run build` compiler post-pass both pass cleanly.
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.
perfPane.tsx:
Wires ink's onFrame callback (already plumbed through the fork) into
the same perf.log as the React.Profiler samples. Captures per-phase
timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
optimize, stdout write) plus yoga counters (visited/measured/cache-
Hits/live) and patch counts per frame. Events are tagged
{src: 'react'|'frame'} so jq can split them. logFrameEvent is
undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
the callback.
entry.tsx:
Passes logFrameEvent into render().
types/hermes-ink.d.ts:
Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
type-checks against the plumbed-through ink option.
scripts/profile-tui.py:
New harness. Launches the built TUI under a PTY with the longest
session in state.db resumed, holds PageUp/PageDown/etc at a
configurable Hz for N seconds, then parses perf.log and prints
per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
beyond stdlib. Exit 2 if nothing was captured (wiring broken).
Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
- Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
- 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
- One pathological 97ms frame with yoga measuring 70,415 text cells
and Yoga visiting 225k nodes — the cold-unmeasured-region hit
- Ink's scroll fast-path (DECSTBM blit from prevScreen) is
disqualified because our spacer-based virtual history doesn't
keep heightDelta in sync with scroll.delta, so every PageUp step
falls through to a full 2000-4800 patch re-render instead of ~40.
Split in-flight assistant text at the last stable block boundary so only
the unclosed tail re-tokenizes per stream delta. Previously the full
text was rendered as plain <Text> during streaming and only flipped to
<Md> at message.complete — cheap per delta but loses live markdown
formatting.
New StreamingMd component holds a monotonically-growing stablePrefix
in a ref (idempotent under StrictMode double-render), renders it as
one <Md> that memoizes across deltas, and renders the unstable suffix
as a second <Md> that re-parses on each delta. Cost per delta drops
from O(total length) to O(unstable length).
findStableBoundary walks back to the last "\n\n" outside an open
fenced code block — splitting inside an open fence would orphan the
opener and break highlighting in the prefix.
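The boundary walk in sketch form (fence detection reduced to toggling on
fence-opener lines; the real version shares the line-based tokenizer):

```ts
// Return the index just past the last "\n\n" that is NOT inside an open
// fenced code block, so the stable prefix never orphans a fence opener.
// Returns 0 when no safe boundary exists yet.
export function findStableBoundary(text: string): number {
  let boundary = 0;
  let inFence = false;
  let offset = 0;
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (/^\s*`{3}/.test(line)) inFence = !inFence;
    // A blank line outside any open fence marks a stable block boundary.
    if (!inFence && i > 0 && line.trim() === "") {
      boundary = offset + line.length + 1;   // just past the trailing newline
    }
    offset += line.length + 1;
  }
  return Math.min(boundary, text.length);
}
```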
Adapted from claude-code's src/components/Markdown.tsx:186 but built
on our line-based tokenizer instead of marked.lexer. 9 new tests cover
fence balance, boundary walk, and empty input.
Part of the --tui perf audit (see audit #7).
Slack's modern composer sends messages with a 'blocks' array that
contains rich_text elements. When a user forwards or quotes another
message, the quoted content shows up in the rich_text_quote children
of that array — and is NOT included in the plain 'text' field. The
agent saw only the lossy plain text and was blind to forwarded /
quoted content. Same story for link unfurl previews (Notion, docs,
GitHub, etc.) which Slack puts in the 'attachments' array.
Two fixes in the inbound handler:
1. _extract_text_from_slack_blocks walks rich_text / rich_text_quote /
rich_text_list / rich_text_preformatted trees and renders readable
text ('> quoted', '• bullet', code fences), dedupes against the
plain text field, and appends the extracted content so the agent
sees everything.
2. Link unfurl / attachment preview extraction reads title, url,
body, and footer from the 'attachments' array and appends a
'📎 [title](url)\n body\n _footer_' section per preview.
Skips is_msg_unfurl to avoid echoing our own Slack replies back.
Routing is careful not to trust augmented text: mention gating
(is_mentioned) and slash-command detection both run against the
original 'text' field, so forwarded content containing '<@bot>' or
'/deploy' in a quote can't trick the bot into responding in a
channel it shouldn't or classifying a normal message as a command.
Adjustment from original PR: dropped _serialize_slack_blocks_for_agent,
which inlined a redacted JSON dump of non-rich_text blocks (section,
accessory, actions, etc.) — the agent would see the raw Block Kit
structure for UI-heavy alerts. It added up to 6000 characters to the
prompt context on every qualifying message with no opt-out. The
rich_text extraction and attachment unfurls cover the common bug-fix
case (quoted/forwarded content + link previews) without the prefill
tax. If a user needs block inspection later, it can return as a
config opt-in.
Also updates the Slack platform notes in session.py to accurately
describe what the gateway inlines.
After #14798 made cron honor per-platform `hermes tools` config, the
`_DEFAULT_OFF_TOOLSETS` filter silently stripped `homeassistant` from
cron jobs for users who'd been relying on the previous blanket toolset.
Norbert's HA cron reports regressed as a result.
The HA toolset is already runtime-gated by its `check_fn` (requires
HASS_TOKEN to register any tools). When HASS_TOKEN is set the user has
explicitly opted in — `_DEFAULT_OFF_TOOLSETS` adds nothing in that case,
so stop double-gating and restore HA for cron / cli / other platforms
without an explicit saved toolset list.
moa and rl stay off by default (original #14798 goal preserved).
Fixes HA cron regression reported by Norbert.
HindsightEmbedded.close() delegates to its sync client.close(). When Hermes
created/used that client on the shared async loop, closing it from the main
thread raises 'attached to a different loop' before aiohttp releases the
session — so the ClientSession / TCPConnector leak past provider teardown.
Close the embedded inner async client on the shared loop first via
_run_sync(inner_client.aclose()), then let the wrapper's sync close()
do its daemon/UI bookkeeping.
Salvage of #14605: test placement rebased — appended TestShutdown class
after TestSharedEventLoopLifecycle (which landed on main after the PR was
written). Original author attribution preserved.
Translate Slack attachment failures into actionable user-facing notices
instead of generic download errors. When a scope/auth/permission issue
breaks attachment processing, the user sees:
[Slack attachment notice]
- Slack attachment access failed for photo.jpg. Missing scope:
files:read. Update the Slack app scopes/settings and reinstall
the app to the workspace.
Two helpers do the translation:
_describe_slack_api_error — handles SlackApiError responses
(missing_scope, invalid_auth, file_not_found, access_denied, etc.)
_describe_slack_download_failure — handles httpx.HTTPStatusError
(401/403/404) and Slack-returns-HTML-sign-in fallbacks
Wired into three existing call sites:
- the Slack Connect files.info path (PR #11111) so scope errors
surface instead of being logged as generic "files.info failed"
- the image, audio, and document download paths so 401/403 and
HTML-body responses translate into actionable notices
Adjustment from original PR: dropped _probe_slack_file_access_issue,
the proactive pre-download files.info probe. It added one extra
Slack API call per attachment even on healthy ones, and overlapped
with the existing files.info call from PR #11111. The post-failure
translation path covers the same user-facing diagnostic value
without the per-message tax.
Also documents files:read scope more prominently in the Slack setup
guide and troubleshooting table.
Contributed back from https://github.com/xinbenlv/zn-hermes-agent.
Closes #7015.
Co-authored-by: xinbenlv <zzn+pa@zzn.im>
Background review fork now inherits session_id, credential_pool, and
status_callback from the parent (added in #16099 after this PR was
written). Extend the bare-agent helper so the regression test keeps
reaching the cleanup assertions instead of failing in the runtime
resolver.
Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>