hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 06:51:16 +08:00

Author	SHA1	Message	Date
Teknium	34eb1aaa9a	fix(update): use npm ci to stop rewriting package-lock on every update (#16295 ) `npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`) silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.), leaving the working tree dirty after every `hermes update`. The next update then detects the dirty lockfile and stashes it — producing a trail of hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json, and root package-lock.json. Switch to `npm ci` (strict, lockfile-preserving) via a new `_run_npm_install_deterministic` helper that falls back to `npm install` when the lockfile is missing or out of sync (WIP forks). Verified locally: all three lockfiles stay byte-identical after the real _build_web_ui / _update_node_dependencies run twice back-to-back. Fallback path tested with a deliberately out-of-sync lockfile and a no-lockfile case.	2026-04-26 18:51:31 -07:00
Teknium	ab6879634e	yuanbao platform (#16298 ) Co-authored-by: loongzhao <loongzhao@tencent.com>	2026-04-26 18:50:49 -07:00
Teknium	5eb6cd82b2	fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296 ) Four independent session-UX bugs reported by an external user (#16294). /save wrote hermes_conversation_<ts>.json to CWD — invisible to 'hermes sessions browse' and easy to lose. Snapshots now write under ~/.hermes/sessions/saved/ and the command prints the absolute path plus a 'hermes --resume <id>' hint for the live DB-indexed session. 'hermes sessions browse' default --limit raised from 50 to 500. With the old ceiling, users with moderately long histories saw only the most recent 50 rows and assumed older sessions had been lost. TUI session.list (`/resume` picker) switched from a hardcoded allow-list of 13 gateway source names to a deny-list of just { 'tool' }. Sessions tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and any newly-added platform now surface. Default limit 20 → 200. ollama-cloud provider setup passes force_refresh=True to fetch_ollama_cloud_models() so a user entering their API key sees the fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead of waiting up to an hour for the disk cache TTL to expire. Closes #16294.	2026-04-26 18:49:48 -07:00
Teknium	7e3c8a31f0	feat(skills/airtable): tailor skill to Hermes idioms + expand cookbook Expand the airtable skill from bare CRUD to a full Hermes-shaped cookbook matching the linear/notion neighbors, and trim the description to fit the 60-char system-prompt cutoff. Hermes-specific additions: - Explicit 'use the terminal tool with curl — not web_extract or browser_navigate' guidance, matching the same note in linear. - Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the subprocess automatically via env_passthrough, so curl calls don't need to re-export it. - Prefer 'python3 -m json.tool' (always present) over jq (optional) for pretty-printing, with -s on every curl to keep output clean. - Read-before-write workflow that resolves record IDs via filterByFormula instead of guessing. Cookbook expansion (new vs original): - Field-type reference table (text, select, multi-select, attachment, linked record, user) with the exact write-shape Airtable expects. - typecast flag for auto-coercing values / auto-creating select options. - performUpsert PATCH for idempotent sync by merge field. - Batch create/delete endpoints (10-record cap per call). - Sort + fields query params with URL-encoding (%5B / %5D). - Named-view query that applies saved filter/sort server-side. - Full pagination loop template (while loop with offset). - Common filterByFormula patterns (exact match, contains, AND/OR, date comparison, NOT empty). - Rate-limit backoff guidance (Retry-After header, per-base budget). - Airtable error-code reference (AUTHENTICATION_REQUIRED, INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND, INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to user-actionable fixes instead of just retrying. Also: description trimmed from 183 chars (truncated to 60 in system prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars that render whole: 'Airtable REST API via curl. Records CRUD, filters, upserts.' Catalog row updated to match. SKILL.md grew from 115 to 228 lines — still under the 500-line soft cap and below the linear skill (297 lines) which serves the same role for GraphQL.	2026-04-26 18:45:15 -07:00
Teknium	0bef0b9416	chore: docs + attribution for airtable skill - scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the check-attribution CI job and release notes credit Soynchu correctly. - website/docs/reference/skills-catalog.md: add the airtable row to the productivity bundled-skills table.	2026-04-26 18:45:15 -07:00
Teknium	55e9329ee6	feat(config): register bundled-skill API keys in OPTIONAL_ENV_VARS Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY to OPTIONAL_ENV_VARS so: - They persist to ~/.hermes/.env via save_env_value like every other key Hermes knows about, instead of being ad-hoc variables the user has to hand-edit the dotfile for. - load_env() / reload_env() populate os.environ from .env on every startup — the user sets the key once, skills keep working across restarts without losing access. - hermes setup / hermes config show surface them as known optional vars with the correct signup URL (linear.app/settings/api, airtable.com/create/tokens, etc.). These four entries use category="skill" (new) rather than "tool". tools/environments/local.py auto-adds every category=tool/messaging entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough from leaking provider credentials into the execute_code sandbox (GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the point is for the agent's subprocess to see them so curl can read Authorization headers — so they must be outside the blocklist. The new category is inert for that check. All four entries are advanced=True: they show up in 'hermes config' and 'hermes status' displays, but do not nag users who have never touched those skills during setup checklists. E2E verified: save_env_value → reload_env → os.environ populated → skill_view reports setup_needed=False → env_passthrough registers the key for subprocess inheritance.	2026-04-26 18:45:15 -07:00
Teknium	0d4247d9bf	fix(skills/airtable): use .env credential pattern matching notion/linear Convert the airtable skill from 'skills.config.airtable.api_key' (config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars: [AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled skill that authenticates with an API token. Why the original shape was wrong: - metadata.hermes.config is for non-secret skill settings (paths, preferences) per references/skill-config-interface.md. Storing a bearer token under skills.config.* also triggered the documented 'hermes config migrate' nag-on-every-run problem. - The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't read skills.config.airtable.api_key anyway — it's a yaml path, not an env var. Follow-up polish on the same pass: - Added version/author/license frontmatter to match notion/linear. - Added prerequisites.commands: [curl]. - Setup section now specifies the PAT format (pat...) that replaced legacy 'key...' API keys in Feb 2024, plus the three required scopes (data.records:read/write, schema.bases:read) and the per-base Access list requirement. - Clarified PATCH vs PUT and pagination (100 records/page cap). - Swapped verification from 'hermes -q ...' (non-deterministic) to a curl /v0/meta/bases call that returns a verifiable HTTP status code.	2026-04-26 18:45:15 -07:00
Sonoyunchu	c997183f53	feat(skills): add bundled Airtable productivity skill	2026-04-26 18:45:15 -07:00
Teknium	f01e4402a9	chore(release): map georgeglessner in AUTHOR_MAP	2026-04-26 18:43:57 -07:00
George Glessner	5b5a53a155	fix(cli): check hermes_cli/web_dist/ not web/dist/ for build staleness _web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so the build output lands in hermes_cli/web_dist/, never in web/dist/. The sentinel was therefore always missing → _web_ui_build_needed always returned True → npm install + Vite build ran on every startup → OOM on low-memory VPS persisted unchanged. Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so the sentinel points to the actual build output directory. Fixes #14898	2026-04-26 18:43:57 -07:00
Teknium	90c84c6dba	fix(gateway): unblock update subprocess on recognized-command bypass When the gateway intercepts a pending /update prompt and the user sends a recognized slash command (/new, /help, ...), the command now dispatches normally AND the detached update subprocess is unblocked by writing a blank .update_response. _gateway_prompt reads '' → strips → returns the prompt's default (typically a safe 'n' / skip), so the update process exits cleanly instead of blocking on stdin until the 30-minute watcher timeout. Also clears _update_prompt_pending[session_key] on this path so stray future input for the same session isn't re-intercepted. Extends PR #15849 with tests for the new cancel-write + a regression test pinning the legacy behavior of unrecognized /foo slash commands still being consumed as the response.	2026-04-26 18:39:44 -07:00
Yukipukii1	bdaf56a94d	fix(gateway): bypass slash commands during pending update prompts	2026-04-26 18:39:44 -07:00
Brooklyn Nicholson	b1c49d5e73	chore(tui): /clean recent perf work — KISS/DRY pass 24 files, -319 LoC. Behaviour preserved, 369/369 tests green. - hermes-ink caches: shared lruEvict helper for the four parallel LRU caches (stringWidth, wrapText, sliceAnsi, lineWidth); touch-on-read stays inlined per cache; tightened output.ts skip-slice fast path. - wheelAccel: trimmed provenance header, collapsed env parsing, ternary dispatch in computeWheelStep. - perfPane: folded ensureLogDir into once-flag, spread-with-overrides for fastPath/phases instead of full rebuilds. - env: extracted truthy() (used 4×). - virtualHeights: collapsed user/diff/slash height bumps; trail+todos estimate. - useInputHandlers: scrollIdleTimer cleanup on unmount, ?? undefined shorthand. - useMainApp: dropped dead liveTailVisible IIFE and liveProgress indirection. - appLayout, markdown, messageLine, entry: vertical rhythm, dropped narration comments, inlined one-shot vars. - fix: empty catch blocks → /* best-effort */ for no-empty lint.	2026-04-26 20:38:47 -05:00
Teknium	bdc1adf711	chore(release): map haru398801, badgerbees, xnbi in AUTHOR_MAP	2026-04-26 18:33:35 -07:00
Badgerbees	55f212a7a2	fix(slack): honor NO_PROXY for Slack transport	2026-04-26 18:33:35 -07:00
Xnbi	7eaad06a87	fix(gateway): default Slack tool_progress to off Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663). - Built-in slack default: off; other tier-2 platforms unchanged. - Adjust /verbose isolation test for off to new cycle. - Migration tests: read/write config.yaml as UTF-8 (Windows locale).	2026-04-26 18:33:35 -07:00
haru398801	a01e767b24	fix(gateway): respect config.yaml slack.enabled when SLACK_BOT_TOKEN env var is set Previously, setting SLACK_BOT_TOKEN in .env would unconditionally enable the Slack gateway adapter regardless of `slack.enabled: false` in config.yaml. This caused spurious "SLACK_APP_TOKEN not set" errors when the token was used only by skills (e.g. cron jobs that send Slack messages) rather than for the Hermes messaging gateway. Now, enabled: false in config.yaml is respected — the token is stored so skills can still use it, but the gateway adapter is not activated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 18:33:35 -07:00
hharry11	fd474d0f00	fix(gateway): avoid cross-user mirror writes in per-user group sessions	2026-04-26 18:31:24 -07:00
Teknium	cd2aee36ca	test(sessions): wire sessions_dir through auto-prune + file-cleanup regression tests - TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files when sessions_dir is passed, preserves them when it isn't (backward- compat), and never touches active-session files during prune. - FakeDB helpers in test_sessions_delete.py accept **kwargs so they don't break when delete_session signature gains sessions_dir.	2026-04-26 18:31:07 -07:00
Yang Zhi	3b60abb6bb	fix(sessions): delete on-disk transcript files during prune and delete (#3015 ) `delete_session()` and `prune_sessions()` only removed SQLite records, leaving .json/.jsonl transcript files on disk forever. Over time this causes unbounded disk growth (~27MB/day observed). Changes: - Add `_remove_session_files()` static helper that cleans up `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json` - `delete_session()` accepts optional `sessions_dir` param and removes files for the deleted session and its children - `prune_sessions()` accepts optional `sessions_dir` param and removes files for all pruned sessions after the DB transaction - Wire up CLI `hermes sessions delete` and `hermes sessions prune` to pass `sessions_dir` - File cleanup is best-effort (OSError silenced) so DB operations are never blocked by filesystem issues - Fully backward-compatible: `sessions_dir=None` (default) preserves existing behavior	2026-04-26 18:31:07 -07:00
Wysie	0ba6471dd1	fix: recover hindsight embedded daemon after idle shutdown	2026-04-26 18:29:11 -07:00
Yukipukii1	7317d69f19	fix(security): treat quoted false as false in browser SSRF guards	2026-04-26 18:27:13 -07:00
Teknium	2a0fc97c76	chore(release): map mewwts in AUTHOR_MAP	2026-04-26 18:25:41 -07:00
mewwts	8fb861ea6e	feat(gateway/slack): support channel_skill_bindings Extends the existing channel_skill_bindings mechanism (previously Discord-only) to Slack, so a channel or DM can auto-load one or more skills at session start without relying on the model's skill selector for every short reply. Motivation: Mats's German flashcards DM pushes a cron-driven card 5x/day; he responds with one-word guesses like 'work'. Previously each reply required the main agent to decide whether to load german-flashcards (full opus turn just to pick a skill). With the binding configured per Slack channel, the skill is injected at session start and grading runs directly. Changes: - Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills into gateway.platforms.base (now shared across adapters). - DiscordAdapter._resolve_channel_skills delegates to the shared helper (behavior preserved — existing test suite still passes unchanged). - SlackAdapter: resolve channel_skill_bindings on each message and attach auto_skill to MessageEvent. gateway/run.py already handles auto-skill injection on new sessions; this just wires Slack through it. - gateway/config.py: accept channel_skill_bindings in slack: block of config.yaml (was Discord-only). - Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases covering DM/thread/parent resolution, single-vs-list skills, dedup, malformed entries. Discord suite unchanged. - Docs: add 'Per-Channel Skill Bindings' section to Slack user guide. Config example: slack: channel_skill_bindings: - id: "D0ATH9TQ0G6" skills: ["german-flashcards"]	2026-04-26 18:25:41 -07:00
Teknium	635253b918	feat(busy): add 'steer' as a third display.busy_input_mode option (#16279 ) Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt).	2026-04-26 18:21:29 -07:00
Teknium	87477756fd	chore(release): map Ito-69 in AUTHOR_MAP	2026-04-26 18:21:20 -07:00
Ivan Tonov	930494d687	fix(cron): reap orphaned MCP stdio subprocesses after each tick MCP stdio servers are spawned via the SDK's stdio_client, which on Linux uses start_new_session=True (setsid). When a cron job is cancelled mid-way (timeout, agent finish, exception), the subprocess often escapes the SDK's teardown and survives as a session leader. Because setsid() detaches the child from the gateway's process group / cgroup tree, systemd does not reap it on service restart either — so every cron tick that touches an MCP tool leaks a dangling server process. Fix: * tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session context in try/finally. On any exit path (clean, exception, cancellation), PIDs still alive are moved from the active _stdio_pids set into a new _orphan_stdio_pids set. Orphan detection is done via os.kill(pid, 0) — a cheap liveness probe that never signals the target. * tools/mcp_tool.py — _kill_orphaned_mcp_children gains an include_active=False flag. Default behaviour now only reaps the orphan set so concurrent sessions (other parallel cron jobs or live user chats) are never disrupted. The existing shutdown path passes include_active=True to keep the previous "kill everything" semantics after the MCP loop is stopped. * cron/scheduler.py — the cleanup hook is moved from run_job()'s finally (which would race with parallel siblings after #13021) into tick() after the ThreadPoolExecutor has joined every future. At that point there are no in-flight sessions from this tick, so sweeping the orphan set is always safe. Net effect: zero regression for healthy sessions, and orphan MCP servers no longer accumulate between gateway restarts. Made-with: Cursor	2026-04-26 18:21:20 -07:00
Teknium	5db6db891c	chore(release): map ghostmfr in AUTHOR_MAP	2026-04-26 18:20:17 -07:00
ghostmfr	e818ec520a	fix(slack): harden attachment handling Multiple overlapping Slack attachment improvements: 1. Upload retry with backoff on transient errors (429, 5xx, connection reset, rate_limited, service unavailable). New _is_retryable_upload_error helper covers three upload paths: _upload_file, send_video, send_document. Up to 3 attempts with 1.5s * attempt backoff. 2. Thread participation tracking: successful file uploads now add the thread_ts to _bot_message_ts, mirroring how text replies are tracked. This lets follow-up thread messages auto-trigger the bot (same engagement rules as replied threads). 3. Thread metadata preservation in the image redirect-guard fallback (send_image → send text fallback) and in two gateway.run.py send paths (image + document fallback calls). 4. HTML response rejection in _download_slack_file_bytes. Parallels the existing check in _download_slack_file. Guards against Slack returning a sign-in / redirect page as document bytes when scopes are missing, so the agent doesn't get HTML-as-a-PDF. 5. File lifecycle event acks (file_shared / file_created / file_change). These events arrive around snippet uploads. Acking them silences the slack_bolt 'Unhandled request' 404 warnings without changing behavior. 6. Post-loop message type classification so a mixed image+document upload classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT. Previously, the per-file classification in the inbound loop could be overwritten unpredictably. 7. Expanded text-inject whitelist in inbound document handling to cover .csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so snippets and config files are directly visible to the agent, not just cached as opaque uploads. Paired with new MIME entries in SUPPORTED_DOCUMENT_TYPES in base.py. Squashed from two commits in #11819 so the single commit carries the contributor's GitHub attribution (the original commits were authored under a local dev hostname).	2026-04-26 18:20:17 -07:00
Brooklyn Nicholson	527ac351b4	fix(tui): address Copilot review comments - stringWidth: true LRU on cache hit (touch-on-read via delete+set) so hot strings stay resident under long sessions; was insertion-order FIFO before - virtualHeights: include todos, panel sections, and intro version in messageHeightKey so height-cache reuse correctly invalidates when todo content / panel sections change - virtualHeights: estimate trail+todos rows at todos.length+2 (or 2 collapsed) instead of the generic ~1-line fallback, so initial virtualization offsets are closer to reality - useInputHandlers: clearTimeout on unmount for scrollIdleTimer so pending relaxStreaming() never fires after teardown - render-node-to-output: drop unused declined.noHint counter from scrollFastPathStats; it was always 0 (the "hint missing" branch is outside the diagnostics block) - perfPane / hermes-ink.d.ts: follow the noHint removal - wheelAccel: replace ~/claude-code path comment with generic attribution that doesn't reference a developer-local checkout	2026-04-26 20:07:41 -05:00
Brooklyn Nicholson	b115ea62da	feat(tui): anchor LiveTodoPanel to latest user message row TodoPanel now renders as a child of the most recent user message's virtualized row container, so it visually belongs to that prompt and follows it during scroll. Falls back gracefully when no user message exists yet (panel just doesn't render).	2026-04-26 20:07:29 -05:00
Brooklyn Nicholson	25767513f2	perf(tui): unified Ink cache eviction on memory pressure + session reset Adds an `evictInkCaches(level)` API that prunes the four hot module-level caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with either a half-keep LRU pass or a full clear. Wired into: - memoryMonitor: half-prune on 'high', full drop on 'critical', before the heap dump / auto-restart path. Gives long sessions a shot at recovering RSS instead of hard-exiting. - useSessionLifecycle.resetSession: half-prune so a /new session starts with a half-warm pool and the prior session can resume cheaply. Also: lineWidthCache now uses LRU half-eviction on overflow instead of a full `cache.clear()`, matching the other three caches. Comparison vs claude-code: both forks now share the same `prevScreen` blit + dirty-cascade machinery in render-node-to-output. Their smoothness came from sibling-memo discipline (every chrome pane memo'd so dirty cascade doesn't disable transcript blit) — already in place in our appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd). Alt-screen is not the cause; both use it. The remaining gap was per-row CPU on width/wrap/slice, which the previous commit closed.	2026-04-26 19:41:53 -05:00
Brooklyn Nicholson	c370e2e1e5	perf(tui): cache stringWidth/wrapText/sliceAnsi + skip-slice when line fits clip CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three hot loops in the per-frame render path: Output.get() per-frame walk: 24% total └─ sliceAnsi(line, from, to) per write: 18% total stringWidth(line) chain (cached + JS): 14% total All three were re-doing identical work every frame: same string → same clipped slice → same width. Fixes: 1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path skips the cache (inline scan beats Map.get for short ASCII, the >90% case). String.charCodeAt scan up to 64 chars is cheaper than the regex fallback. 2. Memoize wrapText (4k-entry LRU keyed by maxWidth\|wrapType\|text) — wrapAnsi is pure and the same content reflows identically every frame. 3. Memoize sliceAnsi (4k-entry LRU keyed by start\|end\|str) for the end-defined hot path used by Output.get(). 4. Skip the slice entirely in Output.get() when the line already fits the clip box (startsBefore=false && endsAfter=false). Most transcript lines never exceed their container width, and tokenizing them just to slice (line, 0, width) was pure overhead. This single fast-path drops sliceAnsi from 18% → ~0% in the profile. Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20, SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16 lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS window still render in full so reading-zone behavior is unchanged. Validation, real-user CPU profile, page-up scroll on 11k-line session: Output.get() self-time: 24% → 0.3% sliceAnsi total: 18% → not in top 25 stringWidth family: 14% → ~3% idle: 60.7% → 77.3% Frame timings (synthetic page-up profile harness): dur p95: ~10ms → 4.87ms dur p99: 25ms+ → 12.80ms yoga p99: ~20ms → 1.87ms The remaining CPU in the profile is Yoga layoutNode + React commit, which is the irreducible work for this UI tree size.	2026-04-26 19:28:09 -05:00
Teknium	b16f9d438b	feat(telegram): send fresh finals for stale preview streams (port openclaw#72038) (#16261 ) Ports openclaw/openclaw#72038 to hermes-agent. Telegram's `editMessageText` preserves the original message timestamp, so a long-running streamed reply (reasoning models that take 60+ seconds to finish) would keep the first-token timestamp even after completion. Users can't tell how long a task actually took. When a preview message has been visible for >= 60s (configurable via `streaming.fresh_final_after_seconds`), finalize by sending a fresh message instead of editing in place, then best-effort delete the stale preview. Short previews still edit in place (the existing fast path). Implementation notes adapted from OpenClaw's TypeScript original: - `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 = legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60. - `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and checks it in `_send_or_edit` on `finalize=True`. New helpers `_should_send_fresh_final` + `_try_fresh_final`. - `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)` returning False by default. `TelegramAdapter` implements it via `_bot.delete_message`. - `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`; other platforms ignore the setting (they don't have the stale-edit timestamp problem or edit-then-read works cheaply). - Fallback to normal edit on any fresh-send failure — no user-visible regression if Telegram rate-limits a send or the message is gone. Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py covering short/long previews, config plumbing, delete-support absent, send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig round-trip. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-04-26 17:26:37 -07:00
Brooklyn Nicholson	85e9a23efb	feat(tui): HERMES_TUI_FPS=1 shows live fps counter Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by ink's onFrame callback (so it's the REAL render rate, not a timer). Displays fps, last-frame duration, and total frame count, colored by threshold (green ≥50, yellow ≥30, red below). Implementation: * lib/fpsStore.ts — nanostore atom updated from a trackFrame() sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed. trackFrame is undefined when SHOW_FPS is off so ink's onFrame short-circuits at the optional chain. * components/fpsOverlay.tsx — tiny <Text> subscriber; returns null when SHOW_FPS is off (React skips the subtree entirely). * entry.tsx — composes onFrame from logFrameEvent (dev-perf) and trackFrame (fps) so both flags can coexist. When both are off, onFrame is undefined and ink never attaches the handler. * appLayout.tsx — mounts the overlay as a flex-shrink=0 right- aligned Box below the composer, conditional on SHOW_FPS. Usage: HERMES_TUI_FPS=1 hermes --tui # bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red) Intended as a user-facing diagnostic during the scroll-perf tuning pass — watch the counter drop while holding PageUp to see where frames go silent, without having to run scripts/profile-tui.py in a side terminal. 126 files post-compile with React Compiler; 352 tests still pass.	2026-04-26 17:20:47 -05:00
Brooklyn Nicholson	4395c2b007	feat(tui): port claude-code's wheel accel state machine Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events with an adaptive accel state machine that infers user intent from inter-event timing. Algorithm ported straight from claude-code's src/components/ScrollKeybindingHandler.tsx. All tuning constants, the native/xterm.js path split, the encoder-bounce detection, the trackpad-burst signature → all theirs. This file is a mechanical port into our module structure. What it does: precision click (>500ms gap) 1 row/event (deliberate scan) sustained mouse (40-200ms) 2-6 rows (decay curve) detected wheel bounce ramps to 15 (sticky wheel-mode) trackpad flick (5+ <5ms) 1 row/event (burst detect) direction reversal reset to base Two implementation paths: * native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear window-ramp + optional wheel-mode curve triggered by detected encoder bounce. SGR proportional reporting handled via the burst-count guard. * xterm.js (VS Code / Cursor / browser terminals) — pure exponential-decay curve with fractional carry. Events arrive 1-per-notch with no pre-amplification, so the curve is more aggressive. Selected at construction via isXtermJs() from @hermes/ink (now exported). Per-user tune via HERMES_TUI_SCROLL_SPEED (alias CLAUDE_CODE_SCROLL_SPEED for portability). 13 unit tests covering direction flip/bounce/reversal, idle disengage, trackpad-burst disengage, frac invariants, and the native vs xterm.js branches. Profiled under --rate 30 (stress test) and --rate 10 (realistic sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to 1-3 rows at sparse 10Hz clicks. Perf is comparable to baseline because accel IS multiplying step — the win is perceptual (fast flicks cover distance, slow clicks keep precision), not raw fps. Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the base; this modulates around it.	2026-04-26 17:16:11 -05:00
Brooklyn Nicholson	0cd98499bb	Promote debugging-hermes-tui-commands to in-repo skill Was user-local in ~/.hermes/skills/. Ported into skills/software-development/ so other Hermes users get it and so the related_skills links from node-inspect-debugger and python-debugpy resolve in-repo. Frontmatter upgraded to match repo convention (version/author/license/ metadata.hermes.{tags,related_skills}, description rewritten as "Use when ..."). Body expanded with debugging-tactics section pointing at the two new debugger skills, and additional common-issues / pitfalls entries.	2026-04-26 17:13:12 -05:00
Brooklyn Nicholson	4cdb6962ca	Add hermes-agent-skill-authoring skill Class-level skill for writing SKILL.md files inside this repo: required frontmatter per tools/skill_manager_tool.py validator, size limits, peer-matched structure, directory placement, write_file vs skill_manage, caching pitfalls, cross-reference caveats.	2026-04-26 17:12:25 -05:00
Brooklyn Nicholson	9a46feb9bd	experiment(tui): HERMES_TUI_INLINE flag to skip AlternateScreen Adds a gate so we can A/B test whether bypassing the alt-screen + viewport constraint lets the terminal's native scrollback beat our virtualization on scroll perf. Result: definitively NO. Inline mode is 40x worse on every metric that moves, because AlternateScreen is what constrains the ScrollBox to the viewport height. Without it, the ScrollBox grows to contain every child of the transcript and every frame re-renders all 1100 messages. Profile under hold-wheel_up (1106-msg session, 30Hz for 6s): metric fullscreen inline delta patches_total 28,864 1,111,574 +3751% writeBytes_total 42 KB 1.6 MB +3881% fps_throughput 15.8 fps 1.75 fps -89% frames 179 18 -90% gap_p50_ms 17 (~60fps) 726 (~1fps) +4170% yoga_p99 34 ms 405 ms +1083% renderer_p99 14 ms 169 ms +1062% flickers 0 5 offscreen — This is actually the cleanest data we've gotten so far: * AlternateScreen is LOAD-BEARING for perf — its viewport height constraint is what lets useVirtualHistory's culling work. No constraint → ScrollBox grows unbounded → every fiber mounts. * The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in under 10 seconds with drain p99 = 8.83 ms and 0 backpressure frames. Our terminal-write hypothesis from last session was wrong: the bottleneck is React + Yoga, not the wire. * Doing proper inline mode (non-virtualized transcript in scrollback, composer pinned below) is not a flag flip — it's a different UI architecture. Leaving this flag in so anyone re-running the experiment gets the same numbers, but not building the architecture until we're sure the perf win is worth the UX loss (it probably isn't — the fullscreen + virt path is the one we should optimize, not replace). Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1 and run scripts/profile-tui.py --compare to reproduce.	2026-04-26 17:11:49 -05:00
Brooklyn Nicholson	8d2b08342c	Add node-inspect-debugger and python-debugpy skills Two new skills under skills/software-development/ for real breakpoint-driven debugging from the terminal: - node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL, CDP scripting via chrome-remote-interface, attaching to running Node processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger, CPU profiles + heap snapshots. - python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb (with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for remote/attach, remote-pdb as the agent-friendly alternative to DAP, recipes for tui_gateway/_SlashWorker/subprocess debugging.	2026-04-26 17:10:11 -05:00
Brooklyn Nicholson	82f842277e	perf(tui): profile harness gains --loop, --save, --compare Before: change code → build → run profile → manually compare to mental model of last run. After: `--loop` watches ui-tui/src and packages/hermes-ink/src for .ts(x) changes, rebuilds on change, re-runs the same scenario, prints a side-by-side A/B diff against the previous iteration — so each edit's impact is quantified instantly. Ctrl+C to stop. Also added: --save LABEL saves metrics snapshot to /tmp/perf-<LABEL>.json --compare LABEL diffs the current run vs that snapshot --extra-flag X pass-through to node dist/entry.js (prepping for --no-fullscreen below) key_metrics() flattens a full run into scalar numbers across frames, React commits, and per-phase timings. format_diff() prints a table with ↑/↓ markers denoting regressions vs improvements based on whether the metric is lower-is-better (p99, max, patches, drain) or higher-is-better (fps, gaps_under_16ms). Run-to-run noise on static code is ~5-15% on most metrics — big signal (>30% change on renderer_p99 / fps) cuts through cleanly. Useful both for validating a single fix and for detecting subtle regressions during the wheel-accel port. Usage during the next perf session: # one-shot with a baseline for later comparison scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel # after porting the wheel handler scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel # continuous iteration scripts/profile-tui.py --seconds 6 --hold wheel_up --loop	2026-04-26 17:08:07 -05:00
Brooklyn Nicholson	f823535db2	perf(tui): instrument stdout drain — rule out terminal parse bottleneck Adds four fields to FrameEvent.phases and the matching profile summary: optimizedPatches post-optimize patch count (what's actually written to stdout; the .patches field is pre-optimize) writeBytes UTF-8 byte count of the write this frame backpressure true when Node's stdout.write returned false (Writable buffer full — outer terminal can't keep up) prevFrameDrainMs end-to-end drain time of the PREVIOUS frame's write, captured from stdout.write's 2-arg callback. Reported on the next frame so the measurement reflects "time until OS flushed the bytes to the terminal fd", not "time until queued in Node". writeDiffToTerminal() now returns { bytes, backpressure } and accepts an optional onDrain callback. Only attached on TTY with diff; piped/non-TTY stdout bypasses flow control so the callback would fire synchronously anyway. Initial measurements under hold-wheel_up against 1106-msg session (30Hz for 6s): patches total 28,888 optimized total 16,700 (ratio 0.58 — optimizer cuts ~42%) writeBytes 42 KB / 10s = 4.2 KB/s throughput drainMs p50 0.14 ms terminal accepts bytes instantly drainMs p99 0.85 ms backpressure 0% of frames This rules out the terminal-parse hypothesis — Cursor's xterm.js drains our output in sub-millisecond time at only 4 KB/s. The remaining lag has to be in the render pipeline, not the wire. Profile output now includes the bytes+drain+backpressure lines to keep this visible on every subsequent iteration.	2026-04-26 17:06:22 -05:00
Brooklyn Nicholson	d3dedf10aa	revert(tui): drop DeferredMd, profiling showed it was neutral Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel. The placeholder → microtask-upgrade pattern did not reduce renderer p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse). Each fresh row still pays the Md cost — just on a follow-up commit instead of inline — and the follow-up commit shows up as a second heavy frame a few ms later. The real bottlenecks turned out to be: 1. wheel step too large (fixed in `7ca16eea`) 2. outer terminal ANSI parse throughput (diagnosing next) 3. React commit frequency during hold-scroll (needs coalescing) None of which DeferredMd addresses. Clearing the complexity so the next experiments land on a simpler substrate.	2026-04-26 17:03:38 -05:00
Brooklyn Nicholson	7ca16eea56	perf(tui): scroll one row at a time per wheel event, half-viewport per pageUp User observation: "it doesn't scroll line by line/row by row." Was right. Two places hardcoded big deltas: 1. WHEEL_SCROLL_STEP = 6 (config/limits.ts) Each wheel event scrolled 6 rows. A mechanical wheel notch emits 3-5 events → 18-30 rows per click, which visually teleports past content instead of smooth-scrolling it. Drop to 1. Trackpads emit 50-100 events per flick — at step=1 that's still a fast flick (a whole viewport in one flick) but each intermediate frame is visible. Porting claude-code's wheel accel state machine is the right next step if this feels sluggish on precision scrolls. 2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts) Full-viewport jumps replace the entire screen — no visual continuity, can't scan content — AND land right at Ink's fast-path threshold (`delta < innerHeight`), which disqualifies the DECSTBM blit on every press. Half-viewport keeps 50% continuity AND drops well under the threshold. Two presses still cover the same total distance. Profiled against the 1106-msg session, holding the key at 30Hz for 6s: wheel_up (step 6 → 1): frames 142 → 163 (+15%) throughput 10.7 → 15.8 fps (+48%) patches tot 53018→ 36562 (-31%) gap p50 5ms → 16ms (actual rendering ~60fps now) <16ms frames 93 → 76 16-33ms 82 → 76 hitches 3 → 1 pageUp (viewport-2 → viewport/2): throughput 10.7 → 9.5 fps (same ballpark — smaller delta × same event rate = less total scroll) Ink's proportional drain caps at `innerHeight - 1` per frame to keep the DECSTBM fast path firing. With these smaller deltas every event comfortably fits under that cap, so fast-path hit rate goes up and patch volume per frame drops — the measured 31% reduction in total patches-sent correlates with users perceiving smoother scrolling because the outer terminal (VS Code / xterm.js / tmux) isn't drowning in ANSI between paints. Tests/type-check/build clean; 352 tests pass.	2026-04-26 17:01:22 -05:00
Brooklyn Nicholson	4a9070c9ac	perf(tui): defer Md upgrade for fresh-mounted assistant rows Adds DeferredMd — a wrapper around <Md> that renders a lightweight <Text> placeholder on first mount and upgrades to the full markdown subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine mounts during PageUp hold run our markdown tokenizer + syntax highlighter synchronously, producing the 63-112ms renderer spikes profiled earlier. A plain <Text> placeholder only needs Yoga to wrap the pre-stripped string (no tokenizer, no highlight), then the Md subtree builds in a follow-up React commit. Upgrade cache: once a (theme, compact, text) tuple has been upgraded, a WeakMap-keyed Set remembers it so remounts (scroll-out then scroll-back) mount straight into <Md> — no placeholder round-trip. WeakMap on theme means palette swaps re-upgrade naturally. Honesty note: profiling under hold-PageUp showed this didn't reduce renderer p99 measurably — the upgrade commit just pays the Md cost on a follow-up frame instead of inline. The bigger bottleneck turned out to be React commit frequency (3.5 commits/sec during 30Hz scroll input, with 200ms+ silent gaps between commits dominating perceived FPS), which this change doesn't address. Keeping the deferred path anyway because: 1. It's correct and tested — no regressions across 352 tests 2. Defensive for pathological fresh-mount cases (giant code blocks, wide tables) that aren't in the current profile fixture 3. Pairs naturally with useVirtualHistory's useDeferredValue to keep React's concurrent scheduler able to interrupt upgrade commits If the follow-up perf investigation (terminal write throughput / patch volume / commit frequency) shows DeferredMd is net-neutral-or-worse in practice, this can be reverted with a one-line swap back to <Md> in messageLine.tsx:115. Companion to the streaming 2-column fix in `7242361a` — these two touched messageLine.tsx together so they land as a pair.	2026-04-26 16:56:09 -05:00
Brooklyn Nicholson	7242361a69	fix(tui): wrap streaming markdown split in column Box StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md> children. Each <Md> returns a <Box flexDirection="column">, but its parent in messageLine.tsx (line 169) is `<Box width={...}>` with no flexDirection, which Ink defaults to 'row'. So during streaming the two column boxes rendered side-by-side, producing the visible "tokens jumble into two columns until it fixes itself" bug — the "fix" was message.complete flipping isStreaming→false, which swaps the StreamingMd subtree for a single DeferredMd/Md child (no siblings → row direction is harmless). Wrap the two <Md> siblings in a flexDirection="column" Box so they stack. Localized fix so the non-streaming path (single-child, works fine in a row parent) is untouched. Reported by user: > "tokens streaming... going into 2 columns randomly and jumbling > together until it fixes itself" No test changes — findStableBoundary tests still pass (the layout change is parent-structural, not in the boundary logic). Build clean, tsc clean, 352 tests pass.	2026-04-26 16:55:56 -05:00
Brooklyn Nicholson	cd7a200e6c	perf(tui): instrument scroll fast-path decline reasons Adds scrollFastPathStats counters to render-node-to-output.ts: captures every time a ScrollBox's DECSTBM scroll hint is generated, records whether the fast path took it (blit+shift from prevScreen) or declined, and why. Exposed through hermes-ink's public exports and snapshotted on every FrameEvent so the profiler harness can correlate decline reasons with the actual patch/renderer cost per frame. This is pure observation — no behaviour change. Preparing for the virtual-history rewrite: the hypothesis was that our topSpacer/ bottomSpacer scheme disqualifies every scroll via heightDelta mismatch, but the data shows the fast path is actually taken on most scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the remaining steady-state renderer cost is Yoga tree traversal, not the per-frame full redraw I initially suspected. Declines that do happen correlate with React commits that changed the mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer cases the virtualization rewrite still needs to address. No test diffs — instrumentation-only. Build verified: `tsc --noEmit` plus the full `npm run build` compiler post-pass pass cleanly.	2026-04-26 16:45:53 -05:00
Brooklyn Nicholson	71eee26640	perf(tui): full-pipeline instrumentation + profiling harness Extends HERMES_DEV_PERF to capture the complete render pipeline, not just React commits. Adds scripts/profile-tui.py to drive repeatable hold-PageUp stress tests against a real long session. perfPane.tsx: Wires ink's onFrame callback (already plumbed through the fork) into the same perf.log as the React.Profiler samples. Captures per-phase timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch optimize, stdout write) plus yoga counters (visited/measured/cache- Hits/live) and patch counts per frame. Events are tagged {src: 'react'\|'frame'} so jq can split them. logFrameEvent is undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach the callback. entry.tsx: Passes logFrameEvent into render(). types/hermes-ink.d.ts: Declares FrameEvent + onFrame on RenderOptions so the ui-tui side type-checks against the plumbed-through ink option. scripts/profile-tui.py: New harness. Launches the built TUI under a PTY with the longest session in state.db resumed, holds PageUp/PageDown/etc at a configurable Hz for N seconds, then parses perf.log and prints per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps beyond stdlib. Exit 2 if nothing was captured (wiring broken). Initial findings (1106-msg session, 6s PageUp hold at 30Hz): - Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms - 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput - One pathological 97ms frame with yoga measuring 70,415 text cells and Yoga visiting 225k nodes — the cold-unmeasured-region hit - Ink's scroll fast-path (DECSTBM blit from prevScreen) is disqualified because our spacer-based virtual history doesn't keep heightDelta in sync with scroll.delta, so every PageUp step falls through to a full 2000-4800 patch re-render instead of ~40	2026-04-26 16:36:25 -05:00
Brooklyn Nicholson	69ff201050	feat(tui): anchor todo panel above streaming output	2026-04-26 16:26:50 -05:00
Brooklyn Nicholson	2259eac49e	feat(tui): collapse completed todo panel on turn end	2026-04-26 16:24:15 -05:00

1 2 3 4 5 ...

6185 Commits