mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 09:47:54 +08:00
832ecde4b08430fa81e7a82bf07cbb0aa77e175a
512 Commits
832ecde4b0
feat(kanban): structured tool surface for worker + orchestrator agents
Seven new tools in `tools/kanban_tools.py` that give kanban workers a
backend-portable, schema-filtered way to interact with the board from
inside their own Python process — no shelling out to `hermes kanban`.
Motivation
The CLI path (`hermes kanban complete $TASK --summary ...`) breaks
on any remote terminal backend (Docker, Modal, Singularity, SSH).
The terminal tool runs `hermes kanban` inside the container, where
`hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted.
Tools run in the agent's own Python process, so they always reach
the board regardless of backend. Also skips shell-quoting fragility
on --metadata JSON and gives structured error returns the model can
reason about.
The seven tools
kanban_show read current task (defaults to HERMES_KANBAN_TASK)
kanban_complete structured handoff: summary + metadata
kanban_block ask for human input
kanban_heartbeat signal liveness during long operations
kanban_comment append to task thread
kanban_create fan out into child tasks (orchestrator path)
kanban_link add parent→child dependency after the fact
Gating
Each tool's check_fn returns True iff HERMES_KANBAN_TASK is set in
the process env. The dispatcher sets it when spawning a worker;
normal `hermes chat` sessions never have it. Empirically verified:
a baseline hermes-cli schema is 27 tools; with HERMES_KANBAN_TASK
set it grows to exactly 34 (+7). Zero leak into normal sessions.
Also set HERMES_PROFILE in the spawn env so the kanban_comment tool's
author default works cleanly (it's what the tool reads to attribute
comments).
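The gating mechanism above can be sketched as follows. This is a minimal illustration, not the real tool registry: `kanban_check_fn` and `visible_tools` are hypothetical names standing in for the per-tool check_fn and schema assembly in `tools/kanban_tools.py`.

```python
import os

def kanban_check_fn() -> bool:
    """Gate: kanban tools appear only when the dispatcher has set
    HERMES_KANBAN_TASK in the worker's environment (sketch)."""
    return bool(os.environ.get("HERMES_KANBAN_TASK"))

def visible_tools(base_tools, kanban_tools):
    """Assemble the schema: gated tools are included iff check_fn passes."""
    return list(base_tools) + [t for t in kanban_tools if kanban_check_fn()]
```

With 27 baseline tools and 7 kanban tools, a normal session sees 27 and a worker session sees 34, matching the empirical check in the commit.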
Skill updates
- `skills/devops/kanban-worker/SKILL.md`: lifecycle rewritten to use
kanban_show / kanban_heartbeat / kanban_block / kanban_complete /
kanban_comment / kanban_create directly. CLI fallback section
added for human operators / scripts.
- `skills/devops/kanban-orchestrator/SKILL.md`: all examples ported
from CLI to tool form; top-banner note explaining tools are the
primary surface. kanban_create / kanban_link throughout.
Docs
`website/docs/user-guide/features/kanban.md`:
new "How workers interact with the board" section explaining the
tool surface, gating mechanism, and why tools vs CLI. The worker
skill / orchestrator skill subsections are now nested under it.
Tests (+25 in tests/tools/test_kanban_tools.py)
- Schema gating: kanban_tools_hidden_without_env_var,
kanban_tools_visible_with_env_var.
- Happy paths: show (default + explicit task_id), complete (with
summary+metadata, with result only), block, heartbeat (with and
without note), comment (default + custom author), create (with
list parents, with string parent), link.
- Error paths: complete rejects no-handoff and non-dict metadata,
block rejects empty reason, comment rejects empty body, create
rejects no title / no assignee / non-list parents, link rejects
self-reference / missing args / cycles.
- End-to-end: full worker lifecycle driven entirely through the
tools, verified against DB state.
214/214 kanban suite pass under scripts/run_tests.sh.
be184aa5fa
fix(kanban): close the two v2-flagged issues in v1
Both items the atypical-scenarios pass flagged as "v2 follow-up"
actually belong in v1. Fixed now.
Fix 1: workspace path traversal
resolve_workspace now rejects non-absolute paths for all three
workspace_kinds (scratch-with-explicit-path, dir:, worktree). A
relative path like '../../../tmp/attacker' was being silently
resolved against the dispatcher's CWD — a confused-deputy escape.
Error message points users at the absolute-path requirement.
Storage remains verbatim (kernel doesn't rewrite user input);
the refusal happens at resolution time, so the dispatcher's
existing spawn-failure circuit breaker correctly categorizes it.
Threat model documented in website/docs/user-guide/features/kanban.md:
single-host, trusted-local-user. The absolute-path rule prevents
ambiguity-driven escape, not malicious access — kanban runs as you,
with your uid, on your filesystem.
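The guard described in Fix 1 amounts to refusing non-absolute paths at resolution time. A minimal sketch, assuming an illustrative `check_workspace_path` helper rather than the real `resolve_workspace` signature:

```python
import os

def check_workspace_path(path: str) -> str:
    """Refuse relative workspace paths: they would silently resolve
    against the dispatcher's CWD (a confused-deputy escape). Sketch of
    the commit's guard; the real check runs inside resolve_workspace."""
    if not os.path.isabs(path):
        raise ValueError(
            f"workspace path must be absolute, got {path!r}; "
            "relative paths would resolve against the dispatcher's CWD"
        )
    return os.path.realpath(path)
```

Storage stays verbatim; only resolution raises, so the existing spawn-failure circuit breaker categorizes the refusal like any other spawn failure.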
Fix 2: build_worker_context unbounded
Added per-section caps so worker prompts stay bounded on pathological
boards:
_CTX_MAX_PRIOR_ATTEMPTS = 10   most-recent N runs shown; older runs
                               collapsed into an "N earlier attempts
                               omitted" marker. Attempt numbering is
                               preserved (shows "Attempt 16", not
                               renumbered).
_CTX_MAX_COMMENTS       = 30   same pattern for comments.
_CTX_MAX_FIELD_BYTES    = 4 KB per summary / error / metadata / result.
_CTX_MAX_BODY_BYTES     = 8 KB per task.body (opening post).
_CTX_MAX_COMMENT_BYTES  = 2 KB per comment.
Truncation uses a visible ellipsis + char-count so the worker knows
it's been truncated.
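The truncation behaviour can be sketched like this. `truncate_field` is an illustrative name; the marker wording in the real code may differ, but the shape (byte cap, visible ellipsis, omitted-character count) is what the commit describes:

```python
def truncate_field(text: str, max_bytes: int) -> str:
    """Cap a context field at max_bytes of UTF-8, appending a visible
    marker with the omitted char count so the worker knows it was cut."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # Decode defensively in case the cut lands mid-codepoint.
    kept = raw[:max_bytes].decode("utf-8", errors="ignore")
    omitted = len(text) - len(kept)
    return f"{kept}… [{omitted} chars truncated]"
```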
Effect on atypical-scenario runs:
huge_run_count_on_one_task (1000 runs): 63 KB → 820 chars
comment_storm (1000 comments): 50 KB → 1,671 chars
Tests (+6 in main suite)
test_resolve_workspace_rejects_relative_dir_path — relative dir:
path stored verbatim but refused at resolve.
test_resolve_workspace_accepts_absolute_dir_path — legitimate
absolute paths are created and returned.
test_resolve_workspace_rejects_relative_worktree_path — same guard
for worktree kind.
test_build_worker_context_caps_prior_attempts — 25 runs → exactly
_CTX_MAX_PRIOR_ATTEMPTS shown, omitted marker present,
attempt numbering preserves original index.
test_build_worker_context_caps_comments — 100 comments → 30 shown,
70 in the omitted marker.
test_build_worker_context_caps_huge_summary — 1 MB summary on a
prior run → context under 10 KB total, truncation marker visible.
189/189 kanban suite pass. Atypical-scenarios stress script still
passes all 28 scenarios with the new caps in effect.
7206eed319
docs(kanban): add step-by-step tutorial with 10 dashboard screenshots
New website/docs/user-guide/features/kanban-tutorial.md walks four
user stories end-to-end, each backed by a real screenshot of the
dashboard running against seeded data.
Stories
1. Solo dev shipping a feature (parent->child dependencies,
structured handoff, run history rendering).
2. Fleet farming (parallel independent tasks across 3 assignees,
lanes-by-profile grouping, dispatcher daemon).
3. Role pipeline with retry (PM spec -> eng implements -> review
blocks -> eng retries -> review approves; two-run history
visible in the drawer; downstream workers pull parent
summary+metadata).
4. Circuit breaker + crash recovery (2 spawn_failed + 1 gave_up
for a deploy with missing creds; 1 crashed + 1 completed for
an OOM-killed migration that recovered on retry).
Each story shows both CLI commands and the dashboard drawer
equivalent. Screenshots captured via Playwright + Chromium at 2x
device scale, then repalettized with PIL (22 MB -> 6.1 MB for the
10-image set; no visible quality loss, verified against vision).
Side updates
- website/sidebars.ts: added kanban-tutorial under features.
- website/docs/user-guide/features/kanban.md: prefix banner
linking new readers to the tutorial before the reference.
All image references validate: `/img/kanban-tutorial/*` maps to
website/static/img/kanban-tutorial/ (10 files). Docusaurus build
not run locally (no node_modules in worktree); CI build on merge
will confirm.
e27c819de3
fix(kanban): deep-scan pass 2 — synthetic runs, event.run_id plumbing, invariant recovery, live drawer refresh
Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and CLI.
All behavioral / UX fixes; no schema change.
Kernel
- complete_task on a never-claimed task (ready/blocked → done with no
run in flight) was silently dropping the summary/metadata/result
onto a non-existent run. Now synthesizes a zero-duration run
(started_at == ended_at) so attempt history is complete. Only
fires when there's actually handoff data to persist — bare
complete_task(tid) remains a no-op for run creation.
- block_task on a never-claimed task had the same bug for --reason.
Same fix: synthesize a zero-duration run when a reason is passed.
- Event dataclass gained a `run_id: Optional[int] = None` field.
list_events, unseen_events_for_sub, and the dashboard _event_dict
were all SELECTing the column but dropping it on the way out,
so downstream consumers couldn't group events by attempt. Every
read path now surfaces run_id.
- claim_task got a defensive invariant-recovery step: if somehow
`current_run_id` is non-NULL on a task in 'ready' status (invariant
violation from an unknown code path), close the leaked run as
'reclaimed' inside the same txn as the new claim. No-op in the
common case; belt-and-suspenders in case a future code path forgets
to clear the pointer.
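The first kernel fix (synthetic zero-duration runs on never-claimed completions) can be sketched against a toy SQLite schema. Column names and the schema here are illustrative, not the real kernel's; metadata persistence is elided for brevity:

```python
import sqlite3
import time

def complete_task(conn, task_id, summary=None, metadata=None):
    """Sketch: completing a never-claimed task with handoff data
    synthesizes a zero-duration run (started_at == ended_at) so the
    attempt history stays complete. Bare complete_task(tid) stays a
    no-op for run creation. (metadata handling elided in this sketch.)"""
    row = conn.execute(
        "SELECT current_run_id FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row[0] is None and (summary or metadata):
        now = time.time()
        conn.execute(
            "INSERT INTO task_runs (task_id, outcome, summary, started_at, ended_at) "
            "VALUES (?, 'completed', ?, ?, ?)",
            (task_id, summary, now, now),
        )
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```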
Dashboard
- GET /tasks/:id events array now carries run_id per event (via
_event_dict).
- WebSocket /events SELECT now includes run_id in the pushed event
payload.
- TaskDrawer reloads itself on live events for its own task id. New
`taskEventTick[taskId]` state in the Board, incremented on every
WS event, passed down as `eventTick` prop; drawer's useEffect
depends on it. Previously, background workers completing a task
the user was viewing left the drawer showing stale data until
manual close/reopen.
- CSS: added `.hermes-kanban-run--ended` rule for the fallback class
the JS emits when outcome is unset. Harmless before; just
inconsistent.
CLI
- `hermes kanban watch --kinds` help text listed the legacy event
name `spawn_auto_blocked`. The kernel migration renames it to
`gave_up`, so users typing the documented name got zero matches.
Now shows the current lexicon (`completed,blocked,gave_up,
crashed,timed_out`).
Tests (+6 in core functionality, +1 in dashboard plugin)
- complete_never_claimed_task_synthesizes_run
- block_never_claimed_task_synthesizes_run
- complete_never_claimed_without_handoff_skips_synthesis
- event_dataclass_carries_run_id (created.run_id None, completed.run_id matches)
- unseen_events_for_sub_includes_run_id (notifier path)
- claim_task_recovers_from_invariant_leak (engineer the leak, verify recovery)
- event_dict_includes_run_id (dashboard API shape)
171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke (isolated
HERMES_HOME via execute_code) exercised all six fixed paths plus the
claim-after-leak recovery sequence.
Docs
- Runs section: new 'Synthetic runs for never-claimed completions'
and 'Live drawer refresh' paragraphs explaining the invariants.
- Event reference: `created` / `promoted` / `unblocked` entries now
explicitly note `run_id` is `NULL`; `completed` / `blocked`
describe synthetic-run fallback.
1c78f6627a
docs(kanban): document audit-pass invariants — bulk-close guard, reclaimed-on-status-change, completed event carries summary
- Runs section: dashboard PATCH parity (summary/metadata forward),
  `completed` event embeds first-line summary for notifiers, bulk
  --summary/--metadata refused, archive/drag-drop reclaim semantics.
- Event reference: added Payload column to Lifecycle and Edits tables;
  called out the invariant that `status` carries run_id when closing a
  reclaimed run.
0146cb2bd2
feat(kanban): runs as first-class (v1); structured handoffs; forward-compat for v2 workflows
Addresses vulcan-artivus's RFC review on issue #16102. Picks up the
structural changes that are expensive to retrofit later and zero-cost
to land now; defers workflow-template routing + per-stage lanes to v2
(kept forward-compat hooks in the schema).
Kernel
- New `task_runs` table. Each claim opens a run (pid, claim_lock,
  heartbeat, max_runtime, started_at); each terminal transition closes
  it with an outcome (completed / blocked / crashed / timed_out /
  spawn_failed / gave_up / reclaimed). Multiple rows per task when
  retries happen, preserving full attempt history.
- `tasks.current_run_id` points at the active run (NULL when idle);
  denormalised for cheap reads.
- `task_events.run_id` carries the run a given event belongs to so UIs
  can group events by attempt. claim/spawned/complete/block/crash/
  timeout/spawn_fail/gave_up/heartbeat events are all run-scoped;
  created/promoted/assigned/edited stay task-scoped (run_id=NULL).
- Legacy DBs: migration adds the columns + indexes and synthesizes a
  run row for any task that was 'running' before the runs table
  existed, so subsequent complete/heartbeat/reclaim calls have a
  target. Idempotent.
Structured handoff
- `complete_task(summary=, metadata=)` persists both on the closing
  run. `summary` falls back to `result` when omitted so single-run
  callers don't duplicate. `metadata` is a free-form dict
  ({changed_files, tests_run, findings, ...}).
- `build_worker_context` rewritten: "Prior attempts on this task" now
  reads closed runs (outcome, summary, error, metadata), and "Parent
  task results" pulls run.summary + run.metadata of the most-recent
  completed run per parent, falling back to task.result for legacy
  rows without runs. Retrying workers see why earlier attempts failed;
  downstream workers see parent handoffs structurally, not as loose
  `result` strings.
CLI
- `hermes kanban complete <id> --summary "..." --metadata
  '{"files":1}'`. JSON is parsed and rejected with exit-2 if malformed.
- New `hermes kanban runs <id> [--json]` verb. Shows per-run rows:
  outcome, profile, elapsed, summary, error. JSON mode serializes the
  full run dataclass for scripting.
Dashboard plugin
- GET /tasks/:id now carries a runs[] array alongside task / events /
  comments / links. Each run serialised with outcome, summary,
  metadata, worker_pid, elapsed fields.
- New Run History section in the drawer. Outcome-coloured left border
  (green=active, blue=completed, amber=reclaimed,
  red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs with
  a '+N earlier' toggle. Shows summary + error + metadata inline.
Forward-compat for v2 (vulcan's workflow templates + stages)
- `tasks.workflow_template_id` and `tasks.current_step_key` added as
  nullable columns. v1 kernel ignores them for routing; v2 will add
  workflow_templates + workflow_steps tables and wire the dispatcher
  to consult them. task_runs has a matching `step_key` column. Lets a
  v2 release land additively without another schema migration.
Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard)
- run_created_on_claim / run_closed_on_complete_with_summary
- run_summary_falls_back_to_result
- multiple_attempts_preserved_as_runs (3 attempts: reclaimed →
  crashed → completed, all visible in list_runs)
- run_on_block_with_reason / run_on_spawn_failure_records_failed_runs
  (5 spawn_failed runs + 1 gave_up run)
- event_rows_carry_run_id (task-scoped vs run-scoped split)
- build_worker_context_includes_prior_attempts
- build_worker_context_uses_parent_run_summary (metadata JSON in
  context)
- migration_backfills_inflight_run_for_legacy_db (simulates a
  pre-migration running task, re-runs init_db, asserts backfill)
- forward_compat_columns_writable
- cli_runs_verb + cli_runs_json
- cli_complete_with_summary_and_metadata (JSON round-trip through
  shlex + argparse)
- cli_complete_bad_metadata_exits_nonzero
- task_detail_includes_runs / task_detail_runs_empty_before_claim
269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke
covered: single-attempt complete → run closed + summary persisted;
retry scenario → two runs visible (blocked + completed); parent run
summary + metadata surfaced to child via build_worker_context;
forward-compat columns writable via UPDATE; GET /tasks/:id returns
runs[].
Docs
- New 'Runs — one row per attempt' section in kanban.md: the why (full
  attempt history, structured metadata), the two-table model (task is
  logical, run is execution), the structured handoff shape (--summary /
  --metadata), example CLI + dashboard output, forward-compat note
  for v2.
- Event reference updated to mention task_events.run_id.
- CLI reference gains 'hermes kanban runs <id>'.
Not in v1 (deferred to v2):
- Workflow templates (workflow_templates + workflow_steps tables,
  stage-based routing, success/failure step links).
- 'stage' as a distinct axis from status in the UI.
- Shared-by-default workspace binding across stages of the same
  workflow run.
- Pipeline replacement for the kanban-orchestrator skill (the
  orchestrator's 'decompose, don't execute' guidance is still correct;
  it becomes partly redundant once workflows land).
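The CLI's "reject malformed --metadata with exit-2" behaviour can be sketched as below. `parse_metadata` is a hypothetical helper name, not the real CLI code:

```python
import json
import sys

def parse_metadata(arg: str) -> dict:
    """--metadata must be a JSON object; anything else exits 2 (sketch)."""
    try:
        md = json.loads(arg)
    except json.JSONDecodeError as e:
        print(f"invalid --metadata JSON: {e}", file=sys.stderr)
        sys.exit(2)
    if not isinstance(md, dict):
        print("--metadata must be a JSON object", file=sys.stderr)
        sys.exit(2)
    return md
```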
da7d09c3b6
feat(kanban): max-runtime timeouts, worker heartbeats, assignees picker, event vocab cleanup
Ports four items from the Multica audit
(https://github.com/multica-ai/multica). Dropped their cross-host
server/daemon architecture and their Postgres+pgvector skill search —
both the wrong shape for our single-host SQLite kernel.
1. Per-task max-runtime (`max_runtime_seconds` column)
- New kernel function `enforce_max_runtime(conn)` runs in every
  dispatch tick. When a running task's elapsed time exceeds the cap,
  we SIGTERM the worker, wait a 5 s grace (polling _pid_alive), then
  SIGKILL. The task goes back to 'ready' with a `timed_out` event and
  re-queues on the next tick (unless the spawn-failure circuit breaker
  has already parked it).
- Host-local only: lock prefix must match this host's claimer_id so we
  never signal a PID on another machine.
- CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`.
  New `_parse_duration` helper accepts s/m/h/d suffixes or bare
  integers.
- Dashboard POST body + the card's `max_runtime_seconds` field.
2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event)
- `heartbeat_worker(conn, task_id, note=None)` emits the event and
  touches last_heartbeat_at. Refused when the task isn't running.
- CLI: `hermes kanban heartbeat <id> [--note "..."]`.
- kanban-worker skill instructs workers to heartbeat during long loops
  (training runs, encodes, crawls, batch uploads).
- Separate signal from PID crash detection: a worker's Python can
  still be alive while the actual work process is stuck. Heartbeat
  absence is diagnostic; future work can auto-block on stale
  heartbeats, but v1 just surfaces the signal.
3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`)
- Scans ~/.hermes/profiles/ for dirs containing config.yaml and unions
  them with current assignees on the board. Each entry returns {name,
  on_disk, counts: {status: n}}.
- CLI: `hermes kanban assignees [--json]`. Also hooked into `hermes
  kanban init`, which now prints discovered profiles so new installs
  see 'these are the assignees you can target' immediately.
- Dashboard: GET /api/plugins/kanban/assignees for the picker.
4. Event vocab cleanup (three renames + three new kinds)
- `ready` → `promoted` (fires when deps clear; clearer semantic).
- `priority` → `reprioritized` (past-tense verb, matches others).
- `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit
  breaker gave up on this task).
- New: `spawned` (emitted with {pid} on successful spawn), `heartbeat`
  ({note?}), `timed_out` ({pid, elapsed_seconds, limit_seconds,
  sigkill}).
- One-shot migration in `_migrate_add_optional_columns` renames legacy
  rows in-place on init_db(), so existing DBs upgrade cleanly.
- Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its
  own ⏱ message template, gave_up renamed from 'auto-blocked'.
- plugin_api.py's two 'priority' emit sites renamed to 'reprioritized'.
- Documented in a new 'Event reference' section in kanban.md, grouped
  into three clusters (lifecycle / edits / worker telemetry) with
  payload shapes.
Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py,
136/136 pass)
- max_runtime_terminates_overrun_worker: real SIGTERM flow with
  _pid_alive stub, verifies event payload + state reset.
- max_runtime_none_means_no_cap: unbounded tasks aren't timed out.
- create_task_persists_max_runtime.
- enforce_max_runtime_integrates_with_dispatch: kernel-level +
  dispatch_once chaining.
- heartbeat_on_running_task + heartbeat_refused_when_not_running.
- cli_heartbeat_verb with --note round-trip.
- recompute_ready_emits_promoted_not_ready.
- spawn_failure_circuit_breaker_emits_gave_up.
- spawned_event_emitted_with_pid.
- migration_renames_legacy_event_kinds (injects old rows, re-runs
  init_db, asserts rename).
- list_profiles_on_disk (tmp_path + config.yaml filter).
- known_assignees_merges_disk_and_board (profiles on disk + board
  assignees + per-status counts).
- cli_assignees_json.
- parse_duration_accepts_formats (s/m/h/d/float).
- parse_duration_rejects_garbage.
- cli_create_max_runtime_via_duration (2h → 7200).
- cli_create_max_runtime_bad_format_exits_nonzero.
Live smoke: POST /tasks with max_runtime_seconds round-trips;
/assignees returns the union of on-disk + board-assigned names; PATCH
priority produces 'reprioritized' events (not 'priority'); board cards
expose max_runtime_seconds + last_heartbeat_at.
Docs (website/docs/user-guide/features/kanban.md)
- New 'Event reference' section with three-cluster table (lifecycle /
  edits / worker telemetry) + payload shapes.
- CLI reference updated for --max-runtime, heartbeat, assignees.
- Gateway notifications section updated for the new TERMINAL_KINDS.
Not ported from Multica (deliberate, documented in the out-of-scope
section already): Postgres+pgvector skill search (heavy deps conflict
with the SQLite kernel), server+daemon cross-host model (we're
single-host on purpose), first-class agent identity with threaded
comments (we keep the board profile-agnostic).
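The `_parse_duration` helper described for `--max-runtime` can be sketched as follows; the real implementation may differ in detail, but the commit specifies s/m/h/d suffixes, bare integers, float multipliers, and rejection of garbage:

```python
def parse_duration(text: str) -> int:
    """Parse '30m', '2h', '1d', '45s', '1.5h', or a bare integer number
    of seconds into seconds. Sketch of the commit's _parse_duration."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    text = text.strip()
    if text and text[-1].lower() in units:
        # float multiplier allowed before the suffix, e.g. '1.5h'
        return int(float(text[:-1]) * units[text[-1].lower()])
    return int(text)  # bare seconds; raises ValueError on garbage
```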
af8d43dbbb
feat(kanban): core hardening — daemon, circuit breaker, crash detect, logs, notify, bulk, stats
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).
Dispatcher
- New `hermes kanban daemon` long-lived loop with --interval, --max,
--failure-limit, --pidfile, --verbose, signal-clean shutdown
(SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
lets tests drive it inline without subprocess.
- `hermes kanban init` now prints the dispatcher setup hint so users
don't leave the board off-by-default. Ships a systemd user unit at
plugins/kanban/systemd/hermes-kanban-dispatcher.service.
- Removed the old 'add this to cron' doc path. Cron runs agent
prompts (LLM cost per tick) — unacceptable for a per-minute
coordination loop.
Worker aliveness / safety
- Spawn returns the child's PID; dispatcher stores it on the task row
and calls detect_crashed_workers() every tick. If the PID is gone
but the claim TTL hasn't expired, the task drops back to ready with
a 'crashed' event. Host-local only — cross-host PIDs are ignored
per the single-host design.
- Spawn-failure circuit breaker: after N consecutive spawn_failed
events on the same task (default 5), the dispatcher auto-blocks
with the last error as the reason. Success resets the counter.
Workspace-resolution failures count against the same budget.
- Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
generation (.log.1), bounds per-task disk usage at ~4 MiB.
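The spawn-failure circuit breaker above can be sketched with a plain dict standing in for the task row. `record_spawn_result` is an illustrative name, not the dispatcher's real API:

```python
def record_spawn_result(task: dict, ok: bool, error=None, failure_limit=5):
    """Circuit-breaker sketch: N consecutive spawn failures auto-block
    the task with the last error as the reason; success resets the
    counter. (This commit's event name was spawn_auto_blocked; a later
    commit renames it to gave_up.)"""
    if ok:
        task["spawn_failures"] = 0
        return "spawned"
    task["spawn_failures"] = task.get("spawn_failures", 0) + 1
    task["last_spawn_error"] = error
    if task["spawn_failures"] >= failure_limit:
        task["status"] = "blocked"
        return "spawn_auto_blocked"
    return "spawn_failed"
```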
Idempotency / dedup
- create_task(idempotency_key=...) returns the existing non-archived
task id for retried webhooks. --idempotency-key on the CLI, json
body field on the dashboard plugin. Archived tasks don't block a
fresh create with the same key.
CLI surface
- Bulk verbs: complete, unblock, archive accept multiple ids;
block accepts --ids for sibling blocks with the same reason.
- New verbs: daemon, watch (live event tail filtered by
assignee/tenant/kinds), stats, log, notify-subscribe,
notify-list, notify-unsubscribe.
- dispatch gains --failure-limit + crashed/auto_blocked columns in
JSON output and human-readable output.
- gc accepts --event-retention-days / --log-retention-days; prunes
task_events for terminal tasks and old log files.
Gateway integration
- New GatewayRunner._kanban_notifier_watcher: polls
kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
chats for completed/blocked/spawn_auto_blocked/crashed events.
Cursor-advanced per-sub; auto-removed when the task reaches
done/archived. Runs alongside the session expiry and platform
reconnect watchers — SQLite work in asyncio.to_thread so the
event loop never blocks.
- /kanban create in the gateway auto-subscribes the originating
chat (platform + chat_id + thread_id). Users see
'(subscribed — you'll be notified when t_abcd completes or
blocks)' appended to the response.
Dashboard plugin
- GET /stats returns board_stats (by_status, by_assignee,
oldest_ready_age_seconds).
- GET /tasks/:id/log returns the worker log with optional ?tail=N
cap. 404 on unknown task, exists=false when the task has never
spawned.
- POST /tasks accepts idempotency_key; both Pydantic body and the
create_task kwarg now round-trip.
- /board attaches task.age (created/started/time_to_complete in
seconds) so the UI can colour stale cards without recomputing.
- Card CSS: amber border after N minutes, red border when clearly
stuck (tier per status: running 10m/60m, ready 1h/24h, todo
7d/30d, blocked 1h/24h).
- Drawer: new Worker log section, auto-loads on mount, last 100 KB
cap with on-disk path surfaced when truncated.
Kernel
- Schema additions: tasks.idempotency_key, tasks.spawn_failures,
tasks.worker_pid, tasks.last_spawn_error; new
kanban_notify_subs table. All gated by _migrate_add_optional_columns
so legacy DBs upgrade cleanly.
- release_stale_claims / complete_task / block_task now all clear
worker_pid so crash detection doesn't false-positive on reclaimed
tasks.
- read_worker_log fixed: tail-skip no longer eats one-giant-line
logs (common with child processes that don't flush newlines
before dying).
Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
- Idempotency: same key returns existing, archived doesn't block,
no key never collides
- Circuit breaker: auto-blocks after limit, success resets counter,
workspace-resolution failure counts against budget
- Aliveness: _pid_alive helper, detect_crashed_workers reclaims
exited child
- Daemon: runs and stops cleanly via stop_event, survives a tick
exception
- Stats + task_age helpers
- Notify subs: CRUD, cursor advances, distinct-thread is a separate row
- GC: events-only-for-terminal-tasks, old worker logs deleted
- Log: rotation keeps one generation, read_worker_log tail
- CLI: bulk complete/archive/unblock/block, create with
--idempotency-key, stats --json, notify-subscribe+list, log
missing task, gc reports counts
- run_slash parity: smoke-tests every registered verb (23
invocations); none may raise or return empty string
Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.
Docs (website/docs/user-guide/features/kanban.md)
- 'Core concepts' rewritten: new statuses (triage), idempotency key,
dispatcher-as-daemon-not-cron with circuit breaker behaviour
documented.
- Quick start swapped to daemon. New systemd section covers user
service install.
- New sections: idempotent create, bulk verbs, gateway
notifications, out-of-scope single-host note (kanban.db is local;
don't expect multi-host).
- CLI reference updated for every new verb, every new flag.
27fc6c1086
feat(kanban): bulk ops, drawer edit, dep editor, markdown, touch, config
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.
Plugin backend
- POST /tasks/bulk — apply the same patch (status / archive / assignee
/ priority) to every id in the request body. Each id runs
independently: one bad id reports {ok: false, error: ...} without
aborting siblings. Status transitions that aren't legal for the
current state are surfaced per-id ('transition to done refused').
Used by the multi-select bulk action bar.
- GET /config — returns the dashboard.kanban section of config.yaml
(default_tenant, lane_by_profile, include_archived_by_default,
render_markdown) with sensible defaults when the section is absent.
Loaded once by the SPA to preselect filters and toggle markdown
rendering.
- _conn() helper — every handler now goes through it, calling
kanban_db.init_db() (idempotent) before every connection. Fresh
installs work whether the first hit is GET /board, POST /tasks, or
any other endpoint — no more 'no such table: tasks' when the CLI
or a script hits the plugin before the dashboard has ever loaded.
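The per-id independence of POST /tasks/bulk can be sketched as below. `apply_bulk` and `patch_one` are hypothetical names for the handler's inner loop, not the real plugin code:

```python
def apply_bulk(ids, patch, patch_one):
    """Apply the same patch to every id independently: one bad id
    reports {ok: false, error: ...} without aborting its siblings."""
    results = []
    for tid in ids:
        try:
            patch_one(tid, patch)
            results.append({"id": tid, "ok": True})
        except Exception as e:
            results.append({"id": tid, "ok": False, "error": str(e)})
    return results
```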
Plugin UI (plugin bundle, +~12 KB)
- Multi-select: per-card checkbox; shift/ctrl-click also toggles
without opening the drawer. A BulkActionBar appears above the
columns with batch → ready / complete / archive / reassign
(profile dropdown + unassign option). Destructive batches confirm
first. Partial failures from the backend are surfaced inline.
- Drawer inline editing:
- Click the title → TitleEditor swaps in an input, Enter saves,
Escape cancels.
- Click the Assignee meta row → AssigneeEditor input (empty string
unassigns).
- Click the Priority meta row → PriorityEditor numeric input.
- New 'edit' button on Description → full-width textarea; Save /
Cancel switch back to rendered view.
- Dependency editor: chip list of parents + children with per-chip
× button (calls DELETE /links). Add-parent / add-child dropdowns
filter out self + already-linked tasks so you cannot re-add a
duplicate edge or a self-loop. Cycle rejections from the server
surface cleanly via the existing error banner.
- Parent selection in InlineCreate: new dropdown listing every task
on the board ('{id} — {title}') — picking one sends parents=[id]
with the create payload, so the task lands in todo (or triage if
created from the Triage column) with the dependency wired up.
- Safe markdown rendering for description, comment bodies, and
result. A small in-bundle renderer handles headings, bold, italic,
inline code, fenced code, bullet lists, and http(s)/mailto links.
Every substitution runs on HTML-escaped input (no raw HTML), links
get target=_blank + rel=noopener,noreferrer. Disabled by config
key dashboard.kanban.render_markdown=false (falls back to <pre>).
- Touch drag-drop: attachTouchDrag() installs a pointerdown handler
that spawns a drag proxy, tracks elementFromPoint under the finger,
and dispatches a hermes-kanban:drop CustomEvent on the column when
released. Desktop continues to use native HTML5 DnD. Columns
listen for both.
- ErrorBoundary already present from the prior commit catches any
renderer throw; markdown escape + touch-proxy cleanup both have
their own try/finally.
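The escape-first rule behind the safe markdown renderer (every substitution runs on HTML-escaped input, so no raw HTML survives) can be illustrated in Python, even though the real renderer is in the JS bundle. This sketch handles only inline code and bold; the real one also covers headings, italic, fenced code, lists, and links:

```python
import html
import re

def render_inline(text: str) -> str:
    """Escape the whole input first, then substitute on the escaped
    text: attacker-supplied HTML can never reach the output unescaped."""
    out = html.escape(text)
    out = re.sub(r"`([^`]+)`", r"<code>\1</code>", out)
    out = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", out)
    return out
```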
Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
- bulk_status_ready: 3 tasks blocked, batch → ready, all move
- bulk_archive hides all ids from default board
- bulk_reassign changes every assignee
- bulk_unassign_via_empty_string sets assignee back to None
- bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
good siblings still get priority=7
- bulk_empty_ids_400
- config_returns_defaults_when_section_missing
- config_reads_dashboard_kanban_section (writes config.yaml, verifies
every key round-trips)
Live smoke (real FastAPI app + isolated HERMES_HOME):
- /config without section returns defaults
- /config with dashboard.kanban section returns the configured values
- POST /tasks as the first-ever request (no prior /board) succeeds —
auto-init handles it
- Link add + remove via POST /links + DELETE /links round-trip
- Bulk priority bump on 2 ids, both get priority=5
- Bulk archive hides ids from default board
- PATCH {title, body} updates the task, markdown source survives
the round trip
- POST /tasks {triage: true, parents: [id]} lands in triage, not todo
- Bulk partial: 2 good + 1 bogus returns per-id outcome
Docs (website/docs/user-guide/features/kanban.md)
- 'What the plugin gives you' rewritten to reflect bulk, drawer
edit, dep editor, parent-on-create, markdown, touch drag-drop.
- New 'Dashboard config' subsection with a YAML example for
dashboard.kanban.*.
- REST table gains /tasks/bulk and /config rows.
|
||
|
|
45806629c5 |
feat(kanban): Triage column, progress rollup, WS auth, lanes, polish
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.
Kernel changes
- kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
CHECK constraint so no schema migration is needed.
- create_task(triage=True) forces the initial status to 'triage'
regardless of parents, and parent ids are still validated so the
eventual link rows don't dangle. recompute_ready() only promotes
'todo' -> 'ready', so triage tasks are naturally isolated from the
dispatcher pipeline.
- hermes kanban create gains --triage.
Patterns table (docs) gains P9 'Triage specifier'.
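The two triage rules above — triage=True wins over the usual parent-based initial status, and ready-promotion only ever touches 'todo' — can be sketched in a few lines of Python. Function names and the no-parents default are assumptions for illustration, not the real kanban_db API:

```python
VALID_STATUSES = {"triage", "todo", "ready", "running", "blocked", "done"}

def initial_status(has_parents: bool, triage: bool = False) -> str:
    if triage:
        return "triage"          # forced, regardless of parents
    return "todo" if has_parents else "ready"

def recompute_ready(tasks: dict[str, dict]) -> None:
    """Promote 'todo' tasks whose parents are all done; skip everything
    else — 'triage' never enters the dispatcher pipeline."""
    for task in tasks.values():
        if task["status"] != "todo":
            continue
        if all(tasks[p]["status"] == "done" for p in task["parents"]):
            task["status"] = "ready"
```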
Plugin backend (plugins/kanban/dashboard/plugin_api.py)
- GET /board now auto-init's kanban.db on first read (idempotent).
A fresh install shows an empty board instead of 'failed to load'.
- GET /board returns a new 'progress' field per task — {done, total}
of child-task completion, or None if the task has no children.
- BOARD_COLUMNS prepends 'triage'.
- POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
{status: 'triage'}.
- WebSocket /events now requires ?token=<session_token> as a query
param — browsers can't set Authorization on a WS upgrade, so this
matches the pattern the in-browser PTY bridge uses. Constant-time
compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
contexts (no dashboard module) the check no-ops so the tail loop
stays testable. Security boundary documented in the module header
and in website/docs/user-guide/features/kanban.md.
Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
- Adds the Triage column (lilac dot) with helper text
'Raw ideas — a specifier will flesh out the spec'. Inline-create
from the Triage column parks new tasks in triage.
- Status action row in the drawer gains '→ triage'.
- Progress pill (N/M) on cards that have children. Full-complete
state tints the pill green.
- 'Lanes by profile' toolbar toggle — sub-groups the Running column
by assignee so you see at a glance which specialist is busy on
what.
- Destructive status moves (done / archived / blocked) via drag-drop
OR via the drawer action row now prompt for confirmation.
- Escape closes the drawer.
- Live-update reloads are debounced (250ms) so a burst of
task_events triggers one refetch, not N.
- WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
- WebSocket reconnect uses exponential backoff capped at 30s, not
a fixed 1.5s spin loop, and surfaces a user-visible error on
code-1008 (auth rejected) instead of reconnecting forever.
- ErrorBoundary wraps the page — a bad card render shows a
'rendering error, reload view' card instead of crashing the tab.
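The reconnect schedule replacing the fixed 1.5s spin is plain capped exponential backoff. A Python sketch of the shape (the UI code is TypeScript; the 1.5s base is an assumption read off the old fixed delay — only the 30s cap is stated above):

```python
def reconnect_delay(attempt: int, base: float = 1.5, cap: float = 30.0) -> float:
    """Delay before reconnect attempt N (0-based): base * 2**N,
    capped so a long outage settles at one attempt per 30s."""
    return min(cap, base * (2 ** attempt))
```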
Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
- empty-board shape now asserts all 6 columns including 'triage'
- create_triage_lands_in_triage_column
- triage_task_not_promoted_to_ready (dispatcher bypasses triage)
- patch_status_triage_works (both into triage and out of it)
- board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
- board_auto_initializes_missing_db
- ws_events_rejects_when_token_required (three sub-assertions:
missing → 1008, wrong → 1008, correct → handshake accepted)
All 82 kanban tests pass under scripts/run_tests.sh.
Docs
- kanban.md 'What the plugin gives you' fully rewritten to match
shipped reality (triage, progress pill, assignee lanes,
destructive-confirm, Escape-close, debounce).
- New 'Security model' subsection documents the explicit-plugin-
route-bypass, the WS token requirement, and the --host 0.0.0.0
warning; also notes that kanban.db is profile-agnostic on purpose
(the coordination primitive) so cross-profile visibility is
expected.
- CLI command reference shows --triage.
- Collaboration patterns table adds P9 'Triage specifier'.
|
||
|
|
4093201c47 |
feat(kanban): dashboard plugin — Linear/Fusion-style board UI
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.
What the tab gives you:
- One column per kanban status (todo / ready / running / blocked /
  done; archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
  comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
  same kanban_db code the CLI /kanban verbs use, so the three surfaces
  (CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row (→ ready /
  → running / block / unblock / complete / archive), dependency links,
  comment thread with Enter-to-submit, last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
  nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
  CLI or gateway actions in real time.
REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.
Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is
active.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)
All 76 kanban tests pass under scripts/run_tests.sh.
Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST
surface, live-updates mechanism, extending, and scope boundary.
|
||
|
|
9f610aa8f3 |
docs(kanban): add GUI/Dashboard plugin section
The /kanban CLI + slash command are enough to run the board headlessly,
but triage and cross-profile supervision want a visual board. Document
the design as a dashboard plugin that:
- reads live state from kanban.db over a WebSocket on task_events
  (no polling)
- writes through run_slash() so CLI/gateway/GUI cannot drift
- mounts under /api/plugins/kanban/ following the existing 'Extending
  the Dashboard' plugin shape
The plugin is strictly a thin layer over kanban_db — no new business
logic, nothing to merge into the kernel.
|
||
|
|
e1c5e741ad |
feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills
for worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join,
anonymous subagent, no resumability, no human-in-the-loop. Kanban is
the durable shape needed for research triage, scheduled ops, digital
twins, engineering pipelines, and fleet work. They coexist (workers
may call delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed
  task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency
resolution, dispatcher, workspace kinds, tenancy, CLI + slash
surface). All pass hermetic via scripts/run_tests.sh.
|
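The atomic-claim trick above is compare-and-swap expressed as SQL: BEGIN IMMEDIATE takes the write lock up front, and the UPDATE's WHERE clause is the compare half, so two racing workers can never both win. A minimal Python sketch — the table and column names are illustrative, not the real kanban_db schema:

```python
import sqlite3

def claim_task(db_path: str, task_id: int, worker: str) -> bool:
    """Atomically claim a 'ready' task; return False if another
    worker got there first (rowcount 0 means the CAS failed)."""
    # autocommit mode so the explicit BEGIN IMMEDIATE owns the txn
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        cur = conn.execute(
            "UPDATE tasks SET status = 'running', assignee = ? "
            "WHERE id = ? AND status = 'ready'",
            (worker, task_id),
        )
        conn.execute("COMMIT")
        return cur.rowcount == 1
    finally:
        conn.close()
```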
||
|
|
06f81752ed |
Revert "feat(kanban): durable multi-profile collaboration board (#16081)" (#16098)
This reverts commit
|
||
|
|
15937a6b46 |
feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills
for worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join,
anonymous subagent, no resumability, no human-in-the-loop. Kanban is
the durable shape needed for research triage, scheduled ops, digital
twins, engineering pipelines, and fleet work. They coexist (workers
may call delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed
  task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency
resolution, dispatcher, workspace kinds, tenancy, CLI + slash
surface). All pass hermetic via scripts/run_tests.sh.
|
||
|
|
7fa70b6c87 |
refactor: /btw is now an alias for /background (#16053)
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools. Collapse the
two: /btw and /bg both alias to /background. One command, one behavior,
no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
|
||
|
|
9a70260490 |
Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit
|
||
|
|
ffd2621039 |
feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same latch
to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI — single
Enter while busy auto-queues, double Enter on an empty line interrupts.
The new busy-input hint teaches THAT gesture instead of telling the
user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() +
  tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
  hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
  _on_tool_complete for the 30s/tool_progress=all path. Same
  config.yaml latch so each hint fires at most once per install across
  CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse +
  onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event
  as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
  busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
  section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
  tests/tui_gateway/test_protocol.py \
  tests/gateway/test_busy_session_ack.py -> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
|
||
|
|
83c1c201f6 |
feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046)
Instead of a blocking first-run questionnaire, show a one-time hint the
first time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the
   busy-ack explaining the /busy queue vs /busy interrupt knob,
   phrased to match the mode that was just applied (don't tell a
   queue-mode user to switch to queue).
2. First tool that runs for >= 30s in the noisiest progress mode
   (tool_progress: all) — prints a hint about /verbose to cycle
   display modes (all -> new -> off -> verbose). Gated on /verbose
   actually being usable on the surface: always shown on CLI; on
   gateway only shown when display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so
it fires exactly once per install across CLI, gateway, and cron, then
never again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
  both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
  load_cli_config defaults (cli.py). No _config_version bump — deep
  merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
  after building the ack. progress_callback tracks tool.completed
  duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
  Enter; _on_tool_progress appends the tool-progress hint on the
  first >=30s tool completion. In-memory CLI_CONFIG is also updated so
  subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
|
||
|
|
855366909f |
feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a
release.
Live URL:
https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
  model_catalog.url                    master manifest URL
  model_catalog.ttl_hours              disk cache TTL (default 24h)
  model_catalog.providers.<name>.url   optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest
  (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() + new
  get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
  success/failure, accessors, disabled, overrides, integration)
|
||
|
|
59b56d445c |
feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
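The wire point is small: monotonic clock around the dispatch, truncate to int milliseconds, pass as a kwarg to each post-dispatch hook. A sketch with hypothetical names (the real call sites are in model_tools.py around registry.dispatch()):

```python
import time

def dispatch_with_duration(dispatch, hooks, tool_name: str, **kwargs):
    """Measure the dispatch with a monotonic clock and hand the
    integer millisecond latency to the post-dispatch hooks."""
    start = time.monotonic()
    result = dispatch(tool_name, **kwargs)
    duration_ms = int((time.monotonic() - start) * 1000)  # never negative
    for hook in hooks:
        hook(tool_name=tool_name, result=result, duration_ms=duration_ms)
    return result
```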
|
||
|
|
a55de5bcd0 |
feat(setup): auto-reconfigure on existing installs (#15879)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section:
  `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since
  it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them. The 'Quick
Setup - configure missing items only' menu option is now exposed as
the explicit `--quick` flag; it's the narrow case of filling in
missing config (e.g. after a partial OpenClaw migration or when a
required API key got cleared). Inspired by Mercury Agent's
`mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
  (guarding behavior that no longer exists — covered by
  test_setup_reconfigure.py instead)
|
||
|
|
7c50ed707c |
docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
  and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
  routing, /v1 stripping, api-version query forwarding, and the
  provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
  AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
  HangGlidersRule (noreply), and pein892 so the
  contributor-attribution CI check does not reject the salvage.
|
||
|
|
81e01f6ee9 | fix(agent): preserve Codex message items for replay | ||
|
|
dc4d92f131 |
docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
  responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
  Models with a responsive YouTube iframe (NoF-YajElIM).
Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
|
||
|
|
ea01bdcebe |
refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are obsolete
now that the background memory/skill review handles persistent memory
extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live
  agent before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix (system
  prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching. The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant. Background review runs in the live conversation's session
  context, gets the same content, writes to the same memory store, and
  doesn't break the cache. Everything flush_memories claimed to
  preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes
  tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along; the
  compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The
  session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name reflects
  what it now actually gates. from_dict() reads 'expiry_finalized'
  first, falls back to the legacy 'memory_flushed' key so existing
  sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and
gateway/ session-boundary suites. No behavior regressions — background
memory review continues to handle persistent memory extraction on both
CLI and gateway.
|
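The read-time backcompat on the renamed flag is a one-function pattern: prefer the new key, fall back to the legacy one. A sketch (the helper name is hypothetical; in the codebase this lives inside SessionEntry.from_dict):

```python
def expiry_finalized_from_dict(data: dict) -> bool:
    """Read the renamed flag, falling back to the legacy key so
    pre-rename sessions.json files upgrade seamlessly."""
    if "expiry_finalized" in data:
        return bool(data["expiry_finalized"])
    return bool(data.get("memory_flushed", False))
```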
||
|
|
cf2fabc40f |
docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.
Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped
    slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into built-in
    pages without overriding them)"
Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
|
||
|
|
af22421e87 |
feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns
  notification spam

  Background processes that stack notify_on_complete=True with
  watch_patterns can flood the user with duplicate, delayed
  notifications — matches deliver asynchronously via the completion
  queue and continue arriving minutes after the process has exited.
  The docstring warning against this (PR #12113) has proven
  insufficient; agents still misuse the combination.
  Three layered defenses, each sufficient on its own:
  1. Mutual exclusion (terminal_tool.py): When both flags are set on a
     background process, drop watch_patterns with a warning.
     notify_on_complete wins because 'let me know when it's done' is
     the more useful signal and fires exactly once. Extracted as
     _resolve_notification_flag_conflict() so the rule is testable in
     isolation.
  2. Suppress-after-exit (process_registry.py): _check_watch_patterns()
     now bails the moment session.exited is True. Post-exit chunks
     (buffered reads draining after the process is gone) no longer
     produce notifications. This is the fix flagged as future work in
     session 20260418_020302_79881c.
  3. Global circuit breaker (process_registry.py): Per-session rate
     limits don't catch the sibling-flood case — N concurrent
     processes can each stay under 8/10s and still collectively spam.
     New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown
     across ALL sessions, emits a single watch_overflow_tripped event,
     silently counts dropped events, and emits a
     watch_overflow_released summary when the cooldown ends.
  Also updates the tool schema + docstring to document the new
  behavior.
  Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
  mutual-exclusion resolver x4, global breaker trip/cooldown/release
  x2). All 60 tests across test_watch_patterns.py,
  test_notify_on_complete.py, test_terminal_tool.py pass.
  Real-world trigger: self-inflicted in session 20260425_051924 —
  three concurrent hermes-sweeper review subprocesses each set
  watch_patterns=['failed validation', 'errored'] AND
  notify_on_complete=True, then iterated over multiple items,
  producing enough matches per process to defeat the per-session cap
  while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit +
  strike-3 promotion

  Per Teknium's direction, the watch_patterns rate limit is now much
  more aggressive and self-healing.
  ## New rule — per session
  - HARD cap: 1 watch-match notification per 15 seconds per process.
  - Any match arriving inside the cooldown window is dropped and
    counts as ONE strike for that window (many drops in the same
    window still = 1 strike).
  - After 3 consecutive strike windows, watch_patterns is permanently
    disabled for the session and the session is auto-promoted to
    notify_on_complete semantics — exactly one notification when the
    process actually exits.
  - A cooldown window that expires with zero drops resets the
    consecutive strike counter — healthy cadence is forgiven.
  ## Schema + docstring rewritten
  The tool schema description now gives the model explicit guidance:
  - notify_on_complete is 'the right choice for almost every
    long-running task'
  - watch_patterns is for RARE one-shot signals on LONG-LIVED
    processes
  - Do NOT use watch_patterns with loops/batch jobs — error patterns
    fire every iteration and will hit the strike limit fast
  - Mutual exclusion is stated on both parameter descriptions
  - 1/15s cooldown and 3-strike promotion are stated in the
    watch_patterns description so the model sees the contract every
    turn
  ## Removed
  - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45)
    — the new 1/15s limit subsumes both; keeping them would
    double-count.
  - _watch_window_hits / _watch_window_start / _watch_overload_since
    fields on ProcessSession. Replaced by _watch_last_emit_at /
    _watch_cooldown_until / _watch_strike_candidate /
    _watch_consecutive_strikes.
  ## Kept
  - Global circuit breaker across all sessions (15/10s → 30s cooldown)
    as a secondary safety net for concurrent siblings. Still valuable
    when 20 short-lived processes each fire once — none individually
    violates the per-session limit.
  - Suppress-after-exit guard.
  - Mutual exclusion resolver at the tool entry point.
  ## Tests
  - 6 new tests in TestPerSessionRateLimit covering: first match
    delivers, second in cooldown suppressed, multi-drop = single
    strike, 3 strikes disables + promotes, clean window resets
    counter, suppressed count carried to next emit.
  - Global circuit breaker tests rewritten to use fresh sessions
    instead of hacking removed per-window fields.
  - 50/50 watch_patterns + notify_on_complete tests pass.
  - 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

  Dashboard plugins can now inject components into specific built-in
  pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
  Chat) without overriding the whole route.
  Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
  None of those let a plugin add a banner, card, or widget to an
  existing page. The new <page>:top / <page>:bottom slots close that
  gap, reusing the existing registerSlot() API.
  Changes
  - web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
    (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
    grouped under "Shell-wide" vs "Page-scoped" in the docblock
  - web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" /> as the first child of its outer
    wrapper and <PluginSlot name="<page>:bottom" /> as the last child
    — zero visual cost when no plugin registers
  - plugins/example-dashboard: registers a demo banner into
    sessions:top via registerSlot(), with matching slots entry in the
    manifest — so freshly-setup users can see what page-scoped slots
    look like without writing any plugin code
  - website/docs: new "Page-scoped slots" table in the plugin
    authoring guide, with a worked example
  - tests/hermes_cli/test_web_server.py: round-trip test for
    colon-bearing slot names (sessions:top, analytics:bottom, ...)
  Validation
  - npm run build: clean (tsc -b + vite build, 2761 modules)
  - scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions:
    5/5 pass
|
||
|
|
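The page-scoped naming scheme above is small enough to sketch. A minimal Python illustration, assuming only the `<page>:top` / `<page>:bottom` convention and the nine page names listed in the commit (the real registry is the KNOWN_SLOT_NAMES list in web/src/plugins/slots.ts, which this does not reproduce):

```python
# Hypothetical sketch of the page-scoped slot naming convention.
# Page names and positions are taken from the commit message above;
# the authoritative list lives in web/src/plugins/slots.ts.

PAGES = ["sessions", "analytics", "logs", "cron", "skills",
         "config", "env", "docs", "chat"]
POSITIONS = ["top", "bottom"]

# 9 pages x 2 positions = the 18 new slot names the commit adds.
PAGE_SCOPED_SLOTS = {f"{page}:{pos}" for page in PAGES for pos in POSITIONS}

def is_page_scoped_slot(name: str) -> bool:
    """True iff `name` follows the <page>:top / <page>:bottom scheme."""
    return name in PAGE_SCOPED_SLOTS
```

The colon in these names is what the round-trip test in test_web_server.py exercises: slot names must survive manifest serialization unmangled.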
e5647d7863 |
docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530)
The web-dashboard.md and dashboard-plugins.md pages had overlapping, partial coverage of the theme and plugin systems. Themes were split across two pages; the plugin docs had a minimal manifest reference but no step-by-step guide, no slot catalog, and no theme+plugin demo.

New: user-guide/features/extending-the-dashboard.md — single navigable reference for all three extension layers (themes, UI plugins, backend plugins). Includes:
- Theme quick-start + full schema (palette, typography, layout, layout variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override, tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting

web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS, development). Theme/plugin content now points to the new page with a built-in themes summary table.

dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).

sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under the Management group.

No user-facing behavior change; docs-only. |
||
|
|
0ed37c0ca4 | docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning | ||
|
|
1dcf79a864 | feat: add slash command for busy input mode | ||
|
|
0bcbc9e316 |
docs(faq): Update docs on backups
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users in understanding the nuances between both |
||
|
|
c61547c067 |
Merge pull request #14890 from NousResearch/bb/tui-web-chat-unified
feat(web): dashboard Chat tab — xterm.js + JSON-RPC sidecar (supersedes #12710 + #13379) |
||
|
|
850fac14e3 | chore: address copilot comments | ||
|
|
5500b51800 | chore: fix lint | ||
|
|
10deb1b87d |
fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid), and may flip between the two for a single human within the same conversation. Before this change, build_session_key used the raw identifier verbatim, so the bridge reshuffling an alias form produced two distinct session keys for the same person — in two places:

1. DM chat_id — a user's DM sessions split in half, transcripts and per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a member's per-user session inside a group splits in half for the same reason.

Add a canonicalizer that walks the bridge's lid-mapping-*.json files and picks the shortest/numeric-preferred alias as the stable identity. build_session_key now routes both the DM chat_id and the group participant_id through this helper when the platform is WhatsApp. All other platforms and chat types are untouched.

Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier as public helpers. Plugins that need per-sender behaviour (role-based routing, per-contact authorization, policy gating) need the same identity resolution Hermes uses internally; without a public helper, each plugin would have to re-implement the walker against the bridge's internal on-disk format. Keeping this alongside build_session_key makes it authoritative and one refactor away if the bridge ever changes shape.

_expand_whatsapp_aliases stays private — it's an implementation detail of how the mapping files are walked, not a contract callers should depend on. |
||
|
|
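The alias-selection rule above ("shortest/numeric-preferred") can be sketched independently of the bridge's on-disk format. A hedged Python sketch, assuming the lid-mapping files have already been walked into a set of alias strings; `pick_canonical_alias` is a hypothetical name, not the real helper:

```python
def pick_canonical_alias(aliases: set[str]) -> str:
    """Pick a stable identity from a set of WhatsApp alias forms.

    Hypothetical sketch of the shortest/numeric-preferred rule described
    in the commit; the real canonicalizer also walks lid-mapping-*.json.
    """
    def sort_key(alias: str):
        user_part = alias.split("@", 1)[0]
        # Prefer purely numeric (phone-style) user parts, then shorter
        # user parts, then lexicographic order for determinism.
        return (not user_part.isdigit(), len(user_part), alias)

    return min(aliases, key=sort_key)
```

Whatever the exact tie-breaks in the real code, the essential property is that the same alias set always maps to the same identity, so build_session_key is deterministic across JID/LID flips.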
f49afd3122 |
feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit
|
||
|
|
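The custom resize escape described above is a plain in-band frame, so the writer-loop interception reduces to a small parser. A minimal sketch, with the framing taken from the commit text (`\x1b[RESIZE:cols;rows]`); the function name is illustrative, not the real one:

```python
import re

# In-band resize frame from the commit: ESC [ RESIZE : cols ; rows ]
_RESIZE_RE = re.compile(r"^\x1b\[RESIZE:(\d+);(\d+)\]$")

def parse_resize_frame(frame: str):
    """Return (cols, rows) for a resize frame, or None for ordinary input.

    Sketch only: the real writer loop intercepts this escape before
    forwarding bytes to the PTY and applies the size via TIOCSWINSZ.
    """
    m = _RESIZE_RE.match(frame)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Keeping resize in-band avoids a second control channel: the browser's onResize handler sends the same WebSocket the keystrokes use.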
1840c6a57d |
feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard.

Previously a user had to (1) toggle the Spotify toolset on in 'hermes tools' AND (2) separately run 'hermes auth spotify' to actually use it. The second step was a discovery gap — the docs mentioned it but nothing in the TUI pointed users there.

Now toggling Spotify on calls login_spotify_command as a post_setup hook. If the user has no client_id yet, the interactive wizard walks them through Spotify app creation; if they do, it skips straight to PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on AND authenticated. SystemExit from the wizard (user abort) leaves the toolset enabled and prints a 'run: hermes auth spotify' hint — it does NOT fail the toolset toggle.

Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users to type env var names before the wizard fires was UX-backwards — the point of the wizard is that they don't HAVE a client_id yet.

B — Docs page now covers cron + Spotify.

New 'Scheduling: Spotify + cron' section with two working examples (morning playlist, wind-down pause) using the real 'hermes cron add' CLI surface (verified via 'cron add --help'). Covers the active-device gotcha, Premium gating, memory isolation, and links to the cron docs. Also fixed a stale '9 Spotify tools' reference in the setup copy — we consolidated to 7 tools in #15154.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py tests/hermes_cli/test_spotify_auth.py tests/tools/test_spotify_client.py → 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build → SUCCESS, no new warnings |
||
|
|
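The abort semantics above (wizard cancellation must not undo the toggle) boil down to a small control-flow pattern. A hedged sketch with hypothetical stand-ins: `enable_toolset` and `run_spotify_wizard` stand in for the real tools-config plumbing and login_spotify_command:

```python
def activate_spotify(enable_toolset, run_spotify_wizard, echo=print):
    """Toggle the toolset on, then run the wizard as a post_setup hook.

    Hypothetical sketch: a SystemExit from the wizard (user abort) must
    leave the toolset enabled and print a hint, not fail the toggle.
    """
    enable_toolset("spotify")          # toggle first; never rolled back
    try:
        run_spotify_wizard()           # app-creation walkthrough + PKCE
    except SystemExit:
        echo("Spotify enabled; finish auth later with: hermes auth spotify")
```

The ordering matters: enabling before the wizard runs is what makes the abort path safe without any rollback logic.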
e5d41f05d4 |
feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154)
Three quality improvements on top of #15121 / #15130 / #15135:

1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with kind='tracks'|'albums'. Handler code was ~90 percent identical across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate of spotify_playback.get_currently_playing (both return identical 204/empty payloads). Its 'recently_played' action moves onto spotify_playback as a new action — history belongs adjacent to live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify toolset is enabled, and the action surface is more discoverable (everything playback-related is on one tool).

2. Spotify skill (skills/media/spotify/SKILL.md)

Teaches the agent canonical usage patterns so common requests don't balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan + describe + play)
- 'what's playing' = single get_currently_playing (no preflight get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' — both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429

3. Surfaced in 'hermes setup' tool status

Adds 'Spotify (PKCE OAuth)' to the tool status list when auth.json has a Spotify access/refresh token. Matches the homeassistant pattern but reads from auth.json (OAuth-based) rather than env vars.

Docs updated to reflect the new 7-tool surface, and mention the companion skill in the 'Using it' section.

Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after renaming/replacing the spotify_activity tests with library + recently_played coverage). Docusaurus build clean. |
||
|
|
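The merge in point 1 above hinges on the two saved-item handlers being near-identical, so the consolidated tool is just kind-keyed dispatch. A hedged sketch; `fetch_tracks` / `fetch_albums` are hypothetical stand-ins for the real Spotify Web API calls:

```python
def spotify_library(kind, fetch_tracks, fetch_albums):
    """Merged saved-items handler sketch: one tool, kind-keyed dispatch.

    Hypothetical: only the kind='tracks'|'albums' dispatch shape is from
    the commit; the fetchers stand in for the real API-backed handlers.
    """
    handlers = {"tracks": fetch_tracks, "albums": fetch_albums}
    if kind not in handlers:
        raise ValueError(f"kind must be 'tracks' or 'albums', got {kind!r}")
    return handlers[kind]()
```

Because the two old handlers were ~90 percent identical, collapsing them to a dispatch table removes a tool schema from every API call without changing behaviour.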
9be17bb84f |
docs(spotify): expand feature page with tool reference, Free/Premium matrix, troubleshooting (#15135)
The initial Spotify docs page shipped in #15130 was a setup guide. This expands it into a full feature reference:
- Per-tool parameter table for all 9 tools, extracted from the real schemas in tools/spotify_tool.py (actions, required/optional args, premium gating).
- Free vs Premium feature matrix — which actions work on which tier, so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1 cause of '403 no active device' reports for every Spotify integration.
- SSH / headless section explaining that browser auto-open is skipped when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204 now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.

Verified with 'node scripts/prebuild.mjs && npx docusaurus build' — page compiles, no new warnings. |
||
|
|
05394f2f28 |
feat(spotify): interactive setup wizard + docs page (#15130)
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID is required' if the user hadn't manually created a Spotify developer app and set env vars. Now the command detects a missing client_id and walks the user through the one-time app registration inline:
- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form (including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent runs skip the wizard
- Continues straight into the PKCE OAuth flow

Also prints the docs URL at both the start of the wizard and the end of a successful login so users can find the full guide.

Adds website/docs/user-guide/features/spotify.md with the complete setup walkthrough, tool reference, and troubleshooting, and wires it into the sidebar under User Guide > Features > Advanced.

Fixes a stale redirect URI default in the hermes_cli/tools_config.py TOOL_CATEGORIES entry (was 8888/callback from the PR description instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value 43827/spotify/callback defined in auth.py). |
||
|
|
2cab8129d1 |
feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that:
1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back

The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions).

Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably)

Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios.

Docs: Updated providers.md with Copilot auth behavior section. |
||
|
|
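The one-shot flag guard is the load-bearing piece of the design above. A hedged sketch with hypothetical hooks (`send`, `resolve_token`, `rebuild_client` stand in for the real provider internals):

```python
def request_with_auth_recovery(send, resolve_token, rebuild_client):
    """One-shot 401 recovery sketch: refresh token, rebuild client, retry.

    Hypothetical names; the real code re-resolves via the
    COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> `gh auth token`
    chain and rebuilds the client even when the token string is unchanged.
    """
    retried = False                      # one-shot flag: no infinite 401 loop
    while True:
        status, body = send()
        if status != 401 or retried:
            return status, body          # success, or second 401 -> give up
        retried = True
        rebuild_client(resolve_token())  # rebuild even in same-token case
```

Rebuilding unconditionally (rather than comparing tokens) is what recovers the stale-client case the commit describes: the token can be identical while the httpx client state is bad.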
852c7f3be3 |
feat(cron): per-job workdir for project-aware cron runs (#15110)
Cron jobs can now specify a per-job working directory. When set, the job runs as if launched from that directory: AGENTS.md / CLAUDE.md / .cursorrules from that dir are injected into the system prompt, and the terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD). When unset, old behaviour is preserved (no project context files, tools use the scheduler's cwd). Requested by @bluthcy.

## Mechanism
- cron/jobs.py: create_job / update_job accept 'workdir'; validated to be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD at it and flip skip_context_files to False before building the agent. Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py: expose 'workdir' via the cronjob tool and 'hermes cron create/edit --workdir ...'. Empty string on edit clears the field.

## Validation
- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update, JSON round-trip via cronjob tool, tick partition (workdir jobs run on the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py, test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path rejected; list shows 'Workdir:'; edit --workdir '' clears. |
||
|
|
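The env toggle + restore-in-finally contract from the mechanism above is the part the tests pin down. A minimal sketch, assuming a hypothetical `run_agent` callable stands in for building and running the agent:

```python
import os

def run_job_in_workdir(workdir, run_agent):
    """Point TERMINAL_CWD at the job's workdir; restore on every exit path.

    Sketch of the scheduler behaviour described in the commit; run_agent
    is a hypothetical stand-in. TERMINAL_CWD being process-global is why
    workdir jobs must run sequentially, outside the thread pool.
    """
    if workdir is None:
        return run_agent(skip_context_files=True)   # old behaviour
    prev = os.environ.get("TERMINAL_CWD")
    os.environ["TERMINAL_CWD"] = workdir
    try:
        return run_agent(skip_context_files=False)  # inject AGENTS.md etc.
    finally:
        if prev is None:
            os.environ.pop("TERMINAL_CWD", None)
        else:
            os.environ["TERMINAL_CWD"] = prev
```

The finally block is what "restored on every exit path" means concretely: an agent crash leaves the scheduler's env exactly as it found it.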
7626f3702e |
feat: read prompt caching cache_ttl from config
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback

Made-with: Cursor |
||
|
|
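The TTL plumbing above is a small validated config lookup. A hedged sketch, assuming only the two values the commit names (5m default, 1h opt-in) and a plain dict-shaped config; `resolve_cache_ttl` is an illustrative name, not the real function:

```python
VALID_TTLS = {"5m", "1h"}   # 5m default, 1h opt-in, per the commit

def resolve_cache_ttl(config: dict) -> str:
    """Read prompt_caching.cache_ttl, falling back to '5m' when invalid.

    Sketch only; the real lookup happens inside AIAgent at construction,
    and this mirrors the three tested cases: default, 1h, invalid-fallback.
    """
    ttl = (config.get("prompt_caching") or {}).get("cache_ttl", "5m")
    return ttl if ttl in VALID_TTLS else "5m"
```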
b2e124d082 |
refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047)
* refactor(commands): drop /provider and clean up slash registry
* refactor(commands): drop /plan special handler — use plain skill dispatch |
||
|
|
2ba9b29f37 |
docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to
|
||
|
|
1ef1e4c669 |
feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
|
||
|
|
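The dispatch contract above (first non-None action dict wins, plugin exceptions are caught and logged, default is normal dispatch) can be sketched as:

```python
import logging

log = logging.getLogger("gateway")

def run_pre_gateway_dispatch(callbacks, event_text):
    """Invoke pre_gateway_dispatch callbacks per the commit's contract.

    Sketch: the first non-None action dict wins and remaining callbacks
    are ignored; a plugin exception is logged and skipped, never fatal;
    with no takers the result is normal dispatch ({"action": "allow"}).
    """
    for cb in callbacks:
        try:
            result = cb(event_text)
        except Exception:
            log.exception("pre_gateway_dispatch plugin failed")
            continue
        if result is not None:
            return result            # first non-None action wins
    return {"action": "allow"}
```

The callback signature here is simplified (the real hook sees the full MessageEvent, not just text), but the win/skip/allow resolution order is the documented behaviour.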
67bfd4b828 |
feat(tui): stream thinking + tools expanded by default
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as a live transcript (reasoning + tool calls streaming inline) instead of a wall of `▸` chevrons the user has to click every turn.

Final default matrix:
- thinking: expanded
- tools: expanded
- activity: hidden (unchanged from the previous commit)
- subagents: falls through to details_mode (collapsed by default)

Everything explicit in `display.sections` still wins, so anyone who already pinned an override keeps their layout. One-line revert is `display.sections.<name>: collapsed`. |
||
|
|
728767e910 |
feat(tui): hide the activity panel by default
The activity panel (gateway hints, terminal-parity nudges, background notifications) is noise for the typical day-to-day user, who only cares about thinking + tools + streamed content. Make `hidden` the built-in default for that section so users land on the quiet mode out of the box. Tool failures still render inline on the failing tool row, so this default suppresses the noise feed without losing the signal.

Opt back in with `display.sections.activity: collapsed` (chevron) or `expanded` (always open) in `~/.hermes/config.yaml`, or live with `/details activity collapsed`.

Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the fallback in `sectionMode()` between the explicit override and the global details_mode. Existing `display.sections.activity` overrides take precedence — no migration needed for users who already set it. |
||
|
|
78481ac124 |
feat(tui): per-section visibility for the details accordion
Adds optional per-section overrides on top of the existing global
details_mode (hidden | collapsed | expanded). Lets users keep the
accordion collapsed by default while auto-expanding tools, or hide the
activity panel entirely without touching thinking/tools/subagents.
Config (~/.hermes/config.yaml):
display:
details_mode: collapsed
sections:
thinking: expanded
tools: expanded
activity: hidden
Slash command:
/details show current global + overrides
/details [hidden|collapsed|expanded] set global mode (existing)
/details <section> <mode|reset> per-section override (new)
/details <section> reset clear override
Sections: thinking, tools, subagents, activity.
Implementation:
- ui-tui/src/types.ts SectionName + SectionVisibility
- ui-tui/src/domain/details.ts parseSectionMode / resolveSections /
sectionMode + SECTION_NAMES
- ui-tui/src/app/uiStore.ts +
app/interfaces.ts +
app/useConfigSync.ts sections threaded into UiState
- ui-tui/src/components/
thinking.tsx ToolTrail consults per-section mode for
hidden/expanded behaviour; expandAll
skips hidden sections; floating-alert
fallback respects activity:hidden
- ui-tui/src/components/
messageLine.tsx + appLayout.tsx pass sections through render tree
- ui-tui/src/app/slash/
commands/core.ts /details <section> <mode|reset> syntax
- tui_gateway/server.py config.set details_mode.<section>
writes to display.sections.<section>
(empty value clears the override)
- website/docs/user-guide/tui.md documented
Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway).
Total: 269/269 vitest, all gateway tests pass.
|
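The resolution order the commits above describe (explicit per-section override, then built-in section default, then global details_mode) is easy to state as a sketch. This mirrors the TypeScript `sectionMode()` in ui-tui/src/domain/details.ts in Python for illustration only; the defaults dict is a parameter, since the built-in values changed across commits:

```python
VALID_MODES = {"hidden", "collapsed", "expanded"}

def section_mode(section, sections_override, section_defaults, details_mode):
    """Resolve a section's visibility: override > section default > global.

    Illustrative Python port of the TUI's sectionMode() resolution; the
    real implementation is TypeScript in ui-tui/src/domain/details.ts.
    """
    explicit = sections_override.get(section)    # display.sections.<name>
    if explicit in VALID_MODES:
        return explicit
    default = section_defaults.get(section)      # SECTION_DEFAULTS fallback
    if default in VALID_MODES:
        return default
    return details_mode                          # global display.details_mode
```

This ordering is why existing `display.sections.*` overrides survive later default changes: an explicit value short-circuits before the defaults table is consulted.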