mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 09:47:54 +08:00
832ecde4b08430fa81e7a82bf07cbb0aa77e175a
512 Commits
832ecde4b0
feat(kanban): structured tool surface for worker + orchestrator agents
Seven new tools in `tools/kanban_tools.py` that give kanban workers a
backend-portable, schema-filtered way to interact with the board from
inside their own Python process — no shelling out to `hermes kanban`.
Motivation
The CLI path (`hermes kanban complete $TASK --summary ...`) breaks
on any remote terminal backend (Docker, Modal, Singularity, SSH).
The terminal tool runs `hermes kanban` inside the container, where
`hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted.
Tools run in the agent's own Python process, so they always reach
the board regardless of backend. Also skips shell-quoting fragility
on --metadata JSON and gives structured error returns the model can
reason about.
The seven tools
kanban_show read current task (defaults to HERMES_KANBAN_TASK)
kanban_complete structured handoff: summary + metadata
kanban_block ask for human input
kanban_heartbeat signal liveness during long operations
kanban_comment append to task thread
kanban_create fan out into child tasks (orchestrator path)
kanban_link add parent→child dependency after the fact
Gating
Each tool's check_fn returns True iff HERMES_KANBAN_TASK is set in
the process env. The dispatcher sets it when spawning a worker;
normal `hermes chat` sessions never have it. Empirically verified:
a baseline hermes-cli schema is 27 tools; with HERMES_KANBAN_TASK
set it grows to exactly 34 (+7). Zero leak into normal sessions.
Also set HERMES_PROFILE in the spawn env so the kanban_comment tool's
author default works cleanly (it's what the tool reads to attribute
comments).
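The gating mechanism above can be sketched as follows. This is a minimal illustration, not the real tool registry: `kanban_check_fn` and `visible_tools` are hypothetical names standing in for the per-tool check_fn and schema assembly in `tools/kanban_tools.py`.

```python
import os

def kanban_check_fn() -> bool:
    """Gate: kanban tools appear only when the dispatcher has set
    HERMES_KANBAN_TASK in the worker's environment (sketch)."""
    return bool(os.environ.get("HERMES_KANBAN_TASK"))

def visible_tools(base_tools, kanban_tools):
    """Assemble the schema: gated tools are included iff check_fn passes."""
    return list(base_tools) + [t for t in kanban_tools if kanban_check_fn()]
```

With 27 baseline tools and 7 kanban tools, a normal session sees 27 and a worker session sees 34, matching the empirical check in the commit.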
Skill updates
- `skills/devops/kanban-worker/SKILL.md`: lifecycle rewritten to use
kanban_show / kanban_heartbeat / kanban_block / kanban_complete /
kanban_comment / kanban_create directly. CLI fallback section
added for human operators / scripts.
- `skills/devops/kanban-orchestrator/SKILL.md`: all examples ported
from CLI to tool form; top-banner note explaining tools are the
primary surface. kanban_create / kanban_link throughout.
Docs
`website/docs/user-guide/features/kanban.md`:
new "How workers interact with the board" section explaining the
tool surface, gating mechanism, and why tools vs CLI. The worker
skill / orchestrator skill subsections are now nested under it.
Tests (+25 in tests/tools/test_kanban_tools.py)
- Schema gating: kanban_tools_hidden_without_env_var,
kanban_tools_visible_with_env_var.
- Happy paths: show (default + explicit task_id), complete (with
summary+metadata, with result only), block, heartbeat (with and
without note), comment (default + custom author), create (with
list parents, with string parent), link.
- Error paths: complete rejects no-handoff and non-dict metadata,
block rejects empty reason, comment rejects empty body, create
rejects no title / no assignee / non-list parents, link rejects
self-reference / missing args / cycles.
- End-to-end: full worker lifecycle driven entirely through the
tools, verified against DB state.
214/214 kanban suite pass under scripts/run_tests.sh.
be184aa5fa
fix(kanban): close the two v2-flagged issues in v1
Both items the atypical-scenarios pass flagged as "v2 follow-up"
actually belong in v1. Fixed now.
Fix 1: workspace path traversal
resolve_workspace now rejects non-absolute paths for all three
workspace_kinds (scratch-with-explicit-path, dir:, worktree). A
relative path like '../../../tmp/attacker' was being silently
resolved against the dispatcher's CWD — a confused-deputy escape.
Error message points users at the absolute-path requirement.
Storage remains verbatim (kernel doesn't rewrite user input);
the refusal happens at resolution time, so the dispatcher's
existing spawn-failure circuit breaker correctly categorizes it.
Threat model documented in website/docs/user-guide/features/kanban.md:
single-host, trusted-local-user. The absolute-path rule prevents
ambiguity-driven escape, not malicious access — kanban runs as you,
with your uid, on your filesystem.
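The guard described in Fix 1 amounts to refusing non-absolute paths at resolution time. A minimal sketch, assuming an illustrative `check_workspace_path` helper rather than the real `resolve_workspace` signature:

```python
import os

def check_workspace_path(path: str) -> str:
    """Refuse relative workspace paths: they would silently resolve
    against the dispatcher's CWD (a confused-deputy escape). Sketch of
    the commit's guard; the real check runs inside resolve_workspace."""
    if not os.path.isabs(path):
        raise ValueError(
            f"workspace path must be absolute, got {path!r}; "
            "relative paths would resolve against the dispatcher's CWD"
        )
    return os.path.realpath(path)
```

Storage stays verbatim; only resolution raises, so the existing spawn-failure circuit breaker categorizes the refusal like any other spawn failure.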
Fix 2: build_worker_context unbounded
Added per-section caps so worker prompts stay bounded on pathological
boards:
_CTX_MAX_PRIOR_ATTEMPTS = 10   most-recent N runs shown; older runs
                               collapsed into an "N earlier attempts
                               omitted" marker. Attempt numbering is
                               preserved (shows "Attempt 16", not
                               renumbered).
_CTX_MAX_COMMENTS       = 30   same pattern for comments.
_CTX_MAX_FIELD_BYTES    = 4 KB per summary / error / metadata / result.
_CTX_MAX_BODY_BYTES     = 8 KB per task.body (opening post).
_CTX_MAX_COMMENT_BYTES  = 2 KB per comment.
Truncation uses a visible ellipsis + char-count so the worker knows
it's been truncated.
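The truncation behaviour can be sketched like this. `truncate_field` is an illustrative name; the marker wording in the real code may differ, but the shape (byte cap, visible ellipsis, omitted-character count) is what the commit describes:

```python
def truncate_field(text: str, max_bytes: int) -> str:
    """Cap a context field at max_bytes of UTF-8, appending a visible
    marker with the omitted char count so the worker knows it was cut."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # Decode defensively in case the cut lands mid-codepoint.
    kept = raw[:max_bytes].decode("utf-8", errors="ignore")
    omitted = len(text) - len(kept)
    return f"{kept}… [{omitted} chars truncated]"
```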
Effect on atypical-scenario runs:
huge_run_count_on_one_task (1000 runs): 63 KB → 820 chars
comment_storm (1000 comments): 50 KB → 1,671 chars
Tests (+6 in main suite)
test_resolve_workspace_rejects_relative_dir_path — relative dir:
path stored verbatim but refused at resolve.
test_resolve_workspace_accepts_absolute_dir_path — legitimate
absolute paths are created and returned.
test_resolve_workspace_rejects_relative_worktree_path — same guard
for worktree kind.
test_build_worker_context_caps_prior_attempts — 25 runs → exactly
_CTX_MAX_PRIOR_ATTEMPTS shown, omitted marker present,
attempt numbering preserves original index.
test_build_worker_context_caps_comments — 100 comments → 30 shown,
70 in the omitted marker.
test_build_worker_context_caps_huge_summary — 1 MB summary on a
prior run → context under 10 KB total, truncation marker visible.
189/189 kanban suite pass. Atypical-scenarios stress script still
passes all 28 scenarios with the new caps in effect.
7206eed319
docs(kanban): add step-by-step tutorial with 10 dashboard screenshots
New website/docs/user-guide/features/kanban-tutorial.md walks four
user stories end-to-end, each backed by a real screenshot of the
dashboard running against seeded data.
Stories
1. Solo dev shipping a feature (parent->child dependencies,
structured handoff, run history rendering).
2. Fleet farming (parallel independent tasks across 3 assignees,
lanes-by-profile grouping, dispatcher daemon).
3. Role pipeline with retry (PM spec -> eng implements -> review
blocks -> eng retries -> review approves; two-run history
visible in the drawer; downstream workers pull parent
summary+metadata).
4. Circuit breaker + crash recovery (2 spawn_failed + 1 gave_up
for a deploy with missing creds; 1 crashed + 1 completed for
an OOM-killed migration that recovered on retry).
Each story shows both CLI commands and the dashboard drawer
equivalent. Screenshots captured via Playwright + Chromium at 2x
device scale, then repalettized with PIL (22 MB -> 6.1 MB for the
10-image set; no visible quality loss, verified against vision).
Side updates
- website/sidebars.ts: added kanban-tutorial under features.
- website/docs/user-guide/features/kanban.md: prefix banner
linking new readers to the tutorial before the reference.
All image references validate: `/img/kanban-tutorial/*` maps to
website/static/img/kanban-tutorial/ (10 files). Docusaurus build
not run locally (no node_modules in worktree); CI build on merge
will confirm.
e27c819de3
fix(kanban): deep-scan pass 2 — synthetic runs, event.run_id plumbing, invariant recovery, live drawer refresh
Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and CLI.
All behavioral / UX fixes; no schema change.
Kernel
- complete_task on a never-claimed task (ready/blocked → done with no
run in flight) was silently dropping the summary/metadata/result
onto a non-existent run. Now synthesizes a zero-duration run
(started_at == ended_at) so attempt history is complete. Only
fires when there's actually handoff data to persist — bare
complete_task(tid) remains a no-op for run creation.
- block_task on a never-claimed task had the same bug for --reason.
Same fix: synthesize a zero-duration run when a reason is passed.
- Event dataclass gained a `run_id: Optional[int] = None` field.
list_events, unseen_events_for_sub, and the dashboard _event_dict
were all SELECTing the column but dropping it on the way out,
so downstream consumers couldn't group events by attempt. Every
read path now surfaces run_id.
- claim_task got a defensive invariant-recovery step: if somehow
`current_run_id` is non-NULL on a task in 'ready' status (invariant
violation from an unknown code path), close the leaked run as
'reclaimed' inside the same txn as the new claim. No-op in the
common case; belt-and-suspenders in case a future code path forgets
to clear the pointer.
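The first kernel fix (synthetic zero-duration runs on never-claimed completions) can be sketched against a toy SQLite schema. Column names and the schema here are illustrative, not the real kernel's; metadata persistence is elided for brevity:

```python
import sqlite3
import time

def complete_task(conn, task_id, summary=None, metadata=None):
    """Sketch: completing a never-claimed task with handoff data
    synthesizes a zero-duration run (started_at == ended_at) so the
    attempt history stays complete. Bare complete_task(tid) stays a
    no-op for run creation. (metadata handling elided in this sketch.)"""
    row = conn.execute(
        "SELECT current_run_id FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row[0] is None and (summary or metadata):
        now = time.time()
        conn.execute(
            "INSERT INTO task_runs (task_id, outcome, summary, started_at, ended_at) "
            "VALUES (?, 'completed', ?, ?, ?)",
            (task_id, summary, now, now),
        )
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```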
Dashboard
- GET /tasks/:id events array now carries run_id per event (via
_event_dict).
- WebSocket /events SELECT now includes run_id in the pushed event
payload.
- TaskDrawer reloads itself on live events for its own task id. New
`taskEventTick[taskId]` state in the Board, incremented on every
WS event, passed down as `eventTick` prop; drawer's useEffect
depends on it. Previously, background workers completing a task
the user was viewing left the drawer showing stale data until
manual close/reopen.
- CSS: added `.hermes-kanban-run--ended` rule for the fallback class
the JS emits when outcome is unset. Harmless before; just
inconsistent.
CLI
- `hermes kanban watch --kinds` help text listed the legacy event
name `spawn_auto_blocked`. The kernel migration renames it to
`gave_up`, so users typing the documented name got zero matches.
Now shows the current lexicon (`completed,blocked,gave_up,
crashed,timed_out`).
Tests (+6 in core functionality, +1 in dashboard plugin)
- complete_never_claimed_task_synthesizes_run
- block_never_claimed_task_synthesizes_run
- complete_never_claimed_without_handoff_skips_synthesis
- event_dataclass_carries_run_id (created.run_id None, completed.run_id matches)
- unseen_events_for_sub_includes_run_id (notifier path)
- claim_task_recovers_from_invariant_leak (engineer the leak, verify recovery)
- event_dict_includes_run_id (dashboard API shape)
171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke (isolated
HERMES_HOME via execute_code) exercised all six fixed paths plus the
claim-after-leak recovery sequence.
Docs
- Runs section: new 'Synthetic runs for never-claimed completions'
and 'Live drawer refresh' paragraphs explaining the invariants.
- Event reference: `created` / `promoted` / `unblocked` entries now
explicitly note `run_id` is `NULL`; `completed` / `blocked`
describe synthetic-run fallback.
1c78f6627a
docs(kanban): document audit-pass invariants — bulk-close guard, reclaimed-on-status-change, completed event carries summary
- Runs section: dashboard PATCH parity (summary/metadata forward),
  `completed` event embeds first-line summary for notifiers, bulk
  --summary/--metadata refused, archive/drag-drop reclaim semantics.
- Event reference: added Payload column to Lifecycle and Edits tables;
  called out the invariant that `status` carries run_id when closing a
  reclaimed run.
0146cb2bd2
feat(kanban): runs as first-class (v1); structured handoffs; forward-compat for v2 workflows
Addresses vulcan-artivus's RFC review on issue #16102. Picks up the
structural changes that are expensive to retrofit later and zero-cost
to land now; defers workflow-template routing + per-stage lanes to v2
(kept forward-compat hooks in the schema).
Kernel
- New `task_runs` table. Each claim opens a run (pid, claim_lock,
  heartbeat, max_runtime, started_at); each terminal transition closes
  it with an outcome (completed / blocked / crashed / timed_out /
  spawn_failed / gave_up / reclaimed). Multiple rows per task when
  retries happen, preserving full attempt history.
- `tasks.current_run_id` points at the active run (NULL when idle);
  denormalised for cheap reads.
- `task_events.run_id` carries the run a given event belongs to so UIs
  can group events by attempt. claim/spawned/complete/block/crash/
  timeout/spawn_fail/gave_up/heartbeat events are all run-scoped;
  created/promoted/assigned/edited stay task-scoped (run_id=NULL).
- Legacy DBs: migration adds the columns + indexes and synthesizes a
  run row for any task that was 'running' before the runs table
  existed, so subsequent complete/heartbeat/reclaim calls have a
  target. Idempotent.
Structured handoff
- `complete_task(summary=, metadata=)` persists both on the closing
  run. `summary` falls back to `result` when omitted so single-run
  callers don't duplicate. `metadata` is a free-form dict
  ({changed_files, tests_run, findings, ...}).
- `build_worker_context` rewritten: "Prior attempts on this task" now
  reads closed runs (outcome, summary, error, metadata), and "Parent
  task results" pulls run.summary + run.metadata of the most-recent
  completed run per parent, falling back to task.result for legacy
  rows without runs. Retrying workers see why earlier attempts failed;
  downstream workers see parent handoffs structurally, not as loose
  `result` strings.
CLI
- `hermes kanban complete <id> --summary "..." --metadata
  '{"files":1}'`. JSON is parsed and rejected with exit-2 if malformed.
- New `hermes kanban runs <id> [--json]` verb. Shows per-run rows:
  outcome, profile, elapsed, summary, error. JSON mode serializes the
  full run dataclass for scripting.
Dashboard plugin
- GET /tasks/:id now carries a runs[] array alongside task / events /
  comments / links. Each run serialised with outcome, summary,
  metadata, worker_pid, elapsed fields.
- New Run History section in the drawer. Outcome-coloured left border
  (green=active, blue=completed, amber=reclaimed,
  red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs with
  a '+N earlier' toggle. Shows summary + error + metadata inline.
Forward-compat for v2 (vulcan's workflow templates + stages)
- `tasks.workflow_template_id` and `tasks.current_step_key` added as
  nullable columns. v1 kernel ignores them for routing; v2 will add
  workflow_templates + workflow_steps tables and wire the dispatcher
  to consult them. task_runs has a matching `step_key` column. Lets a
  v2 release land additively without another schema migration.
Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard)
- run_created_on_claim / run_closed_on_complete_with_summary
- run_summary_falls_back_to_result
- multiple_attempts_preserved_as_runs (3 attempts: reclaimed →
  crashed → completed, all visible in list_runs)
- run_on_block_with_reason / run_on_spawn_failure_records_failed_runs
  (5 spawn_failed runs + 1 gave_up run)
- event_rows_carry_run_id (task-scoped vs run-scoped split)
- build_worker_context_includes_prior_attempts
- build_worker_context_uses_parent_run_summary (metadata JSON in
  context)
- migration_backfills_inflight_run_for_legacy_db (simulates a
  pre-migration running task, re-runs init_db, asserts backfill)
- forward_compat_columns_writable
- cli_runs_verb + cli_runs_json
- cli_complete_with_summary_and_metadata (JSON round-trip through
  shlex + argparse)
- cli_complete_bad_metadata_exits_nonzero
- task_detail_includes_runs / task_detail_runs_empty_before_claim
269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke
covered: single-attempt complete → run closed + summary persisted;
retry scenario → two runs visible (blocked + completed); parent run
summary + metadata surfaced to child via build_worker_context;
forward-compat columns writable via UPDATE; GET /tasks/:id returns
runs[].
Docs
- New 'Runs — one row per attempt' section in kanban.md: the why (full
  attempt history, structured metadata), the two-table model (task is
  logical, run is execution), the structured handoff shape (--summary /
  --metadata), example CLI + dashboard output, forward-compat note
  for v2.
- Event reference updated to mention task_events.run_id.
- CLI reference gains 'hermes kanban runs <id>'.
Not in v1 (deferred to v2):
- Workflow templates (workflow_templates + workflow_steps tables,
  stage-based routing, success/failure step links).
- 'stage' as a distinct axis from status in the UI.
- Shared-by-default workspace binding across stages of the same
  workflow run.
- Pipeline replacement for the kanban-orchestrator skill (the
  orchestrator's 'decompose, don't execute' guidance is still correct;
  it becomes partly redundant once workflows land).
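The CLI's "reject malformed --metadata with exit-2" behaviour can be sketched as below. `parse_metadata` is a hypothetical helper name, not the real CLI code:

```python
import json
import sys

def parse_metadata(arg: str) -> dict:
    """--metadata must be a JSON object; anything else exits 2 (sketch)."""
    try:
        md = json.loads(arg)
    except json.JSONDecodeError as e:
        print(f"invalid --metadata JSON: {e}", file=sys.stderr)
        sys.exit(2)
    if not isinstance(md, dict):
        print("--metadata must be a JSON object", file=sys.stderr)
        sys.exit(2)
    return md
```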
da7d09c3b6
feat(kanban): max-runtime timeouts, worker heartbeats, assignees picker, event vocab cleanup
Ports four items from the Multica audit
(https://github.com/multica-ai/multica). Dropped their cross-host
server/daemon architecture and their Postgres+pgvector skill search —
both the wrong shape for our single-host SQLite kernel.
1. Per-task max-runtime (`max_runtime_seconds` column)
- New kernel function `enforce_max_runtime(conn)` runs in every
  dispatch tick. When a running task's elapsed time exceeds the cap,
  we SIGTERM the worker, wait a 5 s grace (polling _pid_alive), then
  SIGKILL. The task goes back to 'ready' with a `timed_out` event and
  re-queues on the next tick (unless the spawn-failure circuit breaker
  has already parked it).
- Host-local only: lock prefix must match this host's claimer_id so we
  never signal a PID on another machine.
- CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`.
  New `_parse_duration` helper accepts s/m/h/d suffixes or bare
  integers.
- Dashboard POST body + the card's `max_runtime_seconds` field.
2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event)
- `heartbeat_worker(conn, task_id, note=None)` emits the event and
  touches last_heartbeat_at. Refused when the task isn't running.
- CLI: `hermes kanban heartbeat <id> [--note "..."]`.
- kanban-worker skill instructs workers to heartbeat during long loops
  (training runs, encodes, crawls, batch uploads).
- Separate signal from PID crash detection: a worker's Python can
  still be alive while the actual work process is stuck. Heartbeat
  absence is diagnostic; future work can auto-block on stale
  heartbeats, but v1 just surfaces the signal.
3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`)
- Scans ~/.hermes/profiles/ for dirs containing config.yaml and unions
  them with current assignees on the board. Each entry returns {name,
  on_disk, counts: {status: n}}.
- CLI: `hermes kanban assignees [--json]`. Also hooked into `hermes
  kanban init`, which now prints discovered profiles so new installs
  see 'these are the assignees you can target' immediately.
- Dashboard: GET /api/plugins/kanban/assignees for the picker.
4. Event vocab cleanup (three renames + three new kinds)
- `ready` → `promoted` (fires when deps clear; clearer semantic).
- `priority` → `reprioritized` (past-tense verb, matches others).
- `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit
  breaker gave up on this task).
- New: `spawned` (emitted with {pid} on successful spawn), `heartbeat`
  ({note?}), `timed_out` ({pid, elapsed_seconds, limit_seconds,
  sigkill}).
- One-shot migration in `_migrate_add_optional_columns` renames legacy
  rows in-place on init_db(), so existing DBs upgrade cleanly.
- Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its
  own ⏱ message template, gave_up renamed from 'auto-blocked'.
- plugin_api.py's two 'priority' emit sites renamed to 'reprioritized'.
- Documented in a new 'Event reference' section in kanban.md, grouped
  into three clusters (lifecycle / edits / worker telemetry) with
  payload shapes.
Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py,
136/136 pass)
- max_runtime_terminates_overrun_worker: real SIGTERM flow with
  _pid_alive stub, verifies event payload + state reset.
- max_runtime_none_means_no_cap: unbounded tasks aren't timed out.
- create_task_persists_max_runtime.
- enforce_max_runtime_integrates_with_dispatch: kernel-level +
  dispatch_once chaining.
- heartbeat_on_running_task + heartbeat_refused_when_not_running.
- cli_heartbeat_verb with --note round-trip.
- recompute_ready_emits_promoted_not_ready.
- spawn_failure_circuit_breaker_emits_gave_up.
- spawned_event_emitted_with_pid.
- migration_renames_legacy_event_kinds (injects old rows, re-runs
  init_db, asserts rename).
- list_profiles_on_disk (tmp_path + config.yaml filter).
- known_assignees_merges_disk_and_board (profiles on disk + board
  assignees + per-status counts).
- cli_assignees_json.
- parse_duration_accepts_formats (s/m/h/d/float).
- parse_duration_rejects_garbage.
- cli_create_max_runtime_via_duration (2h → 7200).
- cli_create_max_runtime_bad_format_exits_nonzero.
Live smoke: POST /tasks with max_runtime_seconds round-trips;
/assignees returns the union of on-disk + board-assigned names; PATCH
priority produces 'reprioritized' events (not 'priority'); board cards
expose max_runtime_seconds + last_heartbeat_at.
Docs (website/docs/user-guide/features/kanban.md)
- New 'Event reference' section with three-cluster table (lifecycle /
  edits / worker telemetry) + payload shapes.
- CLI reference updated for --max-runtime, heartbeat, assignees.
- Gateway notifications section updated for the new TERMINAL_KINDS.
Not ported from Multica (deliberate, documented in the out-of-scope
section already): Postgres+pgvector skill search (heavy deps conflict
with the SQLite kernel), server+daemon cross-host model (we're
single-host on purpose), first-class agent identity with threaded
comments (we keep the board profile-agnostic).
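The `_parse_duration` helper described for `--max-runtime` can be sketched as follows; the real implementation may differ in detail, but the commit specifies s/m/h/d suffixes, bare integers, float multipliers, and rejection of garbage:

```python
def parse_duration(text: str) -> int:
    """Parse '30m', '2h', '1d', '45s', '1.5h', or a bare integer number
    of seconds into seconds. Sketch of the commit's _parse_duration."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    text = text.strip()
    if text and text[-1].lower() in units:
        # float multiplier allowed before the suffix, e.g. '1.5h'
        return int(float(text[:-1]) * units[text[-1].lower()])
    return int(text)  # bare seconds; raises ValueError on garbage
```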
af8d43dbbb
feat(kanban): core hardening — daemon, circuit breaker, crash detect, logs, notify, bulk, stats
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).
Dispatcher
- New `hermes kanban daemon` long-lived loop with --interval, --max,
--failure-limit, --pidfile, --verbose, signal-clean shutdown
(SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
lets tests drive it inline without subprocess.
- `hermes kanban init` now prints the dispatcher setup hint so users
don't leave the board off-by-default. Ships a systemd user unit at
plugins/kanban/systemd/hermes-kanban-dispatcher.service.
- Removed the old 'add this to cron' doc path. Cron runs agent
prompts (LLM cost per tick) — unacceptable for a per-minute
coordination loop.
Worker aliveness / safety
- Spawn returns the child's PID; dispatcher stores it on the task row
and calls detect_crashed_workers() every tick. If the PID is gone
but the claim TTL hasn't expired, the task drops back to ready with
a 'crashed' event. Host-local only — cross-host PIDs are ignored
per the single-host design.
- Spawn-failure circuit breaker: after N consecutive spawn_failed
events on the same task (default 5), the dispatcher auto-blocks
with the last error as the reason. Success resets the counter.
Workspace-resolution failures count against the same budget.
- Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
generation (.log.1), bounds per-task disk usage at ~4 MiB.
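The spawn-failure circuit breaker above can be sketched with a plain dict standing in for the task row. `record_spawn_result` is an illustrative name, not the dispatcher's real API:

```python
def record_spawn_result(task: dict, ok: bool, error=None, failure_limit=5):
    """Circuit-breaker sketch: N consecutive spawn failures auto-block
    the task with the last error as the reason; success resets the
    counter. (This commit's event name was spawn_auto_blocked; a later
    commit renames it to gave_up.)"""
    if ok:
        task["spawn_failures"] = 0
        return "spawned"
    task["spawn_failures"] = task.get("spawn_failures", 0) + 1
    task["last_spawn_error"] = error
    if task["spawn_failures"] >= failure_limit:
        task["status"] = "blocked"
        return "spawn_auto_blocked"
    return "spawn_failed"
```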
Idempotency / dedup
- create_task(idempotency_key=...) returns the existing non-archived
task id for retried webhooks. --idempotency-key on the CLI, json
body field on the dashboard plugin. Archived tasks don't block a
fresh create with the same key.
CLI surface
- Bulk verbs: complete, unblock, archive accept multiple ids;
block accepts --ids for sibling blocks with the same reason.
- New verbs: daemon, watch (live event tail filtered by
assignee/tenant/kinds), stats, log, notify-subscribe,
notify-list, notify-unsubscribe.
- dispatch gains --failure-limit + crashed/auto_blocked columns in
JSON output and human-readable output.
- gc accepts --event-retention-days / --log-retention-days; prunes
task_events for terminal tasks and old log files.
Gateway integration
- New GatewayRunner._kanban_notifier_watcher: polls
kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
chats for completed/blocked/spawn_auto_blocked/crashed events.
Cursor-advanced per-sub; auto-removed when the task reaches
done/archived. Runs alongside the session expiry and platform
reconnect watchers — SQLite work in asyncio.to_thread so the
event loop never blocks.
- /kanban create in the gateway auto-subscribes the originating
chat (platform + chat_id + thread_id). Users see
'(subscribed — you'll be notified when t_abcd completes or
blocks)' appended to the response.
Dashboard plugin
- GET /stats returns board_stats (by_status, by_assignee,
oldest_ready_age_seconds).
- GET /tasks/:id/log returns the worker log with optional ?tail=N
cap. 404 on unknown task, exists=false when the task has never
spawned.
- POST /tasks accepts idempotency_key; both Pydantic body and the
create_task kwarg now round-trip.
- /board attaches task.age (created/started/time_to_complete in
seconds) so the UI can colour stale cards without recomputing.
- Card CSS: amber border after N minutes, red border when clearly
stuck (tier per status: running 10m/60m, ready 1h/24h, todo
7d/30d, blocked 1h/24h).
- Drawer: new Worker log section, auto-loads on mount, last 100 KB
cap with on-disk path surfaced when truncated.
Kernel
- Schema additions: tasks.idempotency_key, tasks.spawn_failures,
tasks.worker_pid, tasks.last_spawn_error; new
kanban_notify_subs table. All gated by _migrate_add_optional_columns
so legacy DBs upgrade cleanly.
- release_stale_claims / complete_task / block_task now all clear
worker_pid so crash detection doesn't false-positive on reclaimed
tasks.
- read_worker_log fixed: tail-skip no longer eats one-giant-line
logs (common with child processes that don't flush newlines
before dying).
Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
- Idempotency: same key returns existing, archived doesn't block,
no key never collides
- Circuit breaker: auto-blocks after limit, success resets counter,
workspace-resolution failure counts against budget
- Aliveness: _pid_alive helper, detect_crashed_workers reclaims
exited child
- Daemon: runs and stops cleanly via stop_event, survives a tick
exception
- Stats + task_age helpers
- Notify subs: CRUD, cursor advances, distinct-thread is a separate row
- GC: events-only-for-terminal-tasks, old worker logs deleted
- Log: rotation keeps one generation, read_worker_log tail
- CLI: bulk complete/archive/unblock/block, create with
--idempotency-key, stats --json, notify-subscribe+list, log
missing task, gc reports counts
- run_slash parity: smoke-tests every registered verb (23
invocations); none may raise or return empty string
Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.
Docs (website/docs/user-guide/features/kanban.md)
- 'Core concepts' rewritten: new statuses (triage), idempotency key,
dispatcher-as-daemon-not-cron with circuit breaker behaviour
documented.
- Quick start swapped to daemon. New systemd section covers user
service install.
- New sections: idempotent create, bulk verbs, gateway
notifications, out-of-scope single-host note (kanban.db is local;
don't expect multi-host).
- CLI reference updated for every new verb, every new flag.
27fc6c1086
feat(kanban): bulk ops, drawer edit, dep editor, markdown, touch, config
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.
Plugin backend
- POST /tasks/bulk — apply the same patch (status / archive / assignee
/ priority) to every id in the request body. Each id runs
independently: one bad id reports {ok: false, error: ...} without
aborting siblings. Status transitions that aren't legal for the
current state are surfaced per-id ('transition to done refused').
Used by the multi-select bulk action bar.
- GET /config — returns the dashboard.kanban section of config.yaml
(default_tenant, lane_by_profile, include_archived_by_default,
render_markdown) with sensible defaults when the section is absent.
Loaded once by the SPA to preselect filters and toggle markdown
rendering.
- _conn() helper — every handler now goes through it, calling
kanban_db.init_db() (idempotent) before every connection. Fresh
installs work whether the first hit is GET /board, POST /tasks, or
any other endpoint — no more 'no such table: tasks' when the CLI
or a script hits the plugin before the dashboard has ever loaded.
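The per-id independence of POST /tasks/bulk can be sketched as below. `apply_bulk` and `patch_one` are hypothetical names for the handler's inner loop, not the real plugin code:

```python
def apply_bulk(ids, patch, patch_one):
    """Apply the same patch to every id independently: one bad id
    reports {ok: false, error: ...} without aborting its siblings."""
    results = []
    for tid in ids:
        try:
            patch_one(tid, patch)
            results.append({"id": tid, "ok": True})
        except Exception as e:
            results.append({"id": tid, "ok": False, "error": str(e)})
    return results
```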
Plugin UI (plugin bundle, +~12 KB)
- Multi-select: per-card checkbox; shift/ctrl-click also toggles
without opening the drawer. A BulkActionBar appears above the
columns with batch → ready / complete / archive / reassign
(profile dropdown + unassign option). Destructive batches confirm
first. Partial failures from the backend are surfaced inline.
- Drawer inline editing:
- Click the title → TitleEditor swaps in an input, Enter saves,
Escape cancels.
- Click the Assignee meta row → AssigneeEditor input (empty string
unassigns).
- Click the Priority meta row → PriorityEditor numeric input.
- New 'edit' button on Description → full-width textarea; Save /
Cancel switch back to rendered view.
- Dependency editor: chip list of parents + children with per-chip
× button (calls DELETE /links). Add-parent / add-child dropdowns
filter out self + already-linked tasks so you cannot re-add a
duplicate edge or a self-loop. Cycle rejections from the server
surface cleanly via the existing error banner.
- Parent selection in InlineCreate: new dropdown listing every task
on the board ('{id} — {title}') — picking one sends parents=[id]
with the create payload, so the task lands in todo (or triage if
created from the Triage column) with the dependency wired up.
- Safe markdown rendering for description, comment bodies, and
result. A small in-bundle renderer handles headings, bold, italic,
inline code, fenced code, bullet lists, and http(s)/mailto links.
Every substitution runs on HTML-escaped input (no raw HTML), links
get target=_blank + rel=noopener,noreferrer. Disabled by config
key dashboard.kanban.render_markdown=false (falls back to <pre>).
- Touch drag-drop: attachTouchDrag() installs a pointerdown handler
that spawns a drag proxy, tracks elementFromPoint under the finger,
and dispatches a hermes-kanban:drop CustomEvent on the column when
released. Desktop continues to use native HTML5 DnD. Columns
listen for both.
- ErrorBoundary already present from the prior commit catches any
renderer throw; markdown escape + touch-proxy cleanup both have
their own try/finally.
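The escape-first rule behind the safe markdown renderer (every substitution runs on HTML-escaped input, so no raw HTML survives) can be illustrated in Python, even though the real renderer is in the JS bundle. This sketch handles only inline code and bold; the real one also covers headings, italic, fenced code, lists, and links:

```python
import html
import re

def render_inline(text: str) -> str:
    """Escape the whole input first, then substitute on the escaped
    text: attacker-supplied HTML can never reach the output unescaped."""
    out = html.escape(text)
    out = re.sub(r"`([^`]+)`", r"<code>\1</code>", out)
    out = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", out)
    return out
```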
Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
- bulk_status_ready: 3 tasks blocked, batch → ready, all move
- bulk_archive hides all ids from default board
- bulk_reassign changes every assignee
- bulk_unassign_via_empty_string sets assignee back to None
- bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
good siblings still get priority=7
- bulk_empty_ids_400
- config_returns_defaults_when_section_missing
- config_reads_dashboard_kanban_section (writes config.yaml, verifies
every key round-trips)
Live smoke (real FastAPI app + isolated HERMES_HOME):
- /config without section returns defaults
- /config with dashboard.kanban section returns the configured values
- POST /tasks as the first-ever request (no prior /board) succeeds —
auto-init handles it
- Link add + remove via POST /links + DELETE /links round-trip
- Bulk priority bump on 2 ids, both get priority=5
- Bulk archive hides ids from default board
- PATCH {title, body} updates the task, markdown source survives
the round trip
- POST /tasks {triage: true, parents: [id]} lands in triage, not todo
- Bulk partial: 2 good + 1 bogus returns per-id outcome
Docs (website/docs/user-guide/features/kanban.md)
- 'What the plugin gives you' rewritten to reflect bulk, drawer
edit, dep editor, parent-on-create, markdown, touch drag-drop.
- New 'Dashboard config' subsection with a YAML example for
dashboard.kanban.*.
- REST table gains /tasks/bulk and /config rows.
|
||
|
|
45806629c5 |
feat(kanban): Triage column, progress rollup, WS auth, lanes, polish
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.
Kernel changes
- kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
CHECK constraint so no schema migration is needed.
- create_task(triage=True) forces the initial status to 'triage'
regardless of parents, and parent ids are still validated so the
eventual link rows don't dangle. recompute_ready() only promotes
'todo' -> 'ready', so triage tasks are naturally isolated from the
dispatcher pipeline.
- hermes kanban create gains --triage.
Patterns table (docs) gains P9 'Triage specifier'.
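The two triage rules above — triage=True wins over the usual parent-based initial status, and ready-promotion only ever touches 'todo' — can be sketched in a few lines of Python. Function names and the no-parents default are assumptions for illustration, not the real kanban_db API:

```python
VALID_STATUSES = {"triage", "todo", "ready", "running", "blocked", "done"}

def initial_status(has_parents: bool, triage: bool = False) -> str:
    if triage:
        return "triage"          # forced, regardless of parents
    return "todo" if has_parents else "ready"

def recompute_ready(tasks: dict[str, dict]) -> None:
    """Promote 'todo' tasks whose parents are all done; skip everything
    else — 'triage' never enters the dispatcher pipeline."""
    for task in tasks.values():
        if task["status"] != "todo":
            continue
        if all(tasks[p]["status"] == "done" for p in task["parents"]):
            task["status"] = "ready"
```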
Plugin backend (plugins/kanban/dashboard/plugin_api.py)
- GET /board now auto-init's kanban.db on first read (idempotent).
A fresh install shows an empty board instead of 'failed to load'.
- GET /board returns a new 'progress' field per task — {done, total}
of child-task completion, or None if the task has no children.
- BOARD_COLUMNS prepends 'triage'.
- POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
{status: 'triage'}.
- WebSocket /events now requires ?token=<session_token> as a query
param — browsers can't set Authorization on a WS upgrade, so this
matches the pattern the in-browser PTY bridge uses. Constant-time
compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
contexts (no dashboard module) the check no-ops so the tail loop
stays testable. Security boundary documented in the module header
and in website/docs/user-guide/features/kanban.md.
Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
- Adds the Triage column (lilac dot) with helper text
'Raw ideas — a specifier will flesh out the spec'. Inline-create
from the Triage column parks new tasks in triage.
- Status action row in the drawer gains '→ triage'.
- Progress pill (N/M) on cards that have children. Full-complete
state tints the pill green.
- 'Lanes by profile' toolbar toggle — sub-groups the Running column
by assignee so you see at a glance which specialist is busy on
what.
- Destructive status moves (done / archived / blocked) via drag-drop
OR via the drawer action row now prompt for confirmation.
- Escape closes the drawer.
- Live-update reloads are debounced (250ms) so a burst of
task_events triggers one refetch, not N.
- WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
- WebSocket reconnect uses exponential backoff capped at 30s, not
a fixed 1.5s spin loop, and surfaces a user-visible error on
code-1008 (auth rejected) instead of reconnecting forever.
- ErrorBoundary wraps the page — a bad card render shows a
'rendering error, reload view' card instead of crashing the tab.
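The reconnect schedule replacing the fixed 1.5s spin is plain capped exponential backoff. A Python sketch of the shape (the UI code is TypeScript; the 1.5s base is an assumption read off the old fixed delay — only the 30s cap is stated above):

```python
def reconnect_delay(attempt: int, base: float = 1.5, cap: float = 30.0) -> float:
    """Delay before reconnect attempt N (0-based): base * 2**N,
    capped so a long outage settles at one attempt per 30s."""
    return min(cap, base * (2 ** attempt))
```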
Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
- empty-board shape now asserts all 6 columns including 'triage'
- create_triage_lands_in_triage_column
- triage_task_not_promoted_to_ready (dispatcher bypasses triage)
- patch_status_triage_works (both into triage and out of it)
- board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
- board_auto_initializes_missing_db
- ws_events_rejects_when_token_required (three sub-assertions:
missing → 1008, wrong → 1008, correct → handshake accepted)
All 82 kanban tests pass under scripts/run_tests.sh.
Docs
- kanban.md 'What the plugin gives you' fully rewritten to match
shipped reality (triage, progress pill, assignee lanes,
destructive-confirm, Escape-close, debounce).
- New 'Security model' subsection documents the explicit-plugin-
route-bypass, the WS token requirement, and the --host 0.0.0.0
warning; also notes that kanban.db is profile-agnostic on purpose
(the coordination primitive) so cross-profile visibility is
expected.
- CLI command reference shows --triage.
- Collaboration patterns table adds P9 'Triage specifier'.
|
||
|
|
4093201c47 |
feat(kanban): dashboard plugin — Linear/Fusion-style board UI
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.
What the tab gives you:
- One column per kanban status (todo / ready / running / blocked /
  done; archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
  comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
  same kanban_db code the CLI /kanban verbs use, so the three surfaces
  (CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row (→ ready /
  → running / block / unblock / complete / archive), dependency links,
  comment thread with Enter-to-submit, last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
  nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
  CLI or gateway actions in real time.
REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.
Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is
active.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)
All 76 kanban tests pass under scripts/run_tests.sh.
Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST
surface, live-updates mechanism, extending, and scope boundary.
|
||
|
|
9f610aa8f3 |
docs(kanban): add GUI/Dashboard plugin section
The /kanban CLI + slash command are enough to run the board headlessly,
but triage and cross-profile supervision want a visual board. Document
the design as a dashboard plugin that:
- reads live state from kanban.db over a WebSocket on task_events
  (no polling)
- writes through run_slash() so CLI/gateway/GUI cannot drift
- mounts under /api/plugins/kanban/ following the existing 'Extending
  the Dashboard' plugin shape
The plugin is strictly a thin layer over kanban_db — no new business
logic, nothing to merge into the kernel.
|
||
|
|
e1c5e741ad |
feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills
for worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join,
anonymous subagent, no resumability, no human-in-the-loop. Kanban is
the durable shape needed for research triage, scheduled ops, digital
twins, engineering pipelines, and fleet work. They coexist (workers
may call delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed
  task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency
resolution, dispatcher, workspace kinds, tenancy, CLI + slash
surface). All pass hermetic via scripts/run_tests.sh.
|
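The atomic-claim trick above is compare-and-swap expressed as SQL: BEGIN IMMEDIATE takes the write lock up front, and the UPDATE's WHERE clause is the compare half, so two racing workers can never both win. A minimal Python sketch — the table and column names are illustrative, not the real kanban_db schema:

```python
import sqlite3

def claim_task(db_path: str, task_id: int, worker: str) -> bool:
    """Atomically claim a 'ready' task; return False if another
    worker got there first (rowcount 0 means the CAS failed)."""
    # autocommit mode so the explicit BEGIN IMMEDIATE owns the txn
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        cur = conn.execute(
            "UPDATE tasks SET status = 'running', assignee = ? "
            "WHERE id = ? AND status = 'ready'",
            (worker, task_id),
        )
        conn.execute("COMMIT")
        return cur.rowcount == 1
    finally:
        conn.close()
```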
||
|
|
06f81752ed |
Revert "feat(kanban): durable multi-profile collaboration board (#16081)" (#16098)
This reverts commit
|
||
|
|
15937a6b46 |
feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills
for worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join,
anonymous subagent, no resumability, no human-in-the-loop. Kanban is
the durable shape needed for research triage, scheduled ops, digital
twins, engineering pipelines, and fleet work. They coexist (workers
may call delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed
  task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency
resolution, dispatcher, workspace kinds, tenancy, CLI + slash
surface). All pass hermetic via scripts/run_tests.sh.
|
||
|
|
7fa70b6c87 |
refactor: /btw is now an alias for /background (#16053)
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools. Collapse the
two: /btw and /bg both alias to /background. One command, one behavior,
no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
|
||
|
|
9a70260490 |
Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit
|
||
|
|
ffd2621039 |
feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same latch
to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI — single
Enter while busy auto-queues, double Enter on an empty line interrupts.
The new busy-input hint teaches THAT gesture instead of telling the
user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() +
  tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
  hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
  _on_tool_complete for the 30s/tool_progress=all path. Same
  config.yaml latch so each hint fires at most once per install across
  CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse +
  onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event
  as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
  busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
  section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
  tests/tui_gateway/test_protocol.py \
  tests/gateway/test_busy_session_ack.py -> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
|
||
|
|
83c1c201f6 |
feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046)
Instead of a blocking first-run questionnaire, show a one-time hint the
first time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the
   busy-ack explaining the /busy queue vs /busy interrupt knob,
   phrased to match the mode that was just applied (don't tell a
   queue-mode user to switch to queue).
2. First tool that runs for >= 30s in the noisiest progress mode
   (tool_progress: all) — prints a hint about /verbose to cycle
   display modes (all -> new -> off -> verbose). Gated on /verbose
   actually being usable on the surface: always shown on CLI; on
   gateway only shown when display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so
it fires exactly once per install across CLI, gateway, and cron, then
never again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
  both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
  load_cli_config defaults (cli.py). No _config_version bump — deep
  merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
  after building the ack. progress_callback tracks tool.completed
  duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
  Enter; _on_tool_progress appends the tool-progress hint on the
  first >=30s tool completion. In-memory CLI_CONFIG is also updated so
  subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
|
||
|
|
855366909f |
feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a
release.
Live URL:
https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
  model_catalog.url                    master manifest URL
  model_catalog.ttl_hours              disk cache TTL (default 24h)
  model_catalog.providers.<name>.url   optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest
  (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() + new
  get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
  success/failure, accessors, disabled, overrides, integration)
|
||
|
|
59b56d445c |
feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
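The wire point is small: monotonic clock around the dispatch, truncate to int milliseconds, pass as a kwarg to each post-dispatch hook. A sketch with hypothetical names (the real call sites are in model_tools.py around registry.dispatch()):

```python
import time

def dispatch_with_duration(dispatch, hooks, tool_name: str, **kwargs):
    """Measure the dispatch with a monotonic clock and hand the
    integer millisecond latency to the post-dispatch hooks."""
    start = time.monotonic()
    result = dispatch(tool_name, **kwargs)
    duration_ms = int((time.monotonic() - start) * 1000)  # never negative
    for hook in hooks:
        hook(tool_name=tool_name, result=result, duration_ms=duration_ms)
    return result
```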
|
||
|
|
a55de5bcd0 |
feat(setup): auto-reconfigure on existing installs (#15879)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section:
  `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since
  it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them. The 'Quick
Setup - configure missing items only' menu option is now exposed as
the explicit `--quick` flag; it's the narrow case of filling in
missing config (e.g. after a partial OpenClaw migration or when a
required API key got cleared). Inspired by Mercury Agent's
`mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
  (guarding behavior that no longer exists — covered by
  test_setup_reconfigure.py instead)
|
||
|
|
7c50ed707c |
docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
  and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
  routing, /v1 stripping, api-version query forwarding, and the
  provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
  AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
  HangGlidersRule (noreply), and pein892 so the
  contributor-attribution CI check does not reject the salvage.
|
||
|
|
81e01f6ee9 | fix(agent): preserve Codex message items for replay | ||
|
|
dc4d92f131 |
docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
  responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
  Models with a responsive YouTube iframe (NoF-YajElIM).
Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
|
||
|
|
ea01bdcebe |
refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are obsolete
now that the background memory/skill review handles persistent memory
extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live
  agent before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix (system
  prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching. The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant. Background review runs in the live conversation's session
  context, gets the same content, writes to the same memory store, and
  doesn't break the cache. Everything flush_memories claimed to
  preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes
  tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along; the
  compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The
  session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name reflects
  what it now actually gates. from_dict() reads 'expiry_finalized'
  first, falls back to the legacy 'memory_flushed' key so existing
  sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and
gateway/ session-boundary suites. No behavior regressions — background
memory review continues to handle persistent memory extraction on both
CLI and gateway.
|
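The read-time backcompat on the renamed flag is a one-function pattern: prefer the new key, fall back to the legacy one. A sketch (the helper name is hypothetical; in the codebase this lives inside SessionEntry.from_dict):

```python
def expiry_finalized_from_dict(data: dict) -> bool:
    """Read the renamed flag, falling back to the legacy key so
    pre-rename sessions.json files upgrade seamlessly."""
    if "expiry_finalized" in data:
        return bool(data["expiry_finalized"])
    return bool(data.get("memory_flushed", False))
```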
||
|
|
cf2fabc40f |
docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.
Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped
    slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into built-in
    pages without overriding them)"
Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
|
||
|
|
af22421e87 |
feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns
  notification spam

  Background processes that stack notify_on_complete=True with
  watch_patterns can flood the user with duplicate, delayed
  notifications — matches deliver asynchronously via the completion
  queue and continue arriving minutes after the process has exited.
  The docstring warning against this (PR #12113) has proven
  insufficient; agents still misuse the combination.
  Three layered defenses, each sufficient on its own:
  1. Mutual exclusion (terminal_tool.py): When both flags are set on a
     background process, drop watch_patterns with a warning.
     notify_on_complete wins because 'let me know when it's done' is
     the more useful signal and fires exactly once. Extracted as
     _resolve_notification_flag_conflict() so the rule is testable in
     isolation.
  2. Suppress-after-exit (process_registry.py): _check_watch_patterns()
     now bails the moment session.exited is True. Post-exit chunks
     (buffered reads draining after the process is gone) no longer
     produce notifications. This is the fix flagged as future work in
     session 20260418_020302_79881c.
  3. Global circuit breaker (process_registry.py): Per-session rate
     limits don't catch the sibling-flood case — N concurrent
     processes can each stay under 8/10s and still collectively spam.
     New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown
     across ALL sessions, emits a single watch_overflow_tripped event,
     silently counts dropped events, and emits a
     watch_overflow_released summary when the cooldown ends.
  Also updates the tool schema + docstring to document the new
  behavior.
  Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
  mutual-exclusion resolver x4, global breaker trip/cooldown/release
  x2). All 60 tests across test_watch_patterns.py,
  test_notify_on_complete.py, test_terminal_tool.py pass.
  Real-world trigger: self-inflicted in session 20260425_051924 —
  three concurrent hermes-sweeper review subprocesses each set
  watch_patterns=['failed validation', 'errored'] AND
  notify_on_complete=True, then iterated over multiple items,
  producing enough matches per process to defeat the per-session cap
  while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit +
  strike-3 promotion

  Per Teknium's direction, the watch_patterns rate limit is now much
  more aggressive and self-healing.
  ## New rule — per session
  - HARD cap: 1 watch-match notification per 15 seconds per process.
  - Any match arriving inside the cooldown window is dropped and
    counts as ONE strike for that window (many drops in the same
    window still = 1 strike).
  - After 3 consecutive strike windows, watch_patterns is permanently
    disabled for the session and the session is auto-promoted to
    notify_on_complete semantics — exactly one notification when the
    process actually exits.
  - A cooldown window that expires with zero drops resets the
    consecutive strike counter — healthy cadence is forgiven.
  ## Schema + docstring rewritten
  The tool schema description now gives the model explicit guidance:
  - notify_on_complete is 'the right choice for almost every
    long-running task'
  - watch_patterns is for RARE one-shot signals on LONG-LIVED
    processes
  - Do NOT use watch_patterns with loops/batch jobs — error patterns
    fire every iteration and will hit the strike limit fast
  - Mutual exclusion is stated on both parameter descriptions
  - 1/15s cooldown and 3-strike promotion are stated in the
    watch_patterns description so the model sees the contract every
    turn
  ## Removed
  - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45)
    — the new 1/15s limit subsumes both; keeping them would
    double-count.
  - _watch_window_hits / _watch_window_start / _watch_overload_since
    fields on ProcessSession. Replaced by _watch_last_emit_at /
    _watch_cooldown_until / _watch_strike_candidate /
    _watch_consecutive_strikes.
  ## Kept
  - Global circuit breaker across all sessions (15/10s → 30s cooldown)
    as a secondary safety net for concurrent siblings. Still valuable
    when 20 short-lived processes each fire once — none individually
    violates the per-session limit.
  - Suppress-after-exit guard.
  - Mutual exclusion resolver at the tool entry point.
  ## Tests
  - 6 new tests in TestPerSessionRateLimit covering: first match
    delivers, second in cooldown suppressed, multi-drop = single
    strike, 3 strikes disables + promotes, clean window resets
    counter, suppressed count carried to next emit.
  - Global circuit breaker tests rewritten to use fresh sessions
    instead of hacking removed per-window fields.
  - 50/50 watch_patterns + notify_on_complete tests pass.
  - 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

  Dashboard plugins can now inject components into specific built-in
  pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
  Chat) without overriding the whole route.
  Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
  None of those let a plugin add a banner, card, or widget to an
  existing page. The new <page>:top / <page>:bottom slots close that
  gap, reusing the existing registerSlot() API.
  Changes
  - web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
    (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
    grouped under "Shell-wide" vs "Page-scoped" in the docblock
  - web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" /> as the first child of its outer
    wrapper and <PluginSlot name="<page>:bottom" /> as the last child
    — zero visual cost when no plugin registers
  - plugins/example-dashboard: registers a demo banner into
    sessions:top via registerSlot(), with matching slots entry in the
    manifest — so freshly-setup users can see what page-scoped slots
    look like without writing any plugin code
  - website/docs: new "Page-scoped slots" table in the plugin
    authoring guide, with a worked example
  - tests/hermes_cli/test_web_server.py: round-trip test for
    colon-bearing slot names (sessions:top, analytics:bottom, ...)
  Validation
  - npm run build: clean (tsc -b + vite build, 2761 modules)
  - scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions:
    5/5 pass
|
||
|
|
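The page-scoped naming scheme above is small enough to sketch. A minimal Python illustration, assuming only the `<page>:top` / `<page>:bottom` convention and the nine page names listed in the commit (the real registry is the KNOWN_SLOT_NAMES list in web/src/plugins/slots.ts, which this does not reproduce):

```python
# Hypothetical sketch of the page-scoped slot naming convention.
# Page names and positions are taken from the commit message above;
# the authoritative list lives in web/src/plugins/slots.ts.

PAGES = ["sessions", "analytics", "logs", "cron", "skills",
         "config", "env", "docs", "chat"]
POSITIONS = ["top", "bottom"]

# 9 pages x 2 positions = the 18 new slot names the commit adds.
PAGE_SCOPED_SLOTS = {f"{page}:{pos}" for page in PAGES for pos in POSITIONS}

def is_page_scoped_slot(name: str) -> bool:
    """True iff `name` follows the <page>:top / <page>:bottom scheme."""
    return name in PAGE_SCOPED_SLOTS
```

The colon in these names is what the round-trip test in test_web_server.py exercises: slot names must survive manifest serialization unmangled.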
e5647d7863 |
docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530)
The web-dashboard.md and dashboard-plugins.md pages had overlapping, partial coverage of the theme and plugin systems. Themes were split across two pages; the plugin docs had a minimal manifest reference but no step-by-step guide, no slot catalog, and no theme+plugin demo.

New: user-guide/features/extending-the-dashboard.md — single navigable reference for all three extension layers (themes, UI plugins, backend plugins). Includes:
- Theme quick-start + full schema (palette, typography, layout, layout variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override, tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting

web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS, development). Theme/plugin content now points to the new page with a built-in themes summary table.

dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).

sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under the Management group.

No user-facing behavior change; docs-only. |
||
|
|
0ed37c0ca4 | docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning | ||
|
|
1dcf79a864 | feat: add slash command for busy input mode | ||
|
|
0bcbc9e316 |
docs(faq): Update docs on backups
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users in understanding the nuances between both |
||
|
|
c61547c067 |
Merge pull request #14890 from NousResearch/bb/tui-web-chat-unified
feat(web): dashboard Chat tab — xterm.js + JSON-RPC sidecar (supersedes #12710 + #13379) |
||
|
|
850fac14e3 | chore: address copilot comments | ||
|
|
5500b51800 | chore: fix lint | ||
|
|
10deb1b87d |
fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid), and may flip between the two for a single human within the same conversation. Before this change, build_session_key used the raw identifier verbatim, so the bridge reshuffling an alias form produced two distinct session keys for the same person — in two places:

1. DM chat_id — a user's DM sessions split in half, transcripts and per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a member's per-user session inside a group splits in half for the same reason.

Add a canonicalizer that walks the bridge's lid-mapping-*.json files and picks the shortest/numeric-preferred alias as the stable identity. build_session_key now routes both the DM chat_id and the group participant_id through this helper when the platform is WhatsApp. All other platforms and chat types are untouched.

Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier as public helpers. Plugins that need per-sender behaviour (role-based routing, per-contact authorization, policy gating) need the same identity resolution Hermes uses internally; without a public helper, each plugin would have to re-implement the walker against the bridge's internal on-disk format. Keeping this alongside build_session_key makes it authoritative and one refactor away if the bridge ever changes shape.

_expand_whatsapp_aliases stays private — it's an implementation detail of how the mapping files are walked, not a contract callers should depend on. |
||
|
|
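The alias-selection rule above ("shortest/numeric-preferred") can be sketched independently of the bridge's on-disk format. A hedged Python sketch, assuming the lid-mapping files have already been walked into a set of alias strings; `pick_canonical_alias` is a hypothetical name, not the real helper:

```python
def pick_canonical_alias(aliases: set[str]) -> str:
    """Pick a stable identity from a set of WhatsApp alias forms.

    Hypothetical sketch of the shortest/numeric-preferred rule described
    in the commit; the real canonicalizer also walks lid-mapping-*.json.
    """
    def sort_key(alias: str):
        user_part = alias.split("@", 1)[0]
        # Prefer purely numeric (phone-style) user parts, then shorter
        # user parts, then lexicographic order for determinism.
        return (not user_part.isdigit(), len(user_part), alias)

    return min(aliases, key=sort_key)
```

Whatever the exact tie-breaks in the real code, the essential property is that the same alias set always maps to the same identity, so build_session_key is deterministic across JID/LID flips.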
f49afd3122 |
feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit
|
||
|
|
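The custom resize escape described above is a plain in-band frame, so the writer-loop interception reduces to a small parser. A minimal sketch, with the framing taken from the commit text (`\x1b[RESIZE:cols;rows]`); the function name is illustrative, not the real one:

```python
import re

# In-band resize frame from the commit: ESC [ RESIZE : cols ; rows ]
_RESIZE_RE = re.compile(r"^\x1b\[RESIZE:(\d+);(\d+)\]$")

def parse_resize_frame(frame: str):
    """Return (cols, rows) for a resize frame, or None for ordinary input.

    Sketch only: the real writer loop intercepts this escape before
    forwarding bytes to the PTY and applies the size via TIOCSWINSZ.
    """
    m = _RESIZE_RE.match(frame)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Keeping resize in-band avoids a second control channel: the browser's onResize handler sends the same WebSocket the keystrokes use.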
1840c6a57d |
feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard.

Previously a user had to (1) toggle the Spotify toolset on in 'hermes tools' AND (2) separately run 'hermes auth spotify' to actually use it. The second step was a discovery gap — the docs mentioned it but nothing in the TUI pointed users there.

Now toggling Spotify on calls login_spotify_command as a post_setup hook. If the user has no client_id yet, the interactive wizard walks them through Spotify app creation; if they do, it skips straight to PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on AND authenticated. SystemExit from the wizard (user abort) leaves the toolset enabled and prints a 'run: hermes auth spotify' hint — it does NOT fail the toolset toggle.

Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users to type env var names before the wizard fires was UX-backwards — the point of the wizard is that they don't HAVE a client_id yet.

B — Docs page now covers cron + Spotify.

New 'Scheduling: Spotify + cron' section with two working examples (morning playlist, wind-down pause) using the real 'hermes cron add' CLI surface (verified via 'cron add --help'). Covers the active-device gotcha, Premium gating, memory isolation, and links to the cron docs. Also fixed a stale '9 Spotify tools' reference in the setup copy — we consolidated to 7 tools in #15154.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py tests/hermes_cli/test_spotify_auth.py tests/tools/test_spotify_client.py → 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build → SUCCESS, no new warnings |
||
|
|
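The abort semantics above (wizard cancellation must not undo the toggle) boil down to a small control-flow pattern. A hedged sketch with hypothetical stand-ins: `enable_toolset` and `run_spotify_wizard` stand in for the real tools-config plumbing and login_spotify_command:

```python
def activate_spotify(enable_toolset, run_spotify_wizard, echo=print):
    """Toggle the toolset on, then run the wizard as a post_setup hook.

    Hypothetical sketch: a SystemExit from the wizard (user abort) must
    leave the toolset enabled and print a hint, not fail the toggle.
    """
    enable_toolset("spotify")          # toggle first; never rolled back
    try:
        run_spotify_wizard()           # app-creation walkthrough + PKCE
    except SystemExit:
        echo("Spotify enabled; finish auth later with: hermes auth spotify")
```

The ordering matters: enabling before the wizard runs is what makes the abort path safe without any rollback logic.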
e5d41f05d4 |
feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154)
Three quality improvements on top of #15121 / #15130 / #15135:

1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with kind='tracks'|'albums'. Handler code was ~90 percent identical across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate of spotify_playback.get_currently_playing (both return identical 204/empty payloads). Its 'recently_played' action moves onto spotify_playback as a new action — history belongs adjacent to live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify toolset is enabled, and the action surface is more discoverable (everything playback-related is on one tool).

2. Spotify skill (skills/media/spotify/SKILL.md)

Teaches the agent canonical usage patterns so common requests don't balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan + describe + play)
- 'what's playing' = single get_currently_playing (no preflight get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' — both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429

3. Surfaced in 'hermes setup' tool status

Adds 'Spotify (PKCE OAuth)' to the tool status list when auth.json has a Spotify access/refresh token. Matches the homeassistant pattern but reads from auth.json (OAuth-based) rather than env vars.

Docs updated to reflect the new 7-tool surface, and mention the companion skill in the 'Using it' section.

Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after renaming/replacing the spotify_activity tests with library + recently_played coverage). Docusaurus build clean. |
||
|
|
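The merge in point 1 above hinges on the two saved-item handlers being near-identical, so the consolidated tool is just kind-keyed dispatch. A hedged sketch; `fetch_tracks` / `fetch_albums` are hypothetical stand-ins for the real Spotify Web API calls:

```python
def spotify_library(kind, fetch_tracks, fetch_albums):
    """Merged saved-items handler sketch: one tool, kind-keyed dispatch.

    Hypothetical: only the kind='tracks'|'albums' dispatch shape is from
    the commit; the fetchers stand in for the real API-backed handlers.
    """
    handlers = {"tracks": fetch_tracks, "albums": fetch_albums}
    if kind not in handlers:
        raise ValueError(f"kind must be 'tracks' or 'albums', got {kind!r}")
    return handlers[kind]()
```

Because the two old handlers were ~90 percent identical, collapsing them to a dispatch table removes a tool schema from every API call without changing behaviour.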
9be17bb84f |
docs(spotify): expand feature page with tool reference, Free/Premium matrix, troubleshooting (#15135)
The initial Spotify docs page shipped in #15130 was a setup guide. This expands it into a full feature reference:
- Per-tool parameter table for all 9 tools, extracted from the real schemas in tools/spotify_tool.py (actions, required/optional args, premium gating).
- Free vs Premium feature matrix — which actions work on which tier, so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1 cause of '403 no active device' reports for every Spotify integration.
- SSH / headless section explaining that browser auto-open is skipped when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204 now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.

Verified with 'node scripts/prebuild.mjs && npx docusaurus build' — page compiles, no new warnings. |
||
|
|
05394f2f28 |
feat(spotify): interactive setup wizard + docs page (#15130)
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID is required' if the user hadn't manually created a Spotify developer app and set env vars. Now the command detects a missing client_id and walks the user through the one-time app registration inline:
- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form (including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent runs skip the wizard
- Continues straight into the PKCE OAuth flow

Also prints the docs URL at both the start of the wizard and the end of a successful login so users can find the full guide.

Adds website/docs/user-guide/features/spotify.md with the complete setup walkthrough, tool reference, and troubleshooting, and wires it into the sidebar under User Guide > Features > Advanced.

Fixes a stale redirect URI default in the hermes_cli/tools_config.py TOOL_CATEGORIES entry (was 8888/callback from the PR description instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value 43827/spotify/callback defined in auth.py). |
||
|
|
2cab8129d1 |
feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that:
1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back

The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions).

Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably)

Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios.

Docs: Updated providers.md with Copilot auth behavior section. |
||
|
|
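The one-shot flag guard is the load-bearing piece of the design above. A hedged sketch with hypothetical hooks (`send`, `resolve_token`, `rebuild_client` stand in for the real provider internals):

```python
def request_with_auth_recovery(send, resolve_token, rebuild_client):
    """One-shot 401 recovery sketch: refresh token, rebuild client, retry.

    Hypothetical names; the real code re-resolves via the
    COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> `gh auth token`
    chain and rebuilds the client even when the token string is unchanged.
    """
    retried = False                      # one-shot flag: no infinite 401 loop
    while True:
        status, body = send()
        if status != 401 or retried:
            return status, body          # success, or second 401 -> give up
        retried = True
        rebuild_client(resolve_token())  # rebuild even in same-token case
```

Rebuilding unconditionally (rather than comparing tokens) is what recovers the stale-client case the commit describes: the token can be identical while the httpx client state is bad.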
852c7f3be3 |
feat(cron): per-job workdir for project-aware cron runs (#15110)
Cron jobs can now specify a per-job working directory. When set, the job runs as if launched from that directory: AGENTS.md / CLAUDE.md / .cursorrules from that dir are injected into the system prompt, and the terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD). When unset, old behaviour is preserved (no project context files, tools use the scheduler's cwd). Requested by @bluthcy.

## Mechanism
- cron/jobs.py: create_job / update_job accept 'workdir'; validated to be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD at it and flip skip_context_files to False before building the agent. Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py: expose 'workdir' via the cronjob tool and 'hermes cron create/edit --workdir ...'. Empty string on edit clears the field.

## Validation
- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update, JSON round-trip via cronjob tool, tick partition (workdir jobs run on the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py, test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path rejected; list shows 'Workdir:'; edit --workdir '' clears. |
||
|
|
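The env toggle + restore-in-finally contract from the mechanism above is the part the tests pin down. A minimal sketch, assuming a hypothetical `run_agent` callable stands in for building and running the agent:

```python
import os

def run_job_in_workdir(workdir, run_agent):
    """Point TERMINAL_CWD at the job's workdir; restore on every exit path.

    Sketch of the scheduler behaviour described in the commit; run_agent
    is a hypothetical stand-in. TERMINAL_CWD being process-global is why
    workdir jobs must run sequentially, outside the thread pool.
    """
    if workdir is None:
        return run_agent(skip_context_files=True)   # old behaviour
    prev = os.environ.get("TERMINAL_CWD")
    os.environ["TERMINAL_CWD"] = workdir
    try:
        return run_agent(skip_context_files=False)  # inject AGENTS.md etc.
    finally:
        if prev is None:
            os.environ.pop("TERMINAL_CWD", None)
        else:
            os.environ["TERMINAL_CWD"] = prev
```

The finally block is what "restored on every exit path" means concretely: an agent crash leaves the scheduler's env exactly as it found it.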
7626f3702e |
feat: read prompt caching cache_ttl from config
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback

Made-with: Cursor |
||
|
|
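The TTL plumbing above is a small validated config lookup. A hedged sketch, assuming only the two values the commit names (5m default, 1h opt-in) and a plain dict-shaped config; `resolve_cache_ttl` is an illustrative name, not the real function:

```python
VALID_TTLS = {"5m", "1h"}   # 5m default, 1h opt-in, per the commit

def resolve_cache_ttl(config: dict) -> str:
    """Read prompt_caching.cache_ttl, falling back to '5m' when invalid.

    Sketch only; the real lookup happens inside AIAgent at construction,
    and this mirrors the three tested cases: default, 1h, invalid-fallback.
    """
    ttl = (config.get("prompt_caching") or {}).get("cache_ttl", "5m")
    return ttl if ttl in VALID_TTLS else "5m"
```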
b2e124d082 |
refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047)
* refactor(commands): drop /provider and clean up slash registry
* refactor(commands): drop /plan special handler — use plain skill dispatch |
||
|
|
2ba9b29f37 |
docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to
|
||
|
|
1ef1e4c669 |
feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
|
||
|
|
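The dispatch contract above (first non-None action dict wins, plugin exceptions are caught and logged, default is normal dispatch) can be sketched as:

```python
import logging

log = logging.getLogger("gateway")

def run_pre_gateway_dispatch(callbacks, event_text):
    """Invoke pre_gateway_dispatch callbacks per the commit's contract.

    Sketch: the first non-None action dict wins and remaining callbacks
    are ignored; a plugin exception is logged and skipped, never fatal;
    with no takers the result is normal dispatch ({"action": "allow"}).
    """
    for cb in callbacks:
        try:
            result = cb(event_text)
        except Exception:
            log.exception("pre_gateway_dispatch plugin failed")
            continue
        if result is not None:
            return result            # first non-None action wins
    return {"action": "allow"}
```

The callback signature here is simplified (the real hook sees the full MessageEvent, not just text), but the win/skip/allow resolution order is the documented behaviour.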
67bfd4b828 |
feat(tui): stream thinking + tools expanded by default
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as a live transcript (reasoning + tool calls streaming inline) instead of a wall of `▸` chevrons the user has to click every turn.

Final default matrix:
- thinking: expanded
- tools: expanded
- activity: hidden (unchanged from the previous commit)
- subagents: falls through to details_mode (collapsed by default)

Everything explicit in `display.sections` still wins, so anyone who already pinned an override keeps their layout. One-line revert is `display.sections.<name>: collapsed`. |
||
|
|
728767e910 |
feat(tui): hide the activity panel by default
The activity panel (gateway hints, terminal-parity nudges, background notifications) is noise for the typical day-to-day user, who only cares about thinking + tools + streamed content. Make `hidden` the built-in default for that section so users land on the quiet mode out of the box. Tool failures still render inline on the failing tool row, so this default suppresses the noise feed without losing the signal.

Opt back in with `display.sections.activity: collapsed` (chevron) or `expanded` (always open) in `~/.hermes/config.yaml`, or live with `/details activity collapsed`.

Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the fallback in `sectionMode()` between the explicit override and the global details_mode. Existing `display.sections.activity` overrides take precedence — no migration needed for users who already set it. |
||
|
|
78481ac124 |
feat(tui): per-section visibility for the details accordion
Adds optional per-section overrides on top of the existing global
details_mode (hidden | collapsed | expanded). Lets users keep the
accordion collapsed by default while auto-expanding tools, or hide the
activity panel entirely without touching thinking/tools/subagents.
Config (~/.hermes/config.yaml):
display:
details_mode: collapsed
sections:
thinking: expanded
tools: expanded
activity: hidden
Slash command:
/details show current global + overrides
/details [hidden|collapsed|expanded] set global mode (existing)
/details <section> <mode|reset> per-section override (new)
/details <section> reset clear override
Sections: thinking, tools, subagents, activity.
Implementation:
- ui-tui/src/types.ts SectionName + SectionVisibility
- ui-tui/src/domain/details.ts parseSectionMode / resolveSections /
sectionMode + SECTION_NAMES
- ui-tui/src/app/uiStore.ts +
app/interfaces.ts +
app/useConfigSync.ts sections threaded into UiState
- ui-tui/src/components/
thinking.tsx ToolTrail consults per-section mode for
hidden/expanded behaviour; expandAll
skips hidden sections; floating-alert
fallback respects activity:hidden
- ui-tui/src/components/
messageLine.tsx + appLayout.tsx pass sections through render tree
- ui-tui/src/app/slash/
commands/core.ts /details <section> <mode|reset> syntax
- tui_gateway/server.py config.set details_mode.<section>
writes to display.sections.<section>
(empty value clears the override)
- website/docs/user-guide/tui.md documented
Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway).
Total: 269/269 vitest, all gateway tests pass.
|
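The resolution order the commits above describe (explicit per-section override, then built-in section default, then global details_mode) is easy to state as a sketch. This mirrors the TypeScript `sectionMode()` in ui-tui/src/domain/details.ts in Python for illustration only; the defaults dict is a parameter, since the built-in values changed across commits:

```python
VALID_MODES = {"hidden", "collapsed", "expanded"}

def section_mode(section, sections_override, section_defaults, details_mode):
    """Resolve a section's visibility: override > section default > global.

    Illustrative Python port of the TUI's sectionMode() resolution; the
    real implementation is TypeScript in ui-tui/src/domain/details.ts.
    """
    explicit = sections_override.get(section)    # display.sections.<name>
    if explicit in VALID_MODES:
        return explicit
    default = section_defaults.get(section)      # SECTION_DEFAULTS fallback
    if default in VALID_MODES:
        return default
    return details_mode                          # global display.details_mode
```

This ordering is why existing `display.sections.*` overrides survive later default changes: an explicit value short-circuits before the defaults table is consulted.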