- Runs section: dashboard PATCH parity (summary/metadata forward),
`completed` event embeds first-line summary for notifiers, bulk
--summary/--metadata refused, archive/drag-drop reclaim semantics.
- Event reference: added Payload column to Lifecycle and Edits
tables; called out the invariant that `status` carries run_id
when closing a reclaimed run.
Integration audit of the runs-as-first-class work (0146cb2bd) found five
bugs where structured runs got orphaned or dashboard parity was missing.
All behavioral fixes; no schema change needed.
Kernel
- archive_task: when called on a running task, now closes the
in-flight run with outcome='reclaimed' and clears current_run_id.
Previously, dashboard bulk-archive or CLI `kanban archive <running>`
would leave the task_runs row open with ended_at=NULL forever and
strand the pointer. Adds the claim_lock / claim_expires / worker_pid
clearing to the UPDATE so the task row is clean too.
- complete_task: embeds the first-line handoff summary in the
`completed` event payload (capped at 400 chars). Notifier can now
render `✔ task done — <title>\n<summary>` without a second SQL hit,
and the full summary still lives on the run row.
Dashboard plugin
- _set_status_direct: drag-drop OFF 'running' (to 'ready', 'todo',
'triage', 'done' — anywhere except back to 'running') now closes
the active run with outcome='reclaimed'. Clears worker_pid too.
Snapshots previous status + current_run_id before the UPDATE so
the decision has the right before-state. status event rows now
carry run_id when closing a run, NULL otherwise.
- UpdateTaskBody: adds `summary` and `metadata` fields. PATCH
/tasks/:id with status='done' now forwards them to complete_task,
giving the dashboard parity with `hermes kanban complete --summary
... --metadata ...`. Previously these fields only existed on the
CLI.
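Both the archive path and the drag-drop path share the same close-and-clear step; an illustrative sketch (table and column names assumed from the descriptions above, not the real kernel code):

```python
import sqlite3
import time

def close_run_as_reclaimed(conn: sqlite3.Connection, task_id: str) -> None:
    """Sketch: end the in-flight run with outcome='reclaimed' and scrub
    the claim state off the task row so crash detection stays clean."""
    now = time.time()
    row = conn.execute(
        "SELECT current_run_id FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    run_id = row[0] if row else None
    if run_id is not None:
        conn.execute(
            "UPDATE task_runs SET outcome = 'reclaimed', ended_at = ? "
            "WHERE id = ? AND ended_at IS NULL",
            (now, run_id),
        )
    conn.execute(
        "UPDATE tasks SET current_run_id = NULL, claim_lock = NULL, "
        "claim_expires = NULL, worker_pid = NULL WHERE id = ?",
        (task_id,),
    )
    conn.commit()
```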
CLI
- `hermes kanban complete a b c --summary X` or `--metadata Y`:
refused with a clear stderr message instead of silently applying
the same handoff to every task. Bulk-close without handoff flags
still works. (Note: hermes_cli.main discards subcommand exit
codes via `args.func(args)` without propagating them; tracked
separately. The side-effect check is the real guard.)
Gateway notifier
- Completion message prefers run.summary (carried in event payload)
over task.result. task.result remains the fallback for legacy rows
written before runs shipped.
- Docstring: renamed stale `spawn_auto_blocked` reference to
`gave_up` / `timed_out` — matches the actual TERMINAL_KINDS
tuple, which was already correct in code.
Tests (+8 in core functionality, +3 in dashboard plugin)
- archive_of_running_task_closes_run
- archive_of_ready_task_does_not_create_spurious_run
- dashboard_direct_status_change_off_running_closes_run
- dashboard_direct_status_change_within_same_state_is_noop_for_runs
- cli_bulk_complete_with_summary_rejects (side-effect assertion)
- cli_bulk_complete_without_summary_still_works
- completed_event_payload_carries_summary
- completed_event_payload_summary_none_when_missing
- patch_status_done_with_summary_and_metadata
- patch_status_done_without_summary_still_works (legacy path)
- patch_status_archive_closes_running_run (E2E through FastAPI TestClient)
164/164 kanban suite pass under scripts/run_tests.sh. Live smoke
(execute_code with isolated HERMES_HOME) covered all five fixed paths
plus a re-claim-after-drag-drop to confirm the fresh run is tracked
correctly after the orphan close.
Addresses vulcan-artivus's RFC review on issue #16102. Picks up the
structural changes that are expensive to retrofit later and zero-cost
to land now; defers workflow-template routing + per-stage lanes to v2
(kept forward-compat hooks in the schema).
Kernel
- New `task_runs` table. Each claim opens a run (pid, claim_lock,
heartbeat, max_runtime, started_at), each terminal transition
closes it with an outcome (completed / blocked / crashed /
timed_out / spawn_failed / gave_up / reclaimed). Multiple rows per
task when retries happen, preserving full attempt history.
- `tasks.current_run_id` points at the active run (NULL when idle);
denormalised for cheap reads.
- `task_events.run_id` carries the run a given event belongs to so
UIs group events by attempt. claim/spawned/complete/block/crash/
timeout/spawn_fail/gave_up/heartbeat events are all run-scoped;
created/promoted/assigned/edited stay task-scoped (run_id=NULL).
- Legacy DBs: migration adds the columns + indexes + synthesizes a
run row for any task that's 'running' before the runs table
existed, so subsequent complete/heartbeat/reclaim calls have a
target. Idempotent.
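A minimal sketch of the run-table shape implied above (illustrative DDL; the real schema may name or type columns differently):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS task_runs (
    id          INTEGER PRIMARY KEY,
    task_id     TEXT NOT NULL,
    worker_pid  INTEGER,
    claim_lock  TEXT,
    started_at  REAL NOT NULL,
    ended_at    REAL,   -- NULL while the run is in flight
    outcome     TEXT,   -- completed/blocked/crashed/timed_out/spawn_failed/gave_up/reclaimed
    summary     TEXT,
    error       TEXT,
    metadata    TEXT    -- JSON blob
);
CREATE INDEX IF NOT EXISTS idx_task_runs_task ON task_runs(task_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# tasks.current_run_id and task_events.run_id are plain nullable columns
# added alongside; both stay NULL when idle / when an event is task-scoped.
```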
Structured handoff
- `complete_task(summary=, metadata=)` persists both on the closing
run. `summary` falls back to `result` when omitted so single-run
callers don't duplicate. `metadata` is a free-form dict
({changed_files, tests_run, findings, ...}).
- `build_worker_context` rewrites: now reads "Prior attempts on this
task" (closed runs: outcome, summary, error, metadata) and
"Parent task results" pulls run.summary + run.metadata of the
most-recent completed run per parent, falling back to task.result
for legacy rows without runs. Retrying workers see why earlier
attempts failed; downstream workers see parent handoffs
structurally, not as loose `result` strings.
CLI
- `hermes kanban complete <id> --summary "..." --metadata '{"files":1}'`.
  The metadata JSON is parsed up front; malformed input is rejected
  with exit code 2.
- New `hermes kanban runs <id> [--json]` verb. Shows per-run rows:
outcome, profile, elapsed, summary, error. JSON mode serializes
the full run dataclass for scripting.
Dashboard plugin
- GET /tasks/:id now carries a runs[] array alongside task / events /
comments / links. Each run serialised with outcome, summary,
metadata, worker_pid, elapsed fields.
- New Run History section in the drawer. Outcome-coloured left
border (green=active, blue=completed, amber=reclaimed,
red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs
with a '+N earlier' toggle. Shows summary + error + metadata
inline.
Forward-compat for v2 (vulcan's workflow templates + stages)
- `tasks.workflow_template_id` and `tasks.current_step_key` added as
nullable columns. v1 kernel ignores them for routing; v2 will add
workflow_templates + workflow_steps tables and wire the dispatcher
to consult them. task_runs has a matching `step_key` column. Lets
a v2 release land additively without another schema migration.
Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard)
- run_created_on_claim / run_closed_on_complete_with_summary
- run_summary_falls_back_to_result
- multiple_attempts_preserved_as_runs (3 attempts: reclaimed →
crashed → completed, all visible in list_runs)
- run_on_block_with_reason / run_on_spawn_failure_records_failed_runs
(5 spawn_failed runs + 1 gave_up run)
- event_rows_carry_run_id (task-scoped vs run-scoped split)
- build_worker_context_includes_prior_attempts
- build_worker_context_uses_parent_run_summary (metadata JSON in context)
- migration_backfills_inflight_run_for_legacy_db (simulates a
pre-migration running task, re-runs init_db, asserts backfill)
- forward_compat_columns_writable
- cli_runs_verb + cli_runs_json
- cli_complete_with_summary_and_metadata (JSON round-trip through
shlex + argparse)
- cli_complete_bad_metadata_exits_nonzero
- task_detail_includes_runs / task_detail_runs_empty_before_claim
269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke
covered: single-attempt complete → run closed + summary persisted;
retry scenario → two runs visible (blocked + completed); parent run
summary + metadata surfaced to child via build_worker_context;
forward-compat columns writable via UPDATE; GET /tasks/:id returns
runs[].
Docs
- New 'Runs — one row per attempt' section in kanban.md: the
why (full attempt history, structured metadata), the two-table
model (task is logical, run is execution), the structured handoff
shape (--summary / --metadata), example CLI + dashboard output,
forward-compat note for v2.
- Event reference updated to mention task_events.run_id.
- CLI reference gains 'hermes kanban runs <id>'.
Not in v1 (deferred to v2):
- Workflow templates (workflow_templates + workflow_steps tables,
stage-based routing, success/failure step links).
- 'stage' as a distinct axis from status in the UI.
- Shared-by-default workspace binding across stages of the same
workflow run.
- Pipeline replacement for the kanban-orchestrator skill (the
orchestrator's 'decompose, don't execute' guidance is still
correct; it becomes partly redundant once workflows land).
Ports four items from the Multica audit (https://github.com/multica-ai/multica).
Dropped their cross-host server/daemon architecture and their Postgres+pgvector
skill search — both the wrong shape for our single-host SQLite kernel.
1. Per-task max-runtime (`max_runtime_seconds` column)
- New kernel function `enforce_max_runtime(conn)` runs in every dispatch
tick. When a running task's elapsed time exceeds the cap, we SIGTERM
the worker, wait out a 5 s grace period (polling _pid_alive), then SIGKILL. The
task goes back to 'ready' with a `timed_out` event and re-queues
on the next tick (unless the spawn-failure circuit breaker has
already parked it).
- Host-local only: lock prefix must match this host's claimer_id so we
never signal a PID on another machine.
- CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`.
New `_parse_duration` helper accepts s/m/h/d suffixes or bare
integers.
- Dashboard POST body + the card's `max_runtime_seconds` field.
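The duration parsing can be sketched as follows (a sketch of the described s/m/h/d behaviour; the real _parse_duration may differ in detail):

```python
def parse_duration(text: str) -> int:
    """Parse '30m', '2h', '1d', '45s' or a bare number into seconds.

    Garbage raises ValueError, which the CLI turns into a nonzero exit.
    """
    multipliers = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(float(text))  # bare integer/float means seconds
```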
2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event)
- `heartbeat_worker(conn, task_id, note=None)` emits the event and
touches last_heartbeat_at. Refused when the task isn't running.
- CLI: `hermes kanban heartbeat <id> [--note "..."]`.
- kanban-worker skill instructs workers to heartbeat during long
loops (training runs, encodes, crawls, batch uploads).
- Separate signal from PID crash detection: a worker's Python can
still be alive while the actual work process is stuck. Heartbeat
absence is diagnostic; future work can auto-block on stale
heartbeats but v1 just surfaces the signal.
3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`)
- Scans ~/.hermes/profiles/ for dirs containing config.yaml and unions
  them with the current assignees on the board. Each entry returns
  {name, on_disk, counts: {status: n}}.
- CLI: `hermes kanban assignees [--json]`. Also hooked into
`hermes kanban init` which now prints discovered profiles so new
installs see 'these are the assignees you can target' immediately.
- Dashboard: GET /api/plugins/kanban/assignees for the picker.
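The on-disk half of the scan can be sketched as (assumes the profiles layout described above; helper signature is illustrative):

```python
import os

def list_profiles_on_disk(profiles_dir: str) -> list[str]:
    """Sketch: a profile is a directory under profiles_dir that
    contains a config.yaml. Missing dir means no profiles."""
    if not os.path.isdir(profiles_dir):
        return []
    return sorted(
        entry.name
        for entry in os.scandir(profiles_dir)
        if entry.is_dir()
        and os.path.isfile(os.path.join(entry.path, "config.yaml"))
    )
```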
4. Event vocab cleanup (three renames + three new kinds)
- `ready` → `promoted` (fires when deps clear; clearer semantics).
- `priority` → `reprioritized` (past-tense verb, matches others).
- `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit
breaker gave up on this task).
- New: `spawned` (emitted with {pid} on successful spawn),
`heartbeat` ({note?}), `timed_out`
({pid, elapsed_seconds, limit_seconds, sigkill}).
- One-shot migration in `_migrate_add_optional_columns` renames
legacy rows in-place on init_db(), so existing DBs upgrade cleanly.
- Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its
own ⏱ message template, gave_up renamed from 'auto-blocked'.
- plugin_api.py's two 'priority' emit sites renamed to
  'reprioritized'.
- Documented in a new 'Event reference' section in kanban.md,
grouped into three clusters (lifecycle / edits / worker
telemetry) with payload shapes.
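The one-shot rename can be sketched as an idempotent per-kind UPDATE (illustrative; the real migration lives in _migrate_add_optional_columns):

```python
import sqlite3

LEGACY_KIND_RENAMES = {
    "ready": "promoted",
    "priority": "reprioritized",
    "spawn_auto_blocked": "gave_up",
}

def migrate_event_kinds(conn: sqlite3.Connection) -> None:
    """Rename legacy event kinds in place. Idempotent: a second run
    finds no legacy rows and changes nothing."""
    for old, new in LEGACY_KIND_RENAMES.items():
        conn.execute(
            "UPDATE task_events SET kind = ? WHERE kind = ?", (new, old)
        )
    conn.commit()
```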
Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py,
136/136 pass):
- max_runtime_terminates_overrun_worker: real SIGTERM flow with
_pid_alive stub, verifies event payload + state reset.
- max_runtime_none_means_no_cap: unbounded tasks aren't timed out.
- create_task_persists_max_runtime.
- enforce_max_runtime_integrates_with_dispatch: kernel-level +
dispatch_once chaining.
- heartbeat_on_running_task + heartbeat_refused_when_not_running.
- cli_heartbeat_verb with --note round-trip.
- recompute_ready_emits_promoted_not_ready.
- spawn_failure_circuit_breaker_emits_gave_up.
- spawned_event_emitted_with_pid.
- migration_renames_legacy_event_kinds (injects old rows, re-runs
init_db, asserts rename).
- list_profiles_on_disk (tmp_path + config.yaml filter).
- known_assignees_merges_disk_and_board (profiles on disk + board
assignees + per-status counts).
- cli_assignees_json.
- parse_duration_accepts_formats (s/m/h/d/float).
- parse_duration_rejects_garbage.
- cli_create_max_runtime_via_duration (2h → 7200).
- cli_create_max_runtime_bad_format_exits_nonzero.
Live smoke: POST /tasks with max_runtime_seconds round-trips;
/assignees returns the union of on-disk + board-assigned names;
PATCH priority produces 'reprioritized' events (not 'priority');
board cards expose max_runtime_seconds + last_heartbeat_at.
Docs (website/docs/user-guide/features/kanban.md):
- New 'Event reference' section with three-cluster table
(lifecycle / edits / worker telemetry) + payload shapes.
- CLI reference updated for --max-runtime, heartbeat, assignees.
- Gateway notifications section updated for the new TERMINAL_KINDS.
Not ported from Multica (deliberate, documented in the out-of-scope
section already): Postgres+pgvector skill search (heavy deps conflict
with SQLite kernel), server+daemon cross-host model (we're
single-host on purpose), first-class agent identity with threaded
comments (we keep the board profile-agnostic).
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).
Dispatcher
- New `hermes kanban daemon` long-lived loop with --interval, --max,
--failure-limit, --pidfile, --verbose, signal-clean shutdown
(SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
lets tests drive it inline without subprocess.
- `hermes kanban init` now prints the dispatcher setup hint so users
don't leave the board off-by-default. Ships a systemd user unit at
plugins/kanban/systemd/hermes-kanban-dispatcher.service.
- Removed the old 'add this to cron' doc path. Cron runs agent
prompts (LLM cost per tick) — unacceptable for a per-minute
coordination loop.
Worker aliveness / safety
- Spawn returns the child's PID; dispatcher stores it on the task row
and calls detect_crashed_workers() every tick. If the PID is gone
but the claim TTL hasn't expired, the task drops back to ready with
a 'crashed' event. Host-local only — cross-host PIDs are ignored
per the single-host design.
- Spawn-failure circuit breaker: after N consecutive spawn_failed
events on the same task (default 5), the dispatcher auto-blocks
with the last error as the reason. Success resets the counter.
Workspace-resolution failures count against the same budget.
- Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
generation (.log.1), bounds per-task disk usage at ~4 MiB.
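The rotation rule can be sketched as follows (hypothetical helper mirroring _rotate_worker_log's described behaviour; the real code also reopens the fresh log):

```python
import os

def rotate_worker_log(path: str, max_bytes: int = 2 * 1024 * 1024) -> bool:
    """Sketch: when the log exceeds max_bytes, move it to <path>.1 and
    let the next write start a fresh file. Keeping exactly one rotated
    generation bounds per-task disk usage at roughly 2 * max_bytes."""
    try:
        if os.path.getsize(path) <= max_bytes:
            return False
    except OSError:
        return False  # no log yet
    os.replace(path, path + ".1")  # clobbers the previous .1 generation
    return True
```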
Idempotency / dedup
- create_task(idempotency_key=...) returns the existing non-archived
task id for retried webhooks. --idempotency-key on the CLI, json
body field on the dashboard plugin. Archived tasks don't block a
fresh create with the same key.
CLI surface
- Bulk verbs: complete, unblock, archive accept multiple ids;
block accepts --ids for sibling blocks with the same reason.
- New verbs: daemon, watch (live event tail filtered by
assignee/tenant/kinds), stats, log, notify-subscribe,
notify-list, notify-unsubscribe.
- dispatch gains --failure-limit + crashed/auto_blocked columns in
JSON output and human-readable output.
- gc accepts --event-retention-days / --log-retention-days; prunes
task_events for terminal tasks and old log files.
Gateway integration
- New GatewayRunner._kanban_notifier_watcher: polls
kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
chats for completed/blocked/spawn_auto_blocked/crashed events.
Cursor-advanced per-sub; auto-removed when the task reaches
done/archived. Runs alongside the session expiry and platform
reconnect watchers — SQLite work in asyncio.to_thread so the
event loop never blocks.
- /kanban create in the gateway auto-subscribes the originating
chat (platform + chat_id + thread_id). Users see
'(subscribed — you'll be notified when t_abcd completes or
blocks)' appended to the response.
Dashboard plugin
- GET /stats returns board_stats (by_status, by_assignee,
oldest_ready_age_seconds).
- GET /tasks/:id/log returns the worker log with optional ?tail=N
cap. 404 on unknown task, exists=false when the task has never
spawned.
- POST /tasks accepts idempotency_key; both Pydantic body and the
create_task kwarg now round-trip.
- /board attaches task.age (created/started/time_to_complete in
seconds) so the UI can colour stale cards without recomputing.
- Card CSS: amber border after N minutes, red border when clearly
stuck (tier per status: running 10m/60m, ready 1h/24h, todo
7d/30d, blocked 1h/24h).
- Drawer: new Worker log section, auto-loads on mount, last 100 KB
cap with on-disk path surfaced when truncated.
Kernel
- Schema additions: tasks.idempotency_key, tasks.spawn_failures,
tasks.worker_pid, tasks.last_spawn_error; new
kanban_notify_subs table. All gated by _migrate_add_optional_columns
so legacy DBs upgrade cleanly.
- release_stale_claims / complete_task / block_task now all clear
worker_pid so crash detection doesn't false-positive on reclaimed
tasks.
- read_worker_log fixed: tail-skip no longer eats one-giant-line
logs (common with child processes that don't flush newlines
before dying).
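The shape of the fix: cap on bytes from the end instead of skipping whole lines, so a newline-free log is still returned, just truncated (a sketch, not the shipped reader):

```python
import os

def read_log_tail(path: str, max_bytes: int = 64 * 1024) -> str:
    """Read at most the last max_bytes of a log, byte-wise.

    Seeking from the end means a log that is one giant unterminated
    line comes back truncated instead of empty."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - max_bytes))
        return fh.read().decode("utf-8", errors="replace")
```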
Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
- Idempotency: same key returns existing, archived doesn't block,
no key never collides
- Circuit breaker: auto-blocks after limit, success resets counter,
workspace-resolution failure counts against budget
- Aliveness: _pid_alive helper, detect_crashed_workers reclaims
exited child
- Daemon: runs and stops cleanly via stop_event, survives a tick
exception
- Stats + task_age helpers
- Notify subs: CRUD, cursor advances, distinct-thread is a separate row
- GC: events-only-for-terminal-tasks, old worker logs deleted
- Log: rotation keeps one generation, read_worker_log tail
- CLI: bulk complete/archive/unblock/block, create with
--idempotency-key, stats --json, notify-subscribe+list, log
missing task, gc reports counts
- run_slash parity: smoke-tests every registered verb (23
  invocations); none may raise or return an empty string
Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.
Docs (website/docs/user-guide/features/kanban.md)
- 'Core concepts' rewritten: new statuses (triage), idempotency key,
dispatcher-as-daemon-not-cron with circuit breaker behaviour
documented.
- Quick start swapped to daemon. New systemd section covers user
service install.
- New sections: idempotent create, bulk verbs, gateway
notifications, out-of-scope single-host note (kanban.db is local;
don't expect multi-host).
- CLI reference updated for every new verb, every new flag.
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.
Plugin backend
- POST /tasks/bulk — apply the same patch (status / archive / assignee
/ priority) to every id in the request body. Each id runs
independently: one bad id reports {ok: false, error: ...} without
aborting siblings. Status transitions that aren't legal for the
current state are surfaced per-id ('transition to done refused').
Used by the multi-select bulk action bar.
- GET /config — returns the dashboard.kanban section of config.yaml
(default_tenant, lane_by_profile, include_archived_by_default,
render_markdown) with sensible defaults when the section is absent.
Loaded once by the SPA to preselect filters and toggle markdown
rendering.
- _conn() helper — every handler now goes through it, calling
kanban_db.init_db() (idempotent) before every connection. Fresh
installs work whether the first hit is GET /board, POST /tasks, or
any other endpoint — no more 'no such table: tasks' when the CLI
or a script hits the plugin before the dashboard has ever loaded.
Plugin UI (plugin bundle, +~12 KB)
- Multi-select: per-card checkbox; shift/ctrl-click also toggles
without opening the drawer. A BulkActionBar appears above the
columns with batch → ready / complete / archive / reassign
(profile dropdown + unassign option). Destructive batches confirm
first. Partial failures from the backend are surfaced inline.
- Drawer inline editing:
- Click the title → TitleEditor swaps in an input, Enter saves,
Escape cancels.
- Click the Assignee meta row → AssigneeEditor input (empty string
unassigns).
- Click the Priority meta row → PriorityEditor numeric input.
- New 'edit' button on Description → full-width textarea; Save /
Cancel switch back to rendered view.
- Dependency editor: chip list of parents + children with per-chip
× button (calls DELETE /links). Add-parent / add-child dropdowns
filter out self + already-linked tasks so you cannot re-add a
duplicate edge or a self-loop. Cycle rejections from the server
surface cleanly via the existing error banner.
- Parent selection in InlineCreate: new dropdown listing every task
on the board ('{id} — {title}') — picking one sends parents=[id]
with the create payload, so the task lands in todo (or triage if
created from the Triage column) with the dependency wired up.
- Safe markdown rendering for description, comment bodies, and
result. A small in-bundle renderer handles headings, bold, italic,
inline code, fenced code, bullet lists, and http(s)/mailto links.
Every substitution runs on HTML-escaped input (no raw HTML); links
get target=_blank + rel="noopener noreferrer". Disabled by config
key dashboard.kanban.render_markdown=false (falls back to <pre>).
- Touch drag-drop: attachTouchDrag() installs a pointerdown handler
that spawns a drag proxy, tracks elementFromPoint under the finger,
and dispatches a hermes-kanban:drop CustomEvent on the column when
released. Desktop continues to use native HTML5 DnD. Columns
listen for both.
- ErrorBoundary already present from the prior commit catches any
renderer throw; markdown escape + touch-proxy cleanup both have
their own try/finally.
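The escape-before-substitute ordering in the markdown bullet above is the whole safety argument. The shipped renderer is JS in the bundle; this Python sketch (hypothetical function name) just shows the same ordering for a few inline rules:

```python
import html
import re

def render_inline(text: str) -> str:
    """Sketch: escape first, then substitute into the escaped string,
    so raw HTML from the input can never survive into the output."""
    out = html.escape(text, quote=True)
    out = re.sub(r"`([^`]+)`", r"<code>\1</code>", out)
    out = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", out)
    out = re.sub(
        r"(https?://[^\s<]+)",
        r'<a href="\1" target="_blank" rel="noopener noreferrer">\1</a>',
        out,
    )
    return out
```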
Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
- bulk_status_ready: 3 tasks blocked, batch → ready, all move
- bulk_archive hides all ids from default board
- bulk_reassign changes every assignee
- bulk_unassign_via_empty_string sets assignee back to None
- bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
good siblings still get priority=7
- bulk_empty_ids_400
- config_returns_defaults_when_section_missing
- config_reads_dashboard_kanban_section (writes config.yaml, verifies
every key round-trips)
Live smoke (real FastAPI app + isolated HERMES_HOME):
- /config without section returns defaults
- /config with dashboard.kanban section returns the configured values
- POST /tasks as the first-ever request (no prior /board) succeeds —
auto-init handles it
- Link add + remove via POST /links + DELETE /links round-trip
- Bulk priority bump on 2 ids, both get priority=5
- Bulk archive hides ids from default board
- PATCH {title, body} updates the task, markdown source survives
the round trip
- POST /tasks {triage: true, parents: [id]} lands in triage, not todo
- Bulk partial: 2 good + 1 bogus returns per-id outcome
Docs (website/docs/user-guide/features/kanban.md)
- 'What the plugin gives you' rewritten to reflect bulk, drawer
edit, dep editor, parent-on-create, markdown, touch drag-drop.
- New 'Dashboard config' subsection with a YAML example for
dashboard.kanban.*.
- REST table gains /tasks/bulk and /config rows.
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.
Kernel changes
- kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
CHECK constraint so no schema migration is needed.
- create_task(triage=True) forces the initial status to 'triage'
regardless of parents, and parent ids are still validated so the
eventual link rows don't dangle. recompute_ready() only promotes
'todo' -> 'ready', so triage tasks are naturally isolated from the
dispatcher pipeline.
- hermes kanban create gains --triage.
Patterns table (docs) gains P9 'Triage specifier'.
Plugin backend (plugins/kanban/dashboard/plugin_api.py)
- GET /board now auto-inits kanban.db on first read (idempotent).
A fresh install shows an empty board instead of 'failed to load'.
- GET /board returns a new 'progress' field per task — {done, total}
of child-task completion, or None if the task has no children.
- BOARD_COLUMNS prepends 'triage'.
- POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
{status: 'triage'}.
- WebSocket /events now requires ?token=<session_token> as a query
param — browsers can't set Authorization on a WS upgrade, so this
matches the pattern the in-browser PTY bridge uses. Constant-time
compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
contexts (no dashboard module) the check no-ops so the tail loop
stays testable. Security boundary documented in the module header
and in website/docs/user-guide/features/kanban.md.
Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
- Adds the Triage column (lilac dot) with helper text
'Raw ideas — a specifier will flesh out the spec'. Inline-create
from the Triage column parks new tasks in triage.
- Status action row in the drawer gains '→ triage'.
- Progress pill (N/M) on cards that have children. Full-complete
state tints the pill green.
- 'Lanes by profile' toolbar toggle — sub-groups the Running column
by assignee so you see at a glance which specialist is busy on
what.
- Destructive status moves (done / archived / blocked) via drag-drop
OR via the drawer action row now prompt for confirmation.
- Escape closes the drawer.
- Live-update reloads are debounced (250ms) so a burst of
task_events triggers one refetch, not N.
- WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
- WebSocket reconnect uses exponential backoff capped at 30s, not
a fixed 1.5s spin loop, and surfaces a user-visible error on
code-1008 (auth rejected) instead of reconnecting forever.
- ErrorBoundary wraps the page — a bad card render shows a
'rendering error, reload view' card instead of crashing the tab.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
- empty-board shape now asserts all 6 columns including 'triage'
- create_triage_lands_in_triage_column
- triage_task_not_promoted_to_ready (dispatcher bypasses triage)
- patch_status_triage_works (both into triage and out of it)
- board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
- board_auto_initializes_missing_db
- ws_events_rejects_when_token_required (three sub-assertions:
missing → 1008, wrong → 1008, correct → handshake accepted)
All 82 kanban tests pass under scripts/run_tests.sh.
Docs
- kanban.md 'What the plugin gives you' fully rewritten to match
shipped reality (triage, progress pill, assignee lanes,
destructive-confirm, Escape-close, debounce).
- New 'Security model' subsection documents the explicit-plugin-
route-bypass, the WS token requirement, and the --host 0.0.0.0
warning; also notes that kanban.db is profile-agnostic on purpose
(the coordination primitive) so cross-profile visibility is
expected.
- CLI command reference shows --triage.
- Collaboration patterns table adds P9 'Triage specifier'.
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.
What the tab gives you:
- One column per kanban status (todo / ready / running / blocked / done;
archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
same kanban_db code the CLI /kanban verbs use, so the three surfaces
(CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row
(→ ready / → running / block / unblock / complete / archive),
dependency links, comment thread with Enter-to-submit,
last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
CLI or gateway actions in real time.
REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.
Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is active.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)
All 76 kanban tests pass under scripts/run_tests.sh.
Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST surface,
live-updates mechanism, extending, and scope boundary.
The /kanban CLI + slash command are enough to run the board
headlessly, but triage and cross-profile supervision want a
visual board. Document the design as a dashboard plugin that:
- reads live state from kanban.db over a WebSocket on
task_events (no polling)
- writes through run_slash() so CLI/gateway/GUI cannot drift
- mounts under /api/plugins/kanban/ following the existing
'Extending the Dashboard' plugin shape
The plugin is strictly a thin layer over kanban_db — no new
business logic, nothing to merge into the kernel.
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
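The claim step can be sketched as follows (table and column names are assumptions for illustration; the real schema lives in hermes_cli/kanban_db.py):

```python
import sqlite3
import time

def claim_task(conn, task_id, worker_pid, ttl_s=900):
    # BEGIN IMMEDIATE takes the write lock up front; the UPDATE's WHERE
    # clause is the compare-and-swap — it only fires while the task is
    # unclaimed or its previous claim has expired.
    now = time.time()
    try:
        conn.execute("BEGIN IMMEDIATE")
        cur = conn.execute(
            "UPDATE tasks SET worker_pid = ?, claim_expires = ? "
            "WHERE id = ? AND (worker_pid IS NULL OR claim_expires < ?)",
            (worker_pid, now + ttl_s, task_id, now),
        )
        conn.commit()
        return cur.rowcount == 1  # True iff this process won the claim
    except Exception:
        conn.rollback()
        raise
```

Two workers racing on the same task serialize on the IMMEDIATE lock, and exactly one of them sees rowcount == 1.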
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetically via scripts/run_tests.sh.
The background memory/skill review (_spawn_background_review) has always
forked a new AIAgent passing only model and provider, then relied on
AIAgent.__init__ to re-resolve credentials from env vars. That env-var
auto-resolution happens silently in all cases: it works for users with
keys in ~/.hermes/.env, but fails for OAuth-only providers,
session-scoped creds, and credential-pool setups where auth can't be
reconstructed from env.
This used to be invisible -- failures were swallowed via logger.debug().
PR 8a2506af4 (Apr 24) surfaced auxiliary failures to the user, which
made the stale bug visible as:
"Auxiliary background review failed: No LLM provider configured"
Fix: pass api_key, base_url, api_mode, and credential_pool from the
parent's live runtime into the fork -- matching how every other
auxiliary path (compression, memory flush, vision, session search)
already inherits the parent's credentials via _current_main_runtime().
The chown/chmod block on config.yaml was added in b24d239ce to keep the
file readable by the hermes runtime user, but it sat in the post-gosu
'running as hermes' section of the entrypoint. That meant:
1. Default `docker run <image>` — container starts as root, entrypoint
drops to hermes via gosu, then non-root hermes tries to chown the
file to hermes. Works by coincidence because the file was just
created by root during volume setup and gosu target == target owner.
2. `docker run -u $(id -u):$(id -g) <image>` (#15865) — container
starts as the caller's UID. The root block is skipped entirely, we
land in the hermes section as some arbitrary non-root user, and
chown to 'hermes' fails with 'Operation not permitted'. Script
aborts under `set -e`.
Move the chown/chmod into the root block (before the gosu exec) where
it actually has privilege, and guard with `2>/dev/null || true` so
rootless Podman (where even in-container root lacks host-side chown
rights) doesn't abort either.
Closes #15865
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.
run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
short-circuit. Only /btw ever passed persist_session=False; with
/btw gone the default (always persist) is the only behavior anyone
ever wanted.
gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
(PR #16059). The canonical name for a /btw message is 'background' after
alias resolution — the comparison could never be true, and it called
_handle_btw_command, which no longer exists. The /background branch
above it already dispatches /btw correctly.
tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
(the real dispatch target for /btw) instead of the deleted
_handle_btw_command.
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:
⏳ Agent is running — /btw can't run mid-turn. Wait for the current
response or /stop first.
That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.
Reported by @IuriiTiunov on Telegram.
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.
hermes fallback [list] Show the current chain (primary + fallbacks)
hermes fallback add Run the model picker, append selection to chain
hermes fallback remove Pick an entry to delete (arrow-key menu)
hermes fallback clear Remove all entries (with confirmation)
'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
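The snapshot/restore dance in 'add' looks roughly like this (config keys taken from the text above; run_picker stands in for select_provider_and_model(), and the auth-store snapshot is omitted):

```python
import copy

def add_fallback(config, run_picker):
    # Snapshot the primary, let the picker mutate config["model"],
    # harvest the selection, then restore the primary and append the
    # selection to the fallback chain.
    primary = copy.deepcopy(config["model"])
    run_picker(config)
    picked = config["model"]
    entry = {k: picked[k]
             for k in ("provider", "model", "base_url", "api_mode")
             if picked.get(k) is not None}
    config["model"] = primary
    chain = config.setdefault("fallback_providers", [])
    is_self = (entry.get("provider"), entry.get("model")) == (
        primary.get("provider"), primary.get("model"))
    if entry not in chain and not is_self:  # reject dupes / self-as-fallback
        chain.append(entry)
    return entry
```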
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
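The latch reduces to a few lines over the loaded config dict (function names from the text; persistence via atomic_yaml_write omitted):

```python
def is_seen(config, flag):
    return bool(config.get("onboarding", {}).get("seen", {}).get(flag))

def mark_seen(config, flag):
    # Latch under onboarding.seen.<flag>; the real code persists the
    # updated config with atomic_yaml_write afterwards.
    config.setdefault("onboarding", {}).setdefault("seen", {})[flag] = True

def maybe_hint(config, flag, hint):
    # Fire-at-most-once: return the hint text the first time, None after.
    if is_seen(config, flag):
        return None
    mark_seen(config, flag)
    return hint
```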
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.
Gate the base adapter on a three-layer decision instead:
1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress
3. else → voice.auto_tts global default
Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.
Closes #16007.
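The three-layer gate is a straightforward precedence check (a sketch; the real sets live on the adapter as _auto_tts_enabled_chats / _auto_tts_disabled_chats):

```python
def should_auto_tts(chat_id, enabled_chats, disabled_chats, default):
    # 1. explicit /voice on|tts wins
    if chat_id in enabled_chats:
        return True
    # 2. explicit /voice off wins over the global default
    if chat_id in disabled_chats:
        return False
    # 3. fall back to voice.auto_tts from config.yaml
    return default
```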
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.
- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the '⚡ Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.
progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The '⚡ Interrupting'
message is sent through a separate adapter path and is unaffected.
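The queueing-stage check can be sketched like this (agent_holder mirrors the cross-scope handle described above; attribute names assumed):

```python
def make_progress_callback(agent_holder, queue):
    # Drop tool progress events that arrive after the user interrupted;
    # the same check guards the drain loop at render time.
    def progress_callback(event):
        agent = agent_holder[0]
        if agent is not None and agent.is_interrupted:
            return  # late event from the parallel tool batch — swallow it
        queue.append(event)
    return progress_callback
```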
Follow-up on #16020 salvage. Three corrections:
1. Truth signal for /copy
Before: success was 'OSC 52 sequence was emitted to stdout'. That's
false on local Linux inside tmux (emitSequence=false), so /copy kept
printing 'clipboard copy failed' to users whose xclip/wl-copy had
already succeeded fire-and-forget.
Fix: setClipboard() now returns { sequence, success } where success =
native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative()
returns a boolean telling setClipboard whether a native attempt was
made. /copy only shows 'failed' when literally no path was taken.
2. Dashboard keybinding
Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste).
That swallows SIGINT when a stale selection is present and breaks
the xterm/gnome-terminal/konsole/Windows-Terminal convention where
Ctrl+C in a terminal emulator is always SIGINT. The real bug was
that clipboard writes lost user-gesture through OSC-52 round-trips,
which the direct writeText already fixes.
Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct
writeText in the keydown handler preserves user gesture. term.write
Escape replaced with term.clearSelection() (works without relying
on TUI input mode).
3. Error toast text
Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to
debug but not how to fix.
Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual
escape hatch), mention the debug var second.
- Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture);
send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback.
- TUI /copy: copySelection() async; only reports success if OSC52 emitted.
- Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection.
- Fixes "copied N chars" false-positive when clipboard backend absent.
Changes:
web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText
ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection
ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52
ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback
Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty.
Root causes:
- Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in
headless Docker/SSH they fail or hang.
- OSC 52 fallback requires terminal emulator support; when absent, sequence
is dropped silently.
- Dashboard OSC 52 → Clipboard API path fails due to missing user gesture;
errors were silently caught.
- User feedback 'copied selection' was shown unconditionally, regardless of
success.
Solution implemented:
- Short-circuit Linux native clipboard probing when no display server is
present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and
timeouts.
- Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to
stderr which clipboard path is used, probe results on Linux, and whether
OSC 52 was emitted. Greatly improves diagnosability.
- Improve dashboard clipboard error handling: replace empty catch blocks
with console.warn messages for OSC 52 decode/write failures and direct
copy/paste errors. Makes browser permission/user-gesture failures visible
in DevTools.
- Add comprehensive clipboard troubleshooting documentation to README and
AGENTS, covering OSC 52 verification, tmux config, Docker/headless
constraints, env vars, dashboard caveats, and fallback strategies.
Technical details:
- in ui-tui/packages/hermes-ink/src/ink/termio/osc.ts:
- Early return on Linux if both DISPLAY and WAYLAND_DISPLAY unset.
- Refactor probe sequence to async with 500ms timeout,
caching result; subsequent copies use cached tool immediately.
- Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1.
- in ink.tsx: log when OSC 52 not emitted (native
or tmux path in use) in debug mode.
- in web/src/pages/ChatPage.tsx: OSC 52 handler and Ctrl+Shift+C handler
now log warnings to console on Clipboard API rejection with the error
message.
- Documentation: new 'Clipboard Troubleshooting' section in README; new
'Clipboard environment variables and pitfalls' subsection in AGENTS.md
(Known Pitfalls).
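The display-server short-circuit reduces to one environment check (a Python sketch of the TypeScript logic in osc.ts; the return values are illustrative):

```python
def clipboard_strategy(platform, env):
    # On Linux with no display server, xclip/wl-copy can only fail or
    # hang, so skip native probing entirely and rely on OSC 52.
    if platform.startswith("linux") and not (
        env.get("DISPLAY") or env.get("WAYLAND_DISPLAY")
    ):
        return "osc52"
    return "probe-native"
```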
Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests
unaffected. No breaking changes.
Files changed:
- ui-tui/packages/hermes-ink/src/ink/termio/osc.ts
- ui-tui/packages/hermes-ink/src/ink/ink.tsx
- web/src/pages/ChatPage.tsx
- README.md
- AGENTS.md
- CHANGELOG.md (new)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.
Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
model_catalog.url master manifest URL
model_catalog.ttl_hours disk cache TTL (default 24h)
model_catalog.providers.<name>.url optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
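The chain behaves roughly like this (a sketch; the in-process cache layer is omitted and names are illustrative, not the real hermes_cli/model_catalog.py API):

```python
import json
import time
import urllib.request
from pathlib import Path

def load_catalog(url, cache_path, bundled, ttl_hours=24):
    # disk cache (fresh) -> HTTP fetch (refreshes cache) ->
    # stale disk cache -> bundled snapshot. Never raises.
    cache = Path(cache_path)
    try:
        if cache.exists() and time.time() - cache.stat().st_mtime < ttl_hours * 3600:
            return json.loads(cache.read_text())
    except Exception:
        pass
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        cache.write_text(json.dumps(data))
        return data
    except Exception:
        pass
    try:
        if cache.exists():
            return json.loads(cache.read_text())  # stale beats nothing
    except Exception:
        pass
    return bundled
```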
Changes:
- website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() +
new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
success/failure, accessors,
disabled, overrides, integration)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.
For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).
Closes #16015.
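An illustrative version of the path-scoped comparison (not the SDK's actual implementation) shows why stripping the path breaks Notion:

```python
from urllib.parse import urlparse

def resource_allowed(configured, requested):
    # RFC 9728-style check: the requested resource must sit at or under
    # the path the server's metadata advertises.
    c, r = urlparse(configured), urlparse(requested)
    if (c.scheme, c.netloc) != (r.scheme, r.netloc):
        return False
    c_path = c.path if c.path.endswith("/") else c.path + "/"
    return r.path == c.path or (r.path + "/").startswith(c_path)
```

With the configured resource path-scoped at /mcp and the requested URL collapsed to the origin, the check fails exactly as described; passing the full server_url through makes the paths line up.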
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.
Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.
Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.
Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).
## Validation
| path | before | after |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex | 1,050,000 | 272,000 |
| picker -> gpt-5.5 on OpenAI | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000 | 272,000 |
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.
Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:
- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)
Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.
Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
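The lookup shape can be sketched as follows (an illustrative slice of the table; the real DEFAULT_CONTEXT_LENGTHS has many more entries):

```python
CONTEXT_LENGTHS = {
    "deepseek-v4-pro": 1_000_000,
    "deepseek-v4-flash": 1_000_000,
    "deepseek-chat": 1_000_000,
    "deepseek-reasoner": 1_000_000,
    "deepseek": 128_000,  # fallback for older / unknown DeepSeek ids
}

def get_model_context_length(model_id, default=128_000):
    mid = model_id.lower()
    # Longest key first: "deepseek-v4-pro" wins over the bare
    # "deepseek" entry, and substring matching covers vendor-prefixed
    # ids like "deepseek/deepseek-v4-pro" for free.
    for key in sorted(CONTEXT_LENGTHS, key=len, reverse=True):
        if key in mid:
            return CONTEXT_LENGTHS[key]
    return default
```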
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.
This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").
Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.
Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.
Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.
Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.
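The gate can be sketched as follows (header suffixes follow the x-ratelimit-* convention but are assumptions; the real code also consults last-known-good state):

```python
def should_trip_breaker(headers, min_reset_s=60):
    # Trip the cross-session file breaker only when some bucket is
    # actually exhausted (remaining == 0) with a meaningful reset
    # window; upstream-capacity 429s with healthy buckets fall through
    # to normal retry/fallback instead.
    for bucket in ("requests", "tokens"):
        remaining = headers.get(f"x-ratelimit-remaining-{bucket}")
        reset_s = headers.get(f"x-ratelimit-reset-{bucket}")
        if remaining is None or reset_s is None:
            continue
        if int(remaining) == 0 and float(reset_s) >= min_reset_s:
            return True
    return False
```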
Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.
Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
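The wire point is a monotonic-clock bracket around dispatch (a sketch with plain callables standing in for the registry and hook plumbing):

```python
import time

def timed_dispatch(dispatch, post_tool_call, tool_name, args):
    # Measure only the registry.dispatch() call, then hand hooks an
    # integer duration_ms alongside the result.
    start = time.monotonic()
    result = dispatch(tool_name, args)
    duration_ms = int((time.monotonic() - start) * 1000)
    post_tool_call(tool_name=tool_name, result=result,
                   duration_ms=duration_ms)
    return result
```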
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode. Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.
The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
/lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
systemctl poweroff/reboot/halt/kexec
Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.
Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.
Inspired by Mercury Agent's permission-hardened blocklist.
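An illustrative subset of the check (patterns simplified for the sketch; the real list covers all 12 cases and runs before any yolo/approval logic):

```python
import re

# Simplified stand-ins for a few of the hardline patterns; the check
# cannot be opted out of, regardless of --yolo or approvals.mode.
HARDLINE_PATTERNS = [
    re.compile(r"\brm\s+-[rf]{2}\s+(?:/|/home|/etc|~|\$HOME)(?:\s|$)"),
    re.compile(r"\bmkfs(?:\.\w+)?\b"),
    re.compile(r"\bdd\b.*\bof=/dev/(?:sd|nvme)"),
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};\s*:"),  # classic fork bomb
]

def is_hardline_blocked(command):
    return any(p.search(command) for p in HARDLINE_PATTERNS)
```

Recoverable-but-costly commands intentionally stay out of this list, matching the split described above.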
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.
The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).
Inspired by Mercury Agent's `mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
(guarding behavior that no longer exists — covered by
test_setup_reconfigure.py instead)
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
routing, /v1 stripping, api-version query forwarding, and the
provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
HangGlidersRule (noreply), and pein892 so the contributor-attribution
CI check does not reject the salvage.
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:
1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
Claude routes and skip to anthropic_messages.
2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
model list, we switch to chat_completions and prefill the picker
with the returned deployment/model IDs.
3. Anthropic Messages probe — fallback for endpoints that don't expose
/models but do speak the Anthropic Messages shape.
4. Manual fallback — private endpoints / custom routes still work;
the user picks API mode + types a deployment name.
Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.
Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.
New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection
New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError
Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth). The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case. Users on private/gated endpoints fall through to manual entry.
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
to Responses API. Azure does NOT support the Responses API — it serves
gpt-5.x on the regular `/chat/completions` path, causing a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).
Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.
Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.
No-op for URLs without query params — fully backward compatible.
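The split can be sketched with stdlib URL parsing; the helper name is illustrative, but `default_query` is a real constructor parameter on the OpenAI Python client:

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# Illustrative sketch: separate the query string from the base URL so
# the SDK re-appends it to every request via default_query.

def split_base_url(base_url: str) -> tuple[str, dict[str, str]]:
    parts = urlsplit(base_url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return clean, dict(parse_qsl(parts.query))

# Usage (assuming the openai package is installed):
#   clean, query = split_base_url(
#       "https://r.openai.azure.com/openai/v1?api-version=2025-04-01-preview")
#   client = OpenAI(base_url=clean, default_query=query)
```

For a URL with no query string the second element is an empty dict, so passing it through changes nothing.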
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.
Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)
Usage:
hermes model -> More providers -> Azure Foundry
The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name
Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
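A wizard run might leave behind something like the following (values are illustrative; the Anthropic-style api_mode value is an assumption):

```yaml
# config.yaml (illustrative, not output from a real run)
model:
  provider: azure-foundry
  base_url: https://myresource.services.ai.azure.com
  api_mode: chat_completions   # or the Anthropic-style mode for /v1/messages endpoints
  default: my-deployment
```

with the key stored separately as `AZURE_FOUNDRY_API_KEY=...` in `~/.hermes/.env`.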