mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-06 10:47:12 +08:00
* docs: document /kanban slash command
The kanban user guide and slash-commands reference only mentioned the
/kanban slash command in passing. Add a proper section covering:
- CLI and gateway both expose the full hermes kanban surface via
hermes_cli.kanban.run_slash (identical argument surface)
- Mid-run usage: /kanban bypasses the running-agent guard, so reads
and writes land immediately while an agent is still in a turn
- Auto-subscribe on /kanban create from the gateway — originating
chat is subscribed to terminal events, with a worked example
- Output truncation (~3800 chars) in messaging
- Autocomplete hint list vs full subcommand surface
Also adds /kanban rows to both slash-command tables (CLI + messaging)
in reference/slash-commands.md and moves it into the 'works in both'
notes bucket.
* docs(kanban): frame the model's tool surface as primary, CLI as the human surface
The kanban user guide and CLI reference read as if you drive the board
by running `hermes kanban` commands everywhere. In practice:
- **You** (human, scripts, cron, dashboard) use the `hermes kanban …`
CLI, the `/kanban …` slash command, or the REST/dashboard.
- **Workers** spawned by the dispatcher use a dedicated `kanban_*`
toolset (`kanban_show`, `kanban_complete`, `kanban_block`,
`kanban_heartbeat`, `kanban_comment`, `kanban_create`,
`kanban_link`) and never shell out to the CLI.
Changes to `user-guide/features/kanban.md`:
- New 'Two surfaces' intro distinguishes the two front doors up front.
- Quick-start section re-labelled so each step says who is running it
(you vs. orchestrator vs. worker).
- 'How workers interact with the board' rewritten:
- Lead with "Workers do not shell out to `hermes kanban`."
- Tool table extended with required params.
- Concrete worker-turn example (`kanban_show` → `kanban_heartbeat`
→ `kanban_complete`) and an orchestrator fan-out example
(`kanban_create` x N with `parents=[...]`).
- Moved 'Why tools not CLI' from a defensive aside to a clean
follow-up section.
- 'Worker skill' section explicitly says the lifecycle is taught
in tool calls, not CLI commands.
- 'Pinning extra skills' reordered — orchestrator tool form first
(the usual case), human/CLI second, dashboard third.
- 'Orchestrator skill' now shows a canonical `kanban_create` /
`kanban_link` / `kanban_complete` tool-call sequence instead of
only describing what the skill teaches.
- CLI-command-reference heading now clarifies this is the human
surface, with a cross-link to the tool-surface section.
- 'Runs — one row per attempt' structured-handoff example replaced:
the primary example is now `kanban_complete(summary=..., metadata=...)`
(what a worker actually does), with the CLI form retained as
"when you, the human, need to close a task a worker can't."
Changes to `reference/cli-commands.md`:
- `hermes kanban` intro marks itself as the human / scripting surface
and links out to the worker tool surface.
- Corrected `comment <id>` description — the next worker reads it via
`kanban_show()`, not by running `hermes kanban show`.
* docs(kanban-tutorial): reframe worker actions as tool calls
Honest answer to Teknium's follow-up: no, the first pass missed the
tutorial. The four stories all showed `hermes kanban claim /
complete / block / unblock` as if the backend-dev, pm, and reviewer
personas were humans running CLI commands. In a real hermes kanban
run those agents are dispatcher-spawned workers driving the board
through the `kanban_*` tool surface.
Changes:
- Setup intro now distinguishes the three surfaces up front
(dashboard / CLI for you, `kanban_*` tools for workers) and
establishes the convention: `bash` blocks are commands *you* run,
`# worker tool calls` blocks are what the agent emits.
- Story 1 (solo dev schema): 'Claim the schema task, do the work,
hand off' block replaced with the dispatcher spawning the
backend-dev worker and a `kanban_show → kanban_heartbeat →
kanban_complete` tool-call sequence. The 'On the CLI' `hermes
kanban show / runs` block re-labelled as 'you peeking at the board'
to keep it correct as a human inspection step.
- Story 2 (fleet farming): note about structured handoff updated
from `--summary` / `--metadata` CLI flags to
`kanban_complete(summary=..., metadata=...)` tool form.
- Story 3 (role pipeline): the big PM/engineer/reviewer block fully
rewritten as three worker tool-call sequences — PM worker
completes spec, engineer worker blocks, human/reviewer
`hermes kanban unblock` (or `/kanban unblock`), engineer worker
respawns and completes. The respawn-as-new-run mechanic is now
explicit.
- Reviewer paragraph: `build_worker_context` replaced with
`kanban_show()` — that's the tool that delivers the parent
handoff to the model.
- Structured handoff section heading and body updated:
`--summary`/`--metadata` → `summary`/`metadata` (tool params),
with a note that the tool surface doesn't expose a bulk variant
for the same reason the CLI refuses multi-task `complete`.
Story 4 (circuit breaker) unchanged — its workers fail to spawn,
so there are no tool calls to show; the `hermes kanban create` and
`hermes kanban runs` commands in it are correctly human-driven.
308 lines
18 KiB
Markdown
308 lines
18 KiB
Markdown
# Kanban tutorial
|
|
|
|
A walkthrough of the four use-cases the Hermes Kanban system was designed for, with the dashboard open in a browser. If you haven't read the [Kanban overview](./kanban) yet, start there — this assumes you know what a task, run, assignee, and dispatcher are.
|
|
|
|
## Setup
|
|
|
|
```bash
|
|
hermes kanban init # optional; first `hermes kanban <anything>` auto-inits
|
|
hermes dashboard # opens http://127.0.0.1:9119 in your browser
|
|
# click Kanban in the left nav
|
|
```
|
|
|
|
The dashboard is the most comfortable place for **you** to watch the system. Agent workers the dispatcher spawns never see the dashboard or the CLI — they drive the board through a dedicated `kanban_*` [toolset](./kanban#how-workers-interact-with-the-board) (`kanban_show`, `kanban_complete`, `kanban_block`, `kanban_heartbeat`, `kanban_comment`, `kanban_create`, `kanban_link`). All three surfaces — dashboard, CLI, worker tools — route through the same `~/.hermes/kanban.db`, so the board is consistent no matter which side of the fence a change came from.
|
|
|
|
Throughout the tutorial, **code blocks labelled `bash` are commands *you* run.** Code blocks labelled `# worker tool calls` are what the spawned worker's model emits as tool calls — shown here so you can see the loop end-to-end, not because you'd ever run them yourself.
|
|
|
|
## The board at a glance
|
|
|
|

|
|
|
|
Six columns, left to right:
|
|
|
|
- **Triage** — raw ideas, a specifier will flesh out the spec before anyone works on them.
|
|
- **Todo** — created but waiting on dependencies, or not yet assigned.
|
|
- **Ready** — assigned and waiting for the dispatcher to claim.
|
|
- **In progress** — a worker is actively running the task. With "Lanes by profile" on (the default), this column sub-groups by assignee so you can see at a glance what each worker is doing.
|
|
- **Blocked** — a worker asked for human input, or the circuit breaker tripped.
|
|
- **Done** — completed.
|
|
|
|
The top bar has filters for search, tenant, and assignee, plus a `Lanes by profile` toggle and a `Nudge dispatcher` button that runs one dispatch tick right now instead of waiting for the daemon's next interval. Clicking any card opens its drawer on the right.
|
|
|
|
### Flat view
|
|
|
|
If the profile lanes are noisy, toggle "Lanes by profile" off and the In Progress column collapses to a single flat list ordered by claim time:
|
|
|
|

|
|
|
|
## Story 1 — Solo dev shipping a feature
|
|
|
|
You're building a feature. Classic flow: design a schema, implement the API, write the tests. Three tasks with parent→child dependencies.
|
|
|
|
```bash
|
|
SCHEMA=$(hermes kanban create "Design auth schema" \
|
|
--assignee backend-dev --tenant auth-project --priority 2 \
|
|
--body "Design the user/session/token schema for the auth module." \
|
|
--json | jq -r .id)
|
|
|
|
API=$(hermes kanban create "Implement auth API endpoints" \
|
|
--assignee backend-dev --tenant auth-project --priority 2 \
|
|
--parent $SCHEMA \
|
|
--body "POST /register, POST /login, POST /refresh, POST /logout." \
|
|
--json | jq -r .id)
|
|
|
|
hermes kanban create "Write auth integration tests" \
|
|
--assignee qa-dev --tenant auth-project --priority 2 \
|
|
--parent $API \
|
|
--body "Cover happy path, wrong password, expired token, concurrent refresh."
|
|
```
|
|
|
|
Because `API` has `SCHEMA` as its parent, and `tests` has `API` as its parent, only `SCHEMA` starts in `ready`. The other two sit in `todo` until their parents complete. This is the dependency promotion engine doing its job — no other worker will pick up the test-writing until there's an API to test.
|
|
|
|
On the next dispatcher tick (60s by default, or immediately if you hit **Nudge dispatcher**) the `backend-dev` profile spawns as a worker with `HERMES_KANBAN_TASK=$SCHEMA` in its env. Here's what the worker's tool-call loop looks like from inside the agent:
|
|
|
|
```python
|
|
# worker tool calls — NOT commands you run
|
|
kanban_show()
|
|
# → returns title, body, worker_context, parents, prior attempts, comments
|
|
|
|
# (worker reads worker_context, uses terminal/file tools to design the schema,
|
|
# write migrations, run its own checks, commit — the real work happens here)
|
|
|
|
kanban_heartbeat(note="schema drafted, writing migrations now")
|
|
|
|
kanban_complete(
|
|
summary="users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); "
|
|
"refresh tokens stored as sessions with type='refresh'",
|
|
metadata={
|
|
"changed_files": ["migrations/001_users.sql", "migrations/002_sessions.sql"],
|
|
"decisions": ["bcrypt for hashing", "JWT for session tokens",
|
|
"7-day refresh, 15-min access"],
|
|
},
|
|
)
|
|
```
|
|
|
|
`kanban_show` defaults `task_id` to `$HERMES_KANBAN_TASK`, so the worker doesn't need to know its own id. `kanban_complete` writes the summary + metadata onto the current `task_runs` row, closes that run, and transitions the task to `done` — all in one atomic hop through `kanban_db`.
|
|
|
|
When `SCHEMA` hits `done`, the dependency engine promotes `API` to `ready` automatically. The API worker, when it picks up, will call `kanban_show()` and see `SCHEMA`'s summary and metadata attached to the parent handoff — so it knows the schema decisions without re-reading a long design doc.
|
|
|
|
Click the completed schema task on the board and the drawer shows everything:
|
|
|
|

|
|
|
|
The Run History section at the bottom is the key addition. One attempt: outcome `completed`, worker `@backend-dev`, duration, timestamp, and the handoff summary in full. The metadata blob (`changed_files`, `decisions`) is stored on the run too and surfaced to any downstream worker that reads this parent.
|
|
|
|
You can inspect the same data from your terminal at any time — these commands are **you** peeking at the board, not the worker:
|
|
|
|
```bash
|
|
hermes kanban show $SCHEMA
|
|
hermes kanban runs $SCHEMA
|
|
# # OUTCOME PROFILE ELAPSED STARTED
|
|
# 1 completed backend-dev 0s 2026-04-27 19:34
|
|
# → users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); refresh tokens ...
|
|
```
|
|
|
|
## Story 2 — Fleet farming
|
|
|
|
You have three workers (a translator, a transcriber, a copywriter) and a pile of independent tasks. You want all three pulling in parallel and making visible progress. This is the simplest kanban use-case and the one the original design optimized for.
|
|
|
|
Create the work:
|
|
|
|
```bash
|
|
for lang in Spanish French German; do
|
|
hermes kanban create "Translate homepage to $lang" \
|
|
--assignee translator --tenant content-ops
|
|
done
|
|
for i in 1 2 3 4 5; do
|
|
hermes kanban create "Transcribe Q3 customer call #$i" \
|
|
--assignee transcriber --tenant content-ops
|
|
done
|
|
for sku in 1001 1002 1003 1004; do
|
|
hermes kanban create "Generate product description: SKU-$sku" \
|
|
--assignee copywriter --tenant content-ops
|
|
done
|
|
```
|
|
|
|
Start the gateway and walk away — it hosts the embedded dispatcher
|
|
that picks up all three specialist profiles' tasks on the same
|
|
kanban.db:
|
|
|
|
```bash
|
|
hermes gateway start
|
|
```
|
|
|
|
Now filter the board to `content-ops` (or just search for "Transcribe") and you get this:
|
|
|
|

|
|
|
|
Two transcribes done, one running, two ready waiting for the next dispatcher tick. The In Progress column is grouped by profile (the "Lanes by profile" default) so you see each worker's active task without scanning a mixed list. The dispatcher will promote the next ready task to running as soon as the current one completes. With three daemons working on three assignee pools in parallel, the whole content queue drains without further human input.
|
|
|
|
**Everything Story 1 said about structured handoff still applies here.** A translator worker completing a call emits `kanban_complete(summary="translated 4 pages, style matched existing marketing voice", metadata={"duration_seconds": 720, "tokens_used": 2100})` — useful for analytics and for any downstream task that depends on this one.
|
|
|
|
## Story 3 — Role pipeline with retry
|
|
|
|
This is where Kanban earns its keep over a flat TODO list. A PM writes a spec. An engineer implements it. A reviewer rejects the first attempt. The engineer tries again with changes. The reviewer approves.
|
|
|
|
The dashboard view, filtered by `auth-project`:
|
|
|
|

|
|
|
|
Three-stage chain visible at once: `Spec: password reset flow` (DONE, pm), `Implement password reset flow` (DONE, backend-dev), `Review password reset PR` (READY, reviewer). Each has its parent in green at the bottom and children as dependencies.
|
|
|
|
The interesting one is the implementation task, because it was blocked and retried. Here's the full three-agent choreography, shown as the tool calls each worker's model makes:
|
|
|
|
```python
|
|
# --- PM worker spawns on $SPEC and writes the acceptance criteria ---
|
|
# worker tool calls
|
|
kanban_show()
|
|
kanban_complete(
|
|
summary="spec approved; POST /forgot-password sends email, "
|
|
"GET /reset/:token renders form, POST /reset applies new password",
|
|
metadata={"acceptance": [
|
|
"expired token returns 410",
|
|
"reused last-3 password returns 400 with message",
|
|
"successful reset invalidates all active sessions",
|
|
]},
|
|
)
|
|
# → $SPEC is done; $IMPL auto-promotes from todo to ready
|
|
|
|
# --- Engineer worker spawns on $IMPL (first attempt) ---
|
|
# worker tool calls
|
|
kanban_show() # reads $SPEC's summary + acceptance metadata in worker_context
|
|
# (engineer writes code, runs tests, opens PR)
|
|
# Reviewer feedback arrives — engineer decides the concerns are valid and blocks
|
|
kanban_block(
|
|
reason="Review: password strength check missing, reset link isn't "
|
|
"single-use (can be replayed within 30min)",
|
|
)
|
|
# → $IMPL transitions to blocked; run 1 closes with outcome='blocked'
|
|
```
|
|
|
|
Now you (the human, or a separate reviewer profile) read the block reason, decide the fix direction is clear, and unblock from the dashboard's "Unblock" button — or from the CLI / slash command:
|
|
|
|
```bash
|
|
hermes kanban unblock $IMPL
|
|
# or from a chat: /kanban unblock $IMPL
|
|
```
|
|
|
|
The dispatcher promotes `$IMPL` back to `ready` and, on the next tick, respawns the `backend-dev` worker. This second spawn is a **new run** on the same task:
|
|
|
|
```python
|
|
# --- Engineer worker spawns on $IMPL (second attempt) ---
|
|
# worker tool calls
|
|
kanban_show()
|
|
# → worker_context now includes the run 1 block reason, so this worker knows
|
|
# which two things to fix instead of re-reading the whole spec
|
|
# (engineer adds zxcvbn check, makes reset tokens single-use, re-runs tests)
|
|
kanban_complete(
|
|
summary="added zxcvbn strength check, reset tokens are now single-use "
|
|
"(stored + deleted on success)",
|
|
metadata={
|
|
"changed_files": [
|
|
"auth/reset.py",
|
|
"auth/tests/test_reset.py",
|
|
"migrations/003_single_use_reset_tokens.sql",
|
|
],
|
|
"tests_run": 11,
|
|
"review_iteration": 2,
|
|
},
|
|
)
|
|
```
|
|
|
|
Click the implementation task. The drawer shows **two attempts**:
|
|
|
|

|
|
|
|
- **Run 1** — `blocked` by `@backend-dev`. The review feedback sits right under the outcome: "password strength check missing, reset link isn't single-use (can be replayed within 30min)".
|
|
- **Run 2** — `completed` by `@backend-dev`. Fresh summary, fresh metadata.
|
|
|
|
Each run is a row in `task_runs` with its own outcome, summary, and metadata. Retry history is not a conceptual afterthought layered on top of a "latest state" task — it's the primary representation. When a retrying worker opens the task, `build_worker_context` shows it the prior attempts, so the second-pass worker sees why the first pass was blocked and addresses those specific findings instead of re-running from scratch.
|
|
|
|
The reviewer picks up next. When they open `Review password reset PR`, they see:
|
|
|
|

|
|
|
|
The parent link is the completed implementation. When the reviewer's worker spawns on `Review password reset PR` and calls `kanban_show()`, the returned `worker_context` includes the parent's most-recent-completed-run summary + metadata — so the reviewer reads "added zxcvbn strength check, reset tokens are now single-use" and has the list of changed files in hand before looking at a diff.
|
|
|
|
## Story 4 — Circuit breaker and crash recovery
|
|
|
|
Real workers fail. Missing credentials, OOM kills, transient network errors. The dispatcher has two lines of defense: a **circuit breaker** that auto-blocks after N consecutive failures so the board doesn't thrash forever, and **crash detection** that reclaims a task whose worker PID went away before its TTL expired.
|
|
|
|
### Circuit breaker — permanent-looking failure
|
|
|
|
A deploy task that can't spawn its worker because `AWS_ACCESS_KEY_ID` isn't set in the profile's environment:
|
|
|
|
```bash
|
|
hermes kanban create "Deploy to staging (missing creds)" \
|
|
--assignee deploy-bot --tenant ops
|
|
```
|
|
|
|
The dispatcher tries to spawn the worker. Spawn fails (`RuntimeError: AWS_ACCESS_KEY_ID not set`). The dispatcher releases the claim, increments a failure counter, and tries again next tick. After three consecutive failures (the default `failure_limit`), the circuit trips: the task goes to `blocked` with outcome `gave_up`. No more retries until a human unblocks it.
|
|
|
|
Click the blocked task:
|
|
|
|

|
|
|
|
Three runs, all with the same error on the `error` field. The first two are `spawn_failed` (retryable), the third is `gave_up` (terminal). The event log above shows the full sequence: `created → claimed → spawn_failed → claimed → spawn_failed → claimed → gave_up`.
|
|
|
|
On the terminal:
|
|
|
|
```bash
|
|
hermes kanban runs t_ef5d
|
|
# # OUTCOME PROFILE ELAPSED STARTED
|
|
# 1 spawn_failed deploy-bot 0s 2026-04-27 19:34
|
|
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
|
|
# 2 spawn_failed deploy-bot 0s 2026-04-27 19:34
|
|
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
|
|
# 3 gave_up deploy-bot 0s 2026-04-27 19:34
|
|
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
|
|
```
|
|
|
|
If Telegram / Discord / Slack is wired in, a gateway notification fires on the `gave_up` event so you hear about the outage without having to check the board.
|
|
|
|
### Crash recovery — worker dies mid-flight
|
|
|
|
Sometimes the spawn succeeds but the worker process dies later — segfault, OOM, `systemctl stop`. The dispatcher polls `kill(pid, 0)` and detects the dead pid; the claim releases, the task goes back to `ready`, and the next tick gives it to a fresh worker.
|
|
|
|
The example in the seed data is a migration that was running out of memory:
|
|
|
|
```bash
|
|
# Worker claims, starts scanning 2.4M rows, OOM kills it at ~2.3M
|
|
# Dispatcher detects dead pid, releases claim, increments attempt counter
|
|
# Retry with a chunked strategy succeeds
|
|
```
|
|
|
|
The drawer shows the full two-attempt history:
|
|
|
|

|
|
|
|
Run 1 — `crashed`, with the error `OOM kill at row 2.3M (process 99999 gone)`. Run 2 — `completed`, with `"strategy": "chunked with LIMIT + WHERE id > last_id"` in its metadata. The retrying worker saw the crash of run 1 in its context and picked a safer strategy; the metadata makes it obvious to a future observer (or postmortem writer) what changed.
|
|
|
|
## Structured handoff — why `summary` and `metadata` matter
|
|
|
|
In every story above, workers called `kanban_complete(summary=..., metadata=...)` at the end. That's not decoration — it's the primary handoff channel between stages of a workflow.
|
|
|
|
When a worker on task B is spawned and calls `kanban_show()`, the `worker_context` it gets back includes:
|
|
|
|
- B's **prior attempts** (previous runs: outcome, summary, error, metadata) so a retrying worker doesn't repeat a failed path.
|
|
- **Parent task results** — for each parent, the most-recent completed run's summary and metadata — so downstream workers see why and how the upstream work was done.
|
|
|
|
This replaces the "dig through comments and the work output" dance that plagues flat kanban systems. A PM writes acceptance criteria in the spec's metadata, and the engineer's worker sees them structurally in the parent handoff. An engineer records which tests they ran and how many passed, and the reviewer's worker has that list in hand before opening a diff.
|
|
|
|
The bulk-close guard exists because this data is per-run. `hermes kanban complete a b c --summary X` (you, from the CLI) is refused — copy-pasting the same summary to three tasks is almost always wrong. Bulk close without the handoff flags still works for the common "I finished a pile of admin tasks" case. The tool surface doesn't expose a bulk variant at all; `kanban_complete` is always single-task-at-a-time for the same reason.
|
|
|
|
## Inspecting a task currently running
|
|
|
|
For completeness — here's the drawer of a task still in flight (the API implementation from Story 1, claimed by `backend-dev` but not yet complete):
|
|
|
|

|
|
|
|
Status is `Running`. The active run appears in the Run History section with outcome `active` and no `ended_at`. If this worker dies or times out, the dispatcher closes this run with the appropriate outcome and opens a new one on the next claim — the attempt row never disappears.
|
|
|
|
## Next steps
|
|
|
|
- [Kanban overview](./kanban) — the full data model, event vocabulary, and CLI reference.
|
|
- `hermes kanban --help` — every subcommand, every flag.
|
|
- `hermes kanban watch --kinds completed,gave_up,timed_out` — live stream terminal events across the whole board.
|
|
- `hermes kanban notify-subscribe <task> --platform telegram --chat-id <id>` — get a gateway ping when a specific task finishes.
|