fix(kanban): deep-scan pass 2 — synthetic runs, event.run_id plumbing, invariant recovery, live drawer refresh

Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and
CLI. All behavioral / UX fixes; no schema change.

Kernel
- complete_task on a never-claimed task (ready/blocked → done with no
  run in flight) was silently dropping the summary/metadata/result
  onto a non-existent run. Now synthesizes a zero-duration run
  (started_at == ended_at) so attempt history is complete. Only fires
  when there's actually handoff data to persist; bare
  complete_task(tid) remains a no-op for run creation.
- block_task on a never-claimed task had the same bug for --reason.
  Same fix: synthesize a zero-duration run when a reason is passed.
- Event dataclass gained a `run_id: Optional[int] = None` field.
  list_events, unseen_events_for_sub, and the dashboard _event_dict
  were all SELECTing the column but dropping it on the way out, so
  downstream consumers couldn't group events by attempt. Every read
  path now surfaces run_id.
- claim_task got a defensive invariant-recovery step: if somehow
  `current_run_id` is non-NULL on a task in 'ready' status (invariant
  violation from an unknown code path), close the leaked run as
  'reclaimed' inside the same txn as the new claim. No-op in the
  common case; belt-and-suspenders in case a future code path forgets
  to clear the pointer.

Dashboard
- GET /tasks/:id events array now carries run_id per event (via
  _event_dict).
- WebSocket /events SELECT now includes run_id in the pushed event
  payload.
- TaskDrawer reloads itself on live events for its own task id. New
  `taskEventTick[taskId]` state in the Board, incremented on every WS
  event, passed down as `eventTick` prop; drawer's useEffect depends
  on it. Previously, background workers completing a task the user
  was viewing left the drawer showing stale data until manual
  close/reopen.
- CSS: added `.hermes-kanban-run--ended` rule for the fallback class
  the JS emits when outcome is unset. Harmless before; just
  inconsistent.

CLI
- `hermes kanban watch --kinds` help text listed the legacy event name
  `spawn_auto_blocked`. The kernel migration renames it to `gave_up`,
  so users typing the documented name got zero matches. Now shows the
  current lexicon (`completed,blocked,gave_up,crashed,timed_out`).

Tests (+6 in core functionality, +1 in dashboard plugin)
- complete_never_claimed_task_synthesizes_run
- block_never_claimed_task_synthesizes_run
- complete_never_claimed_without_handoff_skips_synthesis
- event_dataclass_carries_run_id (created.run_id None,
  completed.run_id matches)
- unseen_events_for_sub_includes_run_id (notifier path)
- claim_task_recovers_from_invariant_leak (engineer the leak, verify
  recovery)
- event_dict_includes_run_id (dashboard API shape)

171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke
(isolated HERMES_HOME via execute_code) exercised all six fixed paths
plus the claim-after-leak recovery sequence.

Docs
- Runs section: new 'Synthetic runs for never-claimed completions' and
  'Live drawer refresh' paragraphs explaining the invariants.
- Event reference: `created` / `promoted` / `unblocked` entries now
  explicitly note `run_id` is `NULL`; `completed` / `blocked` describe
  synthetic-run fallback.
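
Illustration (not part of the change): a minimal sketch of the two kernel behaviors described in this message, the synthetic zero-duration run and the claim-time invariant recovery. The kernel source is not shown here, so the table layout, column names other than `current_run_id`, the 'completed' outcome string, and the 'running' status are all assumptions made for the example; only the 'reclaimed' outcome and the started_at == ended_at rule come from the text.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical, simplified schema; the real kanban tables may differ.
SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'ready',
    current_run_id INTEGER
);
CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL,
    started_at REAL NOT NULL,
    ended_at REAL,
    outcome TEXT,
    summary TEXT
);
"""


def claim_task(conn: sqlite3.Connection, tid: int) -> int:
    """Claim a ready task, closing any leaked run in the same transaction."""
    now = time.time()
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT current_run_id FROM tasks WHERE id = ? AND status = 'ready'",
            (tid,),
        ).fetchone()
        if row is None:
            raise ValueError(f"task {tid} is not ready")
        if row[0] is not None:
            # Invariant violation: a ready task should not point at a run.
            conn.execute(
                "UPDATE runs SET ended_at = ?, outcome = 'reclaimed' "
                "WHERE id = ? AND ended_at IS NULL",
                (now, row[0]),
            )
        run_id = conn.execute(
            "INSERT INTO runs (task_id, started_at) VALUES (?, ?)", (tid, now)
        ).lastrowid
        conn.execute(
            "UPDATE tasks SET status = 'running', current_run_id = ? WHERE id = ?",
            (run_id, tid),
        )
    return run_id


def complete_task(
    conn: sqlite3.Connection, tid: int, summary: Optional[str] = None
) -> Optional[int]:
    """Mark a task done; return the id of the run holding the summary, if any."""
    now = time.time()
    with conn:
        row = conn.execute(
            "SELECT current_run_id FROM tasks WHERE id = ?", (tid,)
        ).fetchone()
        if row is None:
            raise ValueError(f"no such task {tid}")
        run_id = row[0]
        if run_id is not None:
            # Normal path: close the run that is in flight.
            conn.execute(
                "UPDATE runs SET ended_at = ?, outcome = 'completed', summary = ? "
                "WHERE id = ?",
                (now, summary, run_id),
            )
        elif summary is not None:
            # Never claimed, but there is handoff data: synthesize a
            # zero-duration run so the summary has somewhere to live.
            run_id = conn.execute(
                "INSERT INTO runs (task_id, started_at, ended_at, outcome, summary) "
                "VALUES (?, ?, ?, 'completed', ?)",
                (tid, now, now, summary),
            ).lastrowid
        # A bare completion with no run and no summary creates no run at all.
        conn.execute(
            "UPDATE tasks SET status = 'done', current_run_id = NULL WHERE id = ?",
            (tid,),
        )
    return run_id
```

Keeping the leaked-run close and the new claim inside one `with conn:` block mirrors the "same txn" wording above: either both writes land or neither does.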
2026-05-04 01:37:34 +08:00 · 2026-04-27 19:23:49 -07:00
parent 1c78f6627a
commit e27c819de3
8 changed files with 303 additions and 8 deletions
--- a/tests/plugins/test_kanban_dashboard_plugin.py
+++ b/tests/plugins/test_kanban_dashboard_plugin.py
@@ -757,3 +757,27 @@ def test_patch_status_archive_closes_running_run(client):
        assert kb.latest_run(conn, tid).outcome == "reclaimed"
    finally:
        conn.close()
+
+
+def test_event_dict_includes_run_id(client):
+    """GET /tasks/:id returns events with run_id populated."""
+    r = client.post("/api/plugins/kanban/tasks", json={"title": "e", "assignee": "worker"})
+    tid = r.json()["task"]["id"]
+    from hermes_cli import kanban_db as kb
+    conn = kb.connect()
+    try:
+        kb.claim_task(conn, tid)
+        run_id = kb.latest_run(conn, tid).id
+        kb.complete_task(conn, tid, summary="wss")
+    finally:
+        conn.close()
+
+    r = client.get(f"/api/plugins/kanban/tasks/{tid}")
+    assert r.status_code == 200
+    events = r.json()["events"]
+    # Every event in the response must have a run_id key (None or int).
+    for e in events:
+        assert "run_id" in e, f"missing run_id in event: {e}"
+    # completed event must have the actual run_id.
+    comp = [e for e in events if e["kind"] == "completed"]
+    assert comp[0]["run_id"] == run_id
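
Illustration (not part of the diff): the test above checks that every event carries a `run_id` key. A rough sketch of why that matters to downstream consumers follows. Only the `run_id: Optional[int] = None` field is taken from the commit message; the other `Event` fields and the `event_dict` and `group_by_run` helpers are hypothetical stand-ins, not the project's actual code.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    # id, task_id and kind are illustrative; run_id matches the commit message.
    id: int
    task_id: int
    kind: str
    run_id: Optional[int] = None  # None for events not tied to an attempt


def event_dict(ev: Event) -> dict:
    """Serialize an event, always including the run_id key (None or int)."""
    return asdict(ev)


def group_by_run(events: List[Event]) -> Dict[Optional[int], List[Event]]:
    """Bucket events by attempt; events without a run land under None."""
    groups: Dict[Optional[int], List[Event]] = defaultdict(list)
    for ev in events:
        groups[ev.run_id].append(ev)
    return dict(groups)


events = [
    Event(id=1, task_id=7, kind="created"),
    Event(id=2, task_id=7, kind="claimed", run_id=3),
    Event(id=3, task_id=7, kind="completed", run_id=3),
]
by_run = group_by_run(events)
```

With the sample data, `by_run[None]` holds the `created` event and `by_run[3]` holds the two events of the single attempt, which is the grouping that was impossible while the read paths dropped the column.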