feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
"""Curator — background skill maintenance orchestrator.
|
|
|
|
|
|
|
|
|
|
The curator is an auxiliary-model task that periodically reviews agent-created
|
|
|
|
|
skills and maintains the collection. It runs inactivity-triggered (no cron
|
|
|
|
|
daemon): when the agent is idle and the last curator run was longer than
|
|
|
|
|
``interval_hours`` ago, ``maybe_run_curator()`` spawns a forked AIAgent to do
|
|
|
|
|
the review.
|
|
|
|
|
|
|
|
|
|
Responsibilities:
|
2026-04-30 20:23:17 +08:00
|
|
|
- Auto-transition lifecycle states based on derived skill activity timestamps
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
- Spawn a background review agent that can pin / archive / consolidate /
|
|
|
|
|
patch agent-created skills via skill_manage
|
|
|
|
|
- Persist curator state (last_run_at, paused, etc.) in .curator_state
|
|
|
|
|
|
|
|
|
|
Strict invariants:
|
|
|
|
|
- Only touches agent-created skills (see tools/skill_usage.is_agent_created)
|
|
|
|
|
- Never auto-deletes — only archives. Archive is recoverable.
|
|
|
|
|
- Pinned skills bypass all auto-transitions
|
|
|
|
|
- Uses the auxiliary client; never touches the main session's prompt cache
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
from __future__ import annotations
|
|
|
|
|
|
|
|
|
|
import json
|
|
|
|
|
import logging
|
|
|
|
|
import os
|
|
|
|
|
import tempfile
|
|
|
|
|
import threading
|
|
|
|
|
from datetime import datetime, timedelta, timezone
|
|
|
|
|
from pathlib import Path
|
2026-05-02 01:13:17 +08:00
|
|
|
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Set
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
from hermes_constants import get_hermes_home
|
|
|
|
|
from tools import skill_usage
|
|
|
|
|
|
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
|
|
|
|
|
2026-05-02 01:13:17 +08:00
|
|
|
def _strip_aux_credential(value: Any) -> Optional[str]:
|
|
|
|
|
if value is None:
|
|
|
|
|
return None
|
|
|
|
|
text = str(value).strip()
|
|
|
|
|
return text or None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class _ReviewRuntimeBinding(NamedTuple):
|
|
|
|
|
"""Provider/model for the curator review fork plus optional per-slot overrides."""
|
|
|
|
|
|
|
|
|
|
provider: str
|
|
|
|
|
model: str
|
|
|
|
|
explicit_api_key: Optional[str]
|
|
|
|
|
explicit_base_url: Optional[str]
|
|
|
|
|
|
|
|
|
|
|
2026-04-26 06:32:18 -07:00
|
|
|
DEFAULT_INTERVAL_HOURS = 24 * 7 # 7 days
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
DEFAULT_MIN_IDLE_HOURS = 2
|
|
|
|
|
DEFAULT_STALE_AFTER_DAYS = 30
|
|
|
|
|
DEFAULT_ARCHIVE_AFTER_DAYS = 90
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# .curator_state — persistent scheduler + status
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _state_file() -> Path:
|
|
|
|
|
return get_hermes_home() / "skills" / ".curator_state"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _default_state() -> Dict[str, Any]:
|
|
|
|
|
return {
|
|
|
|
|
"last_run_at": None,
|
|
|
|
|
"last_run_duration_seconds": None,
|
|
|
|
|
"last_run_summary": None,
|
2026-04-30 21:36:40 +03:00
|
|
|
"last_report_path": None,
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
"paused": False,
|
|
|
|
|
"run_count": 0,
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def load_state() -> Dict[str, Any]:
|
|
|
|
|
path = _state_file()
|
|
|
|
|
if not path.exists():
|
|
|
|
|
return _default_state()
|
|
|
|
|
try:
|
|
|
|
|
data = json.loads(path.read_text(encoding="utf-8"))
|
|
|
|
|
if isinstance(data, dict):
|
|
|
|
|
base = _default_state()
|
|
|
|
|
base.update({k: v for k, v in data.items() if k in base or k.startswith("_")})
|
|
|
|
|
return base
|
|
|
|
|
except (OSError, json.JSONDecodeError) as e:
|
|
|
|
|
logger.debug("Failed to read curator state: %s", e)
|
|
|
|
|
return _default_state()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def save_state(data: Dict[str, Any]) -> None:
|
|
|
|
|
path = _state_file()
|
|
|
|
|
try:
|
|
|
|
|
path.parent.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".curator_state_", suffix=".tmp")
|
|
|
|
|
try:
|
|
|
|
|
with os.fdopen(fd, "w", encoding="utf-8") as f:
|
|
|
|
|
json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
|
|
|
|
|
f.flush()
|
|
|
|
|
os.fsync(f.fileno())
|
|
|
|
|
os.replace(tmp, path)
|
|
|
|
|
except BaseException:
|
|
|
|
|
try:
|
|
|
|
|
os.unlink(tmp)
|
|
|
|
|
except OSError:
|
|
|
|
|
pass
|
|
|
|
|
raise
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Failed to save curator state: %s", e, exc_info=True)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def set_paused(paused: bool) -> None:
|
|
|
|
|
state = load_state()
|
|
|
|
|
state["paused"] = bool(paused)
|
|
|
|
|
save_state(state)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def is_paused() -> bool:
|
|
|
|
|
return bool(load_state().get("paused"))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Config access
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _load_config() -> Dict[str, Any]:
|
|
|
|
|
"""Read curator.* config from ~/.hermes/config.yaml. Tolerates missing file."""
|
|
|
|
|
try:
|
|
|
|
|
from hermes_cli.config import load_config
|
|
|
|
|
cfg = load_config()
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Failed to load config for curator: %s", e)
|
|
|
|
|
return {}
|
|
|
|
|
if not isinstance(cfg, dict):
|
|
|
|
|
return {}
|
|
|
|
|
cur = cfg.get("curator") or {}
|
|
|
|
|
if not isinstance(cur, dict):
|
|
|
|
|
return {}
|
|
|
|
|
return cur
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def is_enabled() -> bool:
|
|
|
|
|
"""Default ON when no config says otherwise."""
|
|
|
|
|
cfg = _load_config()
|
|
|
|
|
return bool(cfg.get("enabled", True))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def get_interval_hours() -> int:
|
|
|
|
|
cfg = _load_config()
|
|
|
|
|
try:
|
|
|
|
|
return int(cfg.get("interval_hours", DEFAULT_INTERVAL_HOURS))
|
|
|
|
|
except (TypeError, ValueError):
|
|
|
|
|
return DEFAULT_INTERVAL_HOURS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def get_min_idle_hours() -> float:
|
|
|
|
|
cfg = _load_config()
|
|
|
|
|
try:
|
|
|
|
|
return float(cfg.get("min_idle_hours", DEFAULT_MIN_IDLE_HOURS))
|
|
|
|
|
except (TypeError, ValueError):
|
|
|
|
|
return DEFAULT_MIN_IDLE_HOURS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def get_stale_after_days() -> int:
|
|
|
|
|
cfg = _load_config()
|
|
|
|
|
try:
|
|
|
|
|
return int(cfg.get("stale_after_days", DEFAULT_STALE_AFTER_DAYS))
|
|
|
|
|
except (TypeError, ValueError):
|
|
|
|
|
return DEFAULT_STALE_AFTER_DAYS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def get_archive_after_days() -> int:
|
|
|
|
|
cfg = _load_config()
|
|
|
|
|
try:
|
|
|
|
|
return int(cfg.get("archive_after_days", DEFAULT_ARCHIVE_AFTER_DAYS))
|
|
|
|
|
except (TypeError, ValueError):
|
|
|
|
|
return DEFAULT_ARCHIVE_AFTER_DAYS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Idle / interval check
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _parse_iso(ts: Optional[str]) -> Optional[datetime]:
|
|
|
|
|
if not ts:
|
|
|
|
|
return None
|
|
|
|
|
try:
|
|
|
|
|
return datetime.fromisoformat(ts)
|
|
|
|
|
except (TypeError, ValueError):
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def should_run_now(now: Optional[datetime] = None) -> bool:
|
|
|
|
|
"""Return True if the curator should run immediately.
|
|
|
|
|
|
|
|
|
|
Gates:
|
|
|
|
|
- curator.enabled == True
|
|
|
|
|
- not paused
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
- last_run_at present AND older than interval_hours
|
|
|
|
|
|
|
|
|
|
First-run behavior: when there is no ``last_run_at`` (fresh install, or
|
|
|
|
|
install that predates the curator), we DO NOT run immediately. The
|
|
|
|
|
curator is designed to run after at least ``interval_hours`` (7 days by
|
|
|
|
|
default) of skill activity, not on the first background tick after
|
|
|
|
|
``hermes update``. On first observation we seed ``last_run_at`` to "now"
|
|
|
|
|
and defer the first real pass by one full interval. Users who want to
|
|
|
|
|
run it sooner can always invoke ``hermes curator run`` (with or without
|
|
|
|
|
``--dry-run``) explicitly — that path bypasses this gate.
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
The idle check (min_idle_hours) is applied at the call site where we know
|
|
|
|
|
whether an agent is actively running — here we only enforce the static
|
|
|
|
|
gates.
|
|
|
|
|
"""
|
|
|
|
|
if not is_enabled():
|
|
|
|
|
return False
|
|
|
|
|
if is_paused():
|
|
|
|
|
return False
|
|
|
|
|
|
|
|
|
|
state = load_state()
|
|
|
|
|
last = _parse_iso(state.get("last_run_at"))
|
|
|
|
|
if last is None:
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
# Never run before. Seed state so we wait a full interval before the
|
|
|
|
|
# first real pass. Report-only; do not auto-mutate the library the
|
|
|
|
|
# very first time a gateway ticks after an update.
|
|
|
|
|
if now is None:
|
|
|
|
|
now = datetime.now(timezone.utc)
|
|
|
|
|
try:
|
|
|
|
|
state["last_run_at"] = now.isoformat()
|
|
|
|
|
state["last_run_summary"] = (
|
|
|
|
|
"deferred first run — curator seeded, will run after one "
|
|
|
|
|
"interval; use `hermes curator run --dry-run` to preview now"
|
|
|
|
|
)
|
|
|
|
|
save_state(state)
|
|
|
|
|
except Exception as e: # pragma: no cover — best-effort persistence
|
|
|
|
|
logger.debug("Failed to seed curator last_run_at: %s", e)
|
|
|
|
|
return False
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
if now is None:
|
|
|
|
|
now = datetime.now(timezone.utc)
|
|
|
|
|
if last.tzinfo is None:
|
|
|
|
|
last = last.replace(tzinfo=timezone.utc)
|
|
|
|
|
interval = timedelta(hours=get_interval_hours())
|
|
|
|
|
return (now - last) >= interval
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Automatic state transitions (pure function, no LLM)
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
|
|
|
|
|
"""Walk every agent-created skill and move active/stale/archived based on
|
2026-04-30 20:23:17 +08:00
|
|
|
the latest real activity timestamp. Pinned skills are never touched.
|
|
|
|
|
Returns a counter dict describing what changed."""
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
from tools import skill_usage as _u
|
|
|
|
|
|
|
|
|
|
if now is None:
|
|
|
|
|
now = datetime.now(timezone.utc)
|
|
|
|
|
stale_cutoff = now - timedelta(days=get_stale_after_days())
|
|
|
|
|
archive_cutoff = now - timedelta(days=get_archive_after_days())
|
|
|
|
|
|
|
|
|
|
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}
|
|
|
|
|
|
|
|
|
|
for row in _u.agent_created_report():
|
|
|
|
|
counts["checked"] += 1
|
|
|
|
|
name = row["name"]
|
|
|
|
|
if row.get("pinned"):
|
|
|
|
|
continue
|
|
|
|
|
|
2026-04-30 20:23:17 +08:00
|
|
|
last_activity = _parse_iso(row.get("last_activity_at"))
|
|
|
|
|
# If never active, treat created_at as the anchor so new skills don't
|
|
|
|
|
# immediately archive themselves.
|
|
|
|
|
anchor = last_activity or _parse_iso(row.get("created_at")) or now
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
if anchor.tzinfo is None:
|
|
|
|
|
anchor = anchor.replace(tzinfo=timezone.utc)
|
|
|
|
|
|
|
|
|
|
current = row.get("state", _u.STATE_ACTIVE)
|
|
|
|
|
|
|
|
|
|
if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
|
|
|
|
|
ok, _msg = _u.archive_skill(name)
|
|
|
|
|
if ok:
|
|
|
|
|
counts["archived"] += 1
|
|
|
|
|
elif anchor <= stale_cutoff and current == _u.STATE_ACTIVE:
|
|
|
|
|
_u.set_state(name, _u.STATE_STALE)
|
|
|
|
|
counts["marked_stale"] += 1
|
|
|
|
|
elif anchor > stale_cutoff and current == _u.STATE_STALE:
|
|
|
|
|
# Skill got used again after being marked stale — reactivate.
|
|
|
|
|
_u.set_state(name, _u.STATE_ACTIVE)
|
|
|
|
|
counts["reactivated"] += 1
|
|
|
|
|
|
|
|
|
|
return counts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Review prompt for the forked agent
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
CURATOR_DRY_RUN_BANNER = (
|
|
|
|
|
"═══════════════════════════════════════════════════════════════\n"
|
|
|
|
|
"DRY-RUN — REPORT ONLY. DO NOT MUTATE THE SKILL LIBRARY.\n"
|
|
|
|
|
"═══════════════════════════════════════════════════════════════\n"
|
|
|
|
|
"\n"
|
|
|
|
|
"This is a PREVIEW pass. Follow every instruction below EXCEPT:\n"
|
|
|
|
|
"\n"
|
|
|
|
|
" • DO NOT call skill_manage with action=patch, create, delete, "
|
|
|
|
|
"write_file, or remove_file.\n"
|
|
|
|
|
" • DO NOT call terminal to mv skill directories into .archive/.\n"
|
|
|
|
|
" • DO NOT call terminal to mv, cp, rm, or rewrite any file under "
|
|
|
|
|
"~/.hermes/skills/.\n"
|
|
|
|
|
" • skills_list and skill_view are FINE — read as much as you need.\n"
|
|
|
|
|
"\n"
|
|
|
|
|
"Your output IS the deliverable. Produce the exact same "
|
|
|
|
|
"human-readable summary and structured YAML block you would "
|
|
|
|
|
"produce on a live run — but describe the actions you WOULD take, "
|
|
|
|
|
"not actions you took. A downstream reviewer will read the report "
|
|
|
|
|
"and decide whether to approve a live run with "
|
|
|
|
|
"`hermes curator run` (no flag).\n"
|
|
|
|
|
"\n"
|
|
|
|
|
"If you accidentally take a mutating action, say so explicitly in "
|
|
|
|
|
"the summary so the reviewer can revert it.\n"
|
|
|
|
|
"═══════════════════════════════════════════════════════════════"
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
CURATOR_REVIEW_PROMPT = (
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
"You are running as Hermes' background skill CURATOR. This is an "
|
|
|
|
|
"UMBRELLA-BUILDING consolidation pass, not a passive audit and not a "
|
|
|
|
|
"duplicate-finder.\n\n"
|
|
|
|
|
"The goal of the skill collection is a LIBRARY OF CLASS-LEVEL "
|
|
|
|
|
"INSTRUCTIONS AND EXPERIENTIAL KNOWLEDGE. A collection of hundreds of "
|
|
|
|
|
"narrow skills where each one captures one session's specific bug is "
|
|
|
|
|
"a FAILURE of the library — not a feature. An agent searching skills "
|
|
|
|
|
"matches on descriptions, not on exact names; one broad umbrella "
|
|
|
|
|
"skill with labeled subsections beats five narrow siblings for "
|
|
|
|
|
"discoverability, not the other way around.\n\n"
|
|
|
|
|
"The right target shape is CLASS-LEVEL skills with rich SKILL.md "
|
|
|
|
|
"bodies + `references/`, `templates/`, and `scripts/` subfiles for "
|
|
|
|
|
"session-specific detail — not one-session-one-skill micro-entries.\n\n"
|
|
|
|
|
"Hard rules — do not violate:\n"
|
|
|
|
|
"1. DO NOT touch bundled or hub-installed skills. The candidate list "
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
"below is already filtered to agent-created skills only.\n"
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
"2. DO NOT delete any skill. Archiving (moving the skill's directory "
|
|
|
|
|
"into ~/.hermes/skills/.archive/) is the maximum destructive action. "
|
|
|
|
|
"Archives are recoverable; deletion is not.\n"
|
|
|
|
|
"3. DO NOT touch skills shown as pinned=yes. Skip them entirely.\n"
|
|
|
|
|
"4. DO NOT use usage counters as a reason to skip consolidation. The "
|
|
|
|
|
"counters are new and often mostly zero. Judge overlap on CONTENT, "
|
|
|
|
|
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
|
|
|
|
|
"absence of evidence either way.\n"
|
|
|
|
|
"5. DO NOT reject consolidation on the grounds that 'each skill has "
|
|
|
|
|
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
|
|
|
|
|
"right bar is: 'would a human maintainer write this as N separate "
|
|
|
|
|
"skills, or as one skill with N labeled subsections?' When the "
|
|
|
|
|
"answer is the latter, merge.\n\n"
|
|
|
|
|
"How to work — not optional:\n"
|
|
|
|
|
"1. Scan the full candidate list. Identify PREFIX CLUSTERS (skills "
|
|
|
|
|
"sharing a first word or domain keyword). Examples you are likely "
|
|
|
|
|
"to find: hermes-config-*, hermes-dashboard-*, gateway-*, codex-*, "
|
|
|
|
|
"ollama-*, anthropic-*, gemini-*, mcp-*, salvage-*, pr-*, "
|
|
|
|
|
"competitor-*, python-*, security-*, etc. Expect 10-25 clusters.\n"
|
|
|
|
|
"2. For each cluster with 2+ members, do NOT ask 'are these pairs "
|
|
|
|
|
"overlapping?' — ask 'what is the UMBRELLA CLASS these skills all "
|
|
|
|
|
"serve? Would a maintainer name that class and write one skill for "
|
|
|
|
|
"it?' If yes, pick (or create) the umbrella and absorb the siblings "
|
|
|
|
|
"into it.\n"
|
|
|
|
|
"3. Three ways to consolidate — use the right one per cluster:\n"
|
|
|
|
|
" a. MERGE INTO EXISTING UMBRELLA — one skill in the cluster is "
|
|
|
|
|
"already broad enough to be the umbrella (example: `pr-triage-"
|
|
|
|
|
"salvage` for the PR review cluster). Patch it to add a labeled "
|
|
|
|
|
"section for each sibling's unique insight, then archive the "
|
|
|
|
|
"siblings.\n"
|
|
|
|
|
" b. CREATE A NEW UMBRELLA SKILL.md — no existing member is broad "
|
|
|
|
|
"enough. Use skill_manage action=create to write a new class-level "
|
|
|
|
|
"skill whose SKILL.md covers the shared workflow and has short "
|
|
|
|
|
"labeled subsections. Archive the now-absorbed narrow siblings.\n"
|
|
|
|
|
" c. DEMOTE TO REFERENCES/TEMPLATES/SCRIPTS — a sibling has "
|
|
|
|
|
"narrow-but-valuable session-specific content. Move it into the "
|
|
|
|
|
"umbrella's appropriate support directory:\n"
|
|
|
|
|
" • `references/<topic>.md` for session-specific detail OR "
|
|
|
|
|
"condensed knowledge banks (quoted research, API docs excerpts, "
|
|
|
|
|
"domain notes, provider quirks, reproduction recipes)\n"
|
|
|
|
|
" • `templates/<name>.<ext>` for starter files meant to be "
|
|
|
|
|
"copied and modified\n"
|
|
|
|
|
" • `scripts/<name>.<ext>` for statically re-runnable actions "
|
|
|
|
|
"(verification scripts, fixture generators, probes)\n"
|
|
|
|
|
" Then archive the old sibling. Use `terminal` with `mkdir -p "
|
|
|
|
|
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
|
|
|
|
|
"references/<topic>.md` (or templates/ / scripts/).\n"
|
|
|
|
|
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
|
|
|
|
|
"a feature codename, a specific error string, an 'audit' / "
|
|
|
|
|
"'diagnosis' / 'salvage' session artifact). These almost always "
|
|
|
|
|
"belong as a subsection or support file under a class-level umbrella.\n"
|
|
|
|
|
"5. Iterate. After one consolidation round, scan the remaining set "
|
|
|
|
|
"and look for the NEXT umbrella opportunity. Don't stop after 3 "
|
|
|
|
|
"merges.\n\n"
|
2026-04-26 06:13:09 -07:00
|
|
|
"Your toolset:\n"
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
" - skills_list, skill_view — read the current landscape\n"
|
|
|
|
|
" - skill_manage action=patch — add sections to the umbrella\n"
|
|
|
|
|
" - skill_manage action=create — create a new umbrella SKILL.md\n"
|
|
|
|
|
" - skill_manage action=write_file — add a references/, templates/, "
|
|
|
|
|
"or scripts/ file under an existing skill (the skill must already "
|
|
|
|
|
"exist)\n"
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
" - skill_manage action=delete — archive a skill. MUST pass "
|
|
|
|
|
"`absorbed_into=<umbrella>` when you've merged its content into another "
|
|
|
|
|
"skill, or `absorbed_into=\"\"` when you're truly pruning with no "
|
|
|
|
|
"forwarding target. This drives cron-job skill-reference migration — "
|
|
|
|
|
"guessing from your YAML summary after the fact is fragile.\n"
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
" - terminal — mv a sibling into the archive "
|
|
|
|
|
"OR move its content into a support subfile\n\n"
|
|
|
|
|
"'keep' is a legitimate decision ONLY when the skill is already a "
|
|
|
|
|
"class-level umbrella and none of the proposed merges would improve "
|
|
|
|
|
"discoverability. 'This is narrow but distinct from its siblings' "
|
|
|
|
|
"is NOT a reason to keep — it's a reason to move it under an "
|
|
|
|
|
"umbrella as a subsection or support file.\n\n"
|
|
|
|
|
"Expected output: real umbrella-ification. Process every obvious "
|
|
|
|
|
"cluster. If you end the pass with fewer than 10 archives, you "
|
|
|
|
|
"stopped too early — go back and look at the clusters you left "
|
|
|
|
|
"alone.\n\n"
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
"When done, write a human summary AND a structured machine-readable "
|
|
|
|
|
"block so downstream tooling can distinguish consolidation from "
|
|
|
|
|
"pruning. Format EXACTLY:\n\n"
|
|
|
|
|
"## Structured summary (required)\n"
|
|
|
|
|
"```yaml\n"
|
|
|
|
|
"consolidations:\n"
|
|
|
|
|
" - from: <old-skill-name>\n"
|
|
|
|
|
" into: <umbrella-skill-name>\n"
|
|
|
|
|
" reason: <one short sentence — why merged, not just 'similar'>\n"
|
|
|
|
|
"prunings:\n"
|
|
|
|
|
" - name: <skill-name>\n"
|
|
|
|
|
" reason: <one short sentence — why archived with no merge target>\n"
|
|
|
|
|
"```\n\n"
|
|
|
|
|
"Every skill you moved to .archive/ MUST appear in exactly one of the "
|
|
|
|
|
"two lists. If you consolidated X into umbrella Y (patched Y, wrote "
|
|
|
|
|
"a references file to Y, or created Y with X's content absorbed), X "
|
|
|
|
|
"goes under `consolidations` with `into: Y`. If you archived X with "
|
|
|
|
|
"no absorption — truly stale, irrelevant, or obsolete — X goes under "
|
|
|
|
|
"`prunings`. Leave a list empty (`consolidations: []`) if none. Do "
|
|
|
|
|
"not omit the block. The block comes AFTER your human-readable "
|
|
|
|
|
"summary of clusters processed, patches made, and decisions left alone."
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Per-run reports — {YYYYMMDD-HHMMSS}/run.json + REPORT.md under logs/curator/
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _reports_root() -> Path:
|
|
|
|
|
"""Directory where curator run reports are written.
|
|
|
|
|
|
|
|
|
|
Lives under the profile-aware logs dir (``~/.hermes/logs/curator/``)
|
|
|
|
|
alongside ``agent.log`` and ``gateway.log`` so it's found by anyone
|
|
|
|
|
looking for operational telemetry, not mixed in with the user's
|
|
|
|
|
authored skill data in ``~/.hermes/skills/``.
|
2026-04-30 04:52:28 -07:00
|
|
|
|
|
|
|
|
``ensure_hermes_home()`` pre-creates this dir on every CLI launch and
|
|
|
|
|
the v22→v23 migration backfills it for existing profiles, but we
|
|
|
|
|
still mkdir here as a belt-and-suspenders so the curator works even
|
|
|
|
|
from an odd entry path (e.g. gateway-only install, bare library use)
|
|
|
|
|
that bypasses both.
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
"""
|
2026-04-30 04:52:28 -07:00
|
|
|
root = get_hermes_home() / "logs" / "curator"
|
|
|
|
|
try:
|
|
|
|
|
root.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
except OSError as e:
|
|
|
|
|
logger.debug("Curator reports dir create failed: %s", e)
|
|
|
|
|
return root
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
|
|
|
|
|
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
def _classify_removed_skills(
|
|
|
|
|
removed: List[str],
|
|
|
|
|
added: List[str],
|
|
|
|
|
after_names: Set[str],
|
|
|
|
|
tool_calls: List[Dict[str, Any]],
|
|
|
|
|
) -> Dict[str, List[Dict[str, Any]]]:
|
|
|
|
|
"""Split ``removed`` into consolidated vs pruned.
|
|
|
|
|
|
|
|
|
|
A removed skill is "consolidated" when the curator absorbed its content
|
|
|
|
|
into another skill (an umbrella) during this run — the content still
|
|
|
|
|
lives, just under a different name. A removed skill is "pruned" when the
|
|
|
|
|
curator archived it for staleness/irrelevance without preserving its
|
|
|
|
|
content elsewhere.
|
|
|
|
|
|
|
|
|
|
Heuristic: scan this run's ``skill_manage`` tool calls and look for
|
|
|
|
|
``write_file``/``patch``/``create``/``edit`` actions whose target skill
|
|
|
|
|
(the ``name`` argument) is NOT the removed skill and whose
|
|
|
|
|
``file_path`` / ``file_content`` / ``content`` arguments reference the
|
|
|
|
|
removed skill's name. That's the textbook "absorbed into umbrella"
|
|
|
|
|
signal. Ties are broken by first-match (earliest tool call wins).
|
|
|
|
|
|
|
|
|
|
Returns ``{"consolidated": [{"name", "into", "evidence"}, ...],
|
|
|
|
|
"pruned": [{"name"}, ...]}``.
|
|
|
|
|
"""
|
|
|
|
|
consolidated: List[Dict[str, Any]] = []
|
|
|
|
|
pruned: List[Dict[str, Any]] = []
|
|
|
|
|
|
|
|
|
|
# Pre-parse tool calls: we only care about skill_manage.
|
|
|
|
|
parsed_calls: List[Dict[str, Any]] = []
|
|
|
|
|
for tc in tool_calls or []:
|
|
|
|
|
if not isinstance(tc, dict):
|
|
|
|
|
continue
|
|
|
|
|
if tc.get("name") != "skill_manage":
|
|
|
|
|
continue
|
|
|
|
|
raw = tc.get("arguments") or ""
|
|
|
|
|
# Arguments can be a JSON string (standard) or a dict (defensive).
|
|
|
|
|
args: Dict[str, Any] = {}
|
|
|
|
|
if isinstance(raw, dict):
|
|
|
|
|
args = raw
|
|
|
|
|
elif isinstance(raw, str):
|
|
|
|
|
try:
|
|
|
|
|
args = json.loads(raw)
|
|
|
|
|
except Exception:
|
|
|
|
|
# Truncated or malformed — fall back to substring match on
|
|
|
|
|
# the raw string so we still catch the common case.
|
|
|
|
|
args = {"_raw": raw}
|
|
|
|
|
if not isinstance(args, dict):
|
|
|
|
|
continue
|
|
|
|
|
parsed_calls.append(args)
|
|
|
|
|
|
|
|
|
|
# Build a set of "destination" skill names: anything still present after
|
|
|
|
|
# the run plus anything newly added this run. A removed skill being
|
|
|
|
|
# referenced from one of these is the consolidation signal.
|
|
|
|
|
destinations = set(after_names) | set(added or [])
|
|
|
|
|
|
|
|
|
|
for name in removed:
|
|
|
|
|
if not name:
|
|
|
|
|
continue
|
|
|
|
|
into: Optional[str] = None
|
|
|
|
|
evidence: Optional[str] = None
|
|
|
|
|
|
|
|
|
|
# Normalise name variants we'll search for in path/content strings.
|
|
|
|
|
needles = {name, name.replace("-", "_"), name.replace("_", "-")}
|
|
|
|
|
|
|
|
|
|
for args in parsed_calls:
|
|
|
|
|
target = args.get("name")
|
|
|
|
|
if not isinstance(target, str) or not target:
|
|
|
|
|
continue
|
|
|
|
|
# A call that operates on the removed skill itself isn't
|
|
|
|
|
# consolidation evidence.
|
|
|
|
|
if target == name:
|
|
|
|
|
continue
|
|
|
|
|
# The target must be a surviving or newly-created skill —
|
|
|
|
|
# otherwise we're pointing to a skill that doesn't exist.
|
|
|
|
|
if target not in destinations:
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
# Look for the removed skill's name in file_path / content / raw.
|
|
|
|
|
haystacks: List[str] = []
|
|
|
|
|
for key in ("file_path", "file_content", "content", "new_string", "_raw"):
|
|
|
|
|
v = args.get(key)
|
|
|
|
|
if isinstance(v, str):
|
|
|
|
|
haystacks.append(v)
|
|
|
|
|
hit = False
|
|
|
|
|
for hay in haystacks:
|
|
|
|
|
for needle in needles:
|
|
|
|
|
if needle and needle in hay:
|
|
|
|
|
hit = True
|
|
|
|
|
evidence = (
|
|
|
|
|
f"skill_manage action={args.get('action', '?')} "
|
|
|
|
|
f"on '{target}' referenced '{name}' "
|
|
|
|
|
f"in {hay[:80]}"
|
|
|
|
|
)
|
|
|
|
|
break
|
|
|
|
|
if hit:
|
|
|
|
|
break
|
|
|
|
|
if hit:
|
|
|
|
|
into = target
|
|
|
|
|
break
|
|
|
|
|
|
|
|
|
|
if into:
|
|
|
|
|
consolidated.append({"name": name, "into": into, "evidence": evidence})
|
|
|
|
|
else:
|
|
|
|
|
pruned.append({"name": name})
|
|
|
|
|
|
|
|
|
|
return {"consolidated": consolidated, "pruned": pruned}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _parse_structured_summary(
|
|
|
|
|
llm_final: str,
|
|
|
|
|
) -> Dict[str, List[Dict[str, str]]]:
|
|
|
|
|
"""Extract the structured YAML block from the curator's final response.
|
|
|
|
|
|
|
|
|
|
The curator prompt requires a fenced ```yaml block under
|
|
|
|
|
``## Structured summary (required)`` with ``consolidations:`` and
|
|
|
|
|
``prunings:`` lists. This parses it tolerantly:
|
|
|
|
|
|
|
|
|
|
- Missing block → returns empty lists (we'll fall back to heuristic).
|
|
|
|
|
- Malformed YAML → returns empty lists and we rely on heuristic.
|
|
|
|
|
- Partial block (e.g. only consolidations) → returns what we could parse.
|
|
|
|
|
|
|
|
|
|
Returns ``{"consolidations": [{"from", "into", "reason"}, ...],
|
|
|
|
|
"prunings": [{"name", "reason"}, ...]}``.
|
|
|
|
|
"""
|
|
|
|
|
empty = {"consolidations": [], "prunings": []}
|
|
|
|
|
if not llm_final or not isinstance(llm_final, str):
|
|
|
|
|
return empty
|
|
|
|
|
|
|
|
|
|
# Find the YAML fenced block. We look for ```yaml ... ``` specifically
|
|
|
|
|
# rather than any fenced block so we don't accidentally pick up a code
|
|
|
|
|
# sample the model quoted elsewhere.
|
|
|
|
|
import re
|
|
|
|
|
match = re.search(
|
|
|
|
|
r"```ya?ml\s*\n(.*?)\n```",
|
|
|
|
|
llm_final,
|
|
|
|
|
re.DOTALL | re.IGNORECASE,
|
|
|
|
|
)
|
|
|
|
|
if not match:
|
|
|
|
|
return empty
|
|
|
|
|
|
|
|
|
|
body = match.group(1)
|
|
|
|
|
|
|
|
|
|
# Prefer PyYAML when available — every hermes install already has it
|
|
|
|
|
# (config.yaml loader). Fall back to a hand parser for paranoia.
|
|
|
|
|
try:
|
|
|
|
|
import yaml # type: ignore
|
|
|
|
|
data = yaml.safe_load(body)
|
|
|
|
|
except Exception:
|
|
|
|
|
return empty
|
|
|
|
|
|
|
|
|
|
if not isinstance(data, dict):
|
|
|
|
|
return empty
|
|
|
|
|
|
|
|
|
|
out: Dict[str, List[Dict[str, str]]] = {"consolidations": [], "prunings": []}
|
|
|
|
|
cons_raw = data.get("consolidations") or []
|
|
|
|
|
prun_raw = data.get("prunings") or []
|
|
|
|
|
|
|
|
|
|
if isinstance(cons_raw, list):
|
|
|
|
|
for entry in cons_raw:
|
|
|
|
|
if not isinstance(entry, dict):
|
|
|
|
|
continue
|
|
|
|
|
frm = entry.get("from")
|
|
|
|
|
into = entry.get("into")
|
|
|
|
|
if not (isinstance(frm, str) and frm.strip()
|
|
|
|
|
and isinstance(into, str) and into.strip()):
|
|
|
|
|
continue
|
|
|
|
|
reason = entry.get("reason")
|
|
|
|
|
out["consolidations"].append({
|
|
|
|
|
"from": frm.strip(),
|
|
|
|
|
"into": into.strip(),
|
|
|
|
|
"reason": (reason or "").strip() if isinstance(reason, str) else "",
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
if isinstance(prun_raw, list):
|
|
|
|
|
for entry in prun_raw:
|
|
|
|
|
if not isinstance(entry, dict):
|
|
|
|
|
continue
|
|
|
|
|
name = entry.get("name")
|
|
|
|
|
if not (isinstance(name, str) and name.strip()):
|
|
|
|
|
continue
|
|
|
|
|
reason = entry.get("reason")
|
|
|
|
|
out["prunings"].append({
|
|
|
|
|
"name": name.strip(),
|
|
|
|
|
"reason": (reason or "").strip() if isinstance(reason, str) else "",
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
return out
|
|
|
|
|
|
|
|
|
|
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
def _extract_absorbed_into_declarations(
|
|
|
|
|
tool_calls: List[Dict[str, Any]],
|
|
|
|
|
) -> Dict[str, Dict[str, Any]]:
|
|
|
|
|
"""Walk this run's tool calls and extract model-declared absorption targets.
|
|
|
|
|
|
|
|
|
|
The curator prompt requires every ``skill_manage(action='delete')`` call
|
|
|
|
|
to pass ``absorbed_into=<umbrella>`` when consolidating, or
|
|
|
|
|
``absorbed_into=""`` when truly pruning. This is the single authoritative
|
|
|
|
|
signal for classification — the model's own declaration at the moment of
|
|
|
|
|
deletion, which beats both post-hoc YAML summary parsing and substring
|
|
|
|
|
heuristics on other tool calls.
|
|
|
|
|
|
|
|
|
|
Returns ``{skill_name: {"into": "<umbrella>" | "", "declared": True}}``.
|
|
|
|
|
Entries with ``into == ""`` are explicit prunings.
|
|
|
|
|
Skills without a ``skill_manage(delete)`` call, or with one that omitted
|
|
|
|
|
``absorbed_into``, are not in the returned dict — caller falls back to
|
|
|
|
|
the existing heuristic/YAML logic for those (backward compat with older
|
|
|
|
|
curator runs and any callers that don't populate the arg).
|
|
|
|
|
"""
|
|
|
|
|
out: Dict[str, Dict[str, Any]] = {}
|
|
|
|
|
for tc in tool_calls or []:
|
|
|
|
|
if not isinstance(tc, dict):
|
|
|
|
|
continue
|
|
|
|
|
if tc.get("name") != "skill_manage":
|
|
|
|
|
continue
|
|
|
|
|
raw = tc.get("arguments") or ""
|
|
|
|
|
args: Dict[str, Any] = {}
|
|
|
|
|
if isinstance(raw, dict):
|
|
|
|
|
args = raw
|
|
|
|
|
elif isinstance(raw, str):
|
|
|
|
|
try:
|
|
|
|
|
args = json.loads(raw)
|
|
|
|
|
except Exception:
|
|
|
|
|
continue
|
|
|
|
|
if not isinstance(args, dict):
|
|
|
|
|
continue
|
|
|
|
|
if args.get("action") != "delete":
|
|
|
|
|
continue
|
|
|
|
|
name = args.get("name")
|
|
|
|
|
if not isinstance(name, str) or not name.strip():
|
|
|
|
|
continue
|
|
|
|
|
# absorbed_into must be present (even empty string is meaningful);
|
|
|
|
|
# missing key means the model didn't declare intent.
|
|
|
|
|
if "absorbed_into" not in args:
|
|
|
|
|
continue
|
|
|
|
|
target = args.get("absorbed_into")
|
|
|
|
|
if target is None:
|
|
|
|
|
continue
|
|
|
|
|
if not isinstance(target, str):
|
|
|
|
|
continue
|
|
|
|
|
out[name.strip()] = {"into": target.strip(), "declared": True}
|
|
|
|
|
return out
|
|
|
|
|
|
|
|
|
|
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
def _reconcile_classification(
|
|
|
|
|
removed: List[str],
|
|
|
|
|
heuristic: Dict[str, List[Dict[str, Any]]],
|
|
|
|
|
model_block: Dict[str, List[Dict[str, str]]],
|
|
|
|
|
destinations: Set[str],
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
absorbed_declarations: Optional[Dict[str, Dict[str, Any]]] = None,
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
) -> Dict[str, List[Dict[str, Any]]]:
|
|
|
|
|
"""Merge heuristic (tool-call evidence) with the model's structured block.
|
|
|
|
|
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
Rules (evaluated in order; first match wins):
|
|
|
|
|
- **Model-declared `absorbed_into` at delete time is authoritative.** Any
|
|
|
|
|
entry in ``absorbed_declarations`` beats every other signal. This is
|
|
|
|
|
the model telling us directly, at the moment of deletion, what it did.
|
|
|
|
|
``into != ""`` and target exists → consolidated. ``into == ""`` →
|
|
|
|
|
pruned. ``into != ""`` but target doesn't exist → hallucination; fall
|
|
|
|
|
through to the usual signals.
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
- Model-declared consolidation wins when its ``into`` target exists
|
|
|
|
|
in ``destinations`` (survived or newly-created). This gives the
|
|
|
|
|
model authority over intent + rationale.
|
|
|
|
|
- Model-declared consolidation whose ``into`` target does NOT exist is
|
|
|
|
|
downgraded: the model hallucinated an umbrella. We prefer the
|
|
|
|
|
heuristic's finding for that skill, or fall back to pruned.
|
|
|
|
|
- Heuristic-only finding (model didn't mention it, tool calls confirm)
|
|
|
|
|
is preserved as a consolidation, marked ``source="tool-call audit"``.
|
|
|
|
|
- Model-declared pruning is accepted unless the heuristic has
|
|
|
|
|
tool-call evidence that contradicts it (rare — the heuristic would
|
|
|
|
|
have flagged consolidation). In that case we log both.
|
|
|
|
|
|
|
|
|
|
Every removed skill is placed in exactly one bucket.
|
|
|
|
|
"""
|
|
|
|
|
heur_cons = {e["name"]: e for e in heuristic.get("consolidated", [])}
|
|
|
|
|
heur_pruned = {e["name"] for e in heuristic.get("pruned", [])}
|
|
|
|
|
|
|
|
|
|
model_cons = {e["from"]: e for e in model_block.get("consolidations", [])}
|
|
|
|
|
model_pruned = {e["name"]: e for e in model_block.get("prunings", [])}
|
|
|
|
|
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
declared = absorbed_declarations or {}
|
|
|
|
|
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
consolidated: List[Dict[str, Any]] = []
|
|
|
|
|
pruned: List[Dict[str, Any]] = []
|
|
|
|
|
|
|
|
|
|
for name in removed:
|
|
|
|
|
mc = model_cons.get(name)
|
|
|
|
|
mp = model_pruned.get(name)
|
|
|
|
|
hc = heur_cons.get(name)
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
dec = declared.get(name)
|
|
|
|
|
|
|
|
|
|
# Authoritative: model declared `absorbed_into` at the delete call.
|
|
|
|
|
if dec is not None:
|
|
|
|
|
into_claim = dec.get("into", "")
|
|
|
|
|
if into_claim and into_claim in destinations:
|
|
|
|
|
entry: Dict[str, Any] = {
|
|
|
|
|
"name": name,
|
|
|
|
|
"into": into_claim,
|
|
|
|
|
"source": "absorbed_into (model-declared at delete)",
|
|
|
|
|
"reason": (mc.get("reason") or "") if mc else "",
|
|
|
|
|
}
|
|
|
|
|
if hc and hc.get("evidence"):
|
|
|
|
|
entry["evidence"] = hc["evidence"]
|
|
|
|
|
consolidated.append(entry)
|
|
|
|
|
continue
|
|
|
|
|
if into_claim == "":
|
|
|
|
|
# Explicit prune declaration
|
|
|
|
|
pruned.append({
|
|
|
|
|
"name": name,
|
|
|
|
|
"source": "absorbed_into=\"\" (model-declared prune)",
|
|
|
|
|
"reason": (mp.get("reason") or "") if mp else "",
|
|
|
|
|
})
|
|
|
|
|
continue
|
|
|
|
|
# into_claim is non-empty but target doesn't exist: the model
|
|
|
|
|
# named a nonexistent umbrella at delete time. The tool already
|
|
|
|
|
# rejects this at the skill_manage layer, so we shouldn't see it
|
|
|
|
|
# in practice — but if it slips through (e.g. the umbrella was
|
|
|
|
|
# deleted LATER in the same run), fall through to the usual
|
|
|
|
|
# signals rather than trusting a broken reference.
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
|
|
|
|
|
# Model says consolidated — trust it if the destination is real.
|
|
|
|
|
if mc and mc.get("into") in destinations:
|
|
|
|
|
entry: Dict[str, Any] = {
|
|
|
|
|
"name": name,
|
|
|
|
|
"into": mc["into"],
|
|
|
|
|
"source": "model" + ("+audit" if hc else ""),
|
|
|
|
|
"reason": mc.get("reason") or "",
|
|
|
|
|
}
|
|
|
|
|
if hc and hc.get("evidence"):
|
|
|
|
|
entry["evidence"] = hc["evidence"]
|
|
|
|
|
consolidated.append(entry)
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
# Model says consolidated but the umbrella doesn't exist —
|
|
|
|
|
# hallucination. Fall back to heuristic or prune.
|
|
|
|
|
if mc and mc.get("into") not in destinations:
|
|
|
|
|
if hc:
|
|
|
|
|
consolidated.append({
|
|
|
|
|
"name": name,
|
|
|
|
|
"into": hc["into"],
|
|
|
|
|
"source": "tool-call audit (model named missing umbrella)",
|
|
|
|
|
"reason": "",
|
|
|
|
|
"evidence": hc.get("evidence", ""),
|
|
|
|
|
"model_claimed_into": mc["into"],
|
|
|
|
|
})
|
|
|
|
|
else:
|
|
|
|
|
pruned.append({
|
|
|
|
|
"name": name,
|
|
|
|
|
"source": "fallback (model named missing umbrella, no tool-call evidence)",
|
|
|
|
|
"reason": "",
|
|
|
|
|
})
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
# Heuristic found consolidation the model didn't mention.
|
|
|
|
|
if hc:
|
|
|
|
|
consolidated.append({
|
|
|
|
|
"name": name,
|
|
|
|
|
"into": hc["into"],
|
|
|
|
|
"source": "tool-call audit (model omitted from structured block)",
|
|
|
|
|
"reason": "",
|
|
|
|
|
"evidence": hc.get("evidence", ""),
|
|
|
|
|
})
|
|
|
|
|
continue
|
|
|
|
|
|
|
|
|
|
# Model says pruned (or no mention + no heuristic evidence).
|
|
|
|
|
reason = mp.get("reason", "") if mp else ""
|
|
|
|
|
pruned.append({
|
|
|
|
|
"name": name,
|
|
|
|
|
"source": "model" if mp else "no-evidence fallback",
|
|
|
|
|
"reason": reason,
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
return {"consolidated": consolidated, "pruned": pruned}
|
|
|
|
|
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
def _write_run_report(
|
|
|
|
|
*,
|
|
|
|
|
started_at: datetime,
|
|
|
|
|
elapsed_seconds: float,
|
|
|
|
|
auto_counts: Dict[str, int],
|
|
|
|
|
auto_summary: str,
|
|
|
|
|
before_report: List[Dict[str, Any]],
|
|
|
|
|
before_names: Set[str],
|
|
|
|
|
after_report: List[Dict[str, Any]],
|
|
|
|
|
llm_meta: Dict[str, Any],
|
|
|
|
|
) -> Optional[Path]:
|
|
|
|
|
"""Write run.json + REPORT.md under logs/curator/{YYYYMMDD-HHMMSS}/.
|
|
|
|
|
|
|
|
|
|
Returns the report directory path on success, None if the write
|
|
|
|
|
couldn't happen (caller logs and continues — reporting is best-effort).
|
|
|
|
|
"""
|
|
|
|
|
root = _reports_root()
|
|
|
|
|
try:
|
|
|
|
|
root.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator report dir create failed: %s", e)
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
stamp = started_at.strftime("%Y%m%d-%H%M%S")
|
|
|
|
|
run_dir = root / stamp
|
|
|
|
|
# If we crash-reran within the same second, append a disambiguator
|
|
|
|
|
suffix = 1
|
|
|
|
|
while run_dir.exists():
|
|
|
|
|
suffix += 1
|
|
|
|
|
run_dir = root / f"{stamp}-{suffix}"
|
|
|
|
|
try:
|
|
|
|
|
run_dir.mkdir(parents=True, exist_ok=False)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator run dir create failed: %s", e)
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
# Diff before/after
|
|
|
|
|
after_by_name = {r.get("name"): r for r in after_report if isinstance(r, dict)}
|
|
|
|
|
after_names = set(after_by_name.keys())
|
|
|
|
|
removed = sorted(before_names - after_names) # archived during this run
|
|
|
|
|
added = sorted(after_names - before_names) # new skills this run
|
|
|
|
|
before_by_name = {r.get("name"): r for r in before_report if isinstance(r, dict)}
|
|
|
|
|
|
|
|
|
|
# State transitions between the two snapshots (e.g. active -> stale)
|
|
|
|
|
transitions: List[Dict[str, str]] = []
|
|
|
|
|
for name in sorted(after_names & before_names):
|
|
|
|
|
s_before = (before_by_name.get(name) or {}).get("state")
|
|
|
|
|
s_after = (after_by_name.get(name) or {}).get("state")
|
|
|
|
|
if s_before and s_after and s_before != s_after:
|
|
|
|
|
transitions.append({"name": name, "from": s_before, "to": s_after})
|
|
|
|
|
|
|
|
|
|
# Classify LLM tool calls
|
|
|
|
|
tc_counts: Dict[str, int] = {}
|
|
|
|
|
for tc in llm_meta.get("tool_calls", []) or []:
|
|
|
|
|
name = tc.get("name", "unknown")
|
|
|
|
|
tc_counts[name] = tc_counts.get(name, 0) + 1
|
|
|
|
|
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
# Split "removed" into consolidated (absorbed into umbrella) vs pruned
|
|
|
|
|
# (archived for staleness, content not preserved elsewhere). The old
|
|
|
|
|
# "Skills archived" section lumped both together, which misled users
|
|
|
|
|
# into thinking consolidated skills had been pruned.
|
|
|
|
|
#
|
|
|
|
|
# Classification strategy:
|
|
|
|
|
# 1. Parse the curator's structured YAML block from its final response.
|
|
|
|
|
# The curator is now prompted to emit consolidations/prunings lists
|
|
|
|
|
# with short rationale. The model has intent visibility the tool
|
|
|
|
|
# calls don't.
|
|
|
|
|
# 2. Run the tool-call heuristic as a ground-truth audit.
|
|
|
|
|
# 3. Reconcile: model gets authority over intent + rationale, heuristic
|
|
|
|
|
# catches hallucination (umbrella doesn't exist) and omission
|
|
|
|
|
# (model forgot to list an actual consolidation).
|
|
|
|
|
heuristic = _classify_removed_skills(
|
|
|
|
|
removed=removed,
|
|
|
|
|
added=added,
|
|
|
|
|
after_names=after_names,
|
|
|
|
|
tool_calls=llm_meta.get("tool_calls", []) or [],
|
|
|
|
|
)
|
|
|
|
|
model_block = _parse_structured_summary(llm_meta.get("final", "") or "")
|
|
|
|
|
destinations = set(after_names) | set(added or [])
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
# Authoritative signal: extract per-delete `absorbed_into` declarations
|
|
|
|
|
# from this run's tool calls. These beat both the YAML summary block and
|
|
|
|
|
# the substring heuristic — the model is telling us directly, at the
|
|
|
|
|
# moment of deletion, whether each archived skill was consolidated
|
|
|
|
|
# (into=<umbrella>) or pruned (into="").
|
|
|
|
|
absorbed_declarations = _extract_absorbed_into_declarations(
|
|
|
|
|
llm_meta.get("tool_calls", []) or []
|
|
|
|
|
)
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
classification = _reconcile_classification(
|
|
|
|
|
removed=removed,
|
|
|
|
|
heuristic=heuristic,
|
|
|
|
|
model_block=model_block,
|
|
|
|
|
destinations=destinations,
|
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
|
|
|
absorbed_declarations=absorbed_declarations,
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
)
|
|
|
|
|
consolidated = classification["consolidated"]
|
|
|
|
|
pruned = classification["pruned"]
|
|
|
|
|
|
2026-04-30 23:04:50 -07:00
|
|
|
# Rewrite cron job skill references. When the curator consolidates
|
|
|
|
|
# skill X into umbrella Y, any cron job that lists X fails to load
|
|
|
|
|
# it at run time — the scheduler skips it and the job runs without
|
|
|
|
|
# the instructions it was scheduled to follow. Rewriting the
|
|
|
|
|
# references in-place keeps scheduled jobs working across
|
|
|
|
|
# consolidation passes. Best-effort: never let a cron-module issue
|
|
|
|
|
# break the curator.
|
|
|
|
|
cron_rewrites: Dict[str, Any] = {"rewrites": [], "jobs_updated": 0, "jobs_scanned": 0}
|
|
|
|
|
try:
|
|
|
|
|
consolidated_map = {
|
|
|
|
|
e["name"]: e["into"]
|
|
|
|
|
for e in consolidated
|
|
|
|
|
if isinstance(e, dict) and e.get("name") and e.get("into")
|
|
|
|
|
}
|
|
|
|
|
pruned_names = [
|
|
|
|
|
e["name"] for e in pruned
|
|
|
|
|
if isinstance(e, dict) and e.get("name")
|
|
|
|
|
]
|
|
|
|
|
if consolidated_map or pruned_names:
|
|
|
|
|
from cron.jobs import rewrite_skill_refs as _rewrite_cron_refs
|
|
|
|
|
cron_rewrites = _rewrite_cron_refs(
|
|
|
|
|
consolidated=consolidated_map,
|
|
|
|
|
pruned=pruned_names,
|
|
|
|
|
)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator cron skill rewrite failed: %s", e, exc_info=True)
|
|
|
|
|
cron_rewrites = {
|
|
|
|
|
"rewrites": [],
|
|
|
|
|
"jobs_updated": 0,
|
|
|
|
|
"jobs_scanned": 0,
|
|
|
|
|
"error": str(e),
|
|
|
|
|
}
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
payload = {
|
|
|
|
|
"started_at": started_at.isoformat(),
|
|
|
|
|
"duration_seconds": round(elapsed_seconds, 2),
|
|
|
|
|
"model": llm_meta.get("model", ""),
|
|
|
|
|
"provider": llm_meta.get("provider", ""),
|
|
|
|
|
"auto_transitions": auto_counts,
|
|
|
|
|
"counts": {
|
|
|
|
|
"before": len(before_names),
|
|
|
|
|
"after": len(after_names),
|
|
|
|
|
"delta": len(after_names) - len(before_names),
|
|
|
|
|
"archived_this_run": len(removed),
|
|
|
|
|
"added_this_run": len(added),
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
"consolidated_this_run": len(consolidated),
|
|
|
|
|
"pruned_this_run": len(pruned),
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
"state_transitions": len(transitions),
|
2026-04-30 23:04:50 -07:00
|
|
|
"cron_jobs_rewritten": int(cron_rewrites.get("jobs_updated", 0)),
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
"tool_calls_total": sum(tc_counts.values()),
|
|
|
|
|
},
|
|
|
|
|
"tool_call_counts": tc_counts,
|
|
|
|
|
"archived": removed,
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
"consolidated": consolidated,
|
|
|
|
|
"pruned": pruned,
|
|
|
|
|
"pruned_names": [p["name"] for p in pruned],
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
"added": added,
|
|
|
|
|
"state_transitions": transitions,
|
2026-04-30 23:04:50 -07:00
|
|
|
"cron_rewrites": cron_rewrites,
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
"llm_final": llm_meta.get("final", ""),
|
|
|
|
|
"llm_summary": llm_meta.get("summary", ""),
|
|
|
|
|
"llm_error": llm_meta.get("error"),
|
|
|
|
|
"tool_calls": llm_meta.get("tool_calls", []),
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
# run.json — machine-readable, full fidelity
|
|
|
|
|
try:
|
|
|
|
|
(run_dir / "run.json").write_text(
|
|
|
|
|
json.dumps(payload, indent=2, ensure_ascii=False) + "\n",
|
|
|
|
|
encoding="utf-8",
|
|
|
|
|
)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator run.json write failed: %s", e)
|
|
|
|
|
|
|
|
|
|
# REPORT.md — human-readable
|
|
|
|
|
try:
|
|
|
|
|
md = _render_report_markdown(payload)
|
|
|
|
|
(run_dir / "REPORT.md").write_text(md, encoding="utf-8")
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator REPORT.md write failed: %s", e)
|
|
|
|
|
|
2026-04-30 23:04:50 -07:00
|
|
|
# cron_rewrites.json — only when at least one job was touched, to
|
|
|
|
|
# keep run dirs uncluttered for the common no-op case.
|
|
|
|
|
try:
|
|
|
|
|
if int(cron_rewrites.get("jobs_updated", 0)) > 0:
|
|
|
|
|
(run_dir / "cron_rewrites.json").write_text(
|
|
|
|
|
json.dumps(cron_rewrites, indent=2, ensure_ascii=False) + "\n",
|
|
|
|
|
encoding="utf-8",
|
|
|
|
|
)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator cron_rewrites.json write failed: %s", e)
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
return run_dir
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _render_report_markdown(p: Dict[str, Any]) -> str:
|
|
|
|
|
"""Render the human-readable report."""
|
|
|
|
|
lines: List[str] = []
|
|
|
|
|
started = p.get("started_at", "")
|
|
|
|
|
duration = p.get("duration_seconds", 0) or 0
|
|
|
|
|
mins, secs = divmod(int(duration), 60)
|
|
|
|
|
dur_label = f"{mins}m {secs}s" if mins else f"{secs}s"
|
|
|
|
|
|
|
|
|
|
lines.append(f"# Curator run — {started}\n")
|
|
|
|
|
model = p.get("model") or "(not resolved)"
|
|
|
|
|
prov = p.get("provider") or "(not resolved)"
|
|
|
|
|
counts = p.get("counts") or {}
|
|
|
|
|
lines.append(
|
|
|
|
|
f"Model: `{model}` via `{prov}` · Duration: {dur_label} · "
|
|
|
|
|
f"Agent-created skills: {counts.get('before', 0)} → {counts.get('after', 0)} "
|
|
|
|
|
f"({counts.get('delta', 0):+d})\n"
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
error = p.get("llm_error")
|
|
|
|
|
if error:
|
|
|
|
|
lines.append(f"> ⚠ LLM pass error: `{error}`\n")
|
|
|
|
|
|
|
|
|
|
# Auto-transitions (pure, no LLM)
|
|
|
|
|
auto = p.get("auto_transitions") or {}
|
|
|
|
|
lines.append("## Auto-transitions (pure, no LLM)\n")
|
|
|
|
|
lines.append(f"- checked: {auto.get('checked', 0)}")
|
|
|
|
|
lines.append(f"- marked stale: {auto.get('marked_stale', 0)}")
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
lines.append(f"- archived (no LLM, pure time-based staleness): {auto.get('archived', 0)}")
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
lines.append(f"- reactivated: {auto.get('reactivated', 0)}")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
# LLM pass numbers
|
|
|
|
|
tc_counts = p.get("tool_call_counts") or {}
|
|
|
|
|
lines.append("## LLM consolidation pass\n")
|
|
|
|
|
lines.append(f"- tool calls: **{counts.get('tool_calls_total', 0)}** "
|
|
|
|
|
f"(by name: {', '.join(f'{k}={v}' for k, v in sorted(tc_counts.items())) or 'none'})")
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
lines.append(f"- consolidated into umbrellas: **{counts.get('consolidated_this_run', 0)}**")
|
|
|
|
|
lines.append(f"- pruned (archived for staleness): **{counts.get('pruned_this_run', 0)}**")
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
lines.append(f"- new skills this run: **{counts.get('added_this_run', 0)}**")
|
|
|
|
|
lines.append(f"- state transitions (active ↔ stale ↔ archived): "
|
|
|
|
|
f"**{counts.get('state_transitions', 0)}**")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
# Consolidated list — content absorbed into an umbrella. The directory
|
|
|
|
|
# on disk still lives under ~/.hermes/skills/.archive/ (every removal is
|
|
|
|
|
# recoverable by design), but the "live" content for these skills
|
|
|
|
|
# continues to exist inside the destination umbrella.
|
|
|
|
|
consolidated = p.get("consolidated") or []
|
|
|
|
|
if consolidated:
|
|
|
|
|
lines.append(f"### Consolidated into umbrella skills ({len(consolidated)})\n")
|
|
|
|
|
lines.append(
|
|
|
|
|
"_These skills were **absorbed into another skill** during this run — "
|
|
|
|
|
"their content still lives, just under a different name. "
|
|
|
|
|
"The original directory was moved to `~/.hermes/skills/.archive/` for "
|
|
|
|
|
"safety and can be restored via `hermes curator restore <name>` if the "
|
|
|
|
|
"consolidation was wrong._\n"
|
|
|
|
|
)
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
SHOW = 50
|
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
|
|
|
for entry in consolidated[:SHOW]:
|
|
|
|
|
name = entry.get("name", "?")
|
|
|
|
|
into = entry.get("into", "?")
|
|
|
|
|
reason = (entry.get("reason") or "").strip()
|
|
|
|
|
source = entry.get("source", "")
|
|
|
|
|
line = f"- `{name}` → merged into `{into}`"
|
|
|
|
|
if reason:
|
|
|
|
|
line += f" — {reason}"
|
|
|
|
|
if source and source.startswith("tool-call audit"):
|
|
|
|
|
# The model didn't enumerate this one — surface that to the
|
|
|
|
|
# user so they know why the row has no rationale.
|
|
|
|
|
line += f" _(detected via {source})_"
|
|
|
|
|
lines.append(line)
|
|
|
|
|
if entry.get("model_claimed_into"):
|
|
|
|
|
lines.append(
|
|
|
|
|
f" ⚠ The curator's summary named `{entry['model_claimed_into']}` "
|
|
|
|
|
"as the umbrella but that skill doesn't exist post-run; "
|
|
|
|
|
"showing the tool-call audit's finding instead."
|
|
|
|
|
)
|
|
|
|
|
if len(consolidated) > SHOW:
|
|
|
|
|
lines.append(f"- … and {len(consolidated) - SHOW} more (see `run.json`)")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
# Pruned list — archived without consolidation. These are the
|
|
|
|
|
# "stale skill pruned" cases the UI should mark clearly.
|
|
|
|
|
pruned = p.get("pruned") or []
|
|
|
|
|
if pruned:
|
|
|
|
|
lines.append(f"### Pruned — archived for staleness ({len(pruned)})\n")
|
|
|
|
|
lines.append(
|
|
|
|
|
"_These skills were archived without being merged into an umbrella "
|
|
|
|
|
"(e.g. stale, unused, or judged irrelevant). "
|
|
|
|
|
"Directories live under `~/.hermes/skills/.archive/`. "
|
|
|
|
|
"Restore any via `hermes curator restore <name>`._\n"
|
|
|
|
|
)
|
|
|
|
|
SHOW = 50
|
|
|
|
|
for entry in pruned[:SHOW]:
|
|
|
|
|
# Entries are dicts with {name, source, reason} when written via
|
|
|
|
|
# the reconciler, or bare strings when an older format slipped
|
|
|
|
|
# through. Handle both.
|
|
|
|
|
if isinstance(entry, dict):
|
|
|
|
|
name = entry.get("name", "?")
|
|
|
|
|
reason = (entry.get("reason") or "").strip()
|
|
|
|
|
line = f"- `{name}`"
|
|
|
|
|
if reason:
|
|
|
|
|
line += f" — {reason}"
|
|
|
|
|
lines.append(line)
|
|
|
|
|
else:
|
|
|
|
|
lines.append(f"- `{entry}`")
|
|
|
|
|
if len(pruned) > SHOW:
|
|
|
|
|
lines.append(f"- … and {len(pruned) - SHOW} more (see `run.json`)")
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
# Added list
|
|
|
|
|
added = p.get("added") or []
|
|
|
|
|
if added:
|
|
|
|
|
lines.append(f"### New skills this run ({len(added)})\n")
|
|
|
|
|
lines.append("_Usually these are new class-level umbrellas created via `skill_manage action=create`._\n")
|
|
|
|
|
for n in added:
|
|
|
|
|
lines.append(f"- `{n}`")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
# State transitions
|
|
|
|
|
trans = p.get("state_transitions") or []
|
|
|
|
|
if trans:
|
|
|
|
|
lines.append(f"### State transitions ({len(trans)})\n")
|
|
|
|
|
for t in trans:
|
|
|
|
|
lines.append(f"- `{t.get('name')}`: {t.get('from')} → {t.get('to')}")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
2026-04-30 23:04:50 -07:00
|
|
|
# Cron job rewrites — show which scheduled jobs had their skill
|
|
|
|
|
# references updated so users can audit that the auto-rewrite did
|
|
|
|
|
# the right thing. Only present when at least one job changed.
|
|
|
|
|
cron_rw = p.get("cron_rewrites") or {}
|
|
|
|
|
cron_rewrites_list = cron_rw.get("rewrites") or []
|
|
|
|
|
if cron_rewrites_list:
|
|
|
|
|
lines.append(f"### Cron job skill references rewritten ({len(cron_rewrites_list)})\n")
|
|
|
|
|
lines.append(
|
|
|
|
|
"_Cron jobs that referenced a consolidated or pruned skill were "
|
|
|
|
|
"updated in-place so they keep loading the right instructions "
|
|
|
|
|
"on their next run. See `cron_rewrites.json` for the full record._\n"
|
|
|
|
|
)
|
|
|
|
|
SHOW = 25
|
|
|
|
|
for entry in cron_rewrites_list[:SHOW]:
|
|
|
|
|
job_name = entry.get("job_name") or entry.get("job_id") or "?"
|
|
|
|
|
before = entry.get("before") or []
|
|
|
|
|
after = entry.get("after") or []
|
|
|
|
|
mapped = entry.get("mapped") or {}
|
|
|
|
|
dropped = entry.get("dropped") or []
|
|
|
|
|
lines.append(
|
|
|
|
|
f"- `{job_name}`: `{', '.join(before)}` → `{', '.join(after) or '(none)'}`"
|
|
|
|
|
)
|
|
|
|
|
for old, new in mapped.items():
|
|
|
|
|
lines.append(f" - `{old}` → `{new}` (consolidated)")
|
|
|
|
|
for name in dropped:
|
|
|
|
|
lines.append(f" - `{name}` dropped (pruned)")
|
|
|
|
|
if len(cron_rewrites_list) > SHOW:
|
|
|
|
|
lines.append(
|
|
|
|
|
f"- … and {len(cron_rewrites_list) - SHOW} more "
|
|
|
|
|
"(see `cron_rewrites.json`)"
|
|
|
|
|
)
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
# Full LLM final response
|
|
|
|
|
final = (p.get("llm_final") or "").strip()
|
|
|
|
|
if final:
|
|
|
|
|
lines.append("## LLM final summary\n")
|
|
|
|
|
lines.append(final)
|
|
|
|
|
lines.append("")
|
|
|
|
|
elif not error:
|
|
|
|
|
llm_sum = p.get("llm_summary") or ""
|
|
|
|
|
if llm_sum:
|
|
|
|
|
lines.append("## LLM summary\n")
|
|
|
|
|
lines.append(llm_sum)
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
# Recovery footer
|
|
|
|
|
lines.append("## Recovery\n")
|
|
|
|
|
lines.append("- Restore an archived skill: `hermes curator restore <name>`")
|
|
|
|
|
lines.append("- All archives live under `~/.hermes/skills/.archive/` and are recoverable by `mv`")
|
|
|
|
|
lines.append("- See `run.json` in this directory for the full machine-readable record.")
|
|
|
|
|
lines.append("")
|
|
|
|
|
|
|
|
|
|
return "\n".join(lines)
|
|
|
|
|
|
|
|
|
|
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Orchestrator — spawn a forked AIAgent for the LLM review pass
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _render_candidate_list() -> str:
|
|
|
|
|
"""Human/agent-readable list of agent-created skills with usage stats."""
|
|
|
|
|
rows = skill_usage.agent_created_report()
|
|
|
|
|
if not rows:
|
|
|
|
|
return "No agent-created skills to review."
|
|
|
|
|
lines = [f"Agent-created skills ({len(rows)}):\n"]
|
|
|
|
|
for r in rows:
|
|
|
|
|
lines.append(
|
|
|
|
|
f"- {r['name']} "
|
|
|
|
|
f"state={r['state']} "
|
|
|
|
|
f"pinned={'yes' if r.get('pinned') else 'no'} "
|
2026-04-30 20:23:17 +08:00
|
|
|
f"activity={r.get('activity_count', 0)} "
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
f"use={r.get('use_count', 0)} "
|
|
|
|
|
f"view={r.get('view_count', 0)} "
|
|
|
|
|
f"patches={r.get('patch_count', 0)} "
|
2026-04-30 20:23:17 +08:00
|
|
|
f"last_activity={r.get('last_activity_at') or 'never'}"
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
)
|
|
|
|
|
return "\n".join(lines)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def run_curator_review(
|
|
|
|
|
on_summary: Optional[Callable[[str], None]] = None,
|
|
|
|
|
synchronous: bool = False,
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
dry_run: bool = False,
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
) -> Dict[str, Any]:
|
|
|
|
|
"""Execute a single curator review pass.
|
|
|
|
|
|
|
|
|
|
Steps:
|
|
|
|
|
1. Apply automatic state transitions (pure, no LLM).
|
|
|
|
|
2. If there are agent-created skills, spawn a forked AIAgent that runs
|
|
|
|
|
the LLM review prompt against the current candidate list.
|
|
|
|
|
3. Update .curator_state with last_run_at and a one-line summary.
|
|
|
|
|
4. Invoke *on_summary* with a user-visible description.
|
|
|
|
|
|
|
|
|
|
If *synchronous* is True, the LLM review runs in the calling thread; the
|
|
|
|
|
default is to spawn a daemon thread so the caller returns immediately.
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
|
|
|
|
|
If *dry_run* is True, the automatic stale/archive transitions are SKIPPED
|
|
|
|
|
and the LLM review pass is instructed to produce a report only — no
|
|
|
|
|
skill_manage mutations, no terminal archive moves. The REPORT.md still
|
|
|
|
|
gets written and ``state.last_report_path`` still records it so users
|
|
|
|
|
can read what the curator WOULD have done.
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
"""
|
|
|
|
|
start = datetime.now(timezone.utc)
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
if dry_run:
|
|
|
|
|
# Count candidates without mutating state.
|
|
|
|
|
try:
|
|
|
|
|
report = skill_usage.agent_created_report()
|
|
|
|
|
counts = {
|
|
|
|
|
"checked": len(report),
|
|
|
|
|
"marked_stale": 0,
|
|
|
|
|
"archived": 0,
|
|
|
|
|
"reactivated": 0,
|
|
|
|
|
}
|
|
|
|
|
except Exception:
|
|
|
|
|
counts = {"checked": 0, "marked_stale": 0, "archived": 0, "reactivated": 0}
|
|
|
|
|
else:
|
|
|
|
|
# Pre-mutation snapshot — best-effort, never blocks the run. A
|
|
|
|
|
# failed snapshot logs at debug and continues (the alternative is
|
|
|
|
|
# that a transient disk issue silently disables curator forever,
|
|
|
|
|
# which is worse). Users who want to require snapshots can disable
|
|
|
|
|
# curator entirely until they can fix disk space.
|
|
|
|
|
try:
|
|
|
|
|
from agent import curator_backup
|
|
|
|
|
snap = curator_backup.snapshot_skills(reason="pre-curator-run")
|
|
|
|
|
if snap is not None and on_summary:
|
|
|
|
|
try:
|
|
|
|
|
on_summary(f"curator: snapshot created ({snap.name})")
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator pre-run snapshot failed: %s", e, exc_info=True)
|
|
|
|
|
counts = apply_automatic_transitions(now=start)
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
auto_summary_parts = []
|
|
|
|
|
if counts["marked_stale"]:
|
|
|
|
|
auto_summary_parts.append(f"{counts['marked_stale']} marked stale")
|
|
|
|
|
if counts["archived"]:
|
|
|
|
|
auto_summary_parts.append(f"{counts['archived']} archived")
|
|
|
|
|
if counts["reactivated"]:
|
|
|
|
|
auto_summary_parts.append(f"{counts['reactivated']} reactivated")
|
|
|
|
|
auto_summary = ", ".join(auto_summary_parts) if auto_summary_parts else "no changes"
|
|
|
|
|
|
|
|
|
|
# Persist state before the LLM pass so a crash mid-review still records
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
# the run and doesn't immediately re-trigger. In dry-run we do NOT bump
|
|
|
|
|
# last_run_at or run_count — a preview shouldn't push the next scheduled
|
|
|
|
|
# real pass out. We still record a summary so `hermes curator status`
|
|
|
|
|
# shows that a preview ran.
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
state = load_state()
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
if not dry_run:
|
|
|
|
|
state["last_run_at"] = start.isoformat()
|
|
|
|
|
state["run_count"] = int(state.get("run_count", 0)) + 1
|
|
|
|
|
prefix = "dry-run auto: " if dry_run else "auto: "
|
|
|
|
|
state["last_run_summary"] = f"{prefix}{auto_summary}"
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
save_state(state)
|
|
|
|
|
|
|
|
|
|
def _llm_pass():
|
|
|
|
|
nonlocal auto_summary
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
# Snapshot skill state BEFORE the LLM pass so the report can diff.
|
|
|
|
|
try:
|
|
|
|
|
before_report = skill_usage.agent_created_report()
|
|
|
|
|
except Exception:
|
|
|
|
|
before_report = []
|
|
|
|
|
before_names = {r.get("name") for r in before_report if isinstance(r, dict)}
|
|
|
|
|
|
|
|
|
|
llm_meta: Dict[str, Any] = {}
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
try:
|
|
|
|
|
candidate_list = _render_candidate_list()
|
|
|
|
|
if "No agent-created skills" in candidate_list:
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
final_summary = f"{prefix}{auto_summary}; llm: skipped (no candidates)"
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
llm_meta = {
|
|
|
|
|
"final": "",
|
|
|
|
|
"summary": "skipped (no candidates)",
|
|
|
|
|
"model": "",
|
|
|
|
|
"provider": "",
|
|
|
|
|
"tool_calls": [],
|
|
|
|
|
"error": None,
|
|
|
|
|
}
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
else:
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
if dry_run:
|
|
|
|
|
prompt = (
|
|
|
|
|
f"{CURATOR_DRY_RUN_BANNER}\n\n"
|
|
|
|
|
f"{CURATOR_REVIEW_PROMPT}\n\n"
|
|
|
|
|
f"{candidate_list}"
|
|
|
|
|
)
|
|
|
|
|
else:
|
|
|
|
|
prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
llm_meta = _run_llm_review(prompt)
|
|
|
|
|
final_summary = (
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
f"{prefix}{auto_summary}; llm: {llm_meta.get('summary', 'no change')}"
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
)
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator LLM pass failed: %s", e, exc_info=True)
|
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
|
|
|
final_summary = f"{prefix}{auto_summary}; llm: error ({e})"
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
llm_meta = {
|
|
|
|
|
"final": "",
|
|
|
|
|
"summary": f"error ({e})",
|
|
|
|
|
"model": "",
|
|
|
|
|
"provider": "",
|
|
|
|
|
"tool_calls": [],
|
|
|
|
|
"error": str(e),
|
|
|
|
|
}
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
|
|
|
|
|
state2 = load_state()
|
|
|
|
|
state2["last_run_duration_seconds"] = elapsed
|
|
|
|
|
state2["last_run_summary"] = final_summary
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
|
|
|
|
|
# Write the per-run report. Runs in a best-effort try so a
|
|
|
|
|
# reporting bug never breaks the curator itself. Report path is
|
|
|
|
|
# recorded in state so `hermes curator status` can point at it.
|
|
|
|
|
try:
|
|
|
|
|
after_report = skill_usage.agent_created_report()
|
|
|
|
|
except Exception:
|
|
|
|
|
after_report = []
|
|
|
|
|
try:
|
|
|
|
|
report_path = _write_run_report(
|
|
|
|
|
started_at=start,
|
|
|
|
|
elapsed_seconds=elapsed,
|
|
|
|
|
auto_counts=counts,
|
|
|
|
|
auto_summary=auto_summary,
|
|
|
|
|
before_report=before_report,
|
|
|
|
|
before_names=before_names,
|
|
|
|
|
after_report=after_report,
|
|
|
|
|
llm_meta=llm_meta,
|
|
|
|
|
)
|
|
|
|
|
if report_path is not None:
|
|
|
|
|
state2["last_report_path"] = str(report_path)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator report write failed: %s", e, exc_info=True)
|
|
|
|
|
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
save_state(state2)
|
|
|
|
|
|
|
|
|
|
if on_summary:
|
|
|
|
|
try:
|
|
|
|
|
on_summary(f"curator: {final_summary}")
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
if synchronous:
|
|
|
|
|
_llm_pass()
|
|
|
|
|
else:
|
|
|
|
|
t = threading.Thread(target=_llm_pass, daemon=True, name="curator-review")
|
|
|
|
|
t.start()
|
|
|
|
|
|
|
|
|
|
return {
|
|
|
|
|
"started_at": start.isoformat(),
|
|
|
|
|
"auto_transitions": counts,
|
|
|
|
|
"summary_so_far": auto_summary,
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
2026-05-02 01:13:17 +08:00
|
|
|
def _resolve_review_runtime(cfg: Dict[str, Any]) -> _ReviewRuntimeBinding:
|
|
|
|
|
"""Resolve provider/model and per-slot credentials for the curator review fork.
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
|
2026-05-02 01:13:17 +08:00
|
|
|
Same precedence as `_resolve_review_model()`. Non-empty ``api_key`` /
|
|
|
|
|
``base_url`` from the active slot are returned as explicit overrides so
|
|
|
|
|
``resolve_runtime_provider`` does not silently reuse the main chat
|
|
|
|
|
credential chain for a routed auxiliary model.
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
"""
|
|
|
|
|
_main = cfg.get("model", {}) if isinstance(cfg.get("model"), dict) else {}
|
|
|
|
|
_main_provider = _main.get("provider") or "auto"
|
|
|
|
|
_main_model = _main.get("default") or _main.get("model") or ""
|
|
|
|
|
|
|
|
|
|
# 1. Canonical aux task slot
|
|
|
|
|
_aux = cfg.get("auxiliary", {}) if isinstance(cfg.get("auxiliary"), dict) else {}
|
|
|
|
|
_cur_task = _aux.get("curator", {}) if isinstance(_aux.get("curator"), dict) else {}
|
|
|
|
|
_task_provider = (_cur_task.get("provider") or "").strip() or None
|
|
|
|
|
_task_model = (_cur_task.get("model") or "").strip() or None
|
|
|
|
|
if _task_provider and _task_provider != "auto" and _task_model:
|
2026-05-02 01:13:17 +08:00
|
|
|
return _ReviewRuntimeBinding(
|
|
|
|
|
_task_provider,
|
|
|
|
|
_task_model,
|
|
|
|
|
_strip_aux_credential(_cur_task.get("api_key")),
|
|
|
|
|
_strip_aux_credential(_cur_task.get("base_url")),
|
|
|
|
|
)
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
|
|
|
|
|
# 2. Legacy curator.auxiliary.{provider,model} (deprecated, pre-unification)
|
|
|
|
|
_cur = cfg.get("curator", {}) if isinstance(cfg.get("curator"), dict) else {}
|
|
|
|
|
_legacy = _cur.get("auxiliary", {}) if isinstance(_cur.get("auxiliary"), dict) else {}
|
|
|
|
|
_legacy_provider = _legacy.get("provider") or None
|
|
|
|
|
_legacy_model = _legacy.get("model") or None
|
|
|
|
|
if _legacy_provider and _legacy_model:
|
|
|
|
|
logger.info(
|
|
|
|
|
"curator: using deprecated curator.auxiliary.{provider,model} "
|
|
|
|
|
"config — please migrate to auxiliary.curator.{provider,model}"
|
|
|
|
|
)
|
2026-05-02 01:13:17 +08:00
|
|
|
return _ReviewRuntimeBinding(
|
|
|
|
|
str(_legacy_provider),
|
|
|
|
|
str(_legacy_model),
|
|
|
|
|
_strip_aux_credential(_legacy.get("api_key")),
|
|
|
|
|
_strip_aux_credential(_legacy.get("base_url")),
|
|
|
|
|
)
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
|
|
|
|
|
# 3. Fall through to the main chat model
|
2026-05-02 01:13:17 +08:00
|
|
|
return _ReviewRuntimeBinding(_main_provider, _main_model, None, None)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _resolve_review_model(cfg: Dict[str, Any]) -> tuple[str, str]:
|
|
|
|
|
"""Pick (provider, model) for the curator review fork.
|
|
|
|
|
|
|
|
|
|
Curator is a regular auxiliary task slot — ``auxiliary.curator.{provider,model}``
|
|
|
|
|
— so it participates in the canonical aux-model plumbing (``hermes model`` →
|
|
|
|
|
auxiliary picker, the dashboard Models tab, ``auxiliary.curator.{timeout,
|
|
|
|
|
base_url,api_key,extra_body}``). ``provider: "auto"`` with an empty model
|
|
|
|
|
means "use the main chat model" — same default as every other aux task.
|
|
|
|
|
|
|
|
|
|
Legacy fallback: users who configured ``curator.auxiliary.{provider,model}``
|
|
|
|
|
under the previous one-off schema still work. Precedence:
|
|
|
|
|
1. ``auxiliary.curator.{provider,model}`` when both are set non-auto
|
|
|
|
|
2. Legacy ``curator.auxiliary.{provider,model}`` when both are set
|
|
|
|
|
3. Main ``model.{provider,default/model}`` pair
|
|
|
|
|
"""
|
|
|
|
|
b = _resolve_review_runtime(cfg)
|
|
|
|
|
return b.provider, b.model
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
def _run_llm_review(prompt: str) -> Dict[str, Any]:
|
|
|
|
|
"""Spawn an AIAgent fork to run the curator review prompt.
|
|
|
|
|
|
|
|
|
|
Returns a dict with:
|
|
|
|
|
- final: full (untruncated) final response from the reviewer
|
|
|
|
|
- summary: short summary suitable for state file (240-char cap)
|
|
|
|
|
- model, provider: what the fork actually ran on
|
|
|
|
|
- tool_calls: list of {name, arguments} for every tool call made during
|
|
|
|
|
the pass (arguments may be truncated for readability)
|
|
|
|
|
- error: set if the pass failed mid-run; final/summary may still be empty
|
|
|
|
|
|
|
|
|
|
Never raises; callers get a structured failure instead.
|
|
|
|
|
"""
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
import contextlib
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
result_meta: Dict[str, Any] = {
|
|
|
|
|
"final": "",
|
|
|
|
|
"summary": "",
|
|
|
|
|
"model": "",
|
|
|
|
|
"provider": "",
|
|
|
|
|
"tool_calls": [],
|
|
|
|
|
"error": None,
|
|
|
|
|
}
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
try:
|
|
|
|
|
from run_agent import AIAgent
|
|
|
|
|
except Exception as e:
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
result_meta["error"] = f"AIAgent import failed: {e}"
|
|
|
|
|
result_meta["summary"] = result_meta["error"]
|
|
|
|
|
return result_meta
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
# Resolve provider + model the same way the CLI does, so the curator
|
|
|
|
|
# fork inherits the user's active main config rather than falling
|
|
|
|
|
# through to an empty provider/model pair (which sends HTTP 400
|
|
|
|
|
# "No models provided"). AIAgent() without explicit provider/model
|
|
|
|
|
# arguments hits an auto-resolution path that fails for OAuth-only
|
|
|
|
|
# providers and for pool-backed credentials.
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
#
|
2026-05-02 01:13:17 +08:00
|
|
|
# `_resolve_review_runtime()` honors `auxiliary.curator.{provider,model,...}`
|
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
|
|
|
# (canonical aux-task slot, wired through `hermes model` → auxiliary
|
|
|
|
|
# picker and the dashboard Models tab), with a legacy fallback to
|
2026-05-02 01:13:17 +08:00
|
|
|
# `curator.auxiliary.{provider,model,...}`. See docs/user-guide/features/curator.md.
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
_api_key = None
|
|
|
|
|
_base_url = None
|
|
|
|
|
_api_mode = None
|
|
|
|
|
_resolved_provider = None
|
|
|
|
|
_model_name = ""
|
|
|
|
|
try:
|
|
|
|
|
from hermes_cli.config import load_config
|
|
|
|
|
from hermes_cli.runtime_provider import resolve_runtime_provider
|
|
|
|
|
_cfg = load_config()
|
2026-05-02 01:13:17 +08:00
|
|
|
_binding = _resolve_review_runtime(_cfg)
|
|
|
|
|
_provider, _model_name = _binding.provider, _binding.model
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
_rp = resolve_runtime_provider(
|
2026-05-02 01:13:17 +08:00
|
|
|
requested=_provider,
|
|
|
|
|
target_model=_model_name,
|
|
|
|
|
explicit_api_key=_binding.explicit_api_key,
|
|
|
|
|
explicit_base_url=_binding.explicit_base_url,
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
)
|
|
|
|
|
_api_key = _rp.get("api_key")
|
|
|
|
|
_base_url = _rp.get("base_url")
|
|
|
|
|
_api_mode = _rp.get("api_mode")
|
|
|
|
|
_resolved_provider = _rp.get("provider") or _provider
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("Curator provider resolution failed: %s", e, exc_info=True)
|
|
|
|
|
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
result_meta["model"] = _model_name
|
|
|
|
|
result_meta["provider"] = _resolved_provider or ""
|
|
|
|
|
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
review_agent = None
|
|
|
|
|
try:
|
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:
**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.
**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).
**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).
**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.
**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.
**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.
**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
|
|
|
review_agent = AIAgent(
|
|
|
|
|
model=_model_name,
|
|
|
|
|
provider=_resolved_provider,
|
|
|
|
|
api_key=_api_key,
|
|
|
|
|
base_url=_base_url,
|
|
|
|
|
api_mode=_api_mode,
|
|
|
|
|
# Umbrella-building over a large skill collection is worth a
|
|
|
|
|
# high iteration ceiling — the pass typically takes 50-100
|
|
|
|
|
# API calls against hundreds of candidate skills. The
|
|
|
|
|
# single-session review path caps itself at a much smaller
|
|
|
|
|
# number because it's not doing a curation sweep.
|
|
|
|
|
max_iterations=9999,
|
|
|
|
|
quiet_mode=True,
|
|
|
|
|
platform="curator",
|
|
|
|
|
skip_context_files=True,
|
|
|
|
|
skip_memory=True,
|
|
|
|
|
)
|
|
|
|
|
# Disable recursive nudges — the curator must never spawn its own review.
|
|
|
|
|
review_agent._memory_nudge_interval = 0
|
|
|
|
|
review_agent._skill_nudge_interval = 0
|
|
|
|
|
|
|
|
|
|
# Redirect the forked agent's stdout/stderr to /dev/null while it
|
|
|
|
|
# runs so its tool-call chatter doesn't pollute the foreground
|
|
|
|
|
# terminal. The background-thread runner also hides it; this
|
|
|
|
|
# belt-and-suspenders path matters when a caller invokes
|
|
|
|
|
# run_curator_review(synchronous=True) from the CLI.
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
with open(os.devnull, "w") as _devnull, \
|
|
|
|
|
contextlib.redirect_stdout(_devnull), \
|
|
|
|
|
contextlib.redirect_stderr(_devnull):
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
conv_result = review_agent.run_conversation(user_message=prompt)
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
final = ""
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
if isinstance(conv_result, dict):
|
|
|
|
|
final = str(conv_result.get("final_response") or "").strip()
|
|
|
|
|
result_meta["final"] = final
|
|
|
|
|
result_meta["summary"] = (final[:240] + "…") if len(final) > 240 else (final or "no change")
|
|
|
|
|
|
|
|
|
|
# Collect tool calls for the report. Walk the forked agent's
|
|
|
|
|
# session messages and extract every tool_call made during the
|
|
|
|
|
# pass. Truncate argument payloads so a giant skill_manage create
|
|
|
|
|
# doesn't blow up the report.
|
|
|
|
|
_calls: List[Dict[str, Any]] = []
|
|
|
|
|
for msg in getattr(review_agent, "_session_messages", []) or []:
|
|
|
|
|
if not isinstance(msg, dict):
|
|
|
|
|
continue
|
|
|
|
|
tcs = msg.get("tool_calls") or []
|
|
|
|
|
for tc in tcs:
|
|
|
|
|
if not isinstance(tc, dict):
|
|
|
|
|
continue
|
|
|
|
|
fn = tc.get("function") or {}
|
|
|
|
|
name = fn.get("name") or ""
|
|
|
|
|
args_raw = fn.get("arguments") or ""
|
|
|
|
|
if isinstance(args_raw, str) and len(args_raw) > 400:
|
|
|
|
|
args_raw = args_raw[:400] + "…"
|
|
|
|
|
_calls.append({"name": name, "arguments": args_raw})
|
|
|
|
|
result_meta["tool_calls"] = _calls
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
except Exception as e:
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
result_meta["error"] = f"error: {e}"
|
|
|
|
|
result_meta["summary"] = result_meta["error"]
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
finally:
|
|
|
|
|
if review_agent is not None:
|
|
|
|
|
try:
|
|
|
|
|
review_agent.close()
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot,
state transitions, all tool calls, model/provider, timing, full LLM
final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
auto-transition counts, LLM consolidation stats, archived-this-run
list, new-skills-this-run list, state transitions, the full LLM
final summary, and a recovery footer pointing at the archive + the
`hermes curator restore` command
Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.
Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
provider, tool_calls, error) instead of a bare truncated string so
the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
never breaks the curator itself. Same-second rerun gets a numeric
suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
can immediately open the latest run
Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
location (logs not skills), both files written, run.json shape and
diff accuracy, markdown structure, error path still writes, state
transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
stub the new dict-returning _run_llm_review signature
Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
|
|
|
return result_meta
|
feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.
Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).
Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
.hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache
New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests
Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)
Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end
Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.
Refs #7816.
2026-04-26 06:08:39 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Public entrypoint for the session-start hook
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def maybe_run_curator(
|
|
|
|
|
*,
|
|
|
|
|
idle_for_seconds: Optional[float] = None,
|
|
|
|
|
on_summary: Optional[Callable[[str], None]] = None,
|
|
|
|
|
) -> Optional[Dict[str, Any]]:
|
|
|
|
|
"""Best-effort: run a curator pass if all gates pass. Returns the result
|
|
|
|
|
dict if a pass was started, else None. Never raises."""
|
|
|
|
|
try:
|
|
|
|
|
if not should_run_now():
|
|
|
|
|
return None
|
|
|
|
|
# Idle gating: only enforce when the caller provided a measurement.
|
|
|
|
|
if idle_for_seconds is not None:
|
|
|
|
|
min_idle_s = get_min_idle_hours() * 3600.0
|
|
|
|
|
if idle_for_seconds < min_idle_s:
|
|
|
|
|
return None
|
|
|
|
|
return run_curator_review(on_summary=on_summary)
|
|
|
|
|
except Exception as e:
|
|
|
|
|
logger.debug("maybe_run_curator failed: %s", e, exc_info=True)
|
|
|
|
|
return None
|