hermes-agent/agent/curator.py


feat(curator): background skill maintenance (issue #7816)

Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift.

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests

Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates.

Refs #7816.
2026-04-26 06:08:39 -07:00
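The ".bundled_manifest + .hub/lock.json double-filter" invariant can be sketched roughly as below. This is an illustrative sketch only: the file formats and the `agent_created_only` name are assumptions for demonstration; the real filter lives in tools/skill_usage.is_agent_created.

```python
import json
from pathlib import Path

def agent_created_only(skill_names, skills_dir: Path):
    """Drop any skill named in the bundled manifest or the hub lock file.

    Both sources are optional; an unreadable file contributes an empty
    exclusion set, so the filter degrades to excluding nothing from that
    source rather than crashing the curator.
    """
    excluded = set()
    manifest = skills_dir / ".bundled_manifest"
    if manifest.exists():
        excluded.update(manifest.read_text(encoding="utf-8").split())
    lock = skills_dir / ".hub" / "lock.json"
    if lock.exists():
        try:
            data = json.loads(lock.read_text(encoding="utf-8"))
            if isinstance(data, dict):
                excluded.update(data.get("skills", {}).keys())
        except (OSError, json.JSONDecodeError):
            pass
    return [n for n in skill_names if n not in excluded]
```

The point of filtering by both sources independently is that either file can be missing without weakening the other's exclusions.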
"""Curator — background skill maintenance orchestrator.
The curator is an auxiliary-model task that periodically reviews agent-created
skills and maintains the collection. It runs inactivity-triggered (no cron
daemon): when the agent is idle and the last curator run was longer than
``interval_hours`` ago, ``maybe_run_curator()`` spawns a forked AIAgent to do
the review.
Responsibilities:
- Auto-transition lifecycle states based on derived skill activity timestamps
- Spawn a background review agent that can pin / archive / consolidate /
  patch agent-created skills via skill_manage
- Persist curator state (last_run_at, paused, etc.) in .curator_state

Strict invariants:
- Only touches agent-created skills (see tools/skill_usage.is_agent_created)
- Never auto-deletes, only archives. Archives are recoverable.
- Pinned skills bypass all auto-transitions
- Uses the auxiliary client; never touches the main session's prompt cache
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import threading
from datetime import datetime, timedelta, timezone
from pathlib import Path
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)

Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:
- `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command

Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`.

Internals:
- `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run

Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature

Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
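The "same-second rerun gets a numeric suffix" guarantee can be implemented along these lines. A minimal sketch; the function name and layout are illustrative, not the actual report-writer code:

```python
from datetime import datetime
from pathlib import Path

def unique_report_dir(root: Path, now: datetime) -> Path:
    """Pick root/{YYYYMMDD-HHMMSS}; if a report for that second already
    exists, append an increasing numeric suffix so nothing is clobbered."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    candidate = root / stamp
    n = 1
    while candidate.exists():
        n += 1
        candidate = root / f"{stamp}-{n}"
    candidate.mkdir(parents=True)
    return candidate
```

Creating the directory inside the helper is what makes the uniqueness stick: the next same-second call sees the first directory on disk and moves to the suffixed name.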
from typing import Any, Callable, Dict, List, Optional, Set
from hermes_constants import get_hermes_home
from tools import skill_usage

logger = logging.getLogger(__name__)

DEFAULT_INTERVAL_HOURS = 24 * 7  # 7 days
DEFAULT_MIN_IDLE_HOURS = 2
DEFAULT_STALE_AFTER_DAYS = 30
DEFAULT_ARCHIVE_AFTER_DAYS = 90
# ---------------------------------------------------------------------------
# .curator_state — persistent scheduler + status
# ---------------------------------------------------------------------------
def _state_file() -> Path:
    return get_hermes_home() / "skills" / ".curator_state"

def _default_state() -> Dict[str, Any]:
    return {
        "last_run_at": None,
        "last_run_duration_seconds": None,
        "last_run_summary": None,
        "last_report_path": None,
"paused": False,
"run_count": 0,
}
def load_state() -> Dict[str, Any]:
    path = _state_file()
    if not path.exists():
        return _default_state()
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            base = _default_state()
            base.update({k: v for k, v in data.items() if k in base or k.startswith("_")})
            return base
    except (OSError, json.JSONDecodeError) as e:
        logger.debug("Failed to read curator state: %s", e)
    return _default_state()

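The key filter inside load_state keeps unknown keys out of the state dict while still admitting underscore-prefixed private keys. The same merge rule in isolation (helper name is illustrative):

```python
def merge_into_defaults(defaults: dict, loaded: dict) -> dict:
    """Start from the defaults and accept only keys that are either already
    known (present in the defaults) or private ('_'-prefixed); anything
    else from the loaded file is silently dropped."""
    base = dict(defaults)
    base.update({k: v for k, v in loaded.items() if k in base or k.startswith("_")})
    return base
```

This is why a hand-edited or stale .curator_state file cannot inject arbitrary keys into the scheduler state, while forward-compatible private markers survive a round trip.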
def save_state(data: Dict[str, Any]) -> None:
    path = _state_file()
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=".curator_state_", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)
        except BaseException:
            try:
                os.unlink(tmp)
            except OSError:
                pass
            raise
    except Exception as e:
        logger.debug("Failed to save curator state: %s", e, exc_info=True)

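save_state uses the write-to-temp-then-rename pattern: os.replace is atomic on POSIX when source and destination are on the same filesystem, so concurrent readers see either the old state file or the new one, never a half-written file. The bare pattern, stripped of curator specifics:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data) -> None:
    """Write JSON to a temp file in the destination's own directory
    (same filesystem, so the rename is atomic), fsync, then replace."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename
        os.replace(tmp, path)
    except BaseException:
        # Clean up the orphan temp file, then re-raise.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Catching BaseException (not just Exception) matters here: even a KeyboardInterrupt mid-write should not leave a stray temp file behind.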
def set_paused(paused: bool) -> None:
    state = load_state()
    state["paused"] = bool(paused)
    save_state(state)

def is_paused() -> bool:
    return bool(load_state().get("paused"))
# ---------------------------------------------------------------------------
# Config access
# ---------------------------------------------------------------------------
def _load_config() -> Dict[str, Any]:
    """Read curator.* config from ~/.hermes/config.yaml. Tolerates missing file."""
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
    except Exception as e:
        logger.debug("Failed to load config for curator: %s", e)
        return {}
    if not isinstance(cfg, dict):
        return {}
    cur = cfg.get("curator") or {}
    if not isinstance(cur, dict):
        return {}
    return cur

def is_enabled() -> bool:
    """Default ON when no config says otherwise."""
    cfg = _load_config()
    return bool(cfg.get("enabled", True))

def get_interval_hours() -> int:
    cfg = _load_config()
    try:
        return int(cfg.get("interval_hours", DEFAULT_INTERVAL_HOURS))
    except (TypeError, ValueError):
        return DEFAULT_INTERVAL_HOURS

def get_min_idle_hours() -> float:
    cfg = _load_config()
    try:
        return float(cfg.get("min_idle_hours", DEFAULT_MIN_IDLE_HOURS))
    except (TypeError, ValueError):
        return DEFAULT_MIN_IDLE_HOURS

def get_stale_after_days() -> int:
    cfg = _load_config()
    try:
        return int(cfg.get("stale_after_days", DEFAULT_STALE_AFTER_DAYS))
    except (TypeError, ValueError):
        return DEFAULT_STALE_AFTER_DAYS

def get_archive_after_days() -> int:
    cfg = _load_config()
    try:
        return int(cfg.get("archive_after_days", DEFAULT_ARCHIVE_AFTER_DAYS))
    except (TypeError, ValueError):
        return DEFAULT_ARCHIVE_AFTER_DAYS
# ---------------------------------------------------------------------------
# Idle / interval check
# ---------------------------------------------------------------------------
def _parse_iso(ts: Optional[str]) -> Optional[datetime]:
    if not ts:
        return None
    try:
        return datetime.fromisoformat(ts)
    except (TypeError, ValueError):
        return None

def should_run_now(now: Optional[datetime] = None) -> bool:
    """Return True if the curator should run immediately.

    Gates:
    - curator.enabled == True
    - not paused
    - last_run_at missing, OR older than interval_hours

    The idle check (min_idle_hours) is applied at the call site, where we know
    whether an agent is actively running; here we only enforce the static
    gates.
    """
    if not is_enabled():
        return False
    if is_paused():
        return False
    state = load_state()
    last = _parse_iso(state.get("last_run_at"))
    if last is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)
    interval = timedelta(hours=get_interval_hours())
    return (now - last) >= interval
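The normalization step in the interval gate is load-bearing: subtracting a naive datetime from an aware one raises TypeError, so a naive stored last_run_at is assumed UTC before the comparison. A standalone illustration of just that arithmetic (function name is illustrative):

```python
from datetime import datetime, timedelta, timezone

def interval_elapsed(last_run_at: str, now: datetime, interval_hours: int) -> bool:
    """True when at least interval_hours have passed since last_run_at.
    A naive ISO timestamp is treated as UTC rather than rejected."""
    last = datetime.fromisoformat(last_run_at)
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)
    return (now - last) >= timedelta(hours=interval_hours)
```

With the 7-day default, a run 8 days ago passes the gate and a run 6 days ago does not.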
# ---------------------------------------------------------------------------
# Automatic state transitions (pure function, no LLM)
# ---------------------------------------------------------------------------
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
    """Walk every agent-created skill and move active/stale/archived based on
    the latest real activity timestamp. Pinned skills are never touched.

    Returns a counter dict describing what changed."""
    from tools import skill_usage as _u

    if now is None:
        now = datetime.now(timezone.utc)
    stale_cutoff = now - timedelta(days=get_stale_after_days())
    archive_cutoff = now - timedelta(days=get_archive_after_days())
    counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}
    for row in _u.agent_created_report():
        counts["checked"] += 1
        name = row["name"]
        if row.get("pinned"):
            continue
        last_activity = _parse_iso(row.get("last_activity_at"))
        # If never active, treat created_at as the anchor so new skills don't
        # immediately archive themselves.
        anchor = last_activity or _parse_iso(row.get("created_at")) or now
        if anchor.tzinfo is None:
            anchor = anchor.replace(tzinfo=timezone.utc)
        current = row.get("state", _u.STATE_ACTIVE)
        if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
            ok, _msg = _u.archive_skill(name)
            if ok:
                counts["archived"] += 1
        elif anchor <= stale_cutoff and current == _u.STATE_ACTIVE:
            _u.set_state(name, _u.STATE_STALE)
            counts["marked_stale"] += 1
        elif anchor > stale_cutoff and current == _u.STATE_STALE:
            # Skill got used again after being marked stale — reactivate.
            _u.set_state(name, _u.STATE_ACTIVE)
            counts["reactivated"] += 1
    return counts
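The branch ladder above reduces to a pure decision on (anchor timestamp, current state), with archive taking precedence over stale and a forward-moving anchor reactivating a stale skill. A table-style sketch of that decision (the string constants stand in for the STATE_* names in tools/skill_usage):

```python
from datetime import datetime, timedelta, timezone

ACTIVE, STALE, ARCHIVED = "active", "stale", "archived"

def decide_transition(anchor: datetime, current: str, now: datetime,
                      stale_after_days: int = 30,
                      archive_after_days: int = 90):
    """Return the new state for a skill, or None for no change.

    Archive wins over stale; a stale skill whose anchor is newer than the
    stale cutoff (it was used again) goes back to active."""
    stale_cutoff = now - timedelta(days=stale_after_days)
    archive_cutoff = now - timedelta(days=archive_after_days)
    if anchor <= archive_cutoff and current != ARCHIVED:
        return ARCHIVED
    if anchor <= stale_cutoff and current == ACTIVE:
        return STALE
    if anchor > stale_cutoff and current == STALE:
        return ACTIVE
    return None
```

Because archive_cutoff is always older than stale_cutoff, an anchor old enough to archive never falls through to the stale branch; an already-archived skill stays put regardless of age.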
# ---------------------------------------------------------------------------
# Review prompt for the forked agent
# ---------------------------------------------------------------------------
CURATOR_REVIEW_PROMPT = (
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations

Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output:

**Umbrella-first framing.** The original 'decide keep/patch/archive/consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods.

**Three concrete consolidation methods.** (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213).

**Two observed bailouts pre-empted:** 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).

**Config-aware parent inheritance.** _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included.

**Unbounded iteration ceiling.** max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget.

**Tests updated:** test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named.

**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
"You are running as Hermes' background skill CURATOR. This is an "
"UMBRELLA-BUILDING consolidation pass, not a passive audit and not a "
"duplicate-finder.\n\n"
"The goal of the skill collection is a LIBRARY OF CLASS-LEVEL "
"INSTRUCTIONS AND EXPERIENTIAL KNOWLEDGE. A collection of hundreds of "
"narrow skills where each one captures one session's specific bug is "
"a FAILURE of the library — not a feature. An agent searching skills "
"matches on descriptions, not on exact names; one broad umbrella "
"skill with labeled subsections beats five narrow siblings for "
"discoverability, not the other way around.\n\n"
"The right target shape is CLASS-LEVEL skills with rich SKILL.md "
"bodies + `references/`, `templates/`, and `scripts/` subfiles for "
"session-specific detail — not one-session-one-skill micro-entries.\n\n"
"Hard rules — do not violate:\n"
"1. DO NOT touch bundled or hub-installed skills. The candidate list "
"below is already filtered to agent-created skills only.\n"
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: **Umbrella-first framing.** The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. **Three concrete consolidation methods.** (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). **Two observed bailouts pre-empted:** 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). **Config-aware parent inheritance.** _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. **Unbounded iteration ceiling.** max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. 
A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. **Tests updated:** test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. **Live validation on author's setup:** - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
"2. DO NOT delete any skill. Archiving (moving the skill's directory "
"into ~/.hermes/skills/.archive/) is the maximum destructive action. "
"Archives are recoverable; deletion is not.\n"
"3. DO NOT touch skills shown as pinned=yes. Skip them entirely.\n"
"4. DO NOT use usage counters as a reason to skip consolidation. The "
"counters are new and often mostly zero. Judge overlap on CONTENT, "
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
"absence of evidence either way.\n"
"5. DO NOT reject consolidation on the grounds that 'each skill has "
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
"right bar is: 'would a human maintainer write this as N separate "
"skills, or as one skill with N labeled subsections?' When the "
"answer is the latter, merge.\n\n"
"How to work — not optional:\n"
"1. Scan the full candidate list. Identify PREFIX CLUSTERS (skills "
"sharing a first word or domain keyword). Examples you are likely "
"to find: hermes-config-*, hermes-dashboard-*, gateway-*, codex-*, "
"ollama-*, anthropic-*, gemini-*, mcp-*, salvage-*, pr-*, "
"competitor-*, python-*, security-*, etc. Expect 10-25 clusters.\n"
"2. For each cluster with 2+ members, do NOT ask 'are these pairs "
"overlapping?' — ask 'what is the UMBRELLA CLASS these skills all "
"serve? Would a maintainer name that class and write one skill for "
"it?' If yes, pick (or create) the umbrella and absorb the siblings "
"into it.\n"
"3. Three ways to consolidate — use the right one per cluster:\n"
" a. MERGE INTO EXISTING UMBRELLA — one skill in the cluster is "
"already broad enough to be the umbrella (example: `pr-triage-"
"salvage` for the PR review cluster). Patch it to add a labeled "
"section for each sibling's unique insight, then archive the "
"siblings.\n"
" b. CREATE A NEW UMBRELLA SKILL.md — no existing member is broad "
"enough. Use skill_manage action=create to write a new class-level "
"skill whose SKILL.md covers the shared workflow and has short "
"labeled subsections. Archive the now-absorbed narrow siblings.\n"
" c. DEMOTE TO REFERENCES/TEMPLATES/SCRIPTS — a sibling has "
"narrow-but-valuable session-specific content. Move it into the "
"umbrella's appropriate support directory:\n"
" • `references/<topic>.md` for session-specific detail OR "
"condensed knowledge banks (quoted research, API docs excerpts, "
"domain notes, provider quirks, reproduction recipes)\n"
" • `templates/<name>.<ext>` for starter files meant to be "
"copied and modified\n"
" • `scripts/<name>.<ext>` for statically re-runnable actions "
"(verification scripts, fixture generators, probes)\n"
" Then archive the old sibling. Use `terminal` with `mkdir -p "
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
"references/<topic>.md` (or templates/ / scripts/).\n"
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
"a feature codename, a specific error string, an 'audit' / "
"'diagnosis' / 'salvage' session artifact). These almost always "
"belong as a subsection or support file under a class-level umbrella.\n"
"5. Iterate. After one consolidation round, scan the remaining set "
"and look for the NEXT umbrella opportunity. Don't stop after 3 "
"merges.\n\n"
"Your toolset:\n"
" - skills_list, skill_view — read the current landscape\n"
" - skill_manage action=patch — add sections to the umbrella\n"
" - skill_manage action=create — create a new umbrella SKILL.md\n"
" - skill_manage action=write_file — add a references/, templates/, "
"or scripts/ file under an existing skill (the skill must already "
"exist)\n"
" - terminal — mv a sibling into the archive "
"OR move its content into a support subfile\n\n"
"'keep' is a legitimate decision ONLY when the skill is already a "
"class-level umbrella and none of the proposed merges would improve "
"discoverability. 'This is narrow but distinct from its siblings' "
"is NOT a reason to keep — it's a reason to move it under an "
"umbrella as a subsection or support file.\n\n"
"Expected output: real umbrella-ification. Process every obvious "
"cluster. If you end the pass with fewer than 10 archives, you "
"stopped too early — go back and look at the clusters you left "
"alone.\n\n"
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941) * fix(curator): split 'archived' into consolidated vs pruned in run reports Users who watched a curator run saw skills like 'anthropic-api' listed under 'Skills archived' and interpreted that as pruning — but the curator had actually absorbed those skills into a new umbrella (e.g. 'llm-providers') during the same run. The directory gets archived for safety (all removals are recoverable), but the content still lives under a different name. Users then 'restored' what they thought were deleted skills and ended up with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella). Classify removed skills using this run's skill_manage tool calls: - consolidated: content absorbed into a surviving/newly-created skill (evidenced by a skill_manage write_file/patch/create/edit whose target is a different skill AND whose file_path/content references the removed skill's name) - pruned: archived without consolidation evidence (truly stale) REPORT.md now shows two distinct sections: - 'Consolidated into umbrella skills' — with `removed → merged into umbrella` - 'Pruned — archived for staleness' — pure staleness archives run.json schema additions (backward compatible): - counts.consolidated_this_run, counts.pruned_this_run - consolidated: [{name, into, evidence}, ...] - pruned: [names] - archived: retained as the union for backward compat Also: relabel the auto-transitions 'archived' counter to 'archived (no LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass archives. Tests: 9 new tests in test_curator_classification.py covering consolidation evidence parsing (write_file/patch/create), hyphen/underscore name variants, self-reference rejection, destination-must-exist, mixed runs, and malformed-JSON fallback safety. Existing test_report_md_is_human_readable updated to cover the new section names. 
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified end-to-end. * feat(curator): hybrid model-declared + heuristic classification Extend the consolidated-vs-pruned split with LLM-authored intent: 1. Curator prompt now requires a structured YAML block at the end of the final response (consolidations / prunings with short rationale). 2. _parse_structured_summary() extracts it tolerantly — missing block, malformed YAML, partial lists all fall back to heuristic cleanly. 3. _reconcile_classification() merges model intent with the tool-call heuristic: - Model wins on rationale when its umbrella exists post-run - Model hallucination (umbrella doesn't exist) is downgraded to the heuristic's finding, or pruned if there's no evidence either - Heuristic catches model omission — consolidations the model enumerated tools for but forgot to list get surfaced with a '(detected via tool-call audit)' tag 4. REPORT.md now shows per-row rationale alongside 'removed → umbrella' and flags audit-only rows so the user knows why no reason is shown. Backward compat: run.json's 'archived' field (union) is preserved. 'pruned' is now a list of dicts with {name, source, reason}; 'pruned_names' is the flat-name list for legacy consumers. Tests: 15 new covering YAML parse edge cases (malformed, empty lists, bare-string entries, missing fields), reconciler rules (model wins, hallucination fallback, heuristic catches omission, prune with reason), and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
"When done, write a human summary AND a structured machine-readable "
"block so downstream tooling can distinguish consolidation from "
"pruning. Format EXACTLY:\n\n"
"## Structured summary (required)\n"
"```yaml\n"
"consolidations:\n"
" - from: <old-skill-name>\n"
" into: <umbrella-skill-name>\n"
" reason: <one short sentence — why merged, not just 'similar'>\n"
"prunings:\n"
" - name: <skill-name>\n"
" reason: <one short sentence — why archived with no merge target>\n"
"```\n\n"
"Every skill you moved to .archive/ MUST appear in exactly one of the "
"two lists. If you consolidated X into umbrella Y (patched Y, wrote "
"a references file to Y, or created Y with X's content absorbed), X "
"goes under `consolidations` with `into: Y`. If you archived X with "
"no absorption — truly stale, irrelevant, or obsolete — X goes under "
"`prunings`. Leave a list empty (`consolidations: []`) if none. Do "
"not omit the block. The block comes AFTER your human-readable "
"summary of clusters processed, patches made, and decisions left alone."
)
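The structured-summary contract spelled out in the prompt above is what `_parse_structured_summary()` has to consume, and per the #17941 commit it must tolerate a missing block, malformed YAML, and partial lists by degrading cleanly. A minimal stdlib-only sketch of that tolerant extraction, with a hypothetical function name and a hand-rolled line parser in place of a YAML library, so the shapes here are assumptions rather than the shipped implementation:

```python
# Illustrative sketch only (hypothetical name): tolerantly extract the required
# structured-summary block from the reviewer's final response. Missing block,
# malformed lines, or empty lists all degrade to empty results, mirroring the
# "falls back to heuristic cleanly" behavior the #17941 commit describes.
import re
from typing import Dict, List

TICKS = chr(96) * 3  # the three-backtick fence marker, built to avoid literals

_BLOCK_RE = re.compile(TICKS + r"yaml\s*\n(.*?)" + TICKS, re.DOTALL)


def parse_structured_summary(final_text: str) -> Dict[str, List[dict]]:
    """Return {'consolidations': [...], 'prunings': [...]}; never raise."""
    out: Dict[str, List[dict]] = {"consolidations": [], "prunings": []}
    m = _BLOCK_RE.search(final_text or "")
    if not m:
        return out
    section = None
    current: dict = {}
    for line in m.group(1).splitlines():
        if not line.strip():
            continue
        head = re.match(r"(consolidations|prunings):", line)
        if head:
            # Section header; "prunings: []" lands here and stays empty.
            section = head.group(1)
            current = {}
            continue
        item = re.match(r"\s*-\s+(\w+):\s*(.*)$", line)
        if item and section:
            # New list entry: "- from: <name>" or "- name: <name>".
            current = {item.group(1): item.group(2).strip()}
            out[section].append(current)
            continue
        field = re.match(r"\s+(\w+):\s*(.*)$", line)
        if field and current:
            # Continuation field on the current entry (into:, reason:, ...).
            current[field.group(1)] = field.group(2).strip()
    return out
```

Anything that fails to match simply yields empty lists, leaving downstream classification free to fall back to the tool-call audit.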
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307) Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files: - `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any) - `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`. Internals: - `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity - Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other - Report path stamped into `.curator_state` as `last_report_path` - `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run Tests (all green): - 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs - Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
# ---------------------------------------------------------------------------
# Per-run reports — {YYYYMMDD-HHMMSS}/run.json + REPORT.md under logs/curator/
# ---------------------------------------------------------------------------
def _reports_root() -> Path:
"""Directory where curator run reports are written.
Lives under the profile-aware logs dir (``~/.hermes/logs/curator/``)
alongside ``agent.log`` and ``gateway.log`` so it's found by anyone
looking for operational telemetry, not mixed in with the user's
authored skill data in ``~/.hermes/skills/``.
fix(curator): seed defaults on update, create logs/curator dir, defer fire import (#17927) Three fixes bundled for curator reliability on existing installs and broken/partial installs: 1. run_agent.py: defer `import fire` into the __main__ block. `fire` is only used by `fire.Fire(main)` when running run_agent.py directly as a CLI — it is NOT needed for library usage. Importing it at module top made `from run_agent import AIAgent` from a daemon thread (e.g. the curator's forked review agent) crash with ModuleNotFoundError on broken/partial installs where `fire` isn't present. 2. hermes_cli/config.py: add version 22 → 23 migration that writes the `curator` + `auxiliary.curator` sections to config.yaml with their defaults, only filling keys the user hasn't overridden. Existing configs from before PR #16049 / the April 2026 `auxiliary.curator` unification had neither section on disk, so users couldn't see or edit the settings in their config.yaml (runtime deep-merge papered over it at read time, but the file never reflected reality). 3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates `~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on every CLI launch. Managed-mode (NixOS) variant mkdir's it defensively after the activation-script existence checks, since the activation script may not know about this subpath. 4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as belt-and-suspenders for entry paths that bypass both ensure_hermes_home() and the v23 migration (gateway-only installs, bare library use). E2E validated in isolated HERMES_HOME: fresh install gets full defaults seeded; partial-override config keeps user's `enabled: false` and custom `interval_hours` while filling the missing keys; re-running the migration is a no-op.
2026-04-30 04:52:28 -07:00
    ``ensure_hermes_home()`` pre-creates this dir on every CLI launch and
    the v22 → v23 migration backfills it for existing profiles, but we
still mkdir here as a belt-and-suspenders so the curator works even
from an odd entry path (e.g. gateway-only install, bare library use)
that bypasses both.
"""
root = get_hermes_home() / "logs" / "curator"
try:
root.mkdir(parents=True, exist_ok=True)
except OSError as e:
logger.debug("Curator reports dir create failed: %s", e)
return root
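The #17307 commit notes that a same-second rerun gets a numeric suffix so reports can't clobber each other. A minimal sketch of that allocation under the directory returned by `_reports_root()`; the helper name is hypothetical and the real report writer may differ:

```python
# Illustrative sketch (hypothetical helper name): allocate the dated per-run
# report directory. mkdir(exist_ok=False) doubles as the collision check, so a
# rerun within the same second falls through to a "-N" suffixed sibling
# instead of overwriting the earlier run's report.
import datetime
from pathlib import Path


def allocate_report_dir(root: Path) -> Path:
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    candidate = root / stamp
    suffix = 1
    while True:
        try:
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            candidate = root / f"{stamp}-{suffix}"
            suffix += 1
```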
def _classify_removed_skills(
removed: List[str],
added: List[str],
after_names: Set[str],
tool_calls: List[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
"""Split ``removed`` into consolidated vs pruned.
A removed skill is "consolidated" when the curator absorbed its content
into another skill (an umbrella) during this run; the content still
lives, just under a different name. A removed skill is "pruned" when the
curator archived it for staleness/irrelevance without preserving its
content elsewhere.
Heuristic: scan this run's ``skill_manage`` tool calls and look for
``write_file``/``patch``/``create``/``edit`` actions whose target skill
(the ``name`` argument) is NOT the removed skill and whose
``file_path`` / ``file_content`` / ``content`` arguments reference the
removed skill's name. That's the textbook "absorbed into umbrella"
signal. Ties are broken by first-match (earliest tool call wins).
Returns ``{"consolidated": [{"name", "into", "evidence"}, ...],
"pruned": [{"name"}, ...]}``.
"""
consolidated: List[Dict[str, Any]] = []
pruned: List[Dict[str, Any]] = []
# Pre-parse tool calls: we only care about skill_manage.
parsed_calls: List[Dict[str, Any]] = []
for tc in tool_calls or []:
if not isinstance(tc, dict):
continue
if tc.get("name") != "skill_manage":
continue
raw = tc.get("arguments") or ""
# Arguments can be a JSON string (standard) or a dict (defensive).
args: Dict[str, Any] = {}
if isinstance(raw, dict):
args = raw
elif isinstance(raw, str):
try:
args = json.loads(raw)
except Exception:
# Truncated or malformed — fall back to substring match on
# the raw string so we still catch the common case.
args = {"_raw": raw}
if not isinstance(args, dict):
continue
parsed_calls.append(args)
# Build a set of "destination" skill names: anything still present after
# the run plus anything newly added this run. A removed skill being
# referenced from one of these is the consolidation signal.
destinations = set(after_names) | set(added or [])
for name in removed:
if not name:
continue
into: Optional[str] = None
evidence: Optional[str] = None
# Normalise name variants we'll search for in path/content strings.
needles = {name, name.replace("-", "_"), name.replace("_", "-")}
for args in parsed_calls:
target = args.get("name")
if not isinstance(target, str) or not target:
continue
# A call that operates on the removed skill itself isn't
# consolidation evidence.
if target == name:
continue
# The target must be a surviving or newly-created skill —
# otherwise we're pointing to a skill that doesn't exist.
if target not in destinations:
continue
# Look for the removed skill's name in file_path / content / raw.
haystacks: List[str] = []
for key in ("file_path", "file_content", "content", "new_string", "_raw"):
v = args.get(key)
if isinstance(v, str):
haystacks.append(v)
hit = False
for hay in haystacks:
for needle in needles:
if needle and needle in hay:
hit = True
evidence = (
f"skill_manage action={args.get('action', '?')} "
f"on '{target}' referenced '{name}' "
f"in {hay[:80]}"
)
break
if hit:
break
if hit:
into = target
break
if into:
consolidated.append({"name": name, "into": into, "evidence": evidence})
else:
pruned.append({"name": name})
return {"consolidated": consolidated, "pruned": pruned}
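A minimal, self-contained sketch of the "absorbed into umbrella" signal the heuristic above looks for. All skill and file names here are hypothetical:

```python
# Hypothetical data illustrating the consolidation signal.
removed = "anthropic-api"
# Name variants searched in path/content strings, as in the heuristic above.
needles = {removed, removed.replace("-", "_"), removed.replace("_", "-")}
call = {
    "name": "llm-providers",  # target skill of the skill_manage call
    "action": "write_file",
    "file_path": "llm-providers/anthropic_api.md",
}
destinations = {"llm-providers"}  # survived or newly created this run
hit = (
    call["name"] != removed               # self-reference is not evidence
    and call["name"] in destinations      # target must actually exist
    and any(n in call["file_path"] for n in needles)
)
# hit is True: the underscore variant appears in file_path
```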
def _parse_structured_summary(
llm_final: str,
) -> Dict[str, List[Dict[str, str]]]:
"""Extract the structured YAML block from the curator's final response.
The curator prompt requires a fenced ```yaml block under
``## Structured summary (required)`` with ``consolidations:`` and
``prunings:`` lists. This parses it tolerantly:
- Missing block returns empty lists (we'll fall back to heuristic).
- Malformed YAML returns empty lists and we rely on heuristic.
- Partial block (e.g. only consolidations) returns what we could parse.
Returns ``{"consolidations": [{"from", "into", "reason"}, ...],
"prunings": [{"name", "reason"}, ...]}``.
"""
empty = {"consolidations": [], "prunings": []}
if not llm_final or not isinstance(llm_final, str):
return empty
# Find the YAML fenced block. We look for ```yaml ... ``` specifically
# rather than any fenced block so we don't accidentally pick up a code
# sample the model quoted elsewhere.
import re
match = re.search(
r"```ya?ml\s*\n(.*?)\n```",
llm_final,
re.DOTALL | re.IGNORECASE,
)
if not match:
return empty
body = match.group(1)
# Prefer PyYAML when available — every hermes install already has it
# (config.yaml loader). If it is missing or the block is malformed,
# return empty and let the tool-call heuristic carry the classification.
try:
import yaml # type: ignore
data = yaml.safe_load(body)
except Exception:
return empty
if not isinstance(data, dict):
return empty
out: Dict[str, List[Dict[str, str]]] = {"consolidations": [], "prunings": []}
cons_raw = data.get("consolidations") or []
prun_raw = data.get("prunings") or []
if isinstance(cons_raw, list):
for entry in cons_raw:
if not isinstance(entry, dict):
continue
frm = entry.get("from")
into = entry.get("into")
if not (isinstance(frm, str) and frm.strip()
and isinstance(into, str) and into.strip()):
continue
reason = entry.get("reason")
out["consolidations"].append({
"from": frm.strip(),
"into": into.strip(),
"reason": (reason or "").strip() if isinstance(reason, str) else "",
})
if isinstance(prun_raw, list):
for entry in prun_raw:
if not isinstance(entry, dict):
continue
name = entry.get("name")
if not (isinstance(name, str) and name.strip()):
continue
reason = entry.get("reason")
out["prunings"].append({
"name": name.strip(),
"reason": (reason or "").strip() if isinstance(reason, str) else "",
})
return out
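A sketch of the structured block the parser expects, with the same regex extracting it. Skill names are made up, and the triple-backtick fence is assembled programmatically only so the example itself nests cleanly:

```python
import re

FENCE = "`" * 3  # triple-backtick, built programmatically so it nests here
final = (
    "Consolidated the provider skills.\n\n"
    "## Structured summary (required)\n\n"
    + FENCE + "yaml\n"
    "consolidations:\n"
    "  - from: anthropic-api\n"
    "    into: llm-providers\n"
    "    reason: provider docs merged under one umbrella\n"
    "prunings:\n"
    "  - name: scratch-notes\n"
    "    reason: unused for 90 days\n"
    + FENCE + "\n"
)
# Same pattern as _parse_structured_summary above.
pattern = FENCE + r"ya?ml\s*\n(.*?)\n" + FENCE
match = re.search(pattern, final, re.DOTALL | re.IGNORECASE)
body = match.group(1)  # the YAML body, ready for yaml.safe_load
```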
def _reconcile_classification(
removed: List[str],
heuristic: Dict[str, List[Dict[str, Any]]],
model_block: Dict[str, List[Dict[str, str]]],
destinations: Set[str],
) -> Dict[str, List[Dict[str, Any]]]:
"""Merge heuristic (tool-call evidence) with the model's structured block.
Rules:
- Model-declared consolidation wins when its ``into`` target exists
in ``destinations`` (survived or newly-created). This gives the
model authority over intent + rationale.
- Model-declared consolidation whose ``into`` target does NOT exist is
downgraded: the model hallucinated an umbrella. We prefer the
heuristic's finding for that skill, or fall back to pruned.
- Heuristic-only finding (model didn't mention it, tool calls confirm)
is preserved as a consolidation, marked ``source="tool-call audit"``.
- Model-declared pruning is accepted unless the heuristic has
tool-call evidence that contradicts it; in that rare case the
heuristic wins and the row is tagged as audit-detected.
Every removed skill is placed in exactly one bucket.
"""
heur_cons = {e["name"]: e for e in heuristic.get("consolidated", [])}
heur_pruned = {e["name"] for e in heuristic.get("pruned", [])}
model_cons = {e["from"]: e for e in model_block.get("consolidations", [])}
model_pruned = {e["name"]: e for e in model_block.get("prunings", [])}
consolidated: List[Dict[str, Any]] = []
pruned: List[Dict[str, Any]] = []
for name in removed:
mc = model_cons.get(name)
mp = model_pruned.get(name)
hc = heur_cons.get(name)
# Model says consolidated — trust it if the destination is real.
if mc and mc.get("into") in destinations:
entry: Dict[str, Any] = {
"name": name,
"into": mc["into"],
"source": "model" + ("+audit" if hc else ""),
"reason": mc.get("reason") or "",
}
if hc and hc.get("evidence"):
entry["evidence"] = hc["evidence"]
consolidated.append(entry)
continue
# Model says consolidated but the umbrella doesn't exist —
# hallucination. Fall back to heuristic or prune.
if mc and mc.get("into") not in destinations:
if hc:
consolidated.append({
"name": name,
"into": hc["into"],
"source": "tool-call audit (model named missing umbrella)",
"reason": "",
"evidence": hc.get("evidence", ""),
"model_claimed_into": mc["into"],
})
else:
pruned.append({
"name": name,
"source": "fallback (model named missing umbrella, no tool-call evidence)",
"reason": "",
})
continue
# Heuristic found consolidation the model didn't mention.
if hc:
consolidated.append({
"name": name,
"into": hc["into"],
"source": "tool-call audit (model omitted from structured block)",
"reason": "",
"evidence": hc.get("evidence", ""),
})
continue
# Model says pruned (or no mention + no heuristic evidence).
reason = mp.get("reason", "") if mp else ""
pruned.append({
"name": name,
"source": "model" if mp else "no-evidence fallback",
"reason": reason,
})
return {"consolidated": consolidated, "pruned": pruned}
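A condensed sketch of the first reconciliation rule (model-declared consolidation wins when its umbrella survived the run), with hypothetical skill names. The real function above additionally handles hallucinated umbrellas, model omissions, and pruning reasons:

```python
removed = ["anthropic-api", "scratch-notes"]
destinations = {"llm-providers"}  # skills that exist after the run
model_cons = {
    "anthropic-api": {"from": "anthropic-api", "into": "llm-providers",
                      "reason": "merged under provider umbrella"},
}
buckets = {}
for name in removed:
    mc = model_cons.get(name)
    if mc and mc["into"] in destinations:
        buckets[name] = ("consolidated", mc["into"])   # model wins
    else:
        buckets[name] = ("pruned", None)               # no-evidence fallback
```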
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)

Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:

- `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command

Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`.

Internals:
- `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run

Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature

Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
def _write_run_report(
*,
started_at: datetime,
elapsed_seconds: float,
auto_counts: Dict[str, int],
auto_summary: str,
before_report: List[Dict[str, Any]],
before_names: Set[str],
after_report: List[Dict[str, Any]],
llm_meta: Dict[str, Any],
) -> Optional[Path]:
"""Write run.json + REPORT.md under logs/curator/{YYYYMMDD-HHMMSS}/.
Returns the report directory path on success, None if the write
couldn't happen (caller logs and continues — reporting is best-effort).
"""
root = _reports_root()
try:
root.mkdir(parents=True, exist_ok=True)
except Exception as e:
logger.debug("Curator report dir create failed: %s", e)
return None
stamp = started_at.strftime("%Y%m%d-%H%M%S")
run_dir = root / stamp
# If we crash-reran within the same second, append a disambiguator
suffix = 1
while run_dir.exists():
suffix += 1
run_dir = root / f"{stamp}-{suffix}"
try:
run_dir.mkdir(parents=True, exist_ok=False)
except Exception as e:
logger.debug("Curator run dir create failed: %s", e)
return None
# Diff before/after
after_by_name = {r.get("name"): r for r in after_report if isinstance(r, dict)}
after_names = set(after_by_name.keys())
removed = sorted(before_names - after_names) # archived during this run
added = sorted(after_names - before_names) # new skills this run
before_by_name = {r.get("name"): r for r in before_report if isinstance(r, dict)}
# State transitions between the two snapshots (e.g. active -> stale)
transitions: List[Dict[str, str]] = []
for name in sorted(after_names & before_names):
s_before = (before_by_name.get(name) or {}).get("state")
s_after = (after_by_name.get(name) or {}).get("state")
if s_before and s_after and s_before != s_after:
transitions.append({"name": name, "from": s_before, "to": s_after})
# Classify LLM tool calls
tc_counts: Dict[str, int] = {}
for tc in llm_meta.get("tool_calls", []) or []:
if not isinstance(tc, dict):
continue
name = tc.get("name", "unknown")
tc_counts[name] = tc_counts.get(name, 0) + 1
# Split "removed" into consolidated (absorbed into umbrella) vs pruned
# (archived for staleness, content not preserved elsewhere). The old
# "Skills archived" section lumped both together, which misled users
# into thinking consolidated skills had been pruned.
#
# Classification strategy:
# 1. Parse the curator's structured YAML block from its final response.
# The curator is now prompted to emit consolidations/prunings lists
# with short rationale. The model has intent visibility the tool
# calls don't.
# 2. Run the tool-call heuristic as a ground-truth audit.
# 3. Reconcile: model gets authority over intent + rationale, heuristic
# catches hallucination (umbrella doesn't exist) and omission
# (model forgot to list an actual consolidation).
heuristic = _classify_removed_skills(
removed=removed,
added=added,
after_names=after_names,
tool_calls=llm_meta.get("tool_calls", []) or [],
)
model_block = _parse_structured_summary(llm_meta.get("final", "") or "")
destinations = set(after_names) | set(added or [])
classification = _reconcile_classification(
removed=removed,
heuristic=heuristic,
model_block=model_block,
destinations=destinations,
)
consolidated = classification["consolidated"]
pruned = classification["pruned"]
payload = {
"started_at": started_at.isoformat(),
"duration_seconds": round(elapsed_seconds, 2),
"model": llm_meta.get("model", ""),
"provider": llm_meta.get("provider", ""),
"auto_transitions": auto_counts,
"counts": {
"before": len(before_names),
"after": len(after_names),
"delta": len(after_names) - len(before_names),
"archived_this_run": len(removed),
"added_this_run": len(added),
"consolidated_this_run": len(consolidated),
"pruned_this_run": len(pruned),
"state_transitions": len(transitions),
"tool_calls_total": sum(tc_counts.values()),
},
"tool_call_counts": tc_counts,
"archived": removed,
"consolidated": consolidated,
"pruned": pruned,
"pruned_names": [p["name"] for p in pruned],
"added": added,
"state_transitions": transitions,
"llm_final": llm_meta.get("final", ""),
"llm_summary": llm_meta.get("summary", ""),
"llm_error": llm_meta.get("error"),
"tool_calls": llm_meta.get("tool_calls", []),
}
# run.json — machine-readable, full fidelity
try:
(run_dir / "run.json").write_text(
json.dumps(payload, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
except Exception as e:
logger.debug("Curator run.json write failed: %s", e)
# REPORT.md — human-readable
try:
md = _render_report_markdown(payload)
(run_dir / "REPORT.md").write_text(md, encoding="utf-8")
except Exception as e:
logger.debug("Curator REPORT.md write failed: %s", e)
return run_dir
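A sketch of the run.json shape the writer above emits. Field names follow the payload construction; all values and skill names are hypothetical:

```python
import json

payload = {
    "counts": {
        "before": 12, "after": 10, "delta": -2,
        "archived_this_run": 3, "added_this_run": 1,
        "consolidated_this_run": 2, "pruned_this_run": 1,
    },
    # "archived" stays the union of both buckets for backward compat.
    "archived": ["anthropic-api", "openai-api", "scratch-notes"],
    "consolidated": [
        {"name": "anthropic-api", "into": "llm-providers", "source": "model",
         "reason": "merged under provider umbrella"},
        {"name": "openai-api", "into": "llm-providers",
         "source": "tool-call audit (model omitted from structured block)",
         "reason": ""},
    ],
    "pruned": [{"name": "scratch-notes", "source": "model",
                "reason": "unused for 90 days"}],
    "pruned_names": ["scratch-notes"],  # flat list for legacy consumers
}
text = json.dumps(payload, indent=2, ensure_ascii=False) + "\n"
```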
def _render_report_markdown(p: Dict[str, Any]) -> str:
"""Render the human-readable report."""
lines: List[str] = []
started = p.get("started_at", "")
duration = p.get("duration_seconds", 0) or 0
mins, secs = divmod(int(duration), 60)
dur_label = f"{mins}m {secs}s" if mins else f"{secs}s"
lines.append(f"# Curator run — {started}\n")
model = p.get("model") or "(not resolved)"
prov = p.get("provider") or "(not resolved)"
counts = p.get("counts") or {}
lines.append(
f"Model: `{model}` via `{prov}` · Duration: {dur_label} · "
f"Agent-created skills: {counts.get('before', 0)} → {counts.get('after', 0)} "
f"({counts.get('delta', 0):+d})\n"
)
error = p.get("llm_error")
if error:
lines.append(f"> ⚠ LLM pass error: `{error}`\n")
# Auto-transitions (pure, no LLM)
auto = p.get("auto_transitions") or {}
lines.append("## Auto-transitions (pure, no LLM)\n")
lines.append(f"- checked: {auto.get('checked', 0)}")
lines.append(f"- marked stale: {auto.get('marked_stale', 0)}")
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941) * fix(curator): split 'archived' into consolidated vs pruned in run reports Users who watched a curator run saw skills like 'anthropic-api' listed under 'Skills archived' and interpreted that as pruning — but the curator had actually absorbed those skills into a new umbrella (e.g. 'llm-providers') during the same run. The directory gets archived for safety (all removals are recoverable), but the content still lives under a different name. Users then 'restored' what they thought were deleted skills and ended up with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella). Classify removed skills using this run's skill_manage tool calls: - consolidated: content absorbed into a surviving/newly-created skill (evidenced by a skill_manage write_file/patch/create/edit whose target is a different skill AND whose file_path/content references the removed skill's name) - pruned: archived without consolidation evidence (truly stale) REPORT.md now shows two distinct sections: - 'Consolidated into umbrella skills' — with `removed → merged into umbrella` - 'Pruned — archived for staleness' — pure staleness archives run.json schema additions (backward compatible): - counts.consolidated_this_run, counts.pruned_this_run - consolidated: [{name, into, evidence}, ...] - pruned: [names] - archived: retained as the union for backward compat Also: relabel the auto-transitions 'archived' counter to 'archived (no LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass archives. Tests: 9 new tests in test_curator_classification.py covering consolidation evidence parsing (write_file/patch/create), hyphen/underscore name variants, self-reference rejection, destination-must-exist, mixed runs, and malformed-JSON fallback safety. Existing test_report_md_is_human_readable updated to cover the new section names. 
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified end-to-end. * feat(curator): hybrid model-declared + heuristic classification Extend the consolidated-vs-pruned split with LLM-authored intent: 1. Curator prompt now requires a structured YAML block at the end of the final response (consolidations / prunings with short rationale). 2. _parse_structured_summary() extracts it tolerantly — missing block, malformed YAML, partial lists all fall back to heuristic cleanly. 3. _reconcile_classification() merges model intent with the tool-call heuristic: - Model wins on rationale when its umbrella exists post-run - Model hallucination (umbrella doesn't exist) is downgraded to the heuristic's finding, or pruned if there's no evidence either - Heuristic catches model omission — consolidations the model enumerated tools for but forgot to list get surfaced with a '(detected via tool-call audit)' tag 4. REPORT.md now shows per-row rationale alongside 'removed → umbrella' and flags audit-only rows so the user knows why no reason is shown. Backward compat: run.json's 'archived' field (union) is preserved. 'pruned' is now a list of dicts with {name, source, reason}; 'pruned_names' is the flat-name list for legacy consumers. Tests: 15 new covering YAML parse edge cases (malformed, empty lists, bare-string entries, missing fields), reconciler rules (model wins, hallucination fallback, heuristic catches omission, prune with reason), and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
    lines.append(f"- archived (no LLM, pure time-based staleness): {auto.get('archived', 0)}")
    lines.append(f"- reactivated: {auto.get('reactivated', 0)}")
    lines.append("")

    # LLM pass numbers
    tc_counts = p.get("tool_call_counts") or {}
    lines.append("## LLM consolidation pass\n")
    lines.append(f"- tool calls: **{counts.get('tool_calls_total', 0)}** "
                 f"(by name: {', '.join(f'{k}={v}' for k, v in sorted(tc_counts.items())) or 'none'})")
    lines.append(f"- consolidated into umbrellas: **{counts.get('consolidated_this_run', 0)}**")
    lines.append(f"- pruned (archived for staleness): **{counts.get('pruned_this_run', 0)}**")
    lines.append(f"- new skills this run: **{counts.get('added_this_run', 0)}**")
    lines.append(f"- state transitions (active ↔ stale ↔ archived): "
                 f"**{counts.get('state_transitions', 0)}**")
    lines.append("")
    # Consolidated list — content absorbed into an umbrella. The directory
    # on disk still lives under ~/.hermes/skills/.archive/ (every removal is
    # recoverable by design), but the "live" content for these skills
    # continues to exist inside the destination umbrella.
    consolidated = p.get("consolidated") or []
    if consolidated:
        lines.append(f"### Consolidated into umbrella skills ({len(consolidated)})\n")
        lines.append(
            "_These skills were **absorbed into another skill** during this run — "
            "their content still lives, just under a different name. "
            "The original directory was moved to `~/.hermes/skills/.archive/` for "
            "safety and can be restored via `hermes curator restore <name>` if the "
            "consolidation was wrong._\n"
        )
        SHOW = 50
        for entry in consolidated[:SHOW]:
            name = entry.get("name", "?")
            into = entry.get("into", "?")
            reason = (entry.get("reason") or "").strip()
            source = entry.get("source", "")
            line = f"- `{name}` → merged into `{into}`"
            if reason:
                line += f" — {reason}"
            if source and source.startswith("tool-call audit"):
                # The model didn't enumerate this one — surface that to the
                # user so they know why the row has no rationale.
                line += f" _(detected via {source})_"
            lines.append(line)
            if entry.get("model_claimed_into"):
                lines.append(
                    f"  ⚠ The curator's summary named `{entry['model_claimed_into']}` "
                    "as the umbrella but that skill doesn't exist post-run; "
                    "showing the tool-call audit's finding instead."
                )
        if len(consolidated) > SHOW:
            lines.append(f"- … and {len(consolidated) - SHOW} more (see `run.json`)")
        lines.append("")

    # Pruned list — archived without consolidation. These are the
    # "stale skill pruned" cases the UI should mark clearly.
    pruned = p.get("pruned") or []
    if pruned:
        lines.append(f"### Pruned — archived for staleness ({len(pruned)})\n")
        lines.append(
            "_These skills were archived without being merged into an umbrella "
            "(e.g. stale, unused, or judged irrelevant). "
            "Directories live under `~/.hermes/skills/.archive/`. "
            "Restore any via `hermes curator restore <name>`._\n"
        )
        SHOW = 50
        for entry in pruned[:SHOW]:
            # Entries are dicts with {name, source, reason} when written via
            # the reconciler, or bare strings when an older format slipped
            # through. Handle both.
            if isinstance(entry, dict):
                name = entry.get("name", "?")
                reason = (entry.get("reason") or "").strip()
                line = f"- `{name}`"
                if reason:
                    line += f" — {reason}"
                lines.append(line)
            else:
                lines.append(f"- `{entry}`")
        if len(pruned) > SHOW:
            lines.append(f"- … and {len(pruned) - SHOW} more (see `run.json`)")
    lines.append("")

    # Added list
    added = p.get("added") or []
    if added:
        lines.append(f"### New skills this run ({len(added)})\n")
        lines.append("_Usually these are new class-level umbrellas created via `skill_manage action=create`._\n")
        for n in added:
            lines.append(f"- `{n}`")
        lines.append("")

    # State transitions
    trans = p.get("state_transitions") or []
    if trans:
        lines.append(f"### State transitions ({len(trans)})\n")
        for t in trans:
            lines.append(f"- `{t.get('name')}`: {t.get('from')} → {t.get('to')}")
        lines.append("")

    # Full LLM final response
    final = (p.get("llm_final") or "").strip()
    if final:
        lines.append("## LLM final summary\n")
        lines.append(final)
        lines.append("")
    elif not error:
        llm_sum = p.get("llm_summary") or ""
        if llm_sum:
            lines.append("## LLM summary\n")
            lines.append(llm_sum)
            lines.append("")

    # Recovery footer
    lines.append("## Recovery\n")
    lines.append("- Restore an archived skill: `hermes curator restore <name>`")
    lines.append("- All archives live under `~/.hermes/skills/.archive/` and are recoverable by `mv`")
    lines.append("- See `run.json` in this directory for the full machine-readable record.")
    lines.append("")
    return "\n".join(lines)
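The consolidated-vs-pruned split described in the commit notes above can be sketched roughly as follows. This is a hypothetical simplification, not the module's actual classifier: here tool calls are plain dicts with `tool`, `skill` (target) and `content` keys, while the real audit also inspects `file_path`, more `skill_manage` actions, and reconciles with the model's YAML summary.

```python
def _norm(name: str) -> str:
    """Hyphen/underscore-insensitive comparison key."""
    return name.replace("_", "-").lower()


def classify_removed(removed, tool_calls, surviving):
    """Split removed skills into consolidated (evidence that content was
    absorbed into a surviving skill) vs pruned (no evidence: staleness)."""
    surviving_norm = {_norm(s) for s in surviving}
    consolidated, pruned = [], []
    for name in removed:
        needle = _norm(name)
        hit = next(
            (c for c in tool_calls
             if c.get("tool") == "skill_manage"
             and _norm(c.get("skill", "")) != needle          # reject self-reference
             and _norm(c.get("skill", "")) in surviving_norm  # destination must exist post-run
             and needle in _norm(c.get("content", ""))),      # content names the removed skill
            None,
        )
        if hit:
            consolidated.append({"name": name, "into": hit["skill"]})
        else:
            pruned.append(name)
    return consolidated, pruned


# Illustrative run: one umbrella write absorbing a removed skill
calls = [{"tool": "skill_manage", "skill": "llm-providers",
          "content": "Merged notes from anthropic_api and openai-api."}]
cons, pru = classify_removed(["anthropic-api", "scratch-notes"], calls, ["llm-providers"])
```

The name normalization is what lets `anthropic_api` in the written content match the removed directory `anthropic-api`, matching the hyphen/underscore variants the tests cover.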
feat(curator): background skill maintenance (issue #7816) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched files' neighbors green - 
2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.
2026-04-26 06:08:39 -07:00
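The pure, no-LLM state machine described in the commit message above (active → stale → archived, pinned skills bypassing all auto-transitions) could be sketched like this. The function name, thresholds, and defaults are illustrative, not the actual curator config keys:

```python
def next_state(current: str, days_idle: float, pinned: bool = False,
               stale_after: float = 30, archive_after: float = 90) -> str:
    """Time-based transition for one agent-created skill."""
    if pinned:
        return current            # pinned skills bypass all auto-transitions
    if days_idle < stale_after:
        return "active"           # recent activity (re)activates stale skills
    if days_idle < archive_after:
        return "stale"
    return "archived"             # archive only — never auto-deleted
```

Note the reactivation path: a stale skill that sees fresh activity drops back to active, which is what the `reactivated` counter in the report tracks.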
# ---------------------------------------------------------------------------
# Orchestrator — spawn a forked AIAgent for the LLM review pass
# ---------------------------------------------------------------------------
def _render_candidate_list() -> str:
    """Human/agent-readable list of agent-created skills with usage stats."""
    rows = skill_usage.agent_created_report()
    if not rows:
        return "No agent-created skills to review."
    lines = [f"Agent-created skills ({len(rows)}):\n"]
    for r in rows:
        lines.append(
            f"- {r['name']} "
            f"state={r['state']} "
            f"pinned={'yes' if r.get('pinned') else 'no'} "
            f"activity={r.get('activity_count', 0)} "
f"use={r.get('use_count', 0)} "
f"view={r.get('view_count', 0)} "
f"patches={r.get('patch_count', 0)} "
f"last_activity={r.get('last_activity_at') or 'never'}"
)
return "\n".join(lines)
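# The per-skill candidate line assembled above can be exercised in isolation.
# A minimal sketch with a hypothetical usage-record dict; ``render_line`` is an
# illustrative helper, not part of this module:

```python
def render_line(r: dict) -> str:
    # Mirrors the f-string fragments above: counts default to 0,
    # and a missing timestamp renders as "never".
    return (
        f"use={r.get('use_count', 0)} "
        f"view={r.get('view_count', 0)} "
        f"patches={r.get('patch_count', 0)} "
        f"last_activity={r.get('last_activity_at') or 'never'}"
    )

print(render_line({"use_count": 3, "view_count": 7}))
# -> use=3 view=7 patches=0 last_activity=never
```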
def run_curator_review(
on_summary: Optional[Callable[[str], None]] = None,
synchronous: bool = False,
) -> Dict[str, Any]:
"""Execute a single curator review pass.
Steps:
1. Apply automatic state transitions (pure, no LLM).
2. If there are agent-created skills, spawn a forked AIAgent that runs
the LLM review prompt against the current candidate list.
3. Update .curator_state with last_run_at and a one-line summary.
4. Invoke *on_summary* with a user-visible description.
If *synchronous* is True, the LLM review runs in the calling thread; the
default is to spawn a daemon thread so the caller returns immediately.
"""
start = datetime.now(timezone.utc)
counts = apply_automatic_transitions(now=start)
auto_summary_parts = []
if counts["marked_stale"]:
auto_summary_parts.append(f"{counts['marked_stale']} marked stale")
if counts["archived"]:
auto_summary_parts.append(f"{counts['archived']} archived")
if counts["reactivated"]:
auto_summary_parts.append(f"{counts['reactivated']} reactivated")
auto_summary = ", ".join(auto_summary_parts) if auto_summary_parts else "no changes"
# Persist state before the LLM pass so a crash mid-review still records
# the run and doesn't immediately re-trigger.
state = load_state()
state["last_run_at"] = start.isoformat()
state["run_count"] = int(state.get("run_count", 0)) + 1
state["last_run_summary"] = f"auto: {auto_summary}"
save_state(state)
def _llm_pass():
nonlocal auto_summary
# Snapshot skill state BEFORE the LLM pass so the report can diff.
try:
before_report = skill_usage.agent_created_report()
except Exception:
before_report = []
before_names = {r.get("name") for r in before_report if isinstance(r, dict)}
llm_meta: Dict[str, Any] = {}
try:
candidate_list = _render_candidate_list()
if "No agent-created skills" in candidate_list:
final_summary = f"auto: {auto_summary}; llm: skipped (no candidates)"
llm_meta = {
"final": "",
"summary": "skipped (no candidates)",
"model": "",
"provider": "",
"tool_calls": [],
"error": None,
}
else:
prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
llm_meta = _run_llm_review(prompt)
final_summary = (
f"auto: {auto_summary}; llm: {llm_meta.get('summary', 'no change')}"
)
except Exception as e:
logger.debug("Curator LLM pass failed: %s", e, exc_info=True)
final_summary = f"auto: {auto_summary}; llm: error ({e})"
llm_meta = {
"final": "",
"summary": f"error ({e})",
"model": "",
"provider": "",
"tool_calls": [],
"error": str(e),
}
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
state2 = load_state()
state2["last_run_duration_seconds"] = elapsed
state2["last_run_summary"] = final_summary
# Write the per-run report. Runs in a best-effort try so a
# reporting bug never breaks the curator itself. Report path is
# recorded in state so `hermes curator status` can point at it.
try:
after_report = skill_usage.agent_created_report()
except Exception:
after_report = []
try:
report_path = _write_run_report(
started_at=start,
elapsed_seconds=elapsed,
auto_counts=counts,
auto_summary=auto_summary,
before_report=before_report,
before_names=before_names,
after_report=after_report,
llm_meta=llm_meta,
)
if report_path is not None:
state2["last_report_path"] = str(report_path)
except Exception as e:
logger.debug("Curator report write failed: %s", e, exc_info=True)
save_state(state2)
if on_summary:
try:
on_summary(f"curator: {final_summary}")
except Exception:
pass
if synchronous:
_llm_pass()
else:
t = threading.Thread(target=_llm_pass, daemon=True, name="curator-review")
t.start()
return {
"started_at": start.isoformat(),
"auto_transitions": counts,
"summary_so_far": auto_summary,
}
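# ``apply_automatic_transitions`` (defined elsewhere in this module) drives the
# active -> stale -> archived lifecycle before the LLM pass. A minimal sketch of
# that state machine, assuming hypothetical day thresholds and field names; the
# real implementation may differ, but the invariants (pinned skills exempt,
# archive only, never auto-delete) are load-bearing:

```python
from datetime import datetime, timedelta, timezone


def next_state(state: str, last_activity: datetime, pinned: bool,
               now: datetime, stale_after_days: int = 30,
               archive_after_days: int = 60) -> str:
    # Pinned skills bypass all auto-transitions (curator invariant).
    if pinned:
        return state
    idle = now - last_activity
    if state == "active" and idle > timedelta(days=stale_after_days):
        return "stale"
    if state == "stale":
        if idle > timedelta(days=archive_after_days):
            return "archived"  # archive only -- never auto-delete
        if idle <= timedelta(days=stale_after_days):
            return "active"    # recent activity reactivates
    return state


now = datetime.now(timezone.utc)
print(next_state("active", now - timedelta(days=45), False, now))  # -> stale
```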
def _resolve_review_model(cfg: Dict[str, Any]) -> tuple[str, str]:
"""Pick (provider, model) for the curator review fork.
Curator is a regular auxiliary task slot ``auxiliary.curator.{provider,model}``
so it participates in the canonical aux-model plumbing (``hermes model``
auxiliary picker, the dashboard Models tab, ``auxiliary.curator.{timeout,
base_url,api_key,extra_body}``). ``provider: "auto"`` with an empty model
    means "use the main chat model", the same default as every other aux task.
Legacy fallback: users who configured ``curator.auxiliary.{provider,model}``
under the previous one-off schema still work. Precedence:
1. ``auxiliary.curator.{provider,model}`` when both are set non-auto
2. Legacy ``curator.auxiliary.{provider,model}`` when both are set
3. Main ``model.{provider,default/model}`` pair
"""
_main = cfg.get("model", {}) if isinstance(cfg.get("model"), dict) else {}
_main_provider = _main.get("provider") or "auto"
_main_model = _main.get("default") or _main.get("model") or ""
# 1. Canonical aux task slot
_aux = cfg.get("auxiliary", {}) if isinstance(cfg.get("auxiliary"), dict) else {}
_cur_task = _aux.get("curator", {}) if isinstance(_aux.get("curator"), dict) else {}
_task_provider = (_cur_task.get("provider") or "").strip() or None
_task_model = (_cur_task.get("model") or "").strip() or None
if _task_provider and _task_provider != "auto" and _task_model:
return _task_provider, _task_model
# 2. Legacy curator.auxiliary.{provider,model} (deprecated, pre-unification)
_cur = cfg.get("curator", {}) if isinstance(cfg.get("curator"), dict) else {}
_legacy = _cur.get("auxiliary", {}) if isinstance(_cur.get("auxiliary"), dict) else {}
_legacy_provider = _legacy.get("provider") or None
_legacy_model = _legacy.get("model") or None
if _legacy_provider and _legacy_model:
logger.info(
"curator: using deprecated curator.auxiliary.{provider,model} "
"config — please migrate to auxiliary.curator.{provider,model}"
)
return _legacy_provider, _legacy_model
# 3. Fall through to the main chat model
return _main_provider, _main_model
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307) Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files: - `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any) - `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`. Internals: - `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity - Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other - Report path stamped into `.curator_state` as `last_report_path` - `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run Tests (all green): - 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs - Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
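As a rough illustration of what such a `run.json` record could contain (only `final`, `summary`, `model`, `provider`, `tool_calls`, and `error` are documented fields of the review result; the remaining field names here are assumptions):

```python
import json

# Hypothetical run.json payload. Only final/summary/model/provider/
# tool_calls/error are documented; the other keys are illustrative.
record = {
    "model": "some/model-name",
    "provider": "openrouter",
    "duration_s": 391.4,
    "transitions": [{"skill": "example-skill", "from": "active", "to": "stale"}],
    "tool_calls": [{"name": "skill_manage", "arguments": {"action": "patch"}}],
    "final": "full, untruncated reviewer response",
    "summary": "short summary for the state file",
    "error": None,
}
serialized = json.dumps(record, indent=2)  # machine-readable on disk
```

Keeping the record a plain JSON-serializable dict is what lets the report writer stay best-effort: serialization cannot fail on exotic types.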
def _run_llm_review(prompt: str) -> Dict[str, Any]:
    """Spawn an AIAgent fork to run the curator review prompt.

    Returns a dict with:

    - final: full (untruncated) final response from the reviewer
    - summary: short summary suitable for the state file (240-char cap)
    - model, provider: what the fork actually ran on
    - tool_calls: list of {name, arguments} for every tool call made during
      the pass (arguments may be truncated for readability)
    - error: set if the pass failed mid-run; final/summary may still be empty

    Never raises; callers get a structured failure instead.
    """
feat(curator): background skill maintenance (issue #7816) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched files' neighbors green - 
2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.
2026-04-26 06:08:39 -07:00
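The active → stale → archived transitions described above are pure functions of usage recency, so they can run without any LLM involvement. A minimal sketch, assuming hypothetical threshold names and values (the shipped defaults may differ; the invariants — pinned bypass, archive-only, never delete — are from the commit message):

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)    # hypothetical threshold
ARCHIVE_AFTER = timedelta(days=90)  # hypothetical threshold

def next_state(state: str, last_used: datetime, pinned: bool, now: datetime) -> str:
    """One step of the (no-LLM) curator state machine."""
    if pinned:  # pinned skills bypass all auto-transitions
        return state
    idle = now - last_used
    if state == "active" and idle >= STALE_AFTER:
        return "stale"
    if state == "stale" and idle >= ARCHIVE_AFTER:
        return "archived"  # archive only — deletion is never automatic
    return state

now = datetime(2026, 4, 26)
assert next_state("active", now - timedelta(days=40), pinned=False, now=now) == "stale"
assert next_state("active", now - timedelta(days=40), pinned=True, now=now) == "active"
assert next_state("stale", now - timedelta(days=120), pinned=False, now=now) == "archived"
```

Because the function is pure, the transition logic is trivially unit-testable in isolation from the forked-agent review pass.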
    import contextlib
    result_meta: Dict[str, Any] = {
        "final": "",
        "summary": "",
        "model": "",
        "provider": "",
        "tool_calls": [],
        "error": None,
    }
    try:
        from run_agent import AIAgent
    except Exception as e:
        result_meta["error"] = f"AIAgent import failed: {e}"
        result_meta["summary"] = result_meta["error"]
        return result_meta
feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: **Umbrella-first framing.** The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. **Three concrete consolidation methods.** (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). **Two observed bailouts pre-empted:** 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). **Config-aware parent inheritance.** _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. **Unbounded iteration ceiling.** max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. 
A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. **Tests updated:** test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. **Live validation on author's setup:** - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:07:02 -07:00
    # Resolve provider + model the same way the CLI does, so the curator
    # fork inherits the user's active main config rather than falling
    # through to an empty provider/model pair (which sends HTTP 400
    # "No models provided"). AIAgent() without explicit provider/model
    # arguments hits an auto-resolution path that fails for OAuth-only
    # providers and for pool-backed credentials.
    #
    # `_resolve_review_model()` honors `auxiliary.curator.{provider,model}`
    # (canonical aux-task slot, wired through `hermes model` → auxiliary
    # picker and the dashboard Models tab), with a legacy fallback to
    # `curator.auxiliary.{provider,model}`. See docs/user-guide/features/curator.md.
    _api_key = None
    _base_url = None
    _api_mode = None
    _resolved_provider = None
    _model_name = ""
    try:
        from hermes_cli.config import load_config
        from hermes_cli.runtime_provider import resolve_runtime_provider

        _cfg = load_config()
        _provider, _model_name = _resolve_review_model(_cfg)
        _rp = resolve_runtime_provider(
            requested=_provider, target_model=_model_name
        )
        _api_key = _rp.get("api_key")
        _base_url = _rp.get("base_url")
        _api_mode = _rp.get("api_mode")
        _resolved_provider = _rp.get("provider") or _provider
    except Exception as e:
        logger.debug("Curator provider resolution failed: %s", e, exc_info=True)
    result_meta["model"] = _model_name
    result_meta["provider"] = _resolved_provider or ""
review_agent = None
try:
review_agent = AIAgent(
model=_model_name,
provider=_resolved_provider,
api_key=_api_key,
base_url=_base_url,
api_mode=_api_mode,
# Umbrella-building over a large skill collection is worth a
# high iteration ceiling — the pass typically takes 50-100
# API calls against hundreds of candidate skills. The
# single-session review path caps itself at a much smaller
# number because it's not doing a curation sweep.
max_iterations=9999,
quiet_mode=True,
platform="curator",
skip_context_files=True,
skip_memory=True,
)
# Disable recursive nudges — the curator must never spawn its own review.
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Redirect the forked agent's stdout/stderr to /dev/null while it
# runs so its tool-call chatter doesn't pollute the foreground
# terminal. The background-thread runner also hides it; this
# belt-and-suspenders path matters when a caller invokes
# run_curator_review(synchronous=True) from the CLI.
with open(os.devnull, "w") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
conv_result = review_agent.run_conversation(user_message=prompt)
final = ""
if isinstance(conv_result, dict):
final = str(conv_result.get("final_response") or "").strip()
result_meta["final"] = final
            result_meta["summary"] = (final[:240] + "…") if len(final) > 240 else (final or "no change")
# Collect tool calls for the report. Walk the forked agent's
# session messages and extract every tool_call made during the
# pass. Truncate argument payloads so a giant skill_manage create
# doesn't blow up the report.
_calls: List[Dict[str, Any]] = []
for msg in getattr(review_agent, "_session_messages", []) or []:
if not isinstance(msg, dict):
continue
tcs = msg.get("tool_calls") or []
for tc in tcs:
if not isinstance(tc, dict):
continue
fn = tc.get("function") or {}
name = fn.get("name") or ""
args_raw = fn.get("arguments") or ""
if isinstance(args_raw, str) and len(args_raw) > 400:
                    args_raw = args_raw[:400] + "…"
_calls.append({"name": name, "arguments": args_raw})
result_meta["tool_calls"] = _calls
except Exception as e:
result_meta["error"] = f"error: {e}"
result_meta["summary"] = result_meta["error"]
finally:
if review_agent is not None:
try:
review_agent.close()
except Exception:
pass
return result_meta
# ---------------------------------------------------------------------------
# Public entrypoint for the session-start hook
# ---------------------------------------------------------------------------
def maybe_run_curator(
*,
idle_for_seconds: Optional[float] = None,
on_summary: Optional[Callable[[str], None]] = None,
) -> Optional[Dict[str, Any]]:
"""Best-effort: run a curator pass if all gates pass. Returns the result
dict if a pass was started, else None. Never raises."""
try:
if not should_run_now():
return None
# Idle gating: only enforce when the caller provided a measurement.
if idle_for_seconds is not None:
min_idle_s = get_min_idle_hours() * 3600.0
if idle_for_seconds < min_idle_s:
return None
return run_curator_review(on_summary=on_summary)
except Exception as e:
logger.debug("maybe_run_curator failed: %s", e, exc_info=True)
return None
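

# Illustrative wiring for the session-start hook (comment-only sketch; the
# real hook lives in cli.py / gateway/run.py, and `last_interaction_ts` /
# `logger` names here are assumptions, not this module's API):
#
#     idle = time.time() - last_interaction_ts
#     result = maybe_run_curator(
#         idle_for_seconds=idle,
#         on_summary=lambda s: print(f"[curator] {s}"),
#     )
#     if result is not None:
#         logger.info("curator pass finished: %s", result.get("summary"))
#
# Passing idle_for_seconds=None skips the idle gate entirely, since the gate
# is only enforced when the caller supplies a measurement.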