Broad drift audit against origin/main (b52b63396).
Reference pages (most user-visible drift):
- slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer
that were missing; drop non-existent /terminal-setup; fix /q footnote
(resolves to /queue, not /quit); extend CLI-only list with all 24
CLI-only commands in the registry
- cli-commands: add dedicated sections for hermes curator / fallback /
hooks (new subcommands not previously documented); remove stale
hermes honcho standalone section (the plugin registers dynamically
via hermes memory); list curator/fallback/hooks in top-level table;
fix completion to include fish
- toolsets-reference: document the real 52-toolset count; split browser
vs browser-cdp; add discord / discord_admin / spotify / yuanbao;
correct hermes-cli tool count from 36 to 38; fix misleading claim
that hermes-homeassistant adds tools (it's identical to hermes-cli)
- tools-reference: bump tool count 55 -> 68; add the 7 Spotify, 5 Yuanbao,
and 2 Discord tools; move browser_cdp/browser_dialog to their own
browser-cdp toolset section
- environment-variables: add 40+ user-facing HERMES_* vars that were
undocumented (--yolo, --accept-hooks, --ignore-*, inference model
override, agent/stream/checkpoint timeouts, OAuth trace, per-platform
batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs,
gateway restart/connect timeouts); dedupe the Cron Scheduler section;
replace stale QQ_SANDBOX with QQ_PORTAL_HOST
User-guide (top level):
- cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20)
- configuration.md: display.platforms is the canonical per-platform
override key; tool_progress_overrides is deprecated and auto-migrated
- profiles.md: model.default is the config key, not model.model
- sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8
- checkpoints-and-rollback.md: destructive-command list now matches
_DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd)
- docker.md: the container runs as non-root hermes (UID 10000) via
gosu; fix install command (uv pip); add missing --insecure on the
dashboard compose example (required for non-loopback bind)
- security.md: systemctl danger pattern also matches 'restart'
- index.md: built-in tool count 47 -> 68
- integrations/index.md: 6 STT providers, 8 memory providers
- integrations/providers.md: drop fictional dashscope/qwen aliases
Features:
- overview.md: 9 image models (not 8), 9 TTS providers (not 5),
8 memory providers (Supermemory was missing)
- tool-gateway.md: 9 image models
- tools.md: extend common-toolsets list with search / messaging /
spotify / discord / debugging / safe
- fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY
(lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan,
tencent-tokenhub, azure-foundry)
- plugins.md: Available Hooks table now includes on_session_finalize,
on_session_reset, subagent_stop
- built-in-plugins.md: add the 7 bundled plugins the page didn't
mention (spotify, google_meet, three image_gen providers, two
dashboard examples)
- web-dashboard.md: add --insecure and --tui flags
- cron.md: hermes cron create takes positional schedule/prompt, not
flags
Messaging:
- telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when
TELEGRAM_WEBHOOK_URL is set (the gateway refuses to start without it,
per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch; see
the sketch after this list.
- discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default
is 2.0, not 0.1
- dingtalk.md: document DINGTALK_REQUIRE_MENTION /
FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL /
ALLOW_ALL_USERS that the adapter supports
- bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env
var; the setting lives in platforms.bluebubbles.extra only
- qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and
QQ_GROUP_ALLOWED_USERS
- wecom-callback.md: replace 'hermes gateway start' (service-only)
with 'hermes gateway' for first-time setup
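For the Telegram change above, a minimal sketch of the documented startup
rule (the check's actual location and wording inside the gateway are
assumptions; the two variable names and the refuse-to-start behavior come
from the docs):

    import os

    # Illustration only, not the gateway's real code (GHSA-3vpc-7q5r-276h):
    # a webhook URL without a matching secret must abort startup.
    if os.environ.get("TELEGRAM_WEBHOOK_URL") and not os.environ.get("TELEGRAM_WEBHOOK_SECRET"):
        raise SystemExit("TELEGRAM_WEBHOOK_SECRET is required when TELEGRAM_WEBHOOK_URL is set")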
Developer-guide:
- architecture.md: refresh tool/toolset counts (61/52), terminal
backend count (7), line counts for run_agent.py (~13.7k), cli.py
(~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py
(~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform
adapter count 18 -> 20
- agent-loop.md: run_agent.py line count 10.7k -> 13.7k
- tools-runtime.md: add vercel_sandbox backend
- adding-tools.md: remove stale 'Discovery import added to
model_tools.py' checklist item (registry auto-discovery)
- adding-platform-adapters.md: mark send_typing / get_chat_info as
concrete base methods; only connect/disconnect/send are abstract
- acp-internals.md: ACP sessions now persist to SessionDB
(~/.hermes/state.db); acp.run_agent call uses
use_unstable_protocol=True
- cron-internals.md: gateway runs the scheduler in a dedicated background
thread via _start_cron_ticker, not on a maintenance cycle; locking is
cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows); see the
locking sketch after this list
- gateway-internals.md: gateway/run.py ~12k lines
- provider-runtime.md: cron DOES support fallback (run_job reads
fallback_providers from config)
- session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations
10 and 11 (trigram FTS, inline-mode FTS5 re-index); add
api_call_count column to Sessions DDL; document messages_fts_trigram
and state_meta in the architecture tree
- context-compression-and-caching.md: remove the obsolete 'context
pressure warnings' section (warnings were removed for causing
models to give up early)
- context-engine-plugin.md: compress() signature now includes
focus_topic param
- extending-the-cli.md: _build_tui_layout_children signature now
includes model_picker_widget; add to default layout
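For the cron-internals locking note, a self-contained sketch of the
cross-process pattern described there (fcntl.flock and msvcrt.locking are
the real stdlib calls; the lock-file path and function name are
assumptions):

    import os
    import sys

    def acquire_cron_lock(path="~/.hermes/cron.lock"):
        f = open(os.path.expanduser(path), "a+b")
        try:
            if sys.platform == "win32":
                import msvcrt
                f.seek(0)
                msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)  # non-blocking, lock 1 byte
            else:
                import fcntl
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)   # exclusive, non-blocking
        except OSError:
            f.close()
            return None  # another process already holds the lock
        return f  # keep the handle open for the scheduler's lifetime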
Also fixed three pre-existing broken links/anchors the build warned
about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and
tips#background-tasks, nix-setup.md -> #container-aware-cli).
Regenerated per-skill pages via website/scripts/generate-skill-docs.py
so catalog tables and sidebar are consistent with current SKILL.md
frontmatter.
docusaurus build: clean, no broken links or anchors.
---
title: "Subagent Driven Development — Execute plans via delegate_task subagents (2-stage review)"
sidebar_label: Subagent Driven Development
description: Execute plans via delegate_task subagents (2-stage review)
---
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
# Subagent Driven Development
Execute plans via delegate_task subagents (2-stage review).
## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | skills/software-development/subagent-driven-development |
| Version | 1.1.0 |
| Author | Hermes Agent (adapted from obra/superpowers) |
| License | MIT |
| Tags | delegation, subagent, implementation, workflow, parallel |
| Related skills | writing-plans, requesting-code-review, test-driven-development |
## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
### Subagent-Driven Development

#### Overview

Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.

**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.

#### When to Use

Use this skill when:
- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- Quality and spec compliance are important
- You want automated review between tasks
**vs. manual execution:**
- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work
#### The Process

##### 1. Read and Parse Plan

Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:

```python
# Read the plan
read_file("docs/plans/feature-plan.md")

# Create todo list with all tasks
todo([
    {"id": "task-1", "content": "Create User model with email field", "status": "pending"},
    {"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
    {"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```
**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
##### 2. Per-Task Workflow

For EACH task in the plan:

**Step 1: Dispatch Implementer Subagent**

Use delegate_task with complete context:

```python
delegate_task(
    goal="Implement Task 1: Create User model with email and password_hash fields",
    context="""
    TASK FROM PLAN:
    - Create: src/models/user.py
    - Add User class with email (str) and password_hash (str) fields
    - Use bcrypt for password hashing
    - Include __repr__ for debugging

    FOLLOW TDD:
    1. Write failing test in tests/models/test_user.py
    2. Run: pytest tests/models/test_user.py -v (verify FAIL)
    3. Write minimal implementation
    4. Run: pytest tests/models/test_user.py -v (verify PASS)
    5. Run: pytest tests/ -q (verify no regressions)
    6. Commit: git add -A && git commit -m "feat: add User model with password hashing"

    PROJECT CONTEXT:
    - Python 3.11, Flask app in src/app.py
    - Existing models in src/models/
    - Tests use pytest, run from project root
    - bcrypt already in requirements.txt
    """,
    toolsets=['terminal', 'file']
)
```
**Step 2: Dispatch Spec Compliance Reviewer**

After the implementer completes, verify against the original spec:

```python
delegate_task(
    goal="Review if implementation matches the spec from the plan",
    context="""
    ORIGINAL TASK SPEC:
    - Create src/models/user.py with User class
    - Fields: email (str), password_hash (str)
    - Use bcrypt for password hashing
    - Include __repr__

    CHECK:
    - [ ] All requirements from spec implemented?
    - [ ] File paths match spec?
    - [ ] Function signatures match spec?
    - [ ] Behavior matches expected?
    - [ ] Nothing extra added (no scope creep)?

    OUTPUT: PASS or list of specific spec gaps to fix.
    """,
    toolsets=['file']
)
```
**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.
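A minimal sketch of that loop in the same pseudo-tool style (treating the
reviewer's reply as a string verdict is an assumption for illustration):

```python
while True:
    # Re-run the SAME spec review (see the Step 2 example above)
    verdict = delegate_task(
        goal="Review if implementation matches the spec from the plan",
        context="ORIGINAL TASK SPEC: ...same spec text as the Step 2 example...",
        toolsets=['file'],
    )
    if str(verdict).strip() == "PASS":
        break
    # Dispatch a fix subagent for the reported gaps, then review again
    delegate_task(
        goal="Fix the spec gaps listed by the reviewer",
        context=f"REVIEWER FINDINGS:\n{verdict}",
        toolsets=['terminal', 'file'],
    )
```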
**Step 3: Dispatch Code Quality Reviewer**

After spec compliance passes:

```python
delegate_task(
    goal="Review code quality for Task 1 implementation",
    context="""
    FILES TO REVIEW:
    - src/models/user.py
    - tests/models/test_user.py

    CHECK:
    - [ ] Follows project conventions and style?
    - [ ] Proper error handling?
    - [ ] Clear variable/function names?
    - [ ] Adequate test coverage?
    - [ ] No obvious bugs or missed edge cases?
    - [ ] No security issues?

    OUTPUT FORMAT:
    - Critical Issues: [must fix before proceeding]
    - Important Issues: [should fix]
    - Minor Issues: [optional]
    - Verdict: APPROVED or REQUEST_CHANGES
    """,
    toolsets=['file']
)
```
**If quality issues found:** Fix issues, re-review. Continue only when approved.
**Step 4: Mark Complete**

```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```
##### 3. Final Review

After ALL tasks are complete, dispatch a final integration reviewer:

```python
delegate_task(
    goal="Review the entire implementation for consistency and integration issues",
    context="""
    All tasks from the plan are complete. Review the full implementation:
    - Do all components work together?
    - Any inconsistencies between tasks?
    - All tests passing?
    - Ready for merge?
    """,
    toolsets=['terminal', 'file']
)
```
##### 4. Verify and Commit

```bash
# Run full test suite
pytest tests/ -q

# Review all changes
git diff --stat

# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```
#### Task Granularity

Each task = 2-5 minutes of focused work.

**Too big:**
- "Implement user authentication system"
**Right size:**
- "Create User model with email and password fields"
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"
#### Red Flags — Never Do These
- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagent read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- Start code quality review before spec compliance is PASS (wrong order)
- Move to next task while either review has open issues
#### Handling Issues

##### If Subagent Asks Questions
- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation
##### If Reviewer Finds Issues
- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review
##### If Subagent Fails a Task
- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)
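A sketch of that dispatch, reusing the login-endpoint task from the earlier
examples (the failure details are placeholders):

```python
delegate_task(
    goal="Fix Task 3: login endpoint returns 500 when 'password' is missing",
    context="""
    WHAT WENT WRONG:
    - Previous subagent's endpoint crashes (KeyError) on a body without 'password'
    - See src/api/login.py

    WHAT TO DO:
    1. Write a failing test that POSTs a body without 'password' (expect 400)
    2. Return 400 with a clear error message instead of crashing
    3. Run pytest tests/ -q, then commit
    """,
    toolsets=['terminal', 'file']
)
```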
#### Efficiency Notes

**Why fresh subagent per task:**
- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning
**Why two-stage review:**
- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks
**Cost trade-off:**
- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)
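As rough arithmetic (the plan size and re-review count are assumed for
illustration):

```python
tasks = 5
base = tasks * 3          # implementer + spec reviewer + quality reviewer per task
re_reviews = 2 * 2        # say two tasks each need one fix + one re-review
print(base + re_reviews)  # 19 subagent invocations for a 5-task plan
```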
#### Integration with Other Skills

##### With writing-plans
This skill EXECUTES plans created by the writing-plans skill:
- User requirements → writing-plans → implementation plan
- Implementation plan → subagent-driven-development → working code
##### With test-driven-development
Implementer subagents should follow TDD:
- Write failing test first
- Implement minimal code
- Verify test passes
- Commit
Include TDD instructions in every implementer context.
##### With requesting-code-review
The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.
##### With systematic-debugging
If a subagent encounters bugs during implementation:
- Follow systematic-debugging process
- Find root cause before fixing
- Write regression test
- Resume implementation
#### Example Workflow

```text
[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]

--- Task 1: Create User model ---
[Dispatch implementer subagent]
Implementer: "Should email be unique?"
You: "Yes, email must be unique"
Implementer: Implemented, 3/3 tests passing, committed.

[Dispatch spec reviewer]
Spec reviewer: ✅ PASS — all requirements met

[Dispatch quality reviewer]
Quality reviewer: ✅ APPROVED — clean code, good tests

[Mark Task 1 complete]

--- Task 2: Password hashing ---
[Dispatch implementer subagent]
Implementer: No questions, implemented, 5/5 tests passing.

[Dispatch spec reviewer]
Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")

[Implementer fixes]
Implementer: Added validation, 7/7 tests passing.

[Dispatch spec reviewer again]
Spec reviewer: ✅ PASS

[Dispatch quality reviewer]
Quality reviewer: Important: Magic number 8, extract to constant
Implementer: Extracted MIN_PASSWORD_LENGTH constant
Quality reviewer: ✅ APPROVED

[Mark Task 2 complete]

... (continue for all tasks)

[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```
#### Remember

- Fresh subagent per task
- Two-stage review every time
- Spec compliance FIRST
- Code quality SECOND
- Never skip reviews
- Catch issues early

Quality is not an accident. It's the result of systematic process.
#### Further reading (load when relevant)

When the orchestration involves significant context usage, long review loops, or complex validation checkpoints, load these references for the specific discipline:

- `references/context-budget-discipline.md` — Four-tier context degradation model (PEAK / GOOD / DEGRADING / POOR), read-depth rules that scale with context window size, and early warning signs of silent degradation. Load when a run will clearly consume significant context (multi-phase plans, many subagents, large artifacts).
- `references/gates-taxonomy.md` — The four canonical gate types (Pre-flight, Revision, Escalation, Abort) with behavior, recovery, and examples. Load when designing or reviewing any workflow that has validation checkpoints — use the vocabulary explicitly so each gate has defined entry, failure behavior, and resumption rules.
Both references adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson).