mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-10 04:08:28 +08:00
Compare commits
1 Commits
bb/desktop
...
hermes/her
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
8053a1526c |
203
AGENTS.md
203
AGENTS.md
@@ -4,6 +4,201 @@ Instructions for AI coding assistants and developers working on the hermes-agent
|
||||
|
||||
**Never give up on the right solution.**
|
||||
|
||||
## What Hermes Is
|
||||
|
||||
Hermes is a personal AI agent that runs the same agent core across a CLI, a
|
||||
messaging gateway (Telegram, Discord, Slack, and ~20 other platforms), a TUI,
|
||||
and an Electron desktop app. It learns across sessions (memory + skills),
|
||||
delegates to subagents, runs scheduled jobs, and drives a real terminal and
|
||||
browser. It is extended primarily through **plugins and skills**, not by
|
||||
growing the core.
|
||||
|
||||
Two properties shape almost every design decision and are the lens for
|
||||
reviewing any change:
|
||||
|
||||
- **Per-conversation prompt caching is sacred.** A long-lived conversation
|
||||
reuses a cached prefix every turn. Anything that mutates past context,
|
||||
swaps toolsets, or rebuilds the system prompt mid-conversation invalidates
|
||||
that cache and multiplies the user's cost. We do not do it (the one
|
||||
exception is context compression).
|
||||
- **The core is a narrow waist; capability lives at the edges.** Every model
|
||||
tool we add is sent on every API call, so the bar for a new *core* tool is
|
||||
high. Most new capability should arrive as a CLI command + skill, a
|
||||
service-gated tool, or a plugin — not as core surface.
|
||||
|
||||
## Contribution Rubric — What We Want / What We Don't
|
||||
|
||||
This is the project's intent layer. Use it two ways:
|
||||
|
||||
1. **For humans and for your own work** — what gets merged and what gets
|
||||
rejected, so a contribution aims at the target.
|
||||
2. **For automated review (the triage sweeper)** — guidance on when a PR is
|
||||
safe to close on the three allowed reasons (`implemented_on_main`,
|
||||
`cannot_reproduce`, `incoherent`) and, just as important, **when NOT to
|
||||
close** one. Taste-based "we don't want this / out of scope" closes are NOT
|
||||
an automated decision — those stay with a human maintainer. The sweeper's
|
||||
job here is to recognize design intent and *avoid wrongly closing a
|
||||
legitimate contribution*, not to make the won't-implement call itself.
|
||||
|
||||
Read the balance right: Hermes ships a **lot** — most merges are bug fixes to
|
||||
real reported behavior, and the product surface (platforms, channels,
|
||||
providers, models, desktop/TUI features) expands aggressively and on purpose.
|
||||
The restraint below is aimed squarely at the **core agent + the model tool
|
||||
schema**, the one place where every addition is paid for on every API call.
|
||||
"Smallest footprint" governs *how a capability is wired into the core*, NOT
|
||||
whether the product is allowed to grow. We are expansive at the edges and
|
||||
conservative at the waist.
|
||||
|
||||
### What we want
|
||||
|
||||
- **Fix real bugs, well.** The bulk of what lands is `fix(...)` against an
|
||||
actual reported symptom. A good fix reproduces the symptom on current
|
||||
`main`, points to the exact line where it manifests, and fixes the whole bug
|
||||
class — sibling call paths included — not just the one site the reporter hit.
|
||||
- **Expand reach at the edges.** New platform adapters, channels, providers,
|
||||
models, and desktop/TUI/dashboard features are welcome and land routinely,
|
||||
including large ones (a new messaging channel, a session-cap feature, a
|
||||
Windows PTY bridge). Breadth in the product is a goal, not a footprint
|
||||
concern — as long as it integrates with the existing setup/config UX
|
||||
(`hermes tools`, `hermes setup`, auto-install) rather than bolting on a raw
|
||||
env var.
|
||||
- **Refactor god-files into clean modules.** Extracting a multi-thousand-line
|
||||
cluster out of `cli.py` / `run_agent.py` / `gateway/run.py` into a focused
|
||||
mixin or module is wanted work, even when the diff is huge and mechanical
|
||||
(large `+N/-N` refactors merge regularly). The "every line traces to the
|
||||
request" test applies to *feature* PRs; a declared refactor's request IS the
|
||||
extraction.
|
||||
- **Keep the core narrow.** New *model tools* are the expensive exception —
|
||||
every tool ships on every API call. Prefer, in order: extend existing code →
|
||||
CLI command + skill → service-gated tool (`check_fn`) → plugin → MCP server
|
||||
in the catalog → new core tool (last resort). See "The Footprint Ladder."
|
||||
- **Extend, don't duplicate.** Before adding a module/manager/hook, check
|
||||
whether existing infrastructure already covers the use case. When several PRs
|
||||
integrate the same *category*, design one shared interface instead of merging
|
||||
them one at a time (see the ABC + orchestrator note under the Footprint
|
||||
Ladder).
|
||||
- **Behavior contracts over snapshots.** Tests should assert how two pieces of
|
||||
data must relate (invariants), not freeze a current value (model lists,
|
||||
config version literals, enumeration counts). See "Don't write
|
||||
change-detector tests."
|
||||
- **E2E validation, not just green unit mocks.** For anything touching
|
||||
resolution chains, config propagation, security boundaries, remote
|
||||
backends, or file/network I/O, exercise the real path with real imports
|
||||
against a temp `HERMES_HOME`. Mocks hide integration bugs.
|
||||
- **Cache-, alternation-, and invariant-safe.** Preserve prompt caching, strict
|
||||
message role alternation (never two same-role messages in a row; never a
|
||||
synthetic user message injected mid-loop), and a system prompt that is
|
||||
byte-stable for the life of a conversation.
|
||||
- **Contributor credit preserved.** Salvage external work by cherry-picking
|
||||
(rebase-merge) so authorship survives in git history; don't reimplement from
|
||||
scratch when you can build on top.
|
||||
|
||||
### What we don't want (rejected even when well-built)
|
||||
|
||||
- **Speculative infrastructure.** Hooks, callbacks, or extension points with no
|
||||
concrete consumer. Adding a hook is easy; removing one after plugins depend
|
||||
on it is hard. A hook is NOT speculative if a contributor has a real, stated
|
||||
use case — even if the consumer ships separately.
|
||||
- **New `HERMES_*` env vars for non-secret config.** `.env` is for secrets
|
||||
only (API keys, tokens, passwords). All behavioral settings — timeouts,
|
||||
thresholds, feature flags, display prefs — go in `config.yaml`. Bridge to an
|
||||
internal env var if the mechanism needs one, but user-facing docs point to
|
||||
`config.yaml`. Reject PRs that tell users to "set X in your .env" unless X
|
||||
is a credential.
|
||||
- **A new core tool when terminal + file already do the job, or when a skill
|
||||
would.** If the only barrier is file visibility on a remote backend, fix the
|
||||
mount, not the toolset.
|
||||
- **Lazy-reading escape hatches on instructional tools.** No `offset`/`limit`
|
||||
pagination on tools that load content the agent must read fully (skills,
|
||||
prompts, playbooks). Models will read page 1 and skip the rest.
|
||||
- **"Fixes" that destroy the feature they secure.** A mitigation that kills the
|
||||
feature's purpose is the wrong mitigation. Read the original commit's intent
|
||||
(`git log -p -S`) before restricting behavior; find a fix that preserves the
|
||||
feature.
|
||||
- **Outbound telemetry / usage attribution without opt-in gating.** No new
|
||||
analytics, third-party identifier tagging, or attribution tags until a
|
||||
generic user-facing opt-in (config gate + setup prompt + `hermes tools`
|
||||
toggle) exists. Park behind a label, do not merge.
|
||||
- **Change-detector tests, cache-breaking mid-conversation, dead code wired in
|
||||
without E2E proof, and plugins that touch core files.** Plugins live in their
|
||||
own directory and work within the ABCs/hooks we provide; if a plugin needs
|
||||
more, widen the generic plugin surface, don't special-case it in core.
|
||||
|
||||
### Before you call it a bug — verify the premise (and when NOT to close)
|
||||
|
||||
The most common reason a well-written PR gets closed is not code quality — it
|
||||
is that the change is built on a **wrong premise**, or it treats an
|
||||
**intentional design as a gap**. These patterns cut both ways: they tell a
|
||||
human reviewer what to scrutinize, and they tell the automated sweeper when a
|
||||
PR is NOT safe to close as `implemented_on_main` / `cannot_reproduce` (when in
|
||||
doubt, leave it open for a human). They are distilled from real closes.
|
||||
|
||||
- **"Intentional design, not a gap."** A limitation that looks like an
|
||||
oversight is often deliberate. Before "fixing" a missing link or a
|
||||
restriction, ask whether the isolation IS the design. Example: profiles are
|
||||
independent islands on purpose — a PR adding live config inheritance from the
|
||||
default profile was closed because coupling profiles together is exactly what
|
||||
the design prevents (the copy-at-creation `--clone` path already covers the
|
||||
legitimate "start from my default" case). Read the original commit's intent
|
||||
(`git log -p -S "<symbol>"`) before assuming something is unfinished.
|
||||
- **"The premise doesn't hold against how X actually works."** A PR's
|
||||
justification frequently rests on a wrong mental model of an existing
|
||||
mechanism. Trace the real code/runtime before accepting the rationale. Two
|
||||
real closes: a rate-limit "re-probe during cooldown" PR (the breaker only
|
||||
trips on a *confirmed-empty* account bucket, so re-probing just hammers a
|
||||
bucket we've already proven empty); a usage-accumulation fix whose new branch
|
||||
**never executes at runtime** because an earlier guard already popped the
|
||||
state it depended on. If you can't point to the exact line where the bug
|
||||
manifests AND show the fix changes that line's behavior, you haven't verified
|
||||
the premise.
|
||||
- **"This fix was wrong — the absence/omission was deliberate."** Adding the
|
||||
obvious-looking missing piece can break things the omission was protecting.
|
||||
Example: restoring "missing" `__init__.py` files made a test tree importable
|
||||
as a dotted package that shadowed the real plugin, deleting its `register()`
|
||||
at import time. The absence was load-bearing.
|
||||
- **"Overreached / resurrected an approach we'd moved past."** Scope creep that
|
||||
supersedes an agreed-on base, or revives a direction the maintainers
|
||||
deliberately closed, gets rejected even when the code works. Keep the change
|
||||
to the narrow piece that was actually agreed; offer the rest as a focused
|
||||
follow-up.
|
||||
|
||||
The throughline: **verify the claim AND the intent against the codebase before
|
||||
writing or merging a fix.** A confirmed reproduction on current `main` plus a
|
||||
line-level account of where the fix acts beats a plausible-sounding rationale
|
||||
every time. When in doubt about intent, it is cheaper to ask than to ship a
|
||||
fix that fights the design.
|
||||
|
||||
### The Footprint Ladder (new capability decision)
|
||||
|
||||
Each rung adds more permanent surface than the one above. Choose the highest
|
||||
(least-footprint) rung that correctly solves the problem:
|
||||
|
||||
1. **Extend existing code** — the capability is a variation of something that
|
||||
already exists. Zero new surface.
|
||||
2. **CLI command + skill** — manages config/state/infra expressible as shell
|
||||
commands. The agent runs `hermes <subcommand>` guided by a skill. Zero
|
||||
model-tool footprint. Default choice for subscriptions, scheduled tasks,
|
||||
service setup. Examples: `hermes webhook`, `hermes cron`, `hermes tools`.
|
||||
3. **Service-gated tool (`check_fn`)** — needs structured params/returns AND
|
||||
only appears when a prerequisite is configured. Zero footprint otherwise.
|
||||
Examples: Home Assistant tools (gated on token), memory-provider tools.
|
||||
4. **Plugin** — third-party/niche/user-specific capability that doesn't ship in
|
||||
core. Lives in `~/.hermes/plugins/` or a pip package, discovered at runtime.
|
||||
5. **MCP server (in the catalog)** — if the capability genuinely needs to be a
|
||||
tool (structured I/O the agent invokes) but isn't core-fundamental, prefer
|
||||
building it as an MCP server and adding it to the MCP catalog over growing
|
||||
the core toolset. The agent connects to it through the built-in MCP client;
|
||||
zero permanent core-schema footprint, and it's reusable by any MCP host.
|
||||
6. **New core tool** — only when the capability is fundamental, broadly useful
|
||||
to nearly every user, and unreachable via terminal + file (or an MCP server).
|
||||
Examples of correct core tools: terminal, read_file, web_search,
|
||||
browser_navigate.
|
||||
|
||||
When 3+ open PRs try to integrate the same *category* of thing (memory
|
||||
backends, providers, notifiers), don't merge them one at a time — design an
|
||||
ABC + orchestrator, wrap the existing built-in as the first provider, and turn
|
||||
the competing PRs into plugins against that interface.
|
||||
|
||||
## Development Environment
|
||||
|
||||
```bash
|
||||
@@ -302,9 +497,11 @@ A **separate** chat surface from both the classic CLI and the dashboard's embedd
|
||||
|
||||
## Adding New Tools
|
||||
|
||||
For most custom or local-only tools, do **not** edit Hermes core. Use the plugin
|
||||
route instead: create `~/.hermes/plugins/<name>/plugin.yaml` and
|
||||
`~/.hermes/plugins/<name>/__init__.py`, then register tools with
|
||||
Before adding any tool, settle the footprint question first (see "The
|
||||
Footprint Ladder" in the Contribution Rubric): most capabilities should NOT
|
||||
be core tools. For custom or local-only tools, do **not** edit Hermes core.
|
||||
Use the plugin route instead: create `~/.hermes/plugins/<name>/plugin.yaml`
|
||||
and `~/.hermes/plugins/<name>/__init__.py`, then register tools with
|
||||
`ctx.register_tool(...)`. Plugin toolsets are discovered automatically and can be
|
||||
enabled or disabled without touching `tools/` or `toolsets.py`.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user