Compare commits

...

31 Commits

Author SHA1 Message Date
Teknium
e52ddb6318 feat: language-aware context compression summaries
Port from anomalyco/opencode#20581: context compaction now generates
summaries in the same language the user was using in the conversation.

Previously, summaries were always produced in English regardless of the
conversation language, which would confuse multilingual users by injecting
English context into non-English conversations.

Adds 'Write the summary in the same language the user was using in the
conversation.' to both the initial and iterative update summarization
prompts in ContextCompressor.
2026-04-02 17:06:48 -07:00
Teknium
924bc67eee feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623)
* feat(memory): add pluggable memory provider interface with profile isolation

Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.

Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md

Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
  fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
  get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
  hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env

MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.

Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.

Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).
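
For illustration, a minimal external provider under this interface could look
like the sketch below. The hook names follow the ABC described above; the
class, env var, and backend are hypothetical.

  import json
  import os
  from typing import Any, Dict, List

  from agent.memory_provider import MemoryProvider

  class SketchMemoryProvider(MemoryProvider):
      """Hypothetical backend wired through the standard hooks."""

      @property
      def name(self) -> str:
          return "sketch"

      def is_available(self) -> bool:
          # Cheap env check only; no network calls here.
          return bool(os.environ.get("SKETCH_API_KEY"))

      def initialize(self, session_id: str, **kwargs) -> None:
          # hermes_home is injected by MemoryManager.initialize_all().
          self._home = kwargs.get("hermes_home", "")

      def system_prompt_block(self) -> str:
          return ""  # static recall text for the system prompt

      def prefetch(self, query: str, *, session_id: str = "") -> str:
          return ""  # background recall before each turn

      def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
          pass  # threaded, non-blocking write in a real provider

      def get_tool_schemas(self) -> List[Dict[str, Any]]:
          return []

      def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
          return json.dumps({"error": f"unknown tool '{tool_name}'"})

      def shutdown(self) -> None:
          pass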

* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider

Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.

Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:

- sync_turn: records user/assistant messages to OpenViking session
  (threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
  into 6 categories (profile, preferences, entities, events, cases,
  patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)

Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base

Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.

* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker

- Remove redundant mem0_context tool (identical to mem0_search with
  rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
  extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
  (prefetch and sync threads could race on first client creation)
- Add circuit breaker (see sketch below): after 5 consecutive API failures, pause calls
  for 120s instead of hammering a down server every turn. Auto-resets
  after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
  unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads
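
A compact sketch of the circuit-breaker behavior described above, with
illustrative names (the plugin's internals may differ):

  import logging
  import time

  logger = logging.getLogger(__name__)

  _FAILURE_LIMIT = 5   # consecutive failures before tripping
  _COOLDOWN_S = 120.0  # pause length before auto-reset

  class _Breaker:
      def __init__(self) -> None:
          self.failures = 0
          self.tripped_at = None

      def allow(self) -> bool:
          if self.tripped_at is None:
              return True
          if time.monotonic() - self.tripped_at >= _COOLDOWN_S:
              self.failures = 0        # auto-reset after the cooldown
              self.tripped_at = None
              return True
          return False

      def record(self, ok: bool) -> None:
          if ok:
              self.failures = 0
              return
          self.failures += 1
          if self.failures >= _FAILURE_LIMIT and self.tripped_at is None:
              self.tripped_at = time.monotonic()
              logger.warning("circuit breaker tripped, pausing calls for %.0fs", _COOLDOWN_S)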

* fix(memory): enforce single external memory provider limit

MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.

The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.

Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.

* feat(memory): add ByteRover memory provider plugin

Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.

ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.

Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression

Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state

Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).
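
The cached binary resolution is classic double-checked locking; a minimal
sketch, with illustrative names:

  import shutil
  import threading
  from typing import Optional

  _brv_path: Optional[str] = None
  _brv_lock = threading.Lock()

  def _resolve_brv() -> Optional[str]:
      global _brv_path
      if _brv_path is None:          # fast path: no lock once cached
          with _brv_lock:
              if _brv_path is None:  # re-check under the lock
                  _brv_path = shutil.which("brv")
      return _brv_path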

* fix(memory): thread remaining sync_turns, fix holographic, add config key

Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads

Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
  used in scoring). Now reuses pre-computed entity_residuals directly,
  moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard (see sketch below). Above
  500 facts, only checks the most recently updated ones to avoid O(n^2)
  explosion (~125K comparisons at 500 is acceptable).

Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
  No version bump needed (deep_merge handles new keys automatically).
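
A sketch of the contradict() scaling guard described above; the fact list and
its updated_at field are assumptions:

  _MAX_CONTRADICT_FACTS = 500  # 500 * 499 / 2 ≈ 125K pairwise comparisons

  def _facts_to_check(facts: list) -> list:
      # Above the cap, only the most recently updated facts enter the
      # pairwise contradiction pass, keeping the O(n^2) cost bounded.
      if len(facts) <= _MAX_CONTRADICT_FACTS:
          return facts
      return sorted(facts, key=lambda f: f.updated_at, reverse=True)[:_MAX_CONTRADICT_FACTS]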

* feat(memory): extract Honcho as a MemoryProvider plugin

Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.

The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.

Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages

This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.

* feat(memory): wire MemoryManager into run_agent.py

Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):

1. Init (~L1130): Create MemoryManager, find matching plugin provider
   from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
   and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
   alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
   memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
   on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
   compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
   current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
   completed turn, queue_prefetch_all() for next turn, on_session_end()
   + shutdown_all() at conversation end

All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.

Full suite: 7222 passed, 4 pre-existing failures.

* refactor(memory): remove legacy Honcho integration from core

Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).

Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
  _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
  _honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
  honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding

Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls

Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py

Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.

The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.

Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.

* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice

Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
  (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
  directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()

CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup

Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references

Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py

Migration:
- Honcho migration notice on startup: detects existing honcho.json or
  ~/.honcho/config.json, prints guidance to run hermes memory setup.
  Only fires when memory.provider is not set and not in quiet mode.

Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.

* feat(memory): standardize plugin config + add per-plugin documentation

Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)

Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml

Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers

The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory

* docs: add memory providers user guide + developer guide

New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
  all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
  Holographic, RetainDB, ByteRover). Each with setup, config, tools,
  cost, and unique features. Includes comparison table and profile
  isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
  provider plugin. Covers ABC, required methods, config schema,
  save_config, threading contract, profile isolation, testing.

Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
  new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
  the new Memory Providers page
- sidebars.ts — added both new pages to navigation

* fix(memory): auto-migrate Honcho users to memory provider plugin

When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.

Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.

* fix(memory): only auto-migrate Honcho when enabled + credentialed

Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.

* feat(memory): auto-install pip dependencies during hermes memory setup

Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).

Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)

* fix: remove remaining Honcho crash risks from cli.py and gateway

cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.

gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.

tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).

* fix: include plugins/ in pyproject.toml package list

Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.

* fix(memory): correct pip-to-import name mapping for dep checks

The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.
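
A sketch of the resulting check, assuming importlib-based detection (the
mapping entries are the ones listed above):

  import importlib.util

  # Explicit pip-name -> import-name map; fallback is the old heuristic.
  _PIP_TO_IMPORT = {
      "honcho-ai": "honcho",
      "mem0ai": "mem0",
      "hindsight-client": "hindsight_client",
  }

  def is_installed(pip_name: str) -> bool:
      module = _PIP_TO_IMPORT.get(pip_name, pip_name.replace("-", "_"))
      return importlib.util.find_spec(module) is not None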

* chore: remove dead code from old plugin memory registration path

- hermes_cli/plugins.py: removed register_memory_provider(),
  _memory_providers list, get_plugin_memory_providers() — memory
  providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
  subparsers (setup, status, sessions, map, peer, mode, tokens,
  identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
  registration path
- tests: replaced TestPluginMemoryProviderRegistration with
  TestPluginMemoryDiscovery that tests the actual plugins/memory/
  discovery system. Added 3 new tests (discover, load, nonexistent).

* chore: delete dead honcho_integration/cli.py and its tests

cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.

Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)

Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush

* refactor: move honcho_integration/ into the honcho plugin

Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.

- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
  plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/

* docs: update architecture + gateway-internals for memory provider system

- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
  flush lifecycle docs with generic memory provider interface docs

* fix: update stale mock path for resolve_active_host after honcho plugin migration

* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore

Review feedback from Honcho devs (erosika):

P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
  (was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry

Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)

ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
  initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple

Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
  with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
  mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup

* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type

- Wire on_delegation() in delegate_tool.py — parent's memory provider
  is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
  from corrupting user representations — closes #4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
  activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str

* fix(honcho): port profile isolation fixes from PR #4632

Ports 5 bug fixes found during profile testing (erosika's PR #4632):

1. 3-tier config resolution (see sketch below) — resolve_config_path() now checks
   $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
   (non-default profiles couldn't find shared host blocks)

2. Thread host=_host_key() through from_global_config() in cmd_setup,
   cmd_status, cmd_identity (--target-profile was being ignored)

3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
   peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid

4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
   all message uploads for the session

5. Gate Honcho clone behind --clone/--clone-all on profile create
   (bare create should be blank-slate)

Also: sanitize assistant_peer_id via _sanitize_id()
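
A sketch of the 3-tier resolution from fix 1; the behavior when no file exists
yet is an assumption:

  from pathlib import Path

  def resolve_config_path(hermes_home: Path) -> Path:
      # Checked in order: profile-scoped, shared default profile, legacy.
      candidates = (
          hermes_home / "honcho.json",
          Path.home() / ".hermes" / "honcho.json",
          Path.home() / ".honcho" / "config.json",
      )
      for path in candidates:
          if path.exists():
              return path
      return candidates[0]  # assumed fallback when nothing exists yet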

* fix(tests): add module cleanup fixture to test_cli_provider_resolution

test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.

Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
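
A sketch of the save/restore pattern, with an abbreviated module list (the
real fixture covers every affected module):

  import sys

  import pytest

  # Modules the test wipes from sys.modules; the real list covers tools.*,
  # cli, and run_agent.
  _AFFECTED = ["cli", "run_agent"]

  @pytest.fixture(autouse=True)
  def _restore_modules():
      saved = {name: sys.modules.get(name) for name in _AFFECTED}
      yield
      for name, module in saved.items():
          if module is not None:
              sys.modules[name] = module
          else:
              sys.modules.pop(name, None)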
2026-04-02 15:33:51 -07:00
Teknium
e0b2bdb089 fix: webhook platform support — skip home channel prompt, disable tool progress (salvage #4363) (#4660)
Cherry-picked from PR #4363 by @bennyhodl with follow-up fixes:

- Skip 'No home channel' prompt for webhook platform (webhooks deliver
  to configured targets, not a home channel)
- Disable tool progress for webhooks (no message editing support)
- Add webhook to PLATFORMS in tools_config.py and skills_config.py
- Add hermes-webhook toolset to toolsets.py + hermes-gateway includes
- Removed overly aggressive <50 char content filter that blocked
  legitimate short responses (tool progress already handled at source)

Co-authored-by: bennyhodl <bennyhodl@users.noreply.github.com>
2026-04-02 14:00:22 -07:00
SHL0MS
6d68fbf756 Merge pull request #4654 from SHL0MS/skill/research-paper-writing
Replace ml-paper-writing with research-paper-writing: full end-to-end research pipeline
2026-04-02 13:24:12 -07:00
SHL0MS
b86647c295 Replace ml-paper-writing with research-paper-writing: full research pipeline skill
Replaces the writing-focused ml-paper-writing skill (940 lines) with a
complete end-to-end research paper pipeline (1,599 lines SKILL.md + 3,184
lines across 7 reference files).

New content:
- Full 8-phase pipeline: project setup, literature review, experiment
  design, execution/monitoring, analysis, paper drafting, review/revision,
  submission preparation
- Iterative refinement strategy guide from autoreason research (when to use
  autoreason vs critique-and-revise vs single-pass, model selection)
- Hermes agent integration: delegate_task parallel drafting, cronjob
  monitoring, memory/todo state management, skill composition
- Professional LaTeX tooling: microtype, siunitx, TikZ diagram patterns,
  algorithm2e, subcaption, latexdiff, SciencePlots
- Human evaluation design: annotation protocols, inter-annotator agreement,
  crowdsourcing platforms
- Title, Figure 1, conclusion, appendix strategy, page budget management
- Anonymization checklist, rebuttal writing, camera-ready preparation
- AAAI and COLM venue coverage (checklists, reviewer guidelines)

Preserved from ml-paper-writing:
- All writing philosophy (Nanda, Farquhar, Gopen & Swan, Lipton, Perez)
- Citation verification workflow (5-step mandatory process)
- All 6 conference templates (NeurIPS, ICML, ICLR, ACL, AAAI, COLM)
- Conference requirements, format conversion workflow
- Proactivity/collaboration guidance

Bug fixes in inherited reference files:
- BibLaTeX recommendation now correctly says natbib for conferences
- Bare except clauses fixed to except Exception
- Jinja2 template tags removed from citation-workflow.md
- Stale date caveats added to reviewer-guidelines.md
2026-04-02 16:13:26 -04:00
Teknium
798a7b99e4 docs: add Configuration Options section to Slack docs (#4644)
* docs: add Configuration Options section to Slack docs

Documents all config.yaml options for the Slack bot:
- Thread & reply behavior (reply_to_mode, reply_broadcast)
- Session isolation (group_sessions_per_user)
- Mention & trigger behavior (require_mention, mention_patterns, reply_prefix)
- Unauthorized user handling (unauthorized_dm_behavior)
- Voice transcription (stt_enabled)
- Full example config showing all options together

Includes a note about Slack's hardcoded @mention requirement in channels
(no free_response_channels equivalent like Discord/Telegram).

* docs: consolidate reply_in_thread into Configuration Options section

Folds the standalone Reply Threading subsection from PR #4643 into
the Thread & Reply Behavior subsection, keeping all config options
in one place. Adds reply_in_thread to the table and full example.
2026-04-02 12:38:13 -07:00
kshitijk4poor
d2b08406a4 fix(agent): classify think-only empty responses before retrying 2026-04-02 12:29:18 -07:00
Teknium
241cbeeccd docs: add reply_in_thread config to Slack docs 2026-04-02 12:18:40 -07:00
Animesh Mishra
b9a968c1de feat(slack): add reply_in_thread config option
By default, Hermes always threads replies to channel messages. Teams
that prefer direct channel replies had no way to opt out without
patching the source.

Add a reply_in_thread option (default: true) to the Slack platform
extra config:

  platforms:
    slack:
      extra:
        reply_in_thread: false

When false, _resolve_thread_ts() returns None for top-level channel
messages, so replies go directly to the channel. Messages already
inside an existing thread are still replied in-thread to preserve
conversation context. Default is true for full backward compatibility.
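
A sketch of the described _resolve_thread_ts() behavior; the exact signature
in the Slack bot is assumed:

  from typing import Optional

  def _resolve_thread_ts(event: dict, reply_in_thread: bool) -> Optional[str]:
      # Messages already inside a thread keep replying in-thread so the
      # conversation context is preserved.
      existing = event.get("thread_ts")
      if existing:
          return existing
      # Top-level channel message: thread only when configured to.
      return event.get("ts") if reply_in_thread else None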
2026-04-02 12:18:40 -07:00
Teknium
d89cc7fec1 feat(prompt): add Google model operational guidance for Gemini and Gemma (#4641)
Adapted from OpenCode's gemini.txt. Gemini and Gemma models now get
structured operational directives alongside tool-use enforcement:
absolute paths, verify-before-edit, dependency checks, conciseness,
parallel tool calls, non-interactive flags, autonomous execution.

Based on PR #4026, extended to cover Gemma models.
2026-04-02 11:52:34 -07:00
Teknium
3186668799 feat: per-turn primary runtime restoration and transport recovery (#4624)
Makes provider fallback turn-scoped in long-lived CLI sessions. Previously, a single transient failure pinned the session to the fallback provider for every subsequent turn.

- _primary_runtime dict snapshot at __init__ (model, provider, base_url, api_mode, client_kwargs, compressor state)
- _restore_primary_runtime() at top of run_conversation() — restores all state, resets fallback chain index
- _try_recover_primary_transport() — one extra recovery cycle (client rebuild + cooldown) for transient transport errors on direct endpoints before fallback
- Skipped for aggregator providers (OpenRouter, Nous)
- 25 tests

Inspired by #4612 (@betamod). Closes #4612.
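
A sketch of the snapshot/restore mechanics, with an abbreviated field list
and an illustrative class:

  class _AgentSketch:
      def __init__(self, model: str, provider: str, base_url: str, api_mode: str):
          self.model, self.provider = model, provider
          self.base_url, self.api_mode = base_url, api_mode
          self._fallback_index = 0
          # Snapshot taken once, before any fallback can mutate runtime state.
          self._primary_runtime = {
              "model": model, "provider": provider,
              "base_url": base_url, "api_mode": api_mode,
          }

      def _restore_primary_runtime(self) -> None:
          # Called at the top of run_conversation(): each turn starts back
          # on the primary provider with the fallback chain reset.
          for attr, value in self._primary_runtime.items():
              setattr(self, attr, value)
          self._fallback_index = 0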
2026-04-02 10:52:01 -07:00
Teknium
918d593544 chore: gitignore generated skills.json
Follow-up to #4500 — the extraction script generates this file at
build time, so it should not be committed.
2026-04-02 10:48:15 -07:00
Nacho Avecilla
b8dd059c40 feat(website): add skills browse and search page to docs (#4500)
Adds a Skills Hub page to the documentation site with a browsable, searchable catalog of all skills (built-in, optional, and community skills from cached hub indexes).

- Python extraction script (website/scripts/extract-skills.py) parses SKILL.md frontmatter and hub index caches into skills.json
- React page (website/src/pages/skills/) with search, category filtering, source filtering, and expandable skill cards
- CI workflow updated to run extraction before Docusaurus build
- Deploy trigger expanded to include skills/ and optional-skills/ changes

Authored by @IAvecilla
2026-04-02 10:47:38 -07:00
kshitijk4poor
20441cf2c8 fix(insights): persist token usage for non-CLI sessions 2026-04-02 10:47:13 -07:00
Teknium
585855d2ca fix: preserve Anthropic thinking block signatures across tool-use turns
Anthropic extended thinking blocks include an opaque 'signature' field
required for thinking chain continuity across multi-turn tool-use
conversations. Previously, normalize_anthropic_response() extracted
only the thinking text and set reasoning_details=None, discarding the
signature. On subsequent turns the API could not verify the chain.

Changes:
- _to_plain_data(): new recursive SDK-to-dict converter with depth cap
  (20 levels) and path-based cycle detection for safety
- _extract_preserved_thinking_blocks(): rehydrates preserved thinking
  blocks (including signature) from reasoning_details on assistant
  messages, placing them before tool_use blocks as Anthropic requires
- normalize_anthropic_response(): stores full thinking blocks in
  reasoning_details via _to_plain_data()
- _extract_reasoning(): adds 'thinking' key to the detail lookup chain
  so Anthropic-format details are found alongside OpenRouter format

Salvaged from PR #4503 by @priveperfumes — focused on the thinking
block continuity fix only (cache strategy and other changes excluded).
2026-04-02 10:30:32 -07:00
Teknium
28a073edc6 fix: repair OpenCode model routing and selection (#4508)
OpenCode Zen and Go are mixed-API-surface providers — different models
behind them use different API surfaces (GPT on Zen uses codex_responses,
Claude on Zen uses anthropic_messages, MiniMax on Go uses
anthropic_messages, GLM/Kimi on Go use chat_completions).

Changes:
- Add normalize_opencode_model_id() and opencode_model_api_mode() to
  models.py for model ID normalization and API surface routing
- Add _provider_supports_explicit_api_mode() to runtime_provider.py
  to prevent stale api_mode from leaking across provider switches
- Wire opencode routing into all three api_mode resolution paths:
  pool entry, api_key provider, and explicit runtime
- Add api_mode field to ModelSwitchResult for propagation through the
  switch pipeline
- Consolidate _PROVIDER_MODELS from main.py into models.py (single
  source of truth, eliminates duplicate dict)
- Add opencode normalization to setup wizard and model picker flows
- Add opencode block to _normalize_model_for_provider in CLI
- Add opencode-zen/go fallback model lists to setup.py

Tests: 160 targeted tests pass (26 new tests covering normalization,
api_mode routing per provider/model, persistence, and setup wizard
normalization).

Based on PR #3017 by SaM13997.
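
A sketch of the per-model routing described above; the real
opencode_model_api_mode() in models.py may be structured differently:

  def opencode_model_api_mode(provider: str, model: str) -> str:
      # Routing table distilled from the commit message.
      m = model.lower()
      if provider == "opencode-zen":
          if "gpt" in m:
              return "codex_responses"
          if "claude" in m:
              return "anthropic_messages"
      elif provider == "opencode-go":
          if "minimax" in m:
              return "anthropic_messages"
          if "glm" in m or "kimi" in m:
              return "chat_completions"
      return "chat_completions"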

Co-authored-by: SaM13997 <139419381+SaM13997@users.noreply.github.com>
2026-04-02 09:36:24 -07:00
Devorun
f4f64c413f fix(cli): ensure zero exit code on successful quiet mode queries (#4601) 2026-04-02 09:33:31 -07:00
Teknium
8dc5b11e95 fix(honcho): remove redundant local HOST import in _all_profile_host_configs
HOST is already imported at module level from honcho_integration.client.
The local import inside _all_profile_host_configs() was unnecessary.
2026-04-02 09:25:16 -07:00
Erosika
37d73d94bb fix: patch _local_config_path in tests for write isolation 2026-04-02 09:25:16 -07:00
Erosika
a0eae33248 fix(honcho): address PR review findings
- Remove duplicate cmd_sync definition (kept version with error output)
- Fix from_env workspace to stay shared (hermes) not profile-derived
- Add docstring clarifying get_or_create is idempotent in status
- Remove unused import importlib in test
- Fix test assertion for shared workspace in from_env path
- Add 3 tests for sync_honcho_profiles_quiet
2026-04-02 09:25:16 -07:00
Erosika
c146631e3b feat(honcho): sync command + auto-sync on hermes update
- hermes honcho sync: scan all profiles, create missing host blocks
- hermes update: automatically syncs Honcho config to all profiles
  after skill sync (existing users get profile mapping on next update)
- sync_honcho_profiles_quiet() for silent use from update path
2026-04-02 09:25:16 -07:00
Erosika
89eab74c67 feat(honcho): --target-profile flag + peer card display in status
- hermes honcho --target-profile <name> <command>: target another
  profile's Honcho config without switching profiles. Works with all
  subcommands (status, peer, mode, tokens, enable, disable, etc.)
- hermes honcho status now shows user peer card and AI peer
  representation when connected (fetched live from Honcho API)
2026-04-02 09:25:16 -07:00
Erosika
5f6bf2a473 fix(honcho): share workspace across profiles by default
Profiles inherit the default workspace instead of deriving a separate
one. All profiles see the same user context, sessions, and project
history. Each profile is a different AI peer in a shared space.

Workspace can still be overridden per-profile via config if isolation
is needed.
2026-04-02 09:25:16 -07:00
Erosika
f27da5fe8e fix(honcho): remove linkedHosts from peers table 2026-04-02 09:25:16 -07:00
Erosika
0e90df1216 feat(honcho): eager peer creation + enable/disable per profile
- Eagerly create AI and user peers in Honcho when a profile is created
  (not deferred to first message). Uses idempotent peer() SDK call.
- hermes honcho enable: turn on Honcho for active profile, clone
  settings from default if first time, create peer immediately
- hermes honcho disable: turn off Honcho for active profile
- _ensure_peer_exists() helper for idempotent peer creation
2026-04-02 09:25:16 -07:00
Erosika
37458e72a2 feat(honcho): auto-clone config to new profiles on creation
When a profile is created and Honcho is already configured on the
default host, automatically creates a host block for the new profile
with inherited settings (memory mode, recall mode, write frequency,
peer name, etc.) and auto-derived workspace/aiPeer.

Zero-friction path: hermes profile create coder -> Honcho config
cloned as hermes.coder with all settings inherited.
2026-04-02 09:25:16 -07:00
Erosika
d1189f2be9 feat(honcho): add cross-profile observability for Honcho integration
- hermes honcho status: shows active profile name + host key
- hermes honcho status --all: compact table of all profiles with mode,
  recall, write frequency per host block
- hermes honcho peers: cross-profile peer identity table (user peer,
  AI peer, linked hosts)
- All write commands (peer, mode, tokens) print [host_key] label when
  operating on a non-default profile
2026-04-02 09:25:16 -07:00
Erosika
18c156af8e feat(honcho): scope host and peer resolution to active Hermes profile
Derives the Honcho host key from the active Hermes profile so that each
profile gets its own Honcho host block, workspace, and AI peer identity.

Profile "coder" resolves to host "hermes.coder", reads from
hosts["hermes.coder"] in honcho.json, and defaults workspace + aiPeer
to the derived host name.

Resolution order: HERMES_HONCHO_HOST env var > active profile name >
"hermes" (default).

Complements #3681 (profiles) with the Honcho identity layer that was
part of #2845 (named instances), adapted to the merged profiles system.
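
A sketch of the resolution order; the exact handling of the default profile
name is an assumption:

  import os
  from typing import Optional

  def _host_key(active_profile: Optional[str]) -> str:
      # Resolution order from the commit: env var, then profile-derived,
      # then the shared default.
      env = os.environ.get("HERMES_HONCHO_HOST")
      if env:
          return env
      if active_profile and active_profile != "default":
          return f"hermes.{active_profile}"
      return "hermes"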
2026-04-02 09:25:16 -07:00
Teknium
661a1b0ba2 fix: exclude matrix from [all] extras — python-olm is upstream-broken (#4615)
python-olm (required by matrix-nio[e2e]) fails to build on modern macOS:
- CMake 4 rejects vendored libolm's cmake_minimum_required(VERSION 3.4)
- Apple Clang 21+ rejects a C++ type error in include/olm/list.hh
- Upstream libolm repo is archived, no fix forthcoming

Including matrix in [all] causes the entire extras install to fail during
`hermes update`, silently dropping all other extras (telegram, discord,
slack, cron, etc.) when the fallback kicks in.

The [matrix] extra is preserved for opt-in install:
  pip install 'hermes-agent[matrix]'

Closes #4178
2026-04-02 09:21:37 -07:00
Teknium
acea9ee20b fix(tests): fix 11 real test failures + major cascade poisoner (#4570)
Three root causes addressed:

1. AIAgent no longer defaults base_url to OpenRouter (9 tests)
   Tests that assert OpenRouter-specific behavior (prompt caching,
   reasoning extra_body, provider preferences) need explicit base_url
   and model set on the agent. Updated test_run_agent.py and
   test_provider_parity.py.

2. Credential pool auto-seeding from host env (2 tests)
   test_auxiliary_client.py tests for Anthropic OAuth and custom
   endpoint fallback were not mocking _select_pool_entry, so the
   host's credential pool interfered. Added pool + codex mocks.

3. sys.modules corruption cascade (major - ~250 tests)
   test_managed_modal_environment.py replaced sys.modules entries
   (tools, hermes_cli, agent packages) with SimpleNamespace stubs
   but had NO cleanup fixture. Every subsequent test in the process
   saw corrupted imports: 'cannot import get_config_path from
   <unknown module name>' and 'module tools has no attribute
   environments'. Added _restore_tool_and_agent_modules autouse
   fixture matching the pattern in test_managed_browserbase_and_modal.py.

   This was also the root cause of CI failures (104 failed on main).
2026-04-02 08:43:06 -07:00
Teknium
624ad582a5 fix: make gateway approval block agent thread like CLI does (#4557)
The gateway's dangerous command approval system was fundamentally broken:
the agent loop continued running after a command was flagged, and the
approval request only reached the user after the agent finished its
entire conversation loop. By then the context was lost.

This change makes the gateway approval mirror the CLI's synchronous
behavior. When a dangerous command is detected:

1. The agent thread blocks on a threading.Event
2. The approval request is sent to the user immediately
3. The user responds with /approve or /deny
4. The event is signaled and the agent resumes with the real result

The agent never sees 'approval_required' as a tool result. It either
gets the command output (approved) or a definitive BLOCKED message
(denied/timed out) — same as CLI mode.

Queue-based design supports multiple concurrent approvals (parallel
subagents via delegate_task, execute_code RPC handlers). Each approval
gets its own _ApprovalEntry with its own threading.Event. /approve
resolves the oldest (FIFO); /approve all resolves all at once.

Changes:
- tools/approval.py: Queue-based per-session blocking gateway approval
  (register/unregister callbacks, resolve with FIFO or all-at-once)
- gateway/run.py: Register approval callback in run_sync(), remove
  post-loop pop_pending hack, /approve and /deny support 'all' flag
- tests: 21 tests including parallel subagent E2E scenarios
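
A sketch of the queue-based blocking design; locking and '/approve all' are
omitted for brevity, and names mirror the commit only loosely:

  import threading
  from collections import OrderedDict
  from dataclasses import dataclass, field

  @dataclass
  class _ApprovalEntry:
      event: threading.Event = field(default_factory=threading.Event)
      approved: bool = False

  _pending: "OrderedDict[int, _ApprovalEntry]" = OrderedDict()
  _next_id = 0

  def request_approval(timeout: float = 300.0) -> str:
      global _next_id
      _next_id += 1
      my_id = _next_id
      entry = _ApprovalEntry()
      _pending[my_id] = entry
      # Agent thread blocks here until /approve, /deny, or timeout.
      entry.event.wait(timeout)
      _pending.pop(my_id, None)
      return "approved" if entry.approved else "BLOCKED"

  def approve_oldest() -> None:
      # /approve resolves the oldest pending entry (FIFO).
      if _pending:
          _, entry = _pending.popitem(last=False)
          entry.approved = True
          entry.event.set()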
2026-04-02 01:47:19 -07:00
161 changed files with 15037 additions and 3644 deletions

View File

@@ -6,6 +6,8 @@ on:
    paths:
      - 'website/**'
      - 'landingpage/**'
      - 'skills/**'
      - 'optional-skills/**'
      - '.github/workflows/deploy-site.yml'
  workflow_dispatch:
@@ -34,6 +36,16 @@ jobs:
        cache: npm
        cache-dependency-path: website/package-lock.json
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install PyYAML for skill extraction
        run: pip install pyyaml
      - name: Extract skill metadata for dashboard
        run: python3 website/scripts/extract-skills.py
      - name: Install dependencies
        run: npm ci
        working-directory: website

View File

@@ -27,8 +27,11 @@ jobs:
        with:
          python-version: '3.11'
      - name: Install ascii-guard
        run: python -m pip install ascii-guard
      - name: Install Python dependencies
        run: python -m pip install ascii-guard pyyaml
      - name: Extract skill metadata for dashboard
        run: python3 website/scripts/extract-skills.py
      - name: Lint docs diagrams
        run: npm run lint:diagrams

View File

@@ -10,6 +10,7 @@ Auth supports:
- Claude Code credentials (~/.claude.json or ~/.claude/.credentials.json) → Bearer auth
"""
import copy
import json
import logging
import os
@@ -949,6 +950,69 @@ def _convert_content_part_to_anthropic(part: Any) -> Optional[Dict[str, Any]]:
    return block


def _to_plain_data(value: Any, *, _depth: int = 0, _path: Optional[set] = None) -> Any:
    """Recursively convert SDK objects to plain Python data structures.

    Guards against circular references (``_path`` tracks ``id()`` of objects
    on the *current* recursion path) and runaway depth (capped at 20 levels).
    Uses path-based tracking so shared (but non-cyclic) objects referenced by
    multiple siblings are converted correctly rather than being stringified.
    """
    _MAX_DEPTH = 20
    if _depth > _MAX_DEPTH:
        return str(value)
    if _path is None:
        _path = set()
    obj_id = id(value)
    if obj_id in _path:
        return str(value)
    if hasattr(value, "model_dump"):
        _path.add(obj_id)
        result = _to_plain_data(value.model_dump(), _depth=_depth + 1, _path=_path)
        _path.discard(obj_id)
        return result
    if isinstance(value, dict):
        _path.add(obj_id)
        result = {k: _to_plain_data(v, _depth=_depth + 1, _path=_path) for k, v in value.items()}
        _path.discard(obj_id)
        return result
    if isinstance(value, (list, tuple)):
        _path.add(obj_id)
        result = [_to_plain_data(v, _depth=_depth + 1, _path=_path) for v in value]
        _path.discard(obj_id)
        return result
    if hasattr(value, "__dict__"):
        _path.add(obj_id)
        result = {
            k: _to_plain_data(v, _depth=_depth + 1, _path=_path)
            for k, v in vars(value).items()
            if not k.startswith("_")
        }
        _path.discard(obj_id)
        return result
    return value


def _extract_preserved_thinking_blocks(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Anthropic thinking blocks previously preserved on the message."""
    raw_details = message.get("reasoning_details")
    if not isinstance(raw_details, list):
        return []
    preserved: List[Dict[str, Any]] = []
    for detail in raw_details:
        if not isinstance(detail, dict):
            continue
        block_type = str(detail.get("type", "") or "").strip().lower()
        if block_type not in {"thinking", "redacted_thinking"}:
            continue
        preserved.append(copy.deepcopy(detail))
    return preserved


def _convert_content_to_anthropic(content: Any) -> Any:
    """Convert OpenAI-style multimodal content arrays to Anthropic blocks."""
    if not isinstance(content, list):
@@ -995,7 +1059,7 @@ def convert_messages_to_anthropic(
            continue
        if role == "assistant":
            blocks = []
            blocks = _extract_preserved_thinking_blocks(m)
            if content:
                if isinstance(content, list):
                    converted_content = _convert_content_to_anthropic(content)
@@ -1279,6 +1343,7 @@
    """
    text_parts = []
    reasoning_parts = []
    reasoning_details = []
    tool_calls = []
    for block in response.content:
@@ -1286,6 +1351,9 @@
            text_parts.append(block.text)
        elif block.type == "thinking":
            reasoning_parts.append(block.thinking)
            block_dict = _to_plain_data(block)
            if isinstance(block_dict, dict):
                reasoning_details.append(block_dict)
        elif block.type == "tool_use":
            name = block.name
            if strip_tool_prefix and name.startswith(_MCP_TOOL_PREFIX):
@@ -1316,7 +1384,7 @@
            tool_calls=tool_calls or None,
            reasoning="\n\n".join(reasoning_parts) if reasoning_parts else None,
            reasoning_content=None,
            reasoning_details=None,
            reasoning_details=reasoning_details or None,
        ),
        finish_reason,
    )

View File

@@ -0,0 +1,113 @@
"""BuiltinMemoryProvider — wraps MEMORY.md / USER.md as a MemoryProvider.
Always registered as the first provider. Cannot be disabled or removed.
This is the existing Hermes memory system exposed through the provider
interface for compatibility with the MemoryManager.
The actual storage logic lives in tools/memory_tool.py (MemoryStore).
This provider is a thin adapter that delegates to MemoryStore and
exposes the memory tool schema.
"""
from __future__ import annotations
import json
import logging
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
class BuiltinMemoryProvider(MemoryProvider):
"""Built-in file-backed memory (MEMORY.md + USER.md).
Always active, never disabled by other providers. The `memory` tool
is handled by run_agent.py's agent-level tool interception (not through
the normal registry), so get_tool_schemas() returns an empty list —
the memory tool is already wired separately.
"""
def __init__(
self,
memory_store=None,
memory_enabled: bool = False,
user_profile_enabled: bool = False,
):
self._store = memory_store
self._memory_enabled = memory_enabled
self._user_profile_enabled = user_profile_enabled
@property
def name(self) -> str:
return "builtin"
def is_available(self) -> bool:
"""Built-in memory is always available."""
return True
def initialize(self, session_id: str, **kwargs) -> None:
"""Load memory from disk if not already loaded."""
if self._store is not None:
self._store.load_from_disk()
def system_prompt_block(self) -> str:
"""Return MEMORY.md and USER.md content for the system prompt.
Uses the frozen snapshot captured at load time. This ensures the
system prompt stays stable throughout a session (preserving the
prompt cache), even though the live entries may change via tool calls.
"""
if not self._store:
return ""
parts = []
if self._memory_enabled:
mem_block = self._store.format_for_system_prompt("memory")
if mem_block:
parts.append(mem_block)
if self._user_profile_enabled:
user_block = self._store.format_for_system_prompt("user")
if user_block:
parts.append(user_block)
return "\n\n".join(parts)
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Built-in memory doesn't do query-based recall — it's injected via system_prompt_block."""
return ""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Built-in memory doesn't auto-sync turns — writes happen via the memory tool."""
def get_tool_schemas(self) -> List[Dict[str, Any]]:
"""Return empty list.
The `memory` tool is an agent-level intercepted tool, handled
specially in run_agent.py before normal tool dispatch. It's not
part of the standard tool registry. We don't duplicate it here.
"""
return []
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
"""Not used — the memory tool is intercepted in run_agent.py."""
return json.dumps({"error": "Built-in memory tool is handled by the agent loop"})
def shutdown(self) -> None:
"""No cleanup needed — files are saved on every write."""
# -- Property access for backward compatibility --------------------------
@property
def store(self):
"""Access the underlying MemoryStore for legacy code paths."""
return self._store
@property
def memory_enabled(self) -> bool:
return self._memory_enabled
@property
def user_profile_enabled(self) -> bool:
return self._user_profile_enabled

View File

@@ -301,6 +301,8 @@ Update the summary using this exact structure. PRESERVE all existing information
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.

Write the summary in the same language the user was using in the conversation.

Write only the summary body. Do not include any preamble or prefix."""
        else:
            # First compaction: summarize from scratch
@@ -339,6 +341,8 @@ Use this exact structure:
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.

Write the summary in the same language the user was using in the conversation.

Write only the summary body. Do not include any preamble or prefix."""
        try:

agent/memory_manager.py (new file, 335 lines)

View File

@@ -0,0 +1,335 @@
"""MemoryManager — orchestrates the built-in memory provider plus at most
ONE external plugin memory provider.
Single integration point in run_agent.py. Replaces scattered per-backend
code with one manager that delegates to registered providers.
The BuiltinMemoryProvider is always registered first and cannot be removed.
Only ONE external (non-builtin) provider is allowed at a time — attempting
to register a second external provider is rejected with a warning. This
prevents tool schema bloat and conflicting memory backends.
Usage in run_agent.py:
self._memory_manager = MemoryManager()
self._memory_manager.add_provider(BuiltinMemoryProvider(...))
# Only ONE of these:
self._memory_manager.add_provider(plugin_provider)
# System prompt
prompt_parts.append(self._memory_manager.build_system_prompt())
# Pre-turn
context = self._memory_manager.prefetch_all(user_message)
# Post-turn
self._memory_manager.sync_all(user_msg, assistant_response)
self._memory_manager.queue_prefetch_all(user_msg)
"""
from __future__ import annotations
import json
import logging
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
class MemoryManager:
"""Orchestrates the built-in provider plus at most one external provider.
The builtin provider is always first. Only one non-builtin (external)
provider is allowed. Failures in one provider never block the other.
"""
def __init__(self) -> None:
self._providers: List[MemoryProvider] = []
self._tool_to_provider: Dict[str, MemoryProvider] = {}
self._has_external: bool = False # True once a non-builtin provider is added
# -- Registration --------------------------------------------------------
def add_provider(self, provider: MemoryProvider) -> None:
"""Register a memory provider.
Built-in provider (name ``"builtin"``) is always accepted.
Only **one** external (non-builtin) provider is allowed — a second
attempt is rejected with a warning.
"""
is_builtin = provider.name == "builtin"
if not is_builtin:
if self._has_external:
existing = next(
(p.name for p in self._providers if p.name != "builtin"), "unknown"
)
logger.warning(
"Rejected memory provider '%s' — external provider '%s' is "
"already registered. Only one external memory provider is "
"allowed at a time. Configure which one via memory.provider "
"in config.yaml.",
provider.name, existing,
)
return
self._has_external = True
self._providers.append(provider)
# Index tool names → provider for routing
for schema in provider.get_tool_schemas():
tool_name = schema.get("name", "")
if tool_name and tool_name not in self._tool_to_provider:
self._tool_to_provider[tool_name] = provider
elif tool_name in self._tool_to_provider:
logger.warning(
"Memory tool name conflict: '%s' already registered by %s, "
"ignoring from %s",
tool_name,
self._tool_to_provider[tool_name].name,
provider.name,
)
logger.info(
"Memory provider '%s' registered (%d tools)",
provider.name,
len(provider.get_tool_schemas()),
)
@property
def providers(self) -> List[MemoryProvider]:
"""All registered providers in order."""
return list(self._providers)
@property
def provider_names(self) -> List[str]:
"""Names of all registered providers."""
return [p.name for p in self._providers]
def get_provider(self, name: str) -> Optional[MemoryProvider]:
"""Get a provider by name, or None if not registered."""
for p in self._providers:
if p.name == name:
return p
return None
# -- System prompt -------------------------------------------------------
def build_system_prompt(self) -> str:
"""Collect system prompt blocks from all providers.
Returns combined text, or empty string if no providers contribute.
Each non-empty block is labeled with the provider name.
"""
blocks = []
for provider in self._providers:
try:
block = provider.system_prompt_block()
if block and block.strip():
blocks.append(block)
except Exception as e:
logger.warning(
"Memory provider '%s' system_prompt_block() failed: %s",
provider.name, e,
)
return "\n\n".join(blocks)
# -- Prefetch / recall ---------------------------------------------------
def prefetch_all(self, query: str, *, session_id: str = "") -> str:
"""Collect prefetch context from all providers.
Returns merged context text labeled by provider. Empty providers
are skipped. Failures in one provider don't block others.
"""
parts = []
for provider in self._providers:
try:
result = provider.prefetch(query, session_id=session_id)
if result and result.strip():
parts.append(result)
except Exception as e:
logger.debug(
"Memory provider '%s' prefetch failed (non-fatal): %s",
provider.name, e,
)
return "\n\n".join(parts)
def queue_prefetch_all(self, query: str, *, session_id: str = "") -> None:
"""Queue background prefetch on all providers for the next turn."""
for provider in self._providers:
try:
provider.queue_prefetch(query, session_id=session_id)
except Exception as e:
logger.debug(
"Memory provider '%s' queue_prefetch failed (non-fatal): %s",
provider.name, e,
)
# -- Sync ----------------------------------------------------------------
def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Sync a completed turn to all providers."""
for provider in self._providers:
try:
provider.sync_turn(user_content, assistant_content, session_id=session_id)
except Exception as e:
logger.warning(
"Memory provider '%s' sync_turn failed: %s",
provider.name, e,
)
# -- Tools ---------------------------------------------------------------
def get_all_tool_schemas(self) -> List[Dict[str, Any]]:
"""Collect tool schemas from all providers."""
schemas = []
seen = set()
for provider in self._providers:
try:
for schema in provider.get_tool_schemas():
name = schema.get("name", "")
if name and name not in seen:
schemas.append(schema)
seen.add(name)
except Exception as e:
logger.warning(
"Memory provider '%s' get_tool_schemas() failed: %s",
provider.name, e,
)
return schemas
def get_all_tool_names(self) -> set:
"""Return set of all tool names across all providers."""
return set(self._tool_to_provider.keys())
def has_tool(self, tool_name: str) -> bool:
"""Check if any provider handles this tool."""
return tool_name in self._tool_to_provider
    def handle_tool_call(
        self, tool_name: str, args: Dict[str, Any], **kwargs
    ) -> str:
        """Route a tool call to the correct provider.

        Returns a JSON string result; an unknown tool name yields a
        JSON error payload.
        """
        provider = self._tool_to_provider.get(tool_name)
        if provider is None:
            return json.dumps({"error": f"No memory provider handles tool '{tool_name}'"})
        try:
            return provider.handle_tool_call(tool_name, args, **kwargs)
        except Exception as e:
            logger.error(
                "Memory provider '%s' handle_tool_call(%s) failed: %s",
                provider.name, tool_name, e,
            )
            return json.dumps({"error": f"Memory tool '{tool_name}' failed: {e}"})
# -- Lifecycle hooks -----------------------------------------------------
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Notify all providers of a new turn.
kwargs may include: remaining_tokens, model, platform, tool_count.
"""
for provider in self._providers:
try:
provider.on_turn_start(turn_number, message, **kwargs)
except Exception as e:
logger.debug(
"Memory provider '%s' on_turn_start failed: %s",
provider.name, e,
)
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Notify all providers of session end."""
for provider in self._providers:
try:
provider.on_session_end(messages)
except Exception as e:
logger.debug(
"Memory provider '%s' on_session_end failed: %s",
provider.name, e,
)
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Notify all providers before context compression.
Returns combined text from providers to include in the compression
summary prompt. Empty string if no provider contributes.
"""
parts = []
for provider in self._providers:
try:
result = provider.on_pre_compress(messages)
if result and result.strip():
parts.append(result)
except Exception as e:
logger.debug(
"Memory provider '%s' on_pre_compress failed: %s",
provider.name, e,
)
return "\n\n".join(parts)
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
"""
for provider in self._providers:
if provider.name == "builtin":
continue
try:
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
provider.name, e,
)
def on_delegation(self, task: str, result: str, *,
child_session_id: str = "", **kwargs) -> None:
"""Notify all providers that a subagent completed."""
for provider in self._providers:
try:
provider.on_delegation(
task, result, child_session_id=child_session_id, **kwargs
)
except Exception as e:
logger.debug(
"Memory provider '%s' on_delegation failed: %s",
provider.name, e,
)
def shutdown_all(self) -> None:
"""Shut down all providers (reverse order for clean teardown)."""
for provider in reversed(self._providers):
try:
provider.shutdown()
except Exception as e:
logger.warning(
"Memory provider '%s' shutdown failed: %s",
provider.name, e,
)
def initialize_all(self, session_id: str, **kwargs) -> None:
"""Initialize all providers.
Automatically injects ``hermes_home`` into *kwargs* so that every
provider can resolve profile-scoped storage paths without importing
``get_hermes_home()`` themselves.
"""
if "hermes_home" not in kwargs:
from hermes_constants import get_hermes_home
kwargs["hermes_home"] = str(get_hermes_home())
for provider in self._providers:
try:
provider.initialize(session_id=session_id, **kwargs)
except Exception as e:
logger.warning(
"Memory provider '%s' initialize failed: %s",
provider.name, e,
)
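Read together, the manager methods above imply a fixed per-turn call order. A rough wiring sketch follows; the MemoryManager constructor shape and the surrounding loop are assumptions here, not part of this diff:

# Hypothetical agent-loop wiring for MemoryManager (constructor assumed).
manager = MemoryManager(providers=[BuiltinMemoryProvider()])
manager.initialize_all(session_id="abc123")         # injects hermes_home
recall = manager.prefetch_all("user question")      # merged, provider-labeled text
# ... run the model turn; dispatch any provider tool calls ...
if manager.has_tool("viking_search"):
    print(manager.handle_tool_call("viking_search", {"query": "..."}))
manager.sync_all("user question", "assistant reply")  # persist the completed turn
manager.queue_prefetch_all("follow-up question")      # warm the next turn's recall
manager.shutdown_all()                                # at the session boundary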

agent/memory_provider.py

@@ -0,0 +1,231 @@
"""Abstract base class for pluggable memory providers.
Memory providers give the agent persistent recall across sessions. The
built-in memory (MEMORY.md / USER.md) is always active as the first
provider and cannot be removed. External providers (Honcho, Hindsight,
Mem0, etc.) are additive; they never disable the built-in store. The
MemoryManager allows at most one external provider at a time, to prevent
tool schema bloat and conflicting memory backends.
Registration:
1. Built-in: BuiltinMemoryProvider — always present, not removable.
2. Plugins: Ship in plugins/memory/<name>/, activated by memory.provider config.
Lifecycle (called by MemoryManager, wired in run_agent.py):
initialize() — connect, create resources, warm up
system_prompt_block() — static text for the system prompt
prefetch(query) — background recall before each turn
sync_turn(user, asst) — async write after each turn
get_tool_schemas() — tool schemas to expose to the model
handle_tool_call() — dispatch a tool call
shutdown() — clean exit
Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) — per-turn tick with runtime context
on_session_end(messages) — end-of-session extraction
on_pre_compress(messages) -> str — extract before context compression
on_memory_write(action, target, content) — mirror built-in memory writes
on_delegation(task, result, **kwargs) — parent-side observation of subagent work
"""
from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
class MemoryProvider(ABC):
"""Abstract base class for memory providers."""
@property
@abstractmethod
def name(self) -> str:
"""Short identifier for this provider (e.g. 'builtin', 'honcho', 'hindsight')."""
# -- Core lifecycle (implement these) ------------------------------------
@abstractmethod
def is_available(self) -> bool:
"""Return True if this provider is configured, has credentials, and is ready.
Called during agent init to decide whether to activate the provider.
Should not make network calls — just check config and installed deps.
"""
@abstractmethod
def initialize(self, session_id: str, **kwargs) -> None:
"""Initialize for a session.
Called once at agent startup. May create resources (banks, tables),
establish connections, start background threads, etc.
kwargs always include:
- hermes_home (str): The active HERMES_HOME directory path. Use this
for profile-scoped storage instead of hardcoding ``~/.hermes``.
- platform (str): "cli", "telegram", "discord", "cron", etc.
kwargs may also include:
- agent_context (str): "primary", "subagent", "cron", or "flush".
Providers should skip writes for non-primary contexts (cron system
prompts would corrupt user representations).
- agent_identity (str): Profile name (e.g. "coder"). Use for
per-profile provider identity scoping.
- agent_workspace (str): Shared workspace name (e.g. "hermes").
- parent_session_id (str): For subagents, the parent's session_id.
- user_id (str): Platform user identifier (gateway sessions).
"""
def system_prompt_block(self) -> str:
"""Return text to include in the system prompt.
Called during system prompt assembly. Return empty string to skip.
This is for STATIC provider info (instructions, status). Prefetched
recall context is injected separately via prefetch().
"""
return ""
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Recall relevant context for the upcoming turn.
Called before each API call. Return formatted text to inject as
context, or empty string if nothing relevant. Implementations
should be fast — use background threads for the actual recall
and return cached results here.
session_id is provided for providers serving concurrent sessions
(gateway group chats, cached agents). Providers that don't need
per-session scoping can ignore it.
"""
return ""
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Queue a background recall for the NEXT turn.
Called after each turn completes. The result will be consumed
by prefetch() on the next turn. Default is no-op — providers
that do background prefetching should override this.
"""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Persist a completed turn to the backend.
Called after each turn. Should be non-blocking — queue for
background processing if the backend has latency.
"""
@abstractmethod
def get_tool_schemas(self) -> List[Dict[str, Any]]:
"""Return tool schemas this provider exposes.
Each schema follows the OpenAI function calling format:
{"name": "...", "description": "...", "parameters": {...}}
Return empty list if this provider has no tools (context-only).
"""
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
"""Handle a tool call for one of this provider's tools.
Must return a JSON string (the tool result).
Only called for tool names returned by get_tool_schemas().
"""
raise NotImplementedError(f"Provider {self.name} does not handle tool {tool_name}")
def shutdown(self) -> None:
"""Clean shutdown — flush queues, close connections."""
# -- Optional hooks (override to opt in) ---------------------------------
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Called at the start of each turn with the user message.
Use for turn-counting, scope management, periodic maintenance.
kwargs may include: remaining_tokens, model, platform, tool_count.
Providers use what they need; extras are ignored.
"""
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Called when a session ends (explicit exit or timeout).
Use for end-of-session fact extraction, summarization, etc.
messages is the full conversation history.
NOT called after every turn — only at actual session boundaries
(CLI exit, /reset, gateway session expiry).
"""
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Called before context compression discards old messages.
Use to extract insights from messages about to be compressed.
messages is the list that will be summarized/discarded.
Return text to include in the compression summary prompt so the
compressor preserves provider-extracted insights. Return empty
string for no contribution (backwards-compatible default).
"""
return ""
def on_delegation(self, task: str, result: str, *,
child_session_id: str = "", **kwargs) -> None:
"""Called on the PARENT agent when a subagent completes.
The parent's memory provider gets the task+result pair as an
observation of what was delegated and what came back. The subagent
itself has no provider session (skip_memory=True).
task: the delegation prompt
result: the subagent's final response
child_session_id: the subagent's session_id
"""
def get_config_schema(self) -> List[Dict[str, Any]]:
"""Return config fields this provider needs for setup.
Used by 'hermes memory setup' to walk the user through configuration.
Each field is a dict with:
key: config key name (e.g. 'api_key', 'mode')
description: human-readable description
secret: True if this should go to .env (default: False)
required: True if required (default: False)
default: default value (optional)
choices: list of valid values (optional)
url: URL where user can get this credential (optional)
env_var: explicit env var name for secrets (default: auto-generated)
Return empty list if no config needed (e.g. local-only providers).
"""
return []
def save_config(self, values: Dict[str, Any], hermes_home: str) -> None:
"""Write non-secret config to the provider's native location.
Called by 'hermes memory setup' after collecting user inputs.
``values`` contains only non-secret fields (secrets go to .env).
``hermes_home`` is the active HERMES_HOME directory path.
Providers with native config files (JSON, YAML) should override
this to write to their expected location. Providers that use only
env vars can leave the default (no-op).
All new memory provider plugins MUST either:
- implement save_config() for native config file formats, OR
- use only env vars (in which case get_config_schema() fields should
all have ``env_var`` set and this method stays a no-op).
"""
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
Use to mirror built-in memory writes to your backend.
"""


@@ -187,7 +187,29 @@ TOOL_USE_ENFORCEMENT_GUIDANCE = (
# Model name substrings that trigger tool-use enforcement guidance.
# Add new patterns here when a model family needs explicit steering.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma")
# Gemini/Gemma-specific operational guidance, adapted from OpenCode's gemini.txt.
# Injected alongside TOOL_USE_ENFORCEMENT_GUIDANCE when the model is Gemini or Gemma.
GOOGLE_MODEL_OPERATIONAL_GUIDANCE = (
"# Google model operational directives\n"
"Follow these operational rules strictly:\n"
"- **Absolute paths:** Always construct and use absolute file paths for all "
"file system operations. Combine the project root with relative paths.\n"
"- **Verify first:** Use read_file/search_files to check file contents and "
"project structure before making changes. Never guess at file contents.\n"
"- **Dependency checks:** Never assume a library is available. Check "
"package.json, requirements.txt, Cargo.toml, etc. before importing.\n"
"- **Conciseness:** Keep explanatory text brief — a few sentences, not "
"paragraphs. Focus on actions and results over narration.\n"
"- **Parallel tool calls:** When you need to perform multiple independent "
"operations (e.g. reading several files), make all the tool calls in a "
"single response rather than sequentially.\n"
"- **Non-interactive commands:** Use flags like -y, --yes, --non-interactive "
"to prevent CLI tools from hanging on prompts.\n"
"- **Keep going:** Work autonomously until the task is fully resolved. "
"Don't stop with a plan — execute it.\n"
)
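The injection site is outside this hunk; given the substring convention above, the gating plausibly looks like this (function name and composition are assumptions, not code from this PR):

# Sketch of the substring gating described above (names assumed).
def _extra_guidance_for(model: str) -> str:
    m = (model or "").lower()
    parts = []
    if any(sub in m for sub in TOOL_USE_ENFORCEMENT_MODELS):
        parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
    if "gemini" in m or "gemma" in m:
        parts.append(GOOGLE_MODEL_OPERATIONAL_GUIDANCE)
    return "\n\n".join(parts)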
# Model name substrings that should use the 'developer' role instead of
# 'system' for the system prompt. OpenAI's newer models (GPT-5, Codex)

cli.py

@@ -508,6 +508,8 @@ from tools.browser_tool import _emergency_cleanup_all_sessions as _cleanup_all_b
# Guard to prevent cleanup from running multiple times on exit
_cleanup_done = False
# Weak reference to the active AIAgent for memory provider shutdown at exit
_active_agent_ref = None
def _run_cleanup():
"""Run resource cleanup exactly once."""
@@ -536,6 +538,15 @@ def _run_cleanup():
shutdown_cached_clients()
except Exception:
pass
# Shut down memory provider (on_session_end + shutdown_all) at actual
# session boundary — NOT per-turn inside run_conversation().
try:
if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
_active_agent_ref.shutdown_memory_provider(
getattr(_active_agent_ref, 'conversation_history', None) or []
)
except Exception:
pass
# =============================================================================
@@ -1602,6 +1613,28 @@ class HermesCLI:
pass
return changed
if resolved_provider in {"opencode-zen", "opencode-go"}:
try:
from hermes_cli.models import normalize_opencode_model_id, opencode_model_api_mode
canonical = normalize_opencode_model_id(resolved_provider, current_model)
if canonical and canonical != current_model:
if not self._model_is_default:
self.console.print(
f"[yellow]⚠️ Stripped provider prefix from '{current_model}'; using '{canonical}' for {resolved_provider}.[/]"
)
self.model = canonical
current_model = canonical
changed = True
resolved_mode = opencode_model_api_mode(resolved_provider, current_model)
if resolved_mode != self.api_mode:
self.api_mode = resolved_mode
changed = True
except Exception:
pass
return changed
if resolved_provider != "openai-codex":
return False
@@ -2196,7 +2229,7 @@ class HermesCLI:
session_db=self._session_db,
clarify_callback=self._clarify_callback,
reasoning_callback=self._current_reasoning_callback(),
honcho_session_key=None, # resolved by run_agent via config sessions map / title
fallback_model=self._fallback_model,
thinking_callback=self._on_thinking,
checkpoints_enabled=self.checkpoints_enabled,
@@ -2208,6 +2241,9 @@ class HermesCLI:
stream_delta_callback=self._stream_delta if self.streaming_enabled else None,
tool_gen_callback=self._on_tool_gen_start if self.streaming_enabled else None,
)
# Store reference for atexit memory provider shutdown
global _active_agent_ref
_active_agent_ref = self.agent
# Route agent status output through prompt_toolkit so ANSI escape
# sequences aren't garbled by patch_stdout's StdoutProxy (#2262).
self.agent._print_fn = _cprint
@@ -3215,6 +3251,9 @@ class HermesCLI:
def reset_conversation(self):
"""Reset the conversation by starting a new session."""
# Shut down memory provider before resetting — actual session boundary
if hasattr(self, 'agent') and self.agent:
self.agent.shutdown_memory_provider(self.conversation_history)
self.new_session()
def save_conversation(self):
@@ -3879,28 +3918,6 @@ class HermesCLI:
try:
if self._session_db.set_session_title(self.session_id, new_title):
_cprint(f" Session title set: {new_title}")
# Re-map Honcho session key to new title
if self.agent and getattr(self.agent, '_honcho', None):
try:
hcfg = self.agent._honcho_config
new_key = (
hcfg.resolve_session_name(
session_title=new_title,
session_id=self.agent.session_id,
)
if hcfg else new_title
)
if new_key and new_key != self.agent._honcho_session_key:
old_key = self.agent._honcho_session_key
self.agent._honcho.get_or_create(new_key)
self.agent._honcho_session_key = new_key
from tools.honcho_tools import set_session_context
set_session_context(self.agent._honcho, new_key)
from agent.display import honcho_session_line, write_tty
write_tty(honcho_session_line(hcfg.workspace_id, new_key) + "\n")
_cprint(f" Honcho session: {old_key}{new_key}")
except Exception:
pass
else:
_cprint(" Session not found in database.")
except ValueError as e:
@@ -4365,7 +4382,6 @@ class HermesCLI:
user_message=btw_prompt,
conversation_history=history_snapshot,
task_id=task_id,
sync_honcho=False,
)
response = (result.get("final_response") or "") if result else ""
@@ -4795,12 +4811,7 @@ class HermesCLI:
f" ✅ Compressed: {original_count}{new_count} messages "
f"(~{approx_tokens:,} → ~{new_tokens:,} tokens)"
)
# Flush Honcho async queue so queued messages land before context resets
if self.agent and getattr(self.agent, '_honcho', None):
try:
self.agent._honcho.flush_all()
except Exception:
pass
except Exception as e:
print(f" ❌ Compression failed: {e}")
@@ -6461,17 +6472,6 @@ class HermesCLI:
# One-line Honcho session indicator (TTY-only, not captured by agent).
# Only show when the user explicitly configured Honcho for Hermes
# (not auto-enabled from a stray HONCHO_API_KEY env var).
try:
from honcho_integration.client import HonchoClientConfig
from agent.display import honcho_session_line, write_tty
hcfg = HonchoClientConfig.from_global_config()
if hcfg.enabled and (hcfg.api_key or hcfg.base_url) and hcfg.explicitly_configured:
sname = hcfg.resolve_session_name(session_id=self.session_id)
if sname:
write_tty(honcho_session_line(hcfg.workspace_id, sname) + "\n")
except Exception:
pass
# If resuming a session, load history and display it immediately
# so the user has context before typing their first message.
if self._resumed:
@@ -7790,12 +7790,6 @@ class HermesCLI:
set_sudo_password_callback(None)
set_approval_callback(None)
set_secret_capture_callback(None)
# Flush + shut down Honcho async writer (drains queue before exit)
if self.agent and getattr(self.agent, '_honcho', None):
try:
self.agent._honcho.shutdown()
except (Exception, KeyboardInterrupt):
pass
# Close session in SQLite
if hasattr(self, '_session_db') and self._session_db and self.agent:
try:
@@ -8020,6 +8014,12 @@ def main(
if response:
print(response)
print(f"\nsession_id: {cli.session_id}")
# Ensure proper exit code for automation wrappers
sys.exit(1 if isinstance(result, dict) and result.get("failed") else 0)
# Exit with error code if credentials or agent init fails
sys.exit(1)
else:
cli.show_banner()
cli.console.print(f"[bold blue]Query:[/] {query}")


@@ -437,6 +437,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
provider_sort=pr.get("sort"),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
session_db=_session_db,


@@ -323,7 +323,18 @@ class SlackAdapter(BasePlatformAdapter):
Prefers metadata thread_id (the thread parent's ts, set by the
gateway) over reply_to (which may be a child message's ts).
When ``reply_in_thread`` is ``false`` in the platform extra config,
top-level channel messages receive direct channel replies instead of
thread replies. Messages that originate inside an existing thread are
always replied to in-thread to preserve conversation context.
"""
# When reply_in_thread is disabled (default: True for backward compat),
# only reply in-thread for messages already part of an existing thread.
if not self.config.extra.get("reply_in_thread", True):
existing_thread = (metadata or {}).get("thread_id") or (metadata or {}).get("thread_ts")
return existing_thread or None
if metadata:
if metadata.get("thread_id"):
return metadata["thread_id"]
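Concretely (the method name and its remainder fall outside this hunk, so the values below are illustrative only):

# Illustrative inputs for the reply_in_thread branch above (values hypothetical).
extra = {"reply_in_thread": False}             # Slack platform extra config
top_level_metadata: dict = {}                  # bare channel message
threaded_metadata = {"thread_ts": "1712.003"}  # message inside an existing thread
# top_level_metadata resolves to None          -> reply goes to the channel
# threaded_metadata resolves to "1712.003"     -> reply stays in the thread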


@@ -474,8 +474,6 @@ class GatewayRunner:
# Persistent Honcho managers keyed by gateway session key.
# This preserves write_frequency="session" semantics across short-lived
# per-message AIAgent instances.
self._honcho_managers: Dict[str, Any] = {}
self._honcho_configs: Dict[str, Any] = {}
@@ -508,61 +506,9 @@ class GatewayRunner:
# Track background tasks to prevent garbage collection mid-execution
self._background_tasks: set = set()
def _get_or_create_gateway_honcho(self, session_key: str):
"""Return a persistent Honcho manager/config pair for this gateway session."""
if not hasattr(self, "_honcho_managers"):
self._honcho_managers = {}
if not hasattr(self, "_honcho_configs"):
self._honcho_configs = {}
if session_key in self._honcho_managers:
return self._honcho_managers[session_key], self._honcho_configs.get(session_key)
try:
from honcho_integration.client import HonchoClientConfig, get_honcho_client
from honcho_integration.session import HonchoSessionManager
hcfg = HonchoClientConfig.from_global_config()
if not hcfg.enabled or not (hcfg.api_key or hcfg.base_url):
return None, hcfg
client = get_honcho_client(hcfg)
manager = HonchoSessionManager(
honcho=client,
config=hcfg,
context_tokens=hcfg.context_tokens,
)
self._honcho_managers[session_key] = manager
self._honcho_configs[session_key] = hcfg
return manager, hcfg
except Exception as e:
logger.debug("Gateway Honcho init failed for %s: %s", session_key, e)
return None, None
def _shutdown_gateway_honcho(self, session_key: str) -> None:
"""Flush and close the persistent Honcho manager for a gateway session."""
managers = getattr(self, "_honcho_managers", None)
configs = getattr(self, "_honcho_configs", None)
if managers is None or configs is None:
return
manager = managers.pop(session_key, None)
configs.pop(session_key, None)
if not manager:
return
try:
manager.shutdown()
except Exception as e:
logger.debug("Gateway Honcho shutdown failed for %s: %s", session_key, e)
def _shutdown_all_gateway_honcho(self) -> None:
"""Flush and close all persistent Honcho managers."""
managers = getattr(self, "_honcho_managers", None)
if not managers:
return
for session_key in list(managers.keys()):
self._shutdown_gateway_honcho(session_key)
# -- Setup skill availability ----------------------------------------
def _has_setup_skill(self) -> bool:
@@ -627,7 +573,6 @@ class GatewayRunner:
def _flush_memories_for_session(
self,
old_session_id: str,
honcho_session_key: Optional[str] = None,
):
"""Prompt the agent to save memories/skills before context is lost.
@@ -660,9 +605,9 @@ class GatewayRunner:
model=model,
max_iterations=8,
quiet_mode=True,
skip_memory=True, # Flush agent — no memory provider
enabled_toolsets=["memory", "skills"],
session_id=old_session_id,
honcho_session_key=honcho_session_key,
)
# Fully silence the flush agent — quiet_mode only suppresses init
# messages; tool call output still leaks to the terminal through
@@ -725,22 +670,14 @@ class GatewayRunner:
tmp_agent.run_conversation(
user_message=flush_prompt,
conversation_history=msgs,
sync_honcho=False,
)
logger.info("Pre-reset memory flush completed for session %s", old_session_id)
# Flush any queued Honcho writes before the session is dropped
if getattr(tmp_agent, '_honcho', None):
try:
tmp_agent._honcho.shutdown()
except Exception:
pass
except Exception as e:
logger.debug("Pre-reset memory flush failed for session %s: %s", old_session_id, e)
async def _async_flush_memories(
self,
old_session_id: str,
honcho_session_key: Optional[str] = None,
):
"""Run the sync memory flush in a thread pool so it won't block the event loop."""
loop = asyncio.get_event_loop()
@@ -748,7 +685,6 @@ class GatewayRunner:
None,
self._flush_memories_for_session,
old_session_id,
honcho_session_key,
)
@property
@@ -1291,7 +1227,14 @@ class GatewayRunner:
)
try:
await self._async_flush_memories(entry.session_id, key)
self._shutdown_gateway_honcho(key)
# Shut down memory provider on the cached agent
cached_agent = self._running_agents.get(key)
if cached_agent and cached_agent is not _AGENT_PENDING_SENTINEL:
try:
if hasattr(cached_agent, 'shutdown_memory_provider'):
cached_agent.shutdown_memory_provider()
except Exception:
pass
# Mark as flushed and persist to disk so the flag
# survives gateway restarts.
with self.session_store._lock:
@@ -1425,6 +1368,12 @@ class GatewayRunner:
logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
except Exception as e:
logger.debug("Failed interrupting agent during shutdown: %s", e)
# Shut down memory provider at actual session boundary
try:
if hasattr(agent, 'shutdown_memory_provider'):
agent.shutdown_memory_provider()
except Exception:
pass
for platform, adapter in list(self.adapters.items()):
try:
@@ -1446,7 +1395,6 @@ class GatewayRunner:
self._running_agents.clear()
self._pending_messages.clear()
self._pending_approvals.clear()
self._shutdown_all_gateway_honcho()
self._shutdown_event.set()
from gateway.status import remove_pid_file, write_runtime_status
@@ -2449,7 +2397,8 @@ class GatewayRunner:
)
# One-time prompt if no home channel is set for this platform
# Skip for webhooks - they deliver directly to configured targets (github_comment, etc.)
if not history and source.platform and source.platform != Platform.LOCAL and source.platform != Platform.WEBHOOK:
platform_name = source.platform.value
env_key = f"{platform_name.upper()}_HOME_CHANNEL"
if not os.getenv(env_key):
@@ -2721,27 +2670,12 @@ class GatewayRunner:
except Exception as e:
logger.error("Process watcher setup error: %s", e)
# NOTE: Dangerous command approvals are now handled inline by the
# blocking gateway approval mechanism in tools/approval.py. The agent
# thread blocks until the user responds with /approve or /deny, so by
# the time we reach here the approval has already been resolved. The
# old post-loop pop_pending + approval_hint code was removed in favour
# of the blocking approach that mirrors CLI's synchronous input().
# Save the full conversation to the transcript, including tool calls.
# This preserves the complete agent loop (tool_calls, tool results,
@@ -2819,20 +2753,12 @@ class GatewayRunner:
skip_db=agent_persisted,
)
# Update session with actual prompt token count and model from the agent
# Token counts and model are now persisted by the agent directly.
# Keep only last_prompt_tokens here for context-window tracking and
# compression decisions.
self.session_store.update_session(
session_entry.session_key,
input_tokens=agent_result.get("input_tokens", 0),
output_tokens=agent_result.get("output_tokens", 0),
cache_read_tokens=agent_result.get("cache_read_tokens", 0),
cache_write_tokens=agent_result.get("cache_write_tokens", 0),
last_prompt_tokens=agent_result.get("last_prompt_tokens", 0),
model=agent_result.get("model"),
estimated_cost_usd=agent_result.get("estimated_cost_usd"),
cost_status=agent_result.get("cost_status"),
cost_source=agent_result.get("cost_source"),
provider=agent_result.get("provider"),
base_url=agent_result.get("base_url"),
)
# Auto voice reply: send TTS audio before the text response
@@ -3014,8 +2940,6 @@ class GatewayRunner:
_flush_task.add_done_callback(self._background_tasks.discard)
except Exception as e:
logger.debug("Gateway memory flush on reset failed: %s", e)
self._shutdown_gateway_honcho(session_key)
self._evict_cached_agent(session_key)
# Reset the session
@@ -4166,7 +4090,6 @@ class GatewayRunner:
user_message=btw_prompt,
conversation_history=history_snapshot,
task_id=task_id,
sync_honcho=False,
)
loop = asyncio.get_event_loop()
@@ -4548,8 +4471,6 @@ class GatewayRunner:
except Exception as e:
logger.debug("Memory flush on resume failed: %s", e)
self._shutdown_gateway_honcho(session_key)
# Clear any running agent for this session key
if session_key in self._running_agents:
del self._running_agents[session_key]
@@ -4730,123 +4651,93 @@ class GatewayRunner:
_APPROVAL_TIMEOUT_SECONDS = 300 # 5 minutes
async def _handle_approve_command(self, event: MessageEvent) -> Optional[str]:
"""Handle /approve command — execute a pending dangerous command.
"""Handle /approve command — unblock waiting agent thread(s).
After execution, re-invokes the agent with the command result so it
can continue its multi-step task (fixes the "dead agent" bug where
the agent loop exited on approval_required and never resumed).
The agent thread(s) are blocked inside tools/approval.py waiting for
the user to respond. This handler signals the event so the agent
resumes and the terminal_tool executes the command inline — the same
flow as the CLI's synchronous input() approval.
Supports multiple concurrent approvals (parallel subagents,
execute_code). ``/approve`` resolves the oldest pending command;
``/approve all`` resolves every pending command at once.
Usage:
/approve — approve and execute the pending command
/approve session — approve and remember for this session
/approve always — approve this pattern permanently
/approve — approve oldest pending command once
/approve all — approve ALL pending commands at once
/approve session — approve oldest + remember for session
/approve all session — approve all + remember for session
/approve always — approve oldest + remember permanently
/approve all always — approve all + remember permanently
"""
source = event.source
session_key = self._session_key_for_source(source)
if session_key not in self._pending_approvals:
from tools.approval import (
resolve_gateway_approval, has_blocking_approval,
pending_approval_count,
)
if not has_blocking_approval(session_key):
if session_key in self._pending_approvals:
self._pending_approvals.pop(session_key)
return "⚠️ Approval expired (agent is no longer waiting). Ask the agent to try again."
return "No pending command to approve."
import time as _time
approval = self._pending_approvals[session_key]
# Parse args: support "all", "all session", "all always", "session", "always"
args = event.get_command_args().strip().lower().split()
resolve_all = "all" in args
remaining = [a for a in args if a != "all"]
# Check for timeout
ts = approval.get("timestamp", 0)
if _time.time() - ts > self._APPROVAL_TIMEOUT_SECONDS:
self._pending_approvals.pop(session_key, None)
return "⚠️ Approval expired (timed out after 5 minutes). Ask the agent to try again."
self._pending_approvals.pop(session_key)
cmd = approval["command"]
pattern_keys = approval.get("pattern_keys", [])
if not pattern_keys:
pk = approval.get("pattern_key", "")
pattern_keys = [pk] if pk else []
# Determine approval scope from args
args = event.get_command_args().strip().lower()
from tools.approval import approve_session, approve_permanent
if args in ("always", "permanent", "permanently"):
for pk in pattern_keys:
approve_permanent(pk)
if any(a in ("always", "permanent", "permanently") for a in remaining):
choice = "always"
scope_msg = " (pattern approved permanently)"
elif args in ("session", "ses"):
for pk in pattern_keys:
approve_session(session_key, pk)
elif any(a in ("session", "ses") for a in remaining):
choice = "session"
scope_msg = " (pattern approved for this session)"
else:
# One-time approval — just approve for session so the immediate
# replay works, but don't advertise it as session-wide
for pk in pattern_keys:
approve_session(session_key, pk)
choice = "once"
scope_msg = ""
logger.info("User approved dangerous command via /approve: %s...%s", cmd[:60], scope_msg)
from tools.terminal_tool import terminal_tool
result = await asyncio.to_thread(terminal_tool, command=cmd, force=True)
count = resolve_gateway_approval(session_key, choice, resolve_all=resolve_all)
if not count:
return "No pending command to approve."
# Send immediate feedback so the user sees the command output right away
immediate_msg = f"✅ Command approved and executed{scope_msg}.\n\n```\n{result[:3500]}\n```"
adapter = self.adapters.get(source.platform)
if adapter:
try:
await adapter.send(source.chat_id, immediate_msg)
except Exception as e:
logger.warning("Failed to send approval feedback: %s", e)
# Re-invoke the agent with the command result so it can continue its task.
# The agent's conversation history (persisted in SQLite) already contains
# the tool call that returned approval_required — the continuation message
# provides the actual execution output so the agent can pick up where it
# left off.
continuation_text = (
f"[System: The user approved the previously blocked command and it has been executed.\n"
f"Command: {cmd}\n"
f"<command_output>\n{result[:3500]}\n</command_output>\n\n"
f"Continue with the task you were working on.]"
)
synthetic_event = MessageEvent(
text=continuation_text,
source=source,
message_id=f"approve-continuation-{uuid.uuid4().hex}",
)
async def _continue_agent():
try:
response = await self._handle_message(synthetic_event)
if response and adapter:
await adapter.send(source.chat_id, response)
except Exception as e:
logger.error("Failed to continue agent after /approve: %s", e)
if adapter:
try:
await adapter.send(
source.chat_id,
f"⚠️ Failed to resume agent after approval: {e}"
)
except Exception:
pass
_task = asyncio.create_task(_continue_agent())
self._background_tasks.add(_task)
_task.add_done_callback(self._background_tasks.discard)
# Return None — we already sent the immediate feedback and the agent
# continuation is running in the background.
return None
count_msg = f" ({count} commands)" if count > 1 else ""
logger.info("User approved %d dangerous command(s) via /approve%s", count, scope_msg)
return f"✅ Command{'s' if count > 1 else ''} approved{scope_msg}{count_msg}. The agent is resuming..."
async def _handle_deny_command(self, event: MessageEvent) -> str:
"""Handle /deny command — reject a pending dangerous command."""
"""Handle /deny command — reject pending dangerous command(s).
Signals blocked agent thread(s) with a 'deny' result so they receive
a definitive BLOCKED message, same as the CLI deny flow.
``/deny`` denies the oldest; ``/deny all`` denies everything.
"""
source = event.source
session_key = self._session_key_for_source(source)
if session_key not in self._pending_approvals:
from tools.approval import (
resolve_gateway_approval, has_blocking_approval,
)
if not has_blocking_approval(session_key):
if session_key in self._pending_approvals:
self._pending_approvals.pop(session_key)
return "❌ Command denied (approval was stale)."
return "No pending command to deny."
self._pending_approvals.pop(session_key)
logger.info("User denied dangerous command via /deny")
return "❌ Command denied."
args = event.get_command_args().strip().lower()
resolve_all = "all" in args
count = resolve_gateway_approval(session_key, "deny", resolve_all=resolve_all)
if not count:
return "No pending command to deny."
count_msg = f" ({count} commands)" if count > 1 else ""
logger.info("User denied %d dangerous command(s) via /deny", count)
return f"❌ Command{'s' if count > 1 else ''} denied{count_msg}."
async def _handle_update_command(self, event: MessageEvent) -> str:
"""Handle /update command — update Hermes Agent to the latest version.
@@ -5409,7 +5300,10 @@ class GatewayRunner:
or os.getenv("HERMES_TOOL_PROGRESS_MODE")
or "all"
)
# Disable tool progress for webhooks - they don't support message editing,
# so each progress line would be sent as a separate message.
from gateway.config import Platform
tool_progress_enabled = progress_mode != "off" and source.platform != Platform.WEBHOOK
# Queue for progress messages (thread-safe)
progress_queue = queue.Queue() if tool_progress_enabled else None
@@ -5648,7 +5542,6 @@ class GatewayRunner:
}
pr = self._provider_routing
honcho_manager, honcho_config = self._get_or_create_gateway_honcho(session_key)
reasoning_config = self._load_reasoning_config()
self._reasoning_config = reasoning_config
# Set up streaming consumer if enabled
@@ -5721,9 +5614,6 @@ class GatewayRunner:
provider_data_collection=pr.get("data_collection"),
session_id=session_id,
platform=platform_key,
honcho_session_key=session_key,
honcho_manager=honcho_manager,
honcho_config=honcho_config,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
@@ -5829,7 +5719,42 @@ class GatewayRunner:
if _p:
_history_media_paths.add(_p)
# Register per-session gateway approval callback so dangerous
# command approval blocks the agent thread (mirrors CLI input()).
# The callback bridges sync→async to send the approval request
# to the user immediately.
from tools.approval import register_gateway_notify, unregister_gateway_notify
def _approval_notify_sync(approval_data: dict) -> None:
"""Send the approval request to the user from the agent thread."""
cmd = approval_data.get("command", "")
cmd_preview = cmd[:200] + "..." if len(cmd) > 200 else cmd
desc = approval_data.get("description", "dangerous command")
msg = (
f"⚠️ **Dangerous command requires approval:**\n"
f"```\n{cmd_preview}\n```\n"
f"Reason: {desc}\n\n"
f"Reply `/approve` to execute, `/approve session` to approve this pattern "
f"for the session, `/approve always` to approve permanently, or `/deny` to cancel."
)
try:
asyncio.run_coroutine_threadsafe(
_status_adapter.send(
_status_chat_id,
msg,
metadata=_status_thread_metadata,
),
_loop_for_step,
).result(timeout=15)
except Exception as _e:
logger.error("Failed to send approval request: %s", _e)
_approval_session_key = session_key or ""
register_gateway_notify(_approval_session_key, _approval_notify_sync)
try:
result = agent.run_conversation(message, conversation_history=agent_history, task_id=session_id)
finally:
unregister_gateway_notify(_approval_session_key)
result_holder[0] = result
# Signal the stream consumer that the agent is done


@@ -778,66 +778,18 @@ class SessionStore:
def update_session(
self,
session_key: str,
input_tokens: int = 0,
output_tokens: int = 0,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
last_prompt_tokens: int = None,
model: str = None,
estimated_cost_usd: Optional[float] = None,
cost_status: Optional[str] = None,
cost_source: Optional[str] = None,
provider: Optional[str] = None,
base_url: Optional[str] = None,
) -> None:
"""Update a session's metadata after an interaction."""
db_session_id = None
"""Update lightweight session metadata after an interaction."""
with self._lock:
self._ensure_loaded_locked()
if session_key in self._entries:
entry = self._entries[session_key]
entry.updated_at = _now()
# Direct assignment — the gateway receives cumulative totals
# from the cached agent, not per-call deltas.
entry.input_tokens = input_tokens
entry.output_tokens = output_tokens
entry.cache_read_tokens = cache_read_tokens
entry.cache_write_tokens = cache_write_tokens
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
if estimated_cost_usd is not None:
entry.estimated_cost_usd = estimated_cost_usd
if cost_status:
entry.cost_status = cost_status
entry.total_tokens = (
entry.input_tokens
+ entry.output_tokens
+ entry.cache_read_tokens
+ entry.cache_write_tokens
)
self._save()
db_session_id = entry.session_id
if self._db and db_session_id:
try:
self._db.set_token_counts(
db_session_id,
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
estimated_cost_usd=estimated_cost_usd,
cost_status=cost_status,
cost_source=cost_source,
billing_provider=provider,
billing_base_url=base_url,
model=model,
absolute=True,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
def reset_session(self, session_key: str) -> Optional[SessionEntry]:
"""Force reset a session, creating a new session ID."""


@@ -200,6 +200,10 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
id="opencode-go",
name="OpenCode Go",
auth_type="api_key",
# OpenCode Go mixes API surfaces by model:
# - GLM / Kimi use OpenAI-compatible chat completions under /v1
# - MiniMax models use Anthropic Messages under /v1/messages
# Keep the provider base at /v1 and select api_mode per-model.
inference_base_url="https://opencode.ai/zen/go/v1",
api_key_env_vars=("OPENCODE_GO_API_KEY",),
base_url_env_var="OPENCODE_GO_BASE_URL",
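opencode_model_api_mode() lives in hermes_cli.models and is not part of this diff; based on the comment above, a plausible sketch (mode strings and exact matching are assumptions):

# Assumed sketch of per-model API mode selection for OpenCode Go.
def opencode_model_api_mode(provider_id: str, model: str) -> str:
    m = (model or "").lower()
    if provider_id == "opencode-go" and "minimax" in m:
        return "anthropic"  # MiniMax speaks Anthropic Messages under /v1/messages
    return "openai"  # GLM / Kimi use OpenAI-compatible chat completions under /v1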


@@ -428,6 +428,11 @@ DEFAULT_CONFIG = {
"user_profile_enabled": True,
"memory_char_limit": 2200, # ~800 tokens at 2.75 chars/token
"user_char_limit": 1375, # ~500 tokens at 2.75 chars/token
# External memory provider plugin (empty = built-in only).
# Set to a provider name to activate: "openviking", "mem0",
# "hindsight", "holographic", "retaindb", "byterover".
# Only ONE external provider is allowed at a time.
"provider": "",
},
# Subagent delegation — override the provider:model used by delegate_task
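The provider key in the memory block above can also be flipped programmatically; the cmd_memory("off") handler later in this diff does the inverse. A sketch of activating a provider, using "honcho" as a sample value:

# Activating an external memory provider in config.yaml (sample value).
from hermes_cli.config import load_config, save_config

config = load_config()
if not isinstance(config.get("memory"), dict):
    config["memory"] = {}
config["memory"]["provider"] = "honcho"  # one of the names listed above
save_config(config)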


@@ -55,7 +55,7 @@ def _has_provider_env_config(content: str) -> bool:
def _honcho_is_configured_for_doctor() -> bool:
"""Return True when Honcho is configured, even if this process has no active session."""
try:
from plugins.memory.honcho.client import HonchoClientConfig
cfg = HonchoClientConfig.from_global_config()
return bool(cfg.enabled and (cfg.api_key or cfg.base_url))
@@ -709,19 +709,19 @@ def run_doctor(args):
print(color("◆ Honcho Memory", Colors.CYAN, Colors.BOLD))
try:
from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
hcfg = HonchoClientConfig.from_global_config()
_honcho_cfg_path = resolve_config_path()
if not _honcho_cfg_path.exists():
check_warn("Honcho config not found", "run: hermes memory setup")
elif not hcfg.enabled:
check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
elif not (hcfg.api_key or hcfg.base_url):
check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
issues.append("No Honcho API key — run 'hermes memory setup'")
else:
from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
reset_honcho_client()
try:
get_honcho_client(hcfg)


@@ -1645,81 +1645,8 @@ def _model_flow_named_custom(config, provider_info):
print(f" Provider: {name} ({base_url})")
# Curated model lists for direct API-key providers — single source in models.py
from hermes_cli.models import _PROVIDER_MODELS
def _current_reasoning_effort(config) -> str:
@@ -2188,12 +2115,13 @@ def _model_flow_kimi(config, current_model=""):
def _model_flow_api_key_provider(config, provider_id, current_model=""):
"""Generic flow for API-key providers (z.ai, MiniMax)."""
"""Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
from hermes_cli.auth import (
PROVIDER_REGISTRY, _prompt_model_selection, _save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models, opencode_model_api_mode, normalize_opencode_model_id
pconfig = PROVIDER_REGISTRY[provider_id]
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
@@ -2247,7 +2175,6 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
# Curated list is substantial — use it directly, skip live probe
live_models = None
else:
from hermes_cli.models import fetch_api_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
live_models = fetch_api_models(api_key_for_probe, effective_base)
@@ -2260,6 +2187,11 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
# else: no defaults either, will fall through to raw input
if provider_id in {"opencode-zen", "opencode-go"}:
model_list = [normalize_opencode_model_id(provider_id, mid) for mid in model_list]
current_model = normalize_opencode_model_id(provider_id, current_model)
model_list = list(dict.fromkeys(mid for mid in model_list if mid))
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
@@ -2269,9 +2201,12 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
selected = None
if selected:
if provider_id in {"opencode-zen", "opencode-go"}:
selected = normalize_opencode_model_id(provider_id, selected)
_save_model_choice(selected)
# Update config with provider and base URL
# Update config with provider, base URL, and provider-specific API mode
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
@@ -2279,7 +2214,10 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = effective_base
model.pop("api_mode", None) # let runtime auto-detect from URL
if provider_id in {"opencode-zen", "opencode-go"}:
model["api_mode"] = opencode_model_api_mode(provider_id, selected)
else:
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
@@ -3266,6 +3204,15 @@ def cmd_update(args):
except Exception:
pass # profiles module not available or no profiles
# Sync Honcho host blocks to all profiles
try:
from plugins.memory.honcho.cli import sync_honcho_profiles_quiet
synced = sync_honcho_profiles_quiet()
if synced:
print(f"\n-> Honcho: synced {synced} profile(s)")
except Exception:
pass # honcho plugin not installed or not configured
# Check for config migrations
print()
print("→ Checking configuration for new options...")
@@ -3608,6 +3555,15 @@ def cmd_profile(args):
else:
print(f"Cloned config, .env, SOUL.md from {source_label}.")
# Auto-clone Honcho config for the new profile (only with --clone/--clone-all)
if clone or clone_all:
try:
from plugins.memory.honcho.cli import clone_honcho_for_profile
if clone_honcho_for_profile(name):
print(f"Honcho config cloned (peer: {name})")
except Exception:
pass # Honcho plugin not installed or not configured
# Seed bundled skills (skip if --clone-all already copied them)
if not clone_all:
result = seed_profile_skills(profile_dir)
@@ -4494,27 +4450,30 @@ For more help on a command:
plugins_parser.set_defaults(func=cmd_plugins)
# =========================================================================
# honcho command — Honcho-specific config (peer, mode, tokens, profiles)
# Provider selection happens via 'hermes memory setup'.
# =========================================================================
honcho_parser = subparsers.add_parser(
"honcho",
help="Manage Honcho AI memory integration",
help="Manage Honcho memory provider config (peer, mode, profiles)",
description=(
"Honcho is a memory layer that persists across sessions.\n\n"
"Each conversation is stored as a peer interaction in a workspace. "
"Honcho builds a representation of the user over time — conclusions, "
"patterns, context — and surfaces the relevant slice at the start of "
"each turn so Hermes knows who you are without you having to repeat yourself.\n\n"
"Modes: hybrid (Honcho + local MEMORY.md), honcho (Honcho only), "
"local (MEMORY.md only). Write frequency is configurable so memory "
"writes never block the response."
"Configure Honcho-specific settings. Honcho is now a memory provider\n"
"plugin — initial setup is via 'hermes memory setup'. These commands\n"
"manage Honcho's own config: peer names, memory mode, token budgets,\n"
"per-profile host blocks, and cross-profile observability."
),
formatter_class=__import__("argparse").RawDescriptionHelpFormatter,
)
honcho_parser.add_argument(
"--target-profile", metavar="NAME", dest="target_profile",
help="Target a specific profile's Honcho config without switching",
)
honcho_subparsers = honcho_parser.add_subparsers(dest="honcho_command")
honcho_subparsers.add_parser("setup", help="Interactive setup wizard for Honcho integration")
honcho_subparsers.add_parser("status", help="Show current Honcho config and connection status")
honcho_subparsers.add_parser("setup", help="Initial Honcho setup (redirects to hermes memory setup)")
honcho_status = honcho_subparsers.add_parser("status", help="Show current Honcho config and connection status")
honcho_status.add_argument("--all", action="store_true", help="Show config overview across all profiles")
honcho_subparsers.add_parser("peers", help="Show peer identities across all profiles")
honcho_subparsers.add_parser("sessions", help="List known Honcho session mappings")
honcho_map = honcho_subparsers.add_parser(
@@ -4574,13 +4533,60 @@ For more help on a command:
"migrate",
help="Step-by-step migration guide from openclaw-honcho to Hermes Honcho",
)
honcho_subparsers.add_parser("enable", help="Enable Honcho for the active profile")
honcho_subparsers.add_parser("disable", help="Disable Honcho for the active profile")
honcho_subparsers.add_parser("sync", help="Sync Honcho config to all existing profiles")
def cmd_honcho(args):
sub = getattr(args, "honcho_command", None)
if sub == "setup":
# Redirect to the generic memory setup
print("\n Honcho is now configured via the memory provider system.")
print(" Running 'hermes memory setup'...\n")
from hermes_cli.memory_setup import memory_command
memory_command(args)
return
from plugins.memory.honcho.cli import honcho_command
honcho_command(args)
honcho_parser.set_defaults(func=cmd_honcho)
# =========================================================================
# memory command
# =========================================================================
memory_parser = subparsers.add_parser(
"memory",
help="Configure external memory provider",
description=(
"Set up and manage external memory provider plugins.\n\n"
"Available providers: honcho, openviking, mem0, hindsight,\n"
"holographic, retaindb, byterover.\n\n"
"Only one external provider can be active at a time.\n"
"Built-in memory (MEMORY.md/USER.md) is always active."
),
)
memory_sub = memory_parser.add_subparsers(dest="memory_command")
memory_sub.add_parser("setup", help="Interactive provider selection and configuration")
memory_sub.add_parser("status", help="Show current memory provider config")
memory_off_p = memory_sub.add_parser("off", help="Disable external provider (built-in only)")
def cmd_memory(args):
sub = getattr(args, "memory_command", None)
if sub == "off":
from hermes_cli.config import load_config, save_config
config = load_config()
if not isinstance(config.get("memory"), dict):
config["memory"] = {}
config["memory"]["provider"] = ""
save_config(config)
print("\n ✓ Memory provider: built-in only")
print(" Saved to config.yaml\n")
else:
from hermes_cli.memory_setup import memory_command
memory_command(args)
memory_parser.set_defaults(func=cmd_memory)
# =========================================================================
# tools command
# =========================================================================

hermes_cli/memory_setup.py

@@ -0,0 +1,451 @@
"""hermes memory setup|status — configure memory provider plugins.
Auto-detects installed memory providers via the plugin system.
Interactive curses-based UI for provider selection, then walks through
the provider's config schema. Writes config to config.yaml + .env.
"""
from __future__ import annotations
import getpass
import os
import sys
from pathlib import Path
# ---------------------------------------------------------------------------
# Curses-based interactive picker (same pattern as hermes tools)
# ---------------------------------------------------------------------------
def _curses_select(title: str, items: list[tuple[str, str]], default: int = 0) -> int:
"""Interactive single-select with arrow keys.
items: list of (label, description) tuples.
Returns selected index, or default on escape/quit.
"""
try:
import curses
result = [default]
def _menu(stdscr):
curses.curs_set(0)
if curses.has_colors():
curses.start_color()
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, curses.COLOR_CYAN, -1)
cursor = default
while True:
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
# Title
try:
stdscr.addnstr(0, 0, title, max_x - 1,
curses.A_BOLD | (curses.color_pair(2) if curses.has_colors() else 0))
stdscr.addnstr(1, 0, " ↑↓ navigate ⏎ select q quit", max_x - 1,
curses.color_pair(3) if curses.has_colors() else curses.A_DIM)
except curses.error:
pass
for i, (label, desc) in enumerate(items):
y = i + 3
if y >= max_y - 1:
break
arrow = "" if i == cursor else " "
line = f" {arrow} {label}"
if desc:
line += f" {desc}"
attr = curses.A_NORMAL
if i == cursor:
attr = curses.A_BOLD
if curses.has_colors():
attr |= curses.color_pair(1)
try:
stdscr.addnstr(y, 0, line[:max_x - 1], max_x - 1, attr)
except curses.error:
pass
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord('k')):
cursor = (cursor - 1) % len(items)
elif key in (curses.KEY_DOWN, ord('j')):
cursor = (cursor + 1) % len(items)
elif key in (curses.KEY_ENTER, 10, 13):
result[0] = cursor
return
elif key in (27, ord('q')):
return
curses.wrapper(_menu)
return result[0]
except Exception:
# Fallback: numbered input
print(f"\n {title}\n")
for i, (label, desc) in enumerate(items):
marker = "" if i == default else " "
d = f" {desc}" if desc else ""
print(f" {marker} {i + 1}. {label}{d}")
while True:
try:
val = input(f"\n Select [1-{len(items)}] ({default + 1}): ")
if not val:
return default
idx = int(val) - 1
if 0 <= idx < len(items):
return idx
except (ValueError, EOFError):
return default
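Usage is a single call; the numbered-input fallback means the picker also works without curses or a TTY:

# Example call: items are (label, description) pairs; returns the index.
choice = _curses_select(
    "Memory provider setup",
    [("honcho", "requires API key"), ("Built-in only", "default")],
    default=1,
)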
def _prompt(label: str, default: str | None = None, secret: bool = False) -> str:
"""Prompt for a value with optional default and secret masking."""
suffix = f" [{default}]" if default else ""
if secret:
sys.stdout.write(f" {label}{suffix}: ")
sys.stdout.flush()
if sys.stdin.isatty():
val = getpass.getpass(prompt="")
else:
val = sys.stdin.readline().strip()
else:
sys.stdout.write(f" {label}{suffix}: ")
sys.stdout.flush()
val = sys.stdin.readline().strip()
return val or (default or "")
# ---------------------------------------------------------------------------
# Provider discovery
# ---------------------------------------------------------------------------
def _install_dependencies(provider_name: str) -> None:
"""Install pip dependencies declared in plugin.yaml."""
import subprocess
from pathlib import Path as _Path
plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name
yaml_path = plugin_dir / "plugin.yaml"
if not yaml_path.exists():
return
try:
import yaml
with open(yaml_path) as f:
meta = yaml.safe_load(f) or {}
except Exception:
return
pip_deps = meta.get("pip_dependencies", [])
if not pip_deps:
return
# pip name → import name mapping for packages where they differ
_IMPORT_NAMES = {
"honcho-ai": "honcho",
"mem0ai": "mem0",
"hindsight-client": "hindsight_client",
}
# Check which packages are missing
missing = []
for dep in pip_deps:
import_name = _IMPORT_NAMES.get(dep, dep.replace("-", "_").split("[")[0])
try:
__import__(import_name)
except ImportError:
missing.append(dep)
if not missing:
return
print(f"\n Installing dependencies: {', '.join(missing)}")
try:
subprocess.run(
[sys.executable, "-m", "pip", "install", "--quiet"] + missing,
check=True, timeout=120,
capture_output=True,
)
print(f" ✓ Installed {', '.join(missing)}")
except subprocess.CalledProcessError as e:
print(f" ⚠ Failed to install {', '.join(missing)}")
stderr = (e.stderr or b"").decode()[:200]
if stderr:
print(f" {stderr}")
print(f" Run manually: pip install {' '.join(missing)}")
except Exception as e:
print(f" ⚠ Install failed: {e}")
print(f" Run manually: pip install {' '.join(missing)}")
# Also show external dependencies (non-pip) if any
ext_deps = meta.get("external_dependencies", [])
for dep in ext_deps:
dep_name = dep.get("name", "")
check_cmd = dep.get("check", "")
install_cmd = dep.get("install", "")
if check_cmd:
try:
subprocess.run(
check_cmd, shell=True, capture_output=True, timeout=5
)
except Exception:
if install_cmd:
print(f"\n'{dep_name}' not found. Install with:")
print(f" {install_cmd}")
def _get_available_providers() -> list:
"""Discover memory providers from plugins/memory/.
Returns list of (name, setup_hint, provider_instance) tuples; the plugin
description is replaced with a short setup hint.
"""
try:
from plugins.memory import discover_memory_providers, load_memory_provider
raw = discover_memory_providers()
except Exception:
raw = []
results = []
for name, desc, available in raw:
try:
provider = load_memory_provider(name)
if not provider:
continue
except Exception:
continue
# Override description with setup hint
schema = provider.get_config_schema() if hasattr(provider, "get_config_schema") else []
has_secrets = any(f.get("secret") for f in schema)
if has_secrets:
setup_hint = "requires API key"
elif not schema:
setup_hint = "no setup needed"
else:
setup_hint = "local"
results.append((name, setup_hint, provider))
return results
# ---------------------------------------------------------------------------
# Setup wizard
# ---------------------------------------------------------------------------
def cmd_setup(args) -> None:
"""Interactive memory provider setup wizard."""
from hermes_cli.config import load_config, save_config
providers = _get_available_providers()
if not providers:
print("\n No memory provider plugins detected.")
print(" Install a plugin to ~/.hermes/plugins/ and try again.\n")
return
# Build picker items
items = []
for name, desc, _ in providers:
items.append((name, desc))
items.append(("Built-in only", "— MEMORY.md / USER.md (default)"))
builtin_idx = len(items) - 1
selected = _curses_select("Memory provider setup", items, default=builtin_idx)
config = load_config()
if not isinstance(config.get("memory"), dict):
config["memory"] = {}
# Built-in only
if selected >= len(providers) or selected < 0:
config["memory"]["provider"] = ""
save_config(config)
print("\n ✓ Memory provider: built-in only")
print(" Saved to config.yaml\n")
return
name, _, provider = providers[selected]
# Install pip dependencies if declared in plugin.yaml
_install_dependencies(name)
schema = provider.get_config_schema() if hasattr(provider, "get_config_schema") else []
# Provider config section
provider_config = config["memory"].get(name, {})
if not isinstance(provider_config, dict):
provider_config = {}
env_path = Path(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))) / ".env"
env_writes = {}
if schema:
print(f"\n Configuring {name}:\n")
for field in schema:
key = field["key"]
desc = field.get("description", key)
default = field.get("default")
is_secret = field.get("secret", False)
choices = field.get("choices")
env_var = field.get("env_var")
url = field.get("url")
if choices and not is_secret:
# Use curses picker for choice fields
choice_items = [(c, "") for c in choices]
current = provider_config.get(key, default)
current_idx = 0
if current and current in choices:
current_idx = choices.index(current)
sel = _curses_select(f" {desc}", choice_items, default=current_idx)
provider_config[key] = choices[sel]
elif is_secret:
# Prompt for secret
existing = os.environ.get(env_var, "") if env_var else ""
if existing:
masked = f"...{existing[-4:]}" if len(existing) > 4 else "set"
val = _prompt(f"{desc} (current: {masked}, blank to keep)", secret=True)
else:
hint = f" Get yours at {url}" if url else ""
if hint:
print(hint)
val = _prompt(desc, secret=True)
if val and env_var:
env_writes[env_var] = val
else:
# Regular text prompt
current = provider_config.get(key)
effective_default = current or default
val = _prompt(desc, default=str(effective_default) if effective_default else None)
if val:
provider_config[key] = val
# Write activation key to config.yaml
config["memory"]["provider"] = name
save_config(config)
# Write non-secret config to provider's native location
hermes_home = str(Path(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))))
if provider_config and hasattr(provider, "save_config"):
try:
provider.save_config(provider_config, hermes_home)
except Exception as e:
print(f" ⚠ Failed to write provider config: {e}")
# Write secrets to .env
if env_writes:
_write_env_vars(env_path, env_writes)
print(f"\n ✓ Memory provider: {name}")
print(f" ✓ Activation saved to config.yaml")
if provider_config:
print(f" ✓ Provider config saved")
if env_writes:
print(f" ✓ API keys saved to .env")
print(f"\n Start a new session to activate.\n")
def _write_env_vars(env_path: Path, env_writes: dict) -> None:
"""Append or update env vars in .env file."""
env_path.parent.mkdir(parents=True, exist_ok=True)
existing_lines = []
if env_path.exists():
existing_lines = env_path.read_text().splitlines()
updated_keys = set()
new_lines = []
for line in existing_lines:
key_match = line.split("=", 1)[0].strip() if "=" in line else ""
if key_match in env_writes:
new_lines.append(f"{key_match}={env_writes[key_match]}")
updated_keys.add(key_match)
else:
new_lines.append(line)
for key, val in env_writes.items():
if key not in updated_keys:
new_lines.append(f"{key}={val}")
env_path.write_text("\n".join(new_lines) + "\n")
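# Illustrative behavior of the helper above: calling
#   _write_env_vars(env_path, {"BRV_API_KEY": "sk-new"})
# rewrites an existing BRV_API_KEY=... line in place, appends any keys not
# already present, and leaves every other line of .env untouched.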
# ---------------------------------------------------------------------------
# Status
# ---------------------------------------------------------------------------
def cmd_status(args) -> None:
"""Show current memory provider config."""
from hermes_cli.config import load_config
config = load_config()
mem_config = config.get("memory", {})
provider_name = mem_config.get("provider", "")
print(f"\nMemory status\n" + "" * 40)
print(f" Built-in: always active")
print(f" Provider: {provider_name or '(none — built-in only)'}")
if provider_name:
provider_config = mem_config.get(provider_name, {})
if provider_config:
print(f"\n {provider_name} config:")
for key, val in provider_config.items():
print(f" {key}: {val}")
providers = _get_available_providers()
found = any(name == provider_name for name, _, _ in providers)
if found:
print(f"\n Plugin: installed ✓")
for pname, _, p in providers:
if pname == provider_name:
if p.is_available():
print(f" Status: available ✓")
else:
print(f" Status: not available ✗")
schema = p.get_config_schema() if hasattr(p, "get_config_schema") else []
secrets = [f for f in schema if f.get("secret")]
if secrets:
print(f" Missing:")
for s in secrets:
env_var = s.get("env_var", "")
url = s.get("url", "")
is_set = bool(os.environ.get(env_var))
mark = "" if is_set else ""
line = f" {mark} {env_var}"
if url and not is_set:
line += f"{url}"
print(line)
break
else:
print(f"\n Plugin: NOT installed ✗")
print(f" Install the '{provider_name}' memory plugin to ~/.hermes/plugins/")
providers = _get_available_providers()
if providers:
print(f"\n Installed plugins:")
for pname, desc, _ in providers:
active = " ← active" if pname == provider_name else ""
print(f"{pname} ({desc}){active}")
print()
# ---------------------------------------------------------------------------
# Router
# ---------------------------------------------------------------------------
def memory_command(args) -> None:
"""Route memory subcommands."""
sub = getattr(args, "memory_command", None)
if sub == "setup":
cmd_setup(args)
elif sub == "status":
cmd_status(args)
else:
cmd_status(args)

View File

@@ -26,6 +26,7 @@ class ModelSwitchResult:
provider_changed: bool = False
api_key: str = ""
base_url: str = ""
api_mode: str = ""
persist: bool = False
error_message: str = ""
warning_message: str = ""
@@ -73,6 +74,7 @@ def switch_model(
detect_provider_for_model,
validate_requested_model,
_PROVIDER_LABELS,
opencode_model_api_mode,
)
from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -98,11 +100,13 @@ def switch_model(
# Step 4: Resolve credentials for target provider
api_key = current_api_key
base_url = current_base_url
api_mode = ""
if provider_changed:
try:
runtime = resolve_runtime_provider(requested=target_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
except Exception as e:
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
if target_provider == "custom":
@@ -130,6 +134,7 @@ def switch_model(
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
except Exception:
pass
@@ -166,6 +171,12 @@ def switch_model(
and ("localhost" in (base_url or "") or "127.0.0.1" in (base_url or ""))
)
if target_provider in {"opencode-zen", "opencode-go"}:
# Recompute against the requested new model, not the currently-configured
# model used during runtime resolution. OpenCode mixes API surfaces by
# model family, so a same-provider model switch can change api_mode.
api_mode = opencode_model_api_mode(target_provider, new_model)
return ModelSwitchResult(
success=True,
new_model=new_model,
@@ -173,6 +184,7 @@ def switch_model(
provider_changed=provider_changed,
api_key=api_key,
base_url=base_url,
api_mode=api_mode,
persist=bool(validation.get("persist")),
warning_message=validation.get("message") or "",
is_custom_target=is_custom_target,

View File

@@ -125,6 +125,12 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
],
"moonshot": [
"kimi-k2.5",
"kimi-k2-thinking",
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
],
"minimax": [
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
@@ -948,6 +954,53 @@ def copilot_model_api_mode(
return "chat_completions"
def normalize_opencode_model_id(provider_id: Optional[str], model_id: Optional[str]) -> str:
"""Normalize OpenCode config IDs to the bare model slug used in API requests."""
provider = normalize_provider(provider_id)
current = str(model_id or "").strip()
if not current or provider not in {"opencode-zen", "opencode-go"}:
return current
prefix = f"{provider}/"
if current.lower().startswith(prefix):
return current[len(prefix):]
return current
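# Illustrative calls against the normalizer above (values assumed, not from a test suite):
#   normalize_opencode_model_id("opencode-zen", "opencode-zen/gpt-5.4")  -> "gpt-5.4"
#   normalize_opencode_model_id("opencode-go", "glm-5")                  -> "glm-5" (already bare)
#   normalize_opencode_model_id("openai", "openai/gpt-5")                -> "openai/gpt-5" (non-OpenCode passthrough)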
def opencode_model_api_mode(provider_id: Optional[str], model_id: Optional[str]) -> str:
"""Determine the API mode for an OpenCode Zen / Go model.
OpenCode routes different models behind different API surfaces:
- GPT-5 / Codex models on Zen use ``/v1/responses``
- Claude models on Zen use ``/v1/messages``
- MiniMax models on Go use ``/v1/messages``
- GLM / Kimi on Go use ``/v1/chat/completions``
- Other Zen models (Gemini, GLM, Kimi, MiniMax, Qwen, etc.) use
``/v1/chat/completions``
This follows the published OpenCode docs for Zen and Go endpoints.
"""
provider = normalize_provider(provider_id)
normalized = normalize_opencode_model_id(provider_id, model_id).lower()
if not normalized:
return "chat_completions"
if provider == "opencode-go":
if normalized.startswith("minimax-"):
return "anthropic_messages"
return "chat_completions"
if provider == "opencode-zen":
if normalized.startswith("claude-"):
return "anthropic_messages"
if normalized.startswith("gpt-"):
return "codex_responses"
return "chat_completions"
return "chat_completions"
def github_model_reasoning_efforts(
model_id: Optional[str],
*,

View File

@@ -82,9 +82,27 @@ def _get_model_config() -> Dict[str, Any]:
return {}
def _provider_supports_explicit_api_mode(provider: Optional[str], configured_provider: Optional[str] = None) -> bool:
"""Check whether a persisted api_mode should be honored for a given provider.
Prevents stale api_mode from a previous provider leaking into a
different one after a model/provider switch. Only applies the
persisted mode when the config's provider matches the runtime
provider (or when no configured provider is recorded).
"""
normalized_provider = (provider or "").strip().lower()
normalized_configured = (configured_provider or "").strip().lower()
if not normalized_configured:
return True
if normalized_provider == "custom":
return normalized_configured == "custom" or normalized_configured.startswith("custom:")
return normalized_configured == normalized_provider
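# How the guard behaves (illustrative inputs):
#   _provider_supports_explicit_api_mode("moonshot", "opencode-zen")  -> False (stale mode from a previous provider is ignored)
#   _provider_supports_explicit_api_mode("custom", "custom:myhost")   -> True  (custom variants count as custom)
#   _provider_supports_explicit_api_mode("openai", "")                -> True  (no provider recorded, honor the mode)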
def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
if configured_mode and _provider_supports_explicit_api_mode("copilot", configured_provider):
return configured_mode
model_name = str(model_cfg.get("default") or "").strip()
@@ -140,9 +158,13 @@ def _resolve_runtime_from_pool_entry(
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
elif base_url.rstrip("/").endswith("/anthropic"):
api_mode = "anthropic_messages"
@@ -666,10 +688,14 @@ def resolve_runtime_provider(
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
else:
# Check explicit api_mode from model config first
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Only honor persisted api_mode when it belongs to the same provider family.
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
elif base_url.rstrip("/").endswith("/anthropic"):

View File

@@ -114,6 +114,8 @@ _DEFAULT_PROVIDER_MODELS = {
"minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
"ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
"kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
"opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
"opencode-go": ["glm-5", "kimi-k2.5", "minimax-m2.5", "minimax-m2.7"],
"huggingface": [
"Qwen/Qwen3.5-397B-A17B", "Qwen/Qwen3-235B-A22B-Thinking-2507",
"Qwen/Qwen3-Coder-480B-A35B-Instruct", "deepseek-ai/DeepSeek-R1-0528",
@@ -189,6 +191,8 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
fetch_api_models,
fetch_github_model_catalog,
normalize_copilot_model_id,
normalize_opencode_model_id,
opencode_model_api_mode,
)
pconfig = PROVIDER_REGISTRY[provider_id]
@@ -242,6 +246,11 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
f" Use \"Custom model\" if the model you expect isn't listed."
)
if provider_id in {"opencode-zen", "opencode-go"}:
provider_models = [normalize_opencode_model_id(provider_id, mid) for mid in provider_models]
current_model = normalize_opencode_model_id(provider_id, current_model)
provider_models = list(dict.fromkeys(mid for mid in provider_models if mid))
model_choices = list(provider_models)
model_choices.append("Custom model")
model_choices.append(f"Keep current ({current_model})")
@@ -259,6 +268,8 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
catalog=catalog,
api_key=api_key,
) or selected_model
elif provider_id in {"opencode-zen", "opencode-go"}:
selected_model = normalize_opencode_model_id(provider_id, selected_model)
_set_default_model(config, selected_model)
elif model_idx == len(provider_models):
custom = prompt_fn("Enter model name")
@@ -269,6 +280,8 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
catalog=catalog,
api_key=api_key,
) or custom
elif provider_id in {"opencode-zen", "opencode-go"}:
selected_model = normalize_opencode_model_id(provider_id, custom)
else:
selected_model = custom
_set_default_model(config, selected_model)
@@ -300,6 +313,10 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
catalog=catalog,
api_key=api_key,
)
elif provider_id in {"opencode-zen", "opencode-go"} and selected_model:
model_cfg = _model_config_dict(config)
model_cfg["api_mode"] = opencode_model_api_mode(provider_id, selected_model)
config["model"] = model_cfg
def _sync_model_from_disk(config: Dict[str, Any]) -> None:

View File

@@ -30,6 +30,7 @@ PLATFORMS = {
"dingtalk": "💬 DingTalk",
"feishu": "🪽 Feishu",
"wecom": "💬 WeCom",
"webhook": "🔗 Webhook",
}
# ─── Config Helpers ───────────────────────────────────────────────────────────

View File

@@ -150,6 +150,7 @@ PLATFORMS = {
"wecom": {"label": "💬 WeCom", "default_toolset": "hermes-wecom"},
"api_server": {"label": "🌐 API Server", "default_toolset": "hermes-api-server"},
"mattermost": {"label": "💬 Mattermost", "default_toolset": "hermes-mattermost"},
"webhook": {"label": "🔗 Webhook", "default_toolset": "hermes-webhook"},
}

View File

@@ -1,9 +0,0 @@
"""Honcho integration for AI-native memory.
This package is only active when honcho.enabled=true in config and
HONCHO_API_KEY is set. All honcho-ai imports are deferred to avoid
ImportError when the package is not installed.
Named ``honcho_integration`` (not ``honcho``) to avoid shadowing the
``honcho`` package installed by the ``honcho-ai`` SDK.
"""

View File

@@ -156,7 +156,7 @@ def _discover_tools():
"tools.delegate_tool",
"tools.process_registry",
"tools.send_message_tool",
"tools.honcho_tools",
# "tools.honcho_tools", # Removed — Honcho is now a memory provider plugin
"tools.homeassistant_tool",
]
import importlib
@@ -371,8 +371,6 @@ def handle_function_call(
task_id: Optional[str] = None,
user_task: Optional[str] = None,
enabled_tools: Optional[List[str]] = None,
honcho_manager: Optional[Any] = None,
honcho_session_key: Optional[str] = None,
) -> str:
"""
Main function call dispatcher that routes calls to the tool registry.
@@ -417,16 +415,12 @@ def handle_function_call(
function_name, function_args,
task_id=task_id,
enabled_tools=sandbox_enabled,
honcho_manager=honcho_manager,
honcho_session_key=honcho_session_key,
)
else:
result = registry.dispatch(
function_name, function_args,
task_id=task_id,
user_task=user_task,
honcho_manager=honcho_manager,
honcho_session_key=honcho_session_key,
)
try:

plugins/__init__.py Normal file
View File

@@ -0,0 +1 @@
# Hermes plugins package

plugins/memory/__init__.py Normal file
View File

@@ -0,0 +1,213 @@
"""Memory provider plugin discovery.
Scans ``plugins/memory/<name>/`` directories for memory provider plugins.
Each subdirectory must contain ``__init__.py`` with a class implementing
the MemoryProvider ABC.
Memory providers are separate from the general plugin system — they live
in the repo and are always available without user installation. Only ONE
can be active at a time, selected via ``memory.provider`` in config.yaml.
Usage:
from plugins.memory import discover_memory_providers, load_memory_provider
available = discover_memory_providers() # [(name, desc, available), ...]
provider = load_memory_provider("openviking") # MemoryProvider instance
"""
from __future__ import annotations
import importlib
import importlib.util
import logging
import sys
from pathlib import Path
from typing import List, Optional, Tuple
logger = logging.getLogger(__name__)
_MEMORY_PLUGINS_DIR = Path(__file__).parent
def discover_memory_providers() -> List[Tuple[str, str, bool]]:
"""Scan plugins/memory/ for available providers.
Returns list of (name, description, is_available) tuples.
Does NOT import the providers — just reads plugin.yaml for metadata
and does a lightweight availability check.
"""
results = []
if not _MEMORY_PLUGINS_DIR.is_dir():
return results
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
init_file = child / "__init__.py"
if not init_file.exists():
continue
# Read description from plugin.yaml if available
desc = ""
yaml_file = child / "plugin.yaml"
if yaml_file.exists():
try:
import yaml
with open(yaml_file) as f:
meta = yaml.safe_load(f) or {}
desc = meta.get("description", "")
except Exception:
pass
# Quick availability check — try loading and calling is_available()
available = True
try:
provider = _load_provider_from_dir(child)
if provider:
available = provider.is_available()
else:
available = False
except Exception:
available = False
results.append((child.name, desc, available))
return results
def load_memory_provider(name: str) -> Optional["MemoryProvider"]:
"""Load and return a MemoryProvider instance by name.
Returns None if the provider is not found or fails to load.
"""
provider_dir = _MEMORY_PLUGINS_DIR / name
if not provider_dir.is_dir():
logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR)
return None
try:
provider = _load_provider_from_dir(provider_dir)
if provider:
return provider
logger.warning("Memory provider '%s' loaded but no provider instance found", name)
return None
except Exception as e:
logger.warning("Failed to load memory provider '%s': %s", name, e)
return None
def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]:
"""Import a provider module and extract the MemoryProvider instance.
The module must have either:
- A register(ctx) function (plugin-style) — we simulate a ctx
- A top-level class that extends MemoryProvider — we instantiate it
"""
name = provider_dir.name
module_name = f"plugins.memory.{name}"
init_file = provider_dir / "__init__.py"
if not init_file.exists():
return None
# Check if already loaded
if module_name in sys.modules:
mod = sys.modules[module_name]
else:
# Handle relative imports within the plugin
# First ensure the parent packages are registered
for parent in ("plugins", "plugins.memory"):
if parent not in sys.modules:
parent_path = Path(__file__).parent
if parent == "plugins":
parent_path = parent_path.parent
parent_init = parent_path / "__init__.py"
if parent_init.exists():
spec = importlib.util.spec_from_file_location(
parent, str(parent_init),
submodule_search_locations=[str(parent_path)]
)
if spec:
parent_mod = importlib.util.module_from_spec(spec)
sys.modules[parent] = parent_mod
try:
spec.loader.exec_module(parent_mod)
except Exception:
pass
# Now load the provider module
spec = importlib.util.spec_from_file_location(
module_name, str(init_file),
submodule_search_locations=[str(provider_dir)]
)
if not spec:
return None
mod = importlib.util.module_from_spec(spec)
sys.modules[module_name] = mod
# Register submodules so relative imports work
# e.g., "from .store import MemoryStore" in holographic plugin
for sub_file in provider_dir.glob("*.py"):
if sub_file.name == "__init__.py":
continue
sub_name = sub_file.stem
full_sub_name = f"{module_name}.{sub_name}"
if full_sub_name not in sys.modules:
sub_spec = importlib.util.spec_from_file_location(
full_sub_name, str(sub_file)
)
if sub_spec:
sub_mod = importlib.util.module_from_spec(sub_spec)
sys.modules[full_sub_name] = sub_mod
try:
sub_spec.loader.exec_module(sub_mod)
except Exception as e:
logger.debug("Failed to load submodule %s: %s", full_sub_name, e)
try:
spec.loader.exec_module(mod)
except Exception as e:
logger.debug("Failed to exec_module %s: %s", module_name, e)
sys.modules.pop(module_name, None)
return None
# Try register(ctx) pattern first (how our plugins are written)
if hasattr(mod, "register"):
collector = _ProviderCollector()
try:
mod.register(collector)
if collector.provider:
return collector.provider
except Exception as e:
logger.debug("register() failed for %s: %s", name, e)
# Fallback: find a MemoryProvider subclass and instantiate it
from agent.memory_provider import MemoryProvider
for attr_name in dir(mod):
attr = getattr(mod, attr_name, None)
if (isinstance(attr, type) and issubclass(attr, MemoryProvider)
and attr is not MemoryProvider):
try:
return attr()
except Exception:
pass
return None
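# For reference, a minimal plugin __init__.py that this loader accepts via the
# register(ctx) path (hypothetical provider; assumes the MemoryProvider ABC's
# optional hooks have default implementations, as the shipped plugins suggest):
#
#     from agent.memory_provider import MemoryProvider
#
#     class EchoMemoryProvider(MemoryProvider):
#         @property
#         def name(self) -> str:
#             return "echo"
#         def is_available(self) -> bool:
#             return True
#
#     def register(ctx) -> None:
#         ctx.register_memory_provider(EchoMemoryProvider())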
class _ProviderCollector:
"""Fake plugin context that captures register_memory_provider calls."""
def __init__(self):
self.provider = None
def register_memory_provider(self, provider):
self.provider = provider
# No-op for other registration methods
def register_tool(self, *args, **kwargs):
pass
def register_hook(self, *args, **kwargs):
pass

View File

@@ -0,0 +1,41 @@
# ByteRover Memory Provider
Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search).
## Requirements
Install the ByteRover CLI:
```bash
curl -fsSL https://byterover.dev/install.sh | sh
# or
npm install -g byterover-cli
```
## Setup
```bash
hermes memory setup # select "byterover"
```
Or manually:
```bash
hermes config set memory.provider byterover
# Optional cloud sync:
echo "BRV_API_KEY=your-key" >> ~/.hermes/.env
```
## Config
| Env Var | Required | Description |
|---------|----------|-------------|
| `BRV_API_KEY` | No | Cloud sync key (optional, local-first by default) |
Working directory: `$HERMES_HOME/byterover/` (profile-scoped).
## Tools
| Tool | Description |
|------|-------------|
| `brv_query` | Search the knowledge tree |
| `brv_curate` | Store facts, decisions, patterns |
| `brv_status` | CLI version, tree stats, sync state |
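These tools map onto plain CLI invocations; the provider shells out roughly like this (commands taken from the plugin source, queries illustrative):
```bash
brv query -- "deploy process"        # brv_query
brv curate -- "User prefers tabs"    # brv_curate
brv status                           # brv_status
```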

View File

@@ -0,0 +1,398 @@
"""ByteRover memory plugin — MemoryProvider interface.
Persistent memory via the ByteRover CLI (``brv``). Organizes knowledge into
a hierarchical context tree with tiered retrieval (fuzzy text → LLM-driven
search). Local-first with optional cloud sync.
Original PR #3499 by hieuntg81, adapted to MemoryProvider ABC.
Requires: ``brv`` CLI installed (npm install -g byterover-cli or
curl -fsSL https://byterover.dev/install.sh | sh).
Config via environment variables (profile-scoped via each profile's .env):
BRV_API_KEY — ByteRover API key (for cloud features, optional for local)
Working directory: $HERMES_HOME/byterover/ (profile-scoped context tree)
"""
from __future__ import annotations
import json
import logging
import os
import shutil
import subprocess
import threading
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
# Timeouts
_QUERY_TIMEOUT = 30 # brv query — should be fast
_CURATE_TIMEOUT = 120 # brv curate — may involve LLM processing
# Minimum lengths to filter noise
_MIN_QUERY_LEN = 10
_MIN_OUTPUT_LEN = 20
# ---------------------------------------------------------------------------
# brv binary resolution (cached, thread-safe)
# ---------------------------------------------------------------------------
_brv_path_lock = threading.Lock()
_cached_brv_path: Optional[str] = None
def _resolve_brv_path() -> Optional[str]:
"""Find the brv binary on PATH or well-known install locations."""
global _cached_brv_path
with _brv_path_lock:
if _cached_brv_path is not None:
return _cached_brv_path if _cached_brv_path != "" else None
found = shutil.which("brv")
if not found:
home = Path.home()
candidates = [
home / ".brv-cli" / "bin" / "brv",
Path("/usr/local/bin/brv"),
home / ".npm-global" / "bin" / "brv",
]
for c in candidates:
if c.exists():
found = str(c)
break
with _brv_path_lock:
if _cached_brv_path is not None:
return _cached_brv_path if _cached_brv_path != "" else None
_cached_brv_path = found or ""
return found
def _run_brv(args: List[str], timeout: int = _QUERY_TIMEOUT,
cwd: Optional[str] = None) -> dict:
"""Run a brv CLI command. Returns {success, output, error}."""
brv_path = _resolve_brv_path()
if not brv_path:
return {"success": False, "error": "brv CLI not found. Install: npm install -g byterover-cli"}
cmd = [brv_path] + args
effective_cwd = cwd or str(_get_brv_cwd())
Path(effective_cwd).mkdir(parents=True, exist_ok=True)
env = os.environ.copy()
brv_bin_dir = str(Path(brv_path).parent)
env["PATH"] = brv_bin_dir + os.pathsep + env.get("PATH", "")
try:
result = subprocess.run(
cmd, capture_output=True, text=True,
timeout=timeout, cwd=effective_cwd, env=env,
)
stdout = result.stdout.strip()
stderr = result.stderr.strip()
if result.returncode == 0:
return {"success": True, "output": stdout}
return {"success": False, "error": stderr or stdout or f"brv exited {result.returncode}"}
except subprocess.TimeoutExpired:
return {"success": False, "error": f"brv timed out after {timeout}s"}
except FileNotFoundError:
global _cached_brv_path
with _brv_path_lock:
_cached_brv_path = None
return {"success": False, "error": "brv CLI not found"}
except Exception as e:
return {"success": False, "error": str(e)}
def _get_brv_cwd() -> Path:
"""Profile-scoped working directory for the brv context tree."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "byterover"
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
QUERY_SCHEMA = {
"name": "brv_query",
"description": (
"Search ByteRover's persistent knowledge tree for relevant context. "
"Returns memories, project knowledge, architectural decisions, and "
"patterns from previous sessions. Use for any question where past "
"context would help."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "What to search for."},
},
"required": ["query"],
},
}
CURATE_SCHEMA = {
"name": "brv_curate",
"description": (
"Store important information in ByteRover's persistent knowledge tree. "
"Use for architectural decisions, bug fixes, user preferences, project "
"patterns — anything worth remembering across sessions. ByteRover's LLM "
"automatically categorizes and organizes the memory."
),
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string", "description": "The information to remember."},
},
"required": ["content"],
},
}
STATUS_SCHEMA = {
"name": "brv_status",
"description": "Check ByteRover status — CLI version, context tree stats, cloud sync state.",
"parameters": {"type": "object", "properties": {}, "required": []},
}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class ByteRoverMemoryProvider(MemoryProvider):
"""ByteRover persistent memory via the brv CLI."""
def __init__(self):
self._cwd = ""
self._session_id = ""
self._turn_count = 0
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread: Optional[threading.Thread] = None
self._sync_thread: Optional[threading.Thread] = None
@property
def name(self) -> str:
return "byterover"
def is_available(self) -> bool:
"""Check if brv CLI is installed. No network calls."""
return _resolve_brv_path() is not None
def get_config_schema(self):
return [
{
"key": "api_key",
"description": "ByteRover API key (optional, for cloud sync)",
"secret": True,
"env_var": "BRV_API_KEY",
"url": "https://app.byterover.dev",
},
]
def initialize(self, session_id: str, **kwargs) -> None:
self._cwd = str(_get_brv_cwd())
self._session_id = session_id
self._turn_count = 0
Path(self._cwd).mkdir(parents=True, exist_ok=True)
def system_prompt_block(self) -> str:
if not _resolve_brv_path():
return ""
return (
"# ByteRover Memory\n"
"Active. Persistent knowledge tree with hierarchical context.\n"
"Use brv_query to search past knowledge, brv_curate to store "
"important facts, brv_status to check state."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## ByteRover Context\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
if not query or len(query.strip()) < _MIN_QUERY_LEN:
return
def _run():
try:
result = _run_brv(
["query", "--", query.strip()[:5000]],
timeout=_QUERY_TIMEOUT, cwd=self._cwd,
)
if result["success"] and result.get("output"):
output = result["output"].strip()
if len(output) > _MIN_OUTPUT_LEN:
with self._prefetch_lock:
self._prefetch_result = output
except Exception as e:
logger.debug("ByteRover prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(
target=_run, daemon=True, name="brv-prefetch"
)
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Curate the conversation turn in background (non-blocking)."""
self._turn_count += 1
# Only curate substantive turns
if len(user_content.strip()) < _MIN_QUERY_LEN:
return
def _sync():
try:
combined = f"User: {user_content[:2000]}\nAssistant: {assistant_content[:2000]}"
_run_brv(
["curate", "--", combined],
timeout=_CURATE_TIMEOUT, cwd=self._cwd,
)
except Exception as e:
logger.debug("ByteRover sync failed: %s", e)
# Wait for previous sync
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(
target=_sync, daemon=True, name="brv-sync"
)
self._sync_thread.start()
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in memory writes to ByteRover."""
if action not in ("add", "replace") or not content:
return
def _write():
try:
label = "User profile" if target == "user" else "Agent memory"
_run_brv(
["curate", "--", f"[{label}] {content}"],
timeout=_CURATE_TIMEOUT, cwd=self._cwd,
)
except Exception as e:
logger.debug("ByteRover memory mirror failed: %s", e)
t = threading.Thread(target=_write, daemon=True, name="brv-memwrite")
t.start()
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Extract insights before context compression discards turns."""
if not messages:
return ""
# Build a summary of messages about to be compressed
parts = []
for msg in messages[-10:]: # last 10 messages
role = msg.get("role", "")
content = msg.get("content", "")
if isinstance(content, str) and content.strip() and role in ("user", "assistant"):
parts.append(f"{role}: {content[:500]}")
if not parts:
return ""
combined = "\n".join(parts)
def _flush():
try:
_run_brv(
["curate", "--", f"[Pre-compression context]\n{combined}"],
timeout=_CURATE_TIMEOUT, cwd=self._cwd,
)
logger.info("ByteRover pre-compression flush: %d messages", len(parts))
except Exception as e:
logger.debug("ByteRover pre-compression flush failed: %s", e)
t = threading.Thread(target=_flush, daemon=True, name="brv-flush")
t.start()
return ""
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [QUERY_SCHEMA, CURATE_SCHEMA, STATUS_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
if tool_name == "brv_query":
return self._tool_query(args)
elif tool_name == "brv_curate":
return self._tool_curate(args)
elif tool_name == "brv_status":
return self._tool_status()
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def shutdown(self) -> None:
for t in (self._sync_thread, self._prefetch_thread):
if t and t.is_alive():
t.join(timeout=10.0)
# -- Tool implementations ------------------------------------------------
def _tool_query(self, args: dict) -> str:
query = args.get("query", "")
if not query:
return json.dumps({"error": "query is required"})
result = _run_brv(
["query", "--", query.strip()[:5000]],
timeout=_QUERY_TIMEOUT, cwd=self._cwd,
)
if not result["success"]:
return json.dumps({"error": result.get("error", "Query failed")})
output = result.get("output", "").strip()
if not output or len(output) < _MIN_OUTPUT_LEN:
return json.dumps({"result": "No relevant memories found."})
# Truncate very long results
if len(output) > 8000:
output = output[:8000] + "\n\n[... truncated]"
return json.dumps({"result": output})
def _tool_curate(self, args: dict) -> str:
content = args.get("content", "")
if not content:
return json.dumps({"error": "content is required"})
result = _run_brv(
["curate", "--", content],
timeout=_CURATE_TIMEOUT, cwd=self._cwd,
)
if not result["success"]:
return json.dumps({"error": result.get("error", "Curate failed")})
return json.dumps({"result": "Memory curated successfully."})
def _tool_status(self) -> str:
result = _run_brv(["status"], timeout=15, cwd=self._cwd)
if not result["success"]:
return json.dumps({"error": result.get("error", "Status check failed")})
return json.dumps({"status": result.get("output", "")})
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Register ByteRover as a memory provider plugin."""
ctx.register_memory_provider(ByteRoverMemoryProvider())

View File

@@ -0,0 +1,9 @@
name: byterover
version: 1.0.0
description: "ByteRover — persistent knowledge tree with tiered retrieval via the brv CLI."
external_dependencies:
- name: brv
install: "curl -fsSL https://byterover.dev/install.sh | sh"
check: "brv --version"
hooks:
- on_pre_compress

View File

@@ -0,0 +1,38 @@
# Hindsight Memory Provider
Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud and local modes.
## Requirements
- Cloud: `pip install hindsight-client` + API key from [app.hindsight.vectorize.io](https://app.hindsight.vectorize.io)
- Local: `pip install hindsight` + LLM API key for embeddings
## Setup
```bash
hermes memory setup # select "hindsight"
```
Or manually:
```bash
hermes config set memory.provider hindsight
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
```
## Config
Config file: `$HERMES_HOME/hindsight/config.json` (or `~/.hindsight/config.json` legacy)
| Key | Default | Description |
|-----|---------|-------------|
| `mode` | `cloud` | `cloud` or `local` |
| `bank_id` | `hermes` | Memory bank identifier |
| `budget` | `mid` | Recall thoroughness: `low`/`mid`/`high` |
## Tools
| Tool | Description |
|------|-------------|
| `hindsight_retain` | Store information with auto entity extraction |
| `hindsight_recall` | Multi-strategy search (semantic + entity graph) |
| `hindsight_reflect` | Cross-memory synthesis (LLM-powered) |

View File

@@ -0,0 +1,358 @@
"""Hindsight memory plugin — MemoryProvider interface.
Long-term memory with knowledge graph, entity resolution, and multi-strategy
retrieval. Supports cloud (API key) and local (embedded PostgreSQL) modes.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
Config via environment variables:
HINDSIGHT_API_KEY — API key for Hindsight Cloud
HINDSIGHT_BANK_ID — memory bank identifier (default: hermes)
HINDSIGHT_BUDGET — recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL — API endpoint
HINDSIGHT_MODE — cloud or local (default: cloud)
Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
~/.hindsight/config.json (legacy, shared) for backward compatibility.
"""
from __future__ import annotations
import json
import logging
import os
import queue
import threading
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
_DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
_VALID_BUDGETS = {"low", "mid", "high"}
# ---------------------------------------------------------------------------
# Thread helper (from original PR — avoids aiohttp event loop conflicts)
# ---------------------------------------------------------------------------
def _run_in_thread(fn, timeout: float = 30.0):
result_q: queue.Queue = queue.Queue(maxsize=1)
def _run():
import asyncio
asyncio.set_event_loop(None)
try:
result_q.put(("ok", fn()))
except Exception as exc:
result_q.put(("err", exc))
t = threading.Thread(target=_run, daemon=True, name="hindsight-call")
t.start()
kind, value = result_q.get(timeout=timeout)
if kind == "err":
raise value
return value
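# Typical use (mirrors the call sites below): wrap a blocking SDK call so its
# aiohttp event loop lives entirely on the worker thread, e.g.
#   resp = _run_in_thread(lambda: client.recall(bank_id="hermes", query=q), timeout=30.0)
# Raises queue.Empty if the call does not finish within `timeout`.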
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
RETAIN_SCHEMA = {
"name": "hindsight_retain",
"description": (
"Store information to long-term memory. Hindsight automatically "
"extracts structured facts, resolves entities, and indexes for retrieval."
),
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string", "description": "The information to store."},
"context": {"type": "string", "description": "Short label (e.g. 'user preference', 'project decision')."},
},
"required": ["content"],
},
}
RECALL_SCHEMA = {
"name": "hindsight_recall",
"description": (
"Search long-term memory. Returns memories ranked by relevance using "
"semantic search, keyword matching, entity graph traversal, and reranking."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "What to search for."},
},
"required": ["query"],
},
}
REFLECT_SCHEMA = {
"name": "hindsight_reflect",
"description": (
"Synthesize a reasoned answer from long-term memories. Unlike recall, "
"this reasons across all stored memories to produce a coherent response."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The question to reflect on."},
},
"required": ["query"],
},
}
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_config() -> dict:
"""Load config from profile-scoped path, legacy path, or env vars.
Resolution order:
1. $HERMES_HOME/hindsight/config.json (profile-scoped)
2. ~/.hindsight/config.json (legacy, shared)
3. Environment variables
"""
from pathlib import Path
from hermes_constants import get_hermes_home
# Profile-scoped path (preferred)
profile_path = get_hermes_home() / "hindsight" / "config.json"
if profile_path.exists():
try:
return json.loads(profile_path.read_text(encoding="utf-8"))
except Exception:
pass
# Legacy shared path (backward compat)
legacy_path = Path.home() / ".hindsight" / "config.json"
if legacy_path.exists():
try:
return json.loads(legacy_path.read_text(encoding="utf-8"))
except Exception:
pass
return {
"mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
"apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
"banks": {
"hermes": {
"bankId": os.environ.get("HINDSIGHT_BANK_ID", "hermes"),
"budget": os.environ.get("HINDSIGHT_BUDGET", "mid"),
"enabled": True,
}
},
}
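# Shape of the JSON a hand-written config.json would need, matching the
# env-var fallback above (values illustrative):
#   {"mode": "cloud", "apiKey": "...", "banks": {"hermes": {"bankId": "hermes", "budget": "mid", "enabled": true}}}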
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class HindsightMemoryProvider(MemoryProvider):
"""Hindsight long-term memory with knowledge graph and multi-strategy retrieval."""
def __init__(self):
self._config = None
self._api_key = None
self._bank_id = "hermes"
self._budget = "mid"
self._mode = "cloud"
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
@property
def name(self) -> str:
return "hindsight"
def is_available(self) -> bool:
try:
cfg = _load_config()
mode = cfg.get("mode", "cloud")
if mode == "local":
embed = cfg.get("embed", {})
return bool(embed.get("llmApiKey") or os.environ.get("HINDSIGHT_LLM_API_KEY"))
api_key = cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "")
return bool(api_key)
except Exception:
return False
def save_config(self, values, hermes_home):
"""Write config to $HERMES_HOME/hindsight/config.json."""
import json
from pathlib import Path
config_dir = Path(hermes_home) / "hindsight"
config_dir.mkdir(parents=True, exist_ok=True)
config_path = config_dir / "config.json"
existing = {}
if config_path.exists():
try:
existing = json.loads(config_path.read_text())
except Exception:
pass
existing.update(values)
config_path.write_text(json.dumps(existing, indent=2))
def get_config_schema(self):
return [
{"key": "mode", "description": "Cloud API or local embedded mode", "default": "cloud", "choices": ["cloud", "local"]},
{"key": "api_key", "description": "Hindsight Cloud API key", "secret": True, "env_var": "HINDSIGHT_API_KEY", "url": "https://app.hindsight.vectorize.io"},
{"key": "bank_id", "description": "Memory bank identifier", "default": "hermes"},
{"key": "budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
{"key": "llm_provider", "description": "LLM provider for local mode", "default": "anthropic", "choices": ["anthropic", "openai", "groq", "ollama"]},
{"key": "llm_api_key", "description": "LLM API key for local mode", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY"},
{"key": "llm_model", "description": "LLM model for local mode", "default": "claude-haiku-4-5-20251001"},
]
def _make_client(self):
"""Create a fresh Hindsight client (thread-safe)."""
if self._mode == "local":
from hindsight import HindsightEmbedded
embed = self._config.get("embed", {})
return HindsightEmbedded(
profile=embed.get("profile", "hermes"),
llm_provider=embed.get("llmProvider", ""),
llm_api_key=embed.get("llmApiKey", ""),
llm_model=embed.get("llmModel", ""),
)
from hindsight_client import Hindsight
return Hindsight(api_key=self._api_key, timeout=30.0)
def initialize(self, session_id: str, **kwargs) -> None:
self._config = _load_config()
self._mode = self._config.get("mode", "cloud")
self._api_key = self._config.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "")
banks = self._config.get("banks", {}).get("hermes", {})
self._bank_id = banks.get("bankId", "hermes")
budget = banks.get("budget", "mid")
self._budget = budget if budget in _VALID_BUDGETS else "mid"
# Ensure bank exists
try:
client = _run_in_thread(self._make_client)
_run_in_thread(lambda: client.create_bank(bank_id=self._bank_id, name=self._bank_id))
except Exception:
pass # Already exists
def system_prompt_block(self) -> str:
return (
f"# Hindsight Memory\n"
f"Active. Bank: {self._bank_id}, budget: {self._budget}.\n"
f"Use hindsight_recall to search, hindsight_reflect for synthesis, "
f"hindsight_retain to store facts."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## Hindsight Memory\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
def _run():
try:
client = self._make_client()
resp = client.recall(bank_id=self._bank_id, query=query, budget=self._budget)
if resp.results:
text = "\n".join(r.text for r in resp.results if r.text)
with self._prefetch_lock:
self._prefetch_result = text
except Exception as e:
logger.debug("Hindsight prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Retain conversation turn in background (non-blocking)."""
combined = f"User: {user_content}\nAssistant: {assistant_content}"
def _sync():
try:
_run_in_thread(
lambda: self._make_client().retain(
bank_id=self._bank_id, content=combined, context="conversation"
)
)
except Exception as e:
logger.warning("Hindsight sync failed: %s", e)
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(target=_sync, daemon=True, name="hindsight-sync")
self._sync_thread.start()
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [RETAIN_SCHEMA, RECALL_SCHEMA, REFLECT_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
if tool_name == "hindsight_retain":
content = args.get("content", "")
if not content:
return json.dumps({"error": "Missing required parameter: content"})
context = args.get("context")
try:
_run_in_thread(
lambda: self._make_client().retain(
bank_id=self._bank_id, content=content, context=context
)
)
return json.dumps({"result": "Memory stored successfully."})
except Exception as e:
return json.dumps({"error": f"Failed to store memory: {e}"})
elif tool_name == "hindsight_recall":
query = args.get("query", "")
if not query:
return json.dumps({"error": "Missing required parameter: query"})
try:
resp = _run_in_thread(
lambda: self._make_client().recall(
bank_id=self._bank_id, query=query, budget=self._budget
)
)
if not resp.results:
return json.dumps({"result": "No relevant memories found."})
lines = [f"{i}. {r.text}" for i, r in enumerate(resp.results, 1)]
return json.dumps({"result": "\n".join(lines)})
except Exception as e:
return json.dumps({"error": f"Failed to search memory: {e}"})
elif tool_name == "hindsight_reflect":
query = args.get("query", "")
if not query:
return json.dumps({"error": "Missing required parameter: query"})
try:
resp = _run_in_thread(
lambda: self._make_client().reflect(
bank_id=self._bank_id, query=query, budget=self._budget
)
)
return json.dumps({"result": resp.text or "No relevant memories found."})
except Exception as e:
return json.dumps({"error": f"Failed to reflect: {e}"})
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def shutdown(self) -> None:
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
def register(ctx) -> None:
"""Register Hindsight as a memory provider plugin."""
ctx.register_memory_provider(HindsightMemoryProvider())

View File

@@ -0,0 +1,9 @@
name: hindsight
version: 1.0.0
description: "Hindsight — long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval."
pip_dependencies:
- hindsight-client
requires_env:
- HINDSIGHT_API_KEY
hooks:
- on_session_end

View File

@@ -0,0 +1,36 @@
# Holographic Memory Provider
Local SQLite fact store with FTS5 search, trust scoring, entity resolution, and HRR-based compositional retrieval.
## Requirements
None — uses SQLite (always available). NumPy optional for HRR algebra.
## Setup
```bash
hermes memory setup # select "holographic"
```
Or manually:
```bash
hermes config set memory.provider holographic
```
## Config
Config in `config.yaml` under `plugins.hermes-memory-store`:
| Key | Default | Description |
|-----|---------|-------------|
| `db_path` | `$HERMES_HOME/memory_store.db` | SQLite database path |
| `auto_extract` | `false` | Auto-extract facts at session end |
| `default_trust` | `0.5` | Default trust score for new facts |
| `hrr_dim` | `1024` | HRR vector dimensions |
## Tools
| Tool | Description |
|------|-------------|
| `fact_store` | 9 actions: add, search, probe, related, reason, contradict, update, remove, list |
| `fact_feedback` | Rate facts as helpful/unhelpful (trains trust scores) |

View File

@@ -0,0 +1,395 @@
"""hermes-memory-store — holographic memory plugin using MemoryProvider interface.
Registers as a MemoryProvider plugin, giving the agent structured fact storage
with entity resolution, trust scoring, and HRR-based compositional retrieval.
Original plugin by dusterbloom (PR #2351), adapted to the MemoryProvider ABC.
Config in $HERMES_HOME/config.yaml (profile-scoped):
plugins:
hermes-memory-store:
db_path: $HERMES_HOME/memory_store.db
auto_extract: false
default_trust: 0.5
min_trust_threshold: 0.3
temporal_decay_half_life: 0
"""
from __future__ import annotations
import json
import logging
import re
from pathlib import Path
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
from .store import MemoryStore
from .retrieval import FactRetriever
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tool schemas (unchanged from original PR)
# ---------------------------------------------------------------------------
FACT_STORE_SCHEMA = {
"name": "fact_store",
"description": (
"Deep structured memory with algebraic reasoning. "
"Use alongside the memory tool — memory for always-on context, "
"fact_store for deep recall and compositional queries.\n\n"
"ACTIONS (simple → powerful):\n"
"• add — Store a fact the user would expect you to remember.\n"
"• search — Keyword lookup ('editor config', 'deploy process').\n"
"• probe — Entity recall: ALL facts about a person/thing.\n"
"• related — What connects to an entity? Structural adjacency.\n"
"• reason — Compositional: facts connected to MULTIPLE entities simultaneously.\n"
"• contradict — Memory hygiene: find facts making conflicting claims.\n"
"• update/remove/list — CRUD operations.\n\n"
"IMPORTANT: Before answering questions about the user, ALWAYS probe or reason first."
),
"parameters": {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["add", "search", "probe", "related", "reason", "contradict", "update", "remove", "list"],
},
"content": {"type": "string", "description": "Fact content (required for 'add')."},
"query": {"type": "string", "description": "Search query (required for 'search')."},
"entity": {"type": "string", "description": "Entity name for 'probe'/'related'."},
"entities": {"type": "array", "items": {"type": "string"}, "description": "Entity names for 'reason'."},
"fact_id": {"type": "integer", "description": "Fact ID for 'update'/'remove'."},
"category": {"type": "string", "enum": ["user_pref", "project", "tool", "general"]},
"tags": {"type": "string", "description": "Comma-separated tags."},
"trust_delta": {"type": "number", "description": "Trust adjustment for 'update'."},
"min_trust": {"type": "number", "description": "Minimum trust filter (default: 0.3)."},
"limit": {"type": "integer", "description": "Max results (default: 10)."},
},
"required": ["action"],
},
}
FACT_FEEDBACK_SCHEMA = {
"name": "fact_feedback",
"description": (
"Rate a fact after using it. Mark 'helpful' if accurate, 'unhelpful' if outdated. "
"This trains the memory — good facts rise, bad facts sink."
),
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["helpful", "unhelpful"]},
"fact_id": {"type": "integer", "description": "The fact ID to rate."},
},
"required": ["action", "fact_id"],
},
}
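# Illustrative argument payloads the dispatcher passes to handle_tool_call,
# matching the schemas above (values are made up):
#   fact_store:    {"action": "add", "content": "User prefers tabs", "category": "user_pref"}
#   fact_store:    {"action": "reason", "entities": ["Alice", "deploy process"]}
#   fact_feedback: {"action": "helpful", "fact_id": 42}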
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_plugin_config() -> dict:
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
return {}
try:
import yaml
with open(config_path) as f:
all_config = yaml.safe_load(f) or {}
return all_config.get("plugins", {}).get("hermes-memory-store", {}) or {}
except Exception:
return {}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class HolographicMemoryProvider(MemoryProvider):
"""Holographic memory with structured facts, entity resolution, and HRR retrieval."""
def __init__(self, config: dict | None = None):
self._config = config or _load_plugin_config()
self._store = None
self._retriever = None
self._min_trust = float(self._config.get("min_trust_threshold", 0.3))
@property
def name(self) -> str:
return "holographic"
def is_available(self) -> bool:
return True # SQLite is always available, numpy is optional
def save_config(self, values, hermes_home):
"""Write config to config.yaml under plugins.hermes-memory-store."""
from pathlib import Path
config_path = Path(hermes_home) / "config.yaml"
try:
import yaml
existing = {}
if config_path.exists():
with open(config_path) as f:
existing = yaml.safe_load(f) or {}
existing.setdefault("plugins", {})
existing["plugins"]["hermes-memory-store"] = values
with open(config_path, "w") as f:
yaml.dump(existing, f, default_flow_style=False)
except Exception:
pass
def get_config_schema(self):
from hermes_constants import display_hermes_home
_default_db = f"{display_hermes_home()}/memory_store.db"
return [
{"key": "db_path", "description": "SQLite database path", "default": _default_db},
{"key": "auto_extract", "description": "Auto-extract facts at session end", "default": "false", "choices": ["true", "false"]},
{"key": "default_trust", "description": "Default trust score for new facts", "default": "0.5"},
{"key": "hrr_dim", "description": "HRR vector dimensions", "default": "1024"},
]
def initialize(self, session_id: str, **kwargs) -> None:
from hermes_constants import get_hermes_home
_default_db = str(get_hermes_home() / "memory_store.db")
db_path = self._config.get("db_path", _default_db)
default_trust = float(self._config.get("default_trust", 0.5))
hrr_dim = int(self._config.get("hrr_dim", 1024))
hrr_weight = float(self._config.get("hrr_weight", 0.3))
temporal_decay = int(self._config.get("temporal_decay_half_life", 0))
self._store = MemoryStore(db_path=db_path, default_trust=default_trust, hrr_dim=hrr_dim)
self._retriever = FactRetriever(
store=self._store,
temporal_decay_half_life=temporal_decay,
hrr_weight=hrr_weight,
hrr_dim=hrr_dim,
)
self._session_id = session_id
def system_prompt_block(self) -> str:
if not self._store:
return ""
try:
total = self._store._conn.execute(
"SELECT COUNT(*) FROM facts"
).fetchone()[0]
except Exception:
total = 0
if total == 0:
return ""
return (
f"# Holographic Memory\n"
f"Active. {total} facts stored with entity resolution and trust scoring.\n"
f"Use fact_store to search, probe entities, reason across entities, or add facts.\n"
f"Use fact_feedback to rate facts after using them (trains trust scores)."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if not self._retriever or not query:
return ""
try:
results = self._retriever.search(query, min_trust=self._min_trust, limit=5)
if not results:
return ""
lines = []
for r in results:
trust = r.get("trust", 0)
lines.append(f"- [{trust:.1f}] {r.get('content', '')}")
return "## Holographic Memory\n" + "\n".join(lines)
except Exception as e:
logger.debug("Holographic prefetch failed: %s", e)
return ""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
# Holographic memory stores explicit facts via tools, not auto-sync.
# The on_session_end hook handles auto-extraction if configured.
pass
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [FACT_STORE_SCHEMA, FACT_FEEDBACK_SCHEMA]
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
if tool_name == "fact_store":
return self._handle_fact_store(args)
elif tool_name == "fact_feedback":
return self._handle_fact_feedback(args)
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
# Config values may arrive as strings ("true"/"false") from config.yaml,
# and the string "false" is truthy — normalize before checking.
if str(self._config.get("auto_extract", "false")).lower() != "true":
return
if not self._store or not messages:
return
self._auto_extract_facts(messages)
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in memory writes as facts."""
if action == "add" and self._store and content:
try:
category = "user_pref" if target == "user" else "general"
self._store.add_fact(content, category=category)
except Exception as e:
logger.debug("Holographic memory_write mirror failed: %s", e)
def shutdown(self) -> None:
self._store = None
self._retriever = None
# -- Tool handlers -------------------------------------------------------
def _handle_fact_store(self, args: dict) -> str:
try:
action = args["action"]
store = self._store
retriever = self._retriever
if action == "add":
fact_id = store.add_fact(
args["content"],
category=args.get("category", "general"),
tags=args.get("tags", ""),
)
return json.dumps({"fact_id": fact_id, "status": "added"})
elif action == "search":
results = retriever.search(
args["query"],
category=args.get("category"),
min_trust=float(args.get("min_trust", self._min_trust)),
limit=int(args.get("limit", 10)),
)
return json.dumps({"results": results, "count": len(results)})
elif action == "probe":
results = retriever.probe(
args["entity"],
category=args.get("category"),
limit=int(args.get("limit", 10)),
)
return json.dumps({"results": results, "count": len(results)})
elif action == "related":
results = retriever.related(
args["entity"],
category=args.get("category"),
limit=int(args.get("limit", 10)),
)
return json.dumps({"results": results, "count": len(results)})
elif action == "reason":
entities = args.get("entities", [])
if not entities:
return json.dumps({"error": "reason requires 'entities' list"})
results = retriever.reason(
entities,
category=args.get("category"),
limit=int(args.get("limit", 10)),
)
return json.dumps({"results": results, "count": len(results)})
elif action == "contradict":
results = retriever.contradict(
category=args.get("category"),
limit=int(args.get("limit", 10)),
)
return json.dumps({"results": results, "count": len(results)})
elif action == "update":
updated = store.update_fact(
int(args["fact_id"]),
content=args.get("content"),
trust_delta=float(args["trust_delta"]) if "trust_delta" in args else None,
tags=args.get("tags"),
category=args.get("category"),
)
return json.dumps({"updated": updated})
elif action == "remove":
removed = store.remove_fact(int(args["fact_id"]))
return json.dumps({"removed": removed})
elif action == "list":
facts = store.list_facts(
category=args.get("category"),
min_trust=float(args.get("min_trust", 0.0)),
limit=int(args.get("limit", 10)),
)
return json.dumps({"facts": facts, "count": len(facts)})
else:
return json.dumps({"error": f"Unknown action: {action}"})
except KeyError as exc:
return json.dumps({"error": f"Missing required argument: {exc}"})
except Exception as exc:
return json.dumps({"error": str(exc)})
def _handle_fact_feedback(self, args: dict) -> str:
try:
fact_id = int(args["fact_id"])
helpful = args["action"] == "helpful"
result = self._store.record_feedback(fact_id, helpful=helpful)
return json.dumps(result)
except KeyError as exc:
return json.dumps({"error": f"Missing required argument: {exc}"})
except Exception as exc:
return json.dumps({"error": str(exc)})
# -- Auto-extraction (on_session_end) ------------------------------------
def _auto_extract_facts(self, messages: list) -> None:
_PREF_PATTERNS = [
re.compile(r'\bI\s+(?:prefer|like|love|use|want|need)\s+(.+)', re.IGNORECASE),
re.compile(r'\bmy\s+(?:favorite|preferred|default)\s+\w+\s+is\s+(.+)', re.IGNORECASE),
re.compile(r'\bI\s+(?:always|never|usually)\s+(.+)', re.IGNORECASE),
]
_DECISION_PATTERNS = [
re.compile(r'\bwe\s+(?:decided|agreed|chose)\s+(?:to\s+)?(.+)', re.IGNORECASE),
re.compile(r'\bthe\s+project\s+(?:uses|needs|requires)\s+(.+)', re.IGNORECASE),
]
extracted = 0
for msg in messages:
if msg.get("role") != "user":
continue
content = msg.get("content", "")
if not isinstance(content, str) or len(content) < 10:
continue
for pattern in _PREF_PATTERNS:
if pattern.search(content):
try:
self._store.add_fact(content[:400], category="user_pref")
extracted += 1
except Exception:
pass
break
for pattern in _DECISION_PATTERNS:
if pattern.search(content):
try:
self._store.add_fact(content[:400], category="project")
extracted += 1
except Exception:
pass
break
if extracted:
logger.info("Auto-extracted %d facts from conversation", extracted)
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Register the holographic memory provider with the plugin system."""
config = _load_plugin_config()
provider = HolographicMemoryProvider(config=config)
ctx.register_memory_provider(provider)


@@ -0,0 +1,203 @@
"""Holographic Reduced Representations (HRR) with phase encoding.
HRRs are a vector symbolic architecture for encoding compositional structure
into fixed-width distributed representations. This module uses *phase vectors*:
each concept is a vector of angles in [0, 2π). The algebraic operations are:
bind — circular convolution (phase addition) — associates two concepts
unbind — circular correlation (phase subtraction) — retrieves a bound value
bundle — superposition (circular mean) — merges multiple concepts
Phase encoding is numerically stable, avoids the magnitude collapse of
traditional complex-number HRRs, and maps cleanly to cosine similarity.
Atoms are generated deterministically from SHA-256 so representations are
identical across processes, machines, and language versions.
References:
Plate (1995) — Holographic Reduced Representations
Gayler (2004) — Vector Symbolic Architectures answer Jackendoff's challenges
"""
import hashlib
import logging
import struct
import math
try:
import numpy as np
_HAS_NUMPY = True
except ImportError:
_HAS_NUMPY = False
logger = logging.getLogger(__name__)
_TWO_PI = 2.0 * math.pi
def _require_numpy() -> None:
if not _HAS_NUMPY:
raise RuntimeError("numpy is required for holographic operations")
def encode_atom(word: str, dim: int = 1024) -> "np.ndarray":
"""Deterministic phase vector via SHA-256 counter blocks.
Uses hashlib (not numpy RNG) for cross-platform reproducibility.
Algorithm:
- Generate enough SHA-256 blocks by hashing f"{word}:{i}" for i=0,1,2,...
- Concatenate digests, interpret as uint16 values via struct.unpack
- Scale to [0, 2π): phases = values * (2π / 65536)
- Truncate to dim elements
- Returns np.float64 array of shape (dim,)
"""
_require_numpy()
# Each SHA-256 digest is 32 bytes = 16 uint16 values.
values_per_block = 16
blocks_needed = math.ceil(dim / values_per_block)
uint16_values: list[int] = []
for i in range(blocks_needed):
digest = hashlib.sha256(f"{word}:{i}".encode()).digest()
uint16_values.extend(struct.unpack("<16H", digest))
phases = np.array(uint16_values[:dim], dtype=np.float64) * (_TWO_PI / 65536.0)
return phases
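# Determinism sketch (assumes numpy is installed): identical inputs give
# identical vectors across processes, so atoms never need to be stored:
#   assert np.allclose(encode_atom("cat", 64), encode_atom("cat", 64))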
def bind(a: "np.ndarray", b: "np.ndarray") -> "np.ndarray":
"""Circular convolution = element-wise phase addition.
Binding associates two concepts into a single composite vector.
The result is dissimilar to both inputs (quasi-orthogonal).
"""
_require_numpy()
return (a + b) % _TWO_PI
def unbind(memory: "np.ndarray", key: "np.ndarray") -> "np.ndarray":
"""Circular correlation = element-wise phase subtraction.
Unbinding retrieves the value associated with a key from a memory vector.
unbind(bind(a, b), a) ≈ b (up to superposition noise)
"""
_require_numpy()
return (memory - key) % _TWO_PI
def bundle(*vectors: "np.ndarray") -> "np.ndarray":
"""Superposition via circular mean of complex exponentials.
Bundling merges multiple vectors into one that is similar to each input.
The result can hold O(sqrt(dim)) items before similarity degrades.
"""
_require_numpy()
complex_sum = np.sum([np.exp(1j * v) for v in vectors], axis=0)
return np.angle(complex_sum) % _TWO_PI
def similarity(a: "np.ndarray", b: "np.ndarray") -> float:
"""Phase cosine similarity. Range [-1, 1].
Returns 1.0 for identical vectors, near 0.0 for random (unrelated) vectors,
and -1.0 for perfectly anti-correlated vectors.
"""
_require_numpy()
return float(np.mean(np.cos(a - b)))
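# Round-trip sketch of the algebra (illustrative atom names, assumes numpy):
#   role, red = encode_atom("color"), encode_atom("red")
#   mem = bundle(bind(role, red), encode_atom("unrelated"))
#   similarity(unbind(mem, role), red)                  # well above chance
#   similarity(unbind(mem, role), encode_atom("blue"))  # near 0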
def encode_text(text: str, dim: int = 1024) -> "np.ndarray":
"""Bag-of-words: bundle of atom vectors for each token.
Tokenizes by lowercasing, splitting on whitespace, and stripping
leading/trailing punctuation from each token.
Returns bundle of all token atom vectors.
If text is empty or produces no tokens, returns encode_atom("__hrr_empty__", dim).
"""
_require_numpy()
tokens = [
token.strip(".,!?;:\"'()[]{}")
for token in text.lower().split()
]
tokens = [t for t in tokens if t]
if not tokens:
return encode_atom("__hrr_empty__", dim)
atom_vectors = [encode_atom(token, dim) for token in tokens]
return bundle(*atom_vectors)
def encode_fact(content: str, entities: list[str], dim: int = 1024) -> "np.ndarray":
"""Structured encoding: content bound to ROLE_CONTENT, each entity bound to ROLE_ENTITY, all bundled.
Role vectors are reserved atoms: "__hrr_role_content__", "__hrr_role_entity__"
Components:
1. bind(encode_text(content, dim), encode_atom("__hrr_role_content__", dim))
2. For each entity: bind(encode_atom(entity.lower(), dim), encode_atom("__hrr_role_entity__", dim))
3. bundle all components together
This enables algebraic extraction:
unbind(fact, bind(entity, ROLE_ENTITY)) ≈ content_vector
"""
_require_numpy()
role_content = encode_atom("__hrr_role_content__", dim)
role_entity = encode_atom("__hrr_role_entity__", dim)
components: list[np.ndarray] = [
bind(encode_text(content, dim), role_content)
]
for entity in entities:
components.append(bind(encode_atom(entity.lower(), dim), role_entity))
return bundle(*components)
def phases_to_bytes(phases: "np.ndarray") -> bytes:
"""Serialize phase vector to bytes. float64 tobytes — 8 KB at dim=1024."""
_require_numpy()
return phases.tobytes()
def bytes_to_phases(data: bytes) -> "np.ndarray":
"""Deserialize bytes back to phase vector. Inverse of phases_to_bytes.
The .copy() call is required because frombuffer returns a read-only view
backed by the bytes object; callers expect a mutable array.
"""
_require_numpy()
return np.frombuffer(data, dtype=np.float64).copy()
def snr_estimate(dim: int, n_items: int) -> float:
"""Signal-to-noise ratio estimate for holographic storage.
SNR = sqrt(dim / n_items) when n_items > 0, else inf.
The SNR falls below 2.0 when n_items > dim / 4, meaning retrieval
errors become likely. Logs a warning when this threshold is crossed.
"""
_require_numpy()
if n_items <= 0:
return float("inf")
snr = math.sqrt(dim / n_items)
if snr < 2.0:
logger.warning(
"HRR storage near capacity: SNR=%.2f (dim=%d, n_items=%d). "
"Retrieval accuracy may degrade. Consider increasing dim or reducing stored items.",
snr,
dim,
n_items,
)
return snr
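# Worked example at dim=1024: 64 items → SNR 4.0; 256 items → SNR 2.0
# (the warning threshold); 1024 items → SNR 1.0, retrieval is unreliable.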


@@ -0,0 +1,5 @@
name: holographic
version: 0.1.0
description: "Holographic memory — local SQLite fact store with FTS5 search, trust scoring, and HRR-based compositional retrieval."
hooks:
- on_session_end


@@ -0,0 +1,593 @@
"""Hybrid keyword/BM25 retrieval for the memory store.
Ported from KIK memory_agent.py — combines FTS5 full-text search with
Jaccard similarity reranking and trust-weighted scoring.
"""
from __future__ import annotations
import math
from datetime import datetime, timezone
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from .store import MemoryStore
try:
from . import holographic as hrr
except ImportError:
import holographic as hrr # type: ignore[no-redef]
class FactRetriever:
"""Multi-strategy fact retrieval with trust-weighted scoring."""
def __init__(
self,
store: MemoryStore,
temporal_decay_half_life: int = 0, # days, 0 = disabled
fts_weight: float = 0.4,
jaccard_weight: float = 0.3,
hrr_weight: float = 0.3,
hrr_dim: int = 1024,
):
self.store = store
self.half_life = temporal_decay_half_life
self.hrr_dim = hrr_dim
# Auto-redistribute weights if numpy unavailable
if hrr_weight > 0 and not hrr._HAS_NUMPY:
fts_weight = 0.6
jaccard_weight = 0.4
hrr_weight = 0.0
self.fts_weight = fts_weight
self.jaccard_weight = jaccard_weight
self.hrr_weight = hrr_weight
def search(
self,
query: str,
category: str | None = None,
min_trust: float = 0.3,
limit: int = 10,
) -> list[dict]:
"""Hybrid search: FTS5 candidates → Jaccard rerank → trust weighting.
Pipeline:
1. FTS5 search: Get limit*3 candidates from SQLite full-text search
2. Jaccard boost: Token overlap between query and fact content
3. Trust weighting: final_score = relevance * trust_score
4. Temporal decay (optional): decay = 0.5^(age_days / half_life)
Returns list of dicts with fact data + 'score' field, sorted by score desc.
"""
# Stage 1: Get FTS5 candidates (more than limit for reranking headroom)
candidates = self._fts_candidates(query, category, min_trust, limit * 3)
if not candidates:
return []
# Stage 2: Rerank with Jaccard + trust + optional decay
query_tokens = self._tokenize(query)
# Encode the query once up front; encode_text hashes every token, so
# re-encoding per candidate would repeat identical work for every fact.
# (hrr_weight > 0 implies numpy is available, per __init__ redistribution.)
query_vec = hrr.encode_text(query, self.hrr_dim) if self.hrr_weight > 0 else None
scored = []
for fact in candidates:
content_tokens = self._tokenize(fact["content"])
tag_tokens = self._tokenize(fact.get("tags", ""))
all_tokens = content_tokens | tag_tokens
jaccard = self._jaccard_similarity(query_tokens, all_tokens)
fts_score = fact.get("fts_rank", 0.0)
# HRR similarity (query_vec precomputed above)
if query_vec is not None and fact.get("hrr_vector"):
fact_vec = hrr.bytes_to_phases(fact["hrr_vector"])
hrr_sim = (hrr.similarity(query_vec, fact_vec) + 1.0) / 2.0  # shift to [0,1]
else:
hrr_sim = 0.5  # neutral
# Combine FTS5 + Jaccard + HRR
relevance = (self.fts_weight * fts_score
+ self.jaccard_weight * jaccard
+ self.hrr_weight * hrr_sim)
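# Worked example at default weights (0.4/0.3/0.3): fts=1.0, jaccard=0.5,
# hrr=0.6 → relevance = 0.40 + 0.15 + 0.18 = 0.73, then scaled by trust.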
# Trust weighting
score = relevance * fact["trust_score"]
# Optional temporal decay
if self.half_life > 0:
score *= self._temporal_decay(fact.get("updated_at") or fact.get("created_at"))
fact["score"] = score
scored.append(fact)
# Sort by score descending, return top limit
scored.sort(key=lambda x: x["score"], reverse=True)
results = scored[:limit]
# Strip raw HRR bytes — callers expect JSON-serializable dicts
for fact in results:
fact.pop("hrr_vector", None)
return results
def probe(
self,
entity: str,
category: str | None = None,
limit: int = 10,
) -> list[dict]:
"""Compositional entity query using HRR algebra.
Unbinds entity from memory bank to extract associated content.
This is NOT keyword search — it uses algebraic structure to find facts
where the entity plays a structural role.
Falls back to FTS5 search if numpy unavailable.
"""
if not hrr._HAS_NUMPY:
# Fallback to keyword search on entity name
return self.search(entity, category=category, limit=limit)
conn = self.store._conn
# Encode entity as role-bound vector
role_entity = hrr.encode_atom("__hrr_role_entity__", self.hrr_dim)
entity_vec = hrr.encode_atom(entity.lower(), self.hrr_dim)
probe_key = hrr.bind(entity_vec, role_entity)
# Try category-specific bank first, then all facts
if category:
bank_name = f"cat:{category}"
bank_row = conn.execute(
"SELECT vector FROM memory_banks WHERE bank_name = ?",
(bank_name,),
).fetchone()
if bank_row:
bank_vec = hrr.bytes_to_phases(bank_row["vector"])
extracted = hrr.unbind(bank_vec, probe_key)
# Use extracted signal to score individual facts
return self._score_facts_by_vector(
extracted, category=category, limit=limit
)
# Score against individual fact vectors directly
where = "WHERE hrr_vector IS NOT NULL"
params: list = []
if category:
where += " AND category = ?"
params.append(category)
rows = conn.execute(
f"""
SELECT fact_id, content, category, tags, trust_score,
retrieval_count, helpful_count, created_at, updated_at,
hrr_vector
FROM facts
{where}
""",
params,
).fetchall()
if not rows:
# Final fallback: keyword search
return self.search(entity, category=category, limit=limit)
scored = []
for row in rows:
fact = dict(row)
fact_vec = hrr.bytes_to_phases(fact.pop("hrr_vector"))
# Unbind probe key from fact to see if entity is structurally present
residual = hrr.unbind(fact_vec, probe_key)
# Compare residual against content signal
role_content = hrr.encode_atom("__hrr_role_content__", self.hrr_dim)
content_vec = hrr.bind(hrr.encode_text(fact["content"], self.hrr_dim), role_content)
sim = hrr.similarity(residual, content_vec)
fact["score"] = (sim + 1.0) / 2.0 * fact["trust_score"]
scored.append(fact)
scored.sort(key=lambda x: x["score"], reverse=True)
return scored[:limit]
def related(
self,
entity: str,
category: str | None = None,
limit: int = 10,
) -> list[dict]:
"""Discover facts that share structural connections with an entity.
Unlike probe (which finds facts *about* an entity), related finds
facts that are connected through shared context — e.g., other entities
mentioned alongside this one, or content that overlaps structurally.
Falls back to FTS5 search if numpy unavailable.
"""
if not hrr._HAS_NUMPY:
return self.search(entity, category=category, limit=limit)
conn = self.store._conn
# Encode entity as a bare atom (not role-bound — we want ANY structural match)
entity_vec = hrr.encode_atom(entity.lower(), self.hrr_dim)
# Get all facts with vectors
where = "WHERE hrr_vector IS NOT NULL"
params: list = []
if category:
where += " AND category = ?"
params.append(category)
rows = conn.execute(
f"""
SELECT fact_id, content, category, tags, trust_score,
retrieval_count, helpful_count, created_at, updated_at,
hrr_vector
FROM facts
{where}
""",
params,
).fetchall()
if not rows:
return self.search(entity, category=category, limit=limit)
# Score each fact by how much the entity's atom appears in its vector
# This catches both role-bound entity matches AND content word matches
scored = []
for row in rows:
fact = dict(row)
fact_vec = hrr.bytes_to_phases(fact.pop("hrr_vector"))
# Check structural similarity: unbind entity from fact
residual = hrr.unbind(fact_vec, entity_vec)
# A high-similarity residual to ANY known role vector means this entity
# plays a structural role in the fact
role_entity = hrr.encode_atom("__hrr_role_entity__", self.hrr_dim)
role_content = hrr.encode_atom("__hrr_role_content__", self.hrr_dim)
entity_role_sim = hrr.similarity(residual, role_entity)
content_role_sim = hrr.similarity(residual, role_content)
# Take the max — entity could appear in either role
best_sim = max(entity_role_sim, content_role_sim)
fact["score"] = (best_sim + 1.0) / 2.0 * fact["trust_score"]
scored.append(fact)
scored.sort(key=lambda x: x["score"], reverse=True)
return scored[:limit]
def reason(
self,
entities: list[str],
category: str | None = None,
limit: int = 10,
) -> list[dict]:
"""Multi-entity compositional query — vector-space JOIN.
Given multiple entities, algebraically intersects their structural
connections to find facts related to ALL of them simultaneously.
This is compositional reasoning that no embedding DB can do.
Example: reason(["peppi", "backend"]) finds facts where peppi AND
backend both play structural roles — without keyword matching.
Falls back to FTS5 search if numpy unavailable.
"""
if not hrr._HAS_NUMPY or not entities:
# Fallback: search with all entities as keywords
query = " ".join(entities)
return self.search(query, category=category, limit=limit)
conn = self.store._conn
role_entity = hrr.encode_atom("__hrr_role_entity__", self.hrr_dim)
# For each entity, compute what the bank "remembers" about it
# by unbinding entity+role from each fact vector
entity_residuals = []
for entity in entities:
entity_vec = hrr.encode_atom(entity.lower(), self.hrr_dim)
probe_key = hrr.bind(entity_vec, role_entity)
entity_residuals.append(probe_key)
# Get all facts with vectors
where = "WHERE hrr_vector IS NOT NULL"
params: list = []
if category:
where += " AND category = ?"
params.append(category)
rows = conn.execute(
f"""
SELECT fact_id, content, category, tags, trust_score,
retrieval_count, helpful_count, created_at, updated_at,
hrr_vector
FROM facts
{where}
""",
params,
).fetchall()
if not rows:
query = " ".join(entities)
return self.search(query, category=category, limit=limit)
# Score each fact by how much EACH entity is structurally present.
# A fact scores high only if ALL entities have structural presence
# (AND semantics via min, vs OR which would use mean/max).
role_content = hrr.encode_atom("__hrr_role_content__", self.hrr_dim)
scored = []
for row in rows:
fact = dict(row)
fact_vec = hrr.bytes_to_phases(fact.pop("hrr_vector"))
entity_scores = []
for probe_key in entity_residuals:
residual = hrr.unbind(fact_vec, probe_key)
sim = hrr.similarity(residual, role_content)
entity_scores.append(sim)
min_sim = min(entity_scores)
fact["score"] = (min_sim + 1.0) / 2.0 * fact["trust_score"]
scored.append(fact)
scored.sort(key=lambda x: x["score"], reverse=True)
return scored[:limit]
def contradict(
self,
category: str | None = None,
threshold: float = 0.3,
limit: int = 10,
) -> list[dict]:
"""Find potentially contradictory facts via entity overlap + content divergence.
Two facts contradict when they share entities (same subject) but have
low content-vector similarity (different claims). This is automated
memory hygiene — no other memory system does this.
Returns pairs of facts with a contradiction score.
Falls back to empty list if numpy unavailable.
"""
if not hrr._HAS_NUMPY:
return []
conn = self.store._conn
# Get all facts with vectors and their linked entities
where = "WHERE f.hrr_vector IS NOT NULL"
params: list = []
if category:
where += " AND f.category = ?"
params.append(category)
rows = conn.execute(
f"""
SELECT f.fact_id, f.content, f.category, f.tags, f.trust_score,
f.created_at, f.updated_at, f.hrr_vector
FROM facts f
{where}
""",
params,
).fetchall()
if len(rows) < 2:
return []
# Guard against O(n²) explosion on large fact stores.
# At 500 facts, that's ~125K comparisons — acceptable.
# Above that, only check the most recently updated facts.
_MAX_CONTRADICT_FACTS = 500
if len(rows) > _MAX_CONTRADICT_FACTS:
rows = sorted(rows, key=lambda r: r["updated_at"] or r["created_at"], reverse=True)
rows = rows[:_MAX_CONTRADICT_FACTS]
# Build entity sets per fact
fact_entities: dict[int, set[str]] = {}
for row in rows:
fid = row["fact_id"]
entity_rows = conn.execute(
"""
SELECT e.name FROM entities e
JOIN fact_entities fe ON fe.entity_id = e.entity_id
WHERE fe.fact_id = ?
""",
(fid,),
).fetchall()
fact_entities[fid] = {r["name"].lower() for r in entity_rows}
# Compare all pairs: high entity overlap + low content similarity = contradiction
facts = [dict(r) for r in rows]
contradictions = []
for i in range(len(facts)):
for j in range(i + 1, len(facts)):
f1, f2 = facts[i], facts[j]
ents1 = fact_entities.get(f1["fact_id"], set())
ents2 = fact_entities.get(f2["fact_id"], set())
if not ents1 or not ents2:
continue
# Entity overlap (Jaccard)
entity_overlap = len(ents1 & ents2) / len(ents1 | ents2) if (ents1 | ents2) else 0.0
if entity_overlap < 0.3:
continue # Not enough entity overlap to be contradictory
# Content similarity via HRR vectors
v1 = hrr.bytes_to_phases(f1["hrr_vector"])
v2 = hrr.bytes_to_phases(f2["hrr_vector"])
content_sim = hrr.similarity(v1, v2)
# High entity overlap + low content similarity = potential contradiction
# contradiction_score: higher = more contradictory
contradiction_score = entity_overlap * (1.0 - (content_sim + 1.0) / 2.0)
if contradiction_score >= threshold:
# Strip hrr_vector from output (not JSON serializable)
f1_clean = {k: v for k, v in f1.items() if k != "hrr_vector"}
f2_clean = {k: v for k, v in f2.items() if k != "hrr_vector"}
contradictions.append({
"fact_a": f1_clean,
"fact_b": f2_clean,
"entity_overlap": round(entity_overlap, 3),
"content_similarity": round(content_sim, 3),
"contradiction_score": round(contradiction_score, 3),
"shared_entities": sorted(ents1 & ents2),
})
contradictions.sort(key=lambda x: x["contradiction_score"], reverse=True)
return contradictions[:limit]
def _score_facts_by_vector(
self,
target_vec: "np.ndarray",
category: str | None = None,
limit: int = 10,
) -> list[dict]:
"""Score facts by similarity to a target vector."""
conn = self.store._conn
where = "WHERE hrr_vector IS NOT NULL"
params: list = []
if category:
where += " AND category = ?"
params.append(category)
rows = conn.execute(
f"""
SELECT fact_id, content, category, tags, trust_score,
retrieval_count, helpful_count, created_at, updated_at,
hrr_vector
FROM facts
{where}
""",
params,
).fetchall()
scored = []
for row in rows:
fact = dict(row)
fact_vec = hrr.bytes_to_phases(fact.pop("hrr_vector"))
sim = hrr.similarity(target_vec, fact_vec)
fact["score"] = (sim + 1.0) / 2.0 * fact["trust_score"]
scored.append(fact)
scored.sort(key=lambda x: x["score"], reverse=True)
return scored[:limit]
def _fts_candidates(
self,
query: str,
category: str | None,
min_trust: float,
limit: int,
) -> list[dict]:
"""Get raw FTS5 candidates from the store.
Uses the store's database connection directly for FTS5 MATCH
with rank scoring. Normalizes FTS5 rank to [0, 1] range.
"""
conn = self.store._conn
# Build query - FTS5 rank is negative (lower = better match)
# We need to join facts_fts with facts to get all columns
params: list = []
where_clauses = ["facts_fts MATCH ?"]
params.append(query)
if category:
where_clauses.append("f.category = ?")
params.append(category)
where_clauses.append("f.trust_score >= ?")
params.append(min_trust)
where_sql = " AND ".join(where_clauses)
sql = f"""
SELECT f.*, facts_fts.rank as fts_rank_raw
FROM facts_fts
JOIN facts f ON f.fact_id = facts_fts.rowid
WHERE {where_sql}
ORDER BY facts_fts.rank
LIMIT ?
"""
params.append(limit)
try:
rows = conn.execute(sql, params).fetchall()
except Exception:
# FTS5 MATCH can fail on malformed queries — fall back to empty
return []
if not rows:
return []
# Normalize FTS5 rank: rank is negative, lower = better
# Convert to positive score in [0, 1] range
raw_ranks = [abs(row["fts_rank_raw"]) for row in rows]
max_rank = max(raw_ranks) if raw_ranks else 1.0
max_rank = max(max_rank, 1e-6) # avoid div by zero
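# e.g. raw ranks [-3.2, -1.6, -0.8] → abs [3.2, 1.6, 0.8] → [1.0, 0.5, 0.25]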
results = []
for row, raw_rank in zip(rows, raw_ranks):
fact = dict(row)
fact.pop("fts_rank_raw", None)
fact["fts_rank"] = raw_rank / max_rank # normalize to [0, 1]
results.append(fact)
return results
@staticmethod
def _tokenize(text: str) -> set[str]:
"""Simple whitespace tokenization with lowercasing.
Strips common punctuation. No stemming/lemmatization (Phase 1).
"""
if not text:
return set()
# Split on whitespace, lowercase, strip punctuation
tokens = set()
for word in text.lower().split():
cleaned = word.strip(".,;:!?\"'()[]{}#@<>")
if cleaned:
tokens.add(cleaned)
return tokens
@staticmethod
def _jaccard_similarity(set_a: set, set_b: set) -> float:
"""Jaccard similarity coefficient: |A ∩ B| / |A B|."""
if not set_a or not set_b:
return 0.0
intersection = len(set_a & set_b)
union = len(set_a | set_b)
return intersection / union if union > 0 else 0.0
def _temporal_decay(self, timestamp_str: str | None) -> float:
"""Exponential decay: 0.5^(age_days / half_life_days).
Returns 1.0 if decay is disabled or timestamp is missing.
"""
if not self.half_life or not timestamp_str:
return 1.0
try:
if isinstance(timestamp_str, str):
# Parse ISO format timestamp from SQLite
ts = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
else:
ts = timestamp_str
if ts.tzinfo is None:
ts = ts.replace(tzinfo=timezone.utc)
age_days = (datetime.now(timezone.utc) - ts).total_seconds() / 86400
if age_days < 0:
return 1.0
return math.pow(0.5, age_days / self.half_life)
except (ValueError, TypeError):
return 1.0


@@ -0,0 +1,575 @@
"""
SQLite-backed fact store with entity resolution and trust scoring.
Single-user Hermes memory store plugin.
"""
import re
import sqlite3
import threading
from datetime import datetime
from pathlib import Path
try:
from . import holographic as hrr
except ImportError:
import holographic as hrr # type: ignore[no-redef]
_SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
fact_id INTEGER PRIMARY KEY AUTOINCREMENT,
content TEXT NOT NULL UNIQUE,
category TEXT DEFAULT 'general',
tags TEXT DEFAULT '',
trust_score REAL DEFAULT 0.5,
retrieval_count INTEGER DEFAULT 0,
helpful_count INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
hrr_vector BLOB
);
CREATE TABLE IF NOT EXISTS entities (
entity_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
entity_type TEXT DEFAULT 'unknown',
aliases TEXT DEFAULT '',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS fact_entities (
fact_id INTEGER REFERENCES facts(fact_id),
entity_id INTEGER REFERENCES entities(entity_id),
PRIMARY KEY (fact_id, entity_id)
);
CREATE INDEX IF NOT EXISTS idx_facts_trust ON facts(trust_score DESC);
CREATE INDEX IF NOT EXISTS idx_facts_category ON facts(category);
CREATE INDEX IF NOT EXISTS idx_entities_name ON entities(name);
CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
USING fts5(content, tags, content=facts, content_rowid=fact_id);
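-- facts_fts is an external-content index (content=facts): it stores no
-- row data of its own, so the triggers below must mirror every insert,
-- delete, and update using FTS5's special 'delete' command.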
CREATE TRIGGER IF NOT EXISTS facts_ai AFTER INSERT ON facts BEGIN
INSERT INTO facts_fts(rowid, content, tags)
VALUES (new.fact_id, new.content, new.tags);
END;
CREATE TRIGGER IF NOT EXISTS facts_ad AFTER DELETE ON facts BEGIN
INSERT INTO facts_fts(facts_fts, rowid, content, tags)
VALUES ('delete', old.fact_id, old.content, old.tags);
END;
CREATE TRIGGER IF NOT EXISTS facts_au AFTER UPDATE ON facts BEGIN
INSERT INTO facts_fts(facts_fts, rowid, content, tags)
VALUES ('delete', old.fact_id, old.content, old.tags);
INSERT INTO facts_fts(rowid, content, tags)
VALUES (new.fact_id, new.content, new.tags);
END;
CREATE TABLE IF NOT EXISTS memory_banks (
bank_id INTEGER PRIMARY KEY AUTOINCREMENT,
bank_name TEXT NOT NULL UNIQUE,
vector BLOB NOT NULL,
dim INTEGER NOT NULL,
fact_count INTEGER DEFAULT 0,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""
# Trust adjustment constants
_HELPFUL_DELTA = 0.05
_UNHELPFUL_DELTA = -0.10
_TRUST_MIN = 0.0
_TRUST_MAX = 1.0
# Entity extraction patterns
_RE_CAPITALIZED = re.compile(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b')
_RE_DOUBLE_QUOTE = re.compile(r'"([^"]+)"')
_RE_SINGLE_QUOTE = re.compile(r"'([^']+)'")
_RE_AKA = re.compile(
r'(\w+(?:\s+\w+)*)\s+(?:aka|also known as)\s+(\w+(?:\s+\w+)*)',
re.IGNORECASE,
)
def _clamp_trust(value: float) -> float:
return max(_TRUST_MIN, min(_TRUST_MAX, value))
class MemoryStore:
"""SQLite-backed fact store with entity resolution and trust scoring."""
def __init__(
self,
db_path: "str | Path | None" = None,
default_trust: float = 0.5,
hrr_dim: int = 1024,
) -> None:
if db_path is None:
from hermes_constants import get_hermes_home
db_path = str(get_hermes_home() / "memory_store.db")
self.db_path = Path(db_path).expanduser()
self.db_path.parent.mkdir(parents=True, exist_ok=True)
self.default_trust = _clamp_trust(default_trust)
self.hrr_dim = hrr_dim
self._hrr_available = hrr._HAS_NUMPY
self._conn: sqlite3.Connection = sqlite3.connect(
str(self.db_path),
check_same_thread=False,
timeout=10.0,
)
self._lock = threading.RLock()
self._conn.row_factory = sqlite3.Row
self._init_db()
# ------------------------------------------------------------------
# Initialisation
# ------------------------------------------------------------------
def _init_db(self) -> None:
"""Create tables, indexes, and triggers if they do not exist. Enable WAL mode."""
self._conn.execute("PRAGMA journal_mode=WAL")
self._conn.executescript(_SCHEMA)
# Migrate: add hrr_vector column if missing (safe for existing databases)
columns = {row[1] for row in self._conn.execute("PRAGMA table_info(facts)").fetchall()}
if "hrr_vector" not in columns:
self._conn.execute("ALTER TABLE facts ADD COLUMN hrr_vector BLOB")
self._conn.commit()
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def add_fact(
self,
content: str,
category: str = "general",
tags: str = "",
) -> int:
"""Insert a fact and return its fact_id.
Deduplicates by content (UNIQUE constraint). On duplicate, returns
the existing fact_id without modifying the row. Extracts entities from
the content and links them to the fact.
"""
with self._lock:
content = content.strip()
if not content:
raise ValueError("content must not be empty")
try:
cur = self._conn.execute(
"""
INSERT INTO facts (content, category, tags, trust_score)
VALUES (?, ?, ?, ?)
""",
(content, category, tags, self.default_trust),
)
self._conn.commit()
fact_id: int = cur.lastrowid # type: ignore[assignment]
except sqlite3.IntegrityError:
# Duplicate content — return existing id
row = self._conn.execute(
"SELECT fact_id FROM facts WHERE content = ?", (content,)
).fetchone()
return int(row["fact_id"])
# Entity extraction and linking
for name in self._extract_entities(content):
entity_id = self._resolve_entity(name)
self._link_fact_entity(fact_id, entity_id)
# Compute HRR vector after entity linking
self._compute_hrr_vector(fact_id, content)
self._rebuild_bank(category)
return fact_id
def search_facts(
self,
query: str,
category: str | None = None,
min_trust: float = 0.3,
limit: int = 10,
) -> list[dict]:
"""Full-text search over facts using FTS5.
Returns a list of fact dicts ordered by FTS5 rank, then trust_score
descending. Also increments retrieval_count for matched facts.
"""
with self._lock:
query = query.strip()
if not query:
return []
params: list = [query, min_trust]
category_clause = ""
if category is not None:
category_clause = "AND f.category = ?"
params.append(category)
params.append(limit)
sql = f"""
SELECT f.fact_id, f.content, f.category, f.tags,
f.trust_score, f.retrieval_count, f.helpful_count,
f.created_at, f.updated_at
FROM facts f
JOIN facts_fts fts ON fts.rowid = f.fact_id
WHERE facts_fts MATCH ?
AND f.trust_score >= ?
{category_clause}
ORDER BY fts.rank, f.trust_score DESC
LIMIT ?
"""
rows = self._conn.execute(sql, params).fetchall()
results = [self._row_to_dict(r) for r in rows]
if results:
ids = [r["fact_id"] for r in results]
placeholders = ",".join("?" * len(ids))
self._conn.execute(
f"UPDATE facts SET retrieval_count = retrieval_count + 1 WHERE fact_id IN ({placeholders})",
ids,
)
self._conn.commit()
return results
def update_fact(
self,
fact_id: int,
content: str | None = None,
trust_delta: float | None = None,
tags: str | None = None,
category: str | None = None,
) -> bool:
"""Partially update a fact. Trust is clamped to [0, 1].
Returns True if the row existed, False otherwise.
"""
with self._lock:
row = self._conn.execute(
"SELECT fact_id, trust_score FROM facts WHERE fact_id = ?", (fact_id,)
).fetchone()
if row is None:
return False
assignments: list[str] = ["updated_at = CURRENT_TIMESTAMP"]
params: list = []
if content is not None:
assignments.append("content = ?")
params.append(content.strip())
if tags is not None:
assignments.append("tags = ?")
params.append(tags)
if category is not None:
assignments.append("category = ?")
params.append(category)
if trust_delta is not None:
new_trust = _clamp_trust(row["trust_score"] + trust_delta)
assignments.append("trust_score = ?")
params.append(new_trust)
params.append(fact_id)
self._conn.execute(
f"UPDATE facts SET {', '.join(assignments)} WHERE fact_id = ?",
params,
)
self._conn.commit()
# If content changed, re-extract entities
if content is not None:
self._conn.execute(
"DELETE FROM fact_entities WHERE fact_id = ?", (fact_id,)
)
for name in self._extract_entities(content):
entity_id = self._resolve_entity(name)
self._link_fact_entity(fact_id, entity_id)
self._conn.commit()
# Recompute HRR vector if content changed
if content is not None:
self._compute_hrr_vector(fact_id, content)
# Rebuild bank for relevant category
cat = category or self._conn.execute(
"SELECT category FROM facts WHERE fact_id = ?", (fact_id,)
).fetchone()["category"]
self._rebuild_bank(cat)
return True
def remove_fact(self, fact_id: int) -> bool:
"""Delete a fact and its entity links. Returns True if the row existed."""
with self._lock:
row = self._conn.execute(
"SELECT fact_id, category FROM facts WHERE fact_id = ?", (fact_id,)
).fetchone()
if row is None:
return False
self._conn.execute(
"DELETE FROM fact_entities WHERE fact_id = ?", (fact_id,)
)
self._conn.execute("DELETE FROM facts WHERE fact_id = ?", (fact_id,))
self._conn.commit()
self._rebuild_bank(row["category"])
return True
def list_facts(
self,
category: str | None = None,
min_trust: float = 0.0,
limit: int = 50,
) -> list[dict]:
"""Browse facts ordered by trust_score descending.
Optionally filter by category and minimum trust score.
"""
with self._lock:
params: list = [min_trust]
category_clause = ""
if category is not None:
category_clause = "AND category = ?"
params.append(category)
params.append(limit)
sql = f"""
SELECT fact_id, content, category, tags, trust_score,
retrieval_count, helpful_count, created_at, updated_at
FROM facts
WHERE trust_score >= ?
{category_clause}
ORDER BY trust_score DESC
LIMIT ?
"""
rows = self._conn.execute(sql, params).fetchall()
return [self._row_to_dict(r) for r in rows]
def record_feedback(self, fact_id: int, helpful: bool) -> dict:
"""Record user feedback and adjust trust asymmetrically.
helpful=True -> trust += 0.05, helpful_count += 1
helpful=False -> trust -= 0.10
Returns a dict with fact_id, old_trust, new_trust, helpful_count.
Raises KeyError if fact_id does not exist.
"""
with self._lock:
row = self._conn.execute(
"SELECT fact_id, trust_score, helpful_count FROM facts WHERE fact_id = ?",
(fact_id,),
).fetchone()
if row is None:
raise KeyError(f"fact_id {fact_id} not found")
old_trust: float = row["trust_score"]
delta = _HELPFUL_DELTA if helpful else _UNHELPFUL_DELTA
new_trust = _clamp_trust(old_trust + delta)
helpful_increment = 1 if helpful else 0
self._conn.execute(
"""
UPDATE facts
SET trust_score = ?,
helpful_count = helpful_count + ?,
updated_at = CURRENT_TIMESTAMP
WHERE fact_id = ?
""",
(new_trust, helpful_increment, fact_id),
)
self._conn.commit()
return {
"fact_id": fact_id,
"old_trust": old_trust,
"new_trust": new_trust,
"helpful_count": row["helpful_count"] + helpful_increment,
}
# ------------------------------------------------------------------
# Entity helpers
# ------------------------------------------------------------------
def _extract_entities(self, text: str) -> list[str]:
"""Extract entity candidates from text using simple regex rules.
Rules applied (in order):
1. Capitalized multi-word phrases e.g. "John Doe"
2. Double-quoted terms e.g. "Python"
3. Single-quoted terms e.g. 'pytest'
4. AKA patterns e.g. "Guido aka BDFL" -> two entities
Returns a deduplicated list preserving first-seen order.
"""
seen: set[str] = set()
candidates: list[str] = []
def _add(name: str) -> None:
stripped = name.strip()
if stripped and stripped.lower() not in seen:
seen.add(stripped.lower())
candidates.append(stripped)
for m in _RE_CAPITALIZED.finditer(text):
_add(m.group(1))
for m in _RE_DOUBLE_QUOTE.finditer(text):
_add(m.group(1))
for m in _RE_SINGLE_QUOTE.finditer(text):
_add(m.group(1))
for m in _RE_AKA.finditer(text):
_add(m.group(1))
_add(m.group(2))
return candidates
def _resolve_entity(self, name: str) -> int:
"""Find an existing entity by name or alias (case-insensitive) or create one.
Returns the entity_id.
"""
# Exact name match
row = self._conn.execute(
"SELECT entity_id FROM entities WHERE name LIKE ?", (name,)
).fetchone()
if row is not None:
return int(row["entity_id"])
# Search aliases — aliases stored as comma-separated; use LIKE with % boundaries
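# e.g. aliases "BDFL,GvR" become ",BDFL,GvR,", so '%,GvR,%' matches only
# whole comma-delimited tokens, never substrings of a longer alias.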
alias_row = self._conn.execute(
"""
SELECT entity_id FROM entities
WHERE ',' || aliases || ',' LIKE '%,' || ? || ',%'
""",
(name,),
).fetchone()
if alias_row is not None:
return int(alias_row["entity_id"])
# Create new entity
cur = self._conn.execute(
"INSERT INTO entities (name) VALUES (?)", (name,)
)
self._conn.commit()
return int(cur.lastrowid) # type: ignore[return-value]
def _link_fact_entity(self, fact_id: int, entity_id: int) -> None:
"""Insert into fact_entities, silently ignore if the link already exists."""
self._conn.execute(
"""
INSERT OR IGNORE INTO fact_entities (fact_id, entity_id)
VALUES (?, ?)
""",
(fact_id, entity_id),
)
self._conn.commit()
def _compute_hrr_vector(self, fact_id: int, content: str) -> None:
"""Compute and store HRR vector for a fact. No-op if numpy unavailable."""
with self._lock:
if not self._hrr_available:
return
# Get entities linked to this fact
rows = self._conn.execute(
"""
SELECT e.name FROM entities e
JOIN fact_entities fe ON fe.entity_id = e.entity_id
WHERE fe.fact_id = ?
""",
(fact_id,),
).fetchall()
entities = [row["name"] for row in rows]
vector = hrr.encode_fact(content, entities, self.hrr_dim)
self._conn.execute(
"UPDATE facts SET hrr_vector = ? WHERE fact_id = ?",
(hrr.phases_to_bytes(vector), fact_id),
)
self._conn.commit()
def _rebuild_bank(self, category: str) -> None:
"""Full rebuild of a category's memory bank from all its fact vectors."""
with self._lock:
if not self._hrr_available:
return
bank_name = f"cat:{category}"
rows = self._conn.execute(
"SELECT hrr_vector FROM facts WHERE category = ? AND hrr_vector IS NOT NULL",
(category,),
).fetchall()
if not rows:
self._conn.execute("DELETE FROM memory_banks WHERE bank_name = ?", (bank_name,))
self._conn.commit()
return
vectors = [hrr.bytes_to_phases(row["hrr_vector"]) for row in rows]
bank_vector = hrr.bundle(*vectors)
fact_count = len(vectors)
# Check SNR
hrr.snr_estimate(self.hrr_dim, fact_count)
self._conn.execute(
"""
INSERT INTO memory_banks (bank_name, vector, dim, fact_count, updated_at)
VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(bank_name) DO UPDATE SET
vector = excluded.vector,
dim = excluded.dim,
fact_count = excluded.fact_count,
updated_at = excluded.updated_at
""",
(bank_name, hrr.phases_to_bytes(bank_vector), self.hrr_dim, fact_count),
)
self._conn.commit()
def rebuild_all_vectors(self, dim: int | None = None) -> int:
"""Recompute all HRR vectors + banks from text. For recovery/migration.
Returns the number of facts processed.
"""
with self._lock:
if not self._hrr_available:
return 0
if dim is not None:
self.hrr_dim = dim
rows = self._conn.execute(
"SELECT fact_id, content, category FROM facts"
).fetchall()
categories: set[str] = set()
for row in rows:
self._compute_hrr_vector(row["fact_id"], row["content"])
categories.add(row["category"])
for category in categories:
self._rebuild_bank(category)
return len(rows)
# ------------------------------------------------------------------
# Utilities
# ------------------------------------------------------------------
def _row_to_dict(self, row: sqlite3.Row) -> dict:
"""Convert a sqlite3.Row to a plain dict."""
return dict(row)
def close(self) -> None:
"""Close the database connection."""
self._conn.close()
def __enter__(self) -> "MemoryStore":
return self
def __exit__(self, *_: object) -> None:
self.close()


@@ -0,0 +1,35 @@
# Honcho Memory Provider
AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
## Requirements
- `pip install honcho-ai`
- Honcho API key from [app.honcho.dev](https://app.honcho.dev)
## Setup
```bash
hermes memory setup # select "honcho"
```
Or manually:
```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
```
## Config
Config file: `$HERMES_HOME/honcho.json` (or `~/.honcho/config.json` legacy)
Existing Honcho users: your config and data are preserved. Just set `memory.provider: honcho`.
## Tools
| Tool | Description |
|------|-------------|
| `honcho_profile` | User's peer card — key facts, no LLM |
| `honcho_search` | Semantic search over stored context |
| `honcho_context` | LLM-synthesized answer from memory |
| `honcho_conclude` | Write a fact about the user to memory |


@@ -0,0 +1,355 @@
"""Honcho memory plugin — MemoryProvider for Honcho AI-native memory.
Provides cross-session user modeling with dialectic Q&A, semantic search,
peer cards, and persistent conclusions via the Honcho SDK.
The 4 tools (profile, search, context, conclude) are exposed through
the MemoryProvider interface.
Config: Uses the existing Honcho config chain:
1. $HERMES_HOME/honcho.json (profile-scoped)
2. ~/.honcho/config.json (legacy global)
3. Environment variables
"""
from __future__ import annotations
import json
import logging
import threading
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tool schemas (moved from tools/honcho_tools.py)
# ---------------------------------------------------------------------------
PROFILE_SCHEMA = {
"name": "honcho_profile",
"description": (
"Retrieve the user's peer card from Honcho — a curated list of key facts "
"about them (name, role, preferences, communication style, patterns). "
"Fast, no LLM reasoning, minimal cost. "
"Use this at conversation start or when you need a quick factual snapshot."
),
"parameters": {"type": "object", "properties": {}, "required": []},
}
SEARCH_SCHEMA = {
"name": "honcho_search",
"description": (
"Semantic search over Honcho's stored context about the user. "
"Returns raw excerpts ranked by relevance — no LLM synthesis. "
"Cheaper and faster than honcho_context. "
"Good when you want to find specific past facts and reason over them yourself."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "What to search for in Honcho's memory.",
},
"max_tokens": {
"type": "integer",
"description": "Token budget for returned context (default 800, max 2000).",
},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "honcho_context",
"description": (
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer: the user (default) or the AI assistant."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "A natural language question.",
},
"peer": {
"type": "string",
"description": "Which peer to query about: 'user' (default) or 'ai'.",
},
},
"required": ["query"],
},
}
CONCLUDE_SCHEMA = {
"name": "honcho_conclude",
"description": (
"Write a conclusion about the user back to Honcho's memory. "
"Conclusions are persistent facts that build the user's profile. "
"Use when the user states a preference, corrects you, or shares "
"something to remember across sessions."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {
"type": "string",
"description": "A factual statement about the user to persist.",
}
},
"required": ["conclusion"],
},
}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class HonchoMemoryProvider(MemoryProvider):
"""Honcho AI-native memory with dialectic Q&A and persistent user modeling."""
def __init__(self):
self._manager = None # HonchoSessionManager
self._config = None # HonchoClientConfig
self._session_key = ""
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread: Optional[threading.Thread] = None
self._sync_thread: Optional[threading.Thread] = None
@property
def name(self) -> str:
return "honcho"
def is_available(self) -> bool:
"""Check if Honcho is configured. No network calls."""
try:
from plugins.memory.honcho.client import HonchoClientConfig
cfg = HonchoClientConfig.from_global_config()
return cfg.enabled and bool(cfg.api_key or cfg.base_url)
except Exception:
return False
def save_config(self, values, hermes_home):
"""Write config to $HERMES_HOME/honcho.json (Honcho SDK native format)."""
import json
from pathlib import Path
config_path = Path(hermes_home) / "honcho.json"
existing = {}
if config_path.exists():
try:
existing = json.loads(config_path.read_text())
except Exception:
pass
existing.update(values)
config_path.write_text(json.dumps(existing, indent=2))
def get_config_schema(self):
return [
{"key": "api_key", "description": "Honcho API key", "secret": True, "env_var": "HONCHO_API_KEY", "url": "https://app.honcho.dev"},
{"key": "base_url", "description": "Honcho base URL", "default": "https://api.honcho.dev"},
]
def initialize(self, session_id: str, **kwargs) -> None:
"""Initialize Honcho session manager."""
try:
from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client
from plugins.memory.honcho.session import HonchoSessionManager
cfg = HonchoClientConfig.from_global_config()
if not cfg.enabled or not (cfg.api_key or cfg.base_url):
logger.debug("Honcho not configured — plugin inactive")
return
self._config = cfg
client = get_honcho_client(cfg)
self._manager = HonchoSessionManager(
honcho=client,
config=cfg,
context_tokens=cfg.context_tokens,
)
# Build session key from kwargs or session_id
platform = kwargs.get("platform", "cli")
user_id = kwargs.get("user_id", "")
if user_id:
self._session_key = f"{platform}:{user_id}"
else:
self._session_key = session_id
except ImportError:
logger.debug("honcho-ai package not installed — plugin inactive")
except Exception as e:
logger.warning("Honcho init failed: %s", e)
self._manager = None
def system_prompt_block(self) -> str:
if not self._manager or not self._session_key:
return ""
return (
"# Honcho Memory\n"
"Active. AI-native cross-session user modeling.\n"
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_conclude to save facts about the user."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return prefetched dialectic context from background thread."""
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## Honcho Context\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire a background dialectic query for the upcoming turn."""
if not self._manager or not self._session_key or not query:
return
def _run():
try:
result = self._manager.dialectic_query(
self._session_key, query, peer="user"
)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
except Exception as e:
logger.debug("Honcho prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(
target=_run, daemon=True, name="honcho-prefetch"
)
self._prefetch_thread.start()
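# Intended call pattern (an assumption from this file, not a contract):
# the agent loop fires queue_prefetch() as soon as the user message
# arrives, then calls prefetch() when building the prompt, so the
# dialectic query runs concurrently with other per-turn work.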
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Record the conversation turn in Honcho (non-blocking)."""
if not self._manager or not self._session_key:
return
def _sync():
try:
session = self._manager.get_or_create_session(self._session_key)
session.add_message("user", user_content[:4000])
session.add_message("assistant", assistant_content[:4000])
# Flush to Honcho API
self._manager._flush_session(session)
except Exception as e:
logger.debug("Honcho sync_turn failed: %s", e)
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(
target=_sync, daemon=True, name="honcho-sync"
)
self._sync_thread.start()
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in user profile writes as Honcho conclusions."""
if action != "add" or target != "user" or not content:
return
if not self._manager or not self._session_key:
return
def _write():
try:
self._manager.create_conclusion(self._session_key, content)
except Exception as e:
logger.debug("Honcho memory mirror failed: %s", e)
t = threading.Thread(target=_write, daemon=True, name="honcho-memwrite")
t.start()
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Flush all pending messages to Honcho on session end."""
if not self._manager:
return
# Wait for pending sync
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=10.0)
try:
self._manager.flush_all()
except Exception as e:
logger.debug("Honcho session-end flush failed: %s", e)
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
if not self._manager or not self._session_key:
return json.dumps({"error": "Honcho is not active for this session."})
try:
if tool_name == "honcho_profile":
card = self._manager.get_peer_card(self._session_key)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps({"result": card})
elif tool_name == "honcho_search":
query = args.get("query", "")
if not query:
return json.dumps({"error": "Missing required parameter: query"})
max_tokens = min(int(args.get("max_tokens", 800)), 2000)
result = self._manager.search_context(
self._session_key, query, max_tokens=max_tokens
)
if not result:
return json.dumps({"result": "No relevant context found."})
return json.dumps({"result": result})
elif tool_name == "honcho_context":
query = args.get("query", "")
if not query:
return json.dumps({"error": "Missing required parameter: query"})
peer = args.get("peer", "user")
result = self._manager.dialectic_query(
self._session_key, query, peer=peer
)
return json.dumps({"result": result or "No result from Honcho."})
elif tool_name == "honcho_conclude":
conclusion = args.get("conclusion", "")
if not conclusion:
return json.dumps({"error": "Missing required parameter: conclusion"})
ok = self._manager.create_conclusion(self._session_key, conclusion)
if ok:
return json.dumps({"result": f"Conclusion saved: {conclusion}"})
return json.dumps({"error": "Failed to save conclusion."})
return json.dumps({"error": f"Unknown tool: {tool_name}"})
except Exception as e:
logger.error("Honcho tool %s failed: %s", tool_name, e)
return json.dumps({"error": f"Honcho {tool_name} failed: {e}"})
def shutdown(self) -> None:
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
# Flush any remaining messages
if self._manager:
try:
self._manager.flush_all()
except Exception:
pass
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Register Honcho as a memory provider plugin."""
ctx.register_memory_provider(HonchoMemoryProvider())

View File

@@ -11,9 +11,228 @@ import sys
from pathlib import Path
from hermes_constants import get_hermes_home
from honcho_integration.client import resolve_config_path, GLOBAL_CONFIG_PATH
from plugins.memory.honcho.client import resolve_active_host, resolve_config_path, GLOBAL_CONFIG_PATH, HOST
HOST = "hermes"
def clone_honcho_for_profile(profile_name: str) -> bool:
"""Auto-clone Honcho config for a new profile from the default host block.
Called during profile creation. If Honcho is configured on the default
host, creates a new host block for the profile with inherited settings
and auto-derived workspace/aiPeer.
Returns True if a host block was created, False if Honcho isn't configured.
"""
cfg = _read_config()
if not cfg:
return False
hosts = cfg.get("hosts", {})
default_block = hosts.get(HOST, {})
# No default host block and no root-level API key = Honcho not configured
has_key = bool(cfg.get("apiKey") or os.environ.get("HONCHO_API_KEY"))
if not default_block and not has_key:
return False
new_host = f"{HOST}.{profile_name}"
if new_host in hosts:
return False # already exists
# Clone settings from default block, override identity fields
new_block = {}
for key in ("memoryMode", "recallMode", "writeFrequency", "sessionStrategy",
"sessionPeerPrefix", "contextTokens", "dialecticReasoningLevel",
"dialecticMaxChars", "saveMessages"):
val = default_block.get(key)
if val is not None:
new_block[key] = val
# Inherit peer name from default
peer_name = default_block.get("peerName") or cfg.get("peerName")
if peer_name:
new_block["peerName"] = peer_name
# AI peer is profile-specific; workspace is shared so all profiles
# see the same user context, sessions, and project history.
# Use the bare profile name as the peer identity (not the host key)
# because Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$ (no dots).
new_block["aiPeer"] = profile_name
new_block["workspace"] = default_block.get("workspace") or cfg.get("workspace") or HOST
new_block["enabled"] = default_block.get("enabled", True)
cfg.setdefault("hosts", {})[new_host] = new_block
_write_config(cfg)
# Eagerly create the peer in Honcho so it exists before first message
_ensure_peer_exists(new_host)
return True
def _ensure_peer_exists(host_key: str | None = None) -> bool:
"""Create the AI peer in Honcho if it doesn't already exist.
Idempotent -- safe to call multiple times. Returns True if the peer
was created or already exists, False on failure.
"""
try:
from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client
hcfg = HonchoClientConfig.from_global_config(host=host_key)
if not hcfg.enabled or not (hcfg.api_key or hcfg.base_url):
return False
client = get_honcho_client(hcfg)
# peer() is idempotent -- creates if missing, returns if exists
client.peer(hcfg.ai_peer)
if hcfg.peer_name:
client.peer(hcfg.peer_name)
return True
except Exception:
return False
def cmd_enable(args) -> None:
"""Enable Honcho for the active profile."""
cfg = _read_config()
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
block = cfg.setdefault("hosts", {}).setdefault(host, {})
if block.get("enabled") is True:
print(f" {label}Honcho is already enabled.\n")
return
block["enabled"] = True
# If this is a new profile host block with no settings, clone from default
if not block.get("aiPeer"):
default_block = cfg.get("hosts", {}).get(HOST, {})
for key in ("memoryMode", "recallMode", "writeFrequency", "sessionStrategy",
"contextTokens", "dialecticReasoningLevel", "dialecticMaxChars"):
val = default_block.get(key)
if val is not None and key not in block:
block[key] = val
peer_name = default_block.get("peerName") or cfg.get("peerName")
if peer_name and "peerName" not in block:
block["peerName"] = peer_name
# Use bare profile name as AI peer, not the host key
ai_peer = host.split(".", 1)[1] if "." in host else host
block.setdefault("aiPeer", ai_peer)
block.setdefault("workspace", default_block.get("workspace") or cfg.get("workspace") or HOST)
_write_config(cfg)
print(f" {label}Honcho enabled.")
# Create peer eagerly
if _ensure_peer_exists(host):
print(f" {label}Peer '{block.get('aiPeer', host)}' ready.")
else:
print(f" {label}Peer creation deferred (no connection).")
print(f" Saved to {_config_path()}\n")
def cmd_disable(args) -> None:
"""Disable Honcho for the active profile."""
cfg = _read_config()
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
block = cfg.get("hosts", {}).get(host, {})
if not block or block.get("enabled") is False:
print(f" {label}Honcho is already disabled.\n")
return
block["enabled"] = False
_write_config(cfg)
print(f" {label}Honcho disabled.")
print(f" Saved to {_config_path()}\n")
def cmd_sync(args) -> None:
"""Sync Honcho config to all existing profiles.
Scans all Hermes profiles and creates host blocks for any that don't
have one yet. Inherits settings from the default host block.
"""
try:
from hermes_cli.profiles import list_profiles
profiles = list_profiles()
except Exception as e:
print(f" Could not list profiles: {e}\n")
return
cfg = _read_config()
if not cfg:
print(" No Honcho config found. Run 'hermes honcho setup' first.\n")
return
hosts = cfg.get("hosts", {})
default_block = hosts.get(HOST, {})
has_key = bool(cfg.get("apiKey") or os.environ.get("HONCHO_API_KEY"))
if not default_block and not has_key:
print(" Honcho not configured on default profile. Run 'hermes honcho setup' first.\n")
return
created = 0
skipped = 0
for p in profiles:
if p.name == "default":
continue
if clone_honcho_for_profile(p.name):
print(f" + {p.name} -> hermes.{p.name}")
created += 1
else:
skipped += 1
if created:
print(f"\n {created} profile(s) synced.")
else:
print(" All profiles already have Honcho config.")
if skipped:
print(f" {skipped} profile(s) already configured (skipped).")
print()
def sync_honcho_profiles_quiet() -> int:
"""Sync Honcho host blocks for all profiles. Returns count of newly created blocks.
Called from `hermes update` -- no output, no exceptions.
"""
try:
from hermes_cli.profiles import list_profiles
profiles = list_profiles()
except Exception:
return 0
cfg = _read_config()
if not cfg:
return 0
default_block = cfg.get("hosts", {}).get(HOST, {})
has_key = bool(cfg.get("apiKey") or os.environ.get("HONCHO_API_KEY"))
if not default_block and not has_key:
return 0
created = 0
for p in profiles:
if p.name == "default":
continue
if clone_honcho_for_profile(p.name):
created += 1
return created
_profile_override: str | None = None
def _host_key() -> str:
"""Return the active Honcho host key, derived from the current Hermes profile."""
if _profile_override:
if _profile_override in ("default", "custom"):
return HOST
return f"{HOST}.{_profile_override}"
return resolve_active_host()
def _config_path() -> Path:
@@ -52,7 +271,7 @@ def _write_config(cfg: dict, path: Path | None = None) -> None:
def _resolve_api_key(cfg: dict) -> str:
"""Resolve API key with host -> root -> env fallback."""
host_key = ((cfg.get("hosts") or {}).get(HOST) or {}).get("apiKey")
host_key = ((cfg.get("hosts") or {}).get(_host_key()) or {}).get("apiKey")
return host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
@@ -118,10 +337,10 @@ def cmd_setup(args) -> None:
if not _ensure_sdk_installed():
return
# All writes go to hosts.hermes — root keys are managed by the user
# or the honcho CLI only.
# All writes go to the active host block — root keys are managed by
# the user or the honcho CLI only.
hosts = cfg.setdefault("hosts", {})
hermes_host = hosts.setdefault(HOST, {})
hermes_host = hosts.setdefault(_host_key(), {})
# API key — shared credential, lives at root so all hosts can read it
current_key = cfg.get("apiKey", "")
@@ -148,7 +367,7 @@ def cmd_setup(args) -> None:
if new_workspace:
hermes_host["workspace"] = new_workspace
hermes_host.setdefault("aiPeer", HOST)
hermes_host.setdefault("aiPeer", _host_key())
# Memory mode
current_mode = hermes_host.get("memoryMode") or cfg.get("memoryMode", "hybrid")
@@ -205,9 +424,9 @@ def cmd_setup(args) -> None:
# Test connection
print(" Testing connection... ", end="", flush=True)
try:
from honcho_integration.client import HonchoClientConfig, get_honcho_client, reset_honcho_client
from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client, reset_honcho_client
reset_honcho_client()
hcfg = HonchoClientConfig.from_global_config()
hcfg = HonchoClientConfig.from_global_config(host=_host_key())
get_honcho_client(hcfg)
print("OK")
except Exception as e:
@@ -237,8 +456,53 @@ def cmd_setup(args) -> None:
print(" hermes honcho map <name> — map this directory to a session name\n")
def _active_profile_name() -> str:
"""Return the active Hermes profile name (respects --target-profile override)."""
if _profile_override:
return _profile_override
try:
from hermes_cli.profiles import get_active_profile_name
return get_active_profile_name()
except Exception:
return "default"
def _all_profile_host_configs() -> list[tuple[str, str, dict]]:
"""Return (profile_name, host_key, host_block) for every known profile.
Reads honcho.json once and maps each profile to its host block.
"""
try:
from hermes_cli.profiles import list_profiles
profiles = list_profiles()
except Exception:
return [(_active_profile_name(), _host_key(), {})]
cfg = _read_config()
hosts = cfg.get("hosts", {})
results = []
# Default profile
default_block = hosts.get(HOST, {})
results.append(("default", HOST, default_block))
for p in profiles:
if p.name == "default":
continue
h = f"{HOST}.{p.name}"
results.append((p.name, h, hosts.get(h, {})))
return results
def cmd_status(args) -> None:
"""Show current Honcho config and connection status."""
show_all = getattr(args, "all", False)
if show_all:
_cmd_status_all()
return
try:
import honcho # noqa: F401
except ImportError:
@@ -256,8 +520,8 @@ def cmd_status(args) -> None:
return
try:
from honcho_integration.client import HonchoClientConfig, get_honcho_client
hcfg = HonchoClientConfig.from_global_config()
from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client
hcfg = HonchoClientConfig.from_global_config(host=_host_key())
except Exception as e:
print(f" Config error: {e}\n")
return
@@ -265,11 +529,16 @@ def cmd_status(args) -> None:
api_key = hcfg.api_key or ""
masked = f"...{api_key[-8:]}" if len(api_key) > 8 else ("set" if api_key else "not set")
print("\nHoncho status\n" + "" * 40)
profile = _active_profile_name()
profile_label = f" [{hcfg.host}]" if profile != "default" else ""
print(f"\nHoncho status{profile_label}\n" + "" * 40)
if profile != "default":
print(f" Profile: {profile}")
print(f" Host: {hcfg.host}")
print(f" Enabled: {hcfg.enabled}")
print(f" API key: {masked}")
print(f" Workspace: {hcfg.workspace_id}")
print(f" Host: {hcfg.host}")
print(f" Config path: {active_path}")
if write_path != active_path:
print(f" Write path: {write_path} (instance-local)")
@@ -287,8 +556,9 @@ def cmd_status(args) -> None:
if hcfg.enabled and (hcfg.api_key or hcfg.base_url):
print("\n Connection... ", end="", flush=True)
try:
get_honcho_client(hcfg)
print("OK\n")
client = get_honcho_client(hcfg)
print("OK")
_show_peer_cards(hcfg, client)
except Exception as e:
print(f"FAILED ({e})\n")
else:
@@ -296,6 +566,90 @@ def cmd_status(args) -> None:
print(f"\n Not connected ({reason})\n")
def _show_peer_cards(hcfg, client) -> None:
"""Fetch and display peer cards for the active profile.
Uses get_or_create to ensure the session exists with peers configured.
This is idempotent -- if the session already exists on the server it's
just retrieved, not duplicated.
"""
try:
from plugins.memory.honcho.session import HonchoSessionManager
mgr = HonchoSessionManager(honcho=client, config=hcfg)
session_key = hcfg.resolve_session_name()
mgr.get_or_create(session_key)
# User peer card
card = mgr.get_peer_card(session_key)
if card:
print(f"\n User peer card ({len(card)} facts):")
for fact in card[:10]:
print(f" - {fact}")
if len(card) > 10:
print(f" ... and {len(card) - 10} more")
# AI peer representation
ai_rep = mgr.get_ai_representation(session_key)
ai_text = ai_rep.get("representation", "")
if ai_text:
# Truncate to first 200 chars
display = ai_text[:200] + ("..." if len(ai_text) > 200 else "")
print(f"\n AI peer representation:")
print(f" {display}")
if not card and not ai_text:
print("\n No peer data yet (accumulates after first conversation)")
print()
except Exception as e:
print(f"\n Peer data unavailable: {e}\n")
def _cmd_status_all() -> None:
"""Show Honcho config overview across all profiles."""
rows = _all_profile_host_configs()
cfg = _read_config()
active = _active_profile_name()
print(f"\nHoncho profiles ({len(rows)})\n" + "" * 60)
print(f" {'Profile':<14} {'Host':<22} {'Enabled':<9} {'Mode':<9} {'Recall':<9} {'Write'}")
print(f" {'' * 14} {'' * 22} {'' * 9} {'' * 9} {'' * 9} {'' * 9}")
for name, host, block in rows:
enabled = block.get("enabled", cfg.get("enabled"))
if enabled is None:
# Auto-enable check: any credentials?
has_creds = bool(cfg.get("apiKey") or os.environ.get("HONCHO_API_KEY"))
enabled = has_creds if block else False
enabled_str = "yes" if enabled else "no"
mode = block.get("memoryMode") or cfg.get("memoryMode", "hybrid")
recall = block.get("recallMode") or cfg.get("recallMode", "hybrid")
write = block.get("writeFrequency") or cfg.get("writeFrequency", "async")
marker = " *" if name == active else ""
print(f" {name + marker:<14} {host:<22} {enabled_str:<9} {mode:<9} {recall:<9} {write}")
print(f"\n * active profile\n")
def cmd_peers(args) -> None:
"""Show peer identities across all profiles."""
rows = _all_profile_host_configs()
cfg = _read_config()
print(f"\nHoncho peer identities ({len(rows)} profiles)\n" + "" * 50)
print(f" {'Profile':<14} {'User peer':<16} {'AI peer'}")
print(f" {'' * 14} {'' * 16} {'' * 18}")
for name, host, block in rows:
user = block.get("peerName") or cfg.get("peerName") or "(not set)"
ai = block.get("aiPeer") or cfg.get("aiPeer") or host
print(f" {name:<14} {user:<16} {ai}")
print()
def cmd_sessions(args) -> None:
"""List known directory → session name mappings."""
cfg = _read_config()
@@ -354,9 +708,9 @@ def cmd_peer(args) -> None:
if user_name is None and ai_name is None and reasoning is None:
# Show current values
hosts = cfg.get("hosts", {})
hermes = hosts.get(HOST, {})
hermes = hosts.get(_host_key(), {})
user = hermes.get('peerName') or cfg.get('peerName') or '(not set)'
ai = hermes.get('aiPeer') or cfg.get('aiPeer') or HOST
ai = hermes.get('aiPeer') or cfg.get('aiPeer') or _host_key()
lvl = hermes.get("dialecticReasoningLevel") or cfg.get("dialecticReasoningLevel") or "low"
max_chars = hermes.get("dialecticMaxChars") or cfg.get("dialecticMaxChars") or 600
print("\nHoncho peers\n" + "" * 40)
@@ -370,23 +724,26 @@ def cmd_peer(args) -> None:
print(f" Dialectic cap: {max_chars} chars\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
if user_name is not None:
cfg.setdefault("hosts", {}).setdefault(HOST, {})["peerName"] = user_name.strip()
cfg.setdefault("hosts", {}).setdefault(host, {})["peerName"] = user_name.strip()
changed = True
print(f" User peer {user_name.strip()}")
print(f" {label}User peer -> {user_name.strip()}")
if ai_name is not None:
cfg.setdefault("hosts", {}).setdefault(HOST, {})["aiPeer"] = ai_name.strip()
cfg.setdefault("hosts", {}).setdefault(host, {})["aiPeer"] = ai_name.strip()
changed = True
print(f" AI peer {ai_name.strip()}")
print(f" {label}AI peer -> {ai_name.strip()}")
if reasoning is not None:
if reasoning not in REASONING_LEVELS:
print(f" Invalid reasoning level '{reasoning}'. Options: {', '.join(REASONING_LEVELS)}")
return
cfg.setdefault("hosts", {}).setdefault(HOST, {})["dialecticReasoningLevel"] = reasoning
cfg.setdefault("hosts", {}).setdefault(host, {})["dialecticReasoningLevel"] = reasoning
changed = True
print(f" Dialectic reasoning level {reasoning}")
print(f" {label}Dialectic reasoning level -> {reasoning}")
if changed:
_write_config(cfg)
@@ -404,7 +761,7 @@ def cmd_mode(args) -> None:
if mode_arg is None:
current = (
(cfg.get("hosts") or {}).get(HOST, {}).get("memoryMode")
(cfg.get("hosts") or {}).get(_host_key(), {}).get("memoryMode")
or cfg.get("memoryMode")
or "hybrid"
)
@@ -419,16 +776,18 @@ def cmd_mode(args) -> None:
print(f" Invalid mode '{mode_arg}'. Options: {', '.join(MODES)}\n")
return
cfg.setdefault("hosts", {}).setdefault(HOST, {})["memoryMode"] = mode_arg
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
cfg.setdefault("hosts", {}).setdefault(host, {})["memoryMode"] = mode_arg
_write_config(cfg)
print(f" Memory mode {mode_arg} ({MODES[mode_arg]})\n")
print(f" {label}Memory mode -> {mode_arg} ({MODES[mode_arg]})\n")
def cmd_tokens(args) -> None:
"""Show or set token budget settings."""
cfg = _read_config()
hosts = cfg.get("hosts", {})
hermes = hosts.get(HOST, {})
hermes = hosts.get(_host_key(), {})
context = getattr(args, "context", None)
dialectic = getattr(args, "dialectic", None)
@@ -451,14 +810,16 @@ def cmd_tokens(args) -> None:
print("\n Set with: hermes honcho tokens [--context N] [--dialectic N]\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
changed = False
if context is not None:
cfg.setdefault("hosts", {}).setdefault(HOST, {})["contextTokens"] = context
print(f" context tokens {context}")
cfg.setdefault("hosts", {}).setdefault(host, {})["contextTokens"] = context
print(f" {label}context tokens -> {context}")
changed = True
if dialectic is not None:
cfg.setdefault("hosts", {}).setdefault(HOST, {})["dialecticMaxChars"] = dialectic
print(f" dialectic cap {dialectic} chars")
cfg.setdefault("hosts", {}).setdefault(host, {})["dialecticMaxChars"] = dialectic
print(f" {label}dialectic cap -> {dialectic} chars")
changed = True
if changed:
@@ -477,9 +838,9 @@ def cmd_identity(args) -> None:
show = getattr(args, "show", False)
try:
from honcho_integration.client import HonchoClientConfig, get_honcho_client
from honcho_integration.session import HonchoSessionManager
hcfg = HonchoClientConfig.from_global_config()
from plugins.memory.honcho.client import HonchoClientConfig, get_honcho_client
from plugins.memory.honcho.session import HonchoSessionManager
hcfg = HonchoClientConfig.from_global_config(host=_host_key())
client = get_honcho_client(hcfg)
mgr = HonchoSessionManager(honcho=client, config=hcfg)
session_key = hcfg.resolve_session_name()
@@ -642,12 +1003,12 @@ def cmd_migrate(args) -> None:
answer = _prompt(" Upload user memory files to Honcho now?", default="y")
if answer.lower() in ("y", "yes"):
try:
from honcho_integration.client import (
from plugins.memory.honcho.client import (
HonchoClientConfig,
get_honcho_client,
reset_honcho_client,
)
from honcho_integration.session import HonchoSessionManager
from plugins.memory.honcho.session import HonchoSessionManager
reset_honcho_client()
hcfg = HonchoClientConfig.from_global_config()
@@ -692,12 +1053,12 @@ def cmd_migrate(args) -> None:
answer = _prompt(" Seed AI identity from all detected files now?", default="y")
if answer.lower() in ("y", "yes"):
try:
from honcho_integration.client import (
from plugins.memory.honcho.client import (
HonchoClientConfig,
get_honcho_client,
reset_honcho_client,
)
from honcho_integration.session import HonchoSessionManager
from plugins.memory.honcho.session import HonchoSessionManager
reset_honcho_client()
hcfg = HonchoClientConfig.from_global_config()
@@ -770,11 +1131,16 @@ def cmd_migrate(args) -> None:
def honcho_command(args) -> None:
"""Route honcho subcommands."""
global _profile_override
_profile_override = getattr(args, "target_profile", None)
sub = getattr(args, "honcho_command", None)
if sub == "setup" or sub is None:
cmd_setup(args)
elif sub == "status":
cmd_status(args)
elif sub == "peers":
cmd_peers(args)
elif sub == "sessions":
cmd_sessions(args)
elif sub == "map":
@@ -789,6 +1155,12 @@ def honcho_command(args) -> None:
cmd_identity(args)
elif sub == "migrate":
cmd_migrate(args)
elif sub == "enable":
cmd_enable(args)
elif sub == "disable":
cmd_disable(args)
elif sub == "sync":
cmd_sync(args)
else:
print(f" Unknown honcho command: {sub}")
print(" Available: setup, status, sessions, map, peer, mode, tokens, identity, migrate\n")
print(" Available: setup, status, sessions, map, peer, mode, tokens, identity, migrate, enable, disable, sync\n")

View File

@@ -31,16 +31,47 @@ GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"
HOST = "hermes"
def resolve_active_host() -> str:
"""Derive the Honcho host key from the active Hermes profile.
Resolution order:
1. HERMES_HONCHO_HOST env var (explicit override)
2. Active profile name via profiles system -> ``hermes.<profile>``
3. Fallback: ``"hermes"`` (default profile)
"""
explicit = os.environ.get("HERMES_HONCHO_HOST", "").strip()
if explicit:
return explicit
try:
from hermes_cli.profiles import get_active_profile_name
profile = get_active_profile_name()
if profile and profile not in ("default", "custom"):
return f"{HOST}.{profile}"
except Exception:
pass
return HOST
def resolve_config_path() -> Path:
"""Return the active Honcho config path.
Checks $HERMES_HOME/honcho.json first (instance-local), then falls back
to ~/.honcho/config.json (global). Returns the global path if neither
exists (for first-time setup writes).
Resolution order:
1. $HERMES_HOME/honcho.json (profile-local, if it exists)
2. ~/.hermes/honcho.json (default profile shared host blocks live here)
3. ~/.honcho/config.json (global, cross-app interop)
Returns the global path if none exist (for first-time setup writes).
"""
local_path = get_hermes_home() / "honcho.json"
if local_path.exists():
return local_path
# Default profile's config — host blocks accumulate here via setup/clone
default_path = Path.home() / ".hermes" / "honcho.json"
if default_path != local_path and default_path.exists():
return default_path
return GLOBAL_CONFIG_PATH
@@ -135,40 +166,49 @@ class HonchoClientConfig:
explicitly_configured: bool = False
@classmethod
def from_env(cls, workspace_id: str = "hermes") -> HonchoClientConfig:
def from_env(
cls,
workspace_id: str = "hermes",
host: str | None = None,
) -> HonchoClientConfig:
"""Create config from environment variables (fallback)."""
resolved_host = host or resolve_active_host()
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
return cls(
host=resolved_host,
workspace_id=workspace_id,
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
base_url=base_url,
ai_peer=resolved_host,
enabled=bool(api_key or base_url),
)
@classmethod
def from_global_config(
cls,
host: str = HOST,
host: str | None = None,
config_path: Path | None = None,
) -> HonchoClientConfig:
"""Create config from the resolved Honcho config path.
Resolution: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.
When host is None, derives it from the active Hermes profile.
"""
resolved_host = host or resolve_active_host()
path = config_path or resolve_config_path()
if not path.exists():
logger.debug("No global Honcho config at %s, falling back to env", path)
return cls.from_env()
return cls.from_env(host=resolved_host)
try:
raw = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to read %s: %s, falling back to env", path, e)
return cls.from_env()
return cls.from_env(host=resolved_host)
host_block = (raw.get("hosts") or {}).get(host, {})
host_block = (raw.get("hosts") or {}).get(resolved_host, {})
# A hosts.hermes block or explicit enabled flag means the user
# intentionally configured Honcho for this host.
_explicitly_configured = bool(host_block) or raw.get("enabled") is True
@@ -177,12 +217,12 @@ class HonchoClientConfig:
workspace = (
host_block.get("workspace")
or raw.get("workspace")
or host
or resolved_host
)
ai_peer = (
host_block.get("aiPeer")
or raw.get("aiPeer")
or host
or resolved_host
)
linked_hosts = host_block.get("linkedHosts", [])
@@ -242,7 +282,7 @@ class HonchoClientConfig:
)
return cls(
host=host,
host=resolved_host,
workspace_id=workspace,
api_key=api_key,
environment=environment,

View File

@@ -0,0 +1,7 @@
name: honcho
version: 1.0.0
description: "Honcho AI-native memory — cross-session user modeling with dialectic Q&A, semantic search, and persistent conclusions."
pip_dependencies:
- honcho-ai
hooks:
- on_session_end

View File

@@ -10,7 +10,7 @@ from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, TYPE_CHECKING
from honcho_integration.client import get_honcho_client
from plugins.memory.honcho.client import get_honcho_client
if TYPE_CHECKING:
from honcho import Honcho
@@ -162,11 +162,17 @@ class HonchoSessionManager:
# Configure peer observation settings.
# observe_me=True for AI peer so Honcho watches what the agent says
# and builds its representation over time — enabling identity formation.
from honcho.session import SessionPeerConfig
user_config = SessionPeerConfig(observe_me=True, observe_others=True)
ai_config = SessionPeerConfig(observe_me=True, observe_others=True)
try:
from honcho.session import SessionPeerConfig
user_config = SessionPeerConfig(observe_me=True, observe_others=True)
ai_config = SessionPeerConfig(observe_me=True, observe_others=True)
session.add_peers([(user_peer, user_config), (assistant_peer, ai_config)])
session.add_peers([(user_peer, user_config), (assistant_peer, ai_config)])
except Exception as e:
logger.warning(
"Honcho session '%s' add_peers failed (non-fatal): %s",
session_id, e,
)
# Load existing messages via context() - single call for messages + metadata
existing_messages = []
@@ -231,7 +237,7 @@ class HonchoSessionManager:
chat_id = parts[1] if len(parts) > 1 else key
user_peer_id = self._sanitize_id(f"user-{channel}-{chat_id}")
assistant_peer_id = (
assistant_peer_id = self._sanitize_id(
self._config.ai_peer if self._config else "hermes-assistant"
)

View File

@@ -0,0 +1,38 @@
# Mem0 Memory Provider
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication.
## Requirements
- `pip install mem0ai`
- Mem0 API key from [app.mem0.ai](https://app.mem0.ai)
## Setup
```bash
hermes memory setup # select "mem0"
```
Or manually:
```bash
hermes config set memory.provider mem0
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env
```
## Config
Config file: `$HERMES_HOME/mem0.json`
| Key | Default | Description |
|-----|---------|-------------|
| `user_id` | `hermes-user` | User identifier on Mem0 |
| `agent_id` | `hermes` | Agent identifier |
| `rerank` | `true` | Enable reranking for recall |
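A minimal `$HERMES_HOME/mem0.json` putting these keys together (values are placeholders; `api_key` may instead come from `MEM0_API_KEY` in `.env`):
```json
{
  "api_key": "your-key",
  "user_id": "hermes-user",
  "agent_id": "hermes",
  "rerank": true
}
```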
## Tools
| Tool | Description |
|------|-------------|
| `mem0_profile` | All stored memories about the user |
| `mem0_search` | Semantic search with optional reranking |
| `mem0_conclude` | Store a fact verbatim (no LLM extraction) |
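As an illustration, a `mem0_search` call takes arguments shaped like the tool schema in this PR (the query text here is made up); the handler replies with a JSON object containing `results` and `count`:
```json
{
  "query": "preferred code review style",
  "rerank": true,
  "top_k": 5
}
```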

View File

@@ -0,0 +1,344 @@
"""Mem0 memory plugin — MemoryProvider interface.
Server-side LLM fact extraction, semantic search with reranking, and
automatic deduplication via the Mem0 Platform API.
Original PR #2933 by kartik-mem0, adapted to MemoryProvider ABC.
Config via environment variables:
MEM0_API_KEY — Mem0 Platform API key (required)
MEM0_USER_ID — User identifier (default: hermes-user)
MEM0_AGENT_ID — Agent identifier (default: hermes)
Or via $HERMES_HOME/mem0.json.
"""
from __future__ import annotations
import json
import logging
import os
import threading
import time
from pathlib import Path
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
# Circuit breaker: after this many consecutive failures, pause API calls
# for _BREAKER_COOLDOWN_SECS to avoid hammering a down server.
_BREAKER_THRESHOLD = 5
_BREAKER_COOLDOWN_SECS = 120
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_config() -> dict:
"""Load config from $HERMES_HOME/mem0.json or env vars."""
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "mem0.json"
if config_path.exists():
try:
return json.loads(config_path.read_text(encoding="utf-8"))
except Exception:
pass
return {
"api_key": os.environ.get("MEM0_API_KEY", ""),
"user_id": os.environ.get("MEM0_USER_ID", "hermes-user"),
"agent_id": os.environ.get("MEM0_AGENT_ID", "hermes"),
"rerank": True,
"keyword_search": False,
}
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
PROFILE_SCHEMA = {
"name": "mem0_profile",
"description": (
"Retrieve all stored memories about the user — preferences, facts, "
"project context. Fast, no reranking. Use at conversation start."
),
"parameters": {"type": "object", "properties": {}, "required": []},
}
SEARCH_SCHEMA = {
"name": "mem0_search",
"description": (
"Search memories by meaning. Returns relevant facts ranked by similarity. "
"Set rerank=true for higher accuracy on important queries."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "What to search for."},
"rerank": {"type": "boolean", "description": "Enable reranking for precision (default: false)."},
"top_k": {"type": "integer", "description": "Max results (default: 10, max: 50)."},
},
"required": ["query"],
},
}
CONCLUDE_SCHEMA = {
"name": "mem0_conclude",
"description": (
"Store a durable fact about the user. Stored verbatim (no LLM extraction). "
"Use for explicit preferences, corrections, or decisions."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {"type": "string", "description": "The fact to store."},
},
"required": ["conclusion"],
},
}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class Mem0MemoryProvider(MemoryProvider):
"""Mem0 Platform memory with server-side extraction and semantic search."""
def __init__(self):
self._config = None
self._client = None
self._client_lock = threading.Lock()
self._api_key = ""
self._user_id = "hermes-user"
self._agent_id = "hermes"
self._rerank = True
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
# Circuit breaker state
self._consecutive_failures = 0
self._breaker_open_until = 0.0
@property
def name(self) -> str:
return "mem0"
def is_available(self) -> bool:
cfg = _load_config()
return bool(cfg.get("api_key"))
def save_config(self, values, hermes_home):
"""Write config to $HERMES_HOME/mem0.json."""
config_path = Path(hermes_home) / "mem0.json"
existing = {}
if config_path.exists():
try:
existing = json.loads(config_path.read_text())
except Exception:
pass
existing.update(values)
config_path.write_text(json.dumps(existing, indent=2))
def get_config_schema(self):
return [
{"key": "api_key", "description": "Mem0 Platform API key", "secret": True, "required": True, "env_var": "MEM0_API_KEY", "url": "https://app.mem0.ai"},
{"key": "user_id", "description": "User identifier", "default": "hermes-user"},
{"key": "agent_id", "description": "Agent identifier", "default": "hermes"},
{"key": "rerank", "description": "Enable reranking for recall", "default": "true", "choices": ["true", "false"]},
]
def _get_client(self):
"""Thread-safe client accessor with lazy initialization."""
with self._client_lock:
if self._client is not None:
return self._client
try:
from mem0 import MemoryClient
self._client = MemoryClient(api_key=self._api_key)
return self._client
except ImportError:
raise RuntimeError("mem0 package not installed. Run: pip install mem0ai")
def _is_breaker_open(self) -> bool:
"""Return True if the circuit breaker is tripped (too many failures)."""
if self._consecutive_failures < _BREAKER_THRESHOLD:
return False
if time.monotonic() >= self._breaker_open_until:
# Cooldown expired — reset and allow a retry
self._consecutive_failures = 0
return False
return True
def _record_success(self):
self._consecutive_failures = 0
def _record_failure(self):
self._consecutive_failures += 1
if self._consecutive_failures >= _BREAKER_THRESHOLD:
self._breaker_open_until = time.monotonic() + _BREAKER_COOLDOWN_SECS
logger.warning(
"Mem0 circuit breaker tripped after %d consecutive failures. "
"Pausing API calls for %ds.",
self._consecutive_failures, _BREAKER_COOLDOWN_SECS,
)
def initialize(self, session_id: str, **kwargs) -> None:
self._config = _load_config()
self._api_key = self._config.get("api_key", "")
self._user_id = self._config.get("user_id", "hermes-user")
self._agent_id = self._config.get("agent_id", "hermes")
self._rerank = self._config.get("rerank", True)
def system_prompt_block(self) -> str:
return (
"# Mem0 Memory\n"
f"Active. User: {self._user_id}.\n"
"Use mem0_search to find memories, mem0_conclude to store facts, "
"mem0_profile for a full overview."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## Mem0 Memory\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
if self._is_breaker_open():
return
def _run():
try:
client = self._get_client()
results = client.search(
query=query,
user_id=self._user_id,
rerank=self._rerank,
top_k=5,
)
if results:
lines = [r.get("memory", "") for r in results if r.get("memory")]
with self._prefetch_lock:
self._prefetch_result = "\n".join(f"- {l}" for l in lines)
self._record_success()
except Exception as e:
self._record_failure()
logger.debug("Mem0 prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="mem0-prefetch")
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Send the turn to Mem0 for server-side fact extraction (non-blocking)."""
if self._is_breaker_open():
return
def _sync():
try:
client = self._get_client()
messages = [
{"role": "user", "content": user_content},
{"role": "assistant", "content": assistant_content},
]
client.add(messages, user_id=self._user_id, agent_id=self._agent_id)
self._record_success()
except Exception as e:
self._record_failure()
logger.warning("Mem0 sync failed: %s", e)
# Wait for any previous sync before starting a new one
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(target=_sync, daemon=True, name="mem0-sync")
self._sync_thread.start()
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [PROFILE_SCHEMA, SEARCH_SCHEMA, CONCLUDE_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
if self._is_breaker_open():
return json.dumps({
"error": "Mem0 API temporarily unavailable (multiple consecutive failures). Will retry automatically."
})
try:
client = self._get_client()
except Exception as e:
return json.dumps({"error": str(e)})
if tool_name == "mem0_profile":
try:
memories = client.get_all(user_id=self._user_id)
self._record_success()
if not memories:
return json.dumps({"result": "No memories stored yet."})
lines = [m.get("memory", "") for m in memories if m.get("memory")]
return json.dumps({"result": "\n".join(lines), "count": len(lines)})
except Exception as e:
self._record_failure()
return json.dumps({"error": f"Failed to fetch profile: {e}"})
elif tool_name == "mem0_search":
query = args.get("query", "")
if not query:
return json.dumps({"error": "Missing required parameter: query"})
rerank = args.get("rerank", False)
top_k = min(int(args.get("top_k", 10)), 50)
try:
results = client.search(
query=query, user_id=self._user_id,
rerank=rerank, top_k=top_k,
)
self._record_success()
if not results:
return json.dumps({"result": "No relevant memories found."})
items = [{"memory": r.get("memory", ""), "score": r.get("score", 0)} for r in results]
return json.dumps({"results": items, "count": len(items)})
except Exception as e:
self._record_failure()
return json.dumps({"error": f"Search failed: {e}"})
elif tool_name == "mem0_conclude":
conclusion = args.get("conclusion", "")
if not conclusion:
return json.dumps({"error": "Missing required parameter: conclusion"})
try:
client.add(
[{"role": "user", "content": conclusion}],
user_id=self._user_id,
agent_id=self._agent_id,
infer=False,
)
self._record_success()
return json.dumps({"result": "Fact stored."})
except Exception as e:
self._record_failure()
return json.dumps({"error": f"Failed to store: {e}"})
return json.dumps({"error": f"Unknown tool: {tool_name}"})
def shutdown(self) -> None:
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
with self._client_lock:
self._client = None
def register(ctx) -> None:
"""Register Mem0 as a memory provider plugin."""
ctx.register_memory_provider(Mem0MemoryProvider())

View File

@@ -0,0 +1,5 @@
name: mem0
version: 1.0.0
description: "Mem0 — server-side LLM fact extraction with semantic search, reranking, and automatic deduplication."
pip_dependencies:
- mem0ai

View File

@@ -0,0 +1,40 @@
# OpenViking Memory Provider
Context database by Volcengine (ByteDance) with filesystem-style knowledge hierarchy, tiered retrieval, and automatic memory extraction.
## Requirements
- `pip install openviking`
- OpenViking server running (`openviking-server`)
- Embedding + VLM model configured in `~/.openviking/ov.conf`
## Setup
```bash
hermes memory setup # select "openviking"
```
Or manually:
```bash
hermes config set memory.provider openviking
echo "OPENVIKING_ENDPOINT=http://localhost:1933" >> ~/.hermes/.env
```
## Config
All config via environment variables in `.env`:
| Env Var | Default | Description |
|---------|---------|-------------|
| `OPENVIKING_ENDPOINT` | `http://127.0.0.1:1933` | Server URL |
| `OPENVIKING_API_KEY` | (none) | API key (optional) |
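If the server requires authentication, set the key alongside the endpoint:
```bash
echo "OPENVIKING_API_KEY=your-key" >> ~/.hermes/.env
```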
## Tools
| Tool | Description |
|------|-------------|
| `viking_search` | Semantic search with fast/deep/auto modes |
| `viking_read` | Read content at a viking:// URI (abstract/overview/full) |
| `viking_browse` | Filesystem-style navigation (list/tree/stat) |
| `viking_remember` | Store a fact for extraction on session commit |
| `viking_add_resource` | Ingest URLs/docs into the knowledge base |
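A typical flow is search-then-read: `viking_search` returns ranked `viking://` URIs, which `viking_read` then loads at increasing detail. Illustrative argument payloads (shapes follow the tool schemas in this PR; the URI is hypothetical):
```json
{"query": "deployment runbook", "mode": "fast", "limit": 5}
```
then:
```json
{"uri": "viking://resources/docs/runbook", "level": "overview"}
```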

View File

@@ -0,0 +1,582 @@
"""OpenViking memory plugin — full bidirectional MemoryProvider interface.
Context database by Volcengine (ByteDance) that organizes agent knowledge
into a filesystem hierarchy (viking:// URIs) with tiered context loading,
automatic memory extraction, and session management.
Original PR #3369 by Mibayy, rewritten to use the full OpenViking session
lifecycle instead of read-only search endpoints.
Config via environment variables (profile-scoped via each profile's .env):
OPENVIKING_ENDPOINT — Server URL (default: http://127.0.0.1:1933)
OPENVIKING_API_KEY — API key (required for authenticated servers)
Capabilities:
- Automatic memory extraction on session commit (6 categories)
- Tiered context: L0 (~100 tokens), L1 (~2k), L2 (full)
- Semantic search with hierarchical directory retrieval
- Filesystem-style browsing via viking:// URIs
- Resource ingestion (URLs, docs, code)
"""
from __future__ import annotations
import json
import logging
import os
import threading
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
_DEFAULT_ENDPOINT = "http://127.0.0.1:1933"
_TIMEOUT = 30.0
# ---------------------------------------------------------------------------
# HTTP helper — uses httpx to avoid requiring the openviking SDK
# ---------------------------------------------------------------------------
def _get_httpx():
"""Lazy import httpx."""
try:
import httpx
return httpx
except ImportError:
return None
class _VikingClient:
"""Thin HTTP client for the OpenViking REST API."""
def __init__(self, endpoint: str, api_key: str = ""):
self._endpoint = endpoint.rstrip("/")
self._api_key = api_key
self._httpx = _get_httpx()
if self._httpx is None:
raise ImportError("httpx is required for OpenViking: pip install httpx")
def _headers(self) -> dict:
h = {"Content-Type": "application/json"}
if self._api_key:
h["X-API-Key"] = self._api_key
return h
def _url(self, path: str) -> str:
return f"{self._endpoint}{path}"
def get(self, path: str, **kwargs) -> dict:
resp = self._httpx.get(
self._url(path), headers=self._headers(), timeout=_TIMEOUT, **kwargs
)
resp.raise_for_status()
return resp.json()
def post(self, path: str, payload: dict | None = None, **kwargs) -> dict:
resp = self._httpx.post(
self._url(path), json=payload or {}, headers=self._headers(),
timeout=_TIMEOUT, **kwargs
)
resp.raise_for_status()
return resp.json()
def health(self) -> bool:
try:
resp = self._httpx.get(
self._url("/health"), timeout=3.0
)
return resp.status_code == 200
except Exception:
return False
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
SEARCH_SCHEMA = {
"name": "viking_search",
"description": (
"Semantic search over the OpenViking knowledge base. "
"Returns ranked results with viking:// URIs for deeper reading. "
"Use mode='deep' for complex queries that need reasoning across "
"multiple sources, 'fast' for simple lookups."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query."},
"mode": {
"type": "string", "enum": ["auto", "fast", "deep"],
"description": "Search depth (default: auto).",
},
"scope": {
"type": "string",
"description": "Viking URI prefix to scope search (e.g. 'viking://resources/docs/').",
},
"limit": {"type": "integer", "description": "Max results (default: 10)."},
},
"required": ["query"],
},
}
READ_SCHEMA = {
"name": "viking_read",
"description": (
"Read content at a viking:// URI. Three detail levels:\n"
" abstract — ~100 token summary (L0)\n"
" overview — ~2k token key points (L1)\n"
" full — complete content (L2)\n"
"Start with abstract/overview, only use full when you need details."
),
"parameters": {
"type": "object",
"properties": {
"uri": {"type": "string", "description": "viking:// URI to read."},
"level": {
"type": "string", "enum": ["abstract", "overview", "full"],
"description": "Detail level (default: overview).",
},
},
"required": ["uri"],
},
}
BROWSE_SCHEMA = {
"name": "viking_browse",
"description": (
"Browse the OpenViking knowledge store like a filesystem.\n"
" list — show directory contents\n"
" tree — show hierarchy\n"
" stat — show metadata for a URI"
),
"parameters": {
"type": "object",
"properties": {
"action": {
"type": "string", "enum": ["tree", "list", "stat"],
"description": "Browse action.",
},
"path": {
"type": "string",
"description": "Viking URI path (default: viking://). Examples: 'viking://resources/', 'viking://user/memories/'.",
},
},
"required": ["action"],
},
}
REMEMBER_SCHEMA = {
"name": "viking_remember",
"description": (
"Explicitly store a fact or memory in the OpenViking knowledge base. "
"Use for important information the agent should remember long-term. "
"The system automatically categorizes and indexes the memory."
),
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string", "description": "The information to remember."},
"category": {
"type": "string",
"enum": ["preference", "entity", "event", "case", "pattern"],
"description": "Memory category (default: auto-detected).",
},
},
"required": ["content"],
},
}
ADD_RESOURCE_SCHEMA = {
"name": "viking_add_resource",
"description": (
"Add a URL or document to the OpenViking knowledge base. "
"Supports web pages, GitHub repos, PDFs, markdown, code files. "
"The system automatically parses, indexes, and generates summaries."
),
"parameters": {
"type": "object",
"properties": {
"url": {"type": "string", "description": "URL or path of the resource to add."},
"reason": {
"type": "string",
"description": "Why this resource is relevant (improves search).",
},
},
"required": ["url"],
},
}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class OpenVikingMemoryProvider(MemoryProvider):
"""Full bidirectional memory via OpenViking context database."""
def __init__(self):
self._client: Optional[_VikingClient] = None
self._endpoint = ""
self._api_key = ""
self._session_id = ""
self._turn_count = 0
self._sync_thread: Optional[threading.Thread] = None
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread: Optional[threading.Thread] = None
@property
def name(self) -> str:
return "openviking"
def is_available(self) -> bool:
"""Check if OpenViking endpoint is configured. No network calls."""
return bool(os.environ.get("OPENVIKING_ENDPOINT"))
def get_config_schema(self):
return [
{
"key": "endpoint",
"description": "OpenViking server URL",
"required": True,
"default": _DEFAULT_ENDPOINT,
"env_var": "OPENVIKING_ENDPOINT",
},
{
"key": "api_key",
"description": "OpenViking API key",
"secret": True,
"env_var": "OPENVIKING_API_KEY",
},
]
def initialize(self, session_id: str, **kwargs) -> None:
self._endpoint = os.environ.get("OPENVIKING_ENDPOINT", _DEFAULT_ENDPOINT)
self._api_key = os.environ.get("OPENVIKING_API_KEY", "")
self._session_id = session_id
self._turn_count = 0
try:
self._client = _VikingClient(self._endpoint, self._api_key)
if not self._client.health():
logger.warning("OpenViking server at %s is not reachable", self._endpoint)
self._client = None
except ImportError:
logger.warning("httpx not installed — OpenViking plugin disabled")
self._client = None
def system_prompt_block(self) -> str:
if not self._client:
return ""
# Provide brief info about the knowledge base
try:
# Check what's in the knowledge base via a root listing
resp = self._client.post("/api/v1/browse", {"action": "stat", "path": "viking://"})
result = resp.get("result", {})
children = result.get("children", 0)
if children == 0:
return ""
return (
"# OpenViking Knowledge Base\n"
f"Active. Endpoint: {self._endpoint}\n"
"Use viking_search to find information, viking_read for details "
"(abstract/overview/full), viking_browse to explore.\n"
"Use viking_remember to store facts, viking_add_resource to index URLs/docs."
)
except Exception:
return (
"# OpenViking Knowledge Base\n"
f"Active. Endpoint: {self._endpoint}\n"
"Use viking_search, viking_read, viking_browse, "
"viking_remember, viking_add_resource."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return prefetched results from the background thread."""
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## OpenViking Context\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire a background search to pre-load relevant context."""
if not self._client or not query:
return
def _run():
try:
client = _VikingClient(self._endpoint, self._api_key)
resp = client.post("/api/v1/search/find", {
"query": query,
"top_k": 5,
})
result = resp.get("result", {})
parts = []
for ctx_type in ("memories", "resources"):
items = result.get(ctx_type, [])
for item in items[:3]:
uri = item.get("uri", "")
abstract = item.get("abstract", "")
score = item.get("score", 0)
if abstract:
parts.append(f"- [{score:.2f}] {abstract} ({uri})")
if parts:
with self._prefetch_lock:
self._prefetch_result = "\n".join(parts)
except Exception as e:
logger.debug("OpenViking prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(
target=_run, daemon=True, name="openviking-prefetch"
)
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Record the conversation turn in OpenViking's session (non-blocking)."""
if not self._client:
return
self._turn_count += 1
def _sync():
try:
client = _VikingClient(self._endpoint, self._api_key)
sid = self._session_id
# Add user message
client.post(f"/api/v1/sessions/{sid}/messages", {
"role": "user",
"content": user_content[:4000], # trim very long messages
})
# Add assistant message
client.post(f"/api/v1/sessions/{sid}/messages", {
"role": "assistant",
"content": assistant_content[:4000],
})
except Exception as e:
logger.debug("OpenViking sync_turn failed: %s", e)
# Wait for any previous sync to finish before starting a new one
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(
target=_sync, daemon=True, name="openviking-sync"
)
self._sync_thread.start()
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Commit the session to trigger memory extraction.
OpenViking automatically extracts 6 categories of memories:
profile, preferences, entities, events, cases, and patterns.
"""
if not self._client or self._turn_count == 0:
return
# Wait for any pending sync to finish first
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=10.0)
try:
self._client.post(f"/api/v1/sessions/{self._session_id}/commit")
logger.info("OpenViking session %s committed (%d turns)", self._session_id, self._turn_count)
except Exception as e:
logger.warning("OpenViking session commit failed: %s", e)
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in memory writes to OpenViking as explicit memories."""
if not self._client or action != "add" or not content:
return
def _write():
try:
client = _VikingClient(self._endpoint, self._api_key)
# Add as a user message with memory context so the commit
# picks it up as an explicit memory during extraction
client.post(f"/api/v1/sessions/{self._session_id}/messages", {
"role": "user",
"parts": [
{"type": "text", "text": f"[Memory note — {target}] {content}"},
],
})
except Exception as e:
logger.debug("OpenViking memory mirror failed: %s", e)
t = threading.Thread(target=_write, daemon=True, name="openviking-memwrite")
t.start()
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [SEARCH_SCHEMA, READ_SCHEMA, BROWSE_SCHEMA, REMEMBER_SCHEMA, ADD_RESOURCE_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
if not self._client:
return json.dumps({"error": "OpenViking server not connected"})
try:
if tool_name == "viking_search":
return self._tool_search(args)
elif tool_name == "viking_read":
return self._tool_read(args)
elif tool_name == "viking_browse":
return self._tool_browse(args)
elif tool_name == "viking_remember":
return self._tool_remember(args)
elif tool_name == "viking_add_resource":
return self._tool_add_resource(args)
return json.dumps({"error": f"Unknown tool: {tool_name}"})
except Exception as e:
return json.dumps({"error": str(e)})
def shutdown(self) -> None:
# Wait for background threads to finish
for t in (self._sync_thread, self._prefetch_thread):
if t and t.is_alive():
t.join(timeout=5.0)
# -- Tool implementations ------------------------------------------------
def _tool_search(self, args: dict) -> str:
query = args.get("query", "")
if not query:
return json.dumps({"error": "query is required"})
payload: Dict[str, Any] = {"query": query}
mode = args.get("mode", "auto")
if mode != "auto":
payload["mode"] = mode
if args.get("scope"):
payload["target_uri"] = args["scope"]
if args.get("limit"):
payload["top_k"] = args["limit"]
resp = self._client.post("/api/v1/search/find", payload)
result = resp.get("result", {})
# Format results for the model — keep it concise
formatted = []
for ctx_type in ("memories", "resources", "skills"):
items = result.get(ctx_type, [])
for item in items:
entry = {
"uri": item.get("uri", ""),
"type": ctx_type.rstrip("s"),
"score": round(item.get("score", 0), 3),
"abstract": item.get("abstract", ""),
}
if item.get("relations"):
entry["related"] = [r.get("uri") for r in item["relations"][:3]]
formatted.append(entry)
return json.dumps({
"results": formatted,
"total": result.get("total", len(formatted)),
}, ensure_ascii=False)
def _tool_read(self, args: dict) -> str:
uri = args.get("uri", "")
if not uri:
return json.dumps({"error": "uri is required"})
level = args.get("level", "overview")
# Map our level names to OpenViking endpoints
if level == "abstract":
resp = self._client.post("/api/v1/read/abstract", {"uri": uri})
elif level == "full":
resp = self._client.post("/api/v1/read", {"uri": uri, "level": "read"})
else: # overview
resp = self._client.post("/api/v1/read", {"uri": uri, "level": "overview"})
result = resp.get("result", {})
content = result.get("content", "")
# Truncate very long content to avoid flooding the context
if len(content) > 8000:
content = content[:8000] + "\n\n[... truncated, use a more specific URI or abstract level]"
return json.dumps({
"uri": uri,
"level": level,
"content": content,
}, ensure_ascii=False)
def _tool_browse(self, args: dict) -> str:
action = args.get("action", "list")
path = args.get("path", "viking://")
resp = self._client.post("/api/v1/browse", {
"action": action,
"path": path,
})
result = resp.get("result", {})
# Format for readability
if action == "list" and "entries" in result:
entries = []
for e in result["entries"][:50]: # cap at 50 entries
entries.append({
"name": e.get("name", ""),
"uri": e.get("uri", ""),
"type": "dir" if e.get("is_dir") else "file",
})
return json.dumps({"path": path, "entries": entries}, ensure_ascii=False)
return json.dumps(result, ensure_ascii=False)
def _tool_remember(self, args: dict) -> str:
content = args.get("content", "")
if not content:
return json.dumps({"error": "content is required"})
# Store as a session message that will be extracted during commit.
# The category hint helps OpenViking's extraction classify correctly.
category = args.get("category", "")
text = f"[Remember] {content}"
if category:
text = f"[Remember — {category}] {content}"
self._client.post(f"/api/v1/sessions/{self._session_id}/messages", {
"role": "user",
"parts": [
{"type": "text", "text": text},
],
})
return json.dumps({
"status": "stored",
"message": "Memory recorded. Will be extracted and indexed on session commit.",
})
def _tool_add_resource(self, args: dict) -> str:
url = args.get("url", "")
if not url:
return json.dumps({"error": "url is required"})
payload: Dict[str, Any] = {"path": url}
if args.get("reason"):
payload["reason"] = args["reason"]
resp = self._client.post("/api/v1/resources", payload)
result = resp.get("result", {})
return json.dumps({
"status": "added",
"root_uri": result.get("root_uri", ""),
"message": "Resource queued for processing. Use viking_search after a moment to find it.",
}, ensure_ascii=False)
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Register OpenViking as a memory provider plugin."""
ctx.register_memory_provider(OpenVikingMemoryProvider())

View File

@@ -0,0 +1,9 @@
name: openviking
version: 2.0.0
description: "OpenViking context database — session-managed memory with automatic extraction, tiered retrieval, and filesystem-style knowledge browsing."
pip_dependencies:
- httpx
requires_env:
- OPENVIKING_ENDPOINT
hooks:
- on_session_end

View File

@@ -0,0 +1,40 @@
# RetainDB Memory Provider
Cloud memory API with hybrid search (Vector + BM25 + Reranking) and 7 memory types.
## Requirements
- RetainDB account ($20/month) from [retaindb.com](https://www.retaindb.com)
- `pip install requests`
## Setup
```bash
hermes memory setup # select "retaindb"
```
Or manually:
```bash
hermes config set memory.provider retaindb
echo "RETAINDB_API_KEY=your-key" >> ~/.hermes/.env
```
## Config
All config via environment variables in `.env`:
| Env Var | Default | Description |
|---------|---------|-------------|
| `RETAINDB_API_KEY` | (required) | API key |
| `RETAINDB_BASE_URL` | `https://api.retaindb.com` | API endpoint |
| `RETAINDB_PROJECT` | auto (profile-scoped) | Project identifier |
## Tools
| Tool | Description |
|------|-------------|
| `retaindb_profile` | User's stable profile |
| `retaindb_search` | Semantic search |
| `retaindb_context` | Task-relevant context |
| `retaindb_remember` | Store a fact with type + importance |
| `retaindb_forget` | Delete a memory by ID |
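To sanity-check a key outside the agent loop, the provider class can be exercised directly. A minimal sketch; the import path `retaindb_provider` is hypothetical and depends on where the plugin module lives in your install:
```python
import os

# Hypothetical module path; adjust to the plugin's actual location.
from retaindb_provider import RetainDBMemoryProvider

os.environ.setdefault("RETAINDB_API_KEY", "your-key")

provider = RetainDBMemoryProvider()
assert provider.is_available()

# hermes_home drives the auto-derived, profile-scoped project name.
provider.initialize("session-demo", user_id="demo",
                    hermes_home=os.path.expanduser("~/.hermes"))

# Store a fact, then search for it (both calls return JSON strings).
print(provider.handle_tool_call("retaindb_remember",
                                {"content": "Prefers dark mode",
                                 "memory_type": "preference"}))
print(provider.handle_tool_call("retaindb_search",
                                {"query": "dark mode", "top_k": 3}))

provider.shutdown()
```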

View File

@@ -0,0 +1,302 @@
"""RetainDB memory plugin — MemoryProvider interface.
Cross-session memory via RetainDB cloud API. Durable write-behind queue,
semantic search with deduplication, and user profile retrieval.
Original PR #2732 by Alinxus, adapted to MemoryProvider ABC.
Config via environment variables:
RETAINDB_API_KEY — API key (required)
RETAINDB_BASE_URL — API endpoint (default: https://api.retaindb.com)
    RETAINDB_PROJECT — Project identifier (default: auto-derived, profile-scoped)
"""
from __future__ import annotations
import json
import logging
import os
import threading
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
_DEFAULT_BASE_URL = "https://api.retaindb.com"
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
PROFILE_SCHEMA = {
"name": "retaindb_profile",
"description": "Get the user's stable profile — preferences, facts, and patterns.",
"parameters": {"type": "object", "properties": {}, "required": []},
}
SEARCH_SCHEMA = {
"name": "retaindb_search",
"description": (
"Semantic search across stored memories. Returns ranked results "
"with relevance scores."
),
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "What to search for."},
"top_k": {"type": "integer", "description": "Max results (default: 8, max: 20)."},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "retaindb_context",
"description": "Synthesized 'what matters now' context block for the current task.",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Current task or question."},
},
"required": ["query"],
},
}
REMEMBER_SCHEMA = {
"name": "retaindb_remember",
"description": "Persist an explicit fact or preference to long-term memory.",
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string", "description": "The fact to remember."},
"memory_type": {
"type": "string",
"enum": ["preference", "fact", "decision", "context"],
"description": "Category (default: fact).",
},
"importance": {
"type": "number",
"description": "Importance 0-1 (default: 0.5).",
},
},
"required": ["content"],
},
}
FORGET_SCHEMA = {
"name": "retaindb_forget",
"description": "Delete a specific memory by ID.",
"parameters": {
"type": "object",
"properties": {
"memory_id": {"type": "string", "description": "Memory ID to delete."},
},
"required": ["memory_id"],
},
}
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
class RetainDBMemoryProvider(MemoryProvider):
"""RetainDB cloud memory with write-behind queue and semantic search."""
def __init__(self):
self._api_key = ""
self._base_url = _DEFAULT_BASE_URL
self._project = "hermes"
        self._user_id = ""
        self._session_id = ""
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
@property
def name(self) -> str:
return "retaindb"
def is_available(self) -> bool:
return bool(os.environ.get("RETAINDB_API_KEY"))
def get_config_schema(self):
return [
{"key": "api_key", "description": "RetainDB API key", "secret": True, "required": True, "env_var": "RETAINDB_API_KEY", "url": "https://retaindb.com"},
{"key": "base_url", "description": "API endpoint", "default": "https://api.retaindb.com"},
{"key": "project", "description": "Project identifier", "default": "hermes"},
]
def _headers(self) -> dict:
return {
"Authorization": f"Bearer {self._api_key}",
"Content-Type": "application/json",
}
def _api(self, method: str, path: str, **kwargs):
"""Make an API call to RetainDB."""
import requests
url = f"{self._base_url}{path}"
resp = requests.request(method, url, headers=self._headers(), timeout=30, **kwargs)
resp.raise_for_status()
return resp.json()
def initialize(self, session_id: str, **kwargs) -> None:
self._api_key = os.environ.get("RETAINDB_API_KEY", "")
self._base_url = os.environ.get("RETAINDB_BASE_URL", _DEFAULT_BASE_URL)
self._user_id = kwargs.get("user_id", "default")
self._session_id = session_id
# Derive profile-scoped project name so different profiles don't
# share server-side memory. Explicit RETAINDB_PROJECT always wins.
explicit_project = os.environ.get("RETAINDB_PROJECT")
if explicit_project:
self._project = explicit_project
else:
hermes_home = kwargs.get("hermes_home", "")
profile_name = os.path.basename(hermes_home) if hermes_home else ""
# Default profile (~/.hermes) → "hermes"; named profiles → "hermes-<name>"
if profile_name and profile_name != ".hermes":
self._project = f"hermes-{profile_name}"
else:
self._project = "hermes"
def system_prompt_block(self) -> str:
return (
"# RetainDB Memory\n"
f"Active. Project: {self._project}.\n"
"Use retaindb_search to find memories, retaindb_remember to store facts, "
"retaindb_profile for a user overview, retaindb_context for task-relevant context."
)
def prefetch(self, query: str, *, session_id: str = "") -> str:
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
return ""
return f"## RetainDB Memory\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
def _run():
try:
data = self._api("POST", "/v1/recall", json={
"project": self._project,
"query": query,
"user_id": self._user_id,
"top_k": 5,
})
results = data.get("results", [])
if results:
lines = [r.get("content", "") for r in results if r.get("content")]
with self._prefetch_lock:
self._prefetch_result = "\n".join(f"- {l}" for l in lines)
except Exception as e:
logger.debug("RetainDB prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="retaindb-prefetch")
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Ingest conversation turn in background (non-blocking)."""
def _sync():
try:
self._api("POST", "/v1/ingest", json={
"project": self._project,
"user_id": self._user_id,
"session_id": self._session_id,
"messages": [
{"role": "user", "content": user_content},
{"role": "assistant", "content": assistant_content},
],
})
except Exception as e:
logger.warning("RetainDB sync failed: %s", e)
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
self._sync_thread = threading.Thread(target=_sync, daemon=True, name="retaindb-sync")
self._sync_thread.start()
def get_tool_schemas(self) -> List[Dict[str, Any]]:
return [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, REMEMBER_SCHEMA, FORGET_SCHEMA]
def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
try:
if tool_name == "retaindb_profile":
data = self._api("GET", f"/v1/profile/{self._project}/{self._user_id}")
return json.dumps(data)
elif tool_name == "retaindb_search":
query = args.get("query", "")
if not query:
return json.dumps({"error": "query is required"})
data = self._api("POST", "/v1/search", json={
"project": self._project,
"user_id": self._user_id,
"query": query,
"top_k": min(int(args.get("top_k", 8)), 20),
})
return json.dumps(data)
elif tool_name == "retaindb_context":
query = args.get("query", "")
if not query:
return json.dumps({"error": "query is required"})
data = self._api("POST", "/v1/recall", json={
"project": self._project,
"user_id": self._user_id,
"query": query,
"top_k": 5,
})
return json.dumps(data)
elif tool_name == "retaindb_remember":
content = args.get("content", "")
if not content:
return json.dumps({"error": "content is required"})
data = self._api("POST", "/v1/remember", json={
"project": self._project,
"user_id": self._user_id,
"content": content,
"memory_type": args.get("memory_type", "fact"),
"importance": float(args.get("importance", 0.5)),
})
return json.dumps(data)
elif tool_name == "retaindb_forget":
memory_id = args.get("memory_id", "")
if not memory_id:
return json.dumps({"error": "memory_id is required"})
data = self._api("DELETE", f"/v1/memory/{memory_id}")
return json.dumps(data)
return json.dumps({"error": f"Unknown tool: {tool_name}"})
except Exception as e:
return json.dumps({"error": str(e)})
def on_memory_write(self, action: str, target: str, content: str) -> None:
if action == "add":
try:
self._api("POST", "/v1/remember", json={
"project": self._project,
"user_id": self._user_id,
"content": content,
"memory_type": "preference" if target == "user" else "fact",
})
except Exception as e:
logger.debug("RetainDB memory bridge failed: %s", e)
def shutdown(self) -> None:
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
def register(ctx) -> None:
"""Register RetainDB as a memory provider plugin."""
ctx.register_memory_provider(RetainDBMemoryProvider())

View File

@@ -0,0 +1,7 @@
name: retaindb
version: 1.0.0
description: "RetainDB — cloud memory API with hybrid search and 7 memory types."
pip_dependencies:
- requests
requires_env:
- RETAINDB_API_KEY

View File

@@ -76,7 +76,10 @@ all = [
"hermes-agent[modal]",
"hermes-agent[daytona]",
"hermes-agent[messaging]",
"hermes-agent[matrix]",
# matrix excluded: python-olm (required by matrix-nio[e2e]) is upstream-broken
# on modern macOS (archived libolm, C++ errors with Clang 21+). Including it
# here causes the entire [all] install to fail, dropping all other extras.
# Users who need Matrix can install manually: pip install 'hermes-agent[matrix]'
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[dev]",
@@ -102,7 +105,7 @@ hermes-acp = "acp_adapter.entry:main"
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "rl_cli", "utils"]
[tool.setuptools.packages.find]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "honcho_integration", "acp_adapter"]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
[tool.pytest.ini_options]
testpaths = ["tests"]

File diff suppressed because it is too large

View File

@@ -1,940 +0,0 @@
---
name: ml-paper-writing
description: Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verification workflows.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [semanticscholar, arxiv, habanero, requests]
metadata:
hermes:
tags: [Academic Writing, NeurIPS, ICML, ICLR, ACL, AAAI, COLM, LaTeX, Paper Writing, Citations, Research]
---
# ML Paper Writing for Top AI Conferences
Expert-level guidance for writing publication-ready papers targeting **NeurIPS, ICML, ICLR, ACL, AAAI, and COLM**. This skill combines writing philosophy from top researchers (Nanda, Farquhar, Karpathy, Lipton, Steinhardt) with practical tools: LaTeX templates, citation verification APIs, and conference checklists.
## Core Philosophy: Collaborative Writing
**Paper writing is collaborative, but Claude should be proactive in delivering drafts.**
The typical workflow starts with a research repository containing code, results, and experimental artifacts. Claude's role is to:
1. **Understand the project** by exploring the repo, results, and existing documentation
2. **Deliver a complete first draft** when confident about the contribution
3. **Search literature** using web search and APIs to find relevant citations
4. **Refine through feedback cycles** when the scientist provides input
5. **Ask for clarification** only when genuinely uncertain about key decisions
**Key Principle**: Be proactive. If the repo and results are clear, deliver a full draft. Don't block waiting for feedback on every section—scientists are busy. Produce something concrete they can react to, then iterate based on their response.
---
## ⚠️ CRITICAL: Never Hallucinate Citations
**This is the most important rule in academic writing with AI assistance.**
### The Problem
AI-generated citations have a **~40% error rate**. Hallucinated references—papers that don't exist, wrong authors, incorrect years, fabricated DOIs—are a serious form of academic misconduct that can result in desk rejection or retraction.
### The Rule
**NEVER generate BibTeX entries from memory. ALWAYS fetch programmatically.**
| Action | ✅ Correct | ❌ Wrong |
|--------|-----------|----------|
| Adding a citation | Search API → verify → fetch BibTeX | Write BibTeX from memory |
| Uncertain about a paper | Mark as `[CITATION NEEDED]` | Guess the reference |
| Can't find exact paper | Note: "placeholder - verify" | Invent similar-sounding paper |
### When You Can't Verify a Citation
If you cannot programmatically verify a citation, you MUST:
```latex
% EXPLICIT PLACEHOLDER - requires human verification
\cite{PLACEHOLDER_author2024_verify_this} % TODO: Verify this citation exists
```
**Always tell the scientist**: "I've marked [X] citations as placeholders that need verification. I could not confirm these papers exist."
### Recommended: Install Exa MCP for Paper Search
For the best paper search experience, install **Exa MCP** which provides real-time academic search:
**Claude Code:**
```bash
claude mcp add exa -- npx -y mcp-remote "https://mcp.exa.ai/mcp"
```
**Cursor / VS Code** (add to MCP settings):
```json
{
"mcpServers": {
"exa": {
"type": "http",
"url": "https://mcp.exa.ai/mcp"
}
}
}
```
Exa MCP enables searches like:
- "Find papers on RLHF for language models published after 2023"
- "Search for transformer architecture papers by Vaswani"
- "Get recent work on sparse autoencoders for interpretability"
Then verify results with Semantic Scholar API and fetch BibTeX via DOI.
---
## Workflow 0: Starting from a Research Repository
When beginning paper writing, start by understanding the project:
```
Project Understanding:
- [ ] Step 1: Explore the repository structure
- [ ] Step 2: Read README, existing docs, and key results
- [ ] Step 3: Identify the main contribution with the scientist
- [ ] Step 4: Find papers already cited in the codebase
- [ ] Step 5: Search for additional relevant literature
- [ ] Step 6: Outline the paper structure together
- [ ] Step 7: Draft sections iteratively with feedback
```
**Step 1: Explore the Repository**
```bash
# Understand project structure
ls -la
find . -name "*.py" | head -20
find . -name "*.md" -o -name "*.txt" | xargs grep -l -i "result\|conclusion\|finding"
```
Look for:
- `README.md` - Project overview and claims
- `results/`, `outputs/`, `experiments/` - Key findings
- `configs/` - Experimental settings
- Existing `.bib` files or citation references
- Any draft documents or notes
**Step 2: Identify Existing Citations**
Check for papers already referenced in the codebase:
```bash
# Find existing citations
grep -r "arxiv\|doi\|cite" --include="*.md" --include="*.bib" --include="*.py"
find . -name "*.bib"
```
These are high-signal starting points for Related Work—the scientist has already deemed them relevant.
**Step 3: Clarify the Contribution**
Before writing, explicitly confirm with the scientist:
> "Based on my understanding of the repo, the main contribution appears to be [X].
> The key results show [Y]. Is this the framing you want for the paper,
> or should we emphasize different aspects?"
**Never assume the narrative—always verify with the human.**
**Step 4: Search for Additional Literature**
Use web search to find relevant papers:
```
Search queries to try:
- "[main technique] + [application domain]"
- "[baseline method] comparison"
- "[problem name] state-of-the-art"
- Author names from existing citations
```
Then verify and retrieve BibTeX using the citation workflow below.
**Step 5: Deliver a First Draft**
**Be proactive—deliver a complete draft rather than asking permission for each section.**
If the repo provides clear results and the contribution is apparent:
1. Write the full first draft end-to-end
2. Present the complete draft for feedback
3. Iterate based on scientist's response
If genuinely uncertain about framing or major claims:
1. Draft what you can confidently
2. Flag specific uncertainties: "I framed X as the main contribution—let me know if you'd prefer to emphasize Y instead"
3. Continue with the draft rather than blocking
**Questions to include with the draft** (not before):
- "I emphasized X as the main contribution—adjust if needed"
- "I highlighted results A, B, C—let me know if others are more important"
- "Related work section includes [papers]—add any I missed"
---
## When to Use This Skill
Use this skill when:
- **Starting from a research repo** to write a paper
- **Drafting or revising** specific sections
- **Finding and verifying citations** for related work
- **Formatting** for conference submission
- **Resubmitting** to a different venue (format conversion)
- **Iterating** on drafts with scientist feedback
**Always remember**: First drafts are starting points for discussion, not final outputs.
---
## Balancing Proactivity and Collaboration
**Default: Be proactive. Deliver drafts, then iterate.**
| Confidence Level | Action |
|-----------------|--------|
| **High** (clear repo, obvious contribution) | Write full draft, deliver, iterate on feedback |
| **Medium** (some ambiguity) | Write draft with flagged uncertainties, continue |
| **Low** (major unknowns) | Ask 1-2 targeted questions, then draft |
**Draft first, ask with the draft** (not before):
| Section | Draft Autonomously | Flag With Draft |
|---------|-------------------|-----------------|
| Abstract | Yes | "Framed contribution as X—adjust if needed" |
| Introduction | Yes | "Emphasized problem Y—correct if wrong" |
| Methods | Yes | "Included details A, B, C—add missing pieces" |
| Experiments | Yes | "Highlighted results 1, 2, 3—reorder if needed" |
| Related Work | Yes | "Cited papers X, Y, Z—add any I missed" |
**Only block for input when:**
- Target venue is unclear (affects page limits, framing)
- Multiple contradictory framings seem equally valid
- Results seem incomplete or inconsistent
- Explicit request to review before continuing
**Don't block for:**
- Word choice decisions
- Section ordering
- Which specific results to show (make a choice, flag it)
- Citation completeness (draft with what you find, note gaps)
---
## The Narrative Principle
**The single most critical insight**: Your paper is not a collection of experiments—it's a story with one clear contribution supported by evidence.
Every successful ML paper centers on what Neel Nanda calls "the narrative": a short, rigorous, evidence-based technical story with a takeaway readers care about.
**Three Pillars (must be crystal clear by end of introduction):**
| Pillar | Description | Example |
|--------|-------------|---------|
| **The What** | 1-3 specific novel claims within cohesive theme | "We prove that X achieves Y under condition Z" |
| **The Why** | Rigorous empirical evidence supporting claims | Strong baselines, experiments distinguishing hypotheses |
| **The So What** | Why readers should care | Connection to recognized community problems |
**If you cannot state your contribution in one sentence, you don't yet have a paper.**
---
## Paper Structure Workflow
### Workflow 1: Writing a Complete Paper (Iterative)
Copy this checklist and track progress. **Each step involves drafting → feedback → revision:**
```
Paper Writing Progress:
- [ ] Step 1: Define the one-sentence contribution (with scientist)
- [ ] Step 2: Draft Figure 1 → get feedback → revise
- [ ] Step 3: Draft abstract → get feedback → revise
- [ ] Step 4: Draft introduction → get feedback → revise
- [ ] Step 5: Draft methods → get feedback → revise
- [ ] Step 6: Draft experiments → get feedback → revise
- [ ] Step 7: Draft related work → get feedback → revise
- [ ] Step 8: Draft limitations → get feedback → revise
- [ ] Step 9: Complete paper checklist (required)
- [ ] Step 10: Final review cycle and submission
```
**Step 1: Define the One-Sentence Contribution**
**This step requires explicit confirmation from the scientist.**
Before writing anything, articulate and verify:
- What is the single thing your paper contributes?
- What was not obvious or present before your work?
> "I propose framing the contribution as: '[one sentence]'. Does this capture
> what you see as the main takeaway? Should we adjust the emphasis?"
**Step 2: Draft Figure 1**
Figure 1 deserves special attention—many readers skip directly to it.
- Convey core idea, approach, or most compelling result
- Use vector graphics (PDF/EPS for plots)
- Write captions that stand alone without main text
- Ensure readability in black-and-white (8% of men have color vision deficiency)
**Step 3: Write Abstract (5-Sentence Formula)**
From Sebastian Farquhar (DeepMind):
```
1. What you achieved: "We introduce...", "We prove...", "We demonstrate..."
2. Why this is hard and important
3. How you do it (with specialist keywords for discoverability)
4. What evidence you have
5. Your most remarkable number/result
```
**Delete** generic openings like "Large language models have achieved remarkable success..."
**Step 4: Write Introduction (1-1.5 pages max)**
Must include:
- 2-4 bullet contribution list (max 1-2 lines each in two-column format)
- Clear problem statement
- Brief approach overview
- Methods should start by page 2-3 maximum
**Step 5: Methods Section**
Enable reimplementation:
- Conceptual outline or pseudocode
- All hyperparameters listed
- Architectural details sufficient for reproduction
- Present final design decisions; ablations go in experiments
**Step 6: Experiments Section**
For each experiment, explicitly state:
- What claim it supports
- How it connects to main contribution
- Experimental setting (details in appendix)
- What to observe: "the blue line shows X, which demonstrates Y"
Requirements:
- Error bars with methodology (standard deviation vs standard error)
- Hyperparameter search ranges
- Compute infrastructure (GPU type, total hours)
- Seed-setting methods
**Step 7: Related Work**
Organize methodologically, not paper-by-paper:
**Good:** "One line of work uses Floogledoodle's assumption [refs] whereas we use Doobersnoddle's assumption because..."
**Bad:** "Snap et al. introduced X while Crackle et al. introduced Y."
Cite generously—reviewers likely authored relevant papers.
**Step 8: Limitations Section (REQUIRED)**
All major conferences require this. Counter-intuitively, honesty helps:
- Reviewers are instructed not to penalize honest limitation acknowledgment
- Pre-empt criticisms by identifying weaknesses first
- Explain why limitations don't undermine core claims
**Step 9: Paper Checklist**
NeurIPS, ICML, and ICLR all require paper checklists. See [references/checklists.md](references/checklists.md).
---
## Writing Philosophy for Top ML Conferences
**This section distills the most important writing principles from leading ML researchers.** These aren't optional style suggestions—they're what separates accepted papers from rejected ones.
> "A paper is a short, rigorous, evidence-based technical story with a takeaway readers care about." — Neel Nanda
### The Sources Behind This Guidance
This skill synthesizes writing philosophy from researchers who have published extensively at top venues:
| Source | Key Contribution | Link |
|--------|-----------------|------|
| **Neel Nanda** (Google DeepMind) | The Narrative Principle, What/Why/So What framework | [How to Write ML Papers](https://www.alignmentforum.org/posts/eJGptPbbFPZGLpjsp/highly-opinionated-advice-on-how-to-write-ml-papers) |
| **Sebastian Farquhar** (DeepMind) | 5-sentence abstract formula | [How to Write ML Papers](https://sebastianfarquhar.com/on-research/2024/11/04/how_to_write_ml_papers/) |
| **Gopen & Swan** | 7 principles of reader expectations | [Science of Scientific Writing](https://cseweb.ucsd.edu/~swanson/papers/science-of-writing.pdf) |
| **Zachary Lipton** | Word choice, eliminating hedging | [Heuristics for Scientific Writing](https://www.approximatelycorrect.com/2018/01/29/heuristics-technical-scientific-writing-machine-learning-perspective/) |
| **Jacob Steinhardt** (UC Berkeley) | Precision, consistent terminology | [Writing Tips](https://bounded-regret.ghost.io/) |
| **Ethan Perez** (Anthropic) | Micro-level clarity tips | [Easy Paper Writing Tips](https://ethanperez.net/easy-paper-writing-tips/) |
| **Andrej Karpathy** | Single contribution focus | Various lectures |
**For deeper dives into any of these, see:**
- [references/writing-guide.md](references/writing-guide.md) - Full explanations with examples
- [references/sources.md](references/sources.md) - Complete bibliography
### Time Allocation (From Neel Nanda)
Spend approximately **equal time** on each of:
1. The abstract
2. The introduction
3. The figures
4. Everything else combined
**Why?** Most reviewers form judgments before reaching your methods. Readers encounter your paper as: **title → abstract → introduction → figures → maybe the rest.**
### Writing Style Guidelines
#### Sentence-Level Clarity (Gopen & Swan's 7 Principles)
These principles are based on how readers actually process prose. Violating them forces readers to spend cognitive effort on structure rather than content.
| Principle | Rule | Example |
|-----------|------|---------|
| **Subject-verb proximity** | Keep subject and verb close | ❌ "The model, which was trained on..., achieves" → ✅ "The model achieves... after training on..." |
| **Stress position** | Place emphasis at sentence ends | ❌ "Accuracy improves by 15% when using attention" → ✅ "When using attention, accuracy improves by **15%**" |
| **Topic position** | Put context first, new info after | ✅ "Given these constraints, we propose..." |
| **Old before new** | Familiar info → unfamiliar info | Link backward, then introduce new |
| **One unit, one function** | Each paragraph makes one point | Split multi-point paragraphs |
| **Action in verb** | Use verbs, not nominalizations | ❌ "We performed an analysis" → ✅ "We analyzed" |
| **Context before new** | Set stage before presenting | Explain before showing equation |
**Full 7 principles with detailed examples:** See [references/writing-guide.md](references/writing-guide.md#the-7-principles-of-reader-expectations)
#### Micro-Level Tips (Ethan Perez)
These small changes accumulate into significantly clearer prose:
- **Minimize pronouns**: ❌ "This shows..." → ✅ "This result shows..."
- **Verbs early**: Position verbs near sentence start
- **Unfold apostrophes**: ❌ "X's Y" → ✅ "The Y of X" (when awkward)
- **Delete filler words**: "actually," "a bit," "very," "really," "basically," "quite," "essentially"
**Full micro-tips with examples:** See [references/writing-guide.md](references/writing-guide.md#micro-level-writing-tips)
#### Word Choice (Zachary Lipton)
- **Be specific**: ❌ "performance" → ✅ "accuracy" or "latency" (say what you mean)
- **Eliminate hedging**: Drop "may" and "can" unless genuinely uncertain
- **Avoid incremental vocabulary**: ❌ "combine," "modify," "expand" → ✅ "develop," "propose," "introduce"
- **Delete intensifiers**: ❌ "provides *very* tight approximation" → ✅ "provides tight approximation"
#### Precision Over Brevity (Jacob Steinhardt)
- **Consistent terminology**: Different terms for same concept creates confusion. Pick one and stick with it.
- **State assumptions formally**: Before theorems, list all assumptions explicitly
- **Intuition + rigor**: Provide intuitive explanations alongside formal proofs
### What Reviewers Actually Read
Understanding reviewer behavior helps prioritize your effort:
| Paper Section | % Reviewers Who Read | Implication |
|---------------|---------------------|-------------|
| Abstract | 100% | Must be perfect |
| Introduction | 90%+ (skimmed) | Front-load contribution |
| Figures | Examined before methods | Figure 1 is critical |
| Methods | Only if interested | Don't bury the lede |
| Appendix | Rarely | Put only supplementary details |
**Bottom line**: If your abstract and intro don't hook reviewers, they may never read your brilliant methods section.
---
## Conference Requirements Quick Reference
| Conference | Page Limit | Extra for Camera-Ready | Key Requirement |
|------------|------------|------------------------|-----------------|
| **NeurIPS 2025** | 9 pages | +0 | Mandatory checklist, lay summary for accepted |
| **ICML 2026** | 8 pages | +1 | Broader Impact Statement required |
| **ICLR 2026** | 9 pages | +1 | LLM disclosure required, reciprocal reviewing |
| **ACL 2025** | 8 pages (long) | varies | Limitations section mandatory |
| **AAAI 2026** | 7 pages | +1 | Strict style file adherence |
| **COLM 2025** | 9 pages | +1 | Focus on language models |
**Universal Requirements:**
- Double-blind review (anonymize submissions)
- References don't count toward page limit
- Appendices unlimited but reviewers not required to read
- LaTeX required for all venues
**LaTeX Templates:** See [templates/](templates/) directory for all conference templates.
---
## Using LaTeX Templates Properly
### Workflow 4: Starting a New Paper from Template
**Always copy the entire template directory first, then write within it.**
```
Template Setup Checklist:
- [ ] Step 1: Copy entire template directory to new project
- [ ] Step 2: Verify template compiles as-is (before any changes)
- [ ] Step 3: Read the template's example content to understand structure
- [ ] Step 4: Replace example content section by section
- [ ] Step 5: Keep template comments/examples as reference until done
- [ ] Step 6: Clean up template artifacts only at the end
```
**Step 1: Copy the Full Template**
```bash
# Create your paper directory with the complete template
cp -r templates/neurips2025/ ~/papers/my-new-paper/
cd ~/papers/my-new-paper/
# Verify structure is complete
ls -la
# Should see: main.tex, neurips.sty, Makefile, etc.
```
**⚠️ IMPORTANT**: Copy the ENTIRE directory, not just `main.tex`. Templates include:
- Style files (`.sty`) - required for compilation
- Bibliography styles (`.bst`) - required for references
- Example content - useful as reference
- Makefiles - for easy compilation
**Step 2: Verify Template Compiles First**
Before making ANY changes, compile the template as-is:
```bash
# Using latexmk (recommended)
latexmk -pdf main.tex
# Or manual compilation
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex
```
If the unmodified template doesn't compile, fix that first. Common issues:
- Missing TeX packages → install via `tlmgr install <package>`
- Wrong TeX distribution → use TeX Live (recommended)
**Step 3: Keep Template Content as Reference**
Don't immediately delete all example content. Instead:
```latex
% KEEP template examples commented out as you write
% This shows you the expected format
% Template example (keep for reference):
% \begin{figure}[t]
% \centering
% \includegraphics[width=0.8\linewidth]{example-image}
% \caption{Template shows caption style}
% \end{figure}
% Your actual figure:
\begin{figure}[t]
\centering
\includegraphics[width=0.8\linewidth]{your-figure.pdf}
\caption{Your caption following the same style.}
\end{figure}
```
**Step 4: Replace Content Section by Section**
Work through the paper systematically:
```
Replacement Order:
1. Title and authors (anonymize for submission)
2. Abstract
3. Introduction
4. Methods
5. Experiments
6. Related Work
7. Conclusion
8. References (your .bib file)
9. Appendix
```
For each section:
1. Read the template's example content
2. Note any special formatting or macros used
3. Replace with your content following the same patterns
4. Compile frequently to catch errors early
**Step 5: Use Template Macros**
Templates often define useful macros. Check the preamble for:
```latex
% Common template macros to use:
\newcommand{\method}{YourMethodName} % Consistent method naming
\newcommand{\eg}{e.g.,\xspace} % Proper abbreviations
\newcommand{\ie}{i.e.,\xspace}
\newcommand{\etal}{\textit{et al.}\xspace}
```
**Step 6: Clean Up Only at the End**
Only remove template artifacts when paper is nearly complete:
```latex
% BEFORE SUBMISSION - remove these:
% - Commented-out template examples
% - Unused packages
% - Template's example figures/tables
% - Lorem ipsum or placeholder text
% KEEP these:
% - All style files (.sty)
% - Bibliography style (.bst)
% - Required packages from template
% - Any custom macros you're using
```
### Template Pitfalls to Avoid
| Pitfall | Problem | Solution |
|---------|---------|----------|
| Copying only `main.tex` | Missing `.sty`, won't compile | Copy entire directory |
| Modifying `.sty` files | Breaks conference formatting | Never edit style files |
| Adding random packages | Conflicts, breaks template | Only add if necessary |
| Deleting template content too early | Lose formatting reference | Keep as comments until done |
| Not compiling frequently | Errors accumulate | Compile after each section |
### Quick Template Reference
| Conference | Main File | Key Style File | Notes |
|------------|-----------|----------------|-------|
| NeurIPS 2025 | `main.tex` | `neurips.sty` | Has Makefile |
| ICML 2026 | `example_paper.tex` | `icml2026.sty` | Includes algorithm packages |
| ICLR 2026 | `iclr2026_conference.tex` | `iclr2026_conference.sty` | Has math_commands.tex |
| ACL | `acl_latex.tex` | `acl.sty` | Strict formatting |
| AAAI 2026 | `aaai2026-unified-template.tex` | `aaai2026.sty` | Very strict compliance |
| COLM 2025 | `colm2025_conference.tex` | `colm2025_conference.sty` | Similar to ICLR |
---
## Conference Resubmission & Format Conversion
When a paper is rejected or withdrawn from one venue and resubmitted to another, format conversion is required. This is a common workflow in ML research.
### Workflow 3: Converting Between Conference Formats
```
Format Conversion Checklist:
- [ ] Step 1: Identify source and target template differences
- [ ] Step 2: Create new project with target template
- [ ] Step 3: Copy content sections (not preamble)
- [ ] Step 4: Adjust page limits and content
- [ ] Step 5: Update conference-specific requirements
- [ ] Step 6: Verify compilation and formatting
```
**Step 1: Key Template Differences**
| From → To | Page Change | Key Adjustments |
|-----------|-------------|-----------------|
| NeurIPS → ICML | 9 → 8 pages | Cut 1 page, add Broader Impact if missing |
| ICML → ICLR | 8 → 9 pages | Can expand experiments, add LLM disclosure |
| NeurIPS → ACL | 9 → 8 pages | Restructure for NLP conventions, add Limitations |
| ICLR → AAAI | 9 → 7 pages | Significant cuts needed, strict style adherence |
| Any → COLM | varies → 9 | Reframe for language model focus |
**Step 2: Content Migration (NOT Template Merge)**
**Never copy LaTeX preambles between templates.** Instead:
```bash
# 1. Start fresh with target template
cp -r templates/icml2026/ new_submission/
# 2. Copy ONLY content sections from old paper
# - Abstract text
# - Section content (between \section{} commands)
# - Figures and tables
# - Bibliography entries
# 3. Paste into target template structure
```
**Step 3: Adjusting for Page Limits**
When cutting pages (e.g., NeurIPS 9 → AAAI 7):
- Move detailed proofs to appendix
- Condense related work (cite surveys instead of individual papers)
- Combine similar experiments into unified tables
- Use smaller figure sizes with subfigures
- Tighten writing: eliminate redundancy, use active voice
When expanding (e.g., ICML 8 → ICLR 9):
- Add ablation studies reviewers requested
- Expand limitations discussion
- Include additional baselines
- Add qualitative examples
**Step 4: Conference-Specific Adjustments**
| Target Venue | Required Additions |
|--------------|-------------------|
| **ICML** | Broader Impact Statement (after conclusion) |
| **ICLR** | LLM usage disclosure, reciprocal reviewing agreement |
| **ACL/EMNLP** | Limitations section (mandatory), Ethics Statement |
| **AAAI** | Strict adherence to style file (no modifications) |
| **NeurIPS** | Paper checklist (appendix), lay summary if accepted |
**Step 5: Update References**
```latex
% Remove self-citations that reveal identity (for blind review)
% Update any "under review" citations to published versions
% Add new relevant work published since last submission
```
**Step 6: Addressing Previous Reviews**
When resubmitting after rejection:
- **Do** address reviewer concerns in the new version
- **Do** add experiments/clarifications reviewers requested
- **Don't** include a "changes from previous submission" section (blind review)
- **Don't** reference the previous submission or reviews
**Common Conversion Pitfalls:**
- ❌ Copying `\usepackage` commands (causes conflicts)
- ❌ Keeping old conference header/footer commands
- ❌ Forgetting to update `\bibliography{}` path
- ❌ Missing conference-specific required sections
- ❌ Exceeding page limit after format change
---
## Citation Workflow (Hallucination Prevention)
**⚠️ CRITICAL**: AI-generated citations have ~40% error rate. **Never write BibTeX from memory.**
### The Golden Rule
```
IF you cannot programmatically fetch a citation:
→ Mark it as [CITATION NEEDED] or [PLACEHOLDER - VERIFY]
→ Tell the scientist explicitly
→ NEVER invent a plausible-sounding reference
```
### Workflow 2: Adding Citations
```
Citation Verification (MANDATORY for every citation):
- [ ] Step 1: Search using Exa MCP or Semantic Scholar API
- [ ] Step 2: Verify paper exists in 2+ sources (Semantic Scholar + arXiv/CrossRef)
- [ ] Step 3: Retrieve BibTeX via DOI (programmatically, not from memory)
- [ ] Step 4: Verify the claim you're citing actually appears in the paper
- [ ] Step 5: Add verified BibTeX to bibliography
- [ ] Step 6: If ANY step fails → mark as placeholder, inform scientist
```
**Step 0: Use Exa MCP for Initial Search (Recommended)**
If Exa MCP is installed, use it to find relevant papers:
```
Search: "RLHF language model alignment 2023"
Search: "sparse autoencoders interpretability"
Search: "attention mechanism transformers Vaswani"
```
Then verify each result with Semantic Scholar and fetch BibTeX via DOI.
**Step 1: Search Semantic Scholar**
```python
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper("attention mechanism transformers", limit=5)
for paper in results:
print(f"{paper.title} - {paper.paperId}")
print(f" DOI: {paper.externalIds.get('DOI', 'N/A')}")
```
**Step 2: Verify Existence**
Confirm paper appears in at least two sources (Semantic Scholar + CrossRef/arXiv).
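A minimal two-source check, as a sketch (the Semantic Scholar Graph API and arXiv export API are real endpoints; error handling is elided):
```python
import requests

def on_semantic_scholar(paper_id: str) -> bool:
    """paper_id may be 'ARXIV:1706.03762' or 'DOI:...'; 404 means not found."""
    r = requests.get(
        f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}",
        params={"fields": "title,year,externalIds"},
        timeout=15,
    )
    return r.status_code == 200

def on_arxiv(arxiv_id: str) -> bool:
    """Rough check; production code should parse the Atom feed, since
    arXiv returns an error entry (not a 404) for unknown IDs."""
    r = requests.get(
        "http://export.arxiv.org/api/query",
        params={"id_list": arxiv_id},
        timeout=15,
    )
    return r.status_code == 200 and "<entry>" in r.text

# Trust a citation only when two independent sources agree.
if on_semantic_scholar("ARXIV:1706.03762") and on_arxiv("1706.03762"):
    print("verified in two sources")
else:
    print("mark as [CITATION NEEDED]")
```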
**Step 3: Retrieve BibTeX via DOI**
```python
import requests
def doi_to_bibtex(doi: str) -> str:
"""Get verified BibTeX from DOI via CrossRef."""
response = requests.get(
f"https://doi.org/{doi}",
headers={"Accept": "application/x-bibtex"}
)
response.raise_for_status()
return response.text
# Example
bibtex = doi_to_bibtex("10.48550/arXiv.1706.03762")
print(bibtex)
```
**Step 4: Verify Claims**
Before citing for a specific claim, access the paper and confirm the attributed claim actually appears.
**Step 5: Handle Failures Explicitly**
If you cannot verify a citation at ANY step:
```latex
% Option 1: Explicit placeholder
\cite{PLACEHOLDER_smith2023_verify} % TODO: Could not verify - scientist must confirm
% Option 2: Note in text
... as shown in prior work [CITATION NEEDED - could not verify Smith et al. 2023].
```
**Always inform the scientist:**
> "I could not verify the following citations and have marked them as placeholders:
> - Smith et al. 2023 on reward hacking - could not find in Semantic Scholar
> - Jones 2022 on scaling laws - found similar paper but different authors
> Please verify these before submission."
### Summary: Citation Rules
| Situation | Action |
|-----------|--------|
| Found paper, got DOI, fetched BibTeX | ✅ Use the citation |
| Found paper, no DOI | ✅ Use arXiv BibTeX or manual entry from paper |
| Paper exists but can't fetch BibTeX | ⚠️ Mark placeholder, inform scientist |
| Uncertain if paper exists | ❌ Mark `[CITATION NEEDED]`, inform scientist |
| "I think there's a paper about X" | ❌ **NEVER cite** - search first or mark placeholder |
**🚨 NEVER generate BibTeX from memory—always fetch programmatically. 🚨**
See [references/citation-workflow.md](references/citation-workflow.md) for complete API documentation.
---
## Common Issues and Solutions
**Issue: Abstract too generic**
Delete first sentence if it could be prepended to any ML paper. Start with your specific contribution.
**Issue: Introduction exceeds 1.5 pages**
Split background into Related Work. Front-load contribution bullets. Methods should start by page 2-3.
**Issue: Experiments lack explicit claims**
Add sentence before each experiment: "This experiment tests whether [specific claim]..."
**Issue: Reviewers find paper hard to follow**
- Add explicit signposting: "In this section, we show X"
- Use consistent terminology throughout
- Include figure captions that stand alone
**Issue: Missing statistical significance**
Always include:
- Error bars (specify: std dev or std error)
- Number of runs
- Statistical tests if comparing methods
---
## Reviewer Evaluation Criteria
Reviewers assess papers on four dimensions:
| Criterion | What Reviewers Look For |
|-----------|------------------------|
| **Quality** | Technical soundness, well-supported claims |
| **Clarity** | Clear writing, reproducible by experts |
| **Significance** | Community impact, advances understanding |
| **Originality** | New insights (doesn't require new method) |
**Scoring (NeurIPS 6-point scale):**
- 6: Strong Accept - Groundbreaking, flawless
- 5: Accept - Technically solid, high impact
- 4: Borderline Accept - Solid, limited evaluation
- 3: Borderline Reject - Solid but weaknesses outweigh
- 2: Reject - Technical flaws
- 1: Strong Reject - Known results or ethics issues
See [references/reviewer-guidelines.md](references/reviewer-guidelines.md) for detailed reviewer instructions.
---
## Tables and Figures
### Tables
Use `booktabs` LaTeX package for professional tables:
```latex
\usepackage{booktabs}
\begin{tabular}{lcc}
\toprule
Method & Accuracy ↑ & Latency ↓ \\
\midrule
Baseline & 85.2 & 45ms \\
\textbf{Ours} & \textbf{92.1} & 38ms \\
\bottomrule
\end{tabular}
```
**Rules:**
- Bold best value per metric
- Include direction symbols (↑ higher is better, ↓ lower is better)
- Right-align numerical columns
- Consistent decimal precision
### Figures
- **Vector graphics** (PDF, EPS) for all plots and diagrams
- **Raster** (PNG 600 DPI) only for photographs
- Use **colorblind-safe palettes** (Okabe-Ito or Paul Tol)
- Verify **grayscale readability** (8% of men have color vision deficiency)
- **No title inside figure**—the caption serves this function
- **Self-contained captions**—reader should understand without main text
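A minimal matplotlib sketch that follows these rules (the Okabe-Ito hex values are standard; everything else is illustrative):
```python
import matplotlib.pyplot as plt

# Okabe-Ito colorblind-safe palette.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

fig, ax = plt.subplots(figsize=(4, 3))
for i, (label, ys) in enumerate([("baseline", [1, 2, 3]),
                                 ("ours", [1, 3, 5])]):
    # Vary line style as well as color so the plot survives grayscale.
    ax.plot([1, 2, 3], ys, color=OKABE_ITO[i],
            linestyle=["-", "--"][i], label=label)
ax.set_xlabel("epoch")
ax.set_ylabel("accuracy")
ax.legend()
fig.tight_layout()
fig.savefig("figure1.pdf")  # vector output; no title inside the figure
```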
---
## References & Resources
### Reference Documents (Deep Dives)
| Document | Contents |
|----------|----------|
| [writing-guide.md](references/writing-guide.md) | Gopen & Swan 7 principles, Ethan Perez micro-tips, word choice |
| [citation-workflow.md](references/citation-workflow.md) | Citation APIs, Python code, BibTeX management |
| [checklists.md](references/checklists.md) | NeurIPS 16-item, ICML, ICLR, ACL requirements |
| [reviewer-guidelines.md](references/reviewer-guidelines.md) | Evaluation criteria, scoring, rebuttals |
| [sources.md](references/sources.md) | Complete bibliography of all sources |
### LaTeX Templates
Templates in `templates/` directory: **ICML 2026**, **ICLR 2026**, **NeurIPS 2025**, **ACL/EMNLP**, **AAAI 2026**, **COLM 2025**.
**Compiling to PDF:**
- **VS Code/Cursor**: Install LaTeX Workshop extension + TeX Live → Save to auto-compile
- **Command line**: `latexmk -pdf main.tex` or `pdflatex` + `bibtex` workflow
- **Online**: Upload to [Overleaf](https://overleaf.com)
See [templates/README.md](templates/README.md) for detailed setup instructions.
### Key External Sources
**Writing Philosophy:**
- [Neel Nanda: How to Write ML Papers](https://www.alignmentforum.org/posts/eJGptPbbFPZGLpjsp/highly-opinionated-advice-on-how-to-write-ml-papers) - Narrative, "What/Why/So What"
- [Farquhar: How to Write ML Papers](https://sebastianfarquhar.com/on-research/2024/11/04/how_to_write_ml_papers/) - 5-sentence abstract
- [Gopen & Swan: Science of Scientific Writing](https://cseweb.ucsd.edu/~swanson/papers/science-of-writing.pdf) - 7 reader expectation principles
- [Lipton: Heuristics for Scientific Writing](https://www.approximatelycorrect.com/2018/01/29/heuristics-technical-scientific-writing-machine-learning-perspective/) - Word choice
- [Perez: Easy Paper Writing Tips](https://ethanperez.net/easy-paper-writing-tips/) - Micro-level clarity
**APIs:** [Semantic Scholar](https://api.semanticscholar.org/api-docs/) | [CrossRef](https://www.crossref.org/documentation/retrieve-metadata/rest-api/) | [arXiv](https://info.arxiv.org/help/api/basics.html)
**Venues:** [NeurIPS](https://neurips.cc/Conferences/2025/PaperInformation/StyleFiles) | [ICML](https://icml.cc/Conferences/2025/AuthorInstructions) | [ICLR](https://iclr.cc/Conferences/2026/AuthorGuide) | [ACL](https://github.com/acl-org/acl-style-files)

File diff suppressed because it is too large

View File

@@ -0,0 +1,394 @@
# Autoreason: Iterative Refinement Methodology
Complete reference for the autoreason iterative refinement method, derived from experimental results across subjective writing tasks, competitive programming, and four model tiers. Use this when any output (paper draft, experiment script, analysis, task definition) needs iterative improvement.
**Source**: [NousResearch/autoreason](https://github.com/NousResearch/autoreason) — "Autoreason: When Iterative LLM Refinement Works and Why It Fails"
---
## Strategy Selection Guide
### Decision Tree
```
Is the task objectively verifiable (code, math, factual)?
├── YES → Does the model solve it on the first attempt?
│ ├── YES → Use single pass (no refinement needed)
│ └── NO → Use autoreason (structured analysis → reason-informed revision)
└── NO (subjective) → What model tier are you using?
├── Weak (Llama 8B, small models)
│ → Single pass. Model too weak for refinement to help.
│ Invest in generation quality, not iteration.
├── Mid-tier (Haiku 3.5, Gemini Flash)
│ → Autoreason with stronger judges. This is the sweet spot.
│ Self-refinement DESTROYS weak model outputs — autoreason prevents this.
├── Strong (Sonnet 4)
│ → Autoreason for open-ended tasks. Wins 3/5.
│ Critique-and-revise for concrete technical tasks (2/5).
└── Frontier (Sonnet 4.6, Opus)
├── Constrained scope? → Autoreason. Wins 2/3 constrained tasks.
└── Unconstrained? → Critique-and-revise or single pass.
Autoreason FAILS on unconstrained frontier tasks (comes last).
```
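The same tree, encoded as a small Python helper for convenience (a sketch; the tier labels follow the tree above):
```python
def pick_strategy(verifiable: bool, solved_first_try: bool,
                  tier: str, constrained: bool) -> str:
    """tier is one of: 'weak', 'mid', 'strong', 'frontier'."""
    if verifiable:
        return "single pass" if solved_first_try else "autoreason"
    if tier == "weak":
        return "single pass"  # too weak for refinement to help
    if tier == "mid":
        return "autoreason with stronger judges"
    if tier == "strong":
        return "autoreason (open-ended) or critique-and-revise (technical)"
    # frontier
    return "autoreason" if constrained else "critique-and-revise or single pass"

print(pick_strategy(verifiable=False, solved_first_try=False,
                    tier="mid", constrained=True))
```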
### Strategy Comparison Table
| Strategy | Best For | Avoid When | Compute (per iteration) |
|----------|----------|------------|------------------------|
| **Single pass** | Frontier models, template tasks, tight budgets | Mid-tier models where quality ceiling is low | 1 call |
| **Critique-and-revise** | Concrete technical requirements (system design, specifications) | Weak models (degrades output), unconstrained subjective tasks | 2 calls |
| **Autoreason** | Mid-tier models, constrained scope, tasks with genuine tradeoffs | Weak models (Llama 8B), frontier + unconstrained | ~6 calls |
| **Best-of-N** | Almost never recommended | Weak models especially — worse than single pass | N calls |
### Why Each Strategy Fails
| Strategy | Failure Mode | Mechanism |
|----------|-------------|-----------|
| **Single pass** | Quality ceiling | No mechanism to improve beyond first attempt |
| **Critique-and-revise** | Progressive degradation | Model hallucinates problems (sycophancy), scope creeps each pass, never declines to change |
| **Best-of-N** | Random selection | Without good ranking signal, more samples = more mediocre options |
| **Autoreason (unconstrained)** | Synthesis drift | Stronger models produce syntheses so consistently preferred that incumbent never stabilizes |
---
## The Autoreason Loop
### Architecture
```
┌──────────────────────────────────────────────────────────┐
│ ITERATION LOOP │
│ │
│ Incumbent A ──► Critic ──► Author B ──► Synthesizer │
│ │ │ │
│ │ ┌───────────────────────┘ │
│ ▼ ▼ │
│ [A] [AB] [B] │
│ │ │ │ │
│ └──────────────┼────────────┘ │
│ ▼ │
│ Judge Panel (blind) │
│ │ │
│ ▼ │
│ Winner │
│ │ │
│ ┌───────┴───────┐ │
│ ▼ ▼ │
│ A wins k=2 B or AB wins │
│ consecutive? → new incumbent │
│ │ │
│ ▼ │
│ CONVERGED │
└──────────────────────────────────────────────────────────┘
```
### Roles
Every role is a **fresh, isolated agent** with no shared context:
| Role | Input | Output | Key Rule |
|------|-------|--------|----------|
| **Critic** | Task + Incumbent A | List of problems | Find problems ONLY. No fixes. No suggestions. |
| **Author B** | Task + A + Critique | Revised version B | Address each criticism. State which problem each change fixes. |
| **Synthesizer** | Task + X + Y (randomized labels) | Synthesis AB | Take strongest elements of each. Not a compromise. |
| **Judge Panel** | Task + A, AB, B (randomized labels + order) | Ranking | Rank best to worst. No authorship stake. |
### Configuration
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| **Convergence k** | 2 | k=1 premature (94% displaced later). k=2 converges 100%, quality plateaus. k=3 fails 24%, 2x cost, no quality gain. |
| **Author temperature** | 0.7-0.8 | Encourages diverse revisions |
| **Judge temperature** | 0.3 | Encourages consistent evaluation |
| **In-loop judges** | 3 | Balance per-pass cost vs evaluation stability |
| **Final evaluation judges** | 7 | Higher statistical power for final comparison |
| **Max tokens** | 4096 | Standard; 8192 for long-form (papers) |
| **Judge type** | Chain-of-thought | 3x faster convergence on some tasks. Always use. |
| **Tiebreak** | Conservative (incumbent wins) | Prevents false positives — A must be genuinely beaten |
| **Max passes** | 25 (constrained), 50 (remedy) | Safety cap; most converge by pass 10-15 |
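Putting the roles and configuration together, one pass of the loop might look like the following sketch. `call_model(role, text)` is a placeholder for your LLM client, and the judge wrapper is assumed to map its blind ranking back to candidate keys (best to worst):
```python
import random

def autoreason_pass(task, incumbent, call_model, n_judges=3):
    """One pass: critic -> author B -> synthesizer -> blind judge panel."""
    critique = call_model("critic", f"{task}\n---\n{incumbent}")
    version_b = call_model("author", f"{task}\n---\n{incumbent}\n---\n{critique}")
    first, second = random.sample([incumbent, version_b], 2)  # hide incumbency
    synthesis = call_model("synthesizer", f"{task}\n---\n{first}\n---\n{second}")

    candidates = {"A": incumbent, "B": version_b, "AB": synthesis}
    scores = dict.fromkeys(candidates, 0)
    for _ in range(n_judges):
        shown = random.sample(list(candidates), len(candidates))  # random order
        # Assumed: the judge wrapper returns keys ranked best to worst.
        ranking = call_model("judge", "\n---\n".join(candidates[k] for k in shown))
        for points, key in zip((3, 2, 1), ranking):  # Borda: 3/2/1 points
            scores[key] += points
    # Conservative tiebreak: the incumbent wins ties.
    winner = max(scores, key=lambda k: (scores[k], k == "A"))
    return candidates[winner], winner

def refine(task, draft, call_model, k=2, max_passes=25):
    incumbent, defended = draft, 0
    for _ in range(max_passes):
        incumbent, winner = autoreason_pass(task, incumbent, call_model)
        defended = defended + 1 if winner == "A" else 0
        if defended >= k:  # converged: A survived k consecutive passes
            break
    return incumbent
```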
### Prompts
#### Critic
```
System: You are a critical reviewer. Your only job is to find real problems.
Be specific and concrete. Do not suggest fixes.
User: Find real problems with this proposal. Focus on:
- Things that won't work as described
- Complexity that doesn't pay for itself
- Assumptions that are wrong
- Missing pieces
Do NOT propose fixes. Just the problems.
```
#### Author B
```
System: You are a senior consultant revising a proposal based on specific
criticisms. Address each valid criticism directly. Do not make changes not
motivated by an identified problem.
User: [TASK] + [VERSION A] + [CRITIC OUTPUT]
Revise to address these problems. For each change, state which problem it fixes.
```
#### Synthesizer
```
System: You are given two versions as equal inputs. Take the strongest elements
from each and produce a coherent synthesis. This is not a compromise.
User: [TASK] + [VERSION X] + [VERSION Y]
(labels randomized — synthesizer doesn't know which is incumbent)
```
#### Judge (Chain-of-Thought) — ALWAYS USE THIS VERSION
```
System: You are an independent evaluator. Think carefully before deciding.
User: [TASK] + Three proposals. For each, think step by step:
1. What does it get right?
2. What does it get wrong or miss?
3. Are numbers and claims defensible?
4. Is detail appropriate or bloated?
After reasoning, rank all three.
RANKING: [best], [second], [worst]
```
#### Baseline Prompts (for comparison experiments)
| Baseline | Prompt |
|----------|--------|
| **Conservative** | "Make minimal improvements while preserving what works. Do not add new sections or significantly expand scope." |
| **Improve this** | "Improve this document." (no further guidance) |
| **Harsh critic** | "Critically evaluate and rewrite, fixing all weaknesses you identify." |
| **Critique & revise** | Step 1: "Produce a structured critique. List specific weaknesses." Step 2: "Revise to address each criticism." |
---
## Scoring: Borda Count
Judges rank candidates. Points awarded by rank position:
| Rank | Points (3 candidates) |
|------|----------------------|
| 1st | 3 |
| 2nd | 2 |
| 3rd | 1 |
**Aggregation**: Sum across all judges. Winner = highest total.
**Tiebreak**: Incumbent (A) wins any tie.
**Example** (3 judges):
- Judge 1: AB > A > B → AB gets 3, A gets 2, B gets 1
- Judge 2: A > AB > B → A gets 3, AB gets 2, B gets 1
- Judge 3: AB > B > A → AB gets 3, B gets 2, A gets 1
- Totals: AB=8, A=6, B=4 → AB wins, becomes new incumbent
**Randomization per judge**:
- Candidate labels randomized (A might be called "Proposal X" for one judge, "Proposal Z" for another)
- Presentation order randomized (AB might appear first or last)
- This prevents position bias and label bias
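The worked example above, reproduced as a quick Python check:
```python
from collections import Counter

def borda(rankings):
    """rankings: list of best-to-worst lists; 3/2/1 points per judge."""
    totals = Counter()
    for ranking in rankings:
        for points, candidate in zip((3, 2, 1), ranking):
            totals[candidate] += points
    return dict(totals)

print(borda([["AB", "A", "B"],
             ["A", "AB", "B"],
             ["AB", "B", "A"]]))  # {'AB': 8, 'A': 6, 'B': 4} -> AB wins
```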
---
## Model Selection Guide
### Empirical Results by Model Tier
| Model | Autoreason Wins | Autoreason Avg Borda | Best Baseline | Margin | Recommendation |
|-------|----------------|---------------------|---------------|--------|----------------|
| **Llama 3.1 8B** | 1/3 | 23.7 | 25.0 (single) | -1.3 | Skip autoreason. Model too weak for diverse candidates. |
| **Gemini 2.0 Flash** | 2/3 | 25.0 | 20.0 (single) | +5.0 | Good candidate. Moderate gains. |
| **Haiku 3.5** | 3/3 | **42.0** | 33.7 (single) | **+8.3** | **Best candidate.** Perfect scores. Baselines actively destroy quality. |
| **Sonnet 4** | 3/5 | 27.8 | 22.4 (C&R) | +5.4 | Good candidate for open tasks. C&R better for technical tasks. |
| **Sonnet 4.6 (unconstrained)** | 0/1 | 7.0 | 31.0 (C&R) | -24.0 | Do NOT use autoreason without constraints. |
| **Sonnet 4.6 (constrained)** | 2/3 | 29.0 | 27.0 (improve) | +2.0 | Use only with scope constraints. |
### The Generation-Evaluation Gap
The core insight: **autoreason's value depends on the gap between a model's generation capability and its self-evaluation capability.**
```
Weak models (Llama 8B):
Generation: Poor | Self-evaluation: Poor
Gap: Small (both bad) → Autoreason can't help, no diverse candidates
Mid-tier models (Haiku, Flash):
Generation: Decent | Self-evaluation: Poor
Gap: LARGE → Autoreason's sweet spot. External eval bridges the gap.
Strong models (Sonnet 4):
Generation: Good | Self-evaluation: Decent
Gap: Moderate → Autoreason helps on 3/5 tasks
Frontier models (Sonnet 4.6):
Generation: Excellent | Self-evaluation: Good
Gap: Small → Simple methods suffice. Autoreason hurts on unconstrained tasks.
```
**Practical rule**: As model costs drop and capabilities improve, today's frontier becomes tomorrow's mid-tier. The generation-evaluation gap is structural, not temporary. Match refinement architecture to the model's position on the capability curve.
### Judge Selection
| Author Model | Recommended Judge | Rationale |
|-------------|------------------|-----------|
| Llama 8B | Don't use autoreason | Model too weak |
| Gemini Flash | Sonnet 4 | Cross-model evaluation works |
| Haiku 3.5 | Sonnet 4 | Strong external eval is the mechanism |
| Haiku 3.5 | Haiku 3.5 (same) | Still works — tournament structure provides value even without strong judges (20.7 vs 18.3 avg Borda) |
| Sonnet 4 | Sonnet 4 (same) | Same-model judges work at this tier |
| Sonnet 4.6 | Sonnet 4.6 (same) | Only with scope constraints |
---
## Scope Constraint Design
### What Makes Autoreason Work on Constrained Tasks
The same model (Sonnet 4.6) goes from **last place** (unconstrained) to **first place** (constrained) with scope constraints. The constraints bound the improvement space so synthesis drift can't accumulate.
### Effective Constraints
| Constraint Type | Example | Why It Works |
|----------------|---------|-------------|
| **Fixed facts** | "Use only these 8 data points, add nothing else" | Bounds information space |
| **Fixed deliverable** | "500-word startup pitch" (not "improve this") | Defines done condition |
| **Fixed structure** | "Exactly 4 sections, each with 3 numbered items" | Prevents structural drift |
| **Fixed change items** | "Address exactly these 3 reviewer concerns" | Bounds modification scope |
### Ineffective Constraints
| Constraint | Why It Fails | What Happens |
|-----------|-------------|-------------|
| Word count alone | Not a scope constraint | False convergence — rejected for length, not quality |
| "Be concise" | Too vague | Ignored after 2-3 passes |
| "Be comprehensive" | Anti-constraint | Invites scope creep |
| No constraints at all | Unbounded improvement space | Synthesis dominates, no convergence |
### Task Categories
| Task Type | Autoreason Works? | Why |
|-----------|-------------------|-----|
| Tasks with genuine tradeoffs (strategy, policy) | Yes | Multiple valid approaches for tournament to select between |
| Constrained writing (pitch, memo, postmortem) | Mostly (2/3) | Bounded scope, clear evaluation criteria |
| Template-filling (incident postmortem) | No | One correct structure, minimal decision space |
| Competitive programming | Yes | Naturally scoped, test suite provides external verification |
| Open-ended unconstrained + frontier model | No | Synthesis drift, no convergence |
---
## Failure Taxonomy
| Failure Mode | Condition | Detection | Evidence |
|-------------|-----------|-----------|----------|
| **Self-correction unreliable** | No external evaluation signal | Baselines degrade below single pass | Haiku baselines: 16.3 avg vs 33.7 single pass |
| **Drift / synthesis dominance** | Unconstrained scope | A wins <15%, AB dominates | Sonnet 4.6 unconstrained: A wins 12%, AB wins 60%+ |
| **Overfitting to visible feedback** | Shallow revision loop (C&R) | High public/private divergence | C&R overfits 32% on hard code problems |
| **No convergence** | Broken judge pipeline | Parsing failures, <3 valid judges | Mixed panel parser failure: 11+ passes |
| **Model too weak** | Insufficient generation diversity | All candidates look similar | Llama 8B wins only 1/3 tasks |
### Recovery Patterns
| Failure | Recovery |
|---------|----------|
| No convergence (drift) | Add scope constraints to the task |
| No convergence (broken judges) | Fix parser, ensure 3 valid judges before continuing |
| Quality degrades with iteration | Switch to single pass or add constraints |
| Model too weak | Use a stronger model for generation, keep weak model for cheap roles |
| Overfitting (code) | Use structured analysis step, not just test feedback |
---
## Code Domain Adaptation
The autoreason method adapts differently for code vs writing:
### Writing Domain
```
Call 1: Critic (find problems in incumbent)
Call 2: Author B (revise based on critique)
Call 3: Synthesizer (merge A and B)
Calls 4-6: Judge Panel (3 blind judges rank A, B, AB)
```
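For concreteness, a minimal sketch of one writing-domain pass. `call_model` is a hypothetical helper (not part of any shipped API) that sends a prompt to the configured model and returns its text; the judge calls are covered by the blind evaluation pattern later in this document.
```python
def writing_pass(task: str, incumbent: str, call_model) -> dict:
    # Call 1: critic finds problems but proposes no fixes
    critique = call_model(
        f"Task: {task}\n\nFind problems in this draft. Do not suggest fixes.\n\n{incumbent}"
    )
    # Call 2: Author B revises against the critique
    revision = call_model(
        f"Task: {task}\n\nRevise the draft to fix these problems:\n{critique}\n\nDraft:\n{incumbent}"
    )
    # Call 3: synthesizer merges incumbent (A) and revision (B)
    synthesis = call_model(
        f"Task: {task}\n\nMerge the best of both drafts.\n\nA:\n{incumbent}\n\nB:\n{revision}"
    )
    # Calls 4-6: three blind judges rank A, B, AB
    return {"A": incumbent, "B": revision, "AB": synthesis}
```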
### Code Domain (6-call budget)
```
Call 1: Initial generation
Call 2: Structured analysis (5 points — NO CODE):
- Problem analysis: what does the problem actually require?
- Approach analysis: what approach did we use, is it correct?
- Failure analysis: why did tests fail?
- Alternative approaches: what else could work?
- Edge cases: what inputs might break the solution?
Calls 3-6: Reason-informed revisions
- Each revision must explain WHY it fixes the issue
- Sees test results from public (visible) test cases
```
**Key difference**: The code strategy replaces the judge panel with test-suite evaluation (objective ground truth). The structured analysis step (Call 2) is what drives recovery — it forces reasoning about *why* the approach failed before attempting fixes.
**Results**: Recovery is the mechanism. Among problems where both autoreason and single-pass failed initially, autoreason recovered 62% vs single-pass's 43% (McNemar p=0.041, Cohen's h=0.32).
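As a rough sketch of how the 6-call budget fits together (the helpers `generate`, `analyze`, `revise`, and `run_public_tests` are hypothetical stand-ins for the model calls and the sandboxed test runner):
```python
def code_budget(problem, public_tests, generate, analyze, revise, run_public_tests):
    solution = generate(problem)                      # Call 1: initial generation
    results = run_public_tests(solution, public_tests)
    analysis = analyze(problem, solution, results)    # Call 2: 5-point analysis, no code
    for _ in range(4):                                # Calls 3-6: reason-informed revisions
        if all(r["passed"] for r in results):
            break                                     # All public tests pass: stop early
        solution = revise(problem, solution, analysis, results)
        results = run_public_tests(solution, public_tests)
    return solution
```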
---
## Applying Autoreason to Paper Writing
The paper itself was refined using autoreason (Section 8 of the paper):
### Setup
- Model: claude-opus-4
- Judges: 3 Opus judges
- Enhancement: Ground-truth critic (access to actual experimental data)
- Result: Converged in 9 passes
### Key Findings for Paper Refinement
1. **Ground-truth critic is essential**: Without ground-truth access, Opus hallucinated a fabricated ablation study, fake confidence intervals, wrong model names, and incorrect role descriptions. With ground-truth access, the critic caught all four on pass 1.
2. **Judge panel integrity matters**: A broken parser in one judge (Gemini output format mismatch) reduced the panel from 3 to 2 judges. This prevented convergence for 11+ passes. Once the parser was fixed and all 3 judges were working, the same incumbent converged in 2 passes. A broken judge doesn't add noise — it prevents equilibrium.
### Recommended Setup for Paper Refinement
```
Critic prompt: "You are reviewing a research paper draft. You have access to the
actual experimental results [GROUND TRUTH DATA]. Find factual errors, unsupported
claims, hallucinated results, and structural problems. Do not suggest fixes."
Author B prompt: "Revise this paper draft to fix the identified problems. For each
change, cite the specific problem it addresses. Do not add claims not supported by
the provided experimental data."
Judge prompt (CoT): "Compare three versions of this paper. For each, evaluate:
1. Factual accuracy against the provided results
2. Clarity of the narrative and contribution
3. Whether claims are properly hedged and supported
4. Writing quality (concision, precision, no filler)
After reasoning, rank all three. RANKING: [best], [second], [worst]"
```
### What to Provide as Ground Truth
- All experimental result JSON files
- Statistical test outputs
- Raw numbers for every table and figure
- Configuration files showing exact hyperparameters
- Code that generated the results (for method description accuracy)
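A minimal sketch of assembling that ground truth into the critic prompt (assumes the results directory layout described later in this document; `[GROUND TRUTH DATA]` is the placeholder from the critic prompt above):
```python
from pathlib import Path

def build_ground_truth(results_dir: str) -> str:
    """Concatenate result JSONs and stats outputs into one ground-truth block."""
    chunks = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        chunks.append(f"=== {path} ===\n{path.read_text()}")
    return "\n\n".join(chunks)

def fill_critic_prompt(critic_template: str, results_dir: str) -> str:
    return critic_template.replace("[GROUND TRUTH DATA]", build_ground_truth(results_dir))
```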
---
## Compute Budget Reference
| Method | Calls per Pass | Typical Passes | Total Calls | Relative Cost |
|--------|---------------|----------------|-------------|---------------|
| Single pass | 1 | 1 | 1 | 1x |
| Best-of-N | N | 1 | N | Nx |
| Critique & revise | 2 | 15 | 30 | 30x |
| Autoreason (in-loop) | ~6 | 10-15 | 60-90 | 60-90x |
| Autoreason (with final eval) | ~6 + 7 | 10-15 + 1 | 67-97 | ~80x |
**Cost-quality tradeoff**: Autoreason uses ~6x more compute per pass and typically runs more passes. This is a real tradeoff. The method trades compute for evaluation quality. On constrained tasks with mid-tier models, this tradeoff is strongly positive. On unconstrained tasks with frontier models, it's negative.
**CoT judges reduce cost**: 1 CoT judge provides evaluation quality comparable to 3 standard judges, at ~40% cost savings. Always use CoT judges.

View File

@@ -10,6 +10,8 @@ This reference documents the mandatory checklist requirements for major ML/AI co
- [ICML Paper Checklist](#icml-paper-checklist)
- [ICLR Requirements](#iclr-requirements)
- [ACL Requirements](#acl-requirements)
- [AAAI Requirements](#aaai-requirements)
- [COLM Requirements](#colm-requirements)
- [Universal Pre-Submission Checklist](#universal-pre-submission-checklist)
---
@@ -280,6 +282,77 @@ If applicable:
---
## AAAI Requirements
### Formatting (Strictest of All Venues)
AAAI enforces formatting rules more strictly than any other major venue. Papers that deviate from the template are desk-rejected.
- [ ] Use the **exact** AAAI style file without modification — no `\setlength`, no `\vspace` hacks, no font overrides
- [ ] 7 pages main content (8 for camera-ready with author info)
- [ ] Two-column format, Times font (set by template)
- [ ] References and appendices do not count toward page limit
- [ ] Abstract must be a single paragraph
- [ ] Do not modify margins, column widths, or font sizes
### Required Sections
- [ ] Abstract (single paragraph, no math or citations)
- [ ] Introduction with clear contribution statement
- [ ] References in AAAI format (uses `aaai2026.bst`)
- [ ] Appendix (optional, unlimited)
### Ethics and Reproducibility
- [ ] Broader impact statement (encouraged but not always mandatory — check current year's CFP)
- [ ] Reproducibility details (datasets, code availability)
- [ ] Acknowledge use of AI writing tools if applicable
### Key Differences from Other Venues
- **No separate limitations section required** (unlike ACL), but discussing limitations is recommended
- **Strictest formatting enforcement** — the style checker will reject non-compliant PDFs
- **No paper checklist** like NeurIPS has, but the universal checklist below still applies
- **Unified template** covers main paper and supplementary in the same file
---
## COLM Requirements
### Overview
COLM (Conference on Language Modeling) focuses specifically on language model research. Framing must target this community.
### Formatting
- [ ] 9 pages main content (10 for camera-ready)
- [ ] Use COLM template (based on ICLR template with modifications)
- [ ] Double-blind review
- [ ] References and appendices unlimited
### Required Sections
- [ ] Abstract
- [ ] Introduction framed for language modeling community
- [ ] Conclusion
- [ ] References
### Content Expectations
- [ ] Contribution must be relevant to language models (broadly interpreted: training, evaluation, applications, theory, alignment, safety)
- [ ] If the method is general, frame with language model examples
- [ ] Baselines should include recent LM-specific methods where applicable
### Key Differences from Other Venues
- **Narrower scope** than NeurIPS/ICML — must frame for LM community
- **Template derived from ICLR** — similar formatting rules
- **Newer venue** — reviewer norms are still establishing; err on the side of thorough evaluation
- **No mandatory checklist** like NeurIPS, but broader impact discussion is expected
- **LLM disclosure**: If LLMs were used in research (code generation, data annotation, writing assistance), disclose this
---
## Universal Pre-Submission Checklist
### Before Every Submission

View File

@@ -289,7 +289,7 @@ class CitationManager:
)
if resp.status_code == 200:
sources.append("CrossRef")
except:
except Exception:
pass
# Check arXiv if ID available
@@ -301,7 +301,7 @@ class CitationManager:
)
if "<entry>" in resp.text and "<title>" in resp.text:
sources.append("arXiv")
except:
except Exception:
pass
return len(sources) >= 2, sources
@@ -318,7 +318,7 @@ class CitationManager:
)
if resp.status_code == 200:
return resp.text
except:
except Exception:
pass
# Fallback: generate from paper data
@@ -419,7 +419,7 @@ def batch_cite(queries: List[str], output_file: str = "references.bib"):
| Customization | Limited | Highly flexible |
| Backend | bibtex | Biber (recommended) |
**Recommendation**: Use natbib with BibTeX for conference submissions — all major venue templates (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) ship with natbib and `.bst` files. BibLaTeX with Biber is an option for journals or personal projects where you control the template.
### LaTeX Setup

View File

@@ -0,0 +1,728 @@
# Experiment Design Patterns
Patterns and best practices distilled from running research experiments at scale with the Hermes agent. These cover experiment infrastructure, evaluation protocols, monitoring, and failure recovery.
---
## Experiment Infrastructure
### Directory Structure
Organize experiments with a consistent structure:
```
workspace/
experiments/
run_main.py # Core experiment runner
run_baselines.py # Baseline comparison
run_ablation.py # Ablation studies
strategies.py # Method implementations
config.yaml # Shared configuration
results/
<experiment_name>/
<task_or_problem>/
<strategy>/
result.json # Final metrics
final_output.md # Final output artifact
history.json # Full trajectory/log
pass_01/ # Per-iteration artifacts (if iterative)
intermediate.md
analysis/
analyze_results.py # Statistical analysis
compute_stats.py # Significance tests
make_charts.py # Visualization
paper/
paper.tex # LaTeX source
fig_*.pdf # Generated figures
```
### Script Design Principles
**1. Incremental Saving (Crash Recovery)**
Every experiment script should save results after each unit of work, and skip already-completed work on restart:
```python
import json, os
from pathlib import Path
def run_experiment(problems, strategies, output_dir):
for problem in problems:
for strategy in strategies:
result_path = Path(output_dir) / problem["id"] / strategy / "result.json"
if result_path.exists():
print(f"Skipping {problem['id']}/{strategy} (already done)")
continue
# Run the experiment
result = execute_strategy(problem, strategy)
# Save immediately
result_path.parent.mkdir(parents=True, exist_ok=True)
with open(result_path, 'w') as f:
json.dump(result, f, indent=2)
```
This pattern makes re-runs safe and efficient. If a process crashes at problem 47/150, restarting skips the first 46.
**2. Artifact Preservation**
Save all intermediate outputs, not just final results. This enables post-hoc analysis without re-running:
```python
def save_pass_artifacts(output_dir, pass_num, artifacts):
"""Save all artifacts from a single pass of an iterative method."""
pass_dir = Path(output_dir) / f"pass_{pass_num:02d}"
pass_dir.mkdir(parents=True, exist_ok=True)
for name, content in artifacts.items():
with open(pass_dir / f"{name}.md", 'w') as f:
f.write(content)
```
**3. Configuration Management**
Use YAML configs for reproducibility:
```yaml
# config.yaml
model: anthropic/claude-sonnet-4-20250514
author_temperature: 0.8
judge_temperature: 0.3
max_tokens: 4096
num_judges: 3
max_passes: 15
convergence_k: 2
```
```python
import yaml
with open("config.yaml") as f:
config = yaml.safe_load(f)
```
**4. Separation of Concerns**
Keep generation, evaluation, and visualization in separate scripts:
| Script | Purpose |
|--------|---------|
| `run_experiment.py` | Core method execution |
| `run_baselines.py` | Baseline comparisons at same compute |
| `run_eval.py` | Blind evaluation / judge panels |
| `analyze_results.py` | Statistical analysis |
| `make_charts.py` | Figure generation |
This lets you re-run evaluation without re-running expensive generation, and regenerate figures without re-running analysis.
---
## Evaluation Protocols
### Blind Judge Panels (for Subjective Tasks)
When evaluating subjective outputs (writing, analysis, recommendations), use a blind judge panel:
```python
import random
def run_blind_evaluation(outputs: dict, task_prompt: str, num_judges: int = 7):
"""
Run blind evaluation of multiple method outputs.
Args:
outputs: {"method_name": "output_text", ...}
task_prompt: The original task description
num_judges: Number of independent judge evaluations
"""
rankings = []
for judge_i in range(num_judges):
# Randomize labels and presentation order per judge
methods = list(outputs.keys())
random.shuffle(methods)
labels = {m: chr(65 + i) for i, m in enumerate(methods)} # A, B, C...
# Present to judge with randomized labels
prompt = f"Task: {task_prompt}\n\n"
for method in methods:
prompt += f"--- Proposal {labels[method]} ---\n{outputs[method]}\n\n"
prompt += "Rank all proposals from best to worst. Format: RANKING: [best], [second], [worst]"
        ranking = call_judge(prompt)  # Parsed list of labels, e.g. ["B", "A", "C"]
        rankings.append({"labels": labels, "ranking": ranking})
    # Aggregate via Borda count (mapping per-judge labels back to methods)
    return compute_borda(rankings)

def compute_borda(rankings, n_methods=3):
    """Borda count: n_methods points for 1st place, down to 1 for last."""
    scores = {}
    for r in rankings:
        # Labels are randomized per judge, so map them back to method names
        label_to_method = {label: method for method, label in r["labels"].items()}
        for position, label in enumerate(r["ranking"]):
            method = label_to_method[label]
            scores[method] = scores.get(method, 0) + (n_methods - position)
    return scores
```
Key design decisions:
- **Randomize both labels AND order** per judge to prevent position bias
- **Use odd number of judges** (3, 5, 7) to break ties
- **Conservative tiebreak**: Incumbent/baseline wins ties (prevents false positives)
- **CoT judges** match non-CoT quality at ~40% cost savings (1 CoT judge ≈ 3 standard judges)
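Because a single malformed judge can block convergence (see the judge panel integrity finding above), validate every ranking before counting it. A minimal parser sketch for the `RANKING:` format used here; discard and re-query any judge whose output fails to parse:
```python
import re

def parse_ranking(judge_output: str, valid_labels: set):
    """Extract 'RANKING: B, AB, A' from a judge response; return None if malformed."""
    match = re.search(r"RANKING:\s*([A-Z]+(?:\s*,\s*[A-Z]+)*)", judge_output)
    if not match:
        return None
    labels = [label.strip() for label in match.group(1).split(",")]
    if len(labels) != len(valid_labels) or set(labels) != valid_labels:
        return None  # Missing, duplicate, or unknown labels: discard this judge
    return labels
```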
### Code/Objective Evaluation
For tasks with ground-truth evaluation (code, math, factual):
```python
import subprocess
def evaluate_code(solution: str, test_cases: list, timeout: int = 30):
"""Run code solution against test cases with sandboxed execution."""
results = {"public": [], "private": []}
for test in test_cases:
try:
proc = subprocess.run(
["python3", "-c", solution],
input=test["input"],
capture_output=True,
timeout=timeout,
text=True
)
actual = proc.stdout.strip()
expected = test["expected"].strip()
passed = actual == expected
except subprocess.TimeoutExpired:
passed = False
category = "public" if test.get("public") else "private"
results[category].append(passed)
return {
"public_pass_rate": sum(results["public"]) / max(len(results["public"]), 1),
"private_pass_rate": sum(results["private"]) / max(len(results["private"]), 1),
}
```
### Compute-Matched Comparison
Always compare methods at equal compute budget. If your method uses N API calls, baselines get N calls too:
| Method | Call Budget | Allocation |
|--------|-----------|------------|
| Single pass | 6 calls | 6 independent generations |
| Critique & revise | 6 calls | 1 generate + 5 revise rounds |
| Autoreason | 6 calls | 1 generate + 1 analysis + 4 revisions |
| Best-of-N | 6 calls | 6 independent, pick best on public test |
### Human Evaluation Design
Many ML/NLP papers require human evaluation, especially for subjective tasks (text generation, summarization, dialogue, creative writing). Poorly designed human evals are a common rejection reason.
#### When Human Evaluation Is Required
| Task Type | Required? | Notes |
|-----------|-----------|-------|
| Text generation (open-ended) | Yes | LLM-as-judge alone is insufficient for acceptance at ACL/EMNLP |
| Summarization | Usually | At minimum for a subset of outputs |
| Dialogue systems | Yes | User studies or annotation |
| Code generation | No | Test suites are objective ground truth |
| Classification | No | Standard metrics suffice |
| Any task with subjective quality | Strongly recommended | Strengthens the paper significantly |
#### Annotation Protocol Design
```
Human Evaluation Protocol:
1. Define the evaluation dimensions (fluency, relevance, factual accuracy, etc.)
2. Create annotation guidelines with examples of each score level
3. Run a pilot with 2-3 annotators on 20-30 examples
4. Compute pilot inter-annotator agreement — if low, revise guidelines
5. Run full evaluation
6. Report: annotator count, agreement metrics, compensation, time per item
```
**Evaluation dimensions** (pick relevant subset):
| Dimension | Definition | Scale |
|-----------|-----------|-------|
| Fluency | Grammaticality and naturalness | 1-5 Likert |
| Relevance | Does it address the task? | 1-5 Likert |
| Factual accuracy | Are stated facts correct? | Binary or 1-5 |
| Coherence | Logical flow and consistency | 1-5 Likert |
| Informativeness | Does it provide useful information? | 1-5 Likert |
| Overall preference | Which output is better? | A/B/Tie (pairwise) |
**Pairwise comparison** (preferred over absolute scoring — more reliable):
- Present two outputs side-by-side (randomize left/right position)
- Ask: "Which is better? A / B / Tie"
- More discriminative and less susceptible to annotator calibration drift
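A sketch of one pairwise trial with position randomization (`ask_annotator` is a hypothetical callback that returns 'A', 'B', or 'Tie'):
```python
import random

def pairwise_trial(output_1: str, output_2: str, task: str, ask_annotator) -> str:
    """Run one A/B trial; return the verdict in canonical (output_1-first) order."""
    flipped = random.random() < 0.5  # Randomize left/right position
    left, right = (output_2, output_1) if flipped else (output_1, output_2)
    verdict = ask_annotator(
        f"Task: {task}\n\n--- A ---\n{left}\n\n--- B ---\n{right}\n\n"
        "Which is better? A / B / Tie"
    )
    if verdict == "Tie" or not flipped:
        return verdict
    return "A" if verdict == "B" else "B"  # Undo the flip before aggregating
```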
#### Inter-Annotator Agreement
Always report agreement metrics. Without them, reviewers assume your annotations are unreliable.
```python
# Krippendorff's alpha (preferred — handles missing data, any scale)
# pip install krippendorff
import numpy as np
import krippendorff

# Ratings: rows = annotators, columns = items, values = scores (np.nan = missing)
ratings = [
    [3, 4, 1, 2, 5, np.nan, 3],  # Annotator 1
    [3, 5, 1, 3, 5, 2, 3],       # Annotator 2
    [4, 4, 2, 2, 4, 2, np.nan],  # Annotator 3
]
alpha = krippendorff.alpha(reliability_data=ratings, level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")
# Interpretation: >0.80 good, 0.67-0.80 acceptable, <0.67 questionable
```
```python
# Cohen's kappa (for exactly 2 annotators, categorical data)
from sklearn.metrics import cohen_kappa_score
annotator_1 = [1, 2, 3, 1, 2, 3, 2]
annotator_2 = [1, 2, 2, 1, 3, 3, 2]
kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
# Interpretation: >0.80 excellent, 0.60-0.80 substantial, 0.40-0.60 moderate
```
| Metric | When to Use | Annotators | Scale |
|--------|------------|-----------|-------|
| Krippendorff's alpha | Default choice | Any number | Any (ordinal, nominal, ratio) |
| Cohen's kappa | 2 annotators, categorical | Exactly 2 | Nominal/ordinal |
| Fleiss' kappa | 3+ annotators, categorical | 3+ | Nominal |
| Pearson/Spearman | Continuous scores | 2 | Interval/ratio |
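For the Fleiss' kappa row, `statsmodels` ships an implementation (the ratings below are illustrative only):
```python
# Fleiss' kappa for 3+ annotators assigning categorical labels
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = items, columns = annotators, values = category codes
labels = np.array([
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [1, 1, 1],
    [2, 3, 2],
])
table, _ = aggregate_raters(labels)  # Counts per (item, category)
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```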
#### Crowdsourcing Platforms
| Platform | Best For | Cost | Quality |
|----------|----------|------|---------|
| **Prolific** | Academic research, higher quality | $8-15/hr | High — academic participant pool |
| **MTurk** | Large-scale, fast turnaround | $2-10/hr | Variable — use qualifications |
| **Surge AI** | NLP-specific annotations | Premium | High — trained annotators |
| **Expert annotators** | Domain-specific (medical, legal) | Highest | Highest — but slow |
**Ethics requirements**:
- Report compensation rate (must be at least the local minimum wage)
- Describe annotator demographics if relevant
- Obtain IRB/ethics approval if required by your institution
- ACL venues explicitly require compensation documentation
#### What to Report in the Paper
```
Human Evaluation Section Checklist:
- [ ] Number of annotators
- [ ] Annotator qualifications / recruitment method
- [ ] Number of items evaluated
- [ ] Evaluation dimensions with definitions
- [ ] Scale used (Likert, pairwise, binary)
- [ ] Inter-annotator agreement (Krippendorff's alpha or Cohen's kappa)
- [ ] Compensation rate
- [ ] Time per annotation item
- [ ] Whether annotators saw model identities (should be blind)
- [ ] Randomization of presentation order
```
---
## Statistical Analysis
### Required Tests
| Test | When to Use | Python |
|------|------------|--------|
| McNemar's test | Comparing two methods on same problems | `scipy.stats.binomtest` for small n |
| Two-proportion z-test | Comparing success rates | Custom or `statsmodels` |
| Fisher's exact test | Small sample pairwise comparison | `scipy.stats.fisher_exact` |
| Bootstrapped CI | Confidence intervals for any metric | Custom bootstrap |
| Cohen's h | Effect size for proportions | Manual calculation |
### Standard Analysis Script
```python
import numpy as np
from scipy import stats
from pathlib import Path
import json
def load_all_results(results_dir):
"""Load all results into a structured format."""
results = {}
for result_file in Path(results_dir).rglob("result.json"):
parts = result_file.relative_to(results_dir).parts
if len(parts) >= 3:
experiment, task, strategy = parts[0], parts[1], parts[2]
data = json.loads(result_file.read_text())
results.setdefault(experiment, {}).setdefault(strategy, {})[task] = data
return results
def pairwise_mcnemar(method_a_results, method_b_results):
"""McNemar's test for paired binary outcomes."""
a_win_b_lose = sum(1 for a, b in zip(method_a_results, method_b_results) if a and not b)
b_win_a_lose = sum(1 for a, b in zip(method_a_results, method_b_results) if b and not a)
    n = a_win_b_lose + b_win_a_lose
    if n == 0:
        # Methods agree on every problem; no evidence of a difference
        return {"a_wins": 0, "b_wins": 0, "n_discordant": 0, "p_value": 1.0, "significant": False}
    if n < 25:
        # Use exact binomial for small samples
        p_value = stats.binomtest(a_win_b_lose, n, 0.5).pvalue
    else:
        # Chi-squared approximation with continuity correction
        chi2 = (abs(a_win_b_lose - b_win_a_lose) - 1) ** 2 / n
        p_value = 1 - stats.chi2.cdf(chi2, df=1)
return {
"a_wins": a_win_b_lose,
"b_wins": b_win_a_lose,
"n_discordant": n,
"p_value": p_value,
"significant": p_value < 0.05
}
def bootstrap_ci(data, n_bootstrap=10000, ci=0.95):
"""Bootstrap confidence interval for mean."""
means = []
for _ in range(n_bootstrap):
sample = np.random.choice(data, size=len(data), replace=True)
means.append(np.mean(sample))
lower = np.percentile(means, (1 - ci) / 2 * 100)
upper = np.percentile(means, (1 + ci) / 2 * 100)
return {"mean": np.mean(data), "ci_lower": lower, "ci_upper": upper}
def cohens_h(p1, p2):
"""Cohen's h effect size for two proportions."""
return 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))
```
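The two-proportion z-test from the table is available directly in `statsmodels` (the counts below are illustrative only):
```python
from statsmodels.stats.proportion import proportions_ztest

# e.g., method A solves 93/150 problems, method B solves 76/150
stat, p_value = proportions_ztest(count=[93, 76], nobs=[150, 150])
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```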
### Reporting Standards
Always include in the paper:
- **Sample sizes**: n=X problems/tasks
- **Number of runs**: K independent runs if applicable
- **Error bars**: Specify standard deviation or standard error
- **Confidence intervals**: 95% CI for key results
- **Significance tests**: p-values for key comparisons
- **Effect sizes**: Cohen's d or h for practical significance
---
## Monitoring (Cron Pattern)
### Cron Prompt Template
For each experiment batch, create a monitoring prompt:
```
Check the status of the [EXPERIMENT_NAME] experiment:
1. Process check: ps aux | grep [PROCESS_PATTERN]
2. Log check: tail -30 [LOG_FILE]
3. Results check: ls [RESULT_DIR]/eval/ (or appropriate result location)
4. If results are available:
- Read the result JSON files
- Report metrics in a table (Borda scores, accuracy, etc.)
- Compute key comparisons between methods
5. If all experiments in this batch are complete:
- git add -A && git commit -m "[COMMIT_MESSAGE]" && git push
- Report final summary
6. Key question: [SPECIFIC ANALYTICAL QUESTION]
If nothing has changed since the last check, respond with [SILENT].
```
```
### Monitoring Best Practices
1. **Check processes first** — don't read results if the experiment is still running and results are incomplete
2. **Read the log tail** — look for errors, progress indicators, completion messages
3. **Count completed vs expected** — "45/150 problems done" is more useful than "some results exist" (see the sketch after this list)
4. **Report in structured tables** — always include key metrics in a table
5. **Answer the key question** — each experiment should have a specific analytical question to answer when done
6. **[SILENT] for no-news** — suppress notifications when nothing has changed
7. **Commit on completion** — every completed batch gets committed with a descriptive message
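A minimal sketch for the completed-vs-expected count in point 3, assuming the results layout from the directory structure above:
```python
from pathlib import Path

def completion_status(results_dir: str, problem_ids: list, strategies: list) -> str:
    """Count finished result.json files against the full experiment grid."""
    expected = len(problem_ids) * len(strategies)
    done = sum(
        1
        for pid in problem_ids
        for strat in strategies
        if (Path(results_dir) / pid / strat / "result.json").exists()
    )
    return f"{done}/{expected} runs complete"
```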
### Example Monitoring Report
```
## Code Experiments (Haiku 3.5) - COMPLETE
| Strategy | Pass Rate (150 problems) | vs Single |
|----------|------------------------|-----------|
| single_pass | 38.0% | — |
| critique_revise | 35.2% | -2.8pp |
| **autoreason** | **40.0%** | **+2.0pp** |
| best_of_6 | 31.0% | -7.0pp |
Key finding: Autoreason shows +2pp improvement over single pass, while
best-of-6 collapses due to single-public-test selection issue.
Committed: `git commit -m "Add Haiku code results (150 problems, 4 strategies)"`
Next: Run significance tests on these results.
```
---
## Failure Recovery
### Common Failures and Recovery
| Failure | Detection | Recovery |
|---------|-----------|----------|
| **API credit exhaustion** | 402 errors in logs, incomplete results | Top up credits, re-run (skips completed work automatically) |
| **Rate limiting** | 429 errors, slow progress | Add retry logic with exponential backoff (sketch below) |
| **Process crash** | PID gone, log stops mid-problem | Re-run script (resumes from last checkpoint) |
| **Wrong model ID** | Model not found errors | Fix ID (e.g., `claude-opus-4-6` not `claude-opus-4.6`) |
| **Parallel slowdown** | Each experiment taking 2x longer | Reduce parallel experiments to 2-3 max |
| **Security scan blocks** | Commands blocked by security | Use `execute_code` instead of piped `terminal` commands |
| **Delegation failures** | `delegate_task` returns errors | Fall back to doing work directly |
| **Timeout on hard problems** | Process stuck, no log progress | Kill, skip problem, note in results |
| **Dataset path mismatch** | File not found errors | Verify paths before launching |
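For the rate-limiting row, a minimal backoff wrapper sketch. `RateLimitError` is a placeholder; substitute the 429 exception your API client actually raises:
```python
import random
import time

class RateLimitError(Exception):
    """Placeholder: use your API client's rate-limit (429) exception instead."""

def with_backoff(fn, max_retries: int = 6, base_delay: float = 1.0):
    """Retry `fn` on rate limits with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return fn()  # Final attempt: let any remaining error propagate
```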
### Retry Naming Convention
When re-running failed experiments, use a suffix to track rounds:
```
logs/experiment_haiku_0_50.log # Round 1
logs/experiment_haiku_0_50_r2.log # Round 2 (after credit exhaustion)
logs/experiment_haiku_0_50_r3.log # Round 3 (after bug fix)
```
### Pre-Flight Checklist
Before launching any experiment batch:
```
Pre-Flight:
- [ ] API credits sufficient for estimated calls
- [ ] Model IDs correct (test with 1 problem first)
- [ ] Output directory exists and is writable
- [ ] Resume logic works (re-run won't overwrite existing results)
- [ ] Log file path is unique (won't overwrite previous logs)
- [ ] Dataset/task files are accessible
- [ ] Config matches intended experiment
```
---
## Task/Benchmark Design
### Open-Ended Tasks (Subjective Evaluation)
Design tasks that have clear objectives but subjective quality:
```markdown
# Task: [Title]
## Context
[Specific scenario with concrete details: company size, constraints, timeline]
## Deliverable
[Exact format and structure required]
## Requirements
- [Specific, measurable requirements]
- [Not vague — "be comprehensive" is bad, "include exactly 6 sections" is good]
```
### Constrained Tasks (for Testing Scope Effects)
Constrained tasks test whether methods respect scope boundaries. Design with:
- **Fixed facts**: "Use only these N data points, add nothing else"
- **Fixed deliverable**: Specific format (pitch, postmortem, memo — not "improve this")
- **Fixed structure**: "These sections in this order, do not add/remove"
- **Fixed change items**: "Address exactly these N points, nothing else"
**Do NOT use word count as a scope constraint.** Word limits cause false convergence — outputs get rejected for length, not quality. Constrain scope (what to include), not length.
### Example: Good vs Bad Constraints
| Bad Constraint | Why | Good Constraint |
|---------------|-----|-----------------|
| "Max 500 words" | Judges reject for length | "Exactly 4 sections, each with 3 numbered items" |
| "Be concise" | Too vague | "Each prohibition must reference a specific base fact" |
| "Improve this" | Unbounded scope | "Write a 600-word incident postmortem with this exact structure" |
| "Make it better" | No clear criterion | "Address exactly these 3 reviewer concerns" |
---
## Visualization Best Practices
### Setup: SciencePlots + matplotlib
Install SciencePlots for publication-ready defaults:
```bash
pip install SciencePlots matplotlib numpy
```
**Option A: SciencePlots styles** (recommended — handles most defaults automatically):
```python
import matplotlib.pyplot as plt
import scienceplots # registers the styles
# Pick a style:
# 'science' — clean, serif fonts, suitable for most venues
# 'science+ieee' — IEEE-style (good for two-column papers)
# 'science+nature' — Nature-style
# Add 'no-latex' if LaTeX is not installed on the machine generating plots
with plt.style.context(['science', 'no-latex']):
fig, ax = plt.subplots(figsize=(3.5, 2.5)) # single-column width
# ... plot ...
fig.savefig('paper/fig_results.pdf', bbox_inches='tight')
```
**Option B: Manual rcParams** (when you need full control):
```python
import matplotlib.pyplot as plt
plt.rcParams.update({
'font.size': 10,
'font.family': 'serif',
'axes.labelsize': 11,
'axes.titlesize': 11,
'xtick.labelsize': 9,
'ytick.labelsize': 9,
'legend.fontsize': 9,
'figure.figsize': (3.5, 2.5), # single-column default
'figure.dpi': 300,
'savefig.dpi': 300,
'savefig.bbox': 'tight',
'savefig.pad_inches': 0.05,
'axes.linewidth': 0.8,
'lines.linewidth': 1.5,
'lines.markersize': 5,
'axes.grid': True,
'grid.alpha': 0.3,
'grid.linewidth': 0.5,
})
```
### Standard Figure Sizes (Two-Column Format)
| Use Case | figsize | Notes |
|----------|---------|-------|
| Single column | `(3.5, 2.5)` | Fits in one column of two-column layout |
| Double column | `(7.0, 3.0)` | Spans full page width |
| Square (heatmap, confusion matrix) | `(3.5, 3.5)` | Single column |
| Tall single (many rows) | `(3.5, 5.0)` | Use sparingly |
### Colorblind-Safe Palette (Okabe-Ito)
Use this palette for all paper figures. It is distinguishable by people with all common forms of color vision deficiency:
```python
COLORS = {
'blue': '#0072B2',
'orange': '#E69F00',
'green': '#009E73',
'red': '#D55E00',
'purple': '#CC79A7',
'cyan': '#56B4E9',
'yellow': '#F0E442',
'black': '#000000',
}
# As a list for cycling:
COLOR_CYCLE = ['#0072B2', '#D55E00', '#009E73', '#E69F00', '#CC79A7', '#56B4E9']
```
Also differentiate lines by **marker and linestyle**, not just color:
```python
STYLES = [
{'color': '#0072B2', 'marker': 'o', 'linestyle': '-'},
{'color': '#D55E00', 'marker': 's', 'linestyle': '--'},
{'color': '#009E73', 'marker': '^', 'linestyle': '-.'},
{'color': '#E69F00', 'marker': 'D', 'linestyle': ':'},
]
```
### Complete Example: Method Comparison Bar Chart
```python
import matplotlib.pyplot as plt
import numpy as np
try:
import scienceplots
style = ['science', 'no-latex']
except ImportError:
style = 'default'
with plt.style.context(style):
methods = ['Single Pass', 'Critique+Revise', 'Best-of-N', 'Ours']
scores = [73.2, 74.1, 68.5, 77.0]
errors = [2.1, 1.8, 3.2, 1.5]
colors = ['#56B4E9', '#E69F00', '#CC79A7', '#0072B2']
fig, ax = plt.subplots(figsize=(3.5, 2.5))
bars = ax.bar(methods, scores, yerr=errors, capsize=3,
color=colors, edgecolor='black', linewidth=0.5)
# Highlight "Ours"
bars[-1].set_edgecolor('#0072B2')
bars[-1].set_linewidth(1.5)
ax.set_ylabel('Pass Rate (%)')
ax.set_ylim(60, 85)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.savefig('paper/fig_comparison.pdf', bbox_inches='tight')
```
### Complete Example: Convergence/Trajectory Line Chart
```python
with plt.style.context(style):
fig, ax = plt.subplots(figsize=(3.5, 2.5))
passes = np.arange(1, 16)
ours = [65, 72, 78, 82, 85, 87, 88, 89, 89.5, 90, 90, 90, 90, 90, 90]
baseline = [65, 68, 70, 71, 69, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58]
ax.plot(passes, ours, **STYLES[0], label='Ours', markersize=4)
ax.plot(passes, baseline, **STYLES[1], label='Critique+Revise', markersize=4)
# Mark convergence point
ax.axvline(x=10, color='gray', linestyle=':', alpha=0.5, linewidth=0.8)
ax.annotate('Converged', xy=(10, 90), fontsize=8, ha='center',
xytext=(10, 93), arrowprops=dict(arrowstyle='->', color='gray'))
ax.set_xlabel('Iteration')
ax.set_ylabel('Quality Score')
ax.legend(loc='lower right')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.savefig('paper/fig_trajectory.pdf', bbox_inches='tight')
```
### Output Rules
- **Always save as PDF**: `fig.savefig('fig.pdf')` — vector graphics, sharp at any zoom
- **Never save as PNG** for paper figures — raster PNGs look blurry when printed/zoomed
- **Exception**: Screenshots, photographs, or pixel-art visualizations → PNG at 600 DPI
- **Verify grayscale**: Print to grayscale PDF and check all information is still visible (see the sketch below)
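One quick way to run the grayscale check, assuming Pillow is installed and `fig` is the figure from the examples above: save a throwaway PNG copy and convert it.
```python
# Grayscale sanity check for a matplotlib figure `fig` (requires Pillow)
from PIL import Image

fig.savefig('check.png', dpi=200)  # Throwaway raster copy, for checking only
Image.open('check.png').convert('L').save('check_gray.png')
# Inspect check_gray.png: every series should remain distinguishable
```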
### Chart Types for Common Comparisons
| Comparison Type | Chart | Notes |
|----------------|-------|-------|
| Method vs method | Grouped bar chart | Include error bars |
| Across model sizes | Line chart with CI bands | Log scale for model size axis |
| Ablation study | Stacked/grouped bar | Highlight removed component |
| Trajectory/convergence | Line chart over iterations | Show winner per iteration |
| Per-task breakdown | Heatmap or grouped bar | Show variance across tasks |

View File

@@ -105,7 +105,7 @@ Reviewers are explicitly instructed to:
- Penalizing authors for honest limitation acknowledgment
- Rejecting for missing citations to reviewer's own work
### Timeline (NeurIPS 2025 — verify dates for current year)
- Bidding: May 17-21
- Reviewing period: May 29 - July 2
@@ -113,6 +113,8 @@ Reviewers are explicitly instructed to:
- Discussion period: July 31 - August 13
- Final notifications: September 18
> **Note**: These dates are from the 2025 cycle. Always check the current year's call for papers at the venue website.
---
## ICML Reviewer Guidelines
@@ -198,6 +200,70 @@ ACL has a dedicated ethics review process for:
---
## AAAI Reviewer Guidelines
### Evaluation Criteria
AAAI reviewers evaluate along similar axes to NeurIPS/ICML but with some differences:
| Criterion | Weight | Notes |
|-----------|--------|-------|
| **Technical quality** | High | Soundness of approach, correctness of results |
| **Significance** | High | Importance of the problem and contribution |
| **Novelty** | Medium-High | New ideas, methods, or insights |
| **Clarity** | Medium | Clear writing, well-organized presentation |
| **Reproducibility** | Medium | Sufficient detail to reproduce results |
### AAAI-Specific Considerations
- **Broader AI scope**: AAAI covers all of AI, not just ML. Papers on planning, reasoning, knowledge representation, NLP, vision, robotics, and multi-agent systems are all in scope. Reviewers may not be deep ML specialists.
- **Formatting strictness**: AAAI reviewers are instructed to flag formatting violations. Non-compliant papers may be desk-rejected before review.
- **Application papers**: AAAI is more receptive to application-focused work than NeurIPS/ICML. Framing a strong application contribution is viable.
- **Senior Program Committee**: AAAI uses SPCs (Senior Program Committee members) who mediate between reviewers and make accept/reject recommendations.
### Scoring (AAAI Scale)
- **Strong Accept**: Clearly above threshold, excellent contribution
- **Accept**: Above threshold, good contribution with minor issues
- **Weak Accept**: Borderline, merits outweigh concerns
- **Weak Reject**: Borderline, concerns outweigh merits
- **Reject**: Below threshold, significant issues
- **Strong Reject**: Well below threshold
---
## COLM Reviewer Guidelines
### Evaluation Criteria
COLM reviews focus on relevance to language modeling in addition to standard criteria:
| Criterion | Weight | Notes |
|-----------|--------|-------|
| **Relevance** | High | Must be relevant to language modeling community |
| **Technical quality** | High | Sound methodology, well-supported claims |
| **Novelty** | Medium-High | New insights about language models |
| **Clarity** | Medium | Clear presentation, reproducible |
| **Significance** | Medium-High | Impact on LM research and practice |
### COLM-Specific Considerations
- **Language model focus**: Reviewers will assess whether the contribution advances understanding of language models. General ML contributions need explicit LM framing.
- **Newer venue norms**: COLM is newer than NeurIPS/ICML, so reviewer calibration varies more. Write more defensively — anticipate a wider range of reviewer expertise.
- **ICLR-derived process**: Review process is modeled on ICLR (open reviews, author response period, discussion among reviewers).
- **Broad interpretation of "language modeling"**: Includes training, evaluation, alignment, safety, efficiency, applications, theory, multimodality (if language is central), and social impact of LMs.
### Scoring
COLM uses an ICLR-style scoring system:
- **8-10**: Strong accept (top papers)
- **6-7**: Weak accept (solid contribution)
- **5**: Borderline
- **3-4**: Weak reject (below threshold)
- **1-2**: Strong reject
---
## What Makes Reviews Strong
### Following Daniel Dennett's Rules

View File

@@ -225,8 +225,6 @@ Provide context before asking the reader to consider anything new. This applies
---
## Micro-Level Writing Tips
### From Ethan Perez (Anthropic)

Some files were not shown because too many files have changed in this diff.