The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.
Plugin backend
- POST /tasks/bulk — apply the same patch (status / archive / assignee
/ priority) to every id in the request body. Each id runs
independently: one bad id reports {ok: false, error: ...} without
aborting siblings. Status transitions that aren't legal for the
current state are surfaced per-id ('transition to done refused').
Used by the multi-select bulk action bar.
- GET /config — returns the dashboard.kanban section of config.yaml
(default_tenant, lane_by_profile, include_archived_by_default,
render_markdown) with sensible defaults when the section is absent.
Loaded once by the SPA to preselect filters and toggle markdown
rendering.
- _conn() helper — every handler now goes through it, calling
kanban_db.init_db() (idempotent) before every connection. Fresh
installs work whether the first hit is GET /board, POST /tasks, or
any other endpoint — no more 'no such table: tasks' when the CLI
or a script hits the plugin before the dashboard has ever loaded.
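The per-id isolation described for POST /tasks/bulk can be sketched roughly as follows. This is a minimal illustration, not the plugin's handler: `bulk_apply`, `update_task`, and the in-memory `TASKS` dict are hypothetical stand-ins for the real kanban_db calls.

```python
from typing import Any, Callable


def bulk_apply(
    ids: list[str],
    patch: dict[str, Any],
    apply_patch: Callable[[str, dict[str, Any]], None],
) -> list[dict[str, Any]]:
    """Apply the same patch to every id, recording one outcome per id."""
    results = []
    for task_id in ids:
        try:
            apply_patch(task_id, patch)
            results.append({"id": task_id, "ok": True})
        except Exception as exc:  # one bad id must not abort its siblings
            results.append({"id": task_id, "ok": False, "error": str(exc)})
    return results


# Hypothetical in-memory store standing in for kanban_db.
TASKS = {"t1": {"priority": 0}, "t3": {"priority": 0}}


def update_task(task_id: str, patch: dict[str, Any]) -> None:
    if task_id not in TASKS:
        raise KeyError(f"unknown task {task_id}")
    TASKS[task_id].update(patch)


outcome = bulk_apply(["t1", "bogus", "t3"], {"priority": 7}, update_task)
```

The bogus id in the middle yields an `ok: False` entry while both real tasks still receive the patch, which is the behaviour the bulk_partial_failure test checks.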
Plugin UI (plugin bundle, +~12 KB)
- Multi-select: per-card checkbox; shift/ctrl-click also toggles
without opening the drawer. A BulkActionBar appears above the
columns with batch → ready / complete / archive / reassign
(profile dropdown + unassign option). Destructive batches confirm
first. Partial failures from the backend are surfaced inline.
- Drawer inline editing:
- Click the title → TitleEditor swaps in an input, Enter saves,
Escape cancels.
- Click the Assignee meta row → AssigneeEditor input (empty string
unassigns).
- Click the Priority meta row → PriorityEditor numeric input.
- New 'edit' button on Description → full-width textarea; Save /
Cancel switch back to rendered view.
- Dependency editor: chip list of parents + children with per-chip
× button (calls DELETE /links). Add-parent / add-child dropdowns
filter out self + already-linked tasks so you cannot re-add a
duplicate edge or a self-loop. Cycle rejections from the server
surface cleanly via the existing error banner.
- Parent selection in InlineCreate: new dropdown listing every task
on the board ('{id} — {title}') — picking one sends parents=[id]
with the create payload, so the task lands in todo (or triage if
created from the Triage column) with the dependency wired up.
- Safe markdown rendering for description, comment bodies, and
result. A small in-bundle renderer handles headings, bold, italic,
inline code, fenced code, bullet lists, and http(s)/mailto links.
Every substitution runs on HTML-escaped input (no raw HTML), links
get target=_blank + rel=noopener,noreferrer. Disabled by config
key dashboard.kanban.render_markdown=false (falls back to <pre>).
- Touch drag-drop: attachTouchDrag() installs a pointerdown handler
that spawns a drag proxy, tracks elementFromPoint under the finger,
and dispatches a hermes-kanban:drop CustomEvent on the column when
released. Desktop continues to use native HTML5 DnD. Columns
listen for both.
- ErrorBoundary already present from the prior commit catches any
renderer throw; markdown escape + touch-proxy cleanup both have
their own try/finally.
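The escape-then-substitute order used by the markdown renderer can be illustrated with a small Python sketch. The shipped renderer lives in the JS bundle; `render_inline` and its regexes are assumptions made for illustration and cover only a few inline forms.

```python
import html
import re


def render_inline(text: str) -> str:
    """Escape the input first, then substitute a few inline markdown forms."""
    out = html.escape(text, quote=True)  # no raw HTML survives this step
    out = re.sub(r"`([^`]+)`", r"<code>\1</code>", out)
    out = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", out)
    out = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", out)
    # Only http(s) and mailto targets become links; other schemes stay as text.
    out = re.sub(
        r"\[([^\]]+)\]\(((?:https?://|mailto:)[^)\s]+)\)",
        r'<a href="\2" target="_blank" rel="noopener noreferrer">\1</a>',
        out,
    )
    return out
```

Because every substitution sees already-escaped text, a description containing `<script>` renders as literal characters, and a `javascript:` link target is never turned into an anchor.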
Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
- bulk_status_ready: 3 tasks blocked, batch → ready, all move
- bulk_archive hides all ids from default board
- bulk_reassign changes every assignee
- bulk_unassign_via_empty_string sets assignee back to None
- bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
good siblings still get priority=7
- bulk_empty_ids_400
- config_returns_defaults_when_section_missing
- config_reads_dashboard_kanban_section (writes config.yaml, verifies
every key round-trips)
Live smoke (real FastAPI app + isolated HERMES_HOME):
- /config without section returns defaults
- /config with dashboard.kanban section returns the configured values
- POST /tasks as the first-ever request (no prior /board) succeeds —
auto-init handles it
- Link add + remove via POST /links + DELETE /links round-trip
- Bulk priority bump on 2 ids, both get priority=5
- Bulk archive hides ids from default board
- PATCH {title, body} updates the task, markdown source survives
the round trip
- POST /tasks {triage: true, parents: [id]} lands in triage, not todo
- Bulk partial: 2 good + 1 bogus returns per-id outcome
Docs (website/docs/user-guide/features/kanban.md)
- 'What the plugin gives you' rewritten to reflect bulk, drawer
edit, dep editor, parent-on-create, markdown, touch drag-drop.
- New 'Dashboard config' subsection with a YAML example for
dashboard.kanban.*.
- REST table gains /tasks/bulk and /config rows.
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.
Kernel changes
- kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
CHECK constraint so no schema migration is needed.
- create_task(triage=True) forces the initial status to 'triage'
regardless of parents, and parent ids are still validated so the
eventual link rows don't dangle. recompute_ready() only promotes
'todo' -> 'ready', so triage tasks are naturally isolated from the
dispatcher pipeline.
- hermes kanban create gains --triage.
Patterns table (docs) gains P9 'Triage specifier'.
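Why triage tasks stay out of the dispatcher pipeline can be shown with a toy model of the promotion step. The `Task` dataclass and the "all parents done" condition are assumptions for illustration; only the rule that 'todo' alone is promoted comes from the change itself.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    status: str
    parents: list[str] = field(default_factory=list)


def recompute_ready(tasks: dict[str, Task]) -> list[str]:
    """Promote 'todo' tasks whose parents are all done; skip every other status."""
    promoted = []
    for task in tasks.values():
        if task.status != "todo":
            continue  # 'triage' tasks are skipped here, so they never become 'ready'
        if all(tasks[p].status == "done" for p in task.parents):
            task.status = "ready"
            promoted.append(task.id)
    return promoted


board = {
    "a": Task("a", "done"),
    "b": Task("b", "todo", parents=["a"]),
    "c": Task("c", "triage", parents=["a"]),
}
promoted = recompute_ready(board)
```

Task "c" has the same finished parent as "b" but remains in triage until someone moves it explicitly.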
Plugin backend (plugins/kanban/dashboard/plugin_api.py)
- GET /board now auto-initializes kanban.db on first read (idempotent).
A fresh install shows an empty board instead of 'failed to load'.
- GET /board returns a new 'progress' field per task — {done, total}
of child-task completion, or None if the task has no children.
- BOARD_COLUMNS prepends 'triage'.
- POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
{status: 'triage'}.
- WebSocket /events now requires ?token=<session_token> as a query
param — browsers can't set Authorization on a WS upgrade, so this
matches the pattern the in-browser PTY bridge uses. Constant-time
compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
contexts (no dashboard module) the check no-ops so the tail loop
stays testable. Security boundary documented in the module header
and in website/docs/user-guide/features/kanban.md.
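The token check on the WebSocket handshake can be sketched as below. `check_ws_token` is a hypothetical helper, not the plugin's function; it assumes the expected token is `None` in bare-test contexts and uses the standard-library `hmac.compare_digest` for the constant-time comparison.

```python
import hmac
from typing import Optional

WS_POLICY_VIOLATION = 1008  # close code used when auth is rejected


def check_ws_token(supplied: Optional[str], expected: Optional[str]) -> Optional[int]:
    """Return None to accept the handshake, or a close code to reject it."""
    if expected is None:
        return None  # bare-test context: no session token available, check no-ops
    if not supplied:
        return WS_POLICY_VIOLATION
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return None
    return WS_POLICY_VIOLATION
```

A handler would read `supplied` from the `?token=` query parameter before accepting the upgrade and close with the returned code when it is not `None`.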
Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
- Adds the Triage column (lilac dot) with helper text
'Raw ideas — a specifier will flesh out the spec'. Inline-create
from the Triage column parks new tasks in triage.
- Status action row in the drawer gains '→ triage'.
- Progress pill (N/M) on cards that have children. Full-complete
state tints the pill green.
- 'Lanes by profile' toolbar toggle — sub-groups the Running column
by assignee so you see at a glance which specialist is busy on
what.
- Destructive status moves (done / archived / blocked) via drag-drop
OR via the drawer action row now prompt for confirmation.
- Escape closes the drawer.
- Live-update reloads are debounced (250ms) so a burst of
task_events triggers one refetch, not N.
- WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
- WebSocket reconnect uses exponential backoff capped at 30s, not
a fixed 1.5s spin loop, and surfaces a user-visible error on
code-1008 (auth rejected) instead of reconnecting forever.
- ErrorBoundary wraps the page — a bad card render shows a
'rendering error, reload view' card instead of crashing the tab.
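The reconnect policy can be sketched in Python as follows (the shipped logic is in the JS bundle). The 30s cap and the code-1008 rule come from the change; the 1s base delay and the doubling factor are assumptions.

```python
from itertools import islice


def reconnect_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0):
    """Yield reconnect delays that grow geometrically and stop growing at the cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def should_reconnect(close_code: int) -> bool:
    """Code 1008 means auth was rejected, so retrying would never succeed."""
    return close_code != 1008


first_delays = list(islice(reconnect_delays(), 7))
```

With these defaults the delays run 1, 2, 4, 8, 16, 30, 30 seconds, and a 1008 close stops the loop so the UI can show an error instead.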
Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
- empty-board shape now asserts all 6 columns including 'triage'
- create_triage_lands_in_triage_column
- triage_task_not_promoted_to_ready (dispatcher bypasses triage)
- patch_status_triage_works (both into triage and out of it)
- board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
- board_auto_initializes_missing_db
- ws_events_rejects_when_token_required (three sub-assertions:
missing → 1008, wrong → 1008, correct → handshake accepted)
All 82 kanban tests pass under scripts/run_tests.sh.
Docs
- kanban.md 'What the plugin gives you' fully rewritten to match
shipped reality (triage, progress pill, assignee lanes,
destructive-confirm, Escape-close, debounce).
- New 'Security model' subsection documents the explicit-plugin-
route-bypass, the WS token requirement, and the --host 0.0.0.0
warning; also notes that kanban.db is profile-agnostic on purpose
(the coordination primitive) so cross-profile visibility is
expected.
- CLI command reference shows --triage.
- Collaboration patterns table adds P9 'Triage specifier'.
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core
changes — uses the standard dashboard plugin contract (manifest.json +
dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'.
What the tab gives you:
- One column per kanban status (todo / ready / running / blocked / done;
archived behind a toggle), column counts, coloured status dots.
- Cards with id, title, priority badge, tenant tag, assignee,
comment/link counts, 'created N ago'.
- HTML5 drag-drop between columns — status change routes through the
same kanban_db code the CLI /kanban verbs use, so the three surfaces
(CLI, gateway, dashboard) can never drift.
- Inline create per-column (title, assignee, priority).
- Side drawer on card click: description, status action row
(→ ready / → running / block / unblock / complete / archive),
dependency links, comment thread with Enter-to-submit,
last 20 events.
- Toolbar: search, tenant filter, assignee filter, show-archived,
nudge-dispatcher (skip the 60s wait), refresh.
- Live updates via WebSocket tailing task_events — the board reflects
CLI or gateway actions in real time.
REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id,
POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links,
DELETE /links, POST /dispatch, WS /events. Every handler is a thin
wrapper around kanban_db — no new business logic.
Visually theme-aware: the plugin CSS reads only --color-*, --radius,
--font-mono etc. so it reskins with whichever dashboard theme is active.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests):
- empty board shape
- create + appears in ready column with tenant/assignee rollups
- tenant filter
- detail includes parents/children/events
- 404 on unknown task
- PATCH status: complete / block / unblock / ready drag-drop / running
- PATCH reassign, priority, edit, invalid-status rejection
- POST comment (plus empty-body rejection)
- POST link + DELETE link + cycle rejection
- POST dispatch (dry run)
All 76 kanban tests pass under scripts/run_tests.sh.
Docs: website/docs/user-guide/features/kanban.md gains a full
'Dashboard (GUI)' section covering install, architecture, REST surface,
live-updates mechanism, extending, and scope boundary.
Adds an optional bank_id_template config that derives the bank name at
initialize() time from runtime context. Existing users with a static
bank_id keep the current behavior (template is empty by default).
Supported placeholders:
{profile} — active Hermes profile (agent_identity kwarg)
{workspace} — Hermes workspace (agent_workspace kwarg)
{platform} — cli, telegram, discord, etc.
{user} — platform user id (gateway sessions)
{session} — session id
Unsafe characters in placeholder values are sanitized, and empty
placeholders collapse cleanly (e.g. "hermes-{user}" with no user
becomes "hermes"). If the template renders empty, the static bank_id
is used as a fallback.
Common uses:
bank_id_template: hermes-{profile} # isolate per Hermes profile
bank_id_template: {workspace}-{profile} # workspace + profile scoping
bank_id_template: hermes-{user} # per-user banks for gateway
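The template rules above (sanitize, collapse empty placeholders, fall back to the static bank_id) can be sketched like this. `render_bank_id` and `sanitize` are hypothetical helpers, and the safe character set and separator handling are assumptions rather than the plugin's exact rules.

```python
import re

PLACEHOLDERS = ("profile", "workspace", "platform", "user", "session")


def sanitize(value: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9_.-] with '-' (assumed safe set)."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "-", value).strip("-")


def render_bank_id(template: str, static_bank_id: str, **context: str) -> str:
    if not template:
        return static_bank_id  # empty template: existing static behaviour
    rendered = template
    for name in PLACEHOLDERS:
        rendered = rendered.replace("{" + name + "}", sanitize(context.get(name) or ""))
    # Collapse separators left behind by empty placeholders.
    rendered = re.sub(r"[-_.]{2,}", "-", rendered).strip("-_.")
    return rendered or static_bank_id
```

For example, `render_bank_id("hermes-{user}", "default")` gives "hermes", and a template that renders to nothing returns the static bank_id.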
Reusing session_id as document_id caused data loss on /resume: when
the session is loaded again, _session_turns starts empty and the next
retain replaces the entire previously stored content.
Now each process lifecycle gets its own document_id formed as
{session_id}-{startup_timestamp}, so:
- Same session, same process: turns accumulate into one document (existing behavior)
- Resume (new process, same session): writes a new document, old one preserved
- Forks: child process gets its own document; parent's doc is untouched
Also adds session lineage tags so all processes for the same session
(or its parent) can still be filtered together via recall:
- session:<session_id> on every retain
- parent:<parent_session_id> when initialized with parent_session_id
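A minimal sketch of the per-process document id and lineage tags, assuming the startup timestamp is integer epoch seconds (the actual format is not stated here). `RetainContext` is a hypothetical name used only for illustration.

```python
import time
from typing import Optional


class RetainContext:
    """Per-process retain identity for one session (illustrative only)."""

    def __init__(
        self,
        session_id: str,
        parent_session_id: Optional[str] = None,
        started_at: Optional[float] = None,
    ) -> None:
        # Assumption: integer epoch seconds captured once at process startup.
        startup = int(started_at if started_at is not None else time.time())
        self.document_id = f"{session_id}-{startup}"
        self.tags = [f"session:{session_id}"]
        if parent_session_id:
            self.tags.append(f"parent:{parent_session_id}")


first = RetainContext("abc", started_at=1000)
resumed = RetainContext("abc", started_at=2000)
fork = RetainContext("child", parent_session_id="abc", started_at=3000)
```

A resumed process gets a different document_id, so its first retain cannot overwrite the earlier document, while the shared `session:abc` tag still lets recall filter both together.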
Closes #6602
The existing test_local_embedded_setup_materializes_profile_env expected
exact equality on ~/.hermes/.env content; the new HINDSIGHT_TIMEOUT=120
line from the timeout feature now appears in that file. Append it to the
expected string so the test reflects the new post_setup output.
The module-global `_loop` / `_loop_thread` pair is shared across every
`HindsightMemoryProvider` instance in the process — the plugin loader
creates one provider per `AIAgent`, and the gateway creates one `AIAgent`
per concurrent chat session (Telegram/Discord/Slack/CLI).
`HindsightMemoryProvider.shutdown()` stopped the shared loop when any one
session ended. That stranded the aiohttp `ClientSession` and `TCPConnector`
owned by every sibling provider on a now-dead loop — they were never
reachable for close and surfaced as the `Unclosed client session` /
`Unclosed connector` warnings reported in #11923.
Fix: no longer stop the shared loop in `shutdown()`. Per-provider cleanup
still closes that provider's own client via `self._client.aclose()`. The
loop runs on a daemon thread and is reclaimed on process exit; keeping
it alive between provider shutdowns means sibling providers can drain
their own sessions cleanly.
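The lifecycle after the fix can be sketched as below. `get_shared_loop`, `FakeClient`, and `Provider` are illustrative stand-ins, not the plugin's classes; the point is only that `shutdown()` closes its own client and leaves the shared loop running.

```python
import asyncio
import threading

_loop = None
_loop_thread = None
_lock = threading.Lock()


def get_shared_loop() -> asyncio.AbstractEventLoop:
    """Lazily start one event loop on a daemon thread, shared process-wide."""
    global _loop, _loop_thread
    with _lock:
        if _loop is None or _loop.is_closed():
            _loop = asyncio.new_event_loop()
            _loop_thread = threading.Thread(target=_loop.run_forever, daemon=True)
            _loop_thread.start()
        return _loop


class FakeClient:
    """Stand-in for the per-provider HTTP client."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class Provider:
    def __init__(self) -> None:
        self._client = FakeClient()

    def shutdown(self) -> None:
        # Close only this provider's client; the shared loop keeps running.
        future = asyncio.run_coroutine_threadsafe(self._client.aclose(), get_shared_loop())
        future.result(timeout=5)


a, b = Provider(), Provider()
a.shutdown()
```

After `a.shutdown()` the loop is still live, so `b` can later schedule its own `aclose()` on it instead of being stranded.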
Regression tests in `tests/plugins/memory/test_hindsight_provider.py`
(`TestSharedEventLoopLifecycle`):
- `test_shutdown_does_not_stop_shared_event_loop` — two providers share
the loop; shutting down one leaves the loop live for the other. This
test reproduces the #11923 leak on `main` and passes with the fix.
- `test_client_aclose_called_on_cloud_mode_shutdown` — each provider's
own aiohttp session is still closed via `aclose()`.
Fixes #11923.
The agent-facing image_generate tool only passes prompt + aspect_ratio to
provider.generate() (see tools/image_generation_tool.py:953). The editing
block (reference_images / edit_image kwargs) could never fire from the
tool surface, and the xAI edits endpoint is /images/edits with a
different payload shape anyway — not /images/generations as submitted.
- Remove reference_images / edit_image kwargs handling from generate()
- Remove matching test_with_reference_images case
- Update docstring + plugin.yaml description to text-to-image only
- Surface resolution in the success extras
Follow-up to PR #14547. Tests: 18/18 pass.
New built-in image_gen backend at plugins/image_gen/openai-codex/ that
exposes the same gpt-image-2 low/medium/high tier catalog as the
existing 'openai' plugin, but routes generation through the ChatGPT/
Codex Responses image_generation tool path. Available whenever the user
has Codex OAuth signed in; no OPENAI_API_KEY required.
The two plugins are independent — users select between them via
'hermes tools' → Image Generation, and image_gen.provider in
config.yaml. The existing 'openai' (API-key) plugin is unchanged.
Reuses _read_codex_access_token() and _codex_cloudflare_headers() from
agent.auxiliary_client so token expiry / cred-pool / Cloudflare
originator handling stays in one place.
Inspired by #14047 by @Hygaard, but re-implemented as a separate
plugin instead of an in-place fork of the openai plugin.
Closes #11195
- Add configurable retain_tags / retain_source / retain_user_prefix /
retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
chat_type, thread_id) through AIAgent and MemoryManager into
MemoryProvider.initialize kwargs so providers can scope and tag
retained memories.
- Hindsight attaches the new identity fields as retain metadata,
merges per-call tool tags with configured default tags, and uses
the configurable transcript labels for auto-retained turns.
Co-authored-by: Abner <abner.the.foreman@agentmail.to>
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
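Path-derived keys and the depth-2 recursion can be sketched roughly like this. `discover_plugins` is a hypothetical function and ignores the `kind:` field; it only shows how a category directory without its own `plugin.yaml` produces `<category>/<name>` keys.

```python
import tempfile
from pathlib import Path


def discover_plugins(root: Path) -> dict[str, Path]:
    """Map path-derived keys to plugin dirs, recursing one level into categories."""
    found: dict[str, Path] = {}
    for entry in sorted(p for p in root.iterdir() if p.is_dir()):
        if (entry / "plugin.yaml").is_file():
            found[entry.name] = entry  # depth 1: plugins/<name>/
            continue
        # No manifest of its own: treat the directory as a category namespace.
        for child in sorted(p for p in entry.iterdir() if p.is_dir()):
            if (child / "plugin.yaml").is_file():
                found[f"{entry.name}/{child.name}"] = child  # plugins/<category>/<name>/
    return found


# Demo layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("image_gen/openai", "tts/openai", "disk-cleanup"):
        (root / rel).mkdir(parents=True)
        (root / rel / "plugin.yaml").write_text("name: demo\n")
    keys = sorted(discover_plugins(root))
```

The two `openai` directories end up under distinct keys, which is the collision the path-derived scheme is meant to avoid.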
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
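The tier-to-parameter mapping can be sketched as a small lookup. `resolve_tier` is a hypothetical helper, and falling back to the medium tier for unknown IDs is an assumption; the three tier IDs and the single underlying model come from the change.

```python
from typing import Optional

TIERS = {
    "gpt-image-2-low": "low",
    "gpt-image-2-medium": "medium",
    "gpt-image-2-high": "high",
}
DEFAULT_TIER = "gpt-image-2-medium"


def resolve_tier(tier_id: Optional[str]) -> dict[str, str]:
    """Translate a picker-facing tier ID into API request parameters."""
    quality = TIERS.get(tier_id or DEFAULT_TIER)
    if quality is None:
        quality = TIERS[DEFAULT_TIER]  # assumption: unknown IDs use the default tier
    return {"model": "gpt-image-2", "quality": quality}
```

Every tier resolves to the same API model, so only the `quality` value differs between requests.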
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
Plugins now require explicit consent to load. Discovery still finds every
plugin — user-installed, bundled, and pip — so they all show up in
`hermes plugins` and `/plugins`, but the loader only instantiates
plugins whose name appears in `plugins.enabled` in config.yaml. This
removes the previous ambient-execution risk where a newly-installed or
bundled plugin could register hooks, tools, and commands on first run
without the user opting in.
The three-state model is now explicit:
enabled — in plugins.enabled, loads on next session
disabled — in plugins.disabled, never loads (wins over enabled)
not enabled — discovered but never opted in (default for new installs)
`hermes plugins install <repo>` prompts "Enable 'name' now? [y/N]"
(defaults to no). New `--enable` / `--no-enable` flags skip the prompt
for scripted installs. `hermes plugins enable/disable` manage both lists
so a disabled plugin stays explicitly off even if something later adds
it to enabled.
Config migration (schema v20 → v21): existing user plugins already
installed under ~/.hermes/plugins/ (minus anything in plugins.disabled)
are auto-grandfathered into plugins.enabled so upgrades don't silently
break working setups. Bundled plugins are NOT grandfathered — even
existing users have to opt in explicitly.
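The three-state resolution and the v20 → v21 grandfathering rule can be sketched as two small functions. Both names are hypothetical and the inputs are plain lists and sets rather than the real config objects.

```python
def plugin_state(name: str, enabled: set[str], disabled: set[str]) -> str:
    """Resolve a discovered plugin to one of the three states."""
    if name in disabled:
        return "disabled"  # disabled wins over enabled
    if name in enabled:
        return "enabled"
    return "not enabled"


def grandfather_user_plugins(
    installed_user: list[str], enabled: list[str], disabled: list[str]
) -> list[str]:
    """Add already-installed user plugins to enabled, skipping disabled ones."""
    merged = list(enabled)
    for name in installed_user:
        if name not in disabled and name not in merged:
            merged.append(name)
    return merged
```

Bundled plugins are simply never passed as `installed_user`, so they stay "not enabled" after the migration.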
Also: HERMES_DISABLE_BUNDLED_PLUGINS env var removed (redundant with
opt-in default), cmd_list now shows bundled + user plugins together with
their three-state status, interactive UI tags bundled entries
[bundled], docs updated across plugins.md and built-in-plugins.md.
Validation: 442 plugin/config tests pass. E2E: fresh install discovers
disk-cleanup but does not load it; `hermes plugins enable disk-cleanup`
activates hooks; migration grandfathers existing user plugins correctly
while leaving bundled plugins off.
The original name was cute but non-obvious; disk-cleanup says what it
does. Plugin directory, script, state path, log lines, slash command,
and test module all renamed. No user-visible state exists yet, so no
migration path is needed.
New website page "Built-in Plugins" documents the <repo>/plugins/<name>/
source, how discovery interacts with user/project plugins, the
HERMES_DISABLE_BUNDLED_PLUGINS escape hatch, disk-cleanup's hook
behaviour and deletion rules, and guidance on when a plugin belongs
bundled vs. user-installable. Added to the Features → Core sidebar next
to the main Plugins page, with a cross-reference from plugins.md.
Rewires @LVT382009's disk-guardian (PR #12212) from a skill-plus-script
into a plugin that runs entirely via hooks — no agent compliance needed.
- post_tool_call hook auto-tracks files created by write_file / terminal
/ patch when they match test_/tmp_/*.test.* patterns under HERMES_HOME
- on_session_end hook runs cmd_quick cleanup when test files were
auto-tracked during the turn; stays quiet otherwise
- /disk-guardian slash command keeps status / dry-run / quick / deep /
track / forget for manual use
- Deterministic cleanup rules, path safety, atomic writes, and audit
logging preserved from the original contribution
- Protect well-known top-level state dirs (logs/, memories/, sessions/,
cron/, cache/, etc.) from empty-dir removal so fresh installs don't
get gutted on first session end
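The auto-track decision in the post_tool_call hook can be sketched as a filename match plus a containment check. `should_auto_track` is a hypothetical helper, and reading test_/tmp_/*.test.* as the globs `test_*`, `tmp_*`, and `*.test.*` is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import Path

TRACK_PATTERNS = ("test_*", "tmp_*", "*.test.*")  # assumed glob forms


def should_auto_track(path: str, hermes_home: str) -> bool:
    """True when the file is under hermes_home and its name matches a pattern."""
    candidate = Path(path).resolve()
    home = Path(hermes_home).resolve()
    if home not in candidate.parents:
        return False  # never track anything outside HERMES_HOME
    return any(fnmatchcase(candidate.name, pattern) for pattern in TRACK_PATTERNS)
```

A file such as `scratch/test_foo.py` under HERMES_HOME is tracked, while the same name under another directory is ignored.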
The plugin system gains a bundled-plugin discovery path (<repo>/plugins/
<name>/) alongside user/project/entry-point sources. Memory and
context_engine subdirs are skipped — they keep their own discovery
paths. HERMES_DISABLE_BUNDLED_PLUGINS=1 suppresses the scan; the test
conftest sets it by default so existing plugin tests stay clean.
Co-authored-by: LVT382009 <levantam.98.2324@gmail.com>
Cuts shard-3 local runtime in half by neutralizing real wall-clock
waits across three classes of slow test:
## 1. Retry backoff mocks
- tests/run_agent/conftest.py (NEW): autouse fixture mocks
jittered_backoff to 0.0 so the `while time.time() < sleep_end`
busy-loop exits immediately. No global time.sleep mock (would
break threading tests).
- test_anthropic_error_handling, test_413_compression,
test_run_agent_codex_responses, test_fallback_model: per-file
fixtures mock time.sleep / asyncio.sleep for retry / compression
paths.
- test_retaindb_plugin: cap the retaindb module's bound time.sleep
to 0.05s via a per-test shim (background writer-thread retries
sleep 2s after errors; tests don't care about exact duration).
Plus replace arbitrary time.sleep(N) waits with short polling
loops bounded by deadline.
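A deadline-bounded polling loop of the kind that replaces `time.sleep(N)` waits might look like this. `wait_until` is a hypothetical helper name; the timeout and interval defaults are arbitrary.

```python
import threading
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool], timeout: float = 2.0, interval: float = 0.01
) -> bool:
    """Poll predicate until it is true or the deadline passes; return its final state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


# Demo: a background timer flips the flag after about 50 ms.
flag = threading.Event()
threading.Timer(0.05, flag.set).start()
became_true = wait_until(flag.is_set)
```

The test returns as soon as the condition holds instead of always paying the full fixed sleep, and still fails within the timeout if it never does.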
## 2. Subprocess sleeps in production code
- test_update_gateway_restart: mock time.sleep. Production code
does time.sleep(3) after `systemctl restart` to verify the
service survived. Tests mock subprocess.run, so nothing actually
restarts and the wait is dead time.
## 3. Network / IMDS timeouts (biggest single win)
- tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus
AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back
to IMDS (169.254.169.254) when no AWS creds are set. Any test
hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g.
test_status, test_setup_copilot_acp, anything that touches
provider auto-detect) burned ~2-4s waiting for that to time out.
- test_exit_cleanup_interrupt: explicitly mock
resolve_runtime_provider which was doing real network auto-detect
(~4s). Tests don't care about provider resolution because the agent
is already mocked.
- test_timezone: collapse the 3-test "TZ env in subprocess" suite
into 2 tests by checking both injection AND no-leak in the same
subprocess spawn (was 3 × 3.2s, now 2 × 4s).
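The IMDS environment guard can be sketched as a helper that a conftest could call at import time. `disable_imds_lookups` is hypothetical. The third variable name is an assumption: the note above abbreviates it as ATTEMPTS, and botocore's setting for metadata retries is `AWS_METADATA_SERVICE_NUM_ATTEMPTS`. Using `setdefault` rather than overwriting is also an assumption.

```python
import os
from typing import MutableMapping

IMDS_TEST_ENV = {
    "AWS_EC2_METADATA_DISABLED": "true",
    "AWS_METADATA_SERVICE_TIMEOUT": "1",
    # Assumed full name of the abbreviated ATTEMPTS setting.
    "AWS_METADATA_SERVICE_NUM_ATTEMPTS": "1",
}


def disable_imds_lookups(environ: MutableMapping[str, str] = os.environ) -> None:
    """Set the IMDS guards unless the caller's environment already defines them."""
    for key, value in IMDS_TEST_ENV.items():
        environ.setdefault(key, value)
```

Passing a plain dict instead of `os.environ` makes the helper itself easy to test without touching the process environment.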
## Validation
| Test | Before | After |
|---|---|---|
| test_anthropic_error_handling (8 tests) | ~80s | ~15s |
| test_413_compression (14 tests) | ~18s | 2.3s |
| test_retaindb_plugin (67 tests) | ~13s | 1.3s |
| test_status_includes_tavily_key | 4.0s | 0.05s |
| test_setup_copilot_acp_skips_same_provider_pool_step | 8.0s | 0.26s |
| test_update_gateway_restart (5 tests) | ~18s total | ~0.35s total |
| test_exit_cleanup_interrupt (2 tests) | 8s | 1.5s |
| **Matrix shard 3 local** | **108s** | **50s** |
No behavioral contract changed: tests still verify retry happens,
service restart logic runs, etc.; they just don't burn real seconds
waiting for it.
Supersedes PR #11779 (those changes are included here).
First pass of test-suite reduction to address flaky CI and bloat.
Removed tests that fall into these change-detector patterns:
1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
that call inspect.getsource() on production modules and grep for string
literals. Break on any refactor/rename even when behavior is correct.
2. Platform enum tautologies (every gateway/test_X.py): assertions like
`Platform.X.value == 'x'` duplicated across ~9 adapter test files.
3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
only verify a key exists in a dict. Data-layout tests, not behavior.
4. Argparse wiring tests (test_argparse_flag_propagation,
test_subparser_routing_fallback): tests that do parser.parse_args([...])
then assert args.field. Tests Python's argparse, not our code.
5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
cmd_X, call plugins_command with matching action, assert mock called.
Tests the if/elif chain, not behavior.
6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
that mock the external API client, call our function, and assert exact
kwargs. Break on refactor even when behavior is preserved.
7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
tests): tests that patch their own helper method, then assert it was called.
Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.
Net reduction: 169 tests removed. 38 empty classes cleaned up.
Collected before: 12,522 tests
Collected after: 12,353 tests
Port missing features from the hindsight-hermes external integration
package into the native plugin. Only touches plugin files — no core
changes.
Features:
- Tags on retain/recall (tags, recall_tags, recall_tags_match)
- Recall config (recall_max_tokens, recall_max_input_chars, recall_types,
recall_prompt_preamble)
- Retain controls (retain_every_n_turns, auto_retain, auto_recall,
retain_async via aretain_batch, retain_context)
- Bank config via Banks API (bank_mission, bank_retain_mission)
- Structured JSON retain with per-message timestamps
- Full session accumulation with document_id for dedup
- Custom post_setup() wizard with curses picker
- Mode-aware dep install (hindsight-client for cloud, hindsight-all for local)
- local_external mode and openai_compatible LLM provider
- OpenRouter support with auto base URL
- Auto-upgrade of hindsight-client to >=0.4.22 on session start
- Comprehensive debug logging across all operations
- 46 unit tests
- Updated README and website docs
Based on PR #5413 spec by MaheshtheDev (Mahesh Sanikommu).
Changes:
- Add search_mode config (hybrid/memories/documents) passed to SDK
- Add {identity} template support in container_tag for profile-scoped containers
- Add SUPERMEMORY_CONTAINER_TAG env var override (priority over config)
- Add multi-container mode: enable_custom_container_tags, custom_containers,
custom_container_instructions in supermemory.json
- Dynamic tool schemas when multi-container enabled (optional container_tag param)
- Whitelist validation for custom container tags in tool calls
- Simplify get_config_schema() to only prompt for API key during setup
- Defer container_tag sanitization to initialize() (after template resolution)
- Add custom_id support to documents.add calls
- Update README with multi-container docs, search_mode, identity template,
support links (Discord, email)
- Update memory-providers.md with new features and multi-container example
- Update memory-provider-plugin.md with minimal vs full schema guidance
- Add 12 new tests covering identity template, search_mode, multi-container,
config schema, and env var override
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
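The read/write filter split and the v2 response unwrapping can be sketched as below. These are free functions written for illustration (the change names `_read_filters()` and `_write_filters()`); `unwrap_results` is a hypothetical helper that accepts either a v2 dict or a bare list.

```python
from typing import Any


def read_filters(user_id: str) -> dict[str, str]:
    """Reads are user-scoped only, so recall works across sessions."""
    return {"user_id": user_id}


def write_filters(user_id: str, agent_id: str) -> dict[str, str]:
    """Writes also carry agent_id for attribution."""
    return {"user_id": user_id, "agent_id": agent_id}


def unwrap_results(response: Any) -> list[Any]:
    """Accept either the v2 dict form {results: [...]} or a bare list."""
    if isinstance(response, dict):
        return list(response.get("results", []))
    return list(response or [])
```

Keeping run_id out of `read_filters` is what preserves cross-session recall.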
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
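The fencing and sanitizing step can be sketched as follows. `fence_memory`, the wording of the system note, and the exact tag-stripping regex are assumptions; the sketch strips repeatedly so that removing one tag cannot reassemble another.

```python
import re

_FENCE_TAG = re.compile(r"</?\s*memory-context\s*>", re.IGNORECASE)


def strip_fence_tags(text: str) -> str:
    """Remove fence tags until none remain, including ones formed by a removal."""
    previous = None
    while previous != text:
        previous = text
        text = _FENCE_TAG.sub("", text)
    return text


def fence_memory(recalled: str) -> str:
    """Wrap sanitized provider output in a fenced block with a system note."""
    cleaned = strip_fence_tags(recalled)
    return (
        "<memory-context>\n"
        "[System note: recalled memory context, NOT user input.]\n"
        f"{cleaned}\n"
        "</memory-context>"
    )
```

The result would be attached at API-call time only and never written back to session history.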
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
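Prefix-based redaction can be sketched with one compiled pattern. The prefixes come from the list above; the token character class, the 8-character minimum, and the keep-the-prefix masking style are assumptions, and `redact` is a hypothetical function name.

```python
import re

SECRET_PREFIXES = ("gsk_", "syt_", "retaindb_", "hsk-", "mem0_", "brv_")

# Assumption: a secret is a known prefix followed by 8+ token characters.
_SECRET_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in SECRET_PREFIXES) + r")[A-Za-z0-9_\-]{8,}"
)


def redact(text: str) -> str:
    """Mask the body of any token that starts with a known secret prefix."""
    return _SECRET_RE.sub(lambda m: m.group(1) + "***", text)
```

Keeping the prefix in the output makes it clear which provider's key was redacted without exposing the key itself.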