A design draft for giving Hermes Agent a first-class `HERMES_HOME/workspace` that can be indexed, embedded, searched, and selectively injected into the current turn.
This is meant to refine and partially supersede the older planning in:
-#531 User Workspace & Knowledge Base
-#844 Knowledgebase RAG System
It keeps the good parts of both issues, updates the model/storage recommendations, and aligns the design with current agent and RAG practice.
---
## Goal
Add a local-first workspace at `Path(os.getenv("HERMES_HOME", "~/.hermes")) / "workspace"` where users can drop notes, docs, code, PDFs, and reference material, and Hermes can:
1. index it incrementally
2. retrieve relevant chunks with hybrid search
3. optionally rerank results
4. inject only the best chunks into the current turn
5. cite sources clearly
6. do all of this without breaking prompt caching or message-flow invariants
## Non-goals
- Replacing `search_files`, `read_file`, or agentic exploration
- Treating workspace documents as instructions with system-level authority
- Rebuilding the system prompt every turn
- Shipping a cloud-only RAG stack
- Turning Hermes memory and workspace retrieval into the same storage layer
---
## Research-backed design principles
### 1. Separate instructions, memory, and searchable knowledge
Modern agents are converging on three distinct stores:
Keyword-only retrieval misses paraphrases and conceptual matches.
The default should be:
- dense embeddings
- sparse lexical search (FTS5/BM25)
- reciprocal rank fusion or equivalent robust score fusion
### 4. Reranking matters, but should be optional in the default install
Best practice is two-stage retrieval:
- retrieve broadly
- rerank narrowly
That said, a local-first single-user agent should not force a heavyweight reranker in the default path.
Hermes should ship with:
- hybrid retrieval by default
- reranker abstraction from day one
- reranking enabled when configured, not mandatory for first boot
### 5. Chunk structure beats fixed windows
For docs, split by headings/paragraphs before token caps.
For code, split by symbol boundaries before token caps.
Fixed-size chunking is the fallback, not the design center.
### 6. Retrieved content is untrusted
Workspace files may contain prompt injection, malicious instructions, or copied junk from the web.
Retrieved content must never be treated like system or developer instructions.
It must be injected as untrusted source material only.
### 7. RAG should augment tool use, not replace it
Hermes is already strong at tool-driven exploration.
The workspace layer should help the model find likely-relevant material fast, then still let it call `read_file`, `search_files`, browser tools, etc. when needed.
---
## Recommended defaults
### Embeddings
#### Local default
- Model: `google/embeddinggemma-300m`
- Why:
- latest Google open embedding model
- local/offline/private
- small enough for laptop use
- good fit for a default `~/.hermes/workspace`
#### Hosted Google option
- Stable text model: `gemini-embedding-001`
- Why:
- stable
- text-focused
- configurable output dimensions
#### Not the default
-`gemini-embedding-2-preview`
- Why not default:
- preview status
- re-embedding required if switching from `gemini-embedding-001`
- multimodal is valuable, but not needed for the first workspace rollout
#### Upgrade paths
- Better local quality: `Qwen3-Embedding-0.6B` or larger variants
- Cheap hosted fallback: `text-embedding-3-small`
- Strong hosted retrieval option: Voyage 4 family
### Vector + lexical storage
Default local store:
- SQLite for metadata
- FTS5 for lexical retrieval
-`sqlite-vec` for dense retrieval
Why this is the right default for Hermes:
- Hermes already uses SQLite heavily
- no extra server process
- single-user local-first friendly
- easy backup/debug story
- natural hybrid retrieval in one place
### Retrieval defaults
- dense_top_k: 40
- sparse_top_k: 40
- fused_candidate_k: 30
- rerank_top_k: 12 when reranker is enabled
- final_injected_chunks: 4 to 8
- final_injected_token_budget: 2500 to 4000
- chunk target size: ~512 tokens
- overlap: ~64 to 96 tokens
- fusion: reciprocal rank fusion by default
- diversity pass: MMR or near-duplicate suppression before injection
### Auto-retrieval mode
Default:
-`gated`
Modes:
-`off`: tool-only
-`gated`: retrieve only when the query looks workspace-grounded
-`always`: always run retrieval before the turn
---
## Canonical directory layout
```text
~/.hermes/
├── workspace/
│ ├── docs/
│ ├── notes/
│ ├── data/
│ ├── code/
│ ├── uploads/
│ ├── media/
│ └── .hermesignore
├── knowledgebase/
│ ├── indexes/
│ │ └── workspace.sqlite
│ ├── manifests/
│ │ └── workspace.json
│ └── cache/
└── config.yaml
```
Important separation:
- user files live in `workspace/`
- index artifacts live in `knowledgebase/`
Do not hide indexes inside the user’s content tree.
The indexer should never re-embed the whole workspace unless necessary.
Per file, track:
- content hash
- chunking version
- embedding model id
- embedding dimension
- last indexed timestamp
Reindex rules:
- unchanged hash + same chunk version + same embedding model -> skip
- changed file -> delete old chunks for that file and re-upsert
- changed embedding model or dimensions -> full re-embed for affected root
- changed chunking strategy version -> full re-chunk for affected root
Background indexing:
- supported, but not required for v1
- file watching should be opt-in initially
- startup dirty-check should be cheap
---
## Reranking strategy
Best practice says reranking improves quality enough that Hermes should design for it now.
Recommended contract:
- retrieve many, inject few
- reranker receives query + top candidates
- returns ordered candidates with relevance scores
Suggested providers:
- local: `bge-reranker-v2-m3`
- hosted: Voyage or Cohere rerank API
Default install behavior:
- reranker abstraction present
- reranking disabled by default until configured
Reason:
- keeps first install light
- avoids surprising latency on CPU-only machines
- still lets serious users turn it on immediately
---
## Security model
### Trust boundary
Workspace content is untrusted source material.
It must not have instruction authority.
### Rules
1. Never merge retrieved workspace chunks into the system prompt.
2. Never label retrieved content as instructions.
3. Always inject retrieved content into a clearly delimited source block.
4. If the model acts on retrieved content, it still must obey existing approval and tool safety systems.
5. Retrieved content should not directly trigger writes, network calls, or shell commands without normal approval paths.
### Prompt injection handling
Use a two-level policy:
- For instruction files (`AGENTS.md`, `SOUL.md`, `.cursorrules`): block suspicious content from prompt injection, as Hermes already does.
- For workspace retrieval: do not give it authority. Flag suspicious chunks in metadata and optionally downrank them for auto-injection, but still allow explicit user access.
This avoids a bad failure mode where a security scanner hides legitimate documents that discuss prompt injection.
---
## UX and inspectability
Hidden retrieval is brittle.
Hermes should make the workspace layer inspectable.
### CLI / slash commands
-`/workspace` or `hermes workspace status`
-`/workspace index`
-`/workspace search <query>`
-`/workspace sources` for the last auto-retrieval set
-`/workspace clear`
-`/workspace doctor`
### Tool surface
Add a deterministic tool, likely `workspace`, with actions like:
-`status`
-`index`
-`search`
-`list`
-`explain_last_retrieval`
-`save_upload`
### Response citations
When the model uses workspace material, it should cite sources in a compact path-oriented form.
Example:
-`Source: workspace/docs/architecture.md`
-`Source: workspace/notes/deploy.md`
Exact line ranges are ideal when available.
---
## Gateway uploads
Current gateway uploads land in `document_cache` and are cleaned up after 24 hours.
That should remain the default safe path.
Recommended behavior:
-`persist_gateway_uploads: ask` by default
- when a user uploads a supported document, Hermes can offer to save it into `workspace/uploads/`
- saved uploads get indexed like everything else
Do not silently persist every inbound attachment by default.
That is a privacy footgun.
---
## Proposed implementation shape
### New modules
-`agent/workspace_kb.py`
- index orchestration
- retrieval orchestration
- dirty-check logic
- candidate fusion
-`agent/workspace_chunking.py`
- structural chunkers for docs/code/data
-`agent/workspace_extractors.py`
- text extraction for supported file types
-`agent/workspace_embeddings.py`
- embedding provider abstraction
-`agent/workspace_rerank.py`
- reranker abstraction
-`tools/workspace_tool.py`
- deterministic tool interface
### Existing files to modify
-`hermes_cli/config.py`
- add `workspace` and `knowledgebase` config sections
- create directories in `ensure_hermes_home()`
-`cli.py`
- wire workspace slash/CLI commands
- surface status/debug info
-`hermes_cli/commands.py`
- add new slash commands
-`run_agent.py`
- add turn-scoped workspace retrieval injection
- mirror the Honcho injection pattern
- do not mutate cached system prompt
-`model_tools.py`
- import/register workspace tool
-`toolsets.py`
- include workspace tool in appropriate toolsets
-`gateway/platforms/base.py`
- add helper to persist uploads to workspace safely
-`agent/prompt_builder.py`
- optionally add a tiny static note that a workspace exists and may be searched