fix: revert accidental summary_target_tokens change + add docs for context_length config
- Revert summary_target_tokens from 2500 back to 500 (accidental change during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override
@@ -414,6 +414,29 @@ LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
---
### Context Length Detection
Hermes automatically detects your model's context length by querying the endpoint's `/v1/models` response. For most setups this works out of the box. If detection fails (the model name doesn't match, the endpoint doesn't expose `/v1/models`, etc.), Hermes falls back to a high default and probes downward on context-length errors.
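If you're not sure what your endpoint actually reports, you can query `/v1/models` yourself before reaching for the override. Here is a minimal sketch; the base URL matches the example config below, and the field names to look for are assumptions, since servers differ in whether and how they expose a context-length value (common names include `context_length` and `max_model_len`):

```python
import json
import urllib.request

# Assumption: same endpoint as model.base_url in the config below.
BASE_URL = "http://localhost:8080/v1"

with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
    payload = json.load(resp)

# Dump each model entry verbatim and eyeball it for a context-length
# field; if none is present, auto-detection has nothing to read and
# the explicit model.context_length override is the reliable fix.
for entry in payload.get("data", []):
    print(json.dumps(entry, indent=2))
```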
To set the context length explicitly, add `context_length` to your model config:
```yaml
model:
  default: "qwen3.5:9b"
  base_url: "http://localhost:8080/v1"
  context_length: 131072  # tokens
```
This takes highest priority: it overrides auto-detection, cached values, and hardcoded defaults.
:::tip When to set this manually
- Your model shows "2M context" in the status bar (detection failed)
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose `/v1/models`
:::
---
### Choosing the Right Setup
| Use Case | Recommended |