---
sidebar_position: 9
title: "Voice & TTS"
description: "Text-to-speech and voice message transcription across all platforms"
---
# Voice & TTS
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
:::tip Nous Subscribers
If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, OpenAI TTS is available through the **[Tool Gateway](tool-gateway.md)** without a separate OpenAI API key. Run `hermes model` or `hermes tools` to enable it.
:::
## Text-to-Speech
Convert text to speech with ten providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
| **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
| **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` |
| **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
| **NeuTTS** | Good | Free (local) | None needed |
| **KittenTTS** | Good | Free (local) | None needed |
| **Piper** | Good | Free (local) | None needed |
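For example, to run entirely offline with no API key, you could point the agent at one of the local providers. This is a minimal sketch; the provider name and default voice are the ones listed above, and the full set of options appears in the Configuration section:

```yaml
# ~/.hermes/config.yaml — minimal local setup, no API key required
tts:
  provider: "piper"                # any provider from the table above
  piper:
    voice: en_US-lessac-medium     # auto-downloaded on first use
```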
### Platform Delivery
| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
| Discord | Voice bubble (Opus/OGG), falls back to file attachment | Opus/MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/.hermes/audio_cache/` | MP3 |
### Configuration
```yaml
# In ~/.hermes/config.yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts" | "piper"
  speed: 1.0 # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
    speed: 1.0 # Converted to rate percentage (+/-%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
    speed: 1.0 # 0.25 - 4.0
  minimax:
    model: "speech-2.8-hd" # speech-2.8-hd (default), speech-2.8-turbo
    voice_id: "English_Graceful_Lady" # See https://platform.minimax.io/faq/system-voice-id
    speed: 1 # 0.5 - 2.0
    vol: 1 # 0 - 10
    pitch: 0 # -12 - 12
  mistral:
    model: "voxtral-mini-tts-2603"
    voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8" # Paul - Neutral (default)
  gemini:
    model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts
    voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc.
  xai:
    voice_id: "eve" # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts)
    language: "en" # ISO 639-1 code
    sample_rate: 24000 # 22050 / 24000 (default) / 44100 / 48000
    bit_rate: 128000 # MP3 bitrate; only applies when codec=mp3
    # base_url: "https://api.x.ai/v1" # Override via XAI_BASE_URL env var
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
  kittentts:
    model: KittenML/kitten-tts-nano-0.8-int8 # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
    voice: Jasper # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
    speed: 1.0 # 0.5 - 2.0
    clean_text: true # Expand numbers, currencies, units
  piper:
    voice: en_US-lessac-medium # voice name (auto-downloaded) OR absolute path to .onnx
    # voices_dir: '' # default: ~/.hermes/cache/piper-voices/
    # use_cuda: false # requires onnxruntime-gpu
    # length_scale: 1.0 # 2.0 = twice as slow
    # noise_scale: 0.667
    # noise_w_scale: 0.8
    # volume: 1.0 # 0.5 = half as loud
    # normalize_audio: true
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
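As a sketch of how these interact (the `1.5` value here is illustrative), this slows every provider slightly while speeding up OpenAI TTS alone:

```yaml
tts:
  speed: 0.9       # global default applied to every provider
  openai:
    speed: 1.5     # overrides the global value for OpenAI TTS only
```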
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI, ElevenLabs, and Mistral** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles
- **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **Piper** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, KittenTTS, and Piper audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
:::
### Piper (local, 44 languages)
Piper is a fast, local neural TTS engine from the Open Home Foundation (the Home Assistant maintainers). It runs entirely on CPU, supports **44 languages** with pre-trained voices, and needs no API key.
**Install via `hermes tools`** → Voice & TTS → Piper — Hermes runs `pip install piper-tts` for you. Or install manually: `pip install piper-tts`.
**Switch to Piper:**
```yaml
tts:
  provider: piper
  piper:
    voice: en_US-lessac-medium
```
On the first TTS call for a voice that isn't cached locally, Hermes runs `python -m piper.download_voices <name>` and downloads the model (~20-90MB depending on quality tier) into `~/.hermes/cache/piper-voices/`. Subsequent calls reuse the cached model.
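The resolution rule above can be sketched roughly like this (a hypothetical illustration only, not Hermes's actual code):

```python
from pathlib import Path

# Cache location as described in the docs above.
CACHE_DIR = Path.home() / ".hermes" / "cache" / "piper-voices"

def resolve_voice(voice: str, cache_dir: Path = CACHE_DIR) -> tuple[Path, bool]:
    """Return (model_path, needs_download)."""
    if voice.endswith(".onnx"):
        # An explicit path to a model file is used as-is, never downloaded.
        return Path(voice).expanduser(), False
    # A bare voice name maps into the cache; download on first use.
    candidate = cache_dir / f"{voice}.onnx"
    return candidate, not candidate.exists()
```

So `voice: en_US-lessac-medium` triggers a one-time download, while an absolute `.onnx` path skips the network entirely.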
**Picking a voice.** The [full voice catalog](https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md) covers English, Spanish, French, German, Italian, Dutch, Portuguese, Russian, Polish, Turkish, Chinese, Arabic, Hindi, and more — each with `x_low` / `low` / `medium` / `high` quality tiers. Sample voices at [rhasspy.github.io/piper-samples](https://rhasspy.github.io/piper-samples/).
**Using a pre-downloaded voice.** Set `tts.piper.voice` to an absolute path ending in `.onnx`:
```yaml
tts:
  piper:
    voice: /path/to/my-custom-voice.onnx
```
**Advanced knobs** (`tts.piper.length_scale` / `noise_scale` / `noise_w_scale` / `volume` / `normalize_audio` / `use_cuda`) correspond 1:1 to Piper's `SynthesisConfig`. They are passed through only when configured, so older `piper-tts` versions that lack them keep working.
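The knobs slot in next to `voice` under `tts.piper` (the values below are illustrative, not recommendations):

```yaml
tts:
  piper:
    voice: en_US-lessac-medium
    length_scale: 1.1      # >1 slows speech down, <1 speeds it up
    noise_scale: 0.667
    noise_w_scale: 0.8
    volume: 1.0
    normalize_audio: true
    use_cuda: false
```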
### Custom command providers
If a TTS engine you want isn't natively supported (VoxCPM, MLX-Kokoro, XTTS CLI, a voice-cloning script, anything else that exposes a CLI), you can wire it in as a **command-type provider** without writing any Python. Hermes writes the input text to a temp UTF-8 file, runs your shell command, and reads the audio file the command produced.
Declare one or more providers under `tts.providers.<name>` and switch between them with `tts.provider: <name>` — the same way you switch between built-ins like `edge` and `openai`.
```yaml
tts:
  provider: voxcpm # pick any name under tts.providers
  providers:
    voxcpm:
      type: command
      command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
      output_format: mp3
      timeout: 180
      voice_compatible: true # try to deliver as a Telegram voice bubble
    mlx-kokoro:
      type: command
      command: "python -m mlx_kokoro --in {input_path} --out {output_path} --voice {voice}"
      voice: af_sky
      output_format: wav
    piper-custom: # native Piper also supports custom .onnx via tts.piper.voice
      type: command
      command: "piper -m /path/to/custom.onnx -f {output_path} < {input_path}"
      output_format: wav
```
#### Placeholders
Your command template can reference these placeholders. Hermes substitutes them at render time and shell-quotes each value for the surrounding context (bare / single-quoted / double-quoted), so paths with spaces and other shell-sensitive characters are safe.
| Placeholder | Meaning |
|------------------|------------------------------------------------------|
| `{input_path}` | Path to the temp UTF-8 text file Hermes wrote |
| `{text_path}` | Alias for `{input_path}` |
| `{output_path}` | Path the command must write audio to |
| `{format}` | `mp3` / `wav` / `ogg` / `flac` |
| `{voice}` | `tts.providers.<name>.voice`, empty when unset |
| `{model}` | `tts.providers.<name>.model` |
| `{speed}` | Resolved speed multiplier (provider or global) |
Use `{{` and `}}` for literal braces.
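To make the quoting behavior concrete, here is a minimal sketch of placeholder rendering with shell quoting. This is a hypothetical illustration, not Hermes's renderer (the real one is context-aware for bare / single-quoted / double-quoted positions; this version quotes every value for the bare-word context):

```python
import shlex

def render_command(template: str, values: dict) -> str:
    # Protect {{ and }} so they survive as literal braces.
    out = template.replace("{{", "\x00").replace("}}", "\x01")
    for key, value in values.items():
        # Quote each value so spaces and metacharacters stay inert.
        out = out.replace("{" + key + "}", shlex.quote(str(value)))
    return out.replace("\x00", "{").replace("\x01", "}")

cmd = render_command(
    "piper -m {model} -f {output_path} < {input_path}",
    {
        "model": "/models/en voice.onnx",   # path with a space
        "output_path": "/tmp/out.wav",
        "input_path": "/tmp/in;echo.txt",   # shell metacharacter
    },
)
# cmd: piper -m '/models/en voice.onnx' -f /tmp/out.wav < '/tmp/in;echo.txt'
```

The upshot: you write the template naturally, and values with spaces or `;` cannot break out of their argument position.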
#### Optional keys
| Key | Default | Meaning |
|--------------------|---------|------------------------------------------------------------------------------------------------------------|
| `timeout` | `120` | Seconds; the process tree is killed on expiry (Unix `killpg`, Windows `taskkill /T`). |
| `output_format` | `mp3` | One of `mp3` / `wav` / `ogg` / `flac`. Auto-inferred from the output extension if Hermes picks a path. |
| `voice_compatible` | `false` | When `true`, Hermes converts MP3/WAV output to Opus/OGG via ffmpeg so Telegram renders a voice bubble. |
| `max_text_length` | `5000` | Input is truncated to this length before rendering the command. |
| `voice` / `model` | empty | Passed to the command as placeholder values only. |
#### Behavior notes
- **Built-in names always win.** A `tts.providers.openai` entry never shadows the native OpenAI provider, so no user config can silently replace a built-in.
- **Default delivery is a regular audio attachment.** Command-provider output is sent as an audio file on every platform. Opt in to voice-bubble delivery per provider with `voice_compatible: true`.
- **Command failures surface to the agent.** Non-zero exit, empty output, or timeout all return an error with the command's stderr/stdout included so you can debug the provider from the conversation.
- **`type: command` is the default when `command:` is set.** Writing `type: command` explicitly is good practice but not required; an entry with a non-empty `command` string is treated as a command provider.
- **`{input_path}` / `{text_path}` are interchangeable.** Use whichever reads better in your command.
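Putting those notes together, a minimal working provider can be this small (assuming `espeak-ng` is on your PATH; its `-f` flag reads text from a file and `-w` writes a WAV):

```yaml
tts:
  provider: espeak-local
  providers:
    espeak-local:
      # type: command is implied because command: is set
      command: "espeak-ng -f {input_path} -w {output_path}"
      output_format: wav
```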
#### Security
Command-type providers run whatever shell command you configure, with your user's permissions. Hermes quotes placeholder values and enforces the configured timeout, but the command template itself is trusted local input — treat it the same way you would a shell script on your PATH.
## Voice Message Transcription (STT)
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected into the conversation; the agent sees the transcript as ordinary text.
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Local Whisper** (default) | Good | Free | None needed |
| **Groq Whisper API** | Best | Free tier | `GROQ_API_KEY` |
| **OpenAI Whisper API** | Best | Paid | `VOICE_TOOLS_OPENAI_KEY` or `OPENAI_API_KEY` |
:::info Zero Config
Local transcription works out of the box when `faster-whisper` is installed. If that's unavailable, Hermes can also use a local `whisper` CLI from common install locations (like `/opt/homebrew/bin`) or a custom command via `HERMES_LOCAL_STT_COMMAND`.
:::
### Configuration
```yaml
# In ~/.hermes/config.yaml
stt:
  provider: "local" # "local" | "groq" | "openai" | "mistral" | "xai"
  local:
    model: "base" # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1" # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
  mistral:
    model: "voxtral-mini-latest" # voxtral-mini-latest, voxtral-mini-2602
  xai:
    model: "grok-stt" # xAI Grok STT
```
### Provider Details
**Local (faster-whisper)** — Runs Whisper locally via [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Uses CPU by default, GPU if available. Model sizes:

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `tiny` | ~75 MB | Fastest | Basic |
| `base` | ~150 MB | Fast | Good (default) |
| `small` | ~500 MB | Medium | Better |
| `medium` | ~1.5 GB | Slower | Great |
| `large-v3` | ~3 GB | Slowest | Best |
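
Provider and model selection live under the `stt` block in `config.yaml`. Here is a minimal sketch, assuming the per-provider keys mirror the TTS config pattern; `stt.provider` itself is the documented selector:

```yaml
# Sketch of an stt block in config.yaml.
# The key names under `local` are assumptions modeled on the TTS config layout.
stt:
  provider: local        # local | groq | openai | mistral | xai
  local:
    model: base          # tiny | base | small | medium | large-v3
```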

**Groq API** — Requires `GROQ_API_KEY`. Good cloud fallback when you want a free hosted STT option.

**OpenAI API** — Checks `VOICE_TOOLS_OPENAI_KEY` first, then falls back to `OPENAI_API_KEY`. Supports `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`.

**Mistral API (Voxtral Transcribe)** — Requires `MISTRAL_API_KEY`. Uses Mistral's [Voxtral Transcribe](https://docs.mistral.ai/capabilities/audio/speech_to_text/) models. Supports 13 languages, speaker diarization, and word-level timestamps. Install with `pip install hermes-agent[mistral]`.

**xAI Grok STT** — Requires `XAI_API_KEY`. Posts audio to `https://api.x.ai/v1/stt` as `multipart/form-data`. A good choice if you already use xAI for chat or TTS and want one API key for everything. Auto-detection places it after Groq, so set `stt.provider: xai` explicitly to force it.
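
Pinning Grok STT instead of relying on auto-detection is a one-line config change:

```yaml
stt:
  provider: xai
```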

**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders.
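
As an illustration, the variable could point at the reference `whisper` CLI; the exact command below is an assumption, and any tool that accepts these placeholders works:

```shell
# Hypothetical example: route local transcription through the openai-whisper CLI.
# Hermes substitutes the {input_path}/{model}/{language}/{output_dir} placeholders
# with concrete values before running the command.
export HERMES_LOCAL_STT_COMMAND='whisper {input_path} --model {model} --language {language} --output_dir {output_dir}'
```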
### Fallback Behavior
If your configured provider isn't available, Hermes automatically falls back:
- **Local faster-whisper unavailable** → Tries a local `whisper` CLI or `HERMES_LOCAL_STT_COMMAND` before cloud providers
- **Groq key not set** → Falls back to local transcription, then OpenAI
- **OpenAI key not set** → Falls back to local transcription, then Groq
- **Mistral key/SDK not set** → Skipped in auto-detect; falls through to next available provider
- **Nothing available** → Voice messages pass through unchanged, with a note telling the user that transcription is unavailable