mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-01 08:21:50 +08:00
Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since #11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610 #10283 #10246 #11564 #13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (#16052 #16539 #16566 #15841 #14798 #10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227 #14166 #14730 #17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934 #14810 #14045 #17286 #17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (#16506 #15027 #13428 #12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (#12929 #12972 #10763 #16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231 #14232 #14307 #13683 #12373 #11891 #11291 #10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (#15045 #14473 #15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
512 lines
18 KiB
Markdown
512 lines
18 KiB
Markdown
---
|
||
sidebar_position: 10
|
||
title: "Voice Mode"
|
||
description: "Real-time voice conversations with Hermes Agent — CLI, Telegram, Discord (DMs, text channels, and voice channels)"
|
||
---
|
||
|
||
# Voice Mode
|
||
|
||
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
|
||
|
||
If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
|
||
|
||
## Prerequisites
|
||
|
||
Before using voice features, make sure you have:
|
||
|
||
1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
|
||
2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
|
||
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
|
||
|
||
:::tip
|
||
The `~/.hermes/` directory and default `config.yaml` are created automatically the first time you run `hermes`. You only need to create `~/.hermes/.env` manually for API keys.
|
||
:::
|
||
|
||
## Overview
|
||
|
||
| Feature | Platform | Description |
|
||
|---------|----------|-------------|
|
||
| **Interactive Voice** | CLI | Press Ctrl+B to record, agent auto-detects silence and responds |
|
||
| **Auto Voice Reply** | Telegram, Discord | Agent sends spoken audio alongside text responses |
|
||
| **Voice Channel** | Discord | Bot joins VC, listens to users speaking, speaks replies back |
|
||
|
||
## Requirements
|
||
|
||
### Python Packages
|
||
|
||
```bash
|
||
# CLI voice mode (microphone + audio playback)
|
||
pip install "hermes-agent[voice]"
|
||
|
||
# Discord + Telegram messaging (includes discord.py[voice] for VC support)
|
||
pip install "hermes-agent[messaging]"
|
||
|
||
# Premium TTS (ElevenLabs)
|
||
pip install "hermes-agent[tts-premium]"
|
||
|
||
# Local TTS (NeuTTS, optional)
|
||
python -m pip install -U neutts[all]
|
||
|
||
# Everything at once
|
||
pip install "hermes-agent[all]"
|
||
```
|
||
|
||
| Extra | Packages | Required For |
|
||
|-------|----------|-------------|
|
||
| `voice` | `sounddevice`, `numpy` | CLI voice mode |
|
||
| `messaging` | `discord.py[voice]`, `python-telegram-bot`, `aiohttp` | Discord & Telegram bots |
|
||
| `tts-premium` | `elevenlabs` | ElevenLabs TTS provider |
|
||
|
||
Optional local TTS provider: install `neutts` separately with `python -m pip install -U neutts[all]`. On first use it downloads the model automatically.
|
||
|
||
:::info
|
||
`discord.py[voice]` installs **PyNaCl** (for voice encryption) and **opus bindings** automatically. This is required for Discord voice channel support.
|
||
:::
|
||
|
||
### System Dependencies
|
||
|
||
```bash
|
||
# macOS
|
||
brew install portaudio ffmpeg opus
|
||
brew install espeak-ng # for NeuTTS
|
||
|
||
# Ubuntu/Debian
|
||
sudo apt install portaudio19-dev ffmpeg libopus0
|
||
sudo apt install espeak-ng # for NeuTTS
|
||
```
|
||
|
||
| Dependency | Purpose | Required For |
|
||
|-----------|---------|-------------|
|
||
| **PortAudio** | Microphone input and audio playback | CLI voice mode |
|
||
| **ffmpeg** | Audio format conversion (MP3 → Opus, PCM → WAV) | All platforms |
|
||
| **Opus** | Discord voice codec | Discord voice channels |
|
||
| **espeak-ng** | Phonemizer backend | Local NeuTTS provider |
|
||
|
||
### API Keys
|
||
|
||
Add to `~/.hermes/.env`:
|
||
|
||
```bash
|
||
# Speech-to-Text — local provider needs NO key at all
|
||
# pip install faster-whisper # Free, runs locally, recommended
|
||
GROQ_API_KEY=your-key # Groq Whisper — fast, free tier (cloud)
|
||
VOICE_TOOLS_OPENAI_KEY=your-key # OpenAI Whisper — paid (cloud)
|
||
|
||
# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
|
||
ELEVENLABS_API_KEY=*** # ElevenLabs — premium quality
|
||
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
|
||
```
|
||
|
||
:::tip
|
||
If `faster-whisper` is installed, voice mode works with **zero API keys** for STT. The model (~150 MB for `base`) downloads automatically on first use.
|
||
:::
|
||
|
||
---
|
||
|
||
## CLI Voice Mode
|
||
|
||
Voice mode is available in both the **classic CLI** (`hermes chat`) and the **TUI** (`hermes --tui`). Behavior is identical across both — same slash commands, same VAD silence detection, same streaming TTS, same hallucination filter. The TUI additionally forwards crash-forensic logs to `~/.hermes/logs/` so push-to-talk failures on exotic audio backends can be reported with a full stack trace rather than disappearing silently.
|
||
|
||
### Quick Start
|
||
|
||
Start the CLI and enable voice mode:
|
||
|
||
```bash
|
||
hermes # Start the interactive CLI
|
||
```
|
||
|
||
Then use these commands inside the CLI:
|
||
|
||
```
|
||
/voice Toggle voice mode on/off
|
||
/voice on Enable voice mode
|
||
/voice off Disable voice mode
|
||
/voice tts Toggle TTS output
|
||
/voice status Show current state
|
||
```
|
||
|
||
### How It Works
|
||
|
||
1. Start the CLI with `hermes` and enable voice mode with `/voice on`
|
||
2. **Press Ctrl+B** — a beep plays (880Hz), recording starts
|
||
3. **Speak** — a live audio level bar shows your input: `● [▁▂▃▅▇▇▅▂] ❯`
|
||
4. **Stop speaking** — after 3 seconds of silence, recording auto-stops
|
||
5. **Two beeps** play (660Hz) confirming the recording ended
|
||
6. Audio is transcribed via Whisper and sent to the agent
|
||
7. If TTS is enabled, the agent's reply is spoken aloud
|
||
8. Recording **automatically restarts** — speak again without pressing any key
|
||
|
||
This loop continues until you press **Ctrl+B** during recording (exits continuous mode) or 3 consecutive recordings detect no speech.
|
||
|
||
:::tip
|
||
The record key is configurable via `voice.record_key` in `~/.hermes/config.yaml` (default: `ctrl+b`).
|
||
:::
|
||
|
||
### Silence Detection
|
||
|
||
Two-stage algorithm detects when you've finished speaking:
|
||
|
||
1. **Speech confirmation** — waits for audio above the RMS threshold (200) for at least 0.3s, tolerating brief dips between syllables
|
||
2. **End detection** — once speech is confirmed, triggers after 3.0 seconds of continuous silence
|
||
|
||
If no speech is detected at all for 15 seconds, recording stops automatically.
|
||
|
||
Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`. You can also disable the record start/stop beeps with `voice.beep_enabled: false`.
|
||
|
||
### Streaming TTS
|
||
|
||
When TTS is enabled, the agent speaks its reply **sentence-by-sentence** as it generates text — you don't wait for the full response:
|
||
|
||
1. Buffers text deltas into complete sentences (min 20 chars)
|
||
2. Strips markdown formatting and `<think>` blocks
|
||
3. Generates and plays audio per sentence in real-time
|
||
|
||
### Hallucination Filter
|
||
|
||
Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
|
||
|
||
---
|
||
|
||
## Gateway Voice Reply (Telegram & Discord)
|
||
|
||
If you haven't set up your messaging bots yet, see the platform-specific guides:
|
||
- [Telegram Setup Guide](../messaging/telegram.md)
|
||
- [Discord Setup Guide](../messaging/discord.md)
|
||
|
||
Start the gateway to connect to your messaging platforms:
|
||
|
||
```bash
|
||
hermes gateway # Start the gateway (connects to configured platforms)
|
||
hermes gateway setup # Interactive setup wizard for first-time configuration
|
||
```
|
||
|
||
### Discord: Channels vs DMs
|
||
|
||
The bot supports two interaction modes on Discord:
|
||
|
||
| Mode | How to Talk | Mention Required | Setup |
|
||
|------|------------|-----------------|-------|
|
||
| **Direct Message (DM)** | Open the bot's profile → "Message" | No | Works immediately |
|
||
| **Server Channel** | Type in a text channel where the bot is present | Yes (`@botname`) | Bot must be invited to the server |
|
||
|
||
**DM (recommended for personal use):** Just open a DM with the bot and type — no @mention needed. Voice replies and all commands work the same as in channels.
|
||
|
||
**Server channels:** The bot only responds when you @mention it (e.g. `@hermesbyt4 hello`). Make sure you select the **bot user** from the mention popup, not the role with the same name.
|
||
|
||
:::tip
|
||
To disable the mention requirement in server channels, add to `~/.hermes/.env`:
|
||
```bash
|
||
DISCORD_REQUIRE_MENTION=false
|
||
```
|
||
Or set specific channels as free-response (no mention needed):
|
||
```bash
|
||
DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321
|
||
```
|
||
:::
|
||
|
||
### Commands
|
||
|
||
These work in both Telegram and Discord (DMs and text channels):
|
||
|
||
```
|
||
/voice Toggle voice mode on/off
|
||
/voice on Voice replies only when you send a voice message
|
||
/voice tts Voice replies for ALL messages
|
||
/voice off Disable voice replies
|
||
/voice status Show current setting
|
||
```
|
||
|
||
### Modes
|
||
|
||
| Mode | Command | Behavior |
|
||
|------|---------|----------|
|
||
| `off` | `/voice off` | Text only (default) |
|
||
| `voice_only` | `/voice on` | Speaks reply only when you send a voice message |
|
||
| `all` | `/voice tts` | Speaks reply to every message |
|
||
|
||
Voice mode setting is persisted across gateway restarts.
|
||
|
||
### Platform Delivery
|
||
|
||
| Platform | Format | Notes |
|
||
|----------|--------|-------|
|
||
| **Telegram** | Voice bubble (Opus/OGG) | Plays inline in chat. ffmpeg converts MP3 → Opus if needed |
|
||
| **Discord** | Native voice bubble (Opus/OGG) | Plays inline like a user voice message. Falls back to file attachment if voice bubble API fails |
|
||
|
||
---
|
||
|
||
## Discord Voice Channels
|
||
|
||
The most immersive voice feature: the bot joins a Discord voice channel, listens to users speaking, transcribes their speech, processes through the agent, and speaks the reply back in the voice channel.
|
||
|
||
### Setup
|
||
|
||
#### 1. Discord Bot Permissions
|
||
|
||
If you already have a Discord bot set up for text (see [Discord Setup Guide](../messaging/discord.md)), you need to add voice permissions.
|
||
|
||
Go to the [Discord Developer Portal](https://discord.com/developers/applications) → your application → **Installation** → **Default Install Settings** → **Guild Install**:
|
||
|
||
**Add these permissions to the existing text permissions:**
|
||
|
||
| Permission | Purpose | Required |
|
||
|-----------|---------|----------|
|
||
| **Connect** | Join voice channels | Yes |
|
||
| **Speak** | Play TTS audio in voice channels | Yes |
|
||
| **Use Voice Activity** | Detect when users are speaking | Recommended |
|
||
|
||
**Updated Permissions Integer:**
|
||
|
||
| Level | Integer | What's Included |
|
||
|-------|---------|----------------|
|
||
| Text only | `274878286912` | View Channels, Send Messages, Read History, Embeds, Attachments, Threads, Reactions |
|
||
| Text + Voice | `274881432640` | All above + Connect, Speak |
|
||
|
||
**Re-invite the bot** with the updated permissions URL:
|
||
|
||
```
|
||
https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640
|
||
```
|
||
|
||
Replace `YOUR_APP_ID` with your Application ID from the Developer Portal.
|
||
|
||
:::warning
|
||
Re-inviting the bot to a server it's already in will update its permissions without removing it. You won't lose any data or configuration.
|
||
:::
|
||
|
||
#### 2. Privileged Gateway Intents
|
||
|
||
In the [Developer Portal](https://discord.com/developers/applications) → your application → **Bot** → **Privileged Gateway Intents**, enable all three:
|
||
|
||
| Intent | Purpose |
|
||
|--------|---------|
|
||
| **Presence Intent** | Detect user online/offline status |
|
||
| **Server Members Intent** | Map voice SSRC identifiers to Discord user IDs |
|
||
| **Message Content Intent** | Read text message content in channels |
|
||
|
||
All three are required for full voice channel functionality. **Server Members Intent** is especially critical — without it, the bot cannot identify who is speaking in the voice channel.
|
||
|
||
#### 3. Opus Codec
|
||
|
||
The Opus codec library must be installed on the machine running the gateway:
|
||
|
||
```bash
|
||
# macOS (Homebrew)
|
||
brew install opus
|
||
|
||
# Ubuntu/Debian
|
||
sudo apt install libopus0
|
||
```
|
||
|
||
The bot auto-loads the codec from:
|
||
- **macOS:** `/opt/homebrew/lib/libopus.dylib`
|
||
- **Linux:** `libopus.so.0`
|
||
|
||
#### 4. Environment Variables
|
||
|
||
```bash
|
||
# ~/.hermes/.env
|
||
|
||
# Discord bot (already configured for text)
|
||
DISCORD_BOT_TOKEN=your-bot-token
|
||
DISCORD_ALLOWED_USERS=your-user-id
|
||
|
||
# STT — local provider needs no key (pip install faster-whisper)
|
||
# GROQ_API_KEY=your-key # Alternative: cloud-based, fast, free tier
|
||
|
||
# TTS — optional. Edge TTS and NeuTTS need no key.
|
||
# ELEVENLABS_API_KEY=*** # Premium quality
|
||
# VOICE_TOOLS_OPENAI_KEY=*** # OpenAI TTS / Whisper
|
||
```
|
||
|
||
### Start the Gateway
|
||
|
||
```bash
|
||
hermes gateway # Start with existing configuration
|
||
```
|
||
|
||
The bot should come online in Discord within a few seconds.
|
||
|
||
### Commands
|
||
|
||
Use these in the Discord text channel where the bot is present:
|
||
|
||
```
|
||
/voice join Bot joins your current voice channel
|
||
/voice channel Alias for /voice join
|
||
/voice leave Bot disconnects from voice channel
|
||
/voice status Show voice mode and connected channel
|
||
```
|
||
|
||
:::info
|
||
You must be in a voice channel before running `/voice join`. The bot joins the same VC you're in.
|
||
:::
|
||
|
||
### How It Works
|
||
|
||
When the bot joins a voice channel, it:
|
||
|
||
1. **Listens** to each user's audio stream independently
|
||
2. **Detects silence** — 1.5s of silence after at least 0.5s of speech triggers processing
|
||
3. **Transcribes** the audio via Whisper STT (local, Groq, or OpenAI)
|
||
4. **Processes** through the full agent pipeline (session, tools, memory)
|
||
5. **Speaks** the reply back in the voice channel via TTS
|
||
|
||
### Text Channel Integration
|
||
|
||
When the bot is in a voice channel:
|
||
|
||
- Transcripts appear in the text channel: `[Voice] @user: what you said`
|
||
- Agent responses are sent as text in the channel AND spoken in the VC
|
||
- The text channel is the one where `/voice join` was issued
|
||
|
||
### Echo Prevention
|
||
|
||
The bot automatically pauses its audio listener while playing TTS replies, preventing it from hearing and re-processing its own output.
|
||
|
||
### Access Control
|
||
|
||
Only users listed in `DISCORD_ALLOWED_USERS` can interact via voice. Other users' audio is silently ignored.
|
||
|
||
```bash
|
||
# ~/.hermes/.env
|
||
DISCORD_ALLOWED_USERS=284102345871466496
|
||
```
|
||
|
||
---
|
||
|
||
## Configuration Reference
|
||
|
||
### config.yaml
|
||
|
||
```yaml
|
||
# Voice recording (CLI)
|
||
voice:
|
||
record_key: "ctrl+b" # Key to start/stop recording
|
||
max_recording_seconds: 120 # Maximum recording length
|
||
auto_tts: false # Auto-enable TTS when voice mode starts
|
||
beep_enabled: true # Play record start/stop beeps
|
||
silence_threshold: 200 # RMS level (0-32767) below which counts as silence
|
||
silence_duration: 3.0 # Seconds of silence before auto-stop
|
||
|
||
# Speech-to-Text
|
||
stt:
|
||
provider: "local" # "local" (free) | "groq" | "openai"
|
||
local:
|
||
model: "base" # tiny, base, small, medium, large-v3
|
||
# model: "whisper-1" # Legacy: used when provider is not set
|
||
|
||
# Text-to-Speech
|
||
tts:
|
||
provider: "edge" # "edge" (free) | "elevenlabs" | "openai" | "neutts" | "minimax"
|
||
edge:
|
||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||
elevenlabs:
|
||
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
|
||
model_id: "eleven_multilingual_v2"
|
||
openai:
|
||
model: "gpt-4o-mini-tts"
|
||
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
|
||
base_url: "https://api.openai.com/v1" # optional: override for self-hosted or OpenAI-compatible endpoints
|
||
neutts:
|
||
ref_audio: ''
|
||
ref_text: ''
|
||
model: neuphonic/neutts-air-q4-gguf
|
||
device: cpu
|
||
```
|
||
|
||
### Environment Variables
|
||
|
||
```bash
|
||
# Speech-to-Text providers (local needs no key)
|
||
# pip install faster-whisper # Free local STT — no API key needed
|
||
GROQ_API_KEY=... # Groq Whisper (fast, free tier)
|
||
VOICE_TOOLS_OPENAI_KEY=... # OpenAI Whisper (paid)
|
||
|
||
# STT advanced overrides (optional)
|
||
STT_GROQ_MODEL=whisper-large-v3-turbo # Override default Groq STT model
|
||
STT_OPENAI_MODEL=whisper-1 # Override default OpenAI STT model
|
||
GROQ_BASE_URL=https://api.groq.com/openai/v1 # Custom Groq endpoint
|
||
STT_OPENAI_BASE_URL=https://api.openai.com/v1 # Custom OpenAI STT endpoint
|
||
|
||
# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
|
||
ELEVENLABS_API_KEY=*** # ElevenLabs (premium quality)
|
||
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
|
||
|
||
# Discord voice channel
|
||
DISCORD_BOT_TOKEN=...
|
||
DISCORD_ALLOWED_USERS=...
|
||
```
|
||
|
||
### STT Provider Comparison
|
||
|
||
| Provider | Model | Speed | Quality | Cost | API Key |
|
||
|----------|-------|-------|---------|------|---------|
|
||
| **Local** | `base` | Fast (depends on CPU/GPU) | Good | Free | No |
|
||
| **Local** | `small` | Medium | Better | Free | No |
|
||
| **Local** | `large-v3` | Slow | Best | Free | No |
|
||
| **Groq** | `whisper-large-v3-turbo` | Very fast (~0.5s) | Good | Free tier | Yes |
|
||
| **Groq** | `whisper-large-v3` | Fast (~1s) | Better | Free tier | Yes |
|
||
| **OpenAI** | `whisper-1` | Fast (~1s) | Good | Paid | Yes |
|
||
| **OpenAI** | `gpt-4o-transcribe` | Medium (~2s) | Best | Paid | Yes |
|
||
|
||
Provider priority (automatic fallback): **local** > **groq** > **openai**
|
||
|
||
### TTS Provider Comparison
|
||
|
||
| Provider | Quality | Cost | Latency | Key Required |
|
||
|----------|---------|------|---------|-------------|
|
||
| **Edge TTS** | Good | Free | ~1s | No |
|
||
| **ElevenLabs** | Excellent | Paid | ~2s | Yes |
|
||
| **OpenAI TTS** | Good | Paid | ~1.5s | Yes |
|
||
| **NeuTTS** | Good | Free | Depends on CPU/GPU | No |
|
||
|
||
NeuTTS uses the `tts.neutts` config block above.
|
||
|
||
---
|
||
|
||
## Troubleshooting
|
||
|
||
### "No audio device found" (CLI)
|
||
|
||
PortAudio is not installed:
|
||
|
||
```bash
|
||
brew install portaudio # macOS
|
||
sudo apt install portaudio19-dev # Ubuntu
|
||
```
|
||
|
||
### Bot doesn't respond in Discord server channels
|
||
|
||
The bot requires an @mention by default in server channels. Make sure you:
|
||
|
||
1. Type `@` and select the **bot user** (with the #discriminator), not the **role** with the same name
|
||
2. Or use DMs instead — no mention needed
|
||
3. Or set `DISCORD_REQUIRE_MENTION=false` in `~/.hermes/.env`
|
||
|
||
### Bot joins VC but doesn't hear me
|
||
|
||
- Check your Discord user ID is in `DISCORD_ALLOWED_USERS`
|
||
- Make sure you're not muted in Discord
|
||
- The bot needs a SPEAKING event from Discord before it can map your audio — start speaking within a few seconds of joining
|
||
|
||
### Bot hears me but doesn't respond
|
||
|
||
- Verify STT is available: install `faster-whisper` (no key needed) or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY`
|
||
- Check the LLM model is configured and accessible
|
||
- Review gateway logs: `tail -f ~/.hermes/logs/gateway.log`
|
||
|
||
### Bot responds in text but not in voice channel
|
||
|
||
- TTS provider may be failing — check API key and quota
|
||
- Edge TTS (free, no key) is the default fallback
|
||
- Check logs for TTS errors
|
||
|
||
### Whisper returns garbage text
|
||
|
||
The hallucination filter catches most cases automatically. If you're still getting phantom transcripts:
|
||
|
||
- Use a quieter environment
|
||
- Adjust `silence_threshold` in config (higher = less sensitive)
|
||
- Try a different STT model
|