Compare commits

1 Commit

Author SHA1 Message Date
emozilla
984e6cb5b8 feat(whatsapp): add WhatsApp Business Cloud API adapter
Add an official, production-grade WhatsApp integration via Meta's
Business Cloud API as a complement to the existing Baileys bridge.
No bridge subprocess, no QR codes, no account-ban risk — at the cost
of a Meta Business account and a public HTTPS webhook URL.

Setup is fully wizard-driven: 'hermes whatsapp-cloud' walks through
every credential with paste-time validation (catches the #1 trap of
pasting a phone number into the Phone Number ID field), generates a
verify token, and ends with copy-paste instructions for the
cloudflared / Meta-dashboard / Business Manager pieces that can't be
automated. The wizard also points users at Meta's Business Manager
for setting the bot's display name and profile picture.

Feature set:

- Inbound: text, images (with native-vision routing), voice notes
  (STT), documents (small text inlined, larger cached), reply context.
- Outbound: text with WhatsApp-flavored markdown conversion, images,
  videos, documents, opus voice notes via ffmpeg with MP3 fallback.
- Native interactive buttons for clarify, dangerous-command approval,
  and slash-command confirmation flows — matches the Telegram /
  Discord UX and degrades gracefully to plain text.
- Read receipts (blue double-checkmarks) and typing indicator,
  using Meta's combined endpoint so they fire in a single API call.
- Webhook security: X-Hub-Signature-256 HMAC verification (raw body,
  constant-time), wamid deduplication, group-shaped-message refusal
  (groups deferred to v2 — Baileys still covers them).
- Full integration with the gateway's session, cron, display-tier,
  prompt-hint, and auth-allowlist systems. Cloud and Baileys can run
  side-by-side against different phone numbers.
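
The webhook-security bullet above names X-Hub-Signature-256 verification over the raw body with a constant-time comparison. The adapter's own implementation is not visible in this diff view, so the following is only a minimal sketch of that check; `verify_signature` and its parameter names are hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body.

    Hypothetical helper: the name and signature are illustrative only.
    """
    prefix = "sha256="
    if not header_value or not header_value.startswith(prefix):
        return False
    # The digest must be computed over the exact bytes received, not over
    # re-serialized JSON, because any whitespace change alters the HMAC.
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking where the two values first differ.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

A request that fails this check would normally be rejected before any JSON parsing or wamid deduplication takes place.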

Also wires STT (speech-to-text) through Nous's managed audio gateway
for Nous subscribers — previously the default stt.provider=local
required a separate faster-whisper install. New subscribers now get
voice-note transcription out of the box.

Docs: 418-line user guide at website/docs/user-guide/messaging/
whatsapp-cloud.md, sidebar entry, environment-variables reference,
ADDING_A_PLATFORM.md updated with the optional interactive-UX
contract for future adapter authors.

Tests: 100 dedicated tests for the adapter, 32 for the setup wizard,
20 for the Nous subscription STT wiring, plus regression coverage
across display_config, prompt_builder, and the cron scheduler.

Known limitations (deferred until clear demand signal):
- Group chats — use the Baileys bridge if you need them.
- Message templates for sends outside the 24-hour conversation window:
  reactive chat is unaffected, but cron / delegate_task sends that come
  more than 24h after the user's last message will fail with a clear
  error. The agent's system prompt warns the model about this so it can
  mention it when scheduling delayed messages.
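
The 24-hour rule in the last limitation can be pictured as a simple timestamp comparison. This is a sketch under assumptions: how the adapter actually tracks the last inbound message is not shown here, and `can_send_free_form` and `last_inbound_at` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Free-form replies are only accepted within 24h of the user's last message.
CONVERSATION_WINDOW = timedelta(hours=24)


def can_send_free_form(
    last_inbound_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a non-template message should still be accepted."""
    if last_inbound_at is None:
        # The user has never written to us, so no window is open.
        return False
    now = now or datetime.now(timezone.utc)
    return now - last_inbound_at < CONVERSATION_WINDOW
```

A scheduler could run a check like this before a delayed send and report the problem up front, rather than waiting for Meta to refuse the message.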
2026-05-23 01:07:01 -04:00
26 changed files with 6368 additions and 287 deletions

@@ -428,6 +428,23 @@ PLATFORM_HINTS = {
"files arrive as downloadable documents. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as photos."
),
"whatsapp_cloud": (
"You are on a text messaging communication platform, WhatsApp "
"(via Meta's official Business Cloud API). Standard markdown "
"(**bold**, ~~strike~~, # headers, [links](url)) is auto-converted "
"to WhatsApp's native syntax (*bold*, ~strike~, etc.) — feel free "
"to write in markdown. Tables are NOT supported — prefer bullet "
"lists or labeled key:value pairs. "
"You can send media files natively: include MEDIA:/absolute/path/to/file "
"in your response. Images (.jpg, .png) become photo attachments, "
"videos (.mp4) play inline, audio (.mp3, .ogg) sends as voice/audio "
"messages, other files arrive as documents. Image URLs in markdown "
"format ![alt](url) also work. "
"IMPORTANT: this platform has a 24-hour conversation window — if the "
"user hasn't messaged in 24h, free-form replies are refused by Meta "
"(error 131047). This rarely matters for live chat, but is worth "
"knowing if you're scheduling a delayed message."
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Standard markdown is automatically converted to Telegram format. "
@@ -1279,13 +1296,13 @@ def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -
lines = [
"# Nous Subscription",
"Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, and browser automation (Browser Use) by default. Modal execution is optional.",
"Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, OpenAI Whisper STT, and browser automation (Browser Use) by default. Modal execution is optional.",
"Current capability status:",
]
lines.extend(_status_line(feature) for feature in features.items())
lines.extend(
[
"When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, or Browser-Use API keys.",
"When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, OpenAI Whisper, or Browser-Use API keys.",
"If the user is not subscribed and asks for a capability that Nous subscription would unlock or simplify, suggest Nous subscription as one option alongside direct setup or local alternatives.",
"Do not mention subscription unless the user asks about it or it directly solves the current missing capability.",
"Useful commands: hermes setup, hermes setup tools, hermes setup terminal, hermes status.",

@@ -114,6 +114,7 @@ _HOME_TARGET_ENV_VARS = {
"bluebubbles": "BLUEBUBBLES_HOME_CHANNEL",
"qqbot": "QQBOT_HOME_CHANNEL",
"whatsapp": "WHATSAPP_HOME_CHANNEL",
"whatsapp_cloud": "WHATSAPP_CLOUD_HOME_CHANNEL",
}
# Legacy env var names kept for back-compat. Each entry is the current

@@ -109,6 +109,7 @@ class Platform(Enum):
TELEGRAM = "telegram"
DISCORD = "discord"
WHATSAPP = "whatsapp"
WHATSAPP_CLOUD = "whatsapp_cloud"
SLACK = "slack"
SIGNAL = "signal"
MATTERMOST = "mattermost"
@@ -419,6 +420,9 @@ _PLATFORM_CONNECTED_CHECKERS: dict[Platform, Callable[[PlatformConfig], bool]] =
cfg.extra.get("account_id") and (cfg.token or cfg.extra.get("token"))
),
Platform.WHATSAPP: lambda cfg: True, # bridge handles auth
Platform.WHATSAPP_CLOUD: lambda cfg: bool(
cfg.extra.get("phone_number_id") and cfg.extra.get("access_token")
),
Platform.SIGNAL: lambda cfg: bool(cfg.extra.get("http_url")),
Platform.EMAIL: lambda cfg: bool(cfg.extra.get("address")),
Platform.SMS: lambda cfg: bool(os.getenv("TWILIO_ACCOUNT_SID")),
@@ -1367,6 +1371,61 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
thread_id=os.getenv("WHATSAPP_HOME_CHANNEL_THREAD_ID") or None,
)
# WhatsApp Cloud API (official Business Platform via Meta).
# Distinct from the Baileys bridge: pure HTTP graph.facebook.com calls
# outbound, public webhook inbound. Both adapters can run in parallel
# against different phone numbers.
whatsapp_cloud_phone_id = os.getenv("WHATSAPP_CLOUD_PHONE_NUMBER_ID")
whatsapp_cloud_token = os.getenv("WHATSAPP_CLOUD_ACCESS_TOKEN")
if whatsapp_cloud_phone_id and whatsapp_cloud_token:
if Platform.WHATSAPP_CLOUD not in config.platforms:
config.platforms[Platform.WHATSAPP_CLOUD] = PlatformConfig()
config.platforms[Platform.WHATSAPP_CLOUD].enabled = True
config.platforms[Platform.WHATSAPP_CLOUD].extra.update({
"phone_number_id": whatsapp_cloud_phone_id,
"access_token": whatsapp_cloud_token,
})
# Optional: app_id / app_secret (signature verification)
wa_cloud_app_id = os.getenv("WHATSAPP_CLOUD_APP_ID")
if wa_cloud_app_id:
config.platforms[Platform.WHATSAPP_CLOUD].extra["app_id"] = wa_cloud_app_id
wa_cloud_app_secret = os.getenv("WHATSAPP_CLOUD_APP_SECRET")
if wa_cloud_app_secret:
config.platforms[Platform.WHATSAPP_CLOUD].extra["app_secret"] = wa_cloud_app_secret
# Optional: WABA id (analytics, future use)
wa_cloud_waba_id = os.getenv("WHATSAPP_CLOUD_WABA_ID")
if wa_cloud_waba_id:
config.platforms[Platform.WHATSAPP_CLOUD].extra["waba_id"] = wa_cloud_waba_id
# Webhook verify token — Meta hub.verify_token shared secret
wa_cloud_verify_token = os.getenv("WHATSAPP_CLOUD_VERIFY_TOKEN")
if wa_cloud_verify_token:
config.platforms[Platform.WHATSAPP_CLOUD].extra["verify_token"] = wa_cloud_verify_token
# Webhook server bind config (defaults baked into the adapter)
wa_cloud_host = os.getenv("WHATSAPP_CLOUD_WEBHOOK_HOST")
if wa_cloud_host:
config.platforms[Platform.WHATSAPP_CLOUD].extra["webhook_host"] = wa_cloud_host
wa_cloud_port = os.getenv("WHATSAPP_CLOUD_WEBHOOK_PORT")
if wa_cloud_port:
try:
config.platforms[Platform.WHATSAPP_CLOUD].extra["webhook_port"] = int(wa_cloud_port)
except ValueError:
pass
wa_cloud_path = os.getenv("WHATSAPP_CLOUD_WEBHOOK_PATH")
if wa_cloud_path:
config.platforms[Platform.WHATSAPP_CLOUD].extra["webhook_path"] = wa_cloud_path
# Graph API version override (rarely needed)
wa_cloud_api_version = os.getenv("WHATSAPP_CLOUD_API_VERSION")
if wa_cloud_api_version:
config.platforms[Platform.WHATSAPP_CLOUD].extra["api_version"] = wa_cloud_api_version
whatsapp_cloud_home = os.getenv("WHATSAPP_CLOUD_HOME_CHANNEL")
if whatsapp_cloud_home and Platform.WHATSAPP_CLOUD in config.platforms:
config.platforms[Platform.WHATSAPP_CLOUD].home_channel = HomeChannel(
platform=Platform.WHATSAPP_CLOUD,
chat_id=whatsapp_cloud_home,
name=os.getenv("WHATSAPP_CLOUD_HOME_CHANNEL_NAME", "Home"),
thread_id=os.getenv("WHATSAPP_CLOUD_HOME_CHANNEL_THREAD_ID") or None,
)
# Slack
slack_token = os.getenv("SLACK_BOT_TOKEN")
if slack_token:
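
The WhatsApp Cloud block in the hunk above can be condensed into a small standalone function to make its rules easier to see: both required variables must be set before the platform is enabled, optional values are copied only when present, and a non-numeric port is silently ignored. This is a sketch, not the gateway code. `PlatformConfigSketch` and `cloud_config_from_env` are hypothetical stand-ins, and it assumes the optional variables are read only when both required ones are set (the indentation is lost in this view).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass
class PlatformConfigSketch:
    """Hypothetical stand-in for gateway.config.PlatformConfig."""
    enabled: bool = False
    extra: Dict[str, Any] = field(default_factory=dict)


# Optional env vars and the `extra` keys they map to, as listed in the diff.
_OPTIONAL_KEYS = {
    "WHATSAPP_CLOUD_APP_ID": "app_id",
    "WHATSAPP_CLOUD_APP_SECRET": "app_secret",
    "WHATSAPP_CLOUD_WABA_ID": "waba_id",
    "WHATSAPP_CLOUD_VERIFY_TOKEN": "verify_token",
    "WHATSAPP_CLOUD_WEBHOOK_HOST": "webhook_host",
    "WHATSAPP_CLOUD_WEBHOOK_PATH": "webhook_path",
    "WHATSAPP_CLOUD_API_VERSION": "api_version",
}


def cloud_config_from_env(env: Mapping[str, str]) -> Optional[PlatformConfigSketch]:
    phone_id = env.get("WHATSAPP_CLOUD_PHONE_NUMBER_ID")
    token = env.get("WHATSAPP_CLOUD_ACCESS_TOKEN")
    if not (phone_id and token):
        return None  # both credentials are required to enable the platform
    cfg = PlatformConfigSketch(enabled=True)
    cfg.extra.update({"phone_number_id": phone_id, "access_token": token})
    for var, key in _OPTIONAL_KEYS.items():
        if env.get(var):
            cfg.extra[key] = env[var]
    port = env.get("WHATSAPP_CLOUD_WEBHOOK_PORT")
    if port:
        try:
            cfg.extra["webhook_port"] = int(port)
        except ValueError:
            pass  # mirrors the diff: a bad port falls back to the adapter default
    return cfg
```

Passing the environment in as a mapping, instead of reading `os.environ` directly, is a choice made for this sketch so the behavior can be exercised without mutating the process environment.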

@@ -95,6 +95,12 @@ _PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
# Tier 3 — no edit support, progress messages are permanent
"signal": _TIER_LOW,
"whatsapp": _TIER_MEDIUM, # Baileys bridge supports /edit
# WhatsApp Cloud API: Meta added message editing in 2023 but the
# Hermes Cloud adapter doesn't implement edit_message yet, so we
# stay on TIER_LOW (tool_progress off) to avoid spamming each
# status update as a separate message. Promote to TIER_MEDIUM once
# Cloud's edit_message lands.
"whatsapp_cloud": _TIER_LOW,
"bluebubbles": _TIER_LOW,
"weixin": _TIER_LOW,
"wecom": _TIER_LOW,

@@ -52,6 +52,22 @@ for the full pattern (Template Buttons postback at 45s, `RequestCache`
state machine, `interrupt_session_activity` override for `/stop`
orphans) and the developer-guide page for the prose walkthrough.
**Sibling adapters that share behavior.** When a single platform has
two transport modes the user picks between — unofficial vs official
APIs, polling vs websocket, library A vs library B — the right
structure is two adapters that share a behavior mixin. WhatsApp does
this: `gateway/platforms/whatsapp.py` (Baileys bridge) and
`gateway/platforms/whatsapp_cloud.py` (Meta Cloud API) both inherit
from `WhatsAppBehaviorMixin` in `gateway/platforms/whatsapp_common.py`.
The mixin owns gating, allow-lists, mention parsing, broadcast
filters, and the WhatsApp-flavored markdown conversion — everything
that's platform-protocol-agnostic. Each adapter owns its transport.
Both register distinct `Platform.*` enum values so the gateway can run
both simultaneously against different phone numbers. The mixin must
come **first** in the bases list — `class WhatsAppAdapter(Mixin,
BasePlatformAdapter)` — so the mixin's `format_message` overrides
`BasePlatformAdapter`'s generic default.
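
The ordering requirement follows from Python's method resolution order: attribute lookup walks the bases left to right, so whichever class is listed first wins when both define `format_message`. The toy classes below are stand-ins invented for illustration, not the real Hermes classes.

```python
class BasePlatformAdapterSketch:
    """Stand-in for BasePlatformAdapter with a pass-through default."""

    def format_message(self, content: str) -> str:
        return content


class BehaviorMixinSketch:
    """Stand-in for WhatsAppBehaviorMixin with a toy conversion."""

    def format_message(self, content: str) -> str:
        return content.replace("**", "*")


class MixinFirst(BehaviorMixinSketch, BasePlatformAdapterSketch):
    """Correct order: the mixin's format_message is found first."""


class MixinLast(BasePlatformAdapterSketch, BehaviorMixinSketch):
    """Wrong order: the base default shadows the mixin."""


print(MixinFirst().format_message("**hi**"))  # conversion applied
print(MixinLast().format_message("**hi**"))   # text passes through unchanged
```

Inspecting `MixinFirst.__mro__` is a quick way to confirm the lookup order when debugging a similar adapter.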
See `plugins/platforms/irc/`, `plugins/platforms/teams/`, and
`plugins/platforms/google_chat/` for complete working examples, and
`website/docs/developer-guide/adding-platform-adapters.md` for the full
@@ -94,6 +110,19 @@ The adapter is a subclass of `BasePlatformAdapter` from `gateway/platforms/base.
| `send_animation(chat_id, path, caption)` | Send a GIF/animation |
| `send_image_file(chat_id, path, caption)` | Send image from local file |
### Interactive UX (recommended if your platform supports tappable buttons)
If your platform supports interactive button/menu messages, implement these for a more polished agent experience. They all degrade gracefully to plain text when not overridden:
| Method | Purpose |
|--------|---------|
| `send_clarify(chat_id, question, choices, clarify_id, session_key, ...)` | Render the `clarify` tool's multi-choice question as tappable buttons. Pair with inbound dispatch that routes button taps to `tools.clarify_gateway.resolve_gateway_clarify`. |
| `send_exec_approval(chat_id, command, session_key, description, ...)` | Render dangerous-command approval as Approve/Deny buttons. Inbound dispatch routes to `tools.approval.resolve_gateway_approval`. |
| `send_slash_confirm(chat_id, title, message, session_key, confirm_id, ...)` | Render slash-command confirmations (e.g. `/reload-mcp`) as Once/Always/Cancel buttons. Inbound dispatch routes to `tools.slash_confirm.resolve`. |
| `send_model_picker(...)` | Interactive `/model` picker. Used by Telegram and Discord. |
See `gateway/platforms/telegram.py`, `discord.py`, and `whatsapp_cloud.py` for reference implementations. The button-callback id convention (`cl:<id>:<idx>`, `appr:<id>:<choice>`, `sc:<choice>:<id>`) is shared across adapters — match it so the gateway-side resolvers work without modification.
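
A sketch of how an adapter's inbound dispatch might decode the three id shapes named above before handing off to the resolvers. `ButtonTap` and `parse_callback_id` are hypothetical. The sketch assumes ids and choices contain no colons and that the clarify index is a decimal integer, neither of which is confirmed here.

```python
from typing import NamedTuple, Optional


class ButtonTap(NamedTuple):
    kind: str    # "clarify", "approval", or "slash_confirm"
    ref_id: str  # the clarify / approval / confirm id
    value: str   # the choice index or choice label


def parse_callback_id(raw: str) -> Optional[ButtonTap]:
    """Decode cl:<id>:<idx>, appr:<id>:<choice>, or sc:<choice>:<id>."""
    parts = raw.split(":", 2)
    if len(parts) != 3 or not all(parts):
        return None
    prefix, first, second = parts
    if prefix == "cl":
        # Assumption: the index is a plain non-negative integer.
        return ButtonTap("clarify", first, second) if second.isdigit() else None
    if prefix == "appr":
        return ButtonTap("approval", first, second)
    if prefix == "sc":
        # Note the swapped order: the choice comes before the id here.
        return ButtonTap("slash_confirm", second, first)
    return None
```

Returning `None` for unknown prefixes lets the caller fall through to treating the payload as ordinary text.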
### Required function
```python

@@ -16,11 +16,9 @@ with different backends via a bridge pattern.
"""
import asyncio
import json
import logging
import os
import platform
import re
import shutil
import signal
import subprocess
@@ -180,6 +178,7 @@ import sys
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.platforms.whatsapp_common import WhatsAppBehaviorMixin
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
@@ -215,7 +214,7 @@ def check_whatsapp_requirements() -> bool:
return False
class WhatsAppAdapter(BasePlatformAdapter):
class WhatsAppAdapter(WhatsAppBehaviorMixin, BasePlatformAdapter):
"""
WhatsApp adapter.
@@ -237,13 +236,12 @@ class WhatsAppAdapter(BasePlatformAdapter):
- allow_from: List of sender IDs allowed in DMs (when dm_policy="allowlist")
- group_policy: "open" | "allowlist" | "disabled" — which groups are processed (default: "open")
- group_allow_from: List of group JIDs allowed (when group_policy="allowlist")
Behavior (gating, mention parsing, markdown conversion, chunking) is
provided by ``WhatsAppBehaviorMixin`` so the Cloud API adapter can
share it. Only transport-specific code lives here.
"""
# WhatsApp message limits — practical UX limit, not protocol max.
# WhatsApp allows ~65K but long messages are unreadable on mobile.
MAX_MESSAGE_LENGTH = 4096
DEFAULT_REPLY_PREFIX = "⚕ *Hermes Agent*\n────────────\n"
# Default bridge location relative to the hermes-agent install
_DEFAULT_BRIDGE_DIR = Path(__file__).resolve().parents[2] / "scripts" / "whatsapp-bridge"
@@ -278,213 +276,6 @@ class WhatsAppAdapter(BasePlatformAdapter):
# notification before the normal "✓ whatsapp disconnected" fires.
self._shutting_down: bool = False
def _effective_reply_prefix(self) -> str:
"""Return the prefix the Node bridge will add in self-chat mode."""
whatsapp_mode = os.getenv("WHATSAPP_MODE", "self-chat")
if whatsapp_mode != "self-chat":
return ""
if self._reply_prefix is not None:
return self._reply_prefix.replace("\\n", "\n")
env_prefix = os.getenv("WHATSAPP_REPLY_PREFIX")
if env_prefix is not None:
return env_prefix.replace("\\n", "\n")
return self.DEFAULT_REPLY_PREFIX
def _outgoing_chunk_limit(self) -> int:
"""Reserve room for the bridge-side prefix so final WhatsApp text fits."""
prefix_len = len(self._effective_reply_prefix())
# Keep enough space for truncate_message's pagination indicator and
# code-fence repair even if a user configures a very long prefix.
return max(1024, self.MAX_MESSAGE_LENGTH - prefix_len)
def _whatsapp_require_mention(self) -> bool:
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in {"true", "1", "yes", "on"}
def _whatsapp_free_response_chats(self) -> set[str]:
raw = self.config.extra.get("free_response_chats")
if raw is None:
raw = os.getenv("WHATSAPP_FREE_RESPONSE_CHATS", "")
if isinstance(raw, list):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
@staticmethod
def _coerce_allow_list(raw) -> set[str]:
"""Parse allow_from / group_allow_from from config or env var."""
if raw is None:
return set()
if isinstance(raw, list):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
@staticmethod
def _is_broadcast_chat(chat_id: str) -> bool:
"""True for WhatsApp pseudo-chats that aren't real conversations.
Covers Status updates (Stories) and Channel/Newsletter broadcasts.
These show up as inbound messages on Baileys but the agent should
never reply — answering a Story update spams the contact's status
feed, and Channel posts aren't addressable in the first place.
"""
if not chat_id:
return False
cid = chat_id.strip().lower()
if cid == "status@broadcast":
return True
# @broadcast suffix covers status@broadcast plus any future
# broadcast-list variants. @newsletter is the Channel JID suffix.
if cid.endswith("@broadcast") or cid.endswith("@newsletter"):
return True
return False
def _is_dm_allowed(self, sender_id: str) -> bool:
"""Check whether a DM from the given sender should be processed."""
if self._dm_policy == "disabled":
return False
if self._dm_policy == "allowlist":
return sender_id in self._allow_from
# "open" — all DMs allowed
return True
def _is_group_allowed(self, chat_id: str) -> bool:
"""Check whether a group chat should be processed."""
if self._group_policy == "disabled":
return False
if self._group_policy == "allowlist":
return chat_id in self._group_allow_from
# "open" — all groups allowed
return True
def _compile_mention_patterns(self):
patterns = self.config.extra.get("mention_patterns")
if patterns is None:
raw = os.getenv("WHATSAPP_MENTION_PATTERNS", "").strip()
if raw:
try:
patterns = json.loads(raw)
except Exception:
patterns = [part.strip() for part in raw.splitlines() if part.strip()]
if not patterns:
patterns = [part.strip() for part in raw.split(",") if part.strip()]
if patterns is None:
return []
if isinstance(patterns, str):
patterns = [patterns]
if not isinstance(patterns, list):
logger.warning("[%s] whatsapp mention_patterns must be a list or string; got %s", self.name, type(patterns).__name__)
return []
compiled = []
for pattern in patterns:
if not isinstance(pattern, str) or not pattern.strip():
continue
try:
compiled.append(re.compile(pattern, re.IGNORECASE))
except re.error as exc:
logger.warning("[%s] Invalid WhatsApp mention pattern %r: %s", self.name, pattern, exc)
if compiled:
logger.info("[%s] Loaded %d WhatsApp mention pattern(s)", self.name, len(compiled))
return compiled
@staticmethod
def _normalize_whatsapp_id(value: Optional[str]) -> str:
if not value:
return ""
normalized = str(value).strip()
if ":" in normalized and "@" in normalized:
normalized = normalized.replace(":", "@", 1)
return normalized
def _bot_ids_from_message(self, data: Dict[str, Any]) -> set[str]:
bot_ids = set()
for candidate in data.get("botIds") or []:
normalized = self._normalize_whatsapp_id(candidate)
if normalized:
bot_ids.add(normalized)
return bot_ids
def _message_is_reply_to_bot(self, data: Dict[str, Any]) -> bool:
quoted_participant = self._normalize_whatsapp_id(data.get("quotedParticipant"))
if not quoted_participant:
return False
return quoted_participant in self._bot_ids_from_message(data)
def _message_mentions_bot(self, data: Dict[str, Any]) -> bool:
bot_ids = self._bot_ids_from_message(data)
if not bot_ids:
return False
mentioned_ids = {
nid
for candidate in (data.get("mentionedIds") or [])
if (nid := self._normalize_whatsapp_id(candidate))
}
if mentioned_ids & bot_ids:
return True
body = str(data.get("body") or "")
lower_body = body.lower()
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0].lower()
if bare_id and (f"@{bare_id}" in lower_body or bare_id in lower_body):
return True
return False
def _message_matches_mention_patterns(self, data: Dict[str, Any]) -> bool:
if not self._mention_patterns:
return False
body = str(data.get("body") or "")
return any(pattern.search(body) for pattern in self._mention_patterns)
def _clean_bot_mention_text(self, text: str, data: Dict[str, Any]) -> str:
if not text:
return text
bot_ids = self._bot_ids_from_message(data)
cleaned = text
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0]
if bare_id:
cleaned = re.sub(rf"@{re.escape(bare_id)}\b[,:\-]*\s*", "", cleaned)
return cleaned.strip() or text
def _should_process_message(self, data: Dict[str, Any]) -> bool:
chat_id_raw = str(data.get("chatId") or "")
# WhatsApp uses pseudo-chats for Status updates (Stories) and
# Channel/Newsletter broadcasts. These are not real conversations
# and the agent should never reply to them — even in self-chat mode
# where the bridge may surface them as "fromMe" events.
if self._is_broadcast_chat(chat_id_raw):
return False
is_group = data.get("isGroup", False)
if is_group:
chat_id = chat_id_raw
if not self._is_group_allowed(chat_id):
return False
else:
sender_id = str(data.get("senderId") or data.get("from") or "")
if not self._is_dm_allowed(sender_id):
return False
# DMs that pass the policy gate are always processed
return True
# Group messages: check mention / free-response settings
chat_id = str(data.get("chatId") or "")
if chat_id in self._whatsapp_free_response_chats():
return True
if not self._whatsapp_require_mention():
return True
body = str(data.get("body") or "").strip()
if body.startswith("/"):
return True
if self._message_is_reply_to_bot(data):
return True
if self._message_mentions_bot(data):
return True
return self._message_matches_mention_patterns(data)
async def connect(self) -> bool:
"""
Start the WhatsApp bridge.
@@ -808,63 +599,6 @@ class WhatsAppAdapter(BasePlatformAdapter):
self._close_bridge_log()
print(f"[{self.name}] Disconnected")
def format_message(self, content: str) -> str:
"""Convert standard markdown to WhatsApp-compatible formatting.
WhatsApp supports: *bold*, _italic_, ~strikethrough~, ```code```,
and monospaced `inline`. Standard markdown uses different syntax
for bold/italic/strikethrough, so we convert here.
Code blocks (``` fenced) and inline code (`) are protected from
conversion via placeholder substitution.
"""
if not content:
return content
# --- 1. Protect fenced code blocks from formatting changes ---
_FENCE_PH = "\x00FENCE"
fences: list[str] = []
def _save_fence(m: re.Match) -> str:
fences.append(m.group(0))
return f"{_FENCE_PH}{len(fences) - 1}\x00"
result = re.sub(r"```[\s\S]*?```", _save_fence, content)
# --- 2. Protect inline code ---
_CODE_PH = "\x00CODE"
codes: list[str] = []
def _save_code(m: re.Match) -> str:
codes.append(m.group(0))
return f"{_CODE_PH}{len(codes) - 1}\x00"
result = re.sub(r"`[^`\n]+`", _save_code, result)
# --- 3. Convert markdown formatting to WhatsApp syntax ---
# Bold: **text** or __text__ → *text*
result = re.sub(r"\*\*(.+?)\*\*", r"*\1*", result)
result = re.sub(r"__(.+?)__", r"*\1*", result)
# Strikethrough: ~~text~~ → ~text~
result = re.sub(r"~~(.+?)~~", r"~\1~", result)
# Italic: *text* is already WhatsApp italic — leave as-is
# _text_ is already WhatsApp italic — leave as-is
# --- 4. Convert markdown headers to bold text ---
# # Header → *Header*
result = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", result, flags=re.MULTILINE)
# --- 5. Convert markdown links: [text](url) → text (url) ---
result = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", result)
# --- 6. Restore protected sections ---
for i, fence in enumerate(fences):
result = result.replace(f"{_FENCE_PH}{i}\x00", fence)
for i, code in enumerate(codes):
result = result.replace(f"{_CODE_PH}{i}\x00", code)
return result
async def send(
self,
chat_id: str,

File diff suppressed because it is too large.
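
To make the markdown conversion that moved out of the Baileys adapter easier to follow, here is a standalone restatement of the same regex steps as a free function. It is a sketch and not the mixin's code: `to_whatsapp_markdown` is a hypothetical name, and it uses one shared placeholder list where the original keeps separate lists for fences and inline code.

```python
import re

FENCE = "`" * 3  # built programmatically to keep this example's own fence intact


def to_whatsapp_markdown(content: str) -> str:
    """Convert common markdown to WhatsApp syntax, leaving code untouched."""
    if not content:
        return content
    protected = []

    def _protect(match: re.Match) -> str:
        # Swap code spans for NUL-delimited placeholders so the formatting
        # regexes below cannot rewrite anything inside them.
        protected.append(match.group(0))
        return f"\x00P{len(protected) - 1}\x00"

    result = re.sub(FENCE + r"[\s\S]*?" + FENCE, _protect, content)  # fenced blocks
    result = re.sub(r"`[^`\n]+`", _protect, result)                  # inline code
    result = re.sub(r"\*\*(.+?)\*\*", r"*\1*", result)               # **bold** -> *bold*
    result = re.sub(r"__(.+?)__", r"*\1*", result)                   # __bold__ -> *bold*
    result = re.sub(r"~~(.+?)~~", r"~\1~", result)                   # ~~strike~~ -> ~strike~
    result = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", result, flags=re.MULTILINE)  # headers
    result = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", result)  # [text](url)
    for i, original in enumerate(protected):
        result = result.replace(f"\x00P{i}\x00", original)
    return result


sample = "# Title\n**bold** and ~~gone~~ see [docs](https://example.com) `**raw**`"
print(to_whatsapp_markdown(sample))
```

The inline-code span at the end of the sample keeps its double asterisks, which is the point of the placeholder step.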

@@ -0,0 +1,351 @@
"""
Transport-agnostic WhatsApp behavior shared by the Baileys bridge adapter
and the official WhatsApp Cloud API adapter.
The mixin provides:
- Allow-list / DM / group gating
- Mention detection (explicit @-mentions + configurable regex patterns)
- Quoted-reply-to-bot detection
- Broadcast / Channel / Newsletter filtering
- WhatsApp-flavored markdown conversion
- Outgoing chunk length budgeting
It is the *behavior layer*. Transport-specific concerns (subprocess management,
HTTP webhooks, Graph API calls, media upload protocols) live in each adapter.
Mixin contract — the adapter must set these on ``self`` before any of the
mixin's methods are called (typically in ``__init__``):
self.config # gateway.config.PlatformConfig
self.name # str — adapter name (used in log lines)
self._dm_policy # str: "open" | "allowlist" | "disabled"
self._allow_from # set[str]
self._group_policy # str: "open" | "allowlist" | "disabled"
self._group_allow_from # set[str]
self._mention_patterns # list[re.Pattern]
self._reply_prefix # Optional[str]
Class attributes ``MAX_MESSAGE_LENGTH`` and ``DEFAULT_REPLY_PREFIX`` are
defined on the mixin and may be overridden per-adapter if needed.
"""
from __future__ import annotations
import json
import logging
import os
import re
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
class WhatsAppBehaviorMixin:
"""Shared behavior for all WhatsApp adapters (Baileys + Cloud API).
See module docstring for the attribute contract the host adapter must
satisfy. This mixin owns no state of its own — every value it touches
is either a class attribute or set by the adapter's ``__init__``.
"""
# WhatsApp message limits — practical UX limit, not protocol max.
# WhatsApp allows ~65K but long messages are unreadable on mobile.
MAX_MESSAGE_LENGTH: int = 4096
DEFAULT_REPLY_PREFIX: str = "⚕ *Hermes Agent*\n────────────\n"
# ------------------------------------------------------------------ config
def _effective_reply_prefix(self) -> str:
"""Return the prefix to add to outgoing replies in self-chat mode.
Subclasses that don't have a self-chat concept (the Cloud API
adapter) can override this to always return ``""`` or apply a
different policy.
"""
whatsapp_mode = os.getenv("WHATSAPP_MODE", "self-chat")
if whatsapp_mode != "self-chat":
return ""
if self._reply_prefix is not None:
return self._reply_prefix.replace("\\n", "\n")
env_prefix = os.getenv("WHATSAPP_REPLY_PREFIX")
if env_prefix is not None:
return env_prefix.replace("\\n", "\n")
return self.DEFAULT_REPLY_PREFIX
def _outgoing_chunk_limit(self) -> int:
"""Reserve room for the reply prefix so the final message fits."""
prefix_len = len(self._effective_reply_prefix())
# Keep enough space for truncate_message's pagination indicator and
# code-fence repair even if a user configures a very long prefix.
return max(1024, self.MAX_MESSAGE_LENGTH - prefix_len)
def _whatsapp_require_mention(self) -> bool:
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in {
"true",
"1",
"yes",
"on",
}
def _whatsapp_free_response_chats(self) -> set[str]:
raw = self.config.extra.get("free_response_chats")
if raw is None:
raw = os.getenv("WHATSAPP_FREE_RESPONSE_CHATS", "")
if isinstance(raw, list):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
@staticmethod
def _coerce_allow_list(raw) -> set[str]:
"""Parse allow_from / group_allow_from from config or env var."""
if raw is None:
return set()
if isinstance(raw, list):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
# ------------------------------------------------------------------ JID helpers
@staticmethod
def _normalize_whatsapp_id(value: Optional[str]) -> str:
if not value:
return ""
normalized = str(value).strip()
if ":" in normalized and "@" in normalized:
normalized = normalized.replace(":", "@", 1)
return normalized
@staticmethod
def _is_broadcast_chat(chat_id: str) -> bool:
"""True for WhatsApp pseudo-chats that aren't real conversations.
Covers Status updates (Stories) and Channel/Newsletter broadcasts.
These show up as inbound messages on Baileys but the agent should
never reply — answering a Story update spams the contact's status
feed, and Channel posts aren't addressable in the first place.
"""
if not chat_id:
return False
cid = chat_id.strip().lower()
if cid == "status@broadcast":
return True
# @broadcast suffix covers status@broadcast plus any future
# broadcast-list variants. @newsletter is the Channel JID suffix.
if cid.endswith("@broadcast") or cid.endswith("@newsletter"):
return True
return False
# ------------------------------------------------------------------ gating
def _is_dm_allowed(self, sender_id: str) -> bool:
"""Check whether a DM from the given sender should be processed."""
if self._dm_policy == "disabled":
return False
if self._dm_policy == "allowlist":
return sender_id in self._allow_from
# "open" — all DMs allowed
return True
def _is_group_allowed(self, chat_id: str) -> bool:
"""Check whether a group chat should be processed."""
if self._group_policy == "disabled":
return False
if self._group_policy == "allowlist":
return chat_id in self._group_allow_from
# "open" — all groups allowed
return True
def _compile_mention_patterns(self):
patterns = self.config.extra.get("mention_patterns")
if patterns is None:
raw = os.getenv("WHATSAPP_MENTION_PATTERNS", "").strip()
if raw:
try:
patterns = json.loads(raw)
except Exception:
patterns = [
part.strip() for part in raw.splitlines() if part.strip()
]
if not patterns:
patterns = [
part.strip() for part in raw.split(",") if part.strip()
]
if patterns is None:
return []
if isinstance(patterns, str):
patterns = [patterns]
if not isinstance(patterns, list):
logger.warning(
"[%s] whatsapp mention_patterns must be a list or string; got %s",
self.name,
type(patterns).__name__,
)
return []
compiled = []
for pattern in patterns:
if not isinstance(pattern, str) or not pattern.strip():
continue
try:
compiled.append(re.compile(pattern, re.IGNORECASE))
except re.error as exc:
logger.warning(
"[%s] Invalid WhatsApp mention pattern %r: %s",
self.name,
pattern,
exc,
)
if compiled:
logger.info(
"[%s] Loaded %d WhatsApp mention pattern(s)", self.name, len(compiled)
)
return compiled
def _bot_ids_from_message(self, data: Dict[str, Any]) -> set[str]:
bot_ids = set()
for candidate in data.get("botIds") or []:
normalized = self._normalize_whatsapp_id(candidate)
if normalized:
bot_ids.add(normalized)
return bot_ids
def _message_is_reply_to_bot(self, data: Dict[str, Any]) -> bool:
quoted_participant = self._normalize_whatsapp_id(data.get("quotedParticipant"))
if not quoted_participant:
return False
return quoted_participant in self._bot_ids_from_message(data)
def _message_mentions_bot(self, data: Dict[str, Any]) -> bool:
bot_ids = self._bot_ids_from_message(data)
if not bot_ids:
return False
mentioned_ids = {
nid
for candidate in (data.get("mentionedIds") or [])
if (nid := self._normalize_whatsapp_id(candidate))
}
if mentioned_ids & bot_ids:
return True
body = str(data.get("body") or "")
lower_body = body.lower()
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0].lower()
if bare_id and (f"@{bare_id}" in lower_body or bare_id in lower_body):
return True
return False
def _message_matches_mention_patterns(self, data: Dict[str, Any]) -> bool:
if not self._mention_patterns:
return False
body = str(data.get("body") or "")
return any(pattern.search(body) for pattern in self._mention_patterns)
def _clean_bot_mention_text(self, text: str, data: Dict[str, Any]) -> str:
if not text:
return text
bot_ids = self._bot_ids_from_message(data)
cleaned = text
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0]
if bare_id:
cleaned = re.sub(
rf"@{re.escape(bare_id)}\b[,:\-]*\s*", "", cleaned
)
return cleaned.strip() or text
def _should_process_message(self, data: Dict[str, Any]) -> bool:
chat_id_raw = str(data.get("chatId") or "")
# WhatsApp uses pseudo-chats for Status updates (Stories) and
# Channel/Newsletter broadcasts. These are not real conversations
# and the agent should never reply to them — even in self-chat mode
# where the bridge may surface them as "fromMe" events.
if self._is_broadcast_chat(chat_id_raw):
return False
is_group = data.get("isGroup", False)
if is_group:
chat_id = chat_id_raw
if not self._is_group_allowed(chat_id):
return False
else:
sender_id = str(data.get("senderId") or data.get("from") or "")
if not self._is_dm_allowed(sender_id):
return False
# DMs that pass the policy gate are always processed
return True
# Group messages: check mention / free-response settings
chat_id = str(data.get("chatId") or "")
if chat_id in self._whatsapp_free_response_chats():
return True
if not self._whatsapp_require_mention():
return True
body = str(data.get("body") or "").strip()
if body.startswith("/"):
return True
if self._message_is_reply_to_bot(data):
return True
if self._message_mentions_bot(data):
return True
return self._message_matches_mention_patterns(data)
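The group gate above checks its conditions in a fixed order: free-response chats, the require-mention switch, slash commands, replies to the bot, explicit mentions, and finally the regex patterns. A minimal standalone sketch of that ordering follows. It assumes a simplified message dict with IDs already normalized, and the function name and keyword parameters are hypothetical stand-ins for the adapter's private helpers.

```python
import re
from typing import Any, Dict, Iterable, Pattern


def should_process_group_message(
    data: Dict[str, Any],
    *,
    free_chats: Iterable[str] = (),
    require_mention: bool = True,
    patterns: Iterable[Pattern[str]] = (),
) -> bool:
    """Hypothetical, simplified mirror of the group-message gate."""
    if data.get("chatId") in set(free_chats):
        return True  # chat is configured for free response
    if not require_mention:
        return True  # mentions are not required at all
    body = str(data.get("body") or "").strip()
    if body.startswith("/"):
        return True  # slash commands always pass
    bot_ids = set(data.get("botIds") or [])
    if data.get("quotedParticipant") in bot_ids:
        return True  # reply to one of the bot's messages
    if bot_ids & set(data.get("mentionedIds") or []):
        return True  # explicit @-mention
    return any(p.search(body) for p in patterns)


wake_word = [re.compile(r"\bhermes\b", re.IGNORECASE)]
print(should_process_group_message({"chatId": "g1", "body": "hey Hermes"}, patterns=wake_word))
```

Because the cheap checks come first, the regex patterns only run for messages that nothing else has already admitted.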
# ------------------------------------------------------------------ formatting
def format_message(self, content: str) -> str:
"""Convert standard markdown to WhatsApp-compatible formatting.
WhatsApp supports: *bold*, _italic_, ~strikethrough~, ```code```,
and monospaced `inline`. Standard markdown uses different syntax
for bold/italic/strikethrough, so we convert here.
Code blocks (``` fenced) and inline code (`) are protected from
conversion via placeholder substitution.
"""
if not content:
return content
# --- 1. Protect fenced code blocks from formatting changes ---
_FENCE_PH = "\x00FENCE"
fences: list[str] = []
def _save_fence(m: re.Match) -> str:
fences.append(m.group(0))
return f"{_FENCE_PH}{len(fences) - 1}\x00"
result = re.sub(r"```[\s\S]*?```", _save_fence, content)
# --- 2. Protect inline code ---
_CODE_PH = "\x00CODE"
codes: list[str] = []
def _save_code(m: re.Match) -> str:
codes.append(m.group(0))
return f"{_CODE_PH}{len(codes) - 1}\x00"
result = re.sub(r"`[^`\n]+`", _save_code, result)
# --- 3. Convert markdown formatting to WhatsApp syntax ---
# Bold: **text** or __text__ → *text*
result = re.sub(r"\*\*(.+?)\*\*", r"*\1*", result)
result = re.sub(r"__(.+?)__", r"*\1*", result)
# Strikethrough: ~~text~~ → ~text~
result = re.sub(r"~~(.+?)~~", r"~\1~", result)
# Italic: Markdown *text* reads as *bold* in WhatsApp, so it is left as-is
# _text_ is already WhatsApp italic — leave as-is
# --- 4. Convert markdown headers to bold text ---
# # Header → *Header*
result = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", result, flags=re.MULTILINE)
# --- 5. Convert markdown links: [text](url) → text (url) ---
result = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", result)
# --- 6. Restore protected sections ---
for i, fence in enumerate(fences):
result = result.replace(f"{_FENCE_PH}{i}\x00", fence)
for i, code in enumerate(codes):
result = result.replace(f"{_CODE_PH}{i}\x00", code)
return result
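The placeholder technique used by `format_message` can be tried in isolation. The sketch below is a condensed standalone version and not the adapter's code: the function name `to_whatsapp` and the single shared placeholder list are simplifications chosen for illustration.

```python
import re


def to_whatsapp(content: str) -> str:
    """Condensed sketch: protect code, convert Markdown, restore code."""
    protected: list[str] = []

    def _protect(m: re.Match) -> str:
        protected.append(m.group(0))
        return f"\x00P{len(protected) - 1}\x00"

    # Fenced blocks first, then inline code, so neither gets rewritten.
    result = re.sub(r"```[\s\S]*?```", _protect, content)
    result = re.sub(r"`[^`\n]+`", _protect, result)

    result = re.sub(r"\*\*(.+?)\*\*", r"*\1*", result)  # **bold** -> *bold*
    result = re.sub(r"~~(.+?)~~", r"~\1~", result)      # ~~strike~~ -> ~strike~
    result = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", result, flags=re.MULTILINE)
    result = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", result)

    # The trailing NUL keeps placeholder 1 distinct from placeholder 10.
    for i, original in enumerate(protected):
        result = result.replace(f"\x00P{i}\x00", original)
    return result


print(to_whatsapp("# Title\n**bold** ~~old~~ `**x**` [docs](https://example.com)"))
```

The inline code span keeps its double asterisks because it is swapped out before the bold rule runs and swapped back afterwards.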


@@ -3678,7 +3678,8 @@ class GatewayRunner:
# Warn if no user allowlists are configured and open access is not opted in
_builtin_allowed_vars = (
"TELEGRAM_ALLOWED_USERS", "DISCORD_ALLOWED_USERS",
-"WHATSAPP_ALLOWED_USERS", "SLACK_ALLOWED_USERS",
+"WHATSAPP_ALLOWED_USERS", "WHATSAPP_CLOUD_ALLOWED_USERS",
+"SLACK_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"TELEGRAM_GROUP_ALLOWED_USERS",
"TELEGRAM_GROUP_ALLOWED_CHATS",
@@ -3696,7 +3697,8 @@ class GatewayRunner:
)
_builtin_allow_all_vars = (
"TELEGRAM_ALLOW_ALL_USERS", "DISCORD_ALLOW_ALL_USERS",
-"WHATSAPP_ALLOW_ALL_USERS", "SLACK_ALLOW_ALL_USERS",
+"WHATSAPP_ALLOW_ALL_USERS", "WHATSAPP_CLOUD_ALLOW_ALL_USERS",
+"SLACK_ALLOW_ALL_USERS",
"SIGNAL_ALLOW_ALL_USERS", "EMAIL_ALLOW_ALL_USERS",
"SMS_ALLOW_ALL_USERS", "MATTERMOST_ALLOW_ALL_USERS",
"MATRIX_ALLOW_ALL_USERS", "DINGTALK_ALLOW_ALL_USERS",
@@ -5954,6 +5956,18 @@ class GatewayRunner:
logger.warning("WhatsApp: Node.js not installed or bridge not configured")
return None
return WhatsAppAdapter(config)
elif platform == Platform.WHATSAPP_CLOUD:
from gateway.platforms.whatsapp_cloud import (
WhatsAppCloudAdapter,
check_whatsapp_cloud_requirements,
)
if not check_whatsapp_cloud_requirements():
logger.warning(
"WhatsApp Cloud: aiohttp/httpx missing — reinstall hermes-agent"
)
return None
return WhatsAppCloudAdapter(config)
elif platform == Platform.SLACK:
from gateway.platforms.slack import SlackAdapter, check_slack_requirements
@@ -6144,6 +6158,7 @@ class GatewayRunner:
Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
Platform.DISCORD: "DISCORD_ALLOWED_USERS",
Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
Platform.WHATSAPP_CLOUD: "WHATSAPP_CLOUD_ALLOWED_USERS",
Platform.SLACK: "SLACK_ALLOWED_USERS",
Platform.SIGNAL: "SIGNAL_ALLOWED_USERS",
Platform.EMAIL: "EMAIL_ALLOWED_USERS",
@@ -6170,6 +6185,7 @@ class GatewayRunner:
Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
Platform.DISCORD: "DISCORD_ALLOW_ALL_USERS",
Platform.WHATSAPP: "WHATSAPP_ALLOW_ALL_USERS",
Platform.WHATSAPP_CLOUD: "WHATSAPP_CLOUD_ALLOW_ALL_USERS",
Platform.SLACK: "SLACK_ALLOW_ALL_USERS",
Platform.SIGNAL: "SIGNAL_ALLOW_ALL_USERS",
Platform.EMAIL: "EMAIL_ALLOW_ALL_USERS",
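The two new environment variables feed the gateway's allowlist checks. How the gateway parses them is not shown in this diff, so the sketch below is an assumption throughout: it treats `WHATSAPP_CLOUD_ALLOWED_USERS` as a comma-separated list with `*` as a wildcard (as the setup wizard describes), and treats `1`, `true`, or `yes` in `WHATSAPP_CLOUD_ALLOW_ALL_USERS` as opt-in to open access. The function name is hypothetical.

```python
import os
from typing import Mapping


def is_sender_allowed(sender: str, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical allowlist check; the parsing rules here are assumptions."""
    allow_all = env.get("WHATSAPP_CLOUD_ALLOW_ALL_USERS", "").strip().lower()
    if allow_all in {"1", "true", "yes"}:
        return True
    raw = env.get("WHATSAPP_CLOUD_ALLOWED_USERS", "")
    # Normalize entries: trim whitespace and drop a leading '+'.
    allowed = {e.strip().lstrip("+") for e in raw.split(",") if e.strip()}
    return "*" in allowed or sender.lstrip("+") in allowed


demo_env = {"WHATSAPP_CLOUD_ALLOWED_USERS": "15551234567, +15557654321"}
print(is_sender_allowed("15557654321", demo_env))
```

Passing the environment as a mapping keeps the check easy to test without touching the real process environment.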


@@ -1981,6 +1981,25 @@ def cmd_whatsapp(args):
print("⚠ Pairing may not have completed. Run 'hermes whatsapp' to try again.")
def cmd_whatsapp_cloud(args):
"""Set up WhatsApp Business Cloud API (official Meta integration).
Walks the user through the Meta-side credentials (Phone Number ID,
Access Token, App Secret, optional App/WABA IDs) plus webhook
configuration. Includes field-shape validators that catch the most
common setup mistakes (e.g. pasting a phone number into the Phone
Number ID field).
Distinct from ``hermes whatsapp`` (the Baileys bridge wizard) — the
two adapters are complementary, not alternatives. See
``hermes_cli/setup_whatsapp_cloud.py``.
"""
_require_tty("whatsapp-cloud")
from hermes_cli.setup_whatsapp_cloud import run_whatsapp_cloud_setup
return run_whatsapp_cloud_setup()
def cmd_setup(args):
"""Interactive setup wizard."""
from hermes_cli.setup import run_setup_wizard
@@ -9699,6 +9718,7 @@ def _coalesce_session_name_args(argv: list) -> list:
"gateway",
"setup",
"whatsapp",
"whatsapp-cloud",
"login",
"logout",
"auth",
@@ -10560,7 +10580,7 @@ _BUILTIN_SUBCOMMANDS = frozenset(
"model", "pairing", "plugins", "postinstall", "profile", "proxy",
"send", "sessions", "setup",
"skills", "slack", "status", "tools", "uninstall", "update",
-"version", "webhook", "whatsapp", "chat", "secrets",
+"version", "webhook", "whatsapp", "whatsapp-cloud", "chat", "secrets",
# Help-ish invocations — plugin commands not being listed in
# top-level --help is an acceptable trade-off for skipping an
# expensive eager import of every bundled plugin module.
@@ -11311,6 +11331,21 @@ def main():
)
whatsapp_parser.set_defaults(func=cmd_whatsapp)
# =========================================================================
# whatsapp-cloud command (official Meta Cloud API; complement to Baileys)
# =========================================================================
whatsapp_cloud_parser = subparsers.add_parser(
"whatsapp-cloud",
help="Set up WhatsApp Business Cloud API integration",
description=(
"Configure the official Meta WhatsApp Business Cloud API "
"adapter (Business account required, public webhook URL "
"required). Distinct from `hermes whatsapp` which sets up "
"the Baileys bridge for personal accounts."
),
)
whatsapp_cloud_parser.set_defaults(func=cmd_whatsapp_cloud)
# =========================================================================
# slack command
# =========================================================================
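The subcommand registration above relies on the standard `argparse` pattern of attaching a handler with `set_defaults(func=...)` and calling it after parsing. A minimal self-contained sketch of that dispatch follows, with a stub handler standing in for the real wizard; `build_parser` is a hypothetical helper name.

```python
import argparse


def cmd_whatsapp_cloud(args: argparse.Namespace) -> int:
    # Stub standing in for the real wizard entry point.
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hermes")
    subparsers = parser.add_subparsers(dest="command")
    cloud = subparsers.add_parser(
        "whatsapp-cloud",
        help="Set up WhatsApp Business Cloud API integration",
    )
    # The handler travels with the parsed namespace as args.func.
    cloud.set_defaults(func=cmd_whatsapp_cloud)
    return parser


args = build_parser().parse_args(["whatsapp-cloud"])
exit_code = args.func(args)
print(args.command, exit_code)
```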


@@ -66,6 +66,10 @@ class NousSubscriptionFeatures:
def tts(self) -> NousFeatureState:
return self.features["tts"]
@property
def stt(self) -> NousFeatureState:
return self.features["stt"]
@property
def browser(self) -> NousFeatureState:
return self.features["browser"]
@@ -75,7 +79,7 @@ class NousSubscriptionFeatures:
return self.features["modal"]
def items(self) -> Iterable[NousFeatureState]:
-ordered = ("web", "image_gen", "tts", "browser", "modal")
+ordered = ("web", "image_gen", "tts", "stt", "browser", "modal")
for key in ordered:
yield self.features[key]
@@ -159,6 +163,16 @@ def _tts_label(current_provider: str) -> str:
return mapping.get(current_provider or "edge", current_provider or "Edge TTS")
def _stt_label(current_provider: str) -> str:
mapping = {
"openai": "OpenAI Whisper",
"groq": "Groq Whisper",
"mistral": "Mistral Voxtral Transcribe",
"local": "Local faster-whisper",
}
return mapping.get(current_provider or "local", current_provider or "Local faster-whisper")
def _resolve_browser_feature_state(
*,
browser_tool_enabled: bool,
@@ -251,6 +265,7 @@ def get_nous_subscription_features(
web_cfg = config.get("web") if isinstance(config.get("web"), dict) else {}
tts_cfg = config.get("tts") if isinstance(config.get("tts"), dict) else {}
stt_cfg = config.get("stt") if isinstance(config.get("stt"), dict) else {}
browser_cfg = config.get("browser") if isinstance(config.get("browser"), dict) else {}
terminal_cfg = config.get("terminal") if isinstance(config.get("terminal"), dict) else {}
@@ -260,6 +275,11 @@ def get_nous_subscription_features(
web_search_backend = str(web_cfg.get("search_backend") or "").strip().lower()
web_extract_backend = str(web_cfg.get("extract_backend") or "").strip().lower()
tts_provider = str(tts_cfg.get("provider") or "edge").strip().lower()
# STT default is "local" (faster-whisper) per DEFAULT_CONFIG, which
# requires `pip install faster-whisper`. For Nous subscribers we'd
# rather route through the managed OpenAI audio gateway — see
# apply_nous_managed_defaults below.
stt_provider = str(stt_cfg.get("provider") or "local").strip().lower()
browser_provider_explicit = "cloud_provider" in browser_cfg
browser_provider = normalize_browser_cloud_provider(
browser_cfg.get("cloud_provider") if browser_provider_explicit else None
@@ -276,6 +296,7 @@ def get_nous_subscription_features(
# prevent gateway routing.
web_use_gateway = _uses_gateway(web_cfg)
tts_use_gateway = _uses_gateway(tts_cfg)
stt_use_gateway = _uses_gateway(stt_cfg)
browser_use_gateway = _uses_gateway(browser_cfg)
image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
image_use_gateway = _uses_gateway(image_gen_cfg)
@@ -293,6 +314,22 @@ def get_nous_subscription_features(
direct_browser_use = bool(get_env_value("BROWSER_USE_API_KEY"))
direct_modal = has_direct_modal_credentials()
# STT direct providers. OpenAI Whisper reuses the same audio key as
# OpenAI TTS — resolve_openai_audio_api_key() reads VOICE_TOOLS_OPENAI_KEY
# and falls back to OPENAI_API_KEY. The local provider's "direct"
# signal is whether faster-whisper is importable; we lazy-import so
# this module stays cheap on the happy path.
direct_openai_stt = bool(resolve_openai_audio_api_key())
direct_groq_stt = bool(get_env_value("GROQ_API_KEY"))
direct_mistral_stt = bool(get_env_value("MISTRAL_API_KEY"))
try:
from tools.transcription_tools import _HAS_FASTER_WHISPER
local_stt_available = bool(_HAS_FASTER_WHISPER) or bool(
get_env_value("HERMES_LOCAL_STT_COMMAND")
)
except Exception:
local_stt_available = bool(get_env_value("HERMES_LOCAL_STT_COMMAND"))
# When use_gateway is set, suppress direct credentials for managed detection
if web_use_gateway:
direct_firecrawl = False
@@ -304,6 +341,11 @@ def get_nous_subscription_features(
if tts_use_gateway:
direct_openai_tts = False
direct_elevenlabs = False
if stt_use_gateway:
direct_openai_stt = False
direct_groq_stt = False
direct_mistral_stt = False
local_stt_available = False
if browser_use_gateway:
direct_browser_use = False
direct_browserbase = False
@@ -311,6 +353,10 @@ def get_nous_subscription_features(
managed_web_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("firecrawl")
managed_image_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("fal-queue")
managed_tts_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("openai-audio")
# STT and TTS share the same managed gateway endpoint ("openai-audio")
# because the OpenAI audio API covers both /audio/speech (TTS) and
# /audio/transcriptions (STT). One probe, used by both.
managed_stt_available = managed_tts_available
managed_browser_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("browser-use")
managed_modal_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("modal")
modal_state = resolve_modal_backend_state(
@@ -361,6 +407,24 @@ def get_nous_subscription_features(
)
tts_active = bool(tts_tool_enabled and tts_available)
# STT availability per provider. Unlike TTS, STT isn't a model-callable
# tool — the gateway voice middleware calls it on every inbound voice
# message — so toolset_enabled is N/A and we treat stt as always
# "enabled" if a usable provider is configured.
stt_current_provider = stt_provider or "local"
stt_managed = (
stt_current_provider == "openai"
and managed_stt_available
and not direct_openai_stt
)
stt_available = bool(
(stt_current_provider == "local" and local_stt_available)
or (stt_current_provider == "openai" and (managed_stt_available or direct_openai_stt))
or (stt_current_provider == "groq" and direct_groq_stt)
or (stt_current_provider == "mistral" and direct_mistral_stt)
)
stt_active = stt_available
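The provider-by-provider availability rule above is easier to check as a pure function. The sketch below restates it under that assumption; `stt_state` and its keyword flags are hypothetical names for the local variables computed earlier in this function.

```python
from typing import Optional, Tuple


def stt_state(
    provider: Optional[str],
    *,
    managed: bool,
    direct_openai: bool = False,
    direct_groq: bool = False,
    direct_mistral: bool = False,
    local: bool = False,
) -> Tuple[bool, bool]:
    """Return (available, managed_by_nous) for the configured STT provider."""
    provider = provider or "local"
    # Managed only counts when no direct OpenAI audio key overrides it.
    is_managed = provider == "openai" and managed and not direct_openai
    available = (
        (provider == "local" and local)
        or (provider == "openai" and (managed or direct_openai))
        or (provider == "groq" and direct_groq)
        or (provider == "mistral" and direct_mistral)
    )
    return bool(available), bool(is_managed)


print(stt_state("openai", managed=True))
print(stt_state("openai", managed=True, direct_openai=True))
```

A direct OpenAI audio key keeps STT available but flips it from managed to a direct override, matching the `direct_override` field in the feature state.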
browser_local_available = _has_agent_browser()
(
browser_current_provider,
@@ -415,6 +479,13 @@ def get_nous_subscription_features(
if isinstance(raw_tts_cfg, dict) and "provider" in raw_tts_cfg:
tts_explicit_configured = tts_provider not in {"", "edge"}
# STT considers any non-default provider explicit. "local" is the
# DEFAULT_CONFIG seed, so seeing it doesn't mean the user picked it.
stt_explicit_configured = False
raw_stt_cfg = config.get("stt")
if isinstance(raw_stt_cfg, dict) and "provider" in raw_stt_cfg:
stt_explicit_configured = stt_provider not in {"", "local"}
features = {
"web": NousFeatureState(
key="web",
@@ -452,6 +523,21 @@ def get_nous_subscription_features(
current_provider=_tts_label(tts_current_provider),
explicit_configured=tts_explicit_configured,
),
"stt": NousFeatureState(
key="stt",
label="Speech-to-text",
included_by_default=True,
available=stt_available,
active=stt_active,
managed_by_nous=stt_managed,
direct_override=stt_active and not stt_managed,
# STT isn't toolset-gated (gateway middleware calls it
# unconditionally on inbound voice), so report True so the
# status display doesn't flag it as "tool disabled".
toolset_enabled=True,
current_provider=_stt_label(stt_current_provider),
explicit_configured=stt_explicit_configured,
),
"browser": NousFeatureState(
key="browser",
label="Browser automation",
@@ -514,6 +600,11 @@ def apply_nous_managed_defaults(
tts_cfg = {}
config["tts"] = tts_cfg
stt_cfg = config.get("stt")
if not isinstance(stt_cfg, dict):
stt_cfg = {}
config["stt"] = stt_cfg
browser_cfg = config.get("browser")
if not isinstance(browser_cfg, dict):
browser_cfg = {}
@@ -535,6 +626,18 @@ def apply_nous_managed_defaults(
tts_cfg["provider"] = "openai"
changed.add("tts")
# STT: same pattern as TTS. The DEFAULT_CONFIG seed is "local"
# (requires `pip install faster-whisper`); for Nous subscribers we
# flip it to "openai" so the managed audio gateway handles transcription
# via the same auth as TTS. Skipped when the user has explicitly
# configured STT or has direct credentials for a non-managed provider.
if not features.stt.explicit_configured and not (
get_env_value("GROQ_API_KEY")
or get_env_value("MISTRAL_API_KEY")
):
stt_cfg["provider"] = "openai"
changed.add("stt")
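The skip conditions for the STT default can be illustrated on a plain dict. This is a sketch under assumptions: `apply_stt_default` is a hypothetical helper, and the environment is passed in as a mapping rather than read through `get_env_value`.

```python
from typing import Dict, Mapping


def apply_stt_default(
    config: Dict[str, object],
    *,
    explicit_configured: bool,
    env: Mapping[str, str],
) -> bool:
    """Flip stt.provider to "openai" unless the user has opted for something else."""
    if explicit_configured or env.get("GROQ_API_KEY") or env.get("MISTRAL_API_KEY"):
        return False  # respect explicit config and direct credentials
    stt_cfg = config.get("stt")
    if not isinstance(stt_cfg, dict):
        stt_cfg = {}
        config["stt"] = stt_cfg
    stt_cfg["provider"] = "openai"
    return True


cfg: Dict[str, object] = {"stt": {"provider": "local"}}
changed = apply_stt_default(cfg, explicit_configured=False, env={})
print(changed, cfg)
```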
if "browser" in selected_toolsets and not features.browser.explicit_configured and not (
get_env_value("BROWSER_USE_API_KEY")
or get_env_value("BROWSERBASE_API_KEY")
@@ -556,6 +659,7 @@ _GATEWAY_TOOL_LABELS = {
"web": "Web search & extract (Firecrawl)",
"image_gen": "Image generation (FAL)",
"tts": "Text-to-speech (OpenAI TTS)",
"stt": "Speech-to-text (OpenAI Whisper)",
"browser": "Browser automation (Browser Use)",
}
@@ -575,6 +679,15 @@ def _get_gateway_direct_credentials() -> Dict[str, bool]:
resolve_openai_audio_api_key()
or get_env_value("ELEVENLABS_API_KEY")
),
# STT direct credentials. OpenAI Whisper shares the audio key
# with TTS via resolve_openai_audio_api_key() — counting it here
# too is intentional: if the user has an OpenAI audio key they
# don't need the gateway for either.
"stt": bool(
resolve_openai_audio_api_key()
or get_env_value("GROQ_API_KEY")
or get_env_value("MISTRAL_API_KEY")
),
"browser": bool(
get_env_value("BROWSER_USE_API_KEY")
or (get_env_value("BROWSERBASE_API_KEY") and get_env_value("BROWSERBASE_PROJECT_ID"))
@@ -586,10 +699,11 @@ _GATEWAY_DIRECT_LABELS = {
"web": "Firecrawl/Exa/Parallel/Tavily key",
"image_gen": "FAL key",
"tts": "OpenAI/ElevenLabs key",
"stt": "OpenAI/Groq/Mistral key",
"browser": "Browser Use/Browserbase key",
}
-_ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
+_ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "stt", "browser")
def get_gateway_eligible_tools(
@@ -625,6 +739,7 @@ def get_gateway_eligible_tools(
"web": _uses_gateway(config.get("web")),
"image_gen": _uses_gateway(config.get("image_gen")),
"tts": _uses_gateway(config.get("tts")),
"stt": _uses_gateway(config.get("stt")),
"browser": _uses_gateway(config.get("browser")),
}
@@ -664,6 +779,11 @@ def apply_gateway_defaults(
tts_cfg = {}
config["tts"] = tts_cfg
stt_cfg = config.get("stt")
if not isinstance(stt_cfg, dict):
stt_cfg = {}
config["stt"] = stt_cfg
browser_cfg = config.get("browser")
if not isinstance(browser_cfg, dict):
browser_cfg = {}
@@ -679,6 +799,11 @@ def apply_gateway_defaults(
tts_cfg["use_gateway"] = True
changed.add("tts")
if "stt" in tool_keys:
stt_cfg["provider"] = "openai"
stt_cfg["use_gateway"] = True
changed.add("stt")
if "browser" in tool_keys:
browser_cfg["cloud_provider"] = "browser-use"
browser_cfg["use_gateway"] = True
@@ -717,8 +842,9 @@ def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
desc_parts: list[str] = [
"",
" The Tool Gateway gives you access to web search, image generation,",
-" text-to-speech, and browser automation through your Nous subscription.",
-" No need to sign up for separate API keys — just pick the tools you want.",
+" text-to-speech, speech-to-text, and browser automation through your",
+" Nous subscription. No need to sign up for separate API keys — just",
+" pick the tools you want.",
"",
]
if already_managed:


@@ -24,6 +24,7 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
("discord", PlatformInfo(label="💬 Discord", default_toolset="hermes-discord")),
("slack", PlatformInfo(label="💼 Slack", default_toolset="hermes-slack")),
("whatsapp", PlatformInfo(label="📱 WhatsApp", default_toolset="hermes-whatsapp")),
("whatsapp_cloud", PlatformInfo(label="📱 WhatsApp Business (Cloud)", default_toolset="hermes-whatsapp")),
("signal", PlatformInfo(label="📡 Signal", default_toolset="hermes-signal")),
("bluebubbles", PlatformInfo(label="💙 BlueBubbles", default_toolset="hermes-bluebubbles")),
("email", PlatformInfo(label="📧 Email", default_toolset="hermes-email")),


@@ -0,0 +1,530 @@
"""
Interactive setup wizard for the WhatsApp Cloud API adapter.
Entry point: ``hermes whatsapp-cloud`` (dispatched from
``cmd_whatsapp_cloud`` in ``hermes_cli/main.py``).
Walks the user through the 6 credentials Meta requires + recipient
allowlist, auto-generates the verify token, and prints exact follow-up
instructions for the parts that can't happen inside the wizard process
(starting cloudflared, starting the gateway, configuring Meta's
webhook dashboard, adding their phone to the recipient list).
Heavy emphasis on field-shape validation to catch the most common
configuration mistakes:
- Putting the actual phone number in ``WHATSAPP_CLOUD_PHONE_NUMBER_ID``
(the field expects Meta's 15-17 digit internal ID, not a phone number).
This is the #1 trap — caught us during Phase 3 live testing.
- Pasting tokens with trailing whitespace.
- Pasting an OpenAI / Slack / GitHub key by mistake.
- Confusing App ID with WABA ID with Phone Number ID.
Each prompt has contextual help showing exactly where to find the value
in Meta's App Dashboard, with a one-line description and the field's
expected shape ("starts with EAA", "15-17 digits", "32 hex chars", etc.).
The wizard intentionally does NOT smoke-test the webhook itself — the
Hermes gateway and the cloudflared tunnel both run in separate
processes the user starts AFTER this wizard exits, so any in-wizard
probe would fail by design. Instead the final SETUP COMPLETE block
prints the exact curl command the user can run from a third terminal
to verify the loop end-to-end once everything's running.
"""
from __future__ import annotations
import re
import secrets
import sys
from typing import Optional
# ---------------------------------------------------------------------------
# Field-shape validators
# ---------------------------------------------------------------------------
#
# Each validator returns (ok, reason_if_not_ok). The wizard uses them to
# reject obviously-malformed input before saving — saves users a round
# trip with Meta's 401 / 400 errors.
def _validate_phone_number_id(value: str) -> tuple[bool, Optional[str]]:
"""Phone Number ID is a 15-17 digit numeric ID assigned by Meta.
It's NOT a phone number. The #1 setup mistake is pasting the actual
phone number (e.g. ``15556422442``) into this field — that's only
10-11 digits and gets rejected by Graph as "Object with ID does
not exist."
"""
if not value:
return False, "Phone Number ID is required"
s = value.strip()
if not s.isdigit():
return False, "Phone Number ID must be numeric (no '+', spaces, or dashes)"
# Phone numbers with a country code are typically 10-12 digits (US/CA:
# country code + area code + 7 digits = 11). Meta's internal IDs are
# 15-17 digits. If we see a phone-number-sized value, the user almost
# certainly pasted the phone number by mistake.
if 10 <= len(s) <= 12:
return False, (
"That looks like a phone number — but this field needs the "
"Phone Number ID (Meta's internal ID, 15-17 digits, e.g. "
"'7794189252778687'). Look just BELOW the 'From' dropdown in "
"API Setup → it's labelled 'Phone number ID'."
)
if len(s) < 13:
return False, "Phone Number ID looks too short (expected 15-17 digits)"
if len(s) > 20:
return False, "Phone Number ID looks too long (expected 15-17 digits)"
return True, None
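To see the phone-number trap being caught, here is a condensed standalone version of the length checks. It keeps the same thresholds as the validator above but uses short reason strings; `classify_phone_number_id` is a hypothetical name.

```python
from typing import Optional, Tuple


def classify_phone_number_id(value: str) -> Tuple[bool, Optional[str]]:
    """Condensed sketch of the field-shape check, with terse reasons."""
    s = value.strip()
    if not s.isdigit():
        return False, "not numeric"
    if 10 <= len(s) <= 12:
        return False, "looks like a phone number"
    if not 13 <= len(s) <= 20:
        return False, "unexpected length"
    return True, None


for candidate in ("15556422442", "+15556422442", "7794189252778687"):
    print(candidate, classify_phone_number_id(candidate))
```

The 11-digit value is rejected as a likely phone number, while the 16-digit value passes.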
def _validate_waba_id(value: str) -> tuple[bool, Optional[str]]:
"""WABA ID is numeric, similar length range as Phone Number ID."""
if not value:
return False, "WABA ID is required"
s = value.strip()
if not s.isdigit():
return False, "WABA ID must be numeric"
if len(s) < 10 or len(s) > 25:
return False, "WABA ID looks wrong (expected 10-25 digits)"
return True, None
def _validate_app_id(value: str) -> tuple[bool, Optional[str]]:
"""Meta App ID is numeric, typically 15-16 digits."""
if not value:
return False, "App ID is required"
s = value.strip()
if not s.isdigit():
return False, "App ID must be numeric"
if len(s) < 13 or len(s) > 20:
return False, "App ID looks wrong (expected 15-16 digits)"
return True, None
def _validate_app_secret(value: str) -> tuple[bool, Optional[str]]:
"""App Secret is a 32-character lowercase hex string."""
if not value:
return False, "App Secret is required"
s = value.strip()
if not re.fullmatch(r"[0-9a-f]+", s.lower()):
return False, (
"App Secret should be a hex string (only digits 0-9 and "
"letters a-f). Make sure you copied the 'App secret' from "
"Settings → Basic, not some other token."
)
if len(s) != 32:
return False, f"App Secret should be exactly 32 hex characters (got {len(s)})"
return True, None
def _validate_access_token(value: str) -> tuple[bool, Optional[str]]:
"""Meta access tokens start with ``EAA`` and are 100-300+ characters.
Both temp tokens (24h) and System User permanent tokens share this
prefix. We don't try to distinguish them.
"""
if not value:
return False, "Access token is required"
s = value.strip()
if not s.startswith("EAA"):
# Diagnose common paste mistakes
if s.startswith("sk-"):
return False, (
"That's an OpenAI key (starts with 'sk-'), not a Meta "
"WhatsApp access token. Meta tokens start with 'EAA'."
)
if s.startswith("xoxb-") or s.startswith("xoxp-"):
return False, (
"That's a Slack token, not a Meta WhatsApp access token. "
"Meta tokens start with 'EAA'."
)
if s.startswith("ghp_") or s.startswith("gho_"):
return False, (
"That's a GitHub token, not a Meta WhatsApp access "
"token. Meta tokens start with 'EAA'."
)
return False, (
"Meta WhatsApp access tokens start with 'EAA'. Check that "
"you're copying from the right place (API Setup → 'Generate "
"access token', or Business Settings → System Users → "
"'Generate token' for a permanent one)."
)
if len(s) < 100:
return False, f"Access token looks too short ({len(s)} chars, expected 100+)"
return True, None
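The chain of `startswith` checks could also be written as a lookup table, which makes adding another commonly mispasted credential a one-line change. This is an alternative sketch and not the wizard's implementation; `_FOREIGN_PREFIXES` and `diagnose_token` are hypothetical names.

```python
from typing import Optional

# Prefixes of credentials that users commonly paste by mistake.
_FOREIGN_PREFIXES = {
    "sk-": "OpenAI key",
    "xoxb-": "Slack token",
    "xoxp-": "Slack token",
    "ghp_": "GitHub token",
    "gho_": "GitHub token",
}


def diagnose_token(value: str) -> Optional[str]:
    """Return None for a Meta-shaped token, else a label for what it looks like."""
    s = value.strip()
    if s.startswith("EAA"):
        return None
    for prefix, label in _FOREIGN_PREFIXES.items():
        if s.startswith(prefix):
            return label
    return "unrecognized"


print(diagnose_token("xoxb-123"))
```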
# ---------------------------------------------------------------------------
# Prompt helpers
# ---------------------------------------------------------------------------
def _prompt(message: str, default: Optional[str] = None) -> str:
"""Read one line of input. Returns "" on EOF / Ctrl+C / empty input.
The ``default`` parameter is shown to the user but NOT auto-applied
on empty input — callers handle the "user kept existing" case
explicitly so they can distinguish between a real value and a
display preview (e.g. ``"abc12345..."`` for masked secrets).
"""
try:
suffix = f" [{default}]" if default else ""
raw = input(f"{message}{suffix}: ").strip()
except (EOFError, KeyboardInterrupt):
print()
return ""
return raw
def _prompt_validated(
message: str,
validator,
*,
current: Optional[str] = None,
help_text: Optional[str] = None,
) -> Optional[str]:
"""Repeat the prompt until the user enters a valid value or aborts.
Returns the validated value, or None if the user entered nothing
(empty response, EOF, or Ctrl+C). Callers treat None as "keep the
existing value" or "skip". ``current`` is shown as a
default for re-runs of the wizard with existing config.
"""
if help_text:
for line in help_text.strip().splitlines():
print(f" {line}")
attempts = 0
while True:
attempts += 1
value = _prompt(f"{message}", default=current)
if not value:
return None
ok, reason = validator(value)
if ok:
return value.strip()
print(f"{reason}")
if attempts >= 3:
try:
cont = input(" Try again, or press Enter to skip: ").strip()
except (EOFError, KeyboardInterrupt):
return None
if not cont:
return None
attempts = 0
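A retry loop like this is awkward to test when it calls `input` directly. One option, sketched below as an assumption and not as the wizard's design, is to inject the reader function. The sketch is also simplified: it gives up after `max_attempts` and does not offer to continue.

```python
from typing import Callable, Optional, Tuple

Validator = Callable[[str], Tuple[bool, Optional[str]]]


def prompt_validated(
    validator: Validator,
    read: Callable[[str], str] = input,
    max_attempts: int = 3,
) -> Optional[str]:
    """Simplified retry loop with an injectable reader (hypothetical helper)."""
    for _ in range(max_attempts):
        value = read("Value: ").strip()
        if not value:
            return None  # empty input means "skip" or "keep existing"
        ok, reason = validator(value)
        if ok:
            return value
        print(reason)
    return None


answers = iter(["abc", "123"])
result = prompt_validated(
    lambda v: (v.isdigit(), "digits only"),
    read=lambda _prompt: next(answers),
)
print(result)
```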
# ---------------------------------------------------------------------------
# Wizard
# ---------------------------------------------------------------------------
def run_whatsapp_cloud_setup() -> int:
"""Interactive wizard for the WhatsApp Cloud API adapter.
Returns 0 on full success, 1 on user abort, 2 on partial completion
(some fields written but the user bailed before finishing).
"""
from hermes_cli.config import get_env_value, save_env_value
print()
print("⚕ WhatsApp Business Cloud API Setup")
print("=" * 50)
print()
print("This wizard configures Hermes to talk to WhatsApp via Meta's")
print("official Cloud API. It's the production-grade path:")
print()
print(" • No QR codes, no Node.js bridge subprocess")
print(" • Stable connection — no account-ban risk")
print(" • Business account required (not personal WhatsApp)")
print(" • Public webhook URL required (Cloudflare Tunnel, ngrok,")
print(" or your own reverse proxy with TLS)")
print()
print("If you don't have a Meta app set up yet, follow these steps")
print("FIRST, then come back and re-run this wizard:")
print()
print(" 1. https://developers.facebook.com/apps → Create App")
print("'Connect with customers through WhatsApp'")
print(" 2. App Dashboard → WhatsApp → API Setup")
print(" 3. Click 'Generate access token' (temp 24h token is fine to")
print(" start; switch to a System User permanent token later)")
print()
try:
input("Press Enter to continue, or Ctrl+C to abort... ")
except (EOFError, KeyboardInterrupt):
print("\nSetup cancelled.")
return 1
print()
print("─" * 50)
print("STEP 1 — Phone Number ID")
print("─" * 50)
current_phone_id = get_env_value("WHATSAPP_CLOUD_PHONE_NUMBER_ID") or None
phone_id = _prompt_validated(
"Phone Number ID",
_validate_phone_number_id,
current=current_phone_id,
help_text=(
"Found in: App Dashboard → WhatsApp → API Setup, in the\n"
"'Send and receive messages' section.\n"
"Look BELOW the 'From' dropdown — there's a 'Phone number ID'\n"
"line with the value (15-17 digits, e.g. '7794189252778687').\n"
"It is NOT the phone number itself (+1 555-...). That's the\n"
"single most common setup mistake."
),
)
if not phone_id:
if current_phone_id:
phone_id = current_phone_id
print(f" ✓ Keeping existing: {phone_id}")
else:
print("\n✗ Phone Number ID is required. Aborting.")
return 1
else:
save_env_value("WHATSAPP_CLOUD_PHONE_NUMBER_ID", phone_id)
print(f" ✓ Saved: {phone_id}")
print()
print("─" * 50)
print("STEP 2 — Access Token")
print("─" * 50)
current_token = get_env_value("WHATSAPP_CLOUD_ACCESS_TOKEN") or None
current_display = (current_token[:15] + "...") if current_token else None
token = _prompt_validated(
"Access Token",
_validate_access_token,
current=current_display,
help_text=(
"Two options for getting one:\n\n"
" (a) TEMP — App Dashboard → WhatsApp → API Setup →\n"
" 'Generate access token' button. Lasts 24 hours.\n"
" Fine for testing today; you'll have to regenerate\n"
" tomorrow.\n\n"
" (b) PERMANENT (production) — System User token. One-time\n"
" setup, never expires:\n"
" • business.facebook.com → Settings → System users →\n"
" Add → Admin role\n"
" • Assign Assets → your app (Manage app), your\n"
" WhatsApp account (Manage WABAs)\n"
" • Generate token → expiration: Never → permissions:\n"
" business_management, whatsapp_business_messaging,\n"
" whatsapp_business_management\n\n"
"Tokens start with 'EAA'."
),
)
# If they had a current token and just hit Enter, keep it.
if not token:
if current_token:
token = current_token
print(" ✓ Keeping existing token")
else:
print("\n✗ Access Token is required. Aborting.")
return 1
else:
save_env_value("WHATSAPP_CLOUD_ACCESS_TOKEN", token)
print(" ✓ Saved (token hidden)")
print()
print("─" * 50)
print("STEP 3 — App Secret (required for webhook signature verification)")
print("─" * 50)
current_secret = get_env_value("WHATSAPP_CLOUD_APP_SECRET") or None
current_secret_display = (current_secret[:8] + "...") if current_secret else None
app_secret = _prompt_validated(
"App Secret",
_validate_app_secret,
current=current_secret_display,
help_text=(
"Found in: App Dashboard → Settings → Basic →\n"
"'App secret' field (click 'Show', enter your Facebook password).\n\n"
"If 'Show' doesn't appear, you may need Admin role on the app.\n"
"It's a 32-character lowercase hex string.\n\n"
"Without the App Secret, inbound webhook POSTs are refused\n"
"with HTTP 503 (we can't verify they actually came from Meta)."
),
)
if not app_secret:
if current_secret:
app_secret = current_secret
print(" ✓ Keeping existing App Secret")
else:
print("\n⚠ Skipping App Secret — inbound webhooks will be refused")
print(" until you set WHATSAPP_CLOUD_APP_SECRET manually.")
else:
save_env_value("WHATSAPP_CLOUD_APP_SECRET", app_secret)
print(" ✓ Saved (secret hidden)")
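The App Secret matters because Meta signs each webhook POST with it: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. The adapter's own verification code is not part of this excerpt, so the following is a minimal sketch of the check, with `verify_signature` as a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(
        app_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    provided = header_value[len(prefix):]
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII input.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


secret = "0123456789abcdef0123456789abcdef"  # made-up example value
body = b'{"entry": []}'
signature = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, signature))
```

The HMAC must be computed over the bytes exactly as received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.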
print()
print("─" * 50)
print("STEP 4 — App ID & WABA ID (optional, for analytics)")
print("─" * 50)
current_app_id = get_env_value("WHATSAPP_CLOUD_APP_ID") or None
app_id = _prompt_validated(
"App ID (optional, press Enter to skip)",
lambda v: (True, None) if not v else _validate_app_id(v),
current=current_app_id,
help_text=(
"Found in: App Dashboard → Settings → Basic → 'App ID' at the\n"
"top of the page. Numeric, ~15-16 digits.\n"
"Not required for messaging — useful only for analytics later."
),
)
if app_id:
save_env_value("WHATSAPP_CLOUD_APP_ID", app_id)
print(f" ✓ Saved: {app_id}")
elif current_app_id:
print(f" ✓ Keeping existing: {current_app_id}")
current_waba_id = get_env_value("WHATSAPP_CLOUD_WABA_ID") or None
waba_id = _prompt_validated(
"WABA ID (optional, press Enter to skip)",
lambda v: (True, None) if not v else _validate_waba_id(v),
current=current_waba_id,
help_text=(
"WhatsApp Business Account ID. Found in: App Dashboard →\n"
"WhatsApp → API Setup, near the top — 'WhatsApp Business\n"
"Account ID'. Numeric, ~15+ digits.\n"
"Not required for messaging — useful for analytics."
),
)
if waba_id:
save_env_value("WHATSAPP_CLOUD_WABA_ID", waba_id)
print(f" ✓ Saved: {waba_id}")
elif current_waba_id:
print(f" ✓ Keeping existing: {current_waba_id}")
print()
print("─" * 50)
print("STEP 5 — Verify Token (auto-generated)")
print("─" * 50)
current_verify = get_env_value("WHATSAPP_CLOUD_VERIFY_TOKEN") or None
if current_verify:
print(f" An existing verify token is already set ({current_verify[:8]}...).")
try:
regen = input(" Generate a new one? [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
regen = "n"
if regen in {"y", "yes"}:
verify_token = secrets.token_urlsafe(32)
save_env_value("WHATSAPP_CLOUD_VERIFY_TOKEN", verify_token)
print(f" ✓ New verify token: {verify_token}")
else:
verify_token = current_verify
print(" ✓ Keeping existing verify token")
else:
verify_token = secrets.token_urlsafe(32)
save_env_value("WHATSAPP_CLOUD_VERIFY_TOKEN", verify_token)
print(f" ✓ Generated: {verify_token}")
print()
print(" → COPY THIS TOKEN NOW. You'll paste it into Meta's webhook")
print(" configuration dialog (next step).")
print()
print("─" * 50)
print("STEP 6 — Recipient Allowlist")
print("─" * 50)
print()
print(" Who is allowed to message the bot? (Comma-separated phone")
print(" numbers with country code, no '+' / spaces / dashes. Use '*'")
print(" to allow anyone — only safe if you've also configured Meta's")
print(" recipient whitelist for app-development mode.)")
print()
current_allow = get_env_value("WHATSAPP_CLOUD_ALLOWED_USERS") or None
allow_default = current_allow if current_allow else None
try:
allowed = input(
f" → Allowed users{' [' + allow_default + ']' if allow_default else ''}: "
).strip() or (allow_default or "")
except (EOFError, KeyboardInterrupt):
allowed = ""
if allowed:
# Light normalization: strip whitespace, dashes, and '+' from each entry.
allowed = ",".join(
re.sub(r"[\s\-+]", "", part) for part in allowed.split(",") if part.strip()
)
save_env_value("WHATSAPP_CLOUD_ALLOWED_USERS", allowed)
print(f" ✓ Saved: {allowed}")
else:
print(" ⚠ No allowlist — every inbound message will be denied.")
print(" Re-run this wizard or set WHATSAPP_CLOUD_ALLOWED_USERS manually.")
print()
print("─" * 50)
print("SETUP COMPLETE — Next steps")
print("─" * 50)
print()
print(" Hermes needs a public HTTPS URL to receive WhatsApp messages.")
print(" The recommended path is Cloudflare Tunnel (free, no port")
print(" forwarding, no DNS setup).")
print()
print(" 1. Install cloudflared (one-time, if you don't have it):")
print(" Windows: winget install Cloudflare.cloudflared")
print(" macOS: brew install cloudflared")
print(" Linux: https://github.com/cloudflare/cloudflared/releases")
print()
print(" Alternatives: ngrok, or your own domain + reverse proxy")
print(" with TLS.")
print()
print(" 2. Start the tunnel in a separate terminal:")
print(" cloudflared tunnel --url http://localhost:8090")
print(" Note the printed https://<random>.trycloudflare.com URL.")
print()
print(" 3. Start the Hermes gateway in another terminal:")
print(" hermes gateway")
print()
print(" 4. Verify your local config is reachable. From a third")
print(" terminal, with the tunnel URL substituted:")
print()
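# Printed on one line: a backslash-newline inside single quotes is kept
# literally by POSIX shells, so a wrapped URL would not paste cleanly.
print(" curl 'https://YOUR-TUNNEL.trycloudflare.com/whatsapp/webhook"
f"?hub.mode=subscribe&hub.verify_token={verify_token}"
"&hub.challenge=hello'")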
print()
print(" Expected: HTTP 200 with body 'hello'.")
print(" Also try: curl https://YOUR-TUNNEL.trycloudflare.com/health")
print(" (should return JSON with verify_token_configured: true).")
print()
print(" 5. Configure Meta to point at your tunnel:")
print(" App Dashboard → WhatsApp → Configuration → Edit webhook")
print(" Callback URL: <tunnel-url>/whatsapp/webhook")
print(f" Verify Token: {verify_token}")
print(" → Click 'Verify and save'")
print(" → Then 'Manage' webhook fields → subscribe to 'messages'")
print()
print(" 6. Add your phone to Meta's recipient list:")
print(" App Dashboard → WhatsApp → API Setup → 'To'")
print(" 'Manage phone number list'")
print()
print(" 7. DM the bot's test number from your phone.")
print()
print("─" * 50)
print("Optional: polish your bot's WhatsApp profile")
print("─" * 50)
print()
print(" WhatsApp shows a display name and profile picture for your bot")
print(" in every chat header and contact list. These are set in Meta's")
print(" Business Manager, not via this wizard — but here's where to do")
print(" it once you're up and running:")
print()
effective_waba = waba_id or current_waba_id
if effective_waba:
print(" • Display name + profile picture:")
print(" https://business.facebook.com/wa/manage/phone-numbers/"
f"?waba_id={effective_waba}")
else:
print(" • Display name + profile picture:")
print(" https://business.facebook.com/wa/manage/phone-numbers/")
print(" (select your WhatsApp Business Account on that page)")
print(" Display-name changes go through a ~24-48h Meta review.")
print()
print(" • About, description, website, hours, business category:")
print(" Same page → click your phone number → 'Edit profile'.")
print()
print(" • Verified badge (the green check):")
print(" Requires Meta's business verification process —")
print(" Business Manager → Security Center → Start Verification.")
print()
print(" Docs: https://hermes-agent.nousresearch.com/docs/user-guide/")
print(" messaging/whatsapp-cloud")
print()
return 0
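
For context on what the wizard's Verify Token and App Secret are used for, here is a minimal sketch of the two webhook-side checks they feed. It is not the adapter's code, which is not shown in this part of the diff. The function names `answer_verification` and `signature_is_valid` are hypothetical, and the 403 status on a failed handshake is an assumption of this sketch. Only the `hub.*` query parameters, the echoed challenge, and the `X-Hub-Signature-256` HMAC over the raw body with a constant-time comparison come from the commit message and the printed steps above.

```python
import hashlib
import hmac
from typing import Mapping, Tuple


def answer_verification(params: Mapping[str, str], expected_token: str) -> Tuple[int, str]:
    """Answer the GET verification handshake.

    Returns (status, body): 200 plus the echoed challenge when the token
    matches, otherwise 403 (the 403 is an assumption of this sketch).
    """
    mode = params.get("hub.mode", "")
    token = params.get("hub.verify_token", "")
    challenge = params.get("hub.challenge", "")
    token_ok = bool(expected_token) and hmac.compare_digest(
        token.encode(), expected_token.encode()
    )
    if mode == "subscribe" and token_ok:
        return 200, challenge
    return 403, "verification failed"


def signature_is_valid(raw_body: bytes, header: str, app_secret: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header.startswith(prefix):
        return False
    # The digest must be computed over the exact bytes received,
    # not over a re-serialized JSON payload.
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest keeps the comparison constant-time.
    return hmac.compare_digest(header[len(prefix):].encode(), expected.encode())
```

With a verify token of `tok`, a request carrying `hub.mode=subscribe`, `hub.verify_token=tok`, and `hub.challenge=hello` yields `(200, "hello")`, which matches the curl check in step 4.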

@@ -309,7 +309,7 @@ def show_status(args):
print()
print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
print(" Your free-tier Nous account does not include Tool Gateway access.")
-print(" Upgrade your subscription to unlock managed web, image, TTS, and browser tools.")
+print(" Upgrade your subscription to unlock managed web, image, TTS, STT, and browser tools.")
try:
portal_url = nous_status.get("portal_base_url", "").rstrip("/")
if portal_url:

@@ -442,6 +442,7 @@ class TestBuildNousSubscriptionPrompt:
"web": NousFeatureState("web", "Web tools", True, True, True, True, False, True, "firecrawl"),
"image_gen": NousFeatureState("image_gen", "Image generation", True, True, True, True, False, True, "Nous Subscription"),
"tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
"stt": NousFeatureState("stt", "Speech-to-text", True, True, True, True, False, True, "OpenAI Whisper"),
"browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
"modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
},
@@ -452,7 +453,7 @@ class TestBuildNousSubscriptionPrompt:
assert "Browser Use" in prompt
assert "Modal execution is optional" in prompt
-assert "do not ask the user for Firecrawl, FAL, OpenAI TTS, or Browser-Use API keys" in prompt
+assert "do not ask the user for Firecrawl, FAL, OpenAI TTS, OpenAI Whisper, or Browser-Use API keys" in prompt
def test_non_subscriber_prompt_includes_relevant_upgrade_guidance(self, monkeypatch):
monkeypatch.setattr("tools.tool_backend_helpers.managed_nous_tools_enabled", lambda: True)
@@ -466,6 +467,7 @@ class TestBuildNousSubscriptionPrompt:
"web": NousFeatureState("web", "Web tools", True, False, False, False, False, True, ""),
"image_gen": NousFeatureState("image_gen", "Image generation", True, False, False, False, False, True, ""),
"tts": NousFeatureState("tts", "OpenAI TTS", True, False, False, False, False, True, ""),
"stt": NousFeatureState("stt", "Speech-to-text", True, False, False, False, False, True, ""),
"browser": NousFeatureState("browser", "Browser automation", True, False, False, False, False, True, ""),
"modal": NousFeatureState("modal", "Modal execution", False, False, False, False, False, True, ""),
},
@@ -784,6 +786,7 @@ class TestPromptBuilderConstants:
def test_platform_hints_known_platforms(self):
assert "whatsapp" in PLATFORM_HINTS
assert "whatsapp_cloud" in PLATFORM_HINTS
assert "telegram" in PLATFORM_HINTS
assert "discord" in PLATFORM_HINTS
assert "cron" in PLATFORM_HINTS
@@ -791,6 +794,22 @@ class TestPromptBuilderConstants:
assert "api_server" in PLATFORM_HINTS
assert "webui" in PLATFORM_HINTS
def test_whatsapp_cloud_hint_mentions_24h_window(self):
"""The Cloud API's 24-hour conversation window is a hard rule the
agent should know about. Phase 5 (template fallback) was deferred,
so the model needs to know free-form replies outside the window
will fail with Graph error 131047 — otherwise it'll cheerfully
try to schedule delayed messages that silently break."""
hint = PLATFORM_HINTS["whatsapp_cloud"]
assert "24-hour" in hint or "24h" in hint or "24 hour" in hint
assert "131047" in hint
def test_whatsapp_cloud_hint_advertises_media(self):
"""Cloud adapter supports the same MEDIA:/path/ convention as
Baileys for outbound attachments."""
hint = PLATFORM_HINTS["whatsapp_cloud"]
assert "MEDIA:" in hint
def test_cli_hint_does_not_suggest_media_tags(self):
# Regression: MEDIA:/path tags are intercepted only by messaging
# gateway platforms. On the CLI they render as literal text and

@@ -2510,3 +2510,26 @@ class TestSendMediaTimeoutCancelsFuture:
# 2. Second file still got dispatched — one timeout doesn't abort the batch
adapter.send_video.assert_called_once()
assert adapter.send_video.call_args[1]["video_path"] == "/tmp/fast.mp4"
class TestHomeTargetEnvVarRegistry:
"""Regression: ``_HOME_TARGET_ENV_VARS`` must include every gateway
platform that supports cron-driven outbound delivery. Missing an
entry means ``hermes cron create --deliver=<platform>`` silently
fails to route through the platform's home channel."""
def test_whatsapp_cloud_registered(self):
"""``deliver=whatsapp_cloud`` routes through
WHATSAPP_CLOUD_HOME_CHANNEL — added alongside the existing
``whatsapp`` Baileys entry."""
from cron.scheduler import _HOME_TARGET_ENV_VARS
assert "whatsapp_cloud" in _HOME_TARGET_ENV_VARS
assert _HOME_TARGET_ENV_VARS["whatsapp_cloud"] == "WHATSAPP_CLOUD_HOME_CHANNEL"
def test_baileys_whatsapp_still_registered(self):
"""Sanity guard: the Cloud addition didn't disturb Baileys
whatsapp routing."""
from cron.scheduler import _HOME_TARGET_ENV_VARS
assert _HOME_TARGET_ENV_VARS.get("whatsapp") == "WHATSAPP_HOME_CHANNEL"
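
To illustrate why a missing registry entry makes delivery fail quietly, here is a small sketch of a lookup over a mapping shaped like `_HOME_TARGET_ENV_VARS`. The `resolve_home_target` helper and its return convention are hypothetical; the real scheduler code is not shown in this diff. Only the two WhatsApp entries are taken from the assertions above.

```python
import os
from typing import Mapping, Optional

# Hypothetical mirror of the registry shape the tests assert on.
HOME_TARGET_ENV_VARS = {
    "whatsapp": "WHATSAPP_HOME_CHANNEL",
    "whatsapp_cloud": "WHATSAPP_CLOUD_HOME_CHANNEL",
}


def resolve_home_target(
    platform: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the configured home channel for a platform, or None."""
    env = os.environ if environ is None else environ
    var_name = HOME_TARGET_ENV_VARS.get(platform)
    if var_name is None:
        # Unregistered platform: nothing to route to, and no error raised.
        return None
    value = env.get(var_name, "").strip()
    return value or None
```

In this sketch an unregistered platform and an unset variable both return `None`, which is the kind of silent outcome the regression test guards against.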

@@ -206,9 +206,23 @@ class TestPlatformDefaults:
"""Signal, BlueBubbles, etc. default to 'off' tool progress."""
from gateway.display_config import resolve_display_setting
-for plat in ("signal", "bluebubbles", "weixin", "wecom", "dingtalk"):
+for plat in ("signal", "bluebubbles", "weixin", "wecom", "dingtalk", "whatsapp_cloud"):
assert resolve_display_setting({}, plat, "tool_progress") == "off", plat
def test_whatsapp_cloud_locked_to_low_tier_until_edit_message_lands(self):
"""Regression guard: ``whatsapp_cloud`` must stay TIER_LOW until the
adapter implements edit_message. Without an edit endpoint, raising
the tier to MEDIUM would spam separate WhatsApp messages for every
tool-progress update, which is the exact failure mode this entry
exists to avoid.
When/if Cloud's edit_message lands, update _PLATFORM_DEFAULTS to
TIER_MEDIUM and update this test to assert ``"new"`` accordingly.
"""
from gateway.display_config import resolve_display_setting
assert resolve_display_setting({}, "whatsapp_cloud", "tool_progress") == "off"
assert resolve_display_setting({}, "whatsapp_cloud", "streaming") is False
def test_minimal_tier_platforms(self):
"""Email, SMS, webhook default to 'off' tool progress."""
from gateway.display_config import resolve_display_setting

File diff suppressed because it is too large.

@@ -179,7 +179,13 @@ def test_get_gateway_eligible_tools_ignores_quoted_false_opt_in(monkeypatch):
monkeypatch.setattr(
ns,
"_get_gateway_direct_credentials",
-lambda: {"web": True, "image_gen": False, "tts": False, "browser": False},
+lambda: {
+"web": True,
+"image_gen": False,
+"tts": False,
+"stt": False,
+"browser": False,
+},
)
unconfigured, has_direct, already_managed = ns.get_gateway_eligible_tools(
@@ -191,4 +197,150 @@ def test_get_gateway_eligible_tools_ignores_quoted_false_opt_in(monkeypatch):
assert "web" in has_direct
assert "web" not in already_managed
-assert set(unconfigured) == {"image_gen", "tts", "browser"}
+assert set(unconfigured) == {"image_gen", "tts", "stt", "browser"}
# ---------------------------------------------------------------------------
# STT — managed-by-Nous detection (Phase 4 follow-up)
# ---------------------------------------------------------------------------
def test_stt_managed_by_nous_when_provider_openai_and_no_direct_key(monkeypatch):
"""Default `stt.provider: openai` with a Nous sub + no direct OpenAI key
should route through the managed audio gateway."""
monkeypatch.setattr(ns, "get_env_value", lambda name: "")
monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: False)
monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
monkeypatch.setattr(ns, "has_direct_modal_credentials", lambda: False)
monkeypatch.setattr(
ns,
"is_managed_tool_gateway_ready",
lambda vendor: vendor == "openai-audio",
)
features = ns.get_nous_subscription_features({"stt": {"provider": "openai"}})
assert features.stt.available is True
assert features.stt.active is True
assert features.stt.managed_by_nous is True
assert features.stt.direct_override is False
assert features.stt.current_provider == "OpenAI Whisper"
def test_stt_direct_key_overrides_managed(monkeypatch):
"""When the user has VOICE_TOOLS_OPENAI_KEY set, STT should use the
direct key, not the managed gateway — same precedence as TTS."""
monkeypatch.setattr(ns, "get_env_value", lambda name: "")
monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {"logged_in": True})
monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: False)
monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "sk-direct-key")
monkeypatch.setattr(ns, "has_direct_modal_credentials", lambda: False)
monkeypatch.setattr(
ns,
"is_managed_tool_gateway_ready",
lambda vendor: vendor == "openai-audio",
)
features = ns.get_nous_subscription_features({"stt": {"provider": "openai"}})
assert features.stt.available is True
assert features.stt.managed_by_nous is False
assert features.stt.direct_override is True
def test_stt_groq_provider_requires_groq_key(monkeypatch):
env = {"GROQ_API_KEY": "groq-key"}
monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
monkeypatch.setattr(ns, "get_nous_auth_status", lambda: {})
monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: False)
monkeypatch.setattr(ns, "_toolset_enabled", lambda config, key: False)
monkeypatch.setattr(ns, "_has_agent_browser", lambda: False)
monkeypatch.setattr(ns, "resolve_openai_audio_api_key", lambda: "")
monkeypatch.setattr(ns, "has_direct_modal_credentials", lambda: False)
monkeypatch.setattr(ns, "is_managed_tool_gateway_ready", lambda vendor: False)
features = ns.get_nous_subscription_features({"stt": {"provider": "groq"}})
assert features.stt.available is True
assert features.stt.managed_by_nous is False
assert features.stt.current_provider == "Groq Whisper"
assert features.stt.explicit_configured is True
def test_apply_nous_managed_defaults_flips_stt_provider_to_openai_for_nous_users(monkeypatch):
"""Fresh Nous-subscribed user with the DEFAULT_CONFIG `stt.provider: local`
seed should have it auto-flipped to "openai" so the managed audio
gateway transcribes their voice notes without needing faster-whisper
installed."""
monkeypatch.setattr(ns, "get_env_value", lambda name: "")
monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
# Avoid the heavy real probing in get_nous_subscription_features.
monkeypatch.setattr(
ns,
"get_nous_subscription_features",
lambda config: ns.NousSubscriptionFeatures(
subscribed=True,
nous_auth_present=True,
provider_is_nous=True,
features={
key: ns.NousFeatureState(
key=key, label=key, included_by_default=True,
available=False, active=False, managed_by_nous=False,
direct_override=False, toolset_enabled=False,
explicit_configured=False,
)
for key in ("web", "image_gen", "tts", "stt", "browser", "modal")
},
),
)
config = {"stt": {"provider": "local"}}
changed = ns.apply_nous_managed_defaults(config, enabled_toolsets=[])
assert "stt" in changed
assert config["stt"]["provider"] == "openai"
def test_apply_nous_managed_defaults_skips_stt_when_groq_key_present(monkeypatch):
"""Don't override a user who explicitly set up Groq for STT."""
env = {"GROQ_API_KEY": "groq-key"}
monkeypatch.setattr(ns, "get_env_value", lambda name: env.get(name, ""))
monkeypatch.setattr(ns, "managed_nous_tools_enabled", lambda: True)
monkeypatch.setattr(
ns,
"get_nous_subscription_features",
lambda config: ns.NousSubscriptionFeatures(
subscribed=True,
nous_auth_present=True,
provider_is_nous=True,
features={
key: ns.NousFeatureState(
key=key, label=key, included_by_default=True,
available=False, active=False, managed_by_nous=False,
direct_override=False, toolset_enabled=False,
explicit_configured=False,
)
for key in ("web", "image_gen", "tts", "stt", "browser", "modal")
},
),
)
config = {"stt": {"provider": "local"}}
changed = ns.apply_nous_managed_defaults(config, enabled_toolsets=[])
# STT was not flipped because the user has a Groq key configured.
assert "stt" not in changed
assert config["stt"]["provider"] == "local"
def test_apply_gateway_defaults_sets_stt_use_gateway(monkeypatch):
config = {}
changed = ns.apply_gateway_defaults(config, ["stt"])
assert "stt" in changed
assert config["stt"]["provider"] == "openai"
assert config["stt"]["use_gateway"] is True
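
The precedence these STT tests pin down can be summarized in a small sketch. `SttRoute` and `resolve_stt_route` are hypothetical names; the real logic lives in `get_nous_subscription_features` and is not shown here. The provider label on the direct-key path is an assumption, since the tests do not assert it, and providers other than `openai` and `groq` are left out.

```python
from dataclasses import dataclass


@dataclass
class SttRoute:
    available: bool
    managed_by_nous: bool
    direct_override: bool
    current_provider: str


def resolve_stt_route(
    provider: str,
    *,
    direct_openai_key: str = "",
    groq_key: str = "",
    managed_ready: bool = False,
) -> SttRoute:
    """Pick an STT route using the precedence the tests describe."""
    if provider == "groq":
        # Groq never goes through the managed gateway; it needs its own key.
        return SttRoute(bool(groq_key), False, False, "Groq Whisper")
    if provider == "openai":
        if direct_openai_key:
            # A direct key wins over the managed gateway, as with TTS.
            return SttRoute(True, False, True, "OpenAI Whisper")
        if managed_ready:
            return SttRoute(True, True, False, "OpenAI Whisper")
        return SttRoute(False, False, False, "OpenAI Whisper")
    raise NotImplementedError(f"provider {provider!r} is not modeled in this sketch")
```

The ordering matters: checking the direct key before the managed gateway keeps a user's own credentials from being bypassed once they subscribe.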

@@ -88,6 +88,7 @@ def test_show_status_reports_managed_nous_features(monkeypatch, capsys, tmp_path
"web": NousFeatureState("web", "Web tools", True, True, True, True, False, True, "firecrawl"),
"image_gen": NousFeatureState("image_gen", "Image generation", True, True, True, True, False, True, "Nous Subscription"),
"tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
"stt": NousFeatureState("stt", "Speech-to-text", True, True, True, True, False, True, "OpenAI Whisper"),
"browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
"modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
},

@@ -0,0 +1,406 @@
"""Tests for the WhatsApp Cloud API setup wizard.
Covers:
- Field-shape validators (catch the #1 setup mistake — phone number in
the Phone Number ID field — plus the OpenAI / Slack / GitHub token
paste-by-mistake cases)
- Wizard end-to-end flow with mocked stdin/stdout — verifies each step
writes the expected env var, validation errors block invalid input,
optional fields can be skipped, and the SETUP COMPLETE block prints
the post-setup tunnel + Meta-dashboard instructions the user needs
(the wizard can't smoke-test reachability itself because the gateway
isn't running yet during setup).
"""
from __future__ import annotations
import io
import os
from contextlib import redirect_stdout
from pathlib import Path
import pytest
from hermes_cli.setup_whatsapp_cloud import (
_validate_phone_number_id,
_validate_waba_id,
_validate_app_id,
_validate_app_secret,
_validate_access_token,
run_whatsapp_cloud_setup,
)
# ---------------------------------------------------------------------------
# Validator tests — the cheap, exhaustive coverage layer
# ---------------------------------------------------------------------------
class TestPhoneNumberIdValidator:
def test_accepts_real_meta_phone_number_id(self):
ok, _ = _validate_phone_number_id("7794189252778687")
assert ok
def test_rejects_actual_phone_number_with_helpful_message(self):
"""The #1 setup trap — pasting the phone number instead of the ID."""
ok, reason = _validate_phone_number_id("15556422442")
assert not ok
assert "phone number" in reason.lower()
assert "Phone number ID" in reason # tells them where to look
def test_rejects_phone_number_with_plus(self):
ok, reason = _validate_phone_number_id("+15556422442")
assert not ok
assert "numeric" in reason.lower() or "phone number" in reason.lower()
def test_rejects_empty(self):
ok, reason = _validate_phone_number_id("")
assert not ok
assert "required" in reason.lower()
def test_rejects_too_short(self):
ok, _ = _validate_phone_number_id("12345")
assert not ok
def test_rejects_too_long(self):
ok, _ = _validate_phone_number_id("1" * 25)
assert not ok
def test_strips_surrounding_whitespace(self):
ok, _ = _validate_phone_number_id(" 7794189252778687 ")
assert ok
class TestAccessTokenValidator:
def test_accepts_eaa_token(self):
ok, _ = _validate_access_token("EAA" + "a" * 100)
assert ok
def test_rejects_empty(self):
ok, reason = _validate_access_token("")
assert not ok
assert "required" in reason.lower()
def test_rejects_openai_key_with_helpful_message(self):
ok, reason = _validate_access_token("sk-proj-" + "a" * 100)
assert not ok
assert "OpenAI" in reason
def test_rejects_slack_token_with_helpful_message(self):
ok, reason = _validate_access_token("xoxb-1234-5678-abcdef")
assert not ok
assert "Slack" in reason
def test_rejects_github_token_with_helpful_message(self):
ok, reason = _validate_access_token("ghp_abcdefghijklmnop")
assert not ok
assert "GitHub" in reason
def test_rejects_garbage_with_helpful_message(self):
ok, reason = _validate_access_token("random-string-here")
assert not ok
assert "EAA" in reason # tells them what to look for
def test_rejects_short_token(self):
ok, reason = _validate_access_token("EAAabc")
assert not ok
assert "short" in reason.lower()
class TestAppSecretValidator:
def test_accepts_32_hex_chars(self):
ok, _ = _validate_app_secret("0123456789abcdef0123456789abcdef")
assert ok
def test_accepts_uppercase_hex(self):
ok, _ = _validate_app_secret("0123456789ABCDEF0123456789ABCDEF")
assert ok
def test_rejects_wrong_length(self):
ok, reason = _validate_app_secret("0123456789abcdef") # 16 chars
assert not ok
assert "32" in reason
def test_rejects_non_hex(self):
ok, reason = _validate_app_secret("zzzz56789abcdef0123456789abcdezz")
assert not ok
assert "hex" in reason.lower()
def test_rejects_empty(self):
ok, _ = _validate_app_secret("")
assert not ok
class TestAppIdValidator:
def test_accepts_valid(self):
ok, _ = _validate_app_id("1234567890123456")
assert ok
def test_rejects_non_numeric(self):
ok, _ = _validate_app_id("abcdef")
assert not ok
def test_rejects_too_short(self):
ok, _ = _validate_app_id("123")
assert not ok
class TestWabaIdValidator:
def test_accepts_valid(self):
ok, _ = _validate_waba_id("215589313241560883")
assert ok
def test_rejects_non_numeric(self):
ok, _ = _validate_waba_id("abc-def")
assert not ok
# ---------------------------------------------------------------------------
# End-to-end wizard flow
# ---------------------------------------------------------------------------
@pytest.fixture
def isolated_home(tmp_path, monkeypatch):
"""Redirect HERMES_HOME so save_env_value writes into a temp .env."""
home = tmp_path / "home"
hermes = home / ".hermes"
hermes.mkdir(parents=True)
monkeypatch.setattr(Path, "home", lambda: home)
monkeypatch.setenv("HERMES_HOME", str(hermes))
for key in list(os.environ):
if key.startswith("WHATSAPP_CLOUD_"):
monkeypatch.delenv(key, raising=False)
return hermes
def _env_value(hermes_home: Path, key: str) -> str | None:
env_file = hermes_home / ".env"
if not env_file.exists():
return None
for line in env_file.read_text().splitlines():
if "=" not in line:
continue
k, _, v = line.partition("=")
if k.strip() == key:
return v.strip().strip('"').strip("'")
return None
class TestWizardFlow:
def test_happy_path_minimal(self, isolated_home, monkeypatch):
"""Provide only the required fields; skip optional steps."""
inputs = iter([
"", # press Enter to continue
"7794189252778687", # Phone Number ID
"EAA" + "x" * 200, # Access Token
"0123456789abcdef0123456789abcdef", # App Secret
"", # App ID — skip
"", # WABA ID — skip
"15551234567", # Allowed users
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
rc = run_whatsapp_cloud_setup()
assert rc == 0
out = buf.getvalue()
assert "SETUP COMPLETE" in out
# Required fields written
assert _env_value(isolated_home, "WHATSAPP_CLOUD_PHONE_NUMBER_ID") == "7794189252778687"
assert _env_value(isolated_home, "WHATSAPP_CLOUD_ACCESS_TOKEN").startswith("EAA")
assert len(_env_value(isolated_home, "WHATSAPP_CLOUD_APP_SECRET")) == 32
assert _env_value(isolated_home, "WHATSAPP_CLOUD_ALLOWED_USERS") == "15551234567"
# Verify token auto-generated
assert _env_value(isolated_home, "WHATSAPP_CLOUD_VERIFY_TOKEN")
# Optional fields stayed unset
assert _env_value(isolated_home, "WHATSAPP_CLOUD_APP_ID") is None
assert _env_value(isolated_home, "WHATSAPP_CLOUD_WABA_ID") is None
def test_phone_number_id_validator_catches_phone_number(self, isolated_home, monkeypatch):
"""The trap test — user pastes their phone number into the
Phone Number ID field. Wizard MUST reject with a helpful
explanation, not pass through."""
inputs = iter([
"", # press Enter to continue
"15556422442", # phone number — rejected
"", # empty — gives up
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
rc = run_whatsapp_cloud_setup()
assert rc == 1
out = buf.getvalue()
# Must surface the specific guidance about Phone Number ID
assert "Phone number ID" in out
assert "15-17 digits" in out
# Should NOT have saved the bad value
assert _env_value(isolated_home, "WHATSAPP_CLOUD_PHONE_NUMBER_ID") is None
def test_access_token_validator_catches_openai_key(self, isolated_home, monkeypatch):
"""User pastes 'sk-proj-...' by mistake. Wizard rejects."""
inputs = iter([
"", # continue
"7794189252778687", # good Phone ID
"sk-proj-" + "x" * 100, # OpenAI key — rejected
"", # give up
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
rc = run_whatsapp_cloud_setup()
assert rc == 1
out = buf.getvalue()
assert "OpenAI" in out # diagnostic in error message
# Phone Number ID was saved (it was valid), but access token was not
assert _env_value(isolated_home, "WHATSAPP_CLOUD_PHONE_NUMBER_ID") == "7794189252778687"
assert _env_value(isolated_home, "WHATSAPP_CLOUD_ACCESS_TOKEN") is None
def test_verify_token_is_auto_generated(self, isolated_home, monkeypatch):
"""The verify token is one of the few things the user shouldn't
have to invent. Wizard generates a strong random one."""
inputs = iter([
"", # continue
"7794189252778687", # Phone ID
"EAA" + "x" * 200, # Token
"0123456789abcdef0123456789abcdef", # App Secret
"", # App ID — skip
"", # WABA ID — skip
"15551234567", # Allowed users
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
run_whatsapp_cloud_setup()
verify_token = _env_value(isolated_home, "WHATSAPP_CLOUD_VERIFY_TOKEN")
assert verify_token is not None
# secrets.token_urlsafe(32) produces ~43 chars (base64-of-32-bytes)
assert len(verify_token) >= 32
# Should also be echoed to user output so they can paste into Meta
assert verify_token in buf.getvalue()
def test_setup_complete_block_includes_post_setup_instructions(self, isolated_home, monkeypatch):
"""The wizard can't smoke-test the webhook itself (the gateway
isn't running yet), so it MUST print the exact curl/cloudflared
steps the user needs after the wizard exits."""
inputs = iter([
"", # continue
"7794189252778687", # Phone ID
"EAA" + "x" * 200, # Token
"0123456789abcdef0123456789abcdef", # App Secret
"", # App ID — skip
"", # WABA ID — skip
"15551234567", # Allowed users
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
run_whatsapp_cloud_setup()
out = buf.getvalue()
# Required post-setup guidance
assert "cloudflared tunnel --url http://localhost:8090" in out
assert "hermes gateway" in out
assert "Verify and save" in out
assert "messages" in out
# The verify token should be quotable on the curl line
verify_token = _env_value(isolated_home, "WHATSAPP_CLOUD_VERIFY_TOKEN")
assert verify_token in out
def test_existing_token_preserved_on_rerun(self, isolated_home, monkeypatch):
"""Re-running the wizard with existing config should let the
user keep current values by hitting Enter."""
# Pre-populate .env as if a previous run succeeded
env_file = isolated_home / ".env"
env_file.write_text(
"WHATSAPP_CLOUD_PHONE_NUMBER_ID=7794189252778687\n"
"WHATSAPP_CLOUD_ACCESS_TOKEN=EAAprevious_token_here_" + "x" * 100 + "\n"
"WHATSAPP_CLOUD_APP_SECRET=0123456789abcdef0123456789abcdef\n"
"WHATSAPP_CLOUD_VERIFY_TOKEN=existing_verify_token_already_set\n"
)
inputs = iter([
"", # continue
"", # Phone ID — keep existing
"", # Token — keep existing
"", # App Secret — keep existing
"", # App ID — skip
"", # WABA ID — skip
"", # verify token: regenerate? [y/N] — no
"", # Allowed users — keep
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
rc = run_whatsapp_cloud_setup()
assert rc == 0
# Values preserved
token = _env_value(isolated_home, "WHATSAPP_CLOUD_ACCESS_TOKEN")
assert token is not None
assert token.startswith("EAAprevious_token_here_")
# Verify token preserved (user said no to regenerate)
assert _env_value(isolated_home, "WHATSAPP_CLOUD_VERIFY_TOKEN") == "existing_verify_token_already_set"
# =========================================================================
# Profile polish block (SETUP COMPLETE → optional WhatsApp profile setup)
# =========================================================================
class TestProfilePolishGuidance:
"""The wizard can't set the bot's WhatsApp display name or profile
picture via the API — those go through Meta's Business Manager UI.
Verify that the SETUP COMPLETE block points the user at the right
place rather than leaving them to figure it out on their own."""
def test_polish_block_present_and_points_at_business_manager(
self, isolated_home, monkeypatch
):
inputs = iter([
"",
"7794189252778687",
"EAA" + "x" * 200,
"0123456789abcdef0123456789abcdef",
"", # App ID — skip
"", # WABA ID — skip
"15551234567",
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
run_whatsapp_cloud_setup()
out = buf.getvalue()
# Polish block header
assert "polish your bot's WhatsApp profile" in out
# Direct user at Meta's Business Manager (not the developer dash)
assert "business.facebook.com/wa/manage/phone-numbers" in out
# Mention each of the three things the user can do there
assert "Display name" in out
assert "profile picture" in out
assert "Edit profile" in out
# Set expectations about display-name reviews
assert "24-48h" in out or "24–48h" in out
def test_polish_block_deeplinks_when_waba_id_known(
self, isolated_home, monkeypatch
):
"""If the user gave us the WABA ID earlier in the wizard, the
Business Manager URL should pre-select their account."""
waba = "987654321098765"
inputs = iter([
"",
"7794189252778687",
"EAA" + "x" * 200,
"0123456789abcdef0123456789abcdef",
"", # App ID — skip
waba, # WABA ID — provided
"15551234567",
])
monkeypatch.setattr("builtins.input", lambda *a, **kw: next(inputs))
buf = io.StringIO()
with redirect_stdout(buf):
run_whatsapp_cloud_setup()
out = buf.getvalue()
# Deep-linked URL with the user's WABA pre-selected
assert f"waba_id={waba}" in out
# Without WABA, we tell the user they'll need to pick their account
assert "select your WhatsApp Business Account" not in out


@@ -301,6 +301,19 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI
| `WHATSAPP_ALLOWED_USERS` | Comma-separated phone numbers (with country code, no `+`), or `*` to allow all senders |
| `WHATSAPP_ALLOW_ALL_USERS` | Allow all WhatsApp senders without an allowlist (`true`/`false`) |
| `WHATSAPP_DEBUG` | Log raw message events in the bridge for troubleshooting (`true`/`false`) |
| `WHATSAPP_CLOUD_PHONE_NUMBER_ID` | Meta Phone Number ID from the WhatsApp Business Cloud API (15-17 digits; **not** the phone number itself) |
| `WHATSAPP_CLOUD_ACCESS_TOKEN` | Meta access token (starts with `EAA`); temporary tokens expire after 24h, System User tokens are permanent |
| `WHATSAPP_CLOUD_APP_SECRET` | 32-char hex app secret used to verify inbound webhook signatures |
| `WHATSAPP_CLOUD_VERIFY_TOKEN` | Shared secret for Meta's webhook verification handshake (auto-generated by the setup wizard) |
| `WHATSAPP_CLOUD_ALLOWED_USERS` | Comma-separated `wa_id`s (phone numbers with country code, no `+`) allowed to message the bot |
| `WHATSAPP_CLOUD_ALLOW_ALL_USERS` | Allow all WhatsApp Cloud senders without an allowlist (`true`/`false`) |
| `WHATSAPP_CLOUD_APP_ID` | Optional Meta App ID (for future analytics integration) |
| `WHATSAPP_CLOUD_WABA_ID` | Optional WhatsApp Business Account ID (for future analytics integration) |
| `WHATSAPP_CLOUD_WEBHOOK_HOST` | Interface the inbound webhook server binds to (default `0.0.0.0`) |
| `WHATSAPP_CLOUD_WEBHOOK_PORT` | Port the inbound webhook server binds to (default `8090`) |
| `WHATSAPP_CLOUD_WEBHOOK_PATH` | URL path Meta posts inbound messages to (default `/whatsapp/webhook`) |
| `WHATSAPP_CLOUD_API_VERSION` | Meta Graph API version to call (default `v20.0`) |
| `WHATSAPP_CLOUD_HOME_CHANNEL` | `wa_id` to use as the bot's home channel (for cron jobs etc.) |
| `SIGNAL_HTTP_URL` | signal-cli daemon HTTP endpoint (for example `http://127.0.0.1:8080`) |
| `SIGNAL_ACCOUNT` | Bot phone number in E.164 format |
| `SIGNAL_ALLOWED_USERS` | Comma-separated E.164 phone numbers or UUIDs |


@@ -423,6 +423,7 @@ Each platform has its own toolset:
| Telegram | `hermes-telegram` | Full tools including terminal |
| Discord | `hermes-discord` | Full tools including terminal |
| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
| WhatsApp Cloud API | `hermes-whatsapp` | Full tools including terminal (shares toolset with the Baileys bridge) |
| Slack | `hermes-slack` | Full tools including terminal |
| Google Chat | `hermes-google_chat` | Full tools including terminal |
| Signal | `hermes-signal` | Full tools including terminal |
@@ -528,6 +529,7 @@ Defaults to `false`. Only platforms whose adapter implements `delete_message` ho
- [Slack Setup](slack.md)
- [Google Chat Setup](google_chat.md)
- [WhatsApp Setup](whatsapp.md)
- [WhatsApp Business Cloud API Setup](whatsapp-cloud.md)
- [Signal Setup](signal.md)
- [SMS Setup (Twilio)](sms.md)
- [Email Setup](email.md)


@@ -0,0 +1,418 @@
---
sidebar_position: 6
title: "WhatsApp Business (Cloud API)"
description: "Set up Hermes Agent as a WhatsApp bot via Meta's official Business Cloud API"
---
# WhatsApp Business Cloud API Setup
Hermes can connect to WhatsApp through Meta's **official** WhatsApp Business Cloud API. This is the production-grade path: no Node.js bridge subprocess, no QR codes, no account-ban risk.
In exchange:
- You need a **Meta Business account** (not personal WhatsApp).
- The bot operates on a dedicated business phone number, not your personal number.
- The Hermes gateway needs a **public HTTPS URL** so Meta can deliver inbound messages via webhook.
- Replies more than 24 hours after the user's last message require a pre-approved **template** (this is Meta's "customer service window" rule, not a Hermes limit).
If those constraints don't work for your use case, the [Baileys bridge integration](./whatsapp.md) is the alternative — personal account, no public URL needed, but unofficial and ban-prone.
:::tip Which one should I use?
- **Cloud API (this guide)** — running a real business bot, want stability, fine with the Meta verification + template paperwork
- **[Baileys bridge](./whatsapp.md)** — personal projects, quick demos, single-user setups, willing to risk the bot phone number's account
:::
---
## Quick start
```bash
hermes whatsapp-cloud
```
The wizard walks you through every credential, validates each one as you paste it (catches the #1 setup trap — pasting a phone number into the Phone Number ID field), and prints exact follow-up instructions for the parts that need to happen outside the wizard (starting cloudflared, configuring Meta's webhook dashboard).
The rest of this page is the manual reference.
---
## Prerequisites
1. **A Meta Business account**. Create one at [business.facebook.com](https://business.facebook.com/).
2. **A Meta app with WhatsApp enabled**. See "Creating the Meta app" below.
3. **A way to expose a local port to the public internet** with HTTPS. Cloudflare Tunnel (`cloudflared`) is recommended — free, no port forwarding, no domain required. ngrok, your own domain with a reverse proxy + TLS, or a VPS with the gateway directly bound to a public IP all work too.
4. **Optional but recommended**: ffmpeg on `PATH` so outbound voice messages render as native WhatsApp voice-note bubbles (green waveform) instead of MP3 audio attachments. Hermes degrades gracefully if absent.
---
## Creating the Meta app
1. Go to [developers.facebook.com/apps](https://developers.facebook.com/apps) → **Create App**.
2. Choose use case: **"Connect with customers through WhatsApp"** → **Next**.
3. Pick or create a business portfolio. Review the publishing requirements. Confirm → **Create app**.
4. After creation you'll land on **Customize use case → Connect on WhatsApp → Quickstart**. Click **Start using the API** → you're now on the **API Setup** page.
5. Make sure a WhatsApp Business Account (WABA) is linked. If you created a new portfolio in step 3, one was auto-created. Verify in the API Setup page.
You'll need these values from the dashboard — the wizard prompts for them in this order:
| Value | Where in dashboard | Field shape | Notes |
|---|---|---|---|
| **Phone Number ID** | App Dashboard → WhatsApp → API Setup → below the "From" dropdown | Numeric, 15-17 digits | **NOT** the phone number itself. The #1 setup mistake is pasting the actual phone number here. |
| **Access Token** | App Dashboard → WhatsApp → API Setup → "Generate access token" | Starts with `EAA`, 100+ chars | Temp tokens last 24h — see "Permanent token" below for production. |
| **App Secret** | App Dashboard → Settings → Basic → click "Show" next to App secret | 32-character lowercase hex | Used to verify incoming webhook signatures. Without it, inbound delivery is refused with 503. |
| **App ID** (optional) | App Dashboard → Settings → Basic | Numeric, 15-16 digits | Not required for messaging, useful for analytics. |
| **WABA ID** (optional) | App Dashboard → WhatsApp → API Setup → near the top | Numeric, 15+ digits | Not required for messaging, useful for analytics. |
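
The field shapes in the table are enough to catch the most common paste mistakes before any API call is made. The following is a minimal sketch of such checks, not the wizard's actual validators; the function names are hypothetical and the rules are taken only from the "Field shape" column above.

```python
import re

# Hypothetical helpers for illustration; the real wizard's checks may differ.
PHONE_NUMBER_ID_RE = re.compile(r"[0-9]{15,17}")
APP_SECRET_RE = re.compile(r"[0-9a-f]{32}")


def looks_like_phone_number_id(value: str) -> bool:
    """True for a 15-17 digit numeric ID; a 10-11 digit phone number fails."""
    return PHONE_NUMBER_ID_RE.fullmatch(value.strip()) is not None


def looks_like_access_token(value: str) -> bool:
    """True for a token that starts with EAA and has at least 100 characters."""
    value = value.strip()
    return value.startswith("EAA") and len(value) >= 100


def looks_like_app_secret(value: str) -> bool:
    """True for exactly 32 lowercase hexadecimal characters."""
    return APP_SECRET_RE.fullmatch(value.strip()) is not None
```

A shape check like this only rejects obviously wrong input. A value that passes can still be rejected by Meta, for example an expired token.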
---
## Permanent token (production)
Temporary access tokens expire after **24 hours**, which means a token generated today stops working tomorrow. For production deployments use a **System User permanent token**:
1. Go to [business.facebook.com/latest/settings](https://business.facebook.com/latest/settings) → **System users** (left sidebar).
2. **Add** → name (e.g. `hermes-bot`) → role: **Admin**.
3. Select the new user → **Assign Assets**:
   - Select your app → toggle **Manage app** under Full control.
   - Select your WhatsApp account → toggle **Manage WhatsApp Business Accounts** under Full control.
   - Click **Assign assets**.
4. **Generate token** with these permissions:
   - `business_management`
   - `whatsapp_business_messaging`
   - `whatsapp_business_management`
5. Set **token expiration: Never**.
6. Copy the token → update `WHATSAPP_CLOUD_ACCESS_TOKEN` in `~/.hermes/.env` → restart the gateway.
System User tokens don't expire unless you explicitly revoke them.
---
## Exposing Hermes to the internet
The Cloud API delivers inbound messages by HTTPS POST to your webhook URL — that means the Hermes gateway has to be reachable from Meta's servers. Three common ways:
### Cloudflare Tunnel (recommended)
Free, no port forwarding, works on Windows / macOS / Linux. Runs as a separate process alongside the gateway.
**Install:**
```bash
# Windows
winget install Cloudflare.cloudflared
# macOS
brew install cloudflared
# Linux
# Download the binary from https://github.com/cloudflare/cloudflared/releases
```
**Run a quick tunnel** (no Cloudflare account needed — gives you a `https://<random>.trycloudflare.com` URL):
```bash
cloudflared tunnel --url http://localhost:8090
```
Note the printed URL — that's what you'll give Meta.
:::warning Quick tunnels rotate
The free quick-tunnel URL changes every time you restart `cloudflared`. For a stable URL, log in with `cloudflared tunnel login` and create a named tunnel. Free Cloudflare accounts get unlimited named tunnels — see [Cloudflare's docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/) for the named-tunnel workflow.
:::
### ngrok
```bash
ngrok http 8090
```
Free tier shows a different URL on each restart. Paid tier gives you a stable subdomain.
### Your own domain + reverse proxy
If you already have a server with a TLS cert (Caddy, nginx, etc.), point a route at `localhost:8090`. This is the most stable option for production but requires existing infrastructure.
---
## Configuring the webhook on Meta's side
Once your tunnel is running:
1. Note the public URL printed by your tunnel — say `https://abc123.trycloudflare.com`.
2. Generate a **Verify Token** — the wizard does this for you with `secrets.token_urlsafe(32)`; if you're configuring manually, run:
   ```bash
   python -c "import secrets; print(secrets.token_urlsafe(32))"
   ```
   Save it as `WHATSAPP_CLOUD_VERIFY_TOKEN` in `~/.hermes/.env`.
3. Start the Hermes gateway: `hermes gateway`.
4. In the Meta App Dashboard → **WhatsApp → Configuration** (or **Use cases → Customize → Configuration** depending on UI version) → click **Edit** on the Webhook section.
5. Fill in:
   - **Callback URL**: `https://abc123.trycloudflare.com/whatsapp/webhook`
   - **Verify Token**: the string from step 2 (must match exactly)
6. Click **Verify and save**. Meta hits your URL with a GET request, the gateway echoes back the challenge, and Meta marks the webhook as verified.
7. Under **Webhook fields**, click **Manage** → subscribe to the **messages** field. This is what tells Meta to actually deliver inbound messages to your webhook.
**To verify the loop manually** (from a third terminal):
```bash
TUNNEL="https://abc123.trycloudflare.com"
VERIFY="<your verify token>"
# Should print HTTP 200 with body "hello"
curl -i "$TUNNEL/whatsapp/webhook?hub.mode=subscribe&hub.verify_token=$VERIFY&hub.challenge=hello"
# Health endpoint — should show verify_token_configured: true and app_secret_configured: true
curl "$TUNNEL/health"
```
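
The GET handshake that the first curl command exercises can be sketched as a pure function. This is an illustration under stated assumptions, not the gateway's actual handler: the function name is hypothetical, and the 403 status for a failed check is an assumption (the guide only states that a successful check echoes the challenge).

```python
import hmac


def handle_verification(query: dict[str, str], expected_token: str) -> tuple[int, str]:
    """Return (status, body) for Meta's GET verification request."""
    mode = query.get("hub.mode", "")
    token = query.get("hub.verify_token", "")
    challenge = query.get("hub.challenge", "")
    # Constant-time comparison avoids leaking the token through timing.
    # An empty expected token never matches, so an unconfigured gateway fails closed.
    token_ok = bool(expected_token) and hmac.compare_digest(
        token.encode(), expected_token.encode()
    )
    if mode == "subscribe" and token_ok:
        return 200, challenge  # echo the challenge back verbatim
    return 403, "forbidden"  # assumed failure status for this sketch
```

With the curl example above, `hub.challenge=hello` and a matching token would produce status 200 and the body `hello`.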
---
## Recipient whitelist (Meta-side)
In development mode (before your app goes through App Review), Meta restricts which numbers your bot can message:
1. App Dashboard → WhatsApp → API Setup → **To** dropdown.
2. Click **Manage phone number list**.
3. Add the phone numbers you want to message (yours, your team's, friendly testers). Meta sends each one a 6-digit verification code via SMS or WhatsApp.
Dev mode allows up to 5 numbers. Completing App Review removes this limit.
---
## Allowlist (Hermes-side)
In addition to Meta's recipient whitelist, Hermes has its own per-platform allowlist that controls **which incoming messages the agent processes**. Add to `~/.hermes/.env`:
```bash
# Comma-separated phone numbers, country code, no '+' / spaces / dashes
WHATSAPP_CLOUD_ALLOWED_USERS=15551234567,15557654321
# Or allow everyone (only safe in combination with Meta's recipient whitelist)
# WHATSAPP_CLOUD_ALLOW_ALL_USERS=true
```
The wizard sets this in step 6. Without an allowlist, **every inbound message is denied** — this is intentional, so the bot can't be invoked by random numbers if the recipient whitelist is ever loosened.
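
The deny-by-default behaviour described above can be sketched as follows. The function names are hypothetical and the parsing rules are assumptions based on the `.env` example; the actual adapter may normalize values differently.

```python
def parse_allowed_users(raw: str) -> set[str]:
    """Split a comma-separated allowlist into a set of wa_ids, ignoring blanks."""
    return {entry.strip() for entry in raw.split(",") if entry.strip()}


def is_sender_allowed(wa_id: str, allowed_users: str, allow_all: str = "false") -> bool:
    """Deny by default: accept only listed senders unless allow-all is 'true'."""
    if allow_all.strip().lower() == "true":
        return True
    return wa_id in parse_allowed_users(allowed_users)
```

With an empty allowlist and allow-all left at `false`, every sender is rejected, which matches the behaviour the paragraph above describes.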
---
## Polishing your bot's WhatsApp profile
WhatsApp displays a **name and profile picture** for your bot in the chat header and contact list. These can't be set via the Cloud API — they live in Meta's Business Manager.
Once your bot is working, head to **[business.facebook.com/wa/manage/phone-numbers](https://business.facebook.com/wa/manage/phone-numbers/)**, click your phone number, and you'll find:
| What | Where | Notes |
|---|---|---|
| **Display name** | Top of the phone-number page | Changes go through Meta's name-review process (~24-48 hours). |
| **Profile picture** | Top of the phone-number page | Square image, ≥640×640px recommended. Updates immediately. |
| **About / description / website / email / hours / category** | "Edit profile" button | These appear in the info pane when a user taps the bot's name. Cosmetic. |
| **Verified badge** (green checkmark) | Business Manager → Security Center → Start Verification | Requires Meta's separate business verification process. |
The `hermes whatsapp-cloud` wizard prints these links at the end of setup. None of this is required for the bot to work — it's pure polish for how your bot appears to users.
---
## Configuration reference
All settings live in `~/.hermes/.env`. Required values are in **bold**.
| Variable | Default | Description |
|---|---|---|
| **`WHATSAPP_CLOUD_PHONE_NUMBER_ID`** | — | The 15-17 digit ID from API Setup. **Not** the phone number. |
| **`WHATSAPP_CLOUD_ACCESS_TOKEN`** | — | Meta access token (starts with `EAA`). Temp 24h or System User permanent. |
| **`WHATSAPP_CLOUD_APP_SECRET`** | — | 32-char hex from Settings → Basic. Without it, inbound is refused with 503. |
| **`WHATSAPP_CLOUD_VERIFY_TOKEN`** | — | Shared secret for the GET handshake. Auto-generated by the wizard. |
| **`WHATSAPP_CLOUD_ALLOWED_USERS`** | — | Comma-separated wa_ids allowed to message the bot. |
| `WHATSAPP_CLOUD_ALLOW_ALL_USERS` | `false` | Set to `true` to bypass the allowlist. |
| `WHATSAPP_CLOUD_APP_ID` | — | Optional, for future analytics integration. |
| `WHATSAPP_CLOUD_WABA_ID` | — | Optional, for future analytics integration. |
| `WHATSAPP_CLOUD_WEBHOOK_HOST` | `0.0.0.0` | Interface the webhook server binds to. |
| `WHATSAPP_CLOUD_WEBHOOK_PORT` | `8090` | Port the webhook server binds to. Must match the port your tunnel forwards. |
| `WHATSAPP_CLOUD_WEBHOOK_PATH` | `/whatsapp/webhook` | URL path Meta posts to. |
| `WHATSAPP_CLOUD_API_VERSION` | `v20.0` | Meta Graph API version. Only override if a newer version is recommended in Meta's docs. |
| `WHATSAPP_CLOUD_HOME_CHANNEL` | — | wa_id to use as the bot's home channel (for cron jobs etc). |
You can have **both** the Baileys (`whatsapp`) and Cloud (`whatsapp_cloud`) adapters enabled simultaneously, targeting different phone numbers.
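
As a rough illustration of how the optional settings and their defaults fit together, the sketch below reads them from an environment mapping and builds the outbound messages URL. The class and function names are hypothetical; the defaults come from the table above, and the URL shape assumes Meta's standard `/{phone-number-id}/messages` Graph endpoint.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WebhookSettings:
    host: str
    port: int
    path: str
    api_version: str


def load_webhook_settings(env=None) -> WebhookSettings:
    """Read the optional webhook settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return WebhookSettings(
        host=env.get("WHATSAPP_CLOUD_WEBHOOK_HOST", "0.0.0.0"),
        port=int(env.get("WHATSAPP_CLOUD_WEBHOOK_PORT", "8090")),
        path=env.get("WHATSAPP_CLOUD_WEBHOOK_PATH", "/whatsapp/webhook"),
        api_version=env.get("WHATSAPP_CLOUD_API_VERSION", "v20.0"),
    )


def messages_url(api_version: str, phone_number_id: str) -> str:
    """Graph API URL that outbound messages are POSTed to (assumed shape)."""
    return f"https://graph.facebook.com/{api_version}/{phone_number_id}/messages"
```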
---
## Features
### Inbound
- **Text messages** — passed straight to the agent.
- **Images** — auto-downloaded and attached to the agent's input. Models with native vision (Claude, GPT-4o, Gemini, etc.) read the image directly; non-vision models receive an auto-generated text description.
- **Voice notes** — auto-downloaded as `.ogg`, transcribed via your configured STT provider (local faster-whisper, OpenAI/Nous, Groq, etc.), then handed to the agent as text.
- **Documents** — auto-downloaded. Small text-readable files (`.txt`, `.md`, `.json`, `.py`, `.csv`, etc.) up to 100KB get inlined into the agent's input so it can read them without a tool call. Larger files are cached locally for the agent's other tools to access.
- **Button taps** — when the user taps a button the bot sent earlier (clarify choice, command approval, slash-command confirm), the tap is routed directly to the right handler. Stale taps fall back to being treated as regular text input.
- **Reply context** — when the user replies to a previous bot message, the agent sees the original message as context.
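
The document rule in the list above (inline small text files, cache everything else) amounts to a two-part check. A minimal sketch follows, assuming that 100KB means 100 × 1024 bytes and using only the extensions named above; the function name is hypothetical and the real extension list is longer.

```python
from pathlib import Path

INLINE_LIMIT_BYTES = 100 * 1024  # assumption: "100KB" read as 100 KiB
TEXT_EXTENSIONS = {".txt", ".md", ".json", ".py", ".csv"}  # subset named in this guide


def should_inline(filename: str, size_bytes: int) -> bool:
    """True when a document is text-readable and small enough to inline."""
    is_text = Path(filename).suffix.lower() in TEXT_EXTENSIONS
    return is_text and size_bytes <= INLINE_LIMIT_BYTES
```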
### Outbound
- **Text** — markdown is auto-converted to WhatsApp's flavored syntax (`**bold**` → `*bold*`, `~~strike~~` → `~strike~`, headers → bold, `[link](url)` → `link (url)`). Long messages split at 4096 chars per chunk.
- **Images** — agent-generated images and local image files both supported, delivered as native photo attachments.
- **Voice messages** — text-to-speech output is converted via ffmpeg into the native WhatsApp voice-note bubble (green waveform). Without ffmpeg installed, falls back to an MP3 audio attachment. See "Voice messages" below.
- **Video / documents** — both supported, sent as native attachments.
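
The text conversions listed under **Text** can be approximated with a few regular expressions. This is a simplified sketch, not the adapter's converter: the function names are hypothetical, nested or malformed markdown is not handled, and splitting on the last newline before the limit is an assumption (the guide only states the 4096-character chunk size).

```python
import re

MAX_CHARS = 4096  # per-message chunk size stated above


def to_whatsapp_markdown(text: str) -> str:
    """Apply the four listed conversions in a fixed order."""
    # Headers first: "# Title" becomes "**Title**", which the next rule turns into "*Title*".
    text = re.sub(r"^#{1,6}[ \t]+(.+)$", r"**\1**", text, flags=re.MULTILINE)
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)  # **bold** -> *bold*
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)  # ~~strike~~ -> ~strike~
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", text)  # [link](url) -> link (url)
    return text


def split_message(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split into chunks of at most `limit` characters, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline, so split hard at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```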
### Interactive UX
When the agent invokes any of these flows, Hermes uses WhatsApp's native interactive messages — tap-to-answer buttons instead of "reply with the number" prompts:
- **`clarify` tool** — multi-choice questions render as quick-reply buttons (1-3 choices) or a tap-to-open list sheet (4+ choices). Picking "✏️ Other" lets the user type a free-form answer that the agent receives as the resolution.
- **Dangerous-command approvals** — when the agent's terminal/code execution hits a gated command, the user sees `✅ Approve` / `❌ Deny` buttons instead of needing to type `/approve` or `/deny`.
- **Slash-command confirmations** — privileged commands like `/reload-mcp` show `✅ Approve Once` / `🔒 Always` / `❌ Cancel` buttons.
All interactive prompts gracefully degrade to plain text if the buttons fail to render (e.g. on legacy WhatsApp clients).
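
The button-versus-list split can be illustrated with a payload builder. This is a sketch under assumptions: the function name, the `choice_N` IDs, and the "Choose" and "Options" labels are hypothetical, and the payload shape follows Meta's interactive-message format as generally documented rather than Hermes's own code. Meta also enforces length limits on titles, which this sketch does not check.

```python
def build_choice_message(to: str, question: str, choices: list[str]) -> dict:
    """Build an interactive payload: reply buttons for 1-3 choices, a list for 4+."""
    if not choices:
        raise ValueError("at least one choice is required")
    if len(choices) <= 3:
        interactive = {
            "type": "button",
            "body": {"text": question},
            "action": {
                "buttons": [
                    {"type": "reply", "reply": {"id": f"choice_{i}", "title": c}}
                    for i, c in enumerate(choices)
                ]
            },
        }
    else:
        interactive = {
            "type": "list",
            "body": {"text": question},
            "action": {
                "button": "Choose",  # label on the tap-to-open control
                "sections": [
                    {
                        "title": "Options",
                        "rows": [
                            {"id": f"choice_{i}", "title": c}
                            for i, c in enumerate(choices)
                        ],
                    }
                ],
            },
        }
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "interactive",
        "interactive": interactive,
    }
```

When the user taps an option, the inbound webhook carries the chosen `id`, which is what lets a tap be routed back to the prompt that produced it.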
### Read receipts and typing indicator
Hermes acknowledges inbound messages immediately:
- Your message shows **blue double-checkmarks** as soon as the gateway receives it.
- The bot's name in your WhatsApp chat shows **"typing…"** while the agent is preparing a reply.
- The typing indicator auto-dismisses when the bot's first response message arrives.
This makes it obvious when the bot has seen your message versus when it's still working on a response.
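
Both signals can travel in one request body. The sketch below assumes the payload shape from Meta's typing-indicator documentation (a read-status message that also carries a `typing_indicator` object); the function name is hypothetical and the exact fields should be checked against Meta's current docs.

```python
def build_read_and_typing_payload(wamid: str) -> dict:
    """Body for one POST that marks a message as read and shows 'typing…'."""
    return {
        "messaging_product": "whatsapp",
        "status": "read",  # produces the blue double-checkmarks
        "message_id": wamid,  # ID of the inbound message being acknowledged
        "typing_indicator": {"type": "text"},  # assumed shape, per Meta's docs
    }
```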
### Voice messages
WhatsApp distinguishes between a "voice note" (the green waveform bubble) and a generic audio file attachment. The difference is purely codec: voice notes need to be `audio/ogg` with `opus` encoding.
Hermes TTS produces MP3. Two paths:
- **With ffmpeg on PATH** (recommended) — outbound TTS is converted and arrives as a proper voice note. Install:
  - Windows: `winget install Gyan.FFmpeg`
  - macOS: `brew install ffmpeg`
  - Linux: package manager
- **Without ffmpeg** — outbound TTS arrives as an MP3 audio attachment. Plays fine, just doesn't look like a voice note. A one-time warning fires in the gateway log so you know.
You can check whether the gateway found ffmpeg via the health endpoint:
```bash
curl http://localhost:8090/health
# look for "ffmpeg_present": true
```
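
The MP3-to-voice-note conversion comes down to re-encoding the audio as Opus in an Ogg container. A minimal sketch of the command construction and the fallback decision follows; the function names and the 32k mono settings are assumptions for illustration, and the command requires an ffmpeg build that includes `libopus`.

```python
import shutil


def build_opus_command(src_mp3: str, dst_ogg: str, ffmpeg: str = "ffmpeg") -> list[str]:
    """ffmpeg arguments that re-encode an MP3 as mono Opus in an Ogg container."""
    return [
        ffmpeg, "-y",  # overwrite the output file if it exists
        "-i", src_mp3,
        "-c:a", "libopus",  # Opus codec, needed for the voice-note bubble
        "-b:a", "32k",  # assumed bitrate, adequate for speech
        "-ac", "1",  # mono
        dst_ogg,
    ]


def pick_voice_format() -> str:
    """Return 'ogg' when ffmpeg is on PATH, otherwise fall back to 'mp3'."""
    return "ogg" if shutil.which("ffmpeg") else "mp3"
```

The command list can be passed to `subprocess.run(..., check=True)`; on failure, sending the original MP3 preserves the fallback behaviour described above.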
---
## Known limitations
### 24-hour conversation window
Meta only allows **free-form messages** within a 24-hour window after the user's last inbound message. Outside that window, the only thing Meta's API accepts is a pre-approved **message template**.
**What this means in practice:**
- Reactive chat (user DMs → bot replies within 24h → user replies → ...) works forever. This covers >95% of normal bot use.
- **Cron jobs that deliver to WhatsApp** after a gap > 24h will fail with Graph error code `131047` ("Re-engagement message").
- **Long-running `delegate_task` async results** that take longer than 24h fail the same way.
- **Webhook subscribers** that route external events to WhatsApp fail when the user hasn't DM'd the bot recently.
Hermes warns the agent about this window in its system prompt, so the model knows to mention it when scheduling delayed messages.
Message-template support (the workaround for outside-window sends) is not yet implemented in Hermes. If you need it, please [open an issue](https://github.com/NousResearch/hermes-agent/issues) — it's planned but waiting on a clear demand signal.
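
Until templates are supported, a scheduler can at least check the window before attempting a send. The following is a sketch of that check, assuming the caller tracks the timestamp of the user's last inbound message; the names are hypothetical and this is not part of Hermes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)


def window_is_open(last_inbound_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while free-form replies are still allowed for this user."""
    now = now or datetime.now(timezone.utc)
    return now - last_inbound_at < SERVICE_WINDOW
```

Both timestamps should be timezone-aware (UTC here). Mixing naive and aware datetimes raises a `TypeError`.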
### Group chats
The Cloud API has limited group support (capability-tier gated by Meta). Hermes's `whatsapp_cloud` adapter currently handles **direct messages only** in v1. If you need group chats, use the Baileys bridge.
### Outbound rate limit
Meta's default throughput is **80 messages/second per business phone number**, with upgrades available. Hermes doesn't currently enforce this client-side — extremely high-volume sends could hit Meta's limit.
---
## Troubleshooting
### Setup verification fails ("URL couldn't be validated") in Meta dashboard
Almost always one of:
- **Tunnel URL is wrong or stale** — cloudflared quick tunnels rotate. Get a fresh URL and update both `.env` and Meta's dashboard.
- **Verify token mismatch** — the token in `~/.hermes/.env`'s `WHATSAPP_CLOUD_VERIFY_TOKEN` must match exactly what you typed into Meta's dashboard. Run the curl probe above to confirm the gateway's verify handshake works locally first.
- **Gateway not running** — check `hermes gateway` is up.
- **App Secret not set** — without it, Hermes refuses inbound POSTs with 503. Meta interprets that as "can't validate."
### `graph error 100`: Object with ID '...' does not exist
You pasted your phone number (10-11 digits) into `WHATSAPP_CLOUD_PHONE_NUMBER_ID` instead of the Phone Number ID (Meta's 15-17 digit internal ID). Re-check the API Setup page — the Phone Number ID is shown *below* the "From" dropdown.
The wizard catches this with a validator now, but it's worth knowing if you're configuring manually.
### `graph error 190`: Authentication Error
Your access token is invalid. Subcodes:
- `subcode 463` — token expired. Temp tokens last 24h. Regenerate, or switch to a System User permanent token (see above).
- `subcode 467` — token invalidated (revoked or password changed).
- Other 190 — token didn't have the required permissions when generated. Make sure all three (`business_management`, `whatsapp_business_messaging`, `whatsapp_business_management`) were selected.
### `graph error 131047`: Re-engagement message
The 24-hour conversation window expired (see "Known limitations"). Either:
- Ask the user to DM the bot first to reopen the window.
- Wait for template support to land in Hermes.
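
The three Graph errors covered so far can be mapped to their fixes in a small lookup. This is an illustrative helper with a hypothetical name, covering only the codes and subcodes discussed in this guide.

```python
from typing import Optional


def explain_graph_error(code: int, subcode: Optional[int] = None) -> str:
    """Map a Graph error code (and optional subcode) to a suggested fix."""
    if code == 100:
        return "Check WHATSAPP_CLOUD_PHONE_NUMBER_ID: use the 15-17 digit ID, not the phone number."
    if code == 190:
        if subcode == 463:
            return "Access token expired: regenerate it or switch to a System User token."
        if subcode == 467:
            return "Access token invalidated: generate a new one."
        return "Access token rejected: confirm all three permissions were selected."
    if code == 131047:
        return "24-hour window closed: the user must message the bot first."
    return "Unrecognized Graph error: consult Meta's error-code reference."
```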
### Inbound message: `media metadata fetch failed (status=401)`
Same 401 root causes as outbound (`graph error 190`) — the access token is invalid or expired. Fix the token.
### Bot replies appear as raw JSON / tool-call leakage
Common cause: the toolset configured for `whatsapp_cloud` is missing the tools the agent wants to call. Check `hermes tools list` and verify the platform is using `hermes-whatsapp` (the default Cloud adapter toolset, same as Baileys).
If the model emits tool-call-shaped text instead of a structured call, it usually means the toolset was effectively empty. See `hermes_cli/platforms.py` for the platform → default toolset mapping.
### STT (voice note transcription) returns empty / "could not transcribe"
The default `stt.provider: local` requires `pip install faster-whisper`. If you're a Nous subscriber, you can route STT through Nous's managed audio gateway instead:
```bash
hermes config set stt.provider openai
hermes config set stt.use_gateway true
hermes gateway restart
```
This uses your Nous Portal access token instead of needing a separate OpenAI key.
---
## Security notes
- **Treat the App Secret like a password** — anyone with it can forge webhook payloads that Hermes will accept as authentic.
- **The verify token is a shared secret** — leaks are lower-stakes (worst case someone could re-subscribe Meta's webhook to a different URL of theirs), but still avoid committing it.
- **The access token is your bot's identity** — System User tokens are equivalent to long-lived API keys. Rotate immediately if a deployment is compromised.
- **The webhook endpoint accepts only signed requests when `WHATSAPP_CLOUD_APP_SECRET` is set** — leave it set even in development. Without it, the gateway refuses inbound delivery with HTTP 503.
- **The `/health` endpoint is unauthenticated** — it's safe to expose because it only reports config-presence booleans, not the values themselves. But if you'd rather not surface it, restrict access at the reverse proxy / tunnel layer.
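
To make the role of the App Secret concrete, here is a minimal sketch of `X-Hub-Signature-256` verification: an HMAC-SHA256 over the raw request body, compared in constant time. The function name is hypothetical and this is not the gateway's actual code. The check must run on the raw bytes as received, before any JSON parsing.

```python
import hashlib
import hmac


def signature_is_valid(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """Check an X-Hub-Signature-256 header of the form 'sha256=<hex digest>'."""
    prefix = "sha256="
    if not app_secret or not header_value.startswith(prefix):
        return False  # fail closed when the secret or the header is missing
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # Constant-time comparison, done on bytes so non-ASCII input cannot raise.
    return hmac.compare_digest(received.encode(), expected.encode())
```

Because the digest depends on every byte of the body, re-serializing the parsed JSON before verifying would break the comparison.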
---
## Comparison to the Baileys bridge
| | Baileys (`hermes whatsapp`) | Cloud API (`hermes whatsapp-cloud`) |
|---|---|---|
| Account type | Personal | Business |
| Setup | QR code scan | Meta app + WABA + token |
| Dependencies | Node.js + npm | Pure Python (httpx + aiohttp) |
| Process | Managed Node subprocess | aiohttp webhook server |
| Public URL needed? | No | Yes |
| Account ban risk | Yes (unofficial API) | No (officially supported) |
| Inbound | Polling Node bridge | Webhook POST from Meta |
| Outbound | Local bridge → Baileys | HTTPS to graph.facebook.com |
| Groups | Full support | DMs only (v1) |
| 24h window | No restriction | Hard rule — templates required after |
| Voice notes (out) | Native | Native with ffmpeg, MP3 fallback otherwise |
| Read receipts | No | Yes (blue double-checkmarks) |
| Typing indicator | No | Yes (auto-dismisses on response) |
| Interactive buttons | Text fallback only | Native (clarify, approval, slash-confirm) |
| Production use | Risky (Meta can ban) | Designed for it |
Most users running Hermes for personal projects prefer Baileys. Most users running customer-facing bots prefer Cloud API.
---
## See also
- [Meta's official WhatsApp Business Cloud API docs](https://developers.facebook.com/documentation/business-messaging/whatsapp/) — authoritative reference for the underlying platform, pricing, App Review, and Meta-side rate limits.
- [WhatsApp (Baileys bridge) Setup](whatsapp.md) — the alternative integration for personal projects.
- [Messaging Platforms overview](index.md) — all messaging integrations at a glance.


@@ -8,6 +8,14 @@ description: "Set up Hermes Agent as a WhatsApp bot via the built-in Baileys bri
Hermes connects to WhatsApp through a built-in bridge based on **Baileys**. This works by emulating a WhatsApp Web session — **not** through the official WhatsApp Business API. No Meta developer account or Business verification is required.
:::tip Two WhatsApp integrations
This page is for the **Baileys bridge** — quick to set up, personal accounts, no public URL needed, ban risk.
If you're running a real business bot and want stability, see the **[WhatsApp Business Cloud API guide](./whatsapp-cloud.md)** instead. It's the official Meta-supported path: no account ban risk, but requires a Meta Business account and a public webhook URL.
The two adapters can also run in parallel against different phone numbers if you have a reason to.
:::
:::warning Unofficial API — Ban Risk
WhatsApp does **not** officially support third-party bots outside the Business API. Using a third-party bridge carries a small risk of account restrictions. To minimize risk:
- **Use a dedicated phone number** for the bot (not your personal number)


@@ -617,6 +617,7 @@ const sidebars: SidebarsConfig = {
'user-guide/messaging/discord',
'user-guide/messaging/slack',
'user-guide/messaging/whatsapp',
'user-guide/messaging/whatsapp-cloud',
'user-guide/messaging/signal',
'user-guide/messaging/email',
'user-guide/messaging/sms',