mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 06:51:16 +08:00
feat(skills): add optional telephony skill with Twilio, SMS, and AI calls (#1289)
* feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. * test: make gateway async tests xdist-safe Replace sync test usage of asyncio.get_event_loop().run_until_complete() with asyncio.run() so tests do not depend on an ambient current event loop. Also create the email disconnect poll task inside a running loop. This fixes xdist/CI failures where workers have no current loop in MainThread. * feat(skills): add phone-calls skill for outbound AI voice calls Reformulated from core tool (PR #847 feedback) into a skill with a standalone helper script. No new dependencies — uses only Python stdlib. Two providers supported: - Bland.ai (default): simple setup, one API key - Vapi: flexible, better voice quality via ElevenLabs/Deepgram + Twilio Includes: - SKILL.md with full procedure, safety rules, provider docs, pitfalls - scripts/phone_call.py CLI helper (call, status, diagnose commands) * feat(skills): expand phone-calls into optional telephony skill Follow up on salvaged PR #965 by moving the capability into optional-skills and broadening it from outbound AI calling to a full telephony skill. Add Twilio number provisioning, env/state persistence, SMS/MMS, inbound SMS polling, Vapi import helpers, and a provider decision tree while keeping telephony out of core runtime code. * docs(skills): clarify Hermes TTS telephony workflow --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com> Co-authored-by: mormio <morganemoss@gmai.com>
This commit is contained in:
417
optional-skills/productivity/telephony/SKILL.md
Normal file
417
optional-skills/productivity/telephony/SKILL.md
Normal file
@@ -0,0 +1,417 @@
|
||||
---
|
||||
name: telephony
|
||||
description: Give Hermes phone capabilities without core tool changes. Provision and persist a Twilio number, send and receive SMS/MMS, make direct calls, and place AI-driven outbound calls through Bland.ai or Vapi.
|
||||
version: 1.0.0
|
||||
author: Nous Research
|
||||
license: MIT
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [telephony, phone, sms, mms, voice, twilio, bland.ai, vapi, calling, texting]
|
||||
related_skills: [find-nearby, google-workspace, agentmail]
|
||||
category: productivity
|
||||
---
|
||||
|
||||
# Telephony — Numbers, Calls, and Texts without Core Tool Changes
|
||||
|
||||
This optional skill gives Hermes practical phone capabilities while keeping telephony out of the core tool list.
|
||||
|
||||
It ships with a helper script, `scripts/telephony.py`, that can:
|
||||
- save provider credentials into `~/.hermes/.env`
|
||||
- search for and buy a Twilio phone number
|
||||
- remember that owned number for later sessions
|
||||
- send SMS / MMS from the owned number
|
||||
- poll inbound SMS for that number with no webhook server required
|
||||
- make direct Twilio calls using TwiML `<Say>` or `<Play>`
|
||||
- import the owned Twilio number into Vapi
|
||||
- place outbound AI calls through Bland.ai or Vapi
|
||||
|
||||
## What this solves
|
||||
|
||||
This skill is meant to cover the practical phone tasks users actually want:
|
||||
- outbound calls
|
||||
- texting
|
||||
- owning a reusable agent number
|
||||
- checking messages that arrive to that number later
|
||||
- preserving that number and related IDs between sessions
|
||||
- future-friendly telephony identity for inbound SMS polling and other automations
|
||||
|
||||
It does **not** turn Hermes into a real-time inbound phone gateway. Inbound SMS is handled by polling the Twilio REST API. That is enough for many workflows, including notifications and some one-time-code retrieval, without adding core webhook infrastructure.
|
||||
|
||||
## Safety rules — mandatory
|
||||
|
||||
1. Always confirm before placing a call or sending a text.
|
||||
2. Never dial emergency numbers.
|
||||
3. Never use telephony for harassment, spam, impersonation, or anything illegal.
|
||||
4. Treat third-party phone numbers as sensitive operational data:
|
||||
- do not save them to Hermes memory
|
||||
- do not include them in skill docs, summaries, or follow-up notes unless the user explicitly wants that
|
||||
5. It is fine to persist the **agent-owned Twilio number** because that is part of the user's configuration.
|
||||
6. VoIP numbers are **not guaranteed** to work for all third-party 2FA flows. Use with caution and set user expectations clearly.
|
||||
|
||||
## Decision tree — which service to use?
|
||||
|
||||
Use this logic instead of hardcoded provider routing:
|
||||
|
||||
### 1) "I want Hermes to own a real phone number"
|
||||
Use **Twilio**.
|
||||
|
||||
Why:
|
||||
- easiest path to buying and keeping a number
|
||||
- best SMS / MMS support
|
||||
- simplest inbound SMS polling story
|
||||
- cleanest future path to inbound webhooks or call handling
|
||||
|
||||
Use cases:
|
||||
- receive texts later
|
||||
- send deployment alerts / cron notifications
|
||||
- maintain a reusable phone identity for the agent
|
||||
- experiment with phone-based auth flows later
|
||||
|
||||
### 2) "I only need the easiest outbound AI phone call right now"
|
||||
Use **Bland.ai**.
|
||||
|
||||
Why:
|
||||
- quickest setup
|
||||
- one API key
|
||||
- no need to first buy/import a number yourself
|
||||
|
||||
Tradeoff:
|
||||
- less flexible
|
||||
- voice quality is decent, but not the best
|
||||
|
||||
### 3) "I want the best conversational AI voice quality"
|
||||
Use **Twilio + Vapi**.
|
||||
|
||||
Why:
|
||||
- Twilio gives you the owned number
|
||||
- Vapi gives you better conversational AI call quality and more voice/model flexibility
|
||||
|
||||
Recommended flow:
|
||||
1. Buy/save a Twilio number
|
||||
2. Import it into Vapi
|
||||
3. Save the returned `VAPI_PHONE_NUMBER_ID`
|
||||
4. Use `ai-call --provider vapi`
|
||||
|
||||
### 4) "I want to call with a custom prerecorded voice message"
|
||||
Use **Twilio direct call** with a public audio URL.
|
||||
|
||||
Why:
|
||||
- easiest way to play a custom MP3
|
||||
- pairs well with Hermes `text_to_speech` plus a public file host or tunnel
|
||||
|
||||
## Files and persistent state
|
||||
|
||||
The skill persists telephony state in two places:
|
||||
|
||||
### `~/.hermes/.env`
|
||||
Used for long-lived provider credentials and owned-number IDs, for example:
|
||||
- `TWILIO_ACCOUNT_SID`
|
||||
- `TWILIO_AUTH_TOKEN`
|
||||
- `TWILIO_PHONE_NUMBER`
|
||||
- `TWILIO_PHONE_NUMBER_SID`
|
||||
- `BLAND_API_KEY`
|
||||
- `VAPI_API_KEY`
|
||||
- `VAPI_PHONE_NUMBER_ID`
|
||||
- `PHONE_PROVIDER` (AI call provider: bland or vapi)
|
||||
|
||||
### `~/.hermes/telephony_state.json`
|
||||
Used for skill-only state that should survive across sessions, for example:
|
||||
- remembered default Twilio number / SID
|
||||
- remembered Vapi phone number ID
|
||||
- last inbound message SID/date for inbox polling checkpoints
|
||||
|
||||
This means:
|
||||
- the next time the skill is loaded, `diagnose` can tell you what number is already configured
|
||||
- `twilio-inbox --since-last --mark-seen` can continue from the previous checkpoint
|
||||
|
||||
## Locate the helper script
|
||||
|
||||
After installing this skill, locate the script like this:
|
||||
|
||||
```bash
|
||||
SCRIPT="$(find ~/.hermes/skills -path '*/telephony/scripts/telephony.py' -print -quit)"
|
||||
```
|
||||
|
||||
If `SCRIPT` is empty, the skill is not installed yet.
|
||||
|
||||
## Install
|
||||
|
||||
This is an official optional skill, so install it from the Skills Hub:
|
||||
|
||||
```bash
|
||||
hermes skills search telephony
|
||||
hermes skills install official/productivity/telephony
|
||||
```
|
||||
|
||||
## Provider setup
|
||||
|
||||
### Twilio — owned number, SMS/MMS, direct calls, inbound SMS polling
|
||||
|
||||
Sign up at:
|
||||
- https://www.twilio.com/try-twilio
|
||||
|
||||
Then save credentials into Hermes:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-twilio ACXXXXXXXXXXXXXXXXXXXXXXXXXXXX your_auth_token_here
|
||||
```
|
||||
|
||||
Search for available numbers:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 5
|
||||
```
|
||||
|
||||
Buy and remember a number:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
|
||||
```
|
||||
|
||||
List owned numbers:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-owned
|
||||
```
|
||||
|
||||
Set one of them as the default later:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-set-default "+17025551234" --save-env
|
||||
# or
|
||||
python3 "$SCRIPT" twilio-set-default PNXXXXXXXXXXXXXXXXXXXXXXXXXXXX --save-env
|
||||
```
|
||||
|
||||
### Bland.ai — easiest outbound AI calling
|
||||
|
||||
Sign up at:
|
||||
- https://app.bland.ai
|
||||
|
||||
Save config:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-bland your_bland_api_key --voice mason
|
||||
```
|
||||
|
||||
### Vapi — better conversational voice quality
|
||||
|
||||
Sign up at:
|
||||
- https://dashboard.vapi.ai
|
||||
|
||||
Save the API key first:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-vapi your_vapi_api_key
|
||||
```
|
||||
|
||||
Import your owned Twilio number into Vapi and persist the returned phone number ID:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" vapi-import-twilio --save-env
|
||||
```
|
||||
|
||||
If you already know the Vapi phone number ID, save it directly:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" save-vapi your_vapi_api_key --phone-number-id vapi_phone_number_id_here
|
||||
```
|
||||
|
||||
## Diagnose current state
|
||||
|
||||
At any time, inspect what the skill already knows:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" diagnose
|
||||
```
|
||||
|
||||
Use this first when resuming work in a later session.
|
||||
|
||||
## Common workflows
|
||||
|
||||
### A. Buy an agent number and keep using it later
|
||||
|
||||
1. Save Twilio credentials:
|
||||
```bash
|
||||
python3 "$SCRIPT" save-twilio AC... auth_token_here
|
||||
```
|
||||
|
||||
2. Search for a number:
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-search --country US --area-code 702 --limit 10
|
||||
```
|
||||
|
||||
3. Buy it and save it into `~/.hermes/.env` + state:
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-buy "+17025551234" --save-env
|
||||
```
|
||||
|
||||
4. Next session, run:
|
||||
```bash
|
||||
python3 "$SCRIPT" diagnose
|
||||
```
|
||||
This shows the remembered default number and inbox checkpoint state.
|
||||
|
||||
### B. Send a text from the agent number
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Your deployment completed successfully."
|
||||
```
|
||||
|
||||
With media:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-send-sms "+15551230000" "Here is the chart." --media-url "https://example.com/chart.png"
|
||||
```
|
||||
|
||||
### C. Check inbound texts later with no webhook server
|
||||
|
||||
Poll the inbox for the default Twilio number:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-inbox --limit 20
|
||||
```
|
||||
|
||||
Only show messages that arrived after the last checkpoint, and advance the checkpoint when you're done reading:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-inbox --since-last --mark-seen
|
||||
```
|
||||
|
||||
This is the main answer to “how do I access messages the number receives next time the skill is loaded?”
|
||||
|
||||
### D. Make a direct Twilio call with built-in TTS
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+15551230000" --message "Hello! This is Hermes calling with your status update." --voice Polly.Joanna
|
||||
```
|
||||
|
||||
### E. Call with a prerecorded / custom voice message
|
||||
|
||||
This is the main path for reusing Hermes's existing `text_to_speech` support.
|
||||
|
||||
Use this when:
|
||||
- you want the call to use Hermes's configured TTS voice rather than Twilio `<Say>`
|
||||
- you want a one-way voice delivery (briefing, alert, joke, reminder, status update)
|
||||
- you do **not** need a live conversational phone call
|
||||
|
||||
Generate or host audio separately, then:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+155****0000" --audio-url "https://example.com/briefing.mp3"
|
||||
```
|
||||
|
||||
Recommended Hermes TTS -> Twilio Play workflow:
|
||||
|
||||
1. Generate the audio with Hermes `text_to_speech`.
|
||||
2. Make the resulting MP3 publicly reachable.
|
||||
3. Place the Twilio call with `--audio-url`.
|
||||
|
||||
Example agent flow:
|
||||
- Ask Hermes to create the message audio with `text_to_speech`
|
||||
- If needed, expose the file with a temporary static host / tunnel / object storage URL
|
||||
- Use `twilio-call --audio-url ...` to deliver it by phone
|
||||
|
||||
Good hosting options for the MP3:
|
||||
- a temporary public object/storage URL
|
||||
- a short-lived tunnel to a local static file server
|
||||
- any existing HTTPS URL the phone provider can fetch directly
|
||||
|
||||
Important note:
|
||||
- Hermes TTS is great for prerecorded outbound messages
|
||||
- Bland/Vapi are better for **live conversational AI calls** because they handle the real-time telephony audio stack themselves
|
||||
- Hermes STT/TTS alone is not being used here as a full duplex phone conversation engine; that would require a much heavier streaming/webhook integration than this skill is trying to introduce
|
||||
|
||||
### F. Navigate a phone tree / IVR with Twilio direct calling
|
||||
|
||||
If you need to press digits after the call connects, use `--send-digits`.
|
||||
Twilio interprets `w` as a short wait.
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" twilio-call "+18005551234" --message "Connecting to billing now." --send-digits "ww1w2w3"
|
||||
```
|
||||
|
||||
This is useful for reaching a specific menu branch before handing off to a human or delivering a short status message.
|
||||
|
||||
### G. Outbound AI phone call with Bland.ai
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-call "+15551230000" "Call the dental office, ask for a cleaning appointment on Tuesday afternoon, and if they do not have Tuesday availability, ask for Wednesday or Thursday instead." --provider bland --voice mason --max-duration 3
|
||||
```
|
||||
|
||||
Check status:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider bland
|
||||
```
|
||||
|
||||
Ask Bland analysis questions after completion:
|
||||
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider bland --analyze "Was the appointment confirmed?,What date and time?,Any special instructions?"
|
||||
```
|
||||
|
||||
### H. Outbound AI phone call with Vapi on your owned number
|
||||
|
||||
1. Import your Twilio number into Vapi:
|
||||
```bash
|
||||
python3 "$SCRIPT" vapi-import-twilio --save-env
|
||||
```
|
||||
|
||||
2. Place the call:
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-call "+15551230000" "You are calling to make a dinner reservation for two at 7:30 PM. If that is unavailable, ask for the nearest time between 6:30 and 8:30 PM." --provider vapi --max-duration 4
|
||||
```
|
||||
|
||||
3. Check result:
|
||||
```bash
|
||||
python3 "$SCRIPT" ai-status <call_id> --provider vapi
|
||||
```
|
||||
|
||||
## Suggested agent procedure
|
||||
|
||||
When the user asks for a call or text:
|
||||
|
||||
1. Determine which path fits the request via the decision tree.
|
||||
2. Run `diagnose` if configuration state is unclear.
|
||||
3. Gather the full task details.
|
||||
4. Confirm with the user before dialing or texting.
|
||||
5. Use the correct command.
|
||||
6. Poll for results if needed.
|
||||
7. Summarize the outcome without persisting third-party numbers to Hermes memory.
|
||||
|
||||
## What this skill still does not do
|
||||
|
||||
- real-time inbound call answering
|
||||
- webhook-based live SMS push into the agent loop
|
||||
- guaranteed support for arbitrary third-party 2FA providers
|
||||
|
||||
Those would require more infrastructure than a pure optional skill.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- Twilio trial accounts and regional rules can restrict who you can call/text.
|
||||
- Some services reject VoIP numbers for 2FA.
|
||||
- `twilio-inbox` polls the REST API; it is not instant push delivery.
|
||||
- Vapi outbound calling still depends on having a valid imported number.
|
||||
- Bland is easiest, but not always the best-sounding.
|
||||
- Do not store arbitrary third-party phone numbers in Hermes memory.
|
||||
|
||||
## Verification checklist
|
||||
|
||||
After setup, you should be able to do all of the following with just this skill:
|
||||
|
||||
1. `diagnose` shows provider readiness and remembered state
|
||||
2. search and buy a Twilio number
|
||||
3. persist that number to `~/.hermes/.env`
|
||||
4. send an SMS from the owned number
|
||||
5. poll inbound texts for the owned number later
|
||||
6. place a direct Twilio call
|
||||
7. place an AI call via Bland or Vapi
|
||||
|
||||
## References
|
||||
|
||||
- Twilio phone numbers: https://www.twilio.com/docs/phone-numbers/api
|
||||
- Twilio messaging: https://www.twilio.com/docs/messaging/api/message-resource
|
||||
- Twilio voice: https://www.twilio.com/docs/voice/api/call-resource
|
||||
- Vapi docs: https://docs.vapi.ai/
|
||||
- Bland.ai: https://app.bland.ai/
|
||||
1343
optional-skills/productivity/telephony/scripts/telephony.py
Normal file
1343
optional-skills/productivity/telephony/scripts/telephony.py
Normal file
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user