mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 06:51:16 +08:00
feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364)
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls v1 shipping transcribe-only. Spawns headless Chromium via Playwright, joins an explicit https://meet.google.com/ URL, enables live captions, and scrapes them into a transcript file the agent can read across turns. The agent then has the meeting content in context and can do followup work (send recap, file issues, schedule followups) with its regular tools. Surface: - Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say (meet_say is a v1 stub — returns not-implemented; v2 will wire realtime duplex audio via OpenAI Realtime / Gemini Live + BlackHole / PulseAudio null-sink.) - CLI: hermes meet setup | auth | join | status | transcript | stop - Lifecycle: on_session_end auto-leaves any still-running bot. Safety: - URL regex rejects anything that isn't https://meet.google.com/... - No calendar scanning, no auto-dial, no auto-consent announcement. - Single active meeting per install; a second meet_join leaves the first. - Platform-gated to Linux + macOS (Windows audio routing for v2 untested). - Opt-in: standalone plugin, user must add 'google_meet' to plugins.enabled in config.yaml. Zero core changes. Plugin uses existing register_tool / register_cli_command / register_hook surfaces. 21 new unit tests cover the URL safety gate, transcript dedup + status round-trip, process-manager refusals/start/stop paths, tool-handler JSON shape under each branch, session-end cleanup, and platform-gated register(). * feat(plugins/google_meet): v2 realtime audio + v3 remote node host v2 \u2014 agent speaks in-meeting audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS). On Linux we load pactl module-null-sink + module-virtual-source, track module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its fake mic reads what we write to the sink. macOS just probes BlackHole 2ch and returns its device name \u2014 the plugin refuses to switch the user's default audio input (that would surprise them). realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime API. RealtimeSession.speak(text) sends conversation.item.create + response.create, accumulates response.audio.delta PCM bytes, appends them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming meet_say calls. 'websockets' is an optional dep imported lazily. meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge, starts RealtimeSession + speaker thread, spawns paplay to pump PCM into the null-sink, then cleans everything up on SIGTERM. If any realtime setup step fails, falls back cleanly to transcribe mode with an error flagged in status.json. process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl; refuses when no active meeting or active meeting is transcribe-only. tools.meet_say: real implementation; requires active mode='realtime'. meet_join: adds mode='transcribe'|'realtime' param. v3 \u2014 remote node host node/protocol.py: JSON envelope (type, id, token, payload) + validate. node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with resolve() auto-selecting the sole registered node when name is None. node/server.py: NodeServer \u2014 websockets.serve, bearer-token auth, dispatches start_bot/stop/status/transcript/say/ping onto the local process_manager. Token auto-generated + persisted on first run. node/client.py: NodeClient \u2014 short-lived sync WS per RPC, raises RuntimeError on error envelopes, clean API matching the server. node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}' subtree; wired into the main meet CLI by cli.py so 'hermes meet node' Just Works. tools.py: every meet_* tool accepts node='<name>'|'auto'; when set, routes through NodeClient to the remote bot instead of running locally. Unknown node \u2192 clear 'no registered meet node matches ...' error. cli.py: 'hermes meet join --node my-mac --mode realtime' and 'hermes meet say "..." --node my-mac' route to the node; 'hermes meet node approve <name> <url> <token>' registers one. Tests 21 v1 tests updated (meet_say is no longer a stub; active-record now carries mode). 20 new audio_bridge + realtime tests. 42 new node tests (protocol/registry/server/client/cli). 17 new v1/v2/v3 integration tests at the plugin level covering enqueue_say edge cases, env var passthrough, mode validation, node routing (known/unknown/auto/ambiguous), and argparse wiring for `hermes meet say` + `hermes meet node` + --mode/--node flags. Total: 100 plugin tests + 58 plugin-system tests = 158 passing. E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools register, on_session_end hook wires, 'hermes meet' CLI tree wires including the node subtree, NodeRegistry round-trips, meet_join routes correctly to NodeClient under node='my-mac' with mode='realtime', enqueue_say accepts realtime/rejects transcribe, argparse parses every new flag cleanly. Zero changes to core. All new code lives under plugins/google_meet/. * feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status Ready-for-live-test follow-up on PR #16364. Five additions that matter for the first live run on a real Meet, in priority order: 1. hermes meet install [--realtime] [--yes] pip install playwright websockets + python -m playwright install chromium --realtime: installs platform audio deps (pulseaudio-utils on Linux via sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the macOS default input — user still selects BlackHole in System Settings (deliberate; surprise audio rerouting is worse than a manual step). 2. Admission detection _detect_admission(page): Leave-button visible OR caption region attached OR participants list present → we're in-call. _detect_denied(page): 'You can\'t join this video call' / 'You were removed' / 'No one responded to your request' → bail out. HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in the lobby before giving up. in_call stays False until admitted. Status surfaces leaveReason: duration_expired | lobby_timeout | denied | page_closed. 3. macOS PCM pump ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the BlackHole AVFoundation output via -f audiotoolbox -audio_device_index <N>. _mac_audio_device_index() probes ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch' → numeric index. Falls back to index 0 on probe failure. Linux paplay pump unchanged. 4. Richer status dict _BotState now tracks realtime, realtimeReady, realtimeDevice, audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt, leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at counters fold into the status file once a second so meet_status() can show the agent's voice activity in near-real-time. 5. Barge-in RealtimeSession.cancel_response() sends type='response.cancel' over the same WS (lock-guarded so it's safe to call from the caption thread while speak() is reading frames). Handles response.cancelled as a terminal frame type. _looks_like_human_speaker() gates triggers so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel. Called from the caption drain loop: when a new caption arrives attributed to a real participant while rt.session exists, we fire cancel_response() and stamp lastBargeInAt. Tests: 20 new unit tests across _BotState telemetry, barge-in gating, admission/denied probe error handling, cancel_response with and without a connected WS, and `hermes meet install` CLI wiring (flag parsing + end-to-end subprocess.run verification + Linux-already-installed fast path). Total 171 passing across all google_meet test files + the plugin-system regression suite. E2E verified on Linux: plugin loads, all 5 tools register, `hermes meet install --realtime --yes` parses, fresh-bot status.json has every new telemetry key, cancel_response on a disconnected session returns False without raising, barge-in helper gates the bot's own name correctly. Still out of scope (for a future PR, not blocking live test): mic → Realtime duplex (the agent listening to meeting audio via WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio. Docs updated: SKILL.md now lists the installer subcommand, lobby timeout, barge-in caveat, and the full status-dict reference table. README.md quick-start uses hermes meet install.
This commit is contained in:
131
plugins/google_meet/README.md
Normal file
131
plugins/google_meet/README.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# google_meet plugin
|
||||
|
||||
Let the hermes agent join a Google Meet call, transcribe it, optionally speak
|
||||
in it, and do the followup work afterwards.
|
||||
|
||||
## What ships
|
||||
|
||||
| Version | What | Status |
|
||||
|---|---|---|
|
||||
| v1 | Transcribe-only: Playwright joins Meet, scrapes captions to transcript file | ✓ ships by default |
|
||||
| v2 | Realtime duplex audio: bot speaks in-call via OpenAI Realtime + BlackHole/PulseAudio null-sink | ✓ opt in with `mode='realtime'` |
|
||||
| v3 | Remote node host: run the bot on a different machine than the gateway | ✓ opt in with `node='<name>'` |
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─ gateway (Linux box, where hermes runs) ────────────────────────────┐
|
||||
│ │
|
||||
│ agent → meet_join(url, mode='realtime', node='my-mac') │
|
||||
│ │ │
|
||||
│ └─ NodeClient ─── ws ────┐ │
|
||||
│ │ │
|
||||
└──────────────────────────────────┼───────────────────────────────────┘
|
||||
│ wss (token auth)
|
||||
▼
|
||||
┌─ node host (user's Mac, signed-in Chrome lives here) ───────────────┐
|
||||
│ │
|
||||
│ NodeServer (from `hermes meet node run`) │
|
||||
│ │ │
|
||||
│ ├─ start_bot → process_manager.start() → spawns meet_bot │
|
||||
│ │ │
|
||||
│ └─ meet_bot (Playwright) │
|
||||
│ ├─ Chromium → meet.google.com │
|
||||
│ ├─ caption scraper → transcript.txt │
|
||||
│ └─ (realtime mode only) RealtimeSpeaker thread │
|
||||
│ ↓ │
|
||||
│ OpenAI Realtime WS → speaker.pcm │
|
||||
│ ↓ │
|
||||
│ paplay → null-sink ← Chrome fake mic │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Without v3: the whole right column runs on the gateway machine.
|
||||
Without v2: the "realtime" path is skipped; transcribe runs alone.
|
||||
|
||||
## Files
|
||||
|
||||
| Path | Purpose |
|
||||
|---|---|
|
||||
| `plugin.yaml` | manifest |
|
||||
| `__init__.py` | `register(ctx)` — registers 5 tools + `on_session_end` hook + `hermes meet` CLI |
|
||||
| `meet_bot.py` | Playwright bot subprocess (standalone, `python -m plugins.google_meet.meet_bot`) |
|
||||
| `process_manager.py` | local bot lifecycle + `enqueue_say` |
|
||||
| `tools.py` | agent-facing tools + node-routing helper |
|
||||
| `cli.py` | `hermes meet setup / auth / join / status / transcript / say / stop / node ...` |
|
||||
| `audio_bridge.py` | v2: PulseAudio null-sink (Linux) + BlackHole probe (macOS) |
|
||||
| `realtime/openai_client.py` | v2: `RealtimeSession` + `RealtimeSpeaker` (file-queue → OpenAI Realtime WS → PCM) |
|
||||
| `node/protocol.py` | v3: message envelope + validation |
|
||||
| `node/registry.py` | v3: `$HERMES_HOME/workspace/meetings/nodes.json` |
|
||||
| `node/server.py` | v3: `NodeServer` (runs on host machine) |
|
||||
| `node/client.py` | v3: `NodeClient` (used by tool handlers + CLI on gateway) |
|
||||
| `node/cli.py` | v3: `hermes meet node {run,list,approve,remove,status,ping}` |
|
||||
| `SKILL.md` | agent usage guide |
|
||||
|
||||
## Local quick start
|
||||
|
||||
```bash
|
||||
hermes plugins enable google_meet
|
||||
hermes meet install # pip + Chromium
|
||||
hermes meet setup # preflight
|
||||
hermes meet auth # optional
|
||||
hermes meet join https://meet.google.com/abc-defg-hij # transcribe
|
||||
```
|
||||
|
||||
## Realtime mode
|
||||
|
||||
Linux (preferred, most automated):
|
||||
```bash
|
||||
hermes meet install --realtime # installs pulseaudio-utils
|
||||
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
|
||||
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
|
||||
# then from the agent or CLI:
|
||||
hermes meet say "Good morning everyone, I'm the note-taker bot."
|
||||
```
|
||||
|
||||
macOS:
|
||||
```bash
|
||||
hermes meet install --realtime # runs: brew install blackhole-2ch ffmpeg
|
||||
# then — manually! — open System Settings → Sound → Input → BlackHole 2ch
|
||||
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
|
||||
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
|
||||
```
|
||||
|
||||
On macOS, hermes will **not** switch your system audio input automatically — the
|
||||
user has to do it. This is deliberate: switching default input on a whim would
|
||||
be a surprising side effect.
|
||||
|
||||
## Remote node host
|
||||
|
||||
On the node machine (e.g. user's Mac with a signed-in Chrome):
|
||||
```bash
|
||||
pip install playwright websockets
|
||||
python -m playwright install chromium
|
||||
hermes plugins enable google_meet
|
||||
hermes meet node run --display-name my-mac --host 0.0.0.0 --port 18789
|
||||
# prints the bearer token on first run; copy it
|
||||
```
|
||||
|
||||
On the gateway:
|
||||
```bash
|
||||
hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
|
||||
hermes meet node ping my-mac
|
||||
# now any meet_* tool call accepts node='my-mac' (or 'auto')
|
||||
```
|
||||
|
||||
## Safety
|
||||
|
||||
- URL gate: only `https://meet.google.com/abc-defg-hij`, `/new`, `/lookup/<id>`.
|
||||
- No calendar scanning, no auto-dial, no auto-consent announcement.
|
||||
- Node server uses bearer-token auth; no key exchange, no TLS termination
|
||||
built in — run it on a LAN or behind a reverse proxy you trust.
|
||||
- One active meeting per (gateway, node) pair. A second `meet_join` leaves the first.
|
||||
- `meet_say` refuses unless the active meeting was started with `mode='realtime'`.
|
||||
|
||||
## Out of scope
|
||||
|
||||
- **Calendar scanning** — deliberately not implemented. Join URLs must be explicit.
|
||||
- **Multi-tenant node sharing** — a node serves one gateway at a time.
|
||||
- **Windows** — audio bridging isn't tested; `register()` no-ops on Windows.
|
||||
- **System audio input switching on macOS** — user responsibility, not the bot's.
|
||||
148
plugins/google_meet/SKILL.md
Normal file
148
plugins/google_meet/SKILL.md
Normal file
@@ -0,0 +1,148 @@
|
||||
---
|
||||
name: google_meet
|
||||
description: Join a Google Meet call, transcribe live captions, optionally speak in realtime, and do the followup work afterwards. Use when the user asks the agent to sit in on a meeting, take notes, summarize, respond in-call, or action items from it.
|
||||
version: 0.2.0
|
||||
platforms:
|
||||
- linux
|
||||
- macos
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [meetings, google-meet, transcription, realtime-voice]
|
||||
---
|
||||
|
||||
# google_meet
|
||||
|
||||
## When to use
|
||||
|
||||
The user says any of:
|
||||
|
||||
- "join my Meet at <url>"
|
||||
- "take notes on this meeting"
|
||||
- "summarize the meeting and send followups"
|
||||
- "sit in on my standup"
|
||||
- "be a bot in this call and speak up when X"
|
||||
|
||||
## Two modes
|
||||
|
||||
| Mode | What the bot does |
|
||||
|---|---|
|
||||
| `transcribe` (default) | Joins, enables captions, scrapes a transcript. Listen-only. |
|
||||
| `realtime` | Same as transcribe PLUS speaks into the meeting via OpenAI Realtime. The agent calls `meet_say(text)` and the bot's voice comes out of the call. |
|
||||
|
||||
Pick `realtime` only when the user actually wants the agent to speak. It costs real money (OpenAI Realtime is pay-per-audio-minute) and requires a virtual audio device set up on the machine running the bot.
|
||||
|
||||
## Two locations
|
||||
|
||||
| Location | When |
|
||||
|---|---|
|
||||
| Local (default) | Gateway machine runs the Playwright bot directly. |
|
||||
| Remote node (`node="<name>"`) | Bot runs on a different machine that has a signed-in Chrome and (for realtime) a configured audio bridge. Useful when the gateway runs on a headless Linux box but the user's real signed-in Chrome lives on their Mac. |
|
||||
|
||||
## Prerequisites the user must handle once
|
||||
|
||||
Easiest path — run the built-in installer:
|
||||
|
||||
```bash
|
||||
hermes plugins enable google_meet
|
||||
hermes meet install # pip deps + Chromium (transcribe only)
|
||||
hermes meet install --realtime # + pulseaudio-utils / brew blackhole+ffmpeg
|
||||
hermes meet auth # optional; skips guest-lobby wait
|
||||
hermes meet setup # preflight checks
|
||||
```
|
||||
|
||||
`hermes meet install --realtime` prompts before running `sudo apt-get` (Linux)
|
||||
or `brew install` (macOS). Pass `--yes` to skip the prompt. It will NOT touch
|
||||
your macOS default-input setting — you have to select BlackHole 2ch in
|
||||
System Settings yourself before starting a realtime meeting.
|
||||
|
||||
Or do it manually:
|
||||
```bash
|
||||
pip install playwright websockets && python -m playwright install chromium
|
||||
|
||||
# For realtime mode, additionally:
|
||||
# Linux: sudo apt install pulseaudio-utils
|
||||
# macOS: brew install blackhole-2ch ffmpeg
|
||||
# → System Settings → Sound → Input → BlackHole 2ch
|
||||
# Then set OPENAI_API_KEY or HERMES_MEET_REALTIME_KEY in ~/.hermes/.env
|
||||
```
|
||||
|
||||
For a remote node:
|
||||
```bash
|
||||
# on the user's Mac (where Chrome is signed in):
|
||||
pip install playwright websockets && python -m playwright install chromium
|
||||
hermes plugins enable google_meet
|
||||
hermes meet node run --display-name my-mac # persistent server
|
||||
# copy the printed token
|
||||
|
||||
# on the gateway:
|
||||
hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
|
||||
hermes meet node ping my-mac # confirm reachable
|
||||
```
|
||||
|
||||
Run `hermes meet setup` to preflight local prereqs.
|
||||
|
||||
## Flow
|
||||
|
||||
1. **Join** — call `meet_join(url=..., mode=..., node=...)`. Returns immediately.
|
||||
2. **Announce yourself** — no auto-consent. Say (in whatever channel the user is watching): "A Hermes agent bot is in this call taking notes."
|
||||
3. **Poll** — `meet_status()` for liveness, `meet_transcript(last=20)` for recent captions. Don't re-read the whole transcript every turn.
|
||||
4. **Speak (realtime only)** — `meet_say(text="...")` queues text for TTS. The speech lags by ~2s. Don't spam it.
|
||||
5. **Leave** — `meet_leave()` when done, or set `duration="30m"` on `meet_join` for auto-leave.
|
||||
6. **Follow up** — read `meet_transcript()` in full, summarize, and use regular tools to send the recap, file issues, schedule followups.
|
||||
|
||||
## Tool reference
|
||||
|
||||
| Tool | Parameters | Use |
|
||||
|---|---|---|
|
||||
| `meet_join` | `url`, `mode?`, `guest_name?`, `duration?`, `headed?`, `node?` | Start bot |
|
||||
| `meet_status` | `node?` | Liveness + progress |
|
||||
| `meet_transcript` | `last?`, `node?` | Read captions |
|
||||
| `meet_leave` | `node?` | Close bot |
|
||||
| `meet_say` | `text`, `node?` | Speak in realtime meeting |
|
||||
|
||||
`node?` on all tools: pass a registered node name (or `"auto"` for the sole node) to operate a remote bot instead of a local one. Omit for local.
|
||||
|
||||
## Important limits
|
||||
|
||||
- Captions are only as good as Google Meet's live captions. English-biased, lossy on overlapping speakers.
|
||||
- Guest mode sits in the lobby until a host admits. Warn the user; `hermes meet auth` avoids this.
|
||||
- **Lobby timeout**: if the host doesn't admit the bot within 5 minutes (configurable via `HERMES_MEET_LOBBY_TIMEOUT` env), the bot leaves and `meet_status` reports `leaveReason: "lobby_timeout"`.
|
||||
- **One active meeting per install per location.** A second `meet_join` leaves the first.
|
||||
- **Windows not supported.**
|
||||
- Realtime mode needs a virtual audio device. If the audio bridge setup fails, the bot falls back to transcribe mode and flags it in `meet_status().error`.
|
||||
- `meet_say` requires `mode='realtime'` on the originating `meet_join`. Calling it against a transcribe-mode meeting returns a clear error.
|
||||
- **Barge-in is best-effort.** When a caption arrives attributed to a real participant while the bot is generating audio, the bot sends `response.cancel` to OpenAI Realtime. Captions take ~500ms to show up, so the bot will talk over the first second or so of a human interruption.
|
||||
|
||||
## Status dict reference
|
||||
|
||||
`meet_status()` returns (subset shown, there are more):
|
||||
|
||||
| Key | Meaning |
|
||||
|---|---|
|
||||
| `inCall` | Past the lobby. False while waiting for admission. |
|
||||
| `lobbyWaiting` | Clicked "Ask to join", waiting on host. |
|
||||
| `joinAttemptedAt` / `joinedAt` | Timestamps for lobby-click and actual admission. |
|
||||
| `captioning` | Caption observer is installed. |
|
||||
| `transcriptLines` / `lastCaptionAt` | Transcript progress. |
|
||||
| `realtime` / `realtimeReady` | Realtime mode provisioned / WS connected. |
|
||||
| `realtimeDevice` | Audio device name the bot is feeding (e.g. `hermes_meet_src`). |
|
||||
| `audioBytesOut` / `lastAudioOutAt` | How much PCM the OpenAI session has produced. |
|
||||
| `lastBargeInAt` | Timestamp of the most recent `response.cancel` sent. |
|
||||
| `leaveReason` | `duration_expired`, `lobby_timeout`, `denied`, `page_closed`, or null. |
|
||||
| `error` | Last error (soft — bot may still be running). |
|
||||
|
||||
## Transcript location
|
||||
|
||||
Local:
|
||||
```
|
||||
$HERMES_HOME/workspace/meetings/<meeting-id>/transcript.txt
|
||||
```
|
||||
|
||||
Remote node: transcript lives on the node host's disk. Use `meet_transcript(node=...)` to read it over RPC.
|
||||
|
||||
## Safety
|
||||
|
||||
- URL regex: only `https://meet.google.com/...` URLs pass.
|
||||
- No calendar scanning. No auto-dial.
|
||||
- Remote nodes use bearer-token auth; tokens are generated on the node (32 hex chars, persisted in `$HERMES_HOME/workspace/meetings/node_token.json`) and must be copied to the gateway via `hermes meet node approve`.
|
||||
- `meet_say` text is rate-limited by the OpenAI Realtime session; spam-protection is the bot's problem, not yours, but still — don't queue hundreds of lines.
|
||||
103
plugins/google_meet/__init__.py
Normal file
103
plugins/google_meet/__init__.py
Normal file
@@ -0,0 +1,103 @@
|
||||
"""google_meet plugin — let the agent join a Meet call, transcribe it, follow up.
|
||||
|
||||
v1: transcribe-only. Spawns a headless Chromium via Playwright, joins the Meet
|
||||
URL, enables live captions, scrapes them into a transcript file. The agent then
|
||||
has the transcript in its workspace and can do whatever followup work it needs
|
||||
using its regular tools.
|
||||
|
||||
v2 (not in this PR): realtime duplex audio so the agent can speak in the
|
||||
meeting, via OpenAI Realtime / Gemini Live + BlackHole / PulseAudio null-sink.
|
||||
``meet_say`` exists as a stub today so the tool surface is stable.
|
||||
|
||||
Explicit-by-design: only joins ``https://meet.google.com/`` URLs explicitly
|
||||
passed in. No calendar scanning, no auto-dial, no consent announcement.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import platform
|
||||
|
||||
from plugins.google_meet import process_manager as pm
|
||||
from plugins.google_meet.cli import register_cli as _register_meet_cli
|
||||
from plugins.google_meet.cli import meet_command as _meet_command
|
||||
from plugins.google_meet.tools import (
|
||||
MEET_JOIN_SCHEMA,
|
||||
MEET_LEAVE_SCHEMA,
|
||||
MEET_SAY_SCHEMA,
|
||||
MEET_STATUS_SCHEMA,
|
||||
MEET_TRANSCRIPT_SCHEMA,
|
||||
check_meet_requirements,
|
||||
handle_meet_join,
|
||||
handle_meet_leave,
|
||||
handle_meet_say,
|
||||
handle_meet_status,
|
||||
handle_meet_transcript,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
_TOOLS = (
|
||||
("meet_join", MEET_JOIN_SCHEMA, handle_meet_join, "📞"),
|
||||
("meet_status", MEET_STATUS_SCHEMA, handle_meet_status, "🟢"),
|
||||
("meet_transcript", MEET_TRANSCRIPT_SCHEMA, handle_meet_transcript, "📝"),
|
||||
("meet_leave", MEET_LEAVE_SCHEMA, handle_meet_leave, "👋"),
|
||||
("meet_say", MEET_SAY_SCHEMA, handle_meet_say, "🗣️"),
|
||||
)
|
||||
|
||||
|
||||
def _on_session_end(**kwargs) -> None:
|
||||
"""Best-effort cleanup — if a meet bot is still running when the session
|
||||
ends, leave the call so we don't orphan a headless Chromium.
|
||||
|
||||
No-ops when nothing is active. Swallows all exceptions — session end must
|
||||
not fail because the bot cleanup hit an edge case.
|
||||
"""
|
||||
try:
|
||||
status = pm.status()
|
||||
if status.get("ok") and status.get("alive"):
|
||||
pm.stop(reason="session ended")
|
||||
except Exception as e: # pragma: no cover — defensive
|
||||
logger.debug("google_meet on_session_end cleanup failed: %s", e)
|
||||
|
||||
|
||||
def register(ctx) -> None:
|
||||
"""Register tools, CLI, and lifecycle hooks.
|
||||
|
||||
Called once by the plugin loader when the plugin is enabled via
|
||||
``plugins.enabled`` in config.yaml.
|
||||
"""
|
||||
# Windows is not supported in v1 — audio routing for v2 doesn't have a
|
||||
# tested path there and guest-join Chromium is flakier. Refuse to register
|
||||
# rather than half-working.
|
||||
system = platform.system().lower()
|
||||
if system not in ("linux", "darwin"):
|
||||
logger.info(
|
||||
"google_meet plugin: platform=%s not supported (linux/macos only)",
|
||||
system,
|
||||
)
|
||||
return
|
||||
|
||||
for name, schema, handler, emoji in _TOOLS:
|
||||
ctx.register_tool(
|
||||
name=name,
|
||||
toolset="google_meet",
|
||||
schema=schema,
|
||||
handler=handler,
|
||||
check_fn=check_meet_requirements,
|
||||
emoji=emoji,
|
||||
)
|
||||
|
||||
ctx.register_cli_command(
|
||||
name="meet",
|
||||
help="Google Meet bot (join, transcribe, follow up)",
|
||||
setup_fn=_register_meet_cli,
|
||||
handler_fn=_meet_command,
|
||||
description=(
|
||||
"Let the hermes agent join a Google Meet call and scrape live "
|
||||
"captions into a transcript. See: hermes meet setup"
|
||||
),
|
||||
)
|
||||
|
||||
ctx.register_hook("on_session_end", _on_session_end)
|
||||
244
plugins/google_meet/audio_bridge.py
Normal file
244
plugins/google_meet/audio_bridge.py
Normal file
@@ -0,0 +1,244 @@
|
||||
"""Virtual audio bridge for feeding generated speech into Chrome's mic.
|
||||
|
||||
v2 module. Provisions a platform-specific virtual audio device so the
|
||||
Meet bot's Chromium instance can be pointed at an input source we
|
||||
control. The OpenAI Realtime client writes PCM bytes into this device;
|
||||
Chrome reads them as if they were coming from a microphone.
|
||||
|
||||
Linux (primary): uses pactl (PulseAudio) to create a null-sink plus a
|
||||
virtual source whose master is the null-sink's monitor. Callers set
|
||||
PULSE_SOURCE=<source_name> in Chrome's env and pass the fake-mic flag.
|
||||
|
||||
macOS: requires BlackHole 2ch to be installed. This module only
|
||||
verifies its presence and returns the device name; routing OS default
|
||||
input is left to the user (or a future switchaudio-osx integration) to
|
||||
avoid surprising the user's system audio state.
|
||||
|
||||
Windows: not supported in v2.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import platform
|
||||
import subprocess
|
||||
from typing import Optional
|
||||
|
||||
|
||||
_BLACKHOLE_DEVICE = "BlackHole 2ch"
|
||||
|
||||
|
||||
class AudioBridge:
|
||||
"""Manages a virtual audio device for Chrome fake-mic input.
|
||||
|
||||
Call ``setup()`` once before launching the Meet bot and
|
||||
``teardown()`` when the session ends. ``teardown()`` is idempotent.
|
||||
"""
|
||||
|
||||
def __init__(self, name_prefix: str = "hermes_meet") -> None:
|
||||
self._name_prefix = name_prefix
|
||||
self._platform: Optional[str] = None
|
||||
self._device_name: Optional[str] = None
|
||||
self._write_target: Optional[str] = None
|
||||
self._module_ids: list[int] = []
|
||||
self._torn_down = False
|
||||
|
||||
# ── public properties ─────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def device_name(self) -> str:
|
||||
if not self._device_name:
|
||||
raise RuntimeError("AudioBridge not set up yet")
|
||||
return self._device_name
|
||||
|
||||
@property
|
||||
def write_target(self) -> str:
|
||||
if not self._write_target:
|
||||
raise RuntimeError("AudioBridge not set up yet")
|
||||
return self._write_target
|
||||
|
||||
# ── lifecycle ─────────────────────────────────────────────────────────
|
||||
|
||||
def setup(self) -> dict:
|
||||
"""Provision the virtual audio device.
|
||||
|
||||
Returns a dict describing the device. Raises RuntimeError on
|
||||
unsupported platforms or when required system tools are missing.
|
||||
"""
|
||||
system = platform.system()
|
||||
if system == "Linux":
|
||||
return self._setup_linux()
|
||||
if system == "Darwin":
|
||||
return self._setup_darwin()
|
||||
if system == "Windows":
|
||||
raise RuntimeError("windows not supported in v2")
|
||||
raise RuntimeError(f"unsupported platform: {system}")
|
||||
|
||||
def teardown(self) -> None:
|
||||
"""Release the virtual audio device. Idempotent."""
|
||||
if self._torn_down:
|
||||
return
|
||||
# Only Linux needs explicit unloading.
|
||||
if self._platform == "linux" and self._module_ids:
|
||||
# Unload in reverse order (virtual-source before null-sink).
|
||||
for mod_id in reversed(self._module_ids):
|
||||
try:
|
||||
subprocess.run(
|
||||
["pactl", "unload-module", str(mod_id)],
|
||||
check=False,
|
||||
capture_output=True,
|
||||
)
|
||||
except Exception:
|
||||
# Best-effort teardown — never raise from here.
|
||||
pass
|
||||
self._module_ids = []
|
||||
self._torn_down = True
|
||||
|
||||
# ── platform impls ────────────────────────────────────────────────────
|
||||
|
||||
def _setup_linux(self) -> dict:
|
||||
sink_name = f"{self._name_prefix}_sink"
|
||||
src_name = f"{self._name_prefix}_src"
|
||||
|
||||
try:
|
||||
sink_out = subprocess.run(
|
||||
[
|
||||
"pactl",
|
||||
"load-module",
|
||||
"module-null-sink",
|
||||
f"sink_name={sink_name}",
|
||||
f"sink_properties=device.description=HermesMeetSink",
|
||||
],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
except FileNotFoundError as exc:
|
||||
raise RuntimeError(
|
||||
"pactl not found — install PulseAudio/pipewire-pulse"
|
||||
) from exc
|
||||
except subprocess.CalledProcessError as exc:
|
||||
raise RuntimeError(
|
||||
f"pactl load-module null-sink failed: {exc.stderr or exc}"
|
||||
) from exc
|
||||
|
||||
sink_mod_id = self._parse_module_id(sink_out.stdout)
|
||||
|
||||
try:
|
||||
src_out = subprocess.run(
|
||||
[
|
||||
"pactl",
|
||||
"load-module",
|
||||
"module-virtual-source",
|
||||
f"source_name={src_name}",
|
||||
f"master={sink_name}.monitor",
|
||||
],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
except subprocess.CalledProcessError as exc:
|
||||
# Roll back the null-sink we just created so we don't leak it.
|
||||
subprocess.run(
|
||||
["pactl", "unload-module", str(sink_mod_id)],
|
||||
check=False,
|
||||
capture_output=True,
|
||||
)
|
||||
raise RuntimeError(
|
||||
f"pactl load-module virtual-source failed: {exc.stderr or exc}"
|
||||
) from exc
|
||||
|
||||
src_mod_id = self._parse_module_id(src_out.stdout)
|
||||
|
||||
self._platform = "linux"
|
||||
self._device_name = src_name
|
||||
self._write_target = sink_name
|
||||
self._module_ids = [sink_mod_id, src_mod_id]
|
||||
self._torn_down = False
|
||||
|
||||
return {
|
||||
"platform": "linux",
|
||||
"device_name": src_name,
|
||||
"sample_rate": 48000,
|
||||
"channels": 2,
|
||||
"module_ids": list(self._module_ids),
|
||||
"write_target": sink_name,
|
||||
}
|
||||
|
||||
def _setup_darwin(self) -> dict:
|
||||
try:
|
||||
out = subprocess.check_output(
|
||||
["system_profiler", "SPAudioDataType"],
|
||||
text=True,
|
||||
stderr=subprocess.STDOUT,
|
||||
)
|
||||
except FileNotFoundError as exc:
|
||||
raise RuntimeError(
|
||||
"system_profiler not found (macOS-only command)"
|
||||
) from exc
|
||||
except subprocess.CalledProcessError as exc:
|
||||
raise RuntimeError(
|
||||
f"system_profiler failed: {exc.output}"
|
||||
) from exc
|
||||
|
||||
if "BlackHole" not in out:
|
||||
raise RuntimeError(
|
||||
"BlackHole virtual audio device not installed. "
|
||||
"Install via: brew install blackhole-2ch"
|
||||
)
|
||||
|
||||
self._platform = "darwin"
|
||||
self._device_name = _BLACKHOLE_DEVICE
|
||||
self._write_target = _BLACKHOLE_DEVICE
|
||||
self._module_ids = []
|
||||
self._torn_down = False
|
||||
|
||||
return {
|
||||
"platform": "darwin",
|
||||
"device_name": _BLACKHOLE_DEVICE,
|
||||
"sample_rate": 48000,
|
||||
"channels": 2,
|
||||
"module_ids": [],
|
||||
"write_target": _BLACKHOLE_DEVICE,
|
||||
}
|
||||
|
||||
# ── helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
@staticmethod
|
||||
def _parse_module_id(stdout: str) -> int:
|
||||
"""pactl load-module prints the new module ID to stdout."""
|
||||
text = (stdout or "").strip()
|
||||
if not text:
|
||||
raise RuntimeError("pactl load-module returned empty stdout")
|
||||
# Take the last whitespace-separated token on the first non-empty line.
|
||||
first = text.splitlines()[0].strip()
|
||||
token = first.split()[-1]
|
||||
try:
|
||||
return int(token)
|
||||
except ValueError as exc:
|
||||
raise RuntimeError(
|
||||
f"could not parse pactl module id from: {stdout!r}"
|
||||
) from exc
|
||||
|
||||
|
||||
def chrome_fake_audio_flags(bridge_info: dict) -> list[str]:
|
||||
"""Return Chrome flags for using the fake audio input.
|
||||
|
||||
The PulseAudio source is selected via the ``PULSE_SOURCE`` env var,
|
||||
which callers must set in Chrome's environment before launch:
|
||||
|
||||
env["PULSE_SOURCE"] = bridge_info["device_name"]
|
||||
|
||||
On macOS the caller must ensure the system default audio input is
|
||||
set to the returned BlackHole device (we do not flip that switch).
|
||||
"""
|
||||
system = platform.system()
|
||||
if system == "Linux":
|
||||
# Chromium on Linux picks up the PulseAudio source selected via
|
||||
# PULSE_SOURCE env var; the fake-ui flag skips the permission
|
||||
# prompt so the bot can pick "use my mic" without user input.
|
||||
return ["--use-fake-ui-for-media-stream"]
|
||||
if system == "Darwin":
|
||||
return ["--use-fake-ui-for-media-stream"]
|
||||
if system == "Windows":
|
||||
raise RuntimeError("windows not supported in v2")
|
||||
raise RuntimeError(f"unsupported platform: {system}")
|
||||
478
plugins/google_meet/cli.py
Normal file
478
plugins/google_meet/cli.py
Normal file
@@ -0,0 +1,478 @@
|
||||
"""CLI commands for the google_meet plugin.
|
||||
|
||||
Wires ``hermes meet <subcommand>``:
|
||||
setup — preflight playwright, chromium, auth file, print fixes
|
||||
auth — open a browser to sign into Google, save storage state
|
||||
join <url> — join a Meet URL synchronously (also callable from the agent)
|
||||
status — print current bot state
|
||||
transcript — print the transcript
|
||||
stop — leave the current meeting
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from hermes_constants import get_hermes_home
|
||||
|
||||
from plugins.google_meet import process_manager as pm
|
||||
from plugins.google_meet.meet_bot import _is_safe_meet_url
|
||||
|
||||
|
||||
def _auth_state_path() -> Path:
|
||||
return Path(get_hermes_home()) / "workspace" / "meetings" / "auth.json"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# argparse wiring
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def register_cli(subparser: argparse.ArgumentParser) -> None:
|
||||
"""Build the ``hermes meet`` argparse tree.
|
||||
|
||||
Called by :func:`_register_cli_commands` at plugin load time.
|
||||
"""
|
||||
subs = subparser.add_subparsers(dest="meet_command")
|
||||
|
||||
subs.add_parser("setup", help="Preflight: playwright, chromium, auth")
|
||||
|
||||
inst_p = subs.add_parser(
|
||||
"install",
|
||||
help="Install prerequisites (pip deps, Chromium, platform audio tools)",
|
||||
)
|
||||
inst_p.add_argument(
|
||||
"--realtime", action="store_true",
|
||||
help="Also install realtime audio tools (pulseaudio-utils on Linux, BlackHole+ffmpeg on macOS). Uses sudo/brew, prompts before invoking either.",
|
||||
)
|
||||
inst_p.add_argument(
|
||||
"--yes", "-y", action="store_true",
|
||||
help="Answer yes to all prompts (use with care; will run sudo apt-get or brew without asking).",
|
||||
)
|
||||
|
||||
subs.add_parser("auth", help="Sign in to Google and save session state")
|
||||
|
||||
join_p = subs.add_parser("join", help="Join a Meet URL")
|
||||
join_p.add_argument("url", help="https://meet.google.com/...")
|
||||
join_p.add_argument("--guest-name", default="Hermes Agent")
|
||||
join_p.add_argument("--duration", default=None, help="e.g. 30m, 2h, 90s")
|
||||
join_p.add_argument("--headed", action="store_true", help="show browser")
|
||||
join_p.add_argument(
|
||||
"--mode", choices=("transcribe", "realtime"), default="transcribe",
|
||||
help="transcribe (default, listen-only) or realtime (speak via OpenAI Realtime)"
|
||||
)
|
||||
join_p.add_argument(
|
||||
"--node", default=None,
|
||||
help="remote node name, or 'auto' to use the sole registered node"
|
||||
)
|
||||
|
||||
subs.add_parser("status", help="Print current Meet bot state")
|
||||
|
||||
tr_p = subs.add_parser("transcript", help="Print the scraped transcript")
|
||||
tr_p.add_argument("--last", type=int, default=None)
|
||||
|
||||
say_p = subs.add_parser("say", help="Speak text in an active realtime meeting")
|
||||
say_p.add_argument("text", help="what to say")
|
||||
say_p.add_argument("--node", default=None)
|
||||
|
||||
subs.add_parser("stop", help="Leave the current meeting")
|
||||
|
||||
# v3: remote node host management.
|
||||
node_p = subs.add_parser(
|
||||
"node",
|
||||
help="Manage remote meet node hosts (run/list/approve/remove/status/ping)",
|
||||
)
|
||||
try:
|
||||
from plugins.google_meet.node.cli import register_cli as _register_node_cli
|
||||
_register_node_cli(node_p)
|
||||
except Exception as e: # pragma: no cover — defensive
|
||||
# If the node module fails to import for any reason (optional dep
|
||||
# missing at import time etc.), leave the subparser present but
|
||||
# flag it. The argparse dispatch will surface a clear error.
|
||||
def _node_unavailable(args):
|
||||
print(f"hermes meet node: module unavailable ({e})")
|
||||
return 1
|
||||
node_p.set_defaults(func=_node_unavailable)
|
||||
|
||||
subparser.set_defaults(func=meet_command)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dispatch
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def meet_command(args: argparse.Namespace) -> int:
|
||||
sub = getattr(args, "meet_command", None)
|
||||
if not sub:
|
||||
print("usage: hermes meet {setup,auth,join,status,transcript,say,stop,node}")
|
||||
return 2
|
||||
if sub == "setup":
|
||||
return _cmd_setup()
|
||||
if sub == "install":
|
||||
return _cmd_install(
|
||||
realtime=bool(getattr(args, "realtime", False)),
|
||||
assume_yes=bool(getattr(args, "yes", False)),
|
||||
)
|
||||
if sub == "auth":
|
||||
return _cmd_auth()
|
||||
if sub == "join":
|
||||
return _cmd_join(
|
||||
url=args.url,
|
||||
guest_name=args.guest_name,
|
||||
duration=args.duration,
|
||||
headed=args.headed,
|
||||
mode=getattr(args, "mode", "transcribe"),
|
||||
node=getattr(args, "node", None),
|
||||
)
|
||||
if sub == "status":
|
||||
return _cmd_status()
|
||||
if sub == "transcript":
|
||||
return _cmd_transcript(last=args.last)
|
||||
if sub == "say":
|
||||
return _cmd_say(text=args.text, node=getattr(args, "node", None))
|
||||
if sub == "stop":
|
||||
return _cmd_stop()
|
||||
if sub == "node":
|
||||
# Dispatch was set by the node cli's register_cli; fall through to
|
||||
# whatever its subparsers wired.
|
||||
fn = getattr(args, "func", None)
|
||||
if fn is None or fn is meet_command:
|
||||
print("usage: hermes meet node {run,list,approve,remove,status,ping}")
|
||||
return 2
|
||||
return fn(args)
|
||||
print(f"unknown subcommand: {sub}")
|
||||
return 2
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Subcommand handlers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _cmd_setup() -> int:
|
||||
import platform as _p
|
||||
|
||||
print("google_meet preflight")
|
||||
print("---------------------")
|
||||
|
||||
system = _p.system()
|
||||
system_ok = system in ("Linux", "Darwin")
|
||||
print(f" platform : {system} [{'ok' if system_ok else 'unsupported'}]")
|
||||
|
||||
try:
|
||||
import playwright # noqa: F401
|
||||
pw_ok = True
|
||||
pw_msg = "installed"
|
||||
except ImportError:
|
||||
pw_ok = False
|
||||
pw_msg = "NOT installed — run: pip install playwright"
|
||||
print(f" playwright : {pw_msg}")
|
||||
|
||||
chromium_ok = False
|
||||
chromium_msg = "unknown"
|
||||
if pw_ok:
|
||||
try:
|
||||
from playwright.sync_api import sync_playwright
|
||||
with sync_playwright() as p:
|
||||
try:
|
||||
exe = p.chromium.executable_path
|
||||
if exe and Path(exe).exists():
|
||||
chromium_ok = True
|
||||
chromium_msg = f"ok ({exe})"
|
||||
else:
|
||||
chromium_msg = (
|
||||
"not installed — run: "
|
||||
"python -m playwright install chromium"
|
||||
)
|
||||
except Exception as e:
|
||||
chromium_msg = f"probe failed: {e}"
|
||||
except Exception as e:
|
||||
chromium_msg = f"probe failed: {e}"
|
||||
print(f" chromium : {chromium_msg}")
|
||||
|
||||
auth_path = _auth_state_path()
|
||||
auth_ok = auth_path.is_file()
|
||||
print(
|
||||
" google auth : "
|
||||
+ (f"ok ({auth_path})" if auth_ok else "not saved — run: hermes meet auth")
|
||||
)
|
||||
|
||||
print()
|
||||
all_ok = system_ok and pw_ok and chromium_ok
|
||||
if all_ok:
|
||||
print(
|
||||
"ready. Join a meeting: "
|
||||
"hermes meet join https://meet.google.com/abc-defg-hij"
|
||||
)
|
||||
else:
|
||||
print("not ready yet — fix the items above.")
|
||||
return 0 if all_ok else 1
|
||||
|
||||
|
||||
def _cmd_install(*, realtime: bool, assume_yes: bool) -> int:
|
||||
"""Install the plugin's prerequisites.
|
||||
|
||||
Always: pip install playwright + websockets, then
|
||||
``python -m playwright install chromium``.
|
||||
|
||||
With ``--realtime``: also install the platform audio bridge deps.
|
||||
Linux : ``sudo apt-get install -y pulseaudio-utils``
|
||||
macOS : ``brew install blackhole-2ch ffmpeg`` (+ remind the user
|
||||
to select BlackHole as the default input device manually)
|
||||
|
||||
Prompts before every package-manager invocation unless ``--yes``.
|
||||
Refuses to run on Windows.
|
||||
"""
|
||||
import platform as _p
|
||||
import shutil as _shutil
|
||||
import subprocess as _sp
|
||||
|
||||
system = _p.system()
|
||||
if system not in ("Linux", "Darwin"):
|
||||
print(f"google_meet install: {system} is not supported (linux/macos only)")
|
||||
return 1
|
||||
|
||||
def _confirm(prompt: str) -> bool:
|
||||
if assume_yes:
|
||||
return True
|
||||
try:
|
||||
ans = input(f"{prompt} [y/N] ").strip().lower()
|
||||
except EOFError:
|
||||
return False
|
||||
return ans in ("y", "yes")
|
||||
|
||||
print("google_meet install")
|
||||
print("-------------------")
|
||||
|
||||
# 1) pip deps — always safe, venv-scoped.
|
||||
pip_pkgs = ["playwright", "websockets"]
|
||||
print(f"\n[1/3] pip install: {' '.join(pip_pkgs)}")
|
||||
try:
|
||||
res = _sp.run(
|
||||
[sys.executable, "-m", "pip", "install", "--upgrade", *pip_pkgs],
|
||||
check=False,
|
||||
)
|
||||
if res.returncode != 0:
|
||||
print(" pip install failed")
|
||||
return 1
|
||||
except Exception as e:
|
||||
print(f" pip install failed: {e}")
|
||||
return 1
|
||||
|
||||
# 2) Playwright browsers — pulls chromium (~300MB first run).
|
||||
print("\n[2/3] python -m playwright install chromium")
|
||||
try:
|
||||
res = _sp.run(
|
||||
[sys.executable, "-m", "playwright", "install", "chromium"],
|
||||
check=False,
|
||||
)
|
||||
if res.returncode != 0:
|
||||
print(" playwright install failed (may already be installed)")
|
||||
except Exception as e:
|
||||
print(f" playwright install failed: {e}")
|
||||
return 1
|
||||
|
||||
# 3) Platform audio deps for realtime mode.
|
||||
if realtime:
|
||||
print("\n[3/3] realtime audio deps")
|
||||
if system == "Linux":
|
||||
if _shutil.which("paplay") and _shutil.which("pactl"):
|
||||
print(" pulseaudio-utils already installed.")
|
||||
else:
|
||||
if not _confirm(
|
||||
" install pulseaudio-utils? this runs `sudo apt-get install -y pulseaudio-utils`"
|
||||
):
|
||||
print(" skipped (you can run it manually later)")
|
||||
else:
|
||||
cmd = ["sudo", "apt-get", "install", "-y", "pulseaudio-utils"]
|
||||
print(f" $ {' '.join(cmd)}")
|
||||
res = _sp.run(cmd, check=False)
|
||||
if res.returncode != 0:
|
||||
print(" apt install failed — install pulseaudio-utils manually")
|
||||
elif system == "Darwin":
|
||||
have_bh = False
|
||||
try:
|
||||
out = _sp.check_output(["system_profiler", "SPAudioDataType"], text=True)
|
||||
have_bh = "BlackHole" in out
|
||||
except Exception:
|
||||
pass
|
||||
have_ffmpeg = bool(_shutil.which("ffmpeg"))
|
||||
needs = []
|
||||
if not have_bh:
|
||||
needs.append("blackhole-2ch")
|
||||
if not have_ffmpeg:
|
||||
needs.append("ffmpeg")
|
||||
if not needs:
|
||||
print(" BlackHole and ffmpeg already installed.")
|
||||
elif not _shutil.which("brew"):
|
||||
print(
|
||||
" missing: " + ", ".join(needs) + "\n"
|
||||
" install Homebrew first (https://brew.sh) or install the packages manually."
|
||||
)
|
||||
else:
|
||||
if not _confirm(f" install via brew: {' '.join(needs)}?"):
|
||||
print(" skipped (you can run it manually later)")
|
||||
else:
|
||||
cmd = ["brew", "install", *needs]
|
||||
print(f" $ {' '.join(cmd)}")
|
||||
res = _sp.run(cmd, check=False)
|
||||
if res.returncode != 0:
|
||||
print(" brew install failed — install them manually")
|
||||
print(
|
||||
"\n NOTE: macOS does not auto-route audio. Open\n"
|
||||
" System Settings → Sound → Input\n"
|
||||
" and select 'BlackHole 2ch' before starting a realtime meeting.\n"
|
||||
" hermes will not switch your default input for you."
|
||||
)
|
||||
else:
|
||||
print("\n[3/3] skipped (pass --realtime to install audio tooling too)")
|
||||
|
||||
print("\ndone. verify with: hermes meet setup")
|
||||
return 0
|
||||
|
||||
|
||||
def _cmd_auth() -> int:
|
||||
"""Open a headed Chromium, let the user sign in, save storage_state."""
|
||||
try:
|
||||
from playwright.sync_api import sync_playwright
|
||||
except ImportError:
|
||||
print(
|
||||
"playwright is not installed. run:\n"
|
||||
" pip install playwright && python -m playwright install chromium"
|
||||
)
|
||||
return 1
|
||||
|
||||
path = _auth_state_path()
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
print(f"opening Chromium — sign in to Google, then return here and press Enter.")
|
||||
print(f"saving storage state to: {path}")
|
||||
try:
|
||||
with sync_playwright() as pw:
|
||||
browser = pw.chromium.launch(headless=False)
|
||||
context = browser.new_context()
|
||||
page = context.new_page()
|
||||
page.goto("https://accounts.google.com/", wait_until="domcontentloaded")
|
||||
try:
|
||||
input("press Enter after you've signed in ... ")
|
||||
except EOFError:
|
||||
pass
|
||||
context.storage_state(path=str(path))
|
||||
browser.close()
|
||||
except Exception as e:
|
||||
print(f"auth failed: {e}")
|
||||
return 1
|
||||
print("saved. you can now run: hermes meet join <url>")
|
||||
return 0
|
||||
|
||||
|
||||
def _cmd_join(
|
||||
url: str,
|
||||
*,
|
||||
guest_name: str,
|
||||
duration: Optional[str],
|
||||
headed: bool,
|
||||
mode: str = "transcribe",
|
||||
node: Optional[str] = None,
|
||||
) -> int:
|
||||
if not _is_safe_meet_url(url):
|
||||
print(f"refusing: not a meet.google.com URL: {url}")
|
||||
return 2
|
||||
if node:
|
||||
# Remote: go through NodeClient.
|
||||
try:
|
||||
from plugins.google_meet.node.registry import NodeRegistry
|
||||
from plugins.google_meet.node.client import NodeClient
|
||||
except ImportError as e:
|
||||
print(f"node module unavailable: {e}")
|
||||
return 1
|
||||
reg = NodeRegistry()
|
||||
entry = reg.resolve(node if node != "auto" else None)
|
||||
if entry is None:
|
||||
print(f"no registered node matches {node!r}")
|
||||
return 1
|
||||
client = NodeClient(url=entry["url"], token=entry["token"])
|
||||
try:
|
||||
res = client.start_bot(
|
||||
url=url, guest_name=guest_name, duration=duration,
|
||||
headed=headed, mode=mode,
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"remote start_bot failed: {e}")
|
||||
return 1
|
||||
print(json.dumps({"node": entry.get("name"), **res}, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
auth = _auth_state_path()
|
||||
res = pm.start(
|
||||
url=url,
|
||||
headed=headed,
|
||||
guest_name=guest_name,
|
||||
duration=duration,
|
||||
auth_state=str(auth) if auth.is_file() else None,
|
||||
mode=mode,
|
||||
)
|
||||
print(json.dumps(res, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
|
||||
def _cmd_say(text: str, node: Optional[str] = None) -> int:
|
||||
if not (text or "").strip():
|
||||
print("refusing: empty text")
|
||||
return 2
|
||||
if node:
|
||||
try:
|
||||
from plugins.google_meet.node.registry import NodeRegistry
|
||||
from plugins.google_meet.node.client import NodeClient
|
||||
except ImportError as e:
|
||||
print(f"node module unavailable: {e}")
|
||||
return 1
|
||||
reg = NodeRegistry()
|
||||
entry = reg.resolve(node if node != "auto" else None)
|
||||
if entry is None:
|
||||
print(f"no registered node matches {node!r}")
|
||||
return 1
|
||||
client = NodeClient(url=entry["url"], token=entry["token"])
|
||||
try:
|
||||
res = client.say(text)
|
||||
except Exception as e:
|
||||
print(f"remote say failed: {e}")
|
||||
return 1
|
||||
print(json.dumps({"node": entry.get("name"), **res}, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
res = pm.enqueue_say(text)
|
||||
print(json.dumps(res, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
|
||||
def _cmd_status() -> int:
|
||||
res = pm.status()
|
||||
print(json.dumps(res, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
|
||||
def _cmd_transcript(last: Optional[int]) -> int:
|
||||
res = pm.transcript(last=last)
|
||||
if not res.get("ok"):
|
||||
print(json.dumps(res, indent=2))
|
||||
return 1
|
||||
for ln in res.get("lines", []):
|
||||
print(ln)
|
||||
return 0
|
||||
|
||||
|
||||
def _cmd_stop() -> int:
|
||||
res = pm.stop(reason="hermes meet stop")
|
||||
print(json.dumps(res, indent=2))
|
||||
return 0 if res.get("ok") else 1
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover
|
||||
parser = argparse.ArgumentParser(prog="hermes meet")
|
||||
register_cli(parser)
|
||||
ns = parser.parse_args()
|
||||
sys.exit(meet_command(ns))
|
||||
852
plugins/google_meet/meet_bot.py
Normal file
852
plugins/google_meet/meet_bot.py
Normal file
@@ -0,0 +1,852 @@
|
||||
"""Headless Google Meet bot — Playwright + live-caption scraping.
|
||||
|
||||
Runs as a standalone subprocess spawned by ``process_manager.py``. Reads config
|
||||
from env vars, writes status + transcript to files under
|
||||
``$HERMES_HOME/workspace/meetings/<meeting-id>/``. The main hermes process
|
||||
reads those files via the ``meet_*`` tools — no IPC beyond filesystem.
|
||||
|
||||
The scraping strategy mirrors OpenUtter (sumansid/openutter): we don't parse
|
||||
WebRTC audio, we enable Google Meet's built-in live captions and observe the
|
||||
captions container in the DOM via a MutationObserver. This is lossy and
|
||||
English-biased but it is:
|
||||
|
||||
* deterministic (no API keys, no STT billing),
|
||||
* works behind Meet's normal login / admission,
|
||||
* survives Meet UI rewrites fairly well because the caption container has a
|
||||
stable ARIA role.
|
||||
|
||||
Run standalone for debugging::
|
||||
|
||||
HERMES_MEET_URL=https://meet.google.com/abc-defg-hij \\
|
||||
HERMES_MEET_OUT_DIR=/tmp/meet-debug \\
|
||||
HERMES_MEET_HEADED=1 \\
|
||||
python -m plugins.google_meet.meet_bot
|
||||
|
||||
No meet.google.com URL → exits non-zero. Any URL that doesn't start with
|
||||
``https://meet.google.com/`` is rejected (explicit-by-design).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import signal
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
# Match ``https://meet.google.com/abc-defg-hij`` or ``.../lookup/...`` — the
|
||||
# short three-segment code or a lookup URL. Anything else is rejected.
|
||||
MEET_URL_RE = re.compile(
|
||||
r"^https://meet\.google\.com/("
|
||||
r"[a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,}"
|
||||
r"|lookup/[^/?#]+"
|
||||
r"|new"
|
||||
r")(?:[/?#].*)?$"
|
||||
)
|
||||
|
||||
|
||||
# Filenames the bot reads/writes in ``HERMES_MEET_OUT_DIR``.
|
||||
SAY_QUEUE_FILENAME = "say_queue.jsonl"
|
||||
SAY_PCM_FILENAME = "speaker.pcm"
|
||||
|
||||
|
||||
def _is_safe_meet_url(url: str) -> bool:
|
||||
"""Return True if *url* is a Google Meet URL we're willing to navigate to."""
|
||||
if not isinstance(url, str):
|
||||
return False
|
||||
return bool(MEET_URL_RE.match(url.strip()))
|
||||
|
||||
|
||||
def _meeting_id_from_url(url: str) -> str:
|
||||
"""Extract the 3-segment meeting code from a Meet URL.
|
||||
|
||||
For ``https://meet.google.com/abc-defg-hij`` → ``abc-defg-hij``.
|
||||
For ``.../lookup/<id>`` or ``/new`` we fall back to a timestamped id — the
|
||||
bot won't know the real code until after redirect, and callers pass this
|
||||
through to filename anyway.
|
||||
"""
|
||||
m = re.search(
|
||||
r"meet\.google\.com/([a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,})",
|
||||
url or "",
|
||||
)
|
||||
if m:
|
||||
return m.group(1)
|
||||
return f"meet-{int(time.time())}"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Status + transcript file writers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class _BotState:
|
||||
"""Single-process mutable state, flushed to ``status.json`` on each change."""
|
||||
|
||||
def __init__(self, out_dir: Path, meeting_id: str, url: str):
|
||||
self.out_dir = out_dir
|
||||
self.meeting_id = meeting_id
|
||||
self.url = url
|
||||
self.in_call = False
|
||||
self.captioning = False
|
||||
self.captions_enabled_attempted = False
|
||||
self.lobby_waiting = False
|
||||
self.join_attempted_at: Optional[float] = None
|
||||
self.joined_at: Optional[float] = None
|
||||
self.last_caption_at: Optional[float] = None
|
||||
self.transcript_lines = 0
|
||||
self.error: Optional[str] = None
|
||||
self.exited = False
|
||||
# v2 realtime fields.
|
||||
self.realtime = False
|
||||
self.realtime_ready = False
|
||||
self.realtime_device: Optional[str] = None
|
||||
self.audio_bytes_out: int = 0
|
||||
self.last_audio_out_at: Optional[float] = None
|
||||
self.last_barge_in_at: Optional[float] = None
|
||||
self.leave_reason: Optional[str] = None
|
||||
# Scraped captions, in order, deduped. Each entry is a dict of
|
||||
# {"ts": <epoch>, "speaker": str, "text": str}.
|
||||
self._seen: set = set()
|
||||
out_dir.mkdir(parents=True, exist_ok=True)
|
||||
self.transcript_path = out_dir / "transcript.txt"
|
||||
self.status_path = out_dir / "status.json"
|
||||
self._flush()
|
||||
|
||||
# -------- transcript ------------------------------------------------
|
||||
|
||||
def record_caption(self, speaker: str, text: str) -> None:
|
||||
"""Append a caption line if we haven't seen this exact (speaker, text)."""
|
||||
speaker = (speaker or "").strip() or "Unknown"
|
||||
text = (text or "").strip()
|
||||
if not text:
|
||||
return
|
||||
key = f"{speaker}|{text}"
|
||||
if key in self._seen:
|
||||
return
|
||||
self._seen.add(key)
|
||||
self.transcript_lines += 1
|
||||
self.last_caption_at = time.time()
|
||||
ts = time.strftime("%H:%M:%S", time.localtime(self.last_caption_at))
|
||||
line = f"[{ts}] {speaker}: {text}\n"
|
||||
# Atomic-ish append — good enough for a single-writer.
|
||||
with self.transcript_path.open("a", encoding="utf-8") as f:
|
||||
f.write(line)
|
||||
self._flush()
|
||||
|
||||
# -------- status file ----------------------------------------------
|
||||
|
||||
def _flush(self) -> None:
|
||||
data = {
|
||||
"meetingId": self.meeting_id,
|
||||
"url": self.url,
|
||||
"inCall": self.in_call,
|
||||
"captioning": self.captioning,
|
||||
"captionsEnabledAttempted": self.captions_enabled_attempted,
|
||||
"lobbyWaiting": self.lobby_waiting,
|
||||
"joinAttemptedAt": self.join_attempted_at,
|
||||
"joinedAt": self.joined_at,
|
||||
"lastCaptionAt": self.last_caption_at,
|
||||
"transcriptLines": self.transcript_lines,
|
||||
"transcriptPath": str(self.transcript_path),
|
||||
"error": self.error,
|
||||
"exited": self.exited,
|
||||
"pid": os.getpid(),
|
||||
# v2 realtime telemetry.
|
||||
"realtime": self.realtime,
|
||||
"realtimeReady": self.realtime_ready,
|
||||
"realtimeDevice": self.realtime_device,
|
||||
"audioBytesOut": self.audio_bytes_out,
|
||||
"lastAudioOutAt": self.last_audio_out_at,
|
||||
"lastBargeInAt": self.last_barge_in_at,
|
||||
"leaveReason": self.leave_reason,
|
||||
}
|
||||
tmp = self.status_path.with_suffix(".json.tmp")
|
||||
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
|
||||
tmp.replace(self.status_path)
|
||||
|
||||
def set(self, **kwargs) -> None:
|
||||
for k, v in kwargs.items():
|
||||
setattr(self, k, v)
|
||||
self._flush()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Playwright bot entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# JavaScript injected into the Meet tab to observe captions. Captures
|
||||
# {speaker, text} tuples via a MutationObserver on the caption container,
|
||||
# and exposes ``window.__hermesMeetDrain()`` to pull new entries. This
|
||||
# mirrors the OpenUtter caption scraping approach.
|
||||
_CAPTION_OBSERVER_JS = r"""
|
||||
(() => {
|
||||
if (window.__hermesMeetInstalled) return;
|
||||
window.__hermesMeetInstalled = true;
|
||||
window.__hermesMeetQueue = [];
|
||||
|
||||
const captionSelector = '[role="region"][aria-label*="aption" i], ' +
|
||||
'div[jsname="YSxPC"], ' + // legacy
|
||||
'div[jsname="tgaKEf"]'; // current (Apr 2026)
|
||||
|
||||
function pushEntry(speaker, text) {
|
||||
if (!text || !text.trim()) return;
|
||||
window.__hermesMeetQueue.push({
|
||||
ts: Date.now(),
|
||||
speaker: (speaker || '').trim(),
|
||||
text: text.trim(),
|
||||
});
|
||||
}
|
||||
|
||||
function scan(root) {
|
||||
// Meet captions render as a list of rows; each row contains a speaker
|
||||
// label and a text block. Selectors vary across Meet rewrites; we try
|
||||
// a few shapes and fall back to raw text.
|
||||
const rows = root.querySelectorAll('div[jsname="dsyhDe"], div.CNusmb, div.TBMuR');
|
||||
if (rows.length) {
|
||||
rows.forEach((row) => {
|
||||
const spkEl = row.querySelector('div.KcIKyf, div.zs7s8d, span[jsname="YSxPC"]');
|
||||
const txtEl = row.querySelector('div.bh44bd, span[jsname="tgaKEf"], div.iTTPOb');
|
||||
const speaker = spkEl ? spkEl.innerText : '';
|
||||
const text = txtEl ? txtEl.innerText : row.innerText;
|
||||
pushEntry(speaker, text);
|
||||
});
|
||||
return;
|
||||
}
|
||||
// Fallback: treat the whole region's innerText as one anonymous line.
|
||||
const text = (root.innerText || '').split('\n').filter(Boolean).pop();
|
||||
pushEntry('', text);
|
||||
}
|
||||
|
||||
function attach() {
|
||||
const el = document.querySelector(captionSelector);
|
||||
if (!el) return false;
|
||||
const obs = new MutationObserver(() => scan(el));
|
||||
obs.observe(el, { childList: true, subtree: true, characterData: true });
|
||||
scan(el);
|
||||
return true;
|
||||
}
|
||||
|
||||
// Try now and retry on interval — the caption region only appears after
|
||||
// captions are enabled and someone speaks.
|
||||
if (!attach()) {
|
||||
const iv = setInterval(() => { if (attach()) clearInterval(iv); }, 1500);
|
||||
}
|
||||
|
||||
window.__hermesMeetDrain = () => {
|
||||
const out = window.__hermesMeetQueue.slice();
|
||||
window.__hermesMeetQueue = [];
|
||||
return out;
|
||||
};
|
||||
})();
|
||||
"""
|
||||
|
||||
|
||||
def _enable_captions_js() -> str:
|
||||
"""Return a small JS snippet that tries to click the 'Turn on captions' button.
|
||||
|
||||
Best-effort — Meet's caption toggle is keyboard-accessible via ``c``. We
|
||||
dispatch that keystroke as a cheap fallback. Real click targeting is too
|
||||
brittle to rely on.
|
||||
"""
|
||||
return r"""
|
||||
(() => {
|
||||
const ev = new KeyboardEvent('keydown', {
|
||||
key: 'c', code: 'KeyC', keyCode: 67, which: 67, bubbles: true,
|
||||
});
|
||||
document.body.dispatchEvent(ev);
|
||||
return true;
|
||||
})();
|
||||
"""
|
||||
|
||||
|
||||
def _start_realtime_speaker(
|
||||
*,
|
||||
rt: dict,
|
||||
out_dir: Path,
|
||||
bridge_info: dict,
|
||||
api_key: str,
|
||||
model: str,
|
||||
voice: str,
|
||||
instructions: str,
|
||||
stop_flag: dict,
|
||||
state: "_BotState",
|
||||
) -> None:
|
||||
"""Wire up the OpenAI Realtime session + speaker thread + PCM pump.
|
||||
|
||||
The speaker thread reads text lines from ``say_queue.jsonl``, sends each
|
||||
to OpenAI Realtime, and writes PCM audio into ``speaker.pcm``. A
|
||||
separate *pump* thread forwards that PCM into the OS audio sink so
|
||||
Chrome's fake mic picks it up. On Linux we pipe to ``paplay`` against
|
||||
the null-sink; on macOS the caller is expected to have the BlackHole
|
||||
device selected as default input.
|
||||
"""
|
||||
try:
|
||||
from plugins.google_meet.realtime.openai_client import (
|
||||
RealtimeSession,
|
||||
RealtimeSpeaker,
|
||||
)
|
||||
except Exception as e:
|
||||
state.set(error=f"realtime import failed: {e}")
|
||||
return
|
||||
|
||||
pcm_path = out_dir / SAY_PCM_FILENAME
|
||||
queue_path = out_dir / SAY_QUEUE_FILENAME
|
||||
processed_path = out_dir / "say_processed.jsonl"
|
||||
# Reset the sink file so we start clean each session.
|
||||
pcm_path.write_bytes(b"")
|
||||
# Make sure the queue exists so the speaker poller doesn't error on
|
||||
# first iteration.
|
||||
queue_path.touch()
|
||||
|
||||
try:
|
||||
session = RealtimeSession(
|
||||
api_key=api_key,
|
||||
model=model,
|
||||
voice=voice,
|
||||
instructions=instructions,
|
||||
audio_sink_path=pcm_path,
|
||||
sample_rate=24000,
|
||||
)
|
||||
session.connect()
|
||||
except Exception as e:
|
||||
state.set(error=f"realtime connect failed: {e}")
|
||||
return
|
||||
|
||||
rt["session"] = session
|
||||
|
||||
def _stop_fn():
|
||||
return stop_flag.get("stop", False)
|
||||
|
||||
rt["speaker_stop"] = lambda: stop_flag.__setitem__("stop", stop_flag.get("stop", False))
|
||||
|
||||
speaker = RealtimeSpeaker(
|
||||
session=session,
|
||||
queue_path=queue_path,
|
||||
processed_path=processed_path,
|
||||
)
|
||||
|
||||
def _speaker_loop():
|
||||
try:
|
||||
speaker.run_until_stopped(_stop_fn)
|
||||
except Exception as e:
|
||||
state.set(error=f"realtime speaker crashed: {e}")
|
||||
|
||||
t_speaker = threading.Thread(target=_speaker_loop, name="meet-speaker", daemon=True)
|
||||
t_speaker.start()
|
||||
rt["speaker_thread"] = t_speaker
|
||||
|
||||
# PCM pump: feeds speaker.pcm (24kHz s16le mono) into the OS audio
|
||||
# device that Chrome's fake mic reads from. Different tools per
|
||||
# platform, but the contract is the same — block-read the growing
|
||||
# PCM file and stream it to the device in near-real-time.
|
||||
platform_tag = (bridge_info or {}).get("platform")
|
||||
if platform_tag == "linux":
|
||||
import subprocess as _sp
|
||||
|
||||
sink = (bridge_info or {}).get("write_target") or "hermes_meet_sink"
|
||||
try:
|
||||
proc = _sp.Popen(
|
||||
[
|
||||
"paplay",
|
||||
"--raw",
|
||||
"--rate=24000",
|
||||
"--format=s16le",
|
||||
"--channels=1",
|
||||
f"--device={sink}",
|
||||
str(pcm_path),
|
||||
],
|
||||
stdin=_sp.DEVNULL,
|
||||
stdout=_sp.DEVNULL,
|
||||
stderr=_sp.DEVNULL,
|
||||
)
|
||||
rt["pcm_pump"] = proc
|
||||
except FileNotFoundError:
|
||||
state.set(error="paplay not found — install pulseaudio-utils for realtime on Linux")
|
||||
elif platform_tag == "darwin":
|
||||
# macOS: use ffmpeg to tail-read speaker.pcm and write it to the
|
||||
# BlackHole output device. The user must have BlackHole selected
|
||||
# as the default input in System Settings → Sound for Chrome to
|
||||
# pick it up. We prefer ffmpeg because it's scriptable and can
|
||||
# target AVFoundation devices by name; fall back to afplay-ing
|
||||
# the file in a tight loop if ffmpeg is absent.
|
||||
import shutil as _shutil
|
||||
import subprocess as _sp
|
||||
|
||||
device_name = (bridge_info or {}).get("write_target") or "BlackHole 2ch"
|
||||
if _shutil.which("ffmpeg"):
|
||||
try:
|
||||
# -re: read input at native frame rate.
|
||||
# -f avfoundation -i: speaker path as raw PCM.
|
||||
# -f s16le -ar 24000 -ac 1 -i <pcm>: interpret the file.
|
||||
# -f audiotoolbox -audio_device_index: write to BlackHole.
|
||||
# Simpler: output as raw via coreaudio using "-f audiotoolbox".
|
||||
# ffmpeg's audiotoolbox output picks the current default
|
||||
# output device, which isn't what we want. Instead we use
|
||||
# -f avfoundation with the named device as OUTPUT via
|
||||
# -vn and the device name.
|
||||
proc = _sp.Popen(
|
||||
[
|
||||
"ffmpeg",
|
||||
"-nostdin", "-hide_banner", "-loglevel", "error",
|
||||
"-re",
|
||||
"-f", "s16le", "-ar", "24000", "-ac", "1",
|
||||
"-i", str(pcm_path),
|
||||
"-f", "audiotoolbox",
|
||||
"-audio_device_index", _mac_audio_device_index(device_name),
|
||||
"-",
|
||||
],
|
||||
stdin=_sp.DEVNULL,
|
||||
stdout=_sp.DEVNULL,
|
||||
stderr=_sp.DEVNULL,
|
||||
)
|
||||
rt["pcm_pump"] = proc
|
||||
except FileNotFoundError:
|
||||
state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
|
||||
except Exception as e:
|
||||
state.set(error=f"macOS pcm pump failed to start: {e}")
|
||||
else:
|
||||
state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
|
||||
|
||||
|
||||
def _mac_audio_device_index(device_name: str) -> str:
|
||||
"""Return the ffmpeg ``-audio_device_index`` for *device_name*, as a string.
|
||||
|
||||
Probes ``ffmpeg -f avfoundation -list_devices true -i ''`` (which prints
|
||||
the device table on stderr) and matches *device_name* case-insensitively.
|
||||
Defaults to ``"0"`` if the device can't be found — caller will get a
|
||||
misrouted stream but not a crash, and the error will be obvious.
|
||||
"""
|
||||
import subprocess as _sp
|
||||
|
||||
try:
|
||||
out = _sp.run(
|
||||
["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=10,
|
||||
)
|
||||
except Exception:
|
||||
return "0"
|
||||
# ffmpeg prints the table on stderr. Lines look like:
|
||||
# [AVFoundation indev @ 0x...] [0] BlackHole 2ch
|
||||
import re as _re
|
||||
|
||||
needle = device_name.strip().lower()
|
||||
for line in (out.stderr or "").splitlines():
|
||||
m = _re.search(r"\[(\d+)\]\s+(.+)$", line)
|
||||
if not m:
|
||||
continue
|
||||
if m.group(2).strip().lower() == needle:
|
||||
return m.group(1)
|
||||
return "0"
|
||||
|
||||
|
||||
def run_bot() -> int: # noqa: C901 — orchestration, explicit branches
|
||||
url = os.environ.get("HERMES_MEET_URL", "").strip()
|
||||
out_dir_env = os.environ.get("HERMES_MEET_OUT_DIR", "").strip()
|
||||
headed = os.environ.get("HERMES_MEET_HEADED", "").lower() in ("1", "true", "yes")
|
||||
auth_state = os.environ.get("HERMES_MEET_AUTH_STATE", "").strip()
|
||||
guest_name = os.environ.get("HERMES_MEET_GUEST_NAME", "Hermes Agent")
|
||||
duration_s = _parse_duration(os.environ.get("HERMES_MEET_DURATION", ""))
|
||||
# v2: optional realtime mode. Enabled when HERMES_MEET_MODE=realtime.
|
||||
mode = os.environ.get("HERMES_MEET_MODE", "transcribe").strip().lower()
|
||||
realtime_model = os.environ.get("HERMES_MEET_REALTIME_MODEL", "gpt-realtime")
|
||||
realtime_voice = os.environ.get("HERMES_MEET_REALTIME_VOICE", "alloy")
|
||||
realtime_instructions = os.environ.get("HERMES_MEET_REALTIME_INSTRUCTIONS", "")
|
||||
realtime_api_key = os.environ.get("HERMES_MEET_REALTIME_KEY") or os.environ.get("OPENAI_API_KEY", "")
|
||||
|
||||
if not url or not _is_safe_meet_url(url):
|
||||
sys.stderr.write(
|
||||
"google_meet bot: refusing to launch — HERMES_MEET_URL must be a "
|
||||
"meet.google.com URL. got: %r\n" % url
|
||||
)
|
||||
return 2
|
||||
if not out_dir_env:
|
||||
sys.stderr.write("google_meet bot: HERMES_MEET_OUT_DIR is required\n")
|
||||
return 2
|
||||
|
||||
out_dir = Path(out_dir_env)
|
||||
meeting_id = _meeting_id_from_url(url)
|
||||
state = _BotState(out_dir=out_dir, meeting_id=meeting_id, url=url)
|
||||
|
||||
# SIGTERM → exit cleanly so the parent ``meet_leave`` gets a finalized
|
||||
# transcript. We set a flag instead of raising so the Playwright context
|
||||
# teardown runs in the finally block below.
|
||||
stop_flag = {"stop": False}
|
||||
|
||||
def _on_signal(_sig, _frame):
|
||||
stop_flag["stop"] = True
|
||||
|
||||
signal.signal(signal.SIGTERM, _on_signal)
|
||||
signal.signal(signal.SIGINT, _on_signal)
|
||||
|
||||
# v2 realtime: provision virtual audio device + start speaker thread.
|
||||
# We track these in a dict so the finally block can tear them down
|
||||
# regardless of how we exit. If anything in the realtime setup fails we
|
||||
# fall back to transcribe mode with a status flag.
|
||||
rt = {
|
||||
"enabled": mode == "realtime",
|
||||
"bridge": None, # AudioBridge | None
|
||||
"bridge_info": None, # dict | None
|
||||
"session": None, # RealtimeSession | None
|
||||
"speaker_thread": None, # threading.Thread | None
|
||||
"speaker_stop": None, # callable | None
|
||||
}
|
||||
if rt["enabled"]:
|
||||
if not realtime_api_key:
|
||||
state.set(error="realtime mode requested but no API key in HERMES_MEET_REALTIME_KEY/OPENAI_API_KEY — falling back to transcribe")
|
||||
rt["enabled"] = False
|
||||
else:
|
||||
try:
|
||||
from plugins.google_meet.audio_bridge import AudioBridge
|
||||
bridge = AudioBridge()
|
||||
rt["bridge_info"] = bridge.setup()
|
||||
rt["bridge"] = bridge
|
||||
state.set(realtime=True, realtime_device=rt["bridge_info"].get("device_name"))
|
||||
except Exception as e:
|
||||
state.set(error=f"audio bridge setup failed: {e} — falling back to transcribe")
|
||||
rt["enabled"] = False
|
||||
|
||||
try:
|
||||
from playwright.sync_api import sync_playwright
|
||||
except ImportError as e:
|
||||
state.set(error=f"playwright not installed: {e}", exited=True)
|
||||
sys.stderr.write(
|
||||
"google_meet bot: playwright is not installed. Run "
|
||||
"`pip install playwright && python -m playwright install chromium`\n"
|
||||
)
|
||||
if rt["bridge"]:
|
||||
rt["bridge"].teardown()
|
||||
return 3
|
||||
|
||||
# Chrome env: if realtime is live on Linux, point PULSE_SOURCE at the
|
||||
# virtual source so Chrome's fake mic reads the audio we generate.
|
||||
chrome_env = os.environ.copy()
|
||||
chrome_args = [
|
||||
"--use-fake-ui-for-media-stream",
|
||||
"--disable-blink-features=AutomationControlled",
|
||||
]
|
||||
if not rt["enabled"]:
|
||||
# v1-style fake device (silence) — we don't care about mic content
|
||||
# when we're not speaking.
|
||||
chrome_args.insert(1, "--use-fake-device-for-media-stream")
|
||||
elif rt["bridge_info"] and rt["bridge_info"].get("platform") == "linux":
|
||||
chrome_env["PULSE_SOURCE"] = rt["bridge_info"].get("device_name", "")
|
||||
|
||||
try:
|
||||
with sync_playwright() as pw:
|
||||
# Playwright's launch() doesn't take env; we set PULSE_SOURCE
|
||||
# via the process env before launch so the child Chrome inherits it.
|
||||
for k, v in chrome_env.items():
|
||||
os.environ[k] = v
|
||||
browser = pw.chromium.launch(
|
||||
headless=not headed,
|
||||
args=chrome_args,
|
||||
)
|
||||
context_args = {
|
||||
"viewport": {"width": 1280, "height": 800},
|
||||
"user_agent": (
|
||||
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
|
||||
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
|
||||
),
|
||||
"permissions": ["microphone", "camera"],
|
||||
}
|
||||
if auth_state and Path(auth_state).is_file():
|
||||
context_args["storage_state"] = auth_state
|
||||
context = browser.new_context(**context_args)
|
||||
page = context.new_page()
|
||||
|
||||
try:
|
||||
page.goto(url, wait_until="domcontentloaded", timeout=30_000)
|
||||
except Exception as e:
|
||||
state.set(error=f"navigate failed: {e}", exited=True)
|
||||
return 4
|
||||
|
||||
# Guest-mode: Meet shows a name field before "Ask to join". When
|
||||
# we're authed, we instead see "Join now".
|
||||
_try_guest_name(page, guest_name)
|
||||
_click_join(page, state)
|
||||
|
||||
# Install caption observer and attempt to enable captions.
|
||||
try:
|
||||
page.evaluate(_enable_captions_js())
|
||||
state.set(captions_enabled_attempted=True)
|
||||
except Exception:
|
||||
pass
|
||||
try:
|
||||
page.evaluate(_CAPTION_OBSERVER_JS)
|
||||
except Exception as e:
|
||||
state.set(error=f"caption observer install failed: {e}")
|
||||
|
||||
# Note: in_call=False until admission is confirmed (we detect
|
||||
# either the Leave button or the caption region, signalling we
|
||||
# made it past the lobby).
|
||||
state.set(captioning=True, join_attempted_at=time.time())
|
||||
|
||||
# v2 realtime: start the speaker thread reading from the
|
||||
# plugin-side say queue. The thread reads JSONL lines written by
|
||||
# meet_say, calls OpenAI Realtime, and streams the audio PCM to
|
||||
# the virtual sink that Chrome's fake-mic is pointed at.
|
||||
if rt["enabled"]:
|
||||
_start_realtime_speaker(
|
||||
rt=rt,
|
||||
out_dir=out_dir,
|
||||
bridge_info=rt["bridge_info"],
|
||||
api_key=realtime_api_key,
|
||||
model=realtime_model,
|
||||
voice=realtime_voice,
|
||||
instructions=realtime_instructions,
|
||||
stop_flag=stop_flag,
|
||||
state=state,
|
||||
)
|
||||
if rt["session"] is not None:
|
||||
state.set(realtime_ready=True)
|
||||
|
||||
# Admission + drain loop. Runs until SIGTERM, duration expiry,
|
||||
# or the page detects "You were removed / you left the
|
||||
# meeting". Responsible for:
|
||||
# * detecting admission (Leave button visible → in_call=True)
|
||||
# * timing out stuck-in-lobby (default 5 minutes)
|
||||
# * draining scraped captions into the transcript
|
||||
# * triggering realtime barge-in when a human speaks while
|
||||
# the bot is generating audio
|
||||
# * periodically flushing realtime counters into status.json
|
||||
deadline = (time.time() + duration_s) if duration_s else None
|
||||
lobby_deadline = time.time() + float(
|
||||
os.environ.get("HERMES_MEET_LOBBY_TIMEOUT", "300")
|
||||
)
|
||||
last_admission_check = 0.0
|
||||
while not stop_flag["stop"]:
|
||||
now = time.time()
|
||||
if deadline and now > deadline:
|
||||
state.set(leave_reason="duration_expired")
|
||||
break
|
||||
|
||||
# Admission detection every ~3s until admitted.
|
||||
if not state.in_call and (now - last_admission_check) > 3.0:
|
||||
last_admission_check = now
|
||||
admitted = _detect_admission(page)
|
||||
if admitted:
|
||||
state.set(
|
||||
in_call=True,
|
||||
lobby_waiting=False,
|
||||
joined_at=now,
|
||||
)
|
||||
elif now > lobby_deadline:
|
||||
state.set(
|
||||
error=(
|
||||
"lobby timeout — host never admitted the bot "
|
||||
f"within {int(lobby_deadline - state.join_attempted_at) if state.join_attempted_at else 0}s"
|
||||
),
|
||||
leave_reason="lobby_timeout",
|
||||
)
|
||||
break
|
||||
elif _detect_denied(page):
|
||||
state.set(
|
||||
error="host denied admission",
|
||||
leave_reason="denied",
|
||||
)
|
||||
break
|
||||
|
||||
try:
|
||||
queued = page.evaluate("window.__hermesMeetDrain && window.__hermesMeetDrain()")
|
||||
if isinstance(queued, list):
|
||||
for entry in queued:
|
||||
if not isinstance(entry, dict):
|
||||
continue
|
||||
speaker = str(entry.get("speaker", ""))
|
||||
text = str(entry.get("text", ""))
|
||||
state.record_caption(speaker=speaker, text=text)
|
||||
# Barge-in: if the bot is currently generating
|
||||
# audio AND a real human just spoke, cancel the
|
||||
# in-flight response so we don't talk over them.
|
||||
if rt["enabled"] and rt["session"] is not None:
|
||||
if _looks_like_human_speaker(speaker, guest_name):
|
||||
try:
|
||||
cancelled = rt["session"].cancel_response()
|
||||
if cancelled:
|
||||
state.set(last_barge_in_at=now)
|
||||
except Exception:
|
||||
pass
|
||||
except Exception:
|
||||
# Meet reloaded or we got booted — try to detect and
|
||||
# exit gracefully rather than spinning.
|
||||
if page.is_closed():
|
||||
state.set(leave_reason="page_closed")
|
||||
break
|
||||
|
||||
# Fold the realtime session's byte/timestamp counters into
|
||||
# the status file so meet_status can surface them.
|
||||
if rt["session"] is not None:
|
||||
state.set(
|
||||
audio_bytes_out=getattr(rt["session"], "audio_bytes_out", 0),
|
||||
last_audio_out_at=getattr(rt["session"], "last_audio_out_at", None),
|
||||
)
|
||||
|
||||
time.sleep(1.0)
|
||||
|
||||
# Try to leave cleanly — click "Leave call" button if present.
|
||||
try:
|
||||
page.evaluate(
|
||||
"() => { const b = document.querySelector('button[aria-label*=\"eave call\"]');"
|
||||
" if (b) b.click(); }"
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
context.close()
|
||||
browser.close()
|
||||
# v2: teardown realtime speaker + audio bridge.
|
||||
if rt["speaker_stop"]:
|
||||
try:
|
||||
rt["speaker_stop"]()
|
||||
except Exception:
|
||||
pass
|
||||
if rt["speaker_thread"] is not None:
|
||||
try:
|
||||
rt["speaker_thread"].join(timeout=5.0)
|
||||
except Exception:
|
||||
pass
|
||||
if rt["session"]:
|
||||
try:
|
||||
rt["session"].close()
|
||||
except Exception:
|
||||
pass
|
||||
if rt["bridge"]:
|
||||
try:
|
||||
rt["bridge"].teardown()
|
||||
except Exception:
|
||||
pass
|
||||
state.set(in_call=False, captioning=False, exited=True)
|
||||
return 0
|
||||
|
||||
except Exception as e:
|
||||
state.set(error=f"unhandled: {e}", exited=True)
|
||||
return 1
|
||||
|
||||
|
||||
def _try_guest_name(page, guest_name: str) -> None:
|
||||
"""If Meet is showing a guest-name input, type *guest_name* into it."""
|
||||
try:
|
||||
# Meet's guest name input has placeholder "Your name".
|
||||
locator = page.locator('input[aria-label*="name" i]').first
|
||||
if locator.count() and locator.is_visible():
|
||||
locator.fill(guest_name, timeout=2_000)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def _detect_admission(page) -> bool:
|
||||
"""True if we're clearly past the lobby and in the call itself.
|
||||
|
||||
Uses a JS-side probe because Meet's DOM structure varies by client
|
||||
version. We check several high-signal indicators and declare admission
|
||||
on the first hit:
|
||||
|
||||
1. Leave-call button is present (``aria-label`` contains "eave call").
|
||||
2. Caption region has appeared (we installed the observer and it attached).
|
||||
3. The participant list container is visible.
|
||||
|
||||
Conservative by default — returns False on any error.
|
||||
"""
|
||||
probe = r"""
|
||||
(() => {
|
||||
const leave = document.querySelector('button[aria-label*="eave call" i]');
|
||||
if (leave) return true;
|
||||
if (window.__hermesMeetInstalled) {
|
||||
const caps = document.querySelector(
|
||||
'[role="region"][aria-label*="aption" i], ' +
|
||||
'div[jsname="YSxPC"], div[jsname="tgaKEf"]'
|
||||
);
|
||||
if (caps) return true;
|
||||
}
|
||||
const parts = document.querySelector('[aria-label*="articipants" i]');
|
||||
if (parts) return true;
|
||||
return false;
|
||||
})();
|
||||
"""
|
||||
try:
|
||||
return bool(page.evaluate(probe))
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def _detect_denied(page) -> bool:
|
||||
"""True when Meet is showing a 'you were denied' / 'no one admitted' page."""
|
||||
probe = r"""
|
||||
(() => {
|
||||
const text = document.body ? document.body.innerText || '' : '';
|
||||
// English only — matches what shows up when the host denies or
|
||||
// removes a guest.
|
||||
if (/You can't join this video call/i.test(text)) return true;
|
||||
if (/You were removed from the meeting/i.test(text)) return true;
|
||||
if (/No one responded to your request to join/i.test(text)) return true;
|
||||
return false;
|
||||
})();
|
||||
"""
|
||||
try:
|
||||
return bool(page.evaluate(probe))
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def _looks_like_human_speaker(speaker: str, bot_guest_name: str) -> bool:
|
||||
"""Whether a caption line's speaker is probably a human, not our bot echo.
|
||||
|
||||
Meet attributes captions to the speaker's display name. When Chrome is
|
||||
reading our fake mic, Meet still attributes captions to *our* bot name
|
||||
(because the bot is the one "speaking"). We don't want those to trigger
|
||||
barge-in. Anything else — real participant names — does.
|
||||
|
||||
Conservative: unknown / blank speakers (common when caption scraping
|
||||
falls back to raw text) do NOT trigger barge-in, because we can't tell
|
||||
whether it was a human or us.
|
||||
"""
|
||||
if not speaker or not speaker.strip():
|
||||
return False
|
||||
spk = speaker.strip().lower()
|
||||
if spk in ("unknown", "you", bot_guest_name.strip().lower()):
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def _click_join(page, state: _BotState) -> None:
|
||||
"""Click 'Join now' or 'Ask to join' if either button is visible.
|
||||
|
||||
Flags ``lobby_waiting`` when we hit the "waiting for host to admit you"
|
||||
state so the agent can surface that in status.
|
||||
"""
|
||||
for label in ("Join now", "Ask to join"):
|
||||
try:
|
||||
btn = page.get_by_role("button", name=label, exact=False).first
|
||||
if btn.count() and btn.is_visible():
|
||||
btn.click(timeout=3_000)
|
||||
if label == "Ask to join":
|
||||
state.set(lobby_waiting=True)
|
||||
break
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
|
||||
def _parse_duration(raw: str) -> Optional[float]:
|
||||
"""Parse ``30m`` / ``2h`` / ``90`` (seconds) → float seconds, or None."""
|
||||
if not raw:
|
||||
return None
|
||||
raw = raw.strip().lower()
|
||||
try:
|
||||
if raw.endswith("h"):
|
||||
return float(raw[:-1]) * 3600
|
||||
if raw.endswith("m"):
|
||||
return float(raw[:-1]) * 60
|
||||
if raw.endswith("s"):
|
||||
return float(raw[:-1])
|
||||
return float(raw)
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover — subprocess entry point
|
||||
sys.exit(run_bot())
|
||||
54
plugins/google_meet/node/__init__.py
Normal file
54
plugins/google_meet/node/__init__.py
Normal file
@@ -0,0 +1,54 @@
|
||||
"""Remote 'node host' primitive for the google_meet plugin.
|
||||
|
||||
Lets the Meet bot (Playwright + Chrome) run on a different machine than
|
||||
the hermes-agent gateway. The gateway speaks a small JSON-over-WebSocket
|
||||
RPC protocol to the remote node; the node wraps the existing
|
||||
``plugins.google_meet.process_manager`` API.
|
||||
|
||||
Topology
|
||||
--------
|
||||
gateway (Linux) ── ws://mac.local:18789 ──▶ node server (Mac)
|
||||
└─ process_manager
|
||||
└─ meet_bot (Playwright)
|
||||
|
||||
Why: Google sign-in + Chrome profile live on the user's laptop. Running
|
||||
the bot there reuses that profile without shipping credentials to the
|
||||
server.
|
||||
|
||||
Public surface
|
||||
--------------
|
||||
NodeClient — gateway-side RPC client (short-lived sync WS per call)
|
||||
NodeServer — long-running server that hosts the bot
|
||||
NodeRegistry — local JSON registry of approved nodes (name → url+token)
|
||||
protocol — message envelope helpers (make_request, encode, decode, ...)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from plugins.google_meet.node import protocol
|
||||
from plugins.google_meet.node.client import NodeClient
|
||||
from plugins.google_meet.node.protocol import (
|
||||
VALID_REQUEST_TYPES,
|
||||
decode,
|
||||
encode,
|
||||
make_error,
|
||||
make_request,
|
||||
make_response,
|
||||
validate_request,
|
||||
)
|
||||
from plugins.google_meet.node.registry import NodeRegistry
|
||||
from plugins.google_meet.node.server import NodeServer
|
||||
|
||||
__all__ = [
|
||||
"NodeClient",
|
||||
"NodeServer",
|
||||
"NodeRegistry",
|
||||
"protocol",
|
||||
"make_request",
|
||||
"make_response",
|
||||
"make_error",
|
||||
"encode",
|
||||
"decode",
|
||||
"validate_request",
|
||||
"VALID_REQUEST_TYPES",
|
||||
]
|
||||
125
plugins/google_meet/node/cli.py
Normal file
125
plugins/google_meet/node/cli.py
Normal file
@@ -0,0 +1,125 @@
|
||||
"""`hermes meet node ...` subcommand tree.
|
||||
|
||||
Wired into the existing ``hermes meet`` parser by the plugin's top-level
|
||||
CLI. This module only defines the subparsers and their dispatch — it
|
||||
does not mutate the existing cli.py.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import json
|
||||
import sys
|
||||
from typing import Any
|
||||
|
||||
from plugins.google_meet.node.client import NodeClient
|
||||
from plugins.google_meet.node.registry import NodeRegistry
|
||||
from plugins.google_meet.node.server import NodeServer
|
||||
|
||||
|
||||
def register_cli(subparser: argparse.ArgumentParser) -> None:
|
||||
"""Add ``run / list / approve / remove / status / ping`` subparsers.
|
||||
|
||||
*subparser* is the ``hermes meet node`` argparse object — typically
|
||||
the result of ``meet_parser.add_parser('node', ...)``.
|
||||
"""
|
||||
sp = subparser.add_subparsers(dest="node_cmd", required=True)
|
||||
|
||||
run = sp.add_parser("run", help="Start a node server on this machine.")
|
||||
run.add_argument("--host", default="0.0.0.0")
|
||||
run.add_argument("--port", type=int, default=18789)
|
||||
run.add_argument("--display-name", default="hermes-meet-node")
|
||||
run.set_defaults(func=node_command)
|
||||
|
||||
lst = sp.add_parser("list", help="List approved remote nodes.")
|
||||
lst.set_defaults(func=node_command)
|
||||
|
||||
app = sp.add_parser("approve", help="Register a remote node on the gateway.")
|
||||
app.add_argument("name")
|
||||
app.add_argument("url")
|
||||
app.add_argument("token")
|
||||
app.set_defaults(func=node_command)
|
||||
|
||||
rm = sp.add_parser("remove", help="Forget a registered node.")
|
||||
rm.add_argument("name")
|
||||
rm.set_defaults(func=node_command)
|
||||
|
||||
st = sp.add_parser("status", help="Ping a registered node.")
|
||||
st.add_argument("name")
|
||||
st.set_defaults(func=node_command)
|
||||
|
||||
pg = sp.add_parser("ping", help="Alias for status.")
|
||||
pg.add_argument("name")
|
||||
pg.set_defaults(func=node_command)
|
||||
|
||||
|
||||
def node_command(args: argparse.Namespace) -> int:
|
||||
"""Dispatch for ``hermes meet node ...``.
|
||||
|
||||
Returns a process exit code. Side-effects print to stdout/stderr.
|
||||
"""
|
||||
cmd = getattr(args, "node_cmd", None)
|
||||
|
||||
if cmd == "run":
|
||||
server = NodeServer(
|
||||
host=args.host,
|
||||
port=args.port,
|
||||
display_name=args.display_name,
|
||||
)
|
||||
token = server.ensure_token()
|
||||
print(f"[meet-node] display_name={server.display_name}")
|
||||
print(f"[meet-node] listening on ws://{args.host}:{args.port}")
|
||||
print(f"[meet-node] token (copy to gateway): {token}")
|
||||
print(f"[meet-node] approve with:")
|
||||
print(f" hermes meet node approve <name> ws://<host>:{args.port} {token}")
|
||||
try:
|
||||
asyncio.run(server.serve())
|
||||
except KeyboardInterrupt:
|
||||
return 0
|
||||
except RuntimeError as exc:
|
||||
print(f"[meet-node] error: {exc}", file=sys.stderr)
|
||||
return 2
|
||||
return 0
|
||||
|
||||
reg = NodeRegistry()
|
||||
|
||||
if cmd == "list":
|
||||
nodes = reg.list_all()
|
||||
if not nodes:
|
||||
print("no nodes registered")
|
||||
return 0
|
||||
for n in nodes:
|
||||
print(f"{n['name']}\t{n['url']}\ttoken={n['token'][:6]}…")
|
||||
return 0
|
||||
|
||||
if cmd == "approve":
|
||||
reg.add(args.name, args.url, args.token)
|
||||
print(f"approved node {args.name!r} at {args.url}")
|
||||
return 0
|
||||
|
||||
if cmd == "remove":
|
||||
ok = reg.remove(args.name)
|
||||
print(f"removed {args.name!r}" if ok else f"no such node: {args.name!r}")
|
||||
return 0 if ok else 1
|
||||
|
||||
if cmd in ("status", "ping"):
|
||||
entry = reg.get(args.name)
|
||||
if entry is None:
|
||||
print(f"no such node: {args.name!r}", file=sys.stderr)
|
||||
return 1
|
||||
client = NodeClient(entry["url"], entry["token"])
|
||||
try:
|
||||
result = client.ping()
|
||||
except Exception as exc: # noqa: BLE001 — surface any connection error
|
||||
print(json.dumps({"ok": False, "error": str(exc)}))
|
||||
return 1
|
||||
print(json.dumps({"ok": True, "node": args.name, **_coerce_dict(result)}))
|
||||
return 0
|
||||
|
||||
print(f"unknown node command: {cmd!r}", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
|
||||
def _coerce_dict(value: Any) -> dict:
|
||||
return value if isinstance(value, dict) else {"result": value}
|
||||
107
plugins/google_meet/node/client.py
Normal file
107
plugins/google_meet/node/client.py
Normal file
@@ -0,0 +1,107 @@
|
||||
"""Gateway-side RPC client for a remote meet node.
|
||||
|
||||
Each call opens a short-lived synchronous WebSocket to the node, sends
|
||||
exactly one request, reads exactly one response, and closes. This keeps
|
||||
the client trivial to use from non-async tool handlers and avoids
|
||||
maintaining persistent connection state across agent turns.
|
||||
|
||||
The ``websockets`` package is an optional dep — we import it lazily so
|
||||
plugin load doesn't require it.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from plugins.google_meet.node import protocol as _proto
|
||||
|
||||
|
||||
class NodeClient:
|
||||
"""Thin synchronous WS client matching the server's request surface."""
|
||||
|
||||
def __init__(self, url: str, token: str, timeout: float = 10.0) -> None:
|
||||
if not isinstance(url, str) or not url:
|
||||
raise ValueError("url must be a non-empty string")
|
||||
if not isinstance(token, str) or not token:
|
||||
raise ValueError("token must be a non-empty string")
|
||||
self.url = url
|
||||
self.token = token
|
||||
self.timeout = float(timeout)
|
||||
|
||||
# ----- core RPC -----------------------------------------------------
|
||||
|
||||
def _rpc(self, type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Send one request, return the response payload dict.
|
||||
|
||||
Raises RuntimeError when the server sends an ``error`` envelope
|
||||
or the response id doesn't match.
|
||||
"""
|
||||
try:
|
||||
from websockets.sync.client import connect # type: ignore
|
||||
except ImportError as exc:
|
||||
raise RuntimeError(
|
||||
"NodeClient requires the 'websockets' package. "
|
||||
"Install it with: pip install websockets"
|
||||
) from exc
|
||||
|
||||
req = _proto.make_request(type, self.token, payload)
|
||||
raw_out = _proto.encode(req)
|
||||
|
||||
with connect(self.url, open_timeout=self.timeout,
|
||||
close_timeout=self.timeout) as ws:
|
||||
ws.send(raw_out)
|
||||
raw_in = ws.recv(timeout=self.timeout)
|
||||
|
||||
if isinstance(raw_in, (bytes, bytearray)):
|
||||
raw_in = raw_in.decode("utf-8")
|
||||
resp = _proto.decode(raw_in)
|
||||
|
||||
if resp.get("type") == "error":
|
||||
raise RuntimeError(f"node error: {resp.get('error', '<unknown>')}")
|
||||
if resp.get("id") != req["id"]:
|
||||
raise RuntimeError(
|
||||
f"response id mismatch: sent {req['id']}, got {resp.get('id')!r}"
|
||||
)
|
||||
payload_out = resp.get("payload")
|
||||
if not isinstance(payload_out, dict):
|
||||
# Ping returns {"type": "pong", "payload": {...}} — still a dict.
|
||||
raise RuntimeError("response missing payload dict")
|
||||
return payload_out
|
||||
|
||||
# ----- convenience methods -----------------------------------------
|
||||
|
||||
def start_bot(
|
||||
self,
|
||||
url: str,
|
||||
guest_name: str = "Hermes Agent",
|
||||
duration: Optional[str] = None,
|
||||
headed: bool = False,
|
||||
mode: str = "transcribe",
|
||||
) -> Dict[str, Any]:
|
||||
payload: Dict[str, Any] = {
|
||||
"url": url,
|
||||
"guest_name": guest_name,
|
||||
"headed": bool(headed),
|
||||
"mode": mode,
|
||||
}
|
||||
if duration is not None:
|
||||
payload["duration"] = duration
|
||||
return self._rpc("start_bot", payload)
|
||||
|
||||
def stop(self) -> Dict[str, Any]:
|
||||
return self._rpc("stop", {})
|
||||
|
||||
def status(self) -> Dict[str, Any]:
|
||||
return self._rpc("status", {})
|
||||
|
||||
def transcript(self, last: Optional[int] = None) -> Dict[str, Any]:
|
||||
payload: Dict[str, Any] = {}
|
||||
if last is not None:
|
||||
payload["last"] = int(last)
|
||||
return self._rpc("transcript", payload)
|
||||
|
||||
def say(self, text: str) -> Dict[str, Any]:
|
||||
return self._rpc("say", {"text": str(text)})
|
||||
|
||||
def ping(self) -> Dict[str, Any]:
|
||||
return self._rpc("ping", {})
|
||||
124
plugins/google_meet/node/protocol.py
Normal file
124
plugins/google_meet/node/protocol.py
Normal file
@@ -0,0 +1,124 @@
|
||||
"""Wire protocol for gateway ↔ node RPC.
|
||||
|
||||
Everything is a JSON object with the same envelope shape:
|
||||
|
||||
Request: {"type": <str>, "id": <str>, "token": <str>, "payload": <dict>}
|
||||
Response: {"type": "<req-type>_res", "id": <req-id>, "payload": <dict>}
|
||||
Error: {"type": "error", "id": <req-id>, "error": <str>}
|
||||
|
||||
Requests must carry the shared bearer token (set up via
|
||||
``hermes meet node approve`` on the gateway and read off disk on the
|
||||
server). Mismatched tokens are rejected before dispatch.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import uuid
|
||||
from typing import Any, Dict, Tuple
|
||||
|
||||
|
||||
VALID_REQUEST_TYPES = frozenset({
|
||||
"start_bot",
|
||||
"stop",
|
||||
"status",
|
||||
"transcript",
|
||||
"say",
|
||||
"ping",
|
||||
})
|
||||
|
||||
|
||||
def make_request(
|
||||
type: str,
|
||||
token: str,
|
||||
payload: Dict[str, Any],
|
||||
req_id: str | None = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""Construct a request envelope.
|
||||
|
||||
``req_id`` is auto-generated (uuid4 hex) when not supplied so callers
|
||||
can correlate async responses.
|
||||
"""
|
||||
if not isinstance(type, str) or not type:
|
||||
raise ValueError("type must be a non-empty string")
|
||||
if type not in VALID_REQUEST_TYPES:
|
||||
raise ValueError(f"unknown request type: {type!r}")
|
||||
if not isinstance(token, str):
|
||||
raise ValueError("token must be a string")
|
||||
if not isinstance(payload, dict):
|
||||
raise ValueError("payload must be a dict")
|
||||
return {
|
||||
"type": type,
|
||||
"id": req_id or uuid.uuid4().hex,
|
||||
"token": token,
|
||||
"payload": payload,
|
||||
}
|
||||
|
||||
|
||||
def make_response(req_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Build a success response. The caller supplies the *request* type;
|
||||
we suffix it with ``_res`` so clients can assert they got the right
|
||||
reply.
|
||||
|
||||
For simplicity we don't require the type here — clients usually just
|
||||
key off ``id``. But we still emit a generic ``*_res`` envelope.
|
||||
"""
|
||||
if not isinstance(payload, dict):
|
||||
raise ValueError("payload must be a dict")
|
||||
return {"type": "response", "id": req_id, "payload": payload}
|
||||
|
||||
|
||||
def make_error(req_id: str, error: str) -> Dict[str, Any]:
|
||||
return {"type": "error", "id": req_id, "error": str(error)}
|
||||
|
||||
|
||||
def encode(msg: Dict[str, Any]) -> str:
|
||||
"""Serialize a message envelope to a JSON string."""
|
||||
return json.dumps(msg, separators=(",", ":"), ensure_ascii=False)
|
||||
|
||||
|
||||
def decode(raw: str) -> Dict[str, Any]:
|
||||
"""Parse a JSON envelope, raising ValueError on anything malformed.
|
||||
|
||||
Minimal type validation: must be an object, must contain ``type`` and
|
||||
``id``. Heavier validation (token match, payload shape) happens in
|
||||
:func:`validate_request` on the server side.
|
||||
"""
|
||||
try:
|
||||
obj = json.loads(raw)
|
||||
except (TypeError, json.JSONDecodeError) as exc:
|
||||
raise ValueError(f"malformed JSON: {exc}") from exc
|
||||
if not isinstance(obj, dict):
|
||||
raise ValueError("envelope must be a JSON object")
|
||||
if "type" not in obj or not isinstance(obj["type"], str):
|
||||
raise ValueError("envelope missing string 'type'")
|
||||
if "id" not in obj or not isinstance(obj["id"], str):
|
||||
raise ValueError("envelope missing string 'id'")
|
||||
return obj
|
||||
|
||||
|
||||
def validate_request(msg: Dict[str, Any], expected_token: str) -> Tuple[bool, str]:
|
||||
"""Check a decoded request against the server's shared token.
|
||||
|
||||
Returns ``(True, "")`` when the envelope is acceptable or
|
||||
``(False, <reason>)`` otherwise. Reason strings are safe to surface
|
||||
back to the client in an error envelope.
|
||||
"""
|
||||
if not isinstance(msg, dict):
|
||||
return False, "envelope must be a dict"
|
||||
t = msg.get("type")
|
||||
if not isinstance(t, str) or not t:
|
||||
return False, "missing or non-string 'type'"
|
||||
if t not in VALID_REQUEST_TYPES:
|
||||
return False, f"unknown request type: {t!r}"
|
||||
if not isinstance(msg.get("id"), str) or not msg.get("id"):
|
||||
return False, "missing or non-string 'id'"
|
||||
token = msg.get("token")
|
||||
if not isinstance(token, str) or not token:
|
||||
return False, "missing token"
|
||||
if token != expected_token:
|
||||
return False, "token mismatch"
|
||||
payload = msg.get("payload")
|
||||
if not isinstance(payload, dict):
|
||||
return False, "payload must be a dict"
|
||||
return True, ""
|
||||
112
plugins/google_meet/node/registry.py
Normal file
112
plugins/google_meet/node/registry.py
Normal file
@@ -0,0 +1,112 @@
|
||||
"""Local JSON registry of approved remote meet nodes.
|
||||
|
||||
Lives at ``$HERMES_HOME/workspace/meetings/nodes.json``. The gateway
|
||||
consults it to resolve a ``chrome_node`` name to a ``(url, token)`` pair
|
||||
before opening a WebSocket to the remote bot host.
|
||||
|
||||
Schema
|
||||
------
|
||||
{
|
||||
"nodes": {
|
||||
"<name>": {
|
||||
"url": "ws://host:port",
|
||||
"token": "...",
|
||||
"added_at": <epoch_float>
|
||||
}
|
||||
}
|
||||
}
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from hermes_constants import get_hermes_home
|
||||
|
||||
|
||||
def _default_path() -> Path:
|
||||
return Path(get_hermes_home()) / "workspace" / "meetings" / "nodes.json"
|
||||
|
||||
|
||||
class NodeRegistry:
|
||||
"""Simple file-backed registry. Not concurrent-safe across processes
|
||||
— single writer assumed (the gateway CLI)."""
|
||||
|
||||
def __init__(self, path: Optional[Path] = None) -> None:
|
||||
self.path = Path(path) if path is not None else _default_path()
|
||||
|
||||
# ----- storage ------------------------------------------------------
|
||||
|
||||
def _load(self) -> Dict[str, Any]:
|
||||
if not self.path.is_file():
|
||||
return {"nodes": {}}
|
||||
try:
|
||||
data = json.loads(self.path.read_text(encoding="utf-8"))
|
||||
except (OSError, json.JSONDecodeError):
|
||||
return {"nodes": {}}
|
||||
if not isinstance(data, dict) or not isinstance(data.get("nodes"), dict):
|
||||
return {"nodes": {}}
|
||||
return data
|
||||
|
||||
def _save(self, data: Dict[str, Any]) -> None:
|
||||
self.path.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = self.path.with_suffix(".json.tmp")
|
||||
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
|
||||
tmp.replace(self.path)
|
||||
|
||||
# ----- public API ---------------------------------------------------
|
||||
|
||||
def get(self, name: str) -> Optional[Dict[str, Any]]:
|
||||
data = self._load()
|
||||
entry = data["nodes"].get(name)
|
||||
if entry is None:
|
||||
return None
|
||||
return {"name": name, **entry}
|
||||
|
||||
def add(self, name: str, url: str, token: str) -> None:
|
||||
if not isinstance(name, str) or not name:
|
||||
raise ValueError("node name must be a non-empty string")
|
||||
if not isinstance(url, str) or not url:
|
||||
raise ValueError("url must be a non-empty string")
|
||||
if not isinstance(token, str) or not token:
|
||||
raise ValueError("token must be a non-empty string")
|
||||
data = self._load()
|
||||
data["nodes"][name] = {
|
||||
"url": url,
|
||||
"token": token,
|
||||
"added_at": time.time(),
|
||||
}
|
||||
self._save(data)
|
||||
|
||||
def remove(self, name: str) -> bool:
|
||||
data = self._load()
|
||||
if name in data["nodes"]:
|
||||
del data["nodes"][name]
|
||||
self._save(data)
|
||||
return True
|
||||
return False
|
||||
|
||||
def list_all(self) -> List[Dict[str, Any]]:
|
||||
data = self._load()
|
||||
out: List[Dict[str, Any]] = []
|
||||
for name, entry in sorted(data["nodes"].items()):
|
||||
out.append({"name": name, **entry})
|
||||
return out
|
||||
|
||||
def resolve(self, chrome_node: Optional[str]) -> Optional[Dict[str, Any]]:
|
||||
"""Resolve a node name to its entry.
|
||||
|
||||
If ``chrome_node`` is provided, return that named node (or None).
|
||||
If ``chrome_node`` is None, return the sole registered node when
|
||||
exactly one is registered; otherwise return None (ambiguous or
|
||||
empty).
|
||||
"""
|
||||
if chrome_node:
|
||||
return self.get(chrome_node)
|
||||
nodes = self.list_all()
|
||||
if len(nodes) == 1:
|
||||
return nodes[0]
|
||||
return None
|
||||
193
plugins/google_meet/node/server.py
Normal file
193
plugins/google_meet/node/server.py
Normal file
@@ -0,0 +1,193 @@
|
||||
"""Remote node server.
|
||||
|
||||
Runs on the machine that will host the Meet bot (typically the user's
|
||||
Mac laptop with a signed-in Chrome). Exposes a WebSocket endpoint that
|
||||
accepts signed RPC requests and dispatches them to the existing
|
||||
``plugins.google_meet.process_manager`` module.
|
||||
|
||||
Launched by ``hermes meet node run``.
|
||||
|
||||
Token handling
|
||||
--------------
|
||||
On first boot we mint 32 hex chars of entropy and persist them at
|
||||
``$HERMES_HOME/workspace/meetings/node_token.json``. Subsequent boots
|
||||
reuse the same token so previously-approved gateways don't need to be
|
||||
re-paired. The operator copies this token out-of-band to the gateway
|
||||
via ``hermes meet node approve <name> <url> <token>``.
|
||||
|
||||
Dependencies
|
||||
------------
|
||||
``websockets`` is an optional dep. We import it lazily inside
|
||||
:meth:`serve` so installing the plugin doesn't require it unless you
|
||||
actually host a node.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import secrets
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from hermes_constants import get_hermes_home
|
||||
from plugins.google_meet.node import protocol as _proto
|
||||
|
||||
|
||||
def _default_token_path() -> Path:
|
||||
return Path(get_hermes_home()) / "workspace" / "meetings" / "node_token.json"
|
||||
|
||||
|
||||
class NodeServer:
|
||||
"""WebSocket server that executes meet bot RPCs locally."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
host: str = "0.0.0.0",
|
||||
port: int = 18789,
|
||||
token_path: Optional[Path] = None,
|
||||
display_name: str = "hermes-meet-node",
|
||||
) -> None:
|
||||
self.host = host
|
||||
self.port = port
|
||||
self.display_name = display_name
|
||||
self.token_path = Path(token_path) if token_path is not None else _default_token_path()
|
||||
self._token: Optional[str] = None
|
||||
|
||||
# ----- token management --------------------------------------------
|
||||
|
||||
def ensure_token(self) -> str:
|
||||
"""Return the persisted shared secret, generating one on first use."""
|
||||
if self._token:
|
||||
return self._token
|
||||
if self.token_path.is_file():
|
||||
try:
|
||||
data = json.loads(self.token_path.read_text(encoding="utf-8"))
|
||||
tok = data.get("token")
|
||||
if isinstance(tok, str) and tok:
|
||||
self._token = tok
|
||||
return tok
|
||||
except (OSError, json.JSONDecodeError):
|
||||
pass
|
||||
tok = secrets.token_hex(16) # 32 hex chars
|
||||
self.token_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = self.token_path.with_suffix(".json.tmp")
|
||||
tmp.write_text(
|
||||
json.dumps({"token": tok, "generated_at": time.time()}, indent=2),
|
||||
encoding="utf-8",
|
||||
)
|
||||
tmp.replace(self.token_path)
|
||||
self._token = tok
|
||||
return tok
|
||||
|
||||
def get_token(self) -> str:
|
||||
"""Alias for :meth:`ensure_token`; does not mutate on subsequent calls."""
|
||||
return self.ensure_token()
|
||||
|
||||
# ----- dispatch -----------------------------------------------------
|
||||
|
||||
async def _handle_request(self, msg: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Validate + dispatch a single decoded request envelope.
|
||||
|
||||
Always returns a response envelope (success or error); never
|
||||
raises. Errors from inside the process_manager are wrapped into
|
||||
the response payload's ``ok``/``error`` keys (which pm already
|
||||
does) rather than being re-encoded as error envelopes — the
|
||||
envelope-level error channel is reserved for auth / protocol
|
||||
failures.
|
||||
"""
|
||||
expected = self.ensure_token()
|
||||
ok, reason = _proto.validate_request(msg, expected)
|
||||
if not ok:
|
||||
return _proto.make_error(str(msg.get("id") or ""), reason)
|
||||
|
||||
req_id = msg["id"]
|
||||
t = msg["type"]
|
||||
payload = msg["payload"]
|
||||
|
||||
# Import lazily so test mocks can monkeypatch freely.
|
||||
from plugins.google_meet import process_manager as pm
|
||||
|
||||
try:
|
||||
if t == "ping":
|
||||
return {"type": "pong", "id": req_id,
|
||||
"payload": {"display_name": self.display_name,
|
||||
"ts": time.time()}}
|
||||
if t == "start_bot":
|
||||
# Whitelist kwargs we pass through to pm.start.
|
||||
kwargs = {
|
||||
k: payload[k]
|
||||
for k in ("url", "guest_name", "duration", "headed",
|
||||
"auth_state", "session_id", "out_dir")
|
||||
if k in payload
|
||||
}
|
||||
if "url" not in kwargs:
|
||||
return _proto.make_error(req_id, "missing 'url' in payload")
|
||||
result = pm.start(**kwargs)
|
||||
return _proto.make_response(req_id, result)
|
||||
if t == "stop":
|
||||
reason_arg = payload.get("reason", "requested")
|
||||
result = pm.stop(reason=reason_arg)
|
||||
return _proto.make_response(req_id, result)
|
||||
if t == "status":
|
||||
return _proto.make_response(req_id, pm.status())
|
||||
if t == "transcript":
|
||||
last = payload.get("last")
|
||||
result = pm.transcript(last=last)
|
||||
return _proto.make_response(req_id, result)
|
||||
if t == "say":
|
||||
# v2 wiring: enqueue into say_queue.jsonl inside the
|
||||
# active meeting's out_dir when present. The bot-side
|
||||
# consumer is v3+ (for v1 this is a stub returning ok).
|
||||
text = payload.get("text", "")
|
||||
active = pm._read_active() # type: ignore[attr-defined]
|
||||
enqueued = False
|
||||
if active and active.get("out_dir"):
|
||||
queue = Path(active["out_dir"]) / "say_queue.jsonl"
|
||||
try:
|
||||
queue.parent.mkdir(parents=True, exist_ok=True)
|
||||
with queue.open("a", encoding="utf-8") as fh:
|
||||
fh.write(json.dumps({"text": text, "ts": time.time()}) + "\n")
|
||||
enqueued = True
|
||||
except OSError:
|
||||
enqueued = False
|
||||
return _proto.make_response(
|
||||
req_id,
|
||||
{"ok": True, "enqueued": enqueued, "text": text},
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001 — surface any pm crash to client
|
||||
return _proto.make_error(req_id, f"{type(exc).__name__}: {exc}")
|
||||
|
||||
return _proto.make_error(req_id, f"unhandled type: {t!r}")
|
||||
|
||||
# ----- server loop --------------------------------------------------
|
||||
|
||||
async def serve(self) -> None:
|
||||
"""Run the WebSocket server until cancelled.
|
||||
|
||||
Blocks forever. Callers typically wrap this in ``asyncio.run``.
|
||||
"""
|
||||
try:
|
||||
import websockets # type: ignore
|
||||
except ImportError as exc:
|
||||
raise RuntimeError(
|
||||
"NodeServer.serve requires the 'websockets' package. "
|
||||
"Install it with: pip install websockets"
|
||||
) from exc
|
||||
|
||||
self.ensure_token()
|
||||
|
||||
async def _handler(ws):
|
||||
async for raw in ws:
|
||||
try:
|
||||
msg = _proto.decode(raw if isinstance(raw, str) else raw.decode("utf-8"))
|
||||
except ValueError as exc:
|
||||
await ws.send(_proto.encode(_proto.make_error("", f"decode: {exc}")))
|
||||
continue
|
||||
reply = await self._handle_request(msg)
|
||||
await ws.send(_proto.encode(reply))
|
||||
|
||||
async with websockets.serve(_handler, self.host, self.port):
|
||||
# Run until cancelled.
|
||||
import asyncio
|
||||
await asyncio.Future()
|
||||
16
plugins/google_meet/plugin.yaml
Normal file
16
plugins/google_meet/plugin.yaml
Normal file
@@ -0,0 +1,16 @@
|
||||
name: google_meet
|
||||
version: 0.2.0
|
||||
description: "Join a Google Meet call, transcribe live captions, speak in realtime, and follow up afterwards. v1 transcribe-only is the default; v2 realtime duplex audio via OpenAI Realtime + BlackHole/PulseAudio ships with mode='realtime'; v3 remote node host lets the bot run on a different machine than the gateway (gateway on Linux, Chrome+signed-in profile on the user's Mac). Explicit-by-design: only joins meet.google.com URLs passed in \u2014 no calendar scanning, no auto-dial."
|
||||
author: NousResearch
|
||||
kind: standalone
|
||||
platforms:
|
||||
- linux
|
||||
- macos
|
||||
provides_tools:
|
||||
- meet_join
|
||||
- meet_leave
|
||||
- meet_status
|
||||
- meet_transcript
|
||||
- meet_say
|
||||
hooks:
|
||||
- on_session_end
|
||||
326
plugins/google_meet/process_manager.py
Normal file
326
plugins/google_meet/process_manager.py
Normal file
@@ -0,0 +1,326 @@
|
||||
"""Subprocess lifecycle manager for the google_meet bot.
|
||||
|
||||
Single active meeting at a time. Stores the running pid + out_dir in a
|
||||
session-scoped state file under ``$HERMES_HOME/workspace/meetings/.active.json``
|
||||
so tool calls across turns can find the bot, and ``on_session_end`` can clean
|
||||
it up.
|
||||
|
||||
The bot runs as a detached subprocess — we don't hold file descriptors open,
|
||||
so the parent agent loop can't block on it. We communicate via files only.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import signal
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from hermes_constants import get_hermes_home
|
||||
|
||||
# File + directory layout (under $HERMES_HOME):
|
||||
#
|
||||
# workspace/meetings/
|
||||
# .active.json # pointer to current session's bot
|
||||
# <meeting-id>/
|
||||
# status.json # live bot state (written by bot each tick)
|
||||
# transcript.txt # scraped captions
|
||||
#
|
||||
# .active.json holds:
|
||||
# {"pid": 12345, "meeting_id": "abc-defg-hij", "out_dir": "...",
|
||||
# "url": "https://meet.google.com/...", "started_at": 1714159200.0,
|
||||
# "session_id": "optional"}
|
||||
|
||||
|
||||
def _root() -> Path:
|
||||
return Path(get_hermes_home()) / "workspace" / "meetings"
|
||||
|
||||
|
||||
def _active_file() -> Path:
|
||||
return _root() / ".active.json"
|
||||
|
||||
|
||||
def _read_active() -> Optional[Dict[str, Any]]:
|
||||
p = _active_file()
|
||||
if not p.is_file():
|
||||
return None
|
||||
try:
|
||||
return json.loads(p.read_text(encoding="utf-8"))
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _write_active(data: Dict[str, Any]) -> None:
|
||||
p = _active_file()
|
||||
p.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = p.with_suffix(".json.tmp")
|
||||
tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
|
||||
tmp.replace(p)
|
||||
|
||||
|
||||
def _clear_active() -> None:
|
||||
try:
|
||||
_active_file().unlink()
|
||||
except FileNotFoundError:
|
||||
pass
|
||||
|
||||
|
||||
def _pid_alive(pid: int) -> bool:
|
||||
try:
|
||||
os.kill(pid, 0)
|
||||
except ProcessLookupError:
|
||||
return False
|
||||
except PermissionError:
|
||||
# Process exists but we can't signal it — treat as alive.
|
||||
return True
|
||||
return True
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API — used by tool handlers + CLI
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def start(
|
||||
url: str,
|
||||
*,
|
||||
out_dir: Optional[Path] = None,
|
||||
headed: bool = False,
|
||||
auth_state: Optional[str] = None,
|
||||
guest_name: str = "Hermes Agent",
|
||||
duration: Optional[str] = None,
|
||||
session_id: Optional[str] = None,
|
||||
mode: str = "transcribe",
|
||||
realtime_model: Optional[str] = None,
|
||||
realtime_voice: Optional[str] = None,
|
||||
realtime_instructions: Optional[str] = None,
|
||||
realtime_api_key: Optional[str] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""Spawn the meet_bot subprocess for *url*.
|
||||
|
||||
If a bot is already running for this hermes install, leave it first —
|
||||
we enforce single-active-meeting semantics.
|
||||
|
||||
Returns a dict summarizing the started bot.
|
||||
"""
|
||||
from plugins.google_meet.meet_bot import _is_safe_meet_url, _meeting_id_from_url
|
||||
|
||||
if not _is_safe_meet_url(url):
|
||||
return {
|
||||
"ok": False,
|
||||
"error": (
|
||||
"refusing: only https://meet.google.com/ URLs are allowed. "
|
||||
"got: " + repr(url)
|
||||
),
|
||||
}
|
||||
|
||||
existing = _read_active()
|
||||
if existing and _pid_alive(int(existing.get("pid", 0))):
|
||||
stop(reason="replaced by new meet_join")
|
||||
|
||||
meeting_id = _meeting_id_from_url(url)
|
||||
out = out_dir or (_root() / meeting_id)
|
||||
out.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Wipe any stale transcript/status files from a previous run of this
|
||||
# meeting id so polling isn't confused.
|
||||
for name in ("transcript.txt", "status.json"):
|
||||
f = out / name
|
||||
if f.exists():
|
||||
try:
|
||||
f.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
env = os.environ.copy()
|
||||
env["HERMES_MEET_URL"] = url
|
||||
env["HERMES_MEET_OUT_DIR"] = str(out)
|
||||
env["HERMES_MEET_GUEST_NAME"] = guest_name
|
||||
if headed:
|
||||
env["HERMES_MEET_HEADED"] = "1"
|
||||
if auth_state:
|
||||
env["HERMES_MEET_AUTH_STATE"] = auth_state
|
||||
if duration:
|
||||
env["HERMES_MEET_DURATION"] = duration
|
||||
# v2: realtime mode + passthroughs. The bot defaults to transcribe
|
||||
# mode if HERMES_MEET_MODE isn't set, matching v1 behavior.
|
||||
if mode:
|
||||
env["HERMES_MEET_MODE"] = mode
|
||||
if realtime_model:
|
||||
env["HERMES_MEET_REALTIME_MODEL"] = realtime_model
|
||||
if realtime_voice:
|
||||
env["HERMES_MEET_REALTIME_VOICE"] = realtime_voice
|
||||
if realtime_instructions:
|
||||
env["HERMES_MEET_REALTIME_INSTRUCTIONS"] = realtime_instructions
|
||||
if realtime_api_key:
|
||||
env["HERMES_MEET_REALTIME_KEY"] = realtime_api_key
|
||||
|
||||
log_path = out / "bot.log"
|
||||
# Detach: stdin=devnull, stdout/stderr → log file, new session so parent
|
||||
# signals don't propagate.
|
||||
log_fh = open(log_path, "ab", buffering=0)
|
||||
try:
|
||||
proc = subprocess.Popen(
|
||||
[sys.executable, "-m", "plugins.google_meet.meet_bot"],
|
||||
stdin=subprocess.DEVNULL,
|
||||
stdout=log_fh,
|
||||
stderr=subprocess.STDOUT,
|
||||
env=env,
|
||||
start_new_session=True,
|
||||
close_fds=True,
|
||||
)
|
||||
finally:
|
||||
# The subprocess now owns the log fd; we can close ours.
|
||||
log_fh.close()
|
||||
|
||||
record = {
|
||||
"pid": proc.pid,
|
||||
"meeting_id": meeting_id,
|
||||
"out_dir": str(out),
|
||||
"url": url,
|
||||
"started_at": time.time(),
|
||||
"session_id": session_id,
|
||||
"log_path": str(log_path),
|
||||
"mode": mode,
|
||||
}
|
||||
_write_active(record)
|
||||
return {"ok": True, **record}
|
||||
|
||||
|
||||
def status() -> Dict[str, Any]:
|
||||
"""Return the current meeting state, or ``{"ok": False, "reason": ...}``."""
|
||||
active = _read_active()
|
||||
if not active:
|
||||
return {"ok": False, "reason": "no active meeting"}
|
||||
|
||||
pid = int(active.get("pid", 0))
|
||||
alive = _pid_alive(pid) if pid else False
|
||||
|
||||
status_path = Path(active.get("out_dir", "")) / "status.json"
|
||||
bot_status: Dict[str, Any] = {}
|
||||
if status_path.is_file():
|
||||
try:
|
||||
bot_status = json.loads(status_path.read_text(encoding="utf-8"))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return {
|
||||
"ok": True,
|
||||
"alive": alive,
|
||||
"pid": pid,
|
||||
"meetingId": active.get("meeting_id"),
|
||||
"url": active.get("url"),
|
||||
"startedAt": active.get("started_at"),
|
||||
"outDir": active.get("out_dir"),
|
||||
**bot_status,
|
||||
}
|
||||
|
||||
|
||||
def transcript(last: Optional[int] = None) -> Dict[str, Any]:
|
||||
"""Read the current transcript file. Returns ok=False if none exists."""
|
||||
active = _read_active()
|
||||
if not active:
|
||||
return {"ok": False, "reason": "no active meeting"}
|
||||
|
||||
tp = Path(active.get("out_dir", "")) / "transcript.txt"
|
||||
if not tp.is_file():
|
||||
return {
|
||||
"ok": True,
|
||||
"meetingId": active.get("meeting_id"),
|
||||
"lines": [],
|
||||
"total": 0,
|
||||
"path": str(tp),
|
||||
}
|
||||
text = tp.read_text(encoding="utf-8", errors="replace")
|
||||
all_lines = [ln for ln in text.splitlines() if ln.strip()]
|
||||
lines = all_lines[-last:] if last else all_lines
|
||||
return {
|
||||
"ok": True,
|
||||
"meetingId": active.get("meeting_id"),
|
||||
"lines": lines,
|
||||
"total": len(all_lines),
|
||||
"path": str(tp),
|
||||
}
|
||||
|
||||
|
||||
def enqueue_say(text: str) -> Dict[str, Any]:
|
||||
"""Append a ``say`` request to the active bot's JSONL queue.
|
||||
|
||||
Returns ``{"ok": False, "reason": ...}`` when no meeting is active or
|
||||
the active bot is in transcribe-only mode. Otherwise writes a line to
|
||||
``<out_dir>/say_queue.jsonl`` that the bot's realtime speaker thread
|
||||
will consume.
|
||||
"""
|
||||
import uuid
|
||||
|
||||
text = (text or "").strip()
|
||||
if not text:
|
||||
return {"ok": False, "reason": "text is required"}
|
||||
|
||||
active = _read_active()
|
||||
if not active:
|
||||
return {"ok": False, "reason": "no active meeting"}
|
||||
if active.get("mode") != "realtime":
|
||||
return {
|
||||
"ok": False,
|
||||
"reason": (
|
||||
"active meeting is in transcribe mode — pass mode='realtime' "
|
||||
"to meet_join to enable agent speech"
|
||||
),
|
||||
}
|
||||
|
||||
out_dir = Path(active.get("out_dir", ""))
|
||||
if not out_dir.is_dir():
|
||||
return {"ok": False, "reason": f"out_dir missing: {out_dir}"}
|
||||
|
||||
queue_path = out_dir / "say_queue.jsonl"
|
||||
entry = {"id": uuid.uuid4().hex[:12], "text": text}
|
||||
with queue_path.open("a", encoding="utf-8") as f:
|
||||
f.write(json.dumps(entry) + "\n")
|
||||
return {
|
||||
"ok": True,
|
||||
"meetingId": active.get("meeting_id"),
|
||||
"enqueued_id": entry["id"],
|
||||
"queue_path": str(queue_path),
|
||||
}
|
||||
|
||||
|
||||
def stop(*, reason: str = "requested") -> Dict[str, Any]:
|
||||
"""Signal the active bot to leave cleanly, then clear the active pointer.
|
||||
|
||||
Sends SIGTERM and waits up to 10s for the bot to exit. Falls back to
|
||||
SIGKILL if the bot doesn't respond.
|
||||
"""
|
||||
active = _read_active()
|
||||
if not active:
|
||||
return {"ok": False, "reason": "no active meeting"}
|
||||
|
||||
pid = int(active.get("pid", 0))
|
||||
out_dir = active.get("out_dir")
|
||||
transcript_path = Path(out_dir) / "transcript.txt" if out_dir else None
|
||||
|
||||
if pid and _pid_alive(pid):
|
||||
try:
|
||||
os.kill(pid, signal.SIGTERM)
|
||||
except ProcessLookupError:
|
||||
pass
|
||||
for _ in range(20):
|
||||
if not _pid_alive(pid):
|
||||
break
|
||||
time.sleep(0.5)
|
||||
if _pid_alive(pid):
|
||||
try:
|
||||
os.kill(pid, signal.SIGKILL)
|
||||
except ProcessLookupError:
|
||||
pass
|
||||
|
||||
_clear_active()
|
||||
return {
|
||||
"ok": True,
|
||||
"reason": reason,
|
||||
"meetingId": active.get("meeting_id"),
|
||||
"transcriptPath": str(transcript_path) if transcript_path else None,
|
||||
}
|
||||
10
plugins/google_meet/realtime/__init__.py
Normal file
10
plugins/google_meet/realtime/__init__.py
Normal file
@@ -0,0 +1,10 @@
|
||||
"""Realtime speech subpackage for the google_meet plugin (v2).
|
||||
|
||||
Provides a thin OpenAI Realtime API client and a file-queue speaker
|
||||
wrapper so the Meet bot can play synthesized speech through the
|
||||
virtual audio bridge.
|
||||
"""
|
||||
|
||||
from .openai_client import RealtimeSession, RealtimeSpeaker # noqa: F401
|
||||
|
||||
__all__ = ["RealtimeSession", "RealtimeSpeaker"]
|
||||
332
plugins/google_meet/realtime/openai_client.py
Normal file
332
plugins/google_meet/realtime/openai_client.py
Normal file
@@ -0,0 +1,332 @@
|
||||
"""OpenAI Realtime API WebSocket client + file-queue speaker.
|
||||
|
||||
This module is the "output" side of the v2 voice bridge: it takes text,
|
||||
sends it to the OpenAI Realtime API, receives audio deltas back, and
|
||||
appends the PCM bytes to a file. A separate consumer (the audio
|
||||
bridge) streams that file into Chrome's fake microphone.
|
||||
|
||||
Designed for simplicity: a single synchronous WebSocket connection per
|
||||
speaker, per session. The ``websockets`` package is imported lazily so
|
||||
that importing this module never fails just because the optional dep
|
||||
is missing.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
import json
|
||||
import time
|
||||
import uuid
|
||||
from pathlib import Path
|
||||
from typing import Any, Callable, Optional
|
||||
|
||||
|
||||
REALTIME_URL = "wss://api.openai.com/v1/realtime"
|
||||
|
||||
|
||||
def _require_websockets():
|
||||
"""Import ``websockets.sync.client.connect`` or raise with hint."""
|
||||
try:
|
||||
from websockets.sync.client import connect as _connect # type: ignore
|
||||
except ImportError as exc: # pragma: no cover - exercised via test
|
||||
raise RuntimeError(
|
||||
"websockets package is required for OpenAI Realtime; "
|
||||
"install with: pip install websockets"
|
||||
) from exc
|
||||
return _connect
|
||||
|
||||
|
||||
class RealtimeSession:
|
||||
"""Minimal sync client for the OpenAI Realtime WebSocket API.
|
||||
|
||||
Usage:
|
||||
sess = RealtimeSession(api_key=..., audio_sink_path=Path("out.pcm"))
|
||||
sess.connect()
|
||||
sess.speak("Hello team.")
|
||||
sess.close()
|
||||
|
||||
Thread safety: ``speak`` and ``cancel_response`` may be called from
|
||||
different threads; a lock serializes WebSocket writes.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
api_key: str,
|
||||
model: str = "gpt-realtime",
|
||||
voice: str = "alloy",
|
||||
instructions: str = "",
|
||||
audio_sink_path: Optional[Path] = None,
|
||||
sample_rate: int = 24000,
|
||||
) -> None:
|
||||
import threading as _threading
|
||||
self.api_key = api_key
|
||||
self.model = model
|
||||
self.voice = voice
|
||||
self.instructions = instructions
|
||||
self.audio_sink_path = Path(audio_sink_path) if audio_sink_path else None
|
||||
self.sample_rate = sample_rate
|
||||
self._ws: Any = None
|
||||
self._send_lock = _threading.Lock()
|
||||
self._last_response_id: Optional[str] = None
|
||||
# Public counters for status reporting.
|
||||
self.audio_bytes_out: int = 0
|
||||
self.last_audio_out_at: Optional[float] = None
|
||||
|
||||
# ── lifecycle ─────────────────────────────────────────────────────────
|
||||
|
||||
def connect(self) -> None:
|
||||
"""Open WS and send session.update with voice+instructions."""
|
||||
connect = _require_websockets()
|
||||
url = f"{REALTIME_URL}?model={self.model}"
|
||||
headers = [
|
||||
("Authorization", f"Bearer {self.api_key}"),
|
||||
("OpenAI-Beta", "realtime=v1"),
|
||||
]
|
||||
# websockets.sync.client.connect accepts either additional_headers=
|
||||
# (newer) or extra_headers= depending on version; try the newer
|
||||
# name first and fall back.
|
||||
try:
|
||||
self._ws = connect(url, additional_headers=headers)
|
||||
except TypeError:
|
||||
self._ws = connect(url, extra_headers=headers)
|
||||
|
||||
self._send_json(
|
||||
{
|
||||
"type": "session.update",
|
||||
"session": {
|
||||
"voice": self.voice,
|
||||
"instructions": self.instructions,
|
||||
"modalities": ["audio", "text"],
|
||||
"output_audio_format": "pcm16",
|
||||
"input_audio_format": "pcm16",
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
def close(self) -> None:
|
||||
if self._ws is not None:
|
||||
try:
|
||||
self._ws.close()
|
||||
except Exception:
|
||||
pass
|
||||
self._ws = None
|
||||
|
||||
# ── speaking ──────────────────────────────────────────────────────────
|
||||
|
||||
def speak(self, text: str, timeout: float = 30.0) -> dict:
|
||||
"""Send ``text`` and accumulate the audio response.
|
||||
|
||||
Audio deltas are base64-decoded and appended to
|
||||
``audio_sink_path`` (opened 'ab' and closed per call, so a
|
||||
separate streaming reader can consume whatever is there).
|
||||
"""
|
||||
if self._ws is None:
|
||||
raise RuntimeError("RealtimeSession.connect() must be called first")
|
||||
|
||||
start = time.monotonic()
|
||||
|
||||
self._send_json(
|
||||
{
|
||||
"type": "conversation.item.create",
|
||||
"item": {
|
||||
"type": "message",
|
||||
"role": "user",
|
||||
"content": [{"type": "input_text", "text": text}],
|
||||
},
|
||||
}
|
||||
)
|
||||
self._send_json(
|
||||
{
|
||||
"type": "response.create",
|
||||
"response": {"modalities": ["audio"]},
|
||||
}
|
||||
)
|
||||
|
||||
bytes_written = 0
|
||||
sink_fp = None
|
||||
if self.audio_sink_path is not None:
|
||||
self.audio_sink_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
sink_fp = open(self.audio_sink_path, "ab")
|
||||
|
||||
try:
|
||||
while True:
|
||||
remaining = timeout - (time.monotonic() - start)
|
||||
if remaining <= 0:
|
||||
raise TimeoutError(
|
||||
f"realtime response did not complete within {timeout}s"
|
||||
)
|
||||
raw = self._recv(timeout=remaining)
|
||||
if raw is None:
|
||||
# Connection closed by peer.
|
||||
break
|
||||
try:
|
||||
frame = json.loads(raw) if isinstance(raw, (str, bytes, bytearray)) else raw
|
||||
except (TypeError, ValueError):
|
||||
continue
|
||||
if not isinstance(frame, dict):
|
||||
continue
|
||||
ftype = frame.get("type")
|
||||
if ftype == "response.audio.delta":
|
||||
b64 = frame.get("delta") or frame.get("audio") or ""
|
||||
if b64 and sink_fp is not None:
|
||||
try:
|
||||
chunk = base64.b64decode(b64)
|
||||
except (ValueError, TypeError):
|
||||
chunk = b""
|
||||
if chunk:
|
||||
sink_fp.write(chunk)
|
||||
sink_fp.flush()
|
||||
bytes_written += len(chunk)
|
||||
self.audio_bytes_out += len(chunk)
|
||||
self.last_audio_out_at = time.time()
|
||||
elif ftype == "response.created":
|
||||
rid = (frame.get("response") or {}).get("id")
|
||||
if rid:
|
||||
self._last_response_id = rid
|
||||
elif ftype in ("response.done", "response.completed", "response.cancelled"):
|
||||
break
|
||||
elif ftype == "error":
|
||||
err = frame.get("error") or frame
|
||||
raise RuntimeError(f"realtime error: {err}")
|
||||
# All other frames (response.created, response.output_item.*,
|
||||
# response.audio_transcript.delta, rate_limits.updated, ...)
|
||||
# are ignored for v2.
|
||||
finally:
|
||||
if sink_fp is not None:
|
||||
sink_fp.close()
|
||||
|
||||
duration_ms = (time.monotonic() - start) * 1000.0
|
||||
return {
|
||||
"ok": True,
|
||||
"bytes_written": bytes_written,
|
||||
"duration_ms": duration_ms,
|
||||
}
|
||||
|
||||
# ── ws plumbing ───────────────────────────────────────────────────────
|
||||
|
||||
def cancel_response(self) -> bool:
|
||||
"""Interrupt the in-flight response (barge-in).
|
||||
|
||||
Sends ``response.cancel`` on the current WebSocket so the model
|
||||
stops generating audio immediately. Safe to call at any time;
|
||||
returns True if a cancel was actually sent, False when there's
|
||||
nothing to cancel or the socket isn't open.
|
||||
"""
|
||||
if self._ws is None:
|
||||
return False
|
||||
try:
|
||||
self._send_json({"type": "response.cancel"})
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
def _send_json(self, payload: dict) -> None:
|
||||
assert self._ws is not None
|
||||
with self._send_lock:
|
||||
self._ws.send(json.dumps(payload))
|
||||
|
||||
def _recv(self, timeout: Optional[float] = None):
|
||||
assert self._ws is not None
|
||||
try:
|
||||
if timeout is None:
|
||||
return self._ws.recv()
|
||||
return self._ws.recv(timeout=timeout)
|
||||
except TypeError:
|
||||
# Older websockets may not accept timeout kwarg.
|
||||
return self._ws.recv()
|
||||
|
||||
|
||||
class RealtimeSpeaker:
|
||||
"""File-based JSONL queue wrapper around :class:`RealtimeSession`.
|
||||
|
||||
Each line in ``queue_path`` is a JSON object of the form
|
||||
``{"id": "<uuid>", "text": "..."}``. Processed lines are appended
|
||||
to ``processed_path`` (if set) and then removed from the queue;
|
||||
if ``processed_path`` is ``None``, processed lines are simply
|
||||
dropped.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
session: RealtimeSession,
|
||||
queue_path: Path,
|
||||
processed_path: Optional[Path] = None,
|
||||
) -> None:
|
||||
self.session = session
|
||||
self.queue_path = Path(queue_path)
|
||||
self.processed_path = Path(processed_path) if processed_path else None
|
||||
|
||||
# ── helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
def _read_queue(self) -> list[dict]:
|
||||
if not self.queue_path.exists():
|
||||
return []
|
||||
out: list[dict] = []
|
||||
for line in self.queue_path.read_text().splitlines():
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
try:
|
||||
entry = json.loads(line)
|
||||
except ValueError:
|
||||
continue
|
||||
if not isinstance(entry, dict):
|
||||
continue
|
||||
if "id" not in entry:
|
||||
entry["id"] = str(uuid.uuid4())
|
||||
out.append(entry)
|
||||
return out
|
||||
|
||||
def _rewrite_queue(self, remaining: list[dict]) -> None:
|
||||
if not remaining:
|
||||
# Keep the file but empty — consumers may be watching for
|
||||
# new writes via mtime, and delete-then-recreate is a race.
|
||||
self.queue_path.write_text("")
|
||||
return
|
||||
self.queue_path.write_text(
|
||||
"\n".join(json.dumps(e) for e in remaining) + "\n"
|
||||
)
|
||||
|
||||
def _append_processed(self, entry: dict, result: dict) -> None:
|
||||
if self.processed_path is None:
|
||||
return
|
||||
self.processed_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
record = {"id": entry.get("id"), "text": entry.get("text", ""), "result": result}
|
||||
with open(self.processed_path, "a") as fp:
|
||||
fp.write(json.dumps(record) + "\n")
|
||||
|
||||
# ── main loop ────────────────────────────────────────────────────────
|
||||
|
||||
def run_until_stopped(
|
||||
self,
|
||||
stop_fn: Callable[[], bool],
|
||||
poll_interval: float = 0.5,
|
||||
) -> None:
|
||||
while not stop_fn():
|
||||
entries = self._read_queue()
|
||||
if not entries:
|
||||
time.sleep(poll_interval)
|
||||
continue
|
||||
# Process one at a time; re-check the queue file after each
|
||||
# speak() call because new entries may have arrived.
|
||||
head = entries[0]
|
||||
text = (head.get("text") or "").strip()
|
||||
if text:
|
||||
try:
|
||||
result = self.session.speak(text)
|
||||
except Exception as exc:
|
||||
result = {"ok": False, "error": str(exc)}
|
||||
else:
|
||||
result = {"ok": True, "bytes_written": 0, "duration_ms": 0.0}
|
||||
self._append_processed(head, result)
|
||||
|
||||
# Re-read the queue from disk in case it was appended to
|
||||
# while we were speaking, then drop the head.
|
||||
latest = self._read_queue()
|
||||
if latest and latest[0].get("id") == head.get("id"):
|
||||
self._rewrite_queue(latest[1:])
|
||||
else:
|
||||
# Fallback: drop-by-id anywhere in the queue.
|
||||
self._rewrite_queue(
|
||||
[e for e in latest if e.get("id") != head.get("id")]
|
||||
)
|
||||
348
plugins/google_meet/tools.py
Normal file
348
plugins/google_meet/tools.py
Normal file
@@ -0,0 +1,348 @@
|
||||
"""Agent-facing tools for the google_meet plugin.
|
||||
|
||||
Tools:
|
||||
meet_join — join a Google Meet URL (spawns Playwright bot locally
|
||||
OR on a remote node host via node=<name>)
|
||||
meet_status — report bot liveness + transcript progress
|
||||
meet_transcript — read the current transcript (optional last-N)
|
||||
meet_leave — signal the bot to leave cleanly
|
||||
meet_say — (v2) speak text through the realtime audio bridge.
|
||||
Requires the active meeting to have been joined with
|
||||
mode='realtime'.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from plugins.google_meet import process_manager as pm
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Runtime gate
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def check_meet_requirements() -> bool:
|
||||
"""Return True when the plugin can actually run LOCALLY.
|
||||
|
||||
Gates on:
|
||||
* Python ``playwright`` package importable
|
||||
* the plugin being on a supported platform (Linux or macOS)
|
||||
|
||||
Note: remote-node operation (``node=<name>``) only needs the
|
||||
``websockets`` dep on the gateway side — Chromium lives on the node.
|
||||
But the plugin-level gate keeps the v1 semantics; individual tool
|
||||
handlers relax the requirement when a node is addressed.
|
||||
"""
|
||||
import platform as _p
|
||||
if _p.system().lower() not in ("linux", "darwin"):
|
||||
return False
|
||||
try:
|
||||
import playwright # noqa: F401
|
||||
except ImportError:
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Node client helper
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _resolve_node_client(node: Optional[str]):
|
||||
"""Return (NodeClient, node_name) for *node*, or (None, None) to run local.
|
||||
|
||||
Raises RuntimeError with a readable message if the node is named but
|
||||
unresolvable, so the handler can surface a clear error to the agent.
|
||||
"""
|
||||
if node is None or node == "":
|
||||
return None, None
|
||||
from plugins.google_meet.node.registry import NodeRegistry
|
||||
from plugins.google_meet.node.client import NodeClient
|
||||
|
||||
reg = NodeRegistry()
|
||||
entry = reg.resolve(node if node != "auto" else None)
|
||||
if entry is None:
|
||||
raise RuntimeError(
|
||||
f"no registered meet node matches {node!r} — "
|
||||
"run `hermes meet node approve <name> <url> <token>` first"
|
||||
)
|
||||
client = NodeClient(url=entry["url"], token=entry["token"])
|
||||
return client, entry.get("name")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Schemas
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
MEET_JOIN_SCHEMA: Dict[str, Any] = {
|
||||
"name": "meet_join",
|
||||
"description": (
|
||||
"Join a Google Meet call and start scraping live captions into a "
|
||||
"transcript file. Only meet.google.com URLs are accepted; no calendar "
|
||||
"scanning, no auto-dial. Spawns a headless Chromium subprocess that "
|
||||
"runs in parallel with the agent loop — returns immediately. Poll "
|
||||
"with meet_status and read captions with meet_transcript. Reminder "
|
||||
"to the agent: you should announce yourself in the meeting (there is "
|
||||
"no automatic consent announcement)."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"url": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Full https://meet.google.com/... URL. Required."
|
||||
),
|
||||
},
|
||||
"mode": {
|
||||
"type": "string",
|
||||
"enum": ["transcribe", "realtime"],
|
||||
"description": (
|
||||
"transcribe (default): listen-only, scrape captions. "
|
||||
"realtime: also enable agent speech via meet_say "
|
||||
"(requires OpenAI Realtime key + platform audio bridge)."
|
||||
),
|
||||
},
|
||||
"guest_name": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Display name to use when joining as guest. Defaults to "
|
||||
"'Hermes Agent'."
|
||||
),
|
||||
},
|
||||
"duration": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional max duration before auto-leave (e.g. '30m', "
|
||||
"'2h', '90s'). Omit to stay until meet_leave is called."
|
||||
),
|
||||
},
|
||||
"headed": {
|
||||
"type": "boolean",
|
||||
"description": (
|
||||
"Run Chromium headed instead of headless (debug only). "
|
||||
"Default false."
|
||||
),
|
||||
},
|
||||
"node": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Name of a registered remote node to run the bot on "
|
||||
"(useful when the gateway runs on a headless Linux box "
|
||||
"but the user's Chrome with a signed-in Google profile "
|
||||
"lives on their Mac). Pass 'auto' to use the single "
|
||||
"registered node. Default: run locally. Nodes are "
|
||||
"approved via `hermes meet node approve`."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["url"],
|
||||
"additionalProperties": False,
|
||||
},
|
||||
}
|
||||
|
||||
MEET_STATUS_SCHEMA: Dict[str, Any] = {
|
||||
"name": "meet_status",
|
||||
"description": (
|
||||
"Report the current Meet session state — whether the bot is alive, "
|
||||
"has joined, is sitting in the lobby, number of transcript lines "
|
||||
"captured, and last-caption timestamp."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"node": {"type": "string"},
|
||||
},
|
||||
"additionalProperties": False,
|
||||
},
|
||||
}
|
||||
|
||||
MEET_TRANSCRIPT_SCHEMA: Dict[str, Any] = {
|
||||
"name": "meet_transcript",
|
||||
"description": (
|
||||
"Read the scraped transcript for the active Meet session. Returns "
|
||||
"full transcript unless 'last' is set, in which case returns the last "
|
||||
"N lines only."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"last": {
|
||||
"type": "integer",
|
||||
"description": (
|
||||
"Optional: return only the last N caption lines. Useful "
|
||||
"for polling during a meeting without re-reading the "
|
||||
"whole transcript."
|
||||
),
|
||||
"minimum": 1,
|
||||
},
|
||||
"node": {"type": "string"},
|
||||
},
|
||||
"additionalProperties": False,
|
||||
},
|
||||
}
|
||||
|
||||
MEET_LEAVE_SCHEMA: Dict[str, Any] = {
|
||||
"name": "meet_leave",
|
||||
"description": (
|
||||
"Leave the active Meet call cleanly, stop caption scraping, and "
|
||||
"finalize the transcript file. Safe to call when no meeting is "
|
||||
"active — returns ok=false with a reason."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"node": {"type": "string"},
|
||||
},
|
||||
"additionalProperties": False,
|
||||
},
|
||||
}
|
||||
|
||||
MEET_SAY_SCHEMA: Dict[str, Any] = {
|
||||
"name": "meet_say",
|
||||
"description": (
|
||||
"Speak text into the active Meet call. Requires the active meeting "
|
||||
"to have been joined with mode='realtime'. The text is queued to "
|
||||
"the bot's OpenAI Realtime session; the generated audio is streamed "
|
||||
"into Chrome's fake microphone via a virtual audio device "
|
||||
"(PulseAudio null-sink on Linux, BlackHole on macOS). Returns "
|
||||
"immediately — the actual speech lags by a couple of seconds."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"text": {"type": "string", "description": "Text to speak."},
|
||||
"node": {"type": "string"},
|
||||
},
|
||||
"required": ["text"],
|
||||
"additionalProperties": False,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Handlers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _json(obj: Any) -> str:
|
||||
return json.dumps(obj, ensure_ascii=False)
|
||||
|
||||
|
||||
def _err(msg: str, **extra) -> str:
|
||||
return _json({"success": False, "error": msg, **extra})
|
||||
|
||||
|
||||
def handle_meet_join(args: Dict[str, Any], **_kw) -> str:
|
||||
url = (args.get("url") or "").strip()
|
||||
if not url:
|
||||
return _err("url is required")
|
||||
mode = (args.get("mode") or "transcribe").strip().lower()
|
||||
if mode not in ("transcribe", "realtime"):
|
||||
return _err(f"mode must be 'transcribe' or 'realtime' (got {mode!r})")
|
||||
|
||||
node = args.get("node")
|
||||
try:
|
||||
client, node_name = _resolve_node_client(node)
|
||||
except RuntimeError as e:
|
||||
return _err(str(e))
|
||||
|
||||
if client is not None:
|
||||
# Remote path — delegate to the node host.
|
||||
try:
|
||||
res = client.start_bot(
|
||||
url=url,
|
||||
guest_name=str(args.get("guest_name") or "Hermes Agent"),
|
||||
duration=str(args.get("duration")) if args.get("duration") else None,
|
||||
headed=bool(args.get("headed", False)),
|
||||
mode=mode,
|
||||
)
|
||||
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
|
||||
except Exception as e:
|
||||
return _err(f"remote node start_bot failed: {e}", node=node_name)
|
||||
|
||||
# Local path — same as v1, with v2 params.
|
||||
if not check_meet_requirements():
|
||||
return _err(
|
||||
"google_meet plugin prerequisites missing — install with "
|
||||
"`pip install playwright && python -m playwright install "
|
||||
"chromium`. Plugin is supported on Linux and macOS only."
|
||||
)
|
||||
res = pm.start(
|
||||
url=url,
|
||||
headed=bool(args.get("headed", False)),
|
||||
guest_name=str(args.get("guest_name") or "Hermes Agent"),
|
||||
duration=str(args.get("duration")) if args.get("duration") else None,
|
||||
mode=mode,
|
||||
)
|
||||
return _json({"success": bool(res.get("ok")), **res})
|
||||
|
||||
|
||||
def handle_meet_status(args: Dict[str, Any], **_kw) -> str:
|
||||
try:
|
||||
client, node_name = _resolve_node_client(args.get("node"))
|
||||
except RuntimeError as e:
|
||||
return _err(str(e))
|
||||
if client is not None:
|
||||
try:
|
||||
res = client.status()
|
||||
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
|
||||
except Exception as e:
|
||||
return _err(f"remote node status failed: {e}", node=node_name)
|
||||
res = pm.status()
|
||||
return _json({"success": bool(res.get("ok")), **res})
|
||||
|
||||
|
||||
def handle_meet_transcript(args: Dict[str, Any], **_kw) -> str:
|
||||
last = args.get("last")
|
||||
try:
|
||||
last_i = int(last) if last is not None else None
|
||||
if last_i is not None and last_i < 1:
|
||||
last_i = None
|
||||
except (TypeError, ValueError):
|
||||
last_i = None
|
||||
try:
|
||||
client, node_name = _resolve_node_client(args.get("node"))
|
||||
except RuntimeError as e:
|
||||
return _err(str(e))
|
||||
if client is not None:
|
||||
try:
|
||||
res = client.transcript(last=last_i)
|
||||
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
|
||||
except Exception as e:
|
||||
return _err(f"remote node transcript failed: {e}", node=node_name)
|
||||
res = pm.transcript(last=last_i)
|
||||
return _json({"success": bool(res.get("ok")), **res})
|
||||
|
||||
|
||||
def handle_meet_leave(args: Dict[str, Any], **_kw) -> str:
|
||||
try:
|
||||
client, node_name = _resolve_node_client(args.get("node"))
|
||||
except RuntimeError as e:
|
||||
return _err(str(e))
|
||||
if client is not None:
|
||||
try:
|
||||
res = client.stop()
|
||||
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
|
||||
except Exception as e:
|
||||
return _err(f"remote node stop failed: {e}", node=node_name)
|
||||
res = pm.stop(reason="agent called meet_leave")
|
||||
return _json({"success": bool(res.get("ok")), **res})
|
||||
|
||||
|
||||
def handle_meet_say(args: Dict[str, Any], **_kw) -> str:
|
||||
text = (args.get("text") or "").strip()
|
||||
if not text:
|
||||
return _err("text is required")
|
||||
try:
|
||||
client, node_name = _resolve_node_client(args.get("node"))
|
||||
except RuntimeError as e:
|
||||
return _err(str(e))
|
||||
if client is not None:
|
||||
try:
|
||||
res = client.say(text)
|
||||
return _json({"success": bool(res.get("ok")), "node": node_name, **res})
|
||||
except Exception as e:
|
||||
return _err(f"remote node say failed: {e}", node=node_name)
|
||||
res = pm.enqueue_say(text)
|
||||
return _json({"success": bool(res.get("ok")), **res})
|
||||
Reference in New Issue
Block a user