hermes-agent/tests/plugins/test_google_meet_audio.py
Teknium df3c9593f8 feat(plugins): google_meet - join, transcribe, speak, follow up (#16364)
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls

v1 ships transcribe-only. Spawns headless Chromium via Playwright,
joins an explicit https://meet.google.com/ URL, enables live captions,
and scrapes them into a transcript file the agent can read across turns.
The agent then has the meeting content in context and can do followup
work (send recap, file issues, schedule followups) with its regular tools.

Surface:
  - Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say
    (meet_say is a v1 stub — returns not-implemented; v2 will wire
    realtime duplex audio via OpenAI Realtime / Gemini Live +
    BlackHole / PulseAudio null-sink.)
  - CLI: hermes meet setup | auth | join | status | transcript | stop
  - Lifecycle: on_session_end auto-leaves any still-running bot.

Safety:
  - URL regex rejects anything that isn't https://meet.google.com/...
  - No calendar scanning, no auto-dial, no auto-consent announcement.
  - Single active meeting per install; a second meet_join leaves the first.
  - Platform-gated to Linux + macOS (Windows audio routing for v2 untested).
  - Opt-in: standalone plugin, user must add 'google_meet' to
    plugins.enabled in config.yaml.
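
The URL safety gate described above could look roughly like the sketch below. The pattern and the name `is_safe_meet_url` are assumptions for illustration; the plugin's actual regex is not shown in this commit message and may accept other Meet URL shapes.

```python
import re

# Assumed shape of a Meet code (abc-defg-hij) plus an optional query string.
# The real plugin regex may be stricter or looser than this.
_MEET_URL_RE = re.compile(
    r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}(?:\?[^\s#]*)?"
)


def is_safe_meet_url(url: str) -> bool:
    """Return True only for an explicit https://meet.google.com/<code> URL."""
    if not isinstance(url, str):
        return False
    # fullmatch anchors both ends, so look-alike hosts and embedded URLs fail.
    return _MEET_URL_RE.fullmatch(url.strip()) is not None
```

Using `fullmatch` rather than `search` is the important design choice here: a URL that merely contains a Meet link somewhere in its query string is rejected.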

Zero core changes. Plugin uses existing register_tool /
register_cli_command / register_hook surfaces. 21 new unit tests cover the
URL safety gate, transcript dedup + status round-trip, process-manager
refusals/start/stop paths, tool-handler JSON shape under each branch,
session-end cleanup, and platform-gated register().

* feat(plugins/google_meet): v2 realtime audio + v3 remote node host

v2 - agent speaks in-meeting
  audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS).
    On Linux we load pactl module-null-sink + module-virtual-source, track
    module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its
    fake mic reads what we write to the sink. macOS just probes BlackHole
    2ch and returns its device name; the plugin refuses to switch the
    user's default audio input (that would surprise them).
  realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime
    API. RealtimeSession.speak(text) sends conversation.item.create +
    response.create, accumulates response.audio.delta PCM bytes, appends
    them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming
    meet_say calls. 'websockets' is an optional dep imported lazily.
  meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge,
    starts RealtimeSession + speaker thread, spawns paplay to pump PCM
    into the null-sink, then cleans everything up on SIGTERM. If any
    realtime setup step fails, falls back cleanly to transcribe mode
    with an error flagged in status.json.
  process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl;
    refuses when no active meeting or active meeting is transcribe-only.
  tools.meet_say: real implementation; requires active mode='realtime'.
  meet_join: adds mode='transcribe'|'realtime' param.
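
A minimal sketch of the Linux side of the audio bridge, assuming the sink and source names that the accompanying tests assert on (`hermes_meet_sink`, `hermes_meet_src`). The function names and the injectable `run` parameter are hypothetical and exist only to make the sketch testable; this is not the plugin's actual code.

```python
import subprocess
from typing import Callable, List


def parse_module_id(stdout: str) -> int:
    """pactl load-module prints the new module id; tolerate stray whitespace."""
    for line in stdout.splitlines():
        token = line.strip()
        if token.isdigit():
            return int(token)
    raise RuntimeError(f"could not parse module id from pactl output: {stdout!r}")


def load_bridge(run: Callable = subprocess.run) -> List[int]:
    """Load a null sink plus a virtual source fed by the sink's monitor."""
    sink, src = "hermes_meet_sink", "hermes_meet_src"
    commands = [
        ["pactl", "load-module", "module-null-sink", f"sink_name={sink}"],
        ["pactl", "load-module", "module-virtual-source",
         f"source_name={src}", f"master={sink}.monitor"],
    ]
    module_ids: List[int] = []
    for argv in commands:
        try:
            result = run(argv, capture_output=True, text=True, check=True)
        except FileNotFoundError as exc:
            raise RuntimeError("pactl not found; is PulseAudio installed?") from exc
        module_ids.append(parse_module_id(result.stdout))
    return module_ids


def unload_bridge(module_ids: List[int], run: Callable = subprocess.run) -> None:
    """Unload in reverse order so the source goes away before its sink."""
    for module_id in reversed(module_ids):
        run(["pactl", "unload-module", str(module_id)], check=False)
```

Tracking the ids returned by `load-module` is what makes teardown precise: only the modules this process created are unloaded.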

v3 - remote node host
  node/protocol.py: JSON envelope (type, id, token, payload) + validate.
  node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with
    resolve() auto-selecting the sole registered node when name is None.
  node/server.py: NodeServer - websockets.serve, bearer-token auth,
    dispatches start_bot/stop/status/transcript/say/ping onto the local
    process_manager. Token auto-generated + persisted on first run.
  node/client.py: NodeClient - short-lived sync WS per RPC, raises
    RuntimeError on error envelopes, clean API matching the server.
  node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}'
    subtree; wired into the main meet CLI by cli.py so 'hermes meet node'
    Just Works.
  tools.py: every meet_* tool accepts node='<name>'|'auto'; when set,
    routes through NodeClient to the remote bot instead of running
    locally. Unknown node → clear 'no registered meet node matches ...'
    error.
  cli.py: 'hermes meet join --node my-mac --mode realtime' and
    'hermes meet say "..." --node my-mac' route to the node; 'hermes
    meet node approve <name> <url> <token>' registers one.
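
The JSON envelope with fields (type, id, token, payload) could be built and validated along these lines. The function names, the choice of `uuid4` for ids, and the exact error messages are assumptions; only the four field names and the six message types come from the description above.

```python
import json
import uuid
from typing import Any, Dict, Optional

REQUIRED_KEYS = ("type", "id", "token", "payload")
# Message types the server is described as dispatching.
KNOWN_TYPES = {"start_bot", "stop", "status", "transcript", "say", "ping"}


def make_envelope(msg_type: str, token: str,
                  payload: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one request envelope with a fresh id."""
    if msg_type not in KNOWN_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return json.dumps({
        "type": msg_type,
        "id": uuid.uuid4().hex,
        "token": token,
        "payload": payload or {},
    })


def validate_envelope(raw: str) -> Dict[str, Any]:
    """Parse and check an envelope; raise ValueError on any malformed input."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("envelope is not valid JSON") from exc
    if not isinstance(msg, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in msg]
    if missing:
        raise ValueError(f"envelope missing keys: {missing}")
    if not isinstance(msg["type"], str) or msg["type"] not in KNOWN_TYPES:
        raise ValueError(f"unknown message type: {msg['type']!r}")
    if not isinstance(msg["payload"], dict):
        raise ValueError("payload must be a JSON object")
    return msg
```

Validation here covers shape only; checking the bearer token against the persisted one would be a separate step on the server.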

Tests
  21 v1 tests updated (meet_say is no longer a stub; active-record now
    carries mode).
  20 new audio_bridge + realtime tests.
  42 new node tests (protocol/registry/server/client/cli).
  17 new v1/v2/v3 integration tests at the plugin level covering
    enqueue_say edge cases, env var passthrough, mode validation, node
    routing (known/unknown/auto/ambiguous), and argparse wiring for
    `hermes meet say` + `hermes meet node` + --mode/--node flags.
  Total: 100 plugin tests + 58 plugin-system tests = 158 passing.

E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools
register, on_session_end hook wires, 'hermes meet' CLI tree wires
including the node subtree, NodeRegistry round-trips, meet_join routes
correctly to NodeClient under node='my-mac' with mode='realtime',
enqueue_say accepts realtime/rejects transcribe, argparse parses every
new flag cleanly.

Zero changes to core. All new code lives under plugins/google_meet/.

* feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status

Ready-for-live-test follow-up on PR #16364. Five additions that matter for
the first live run on a real Meet, in priority order:

1. hermes meet install [--realtime] [--yes]
   pip install playwright websockets + python -m playwright install chromium
   --realtime: installs platform audio deps (pulseaudio-utils on Linux via
   sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before
   sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the
   macOS default input — user still selects BlackHole in System Settings
   (deliberate; surprise audio rerouting is worse than a manual step).

2. Admission detection
   _detect_admission(page): Leave-button visible OR caption region
   attached OR participants list present → we're in-call.
   _detect_denied(page): 'You can\'t join this video call' / 'You were
   removed' / 'No one responded to your request' → bail out.
   HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in
   the lobby before giving up. in_call stays False until admitted.
   Status surfaces leaveReason: duration_expired | lobby_timeout |
   denied | page_closed.
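
The lobby wait described in item 2 can be sketched as a polling loop. The function name and its callable parameters are hypothetical stand-ins for `_detect_admission(page)` and `_detect_denied(page)`; the injectable clock is an assumption that keeps the sketch testable without real sleeping.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_admission(
    is_admitted: Callable[[], bool],
    is_denied: Callable[[], bool],
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, Optional[str]]:
    """Poll until admitted, denied, or the lobby timeout elapses.

    Returns (in_call, leave_reason); leave_reason is None when admitted.
    """
    deadline = clock() + timeout_s
    while True:
        if is_denied():
            return False, "denied"
        if is_admitted():
            return True, None
        if clock() >= deadline:
            return False, "lobby_timeout"
        sleep(poll_s)
```

Checking the denied probe first means an explicit rejection is reported as `denied` rather than being masked by a timeout that happens to expire on the same iteration.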

3. macOS PCM pump
   ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the
   BlackHole AVFoundation output via -f audiotoolbox
   -audio_device_index <N>. _mac_audio_device_index() probes
   ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch'
   → numeric index. Falls back to index 0 on probe failure. Linux
   paplay pump unchanged.
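
Resolving 'BlackHole 2ch' to a numeric index could work roughly as follows, given the text that ffmpeg prints for `-f avfoundation -list_devices true`. The function name is hypothetical, and the listing layout (an "audio devices" header followed by `[N] name` lines) is an assumption based on typical ffmpeg output.

```python
import re

# Matches the trailing "[N] device name" part of a listing line.
_DEVICE_LINE = re.compile(r"\[(\d+)\]\s+(.+?)\s*$")


def find_audio_device_index(listing: str,
                            wanted: str = "BlackHole 2ch",
                            default: int = 0) -> int:
    """Return the index of `wanted` in the audio section, else `default`."""
    in_audio_section = False
    for line in listing.splitlines():
        if "audio devices" in line:
            in_audio_section = True
            continue
        if "video devices" in line:
            in_audio_section = False
            continue
        if not in_audio_section:
            continue
        match = _DEVICE_LINE.search(line)
        if match and match.group(2) == wanted:
            return int(match.group(1))
    # Probe failed or device not listed: fall back as the commit describes.
    return default
```

Restricting the search to the audio section avoids picking up a video device that happens to share an index with the wanted audio device.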

4. Richer status dict
   _BotState now tracks realtime, realtimeReady, realtimeDevice,
   audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt,
   leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at
   counters fold into the status file once a second so meet_status()
   can show the agent's voice activity in near-real-time.
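
One way to fold these telemetry fields into a status file is sketched below. The field names come from the list above; the class name `BotStatus`, the field types, and the atomic-write approach are assumptions, not the plugin's `_BotState` implementation.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BotStatus:
    # Field names follow the commit message; types are assumed.
    realtime: bool = False
    realtimeReady: bool = False
    realtimeDevice: Optional[str] = None
    audioBytesOut: int = 0
    lastAudioOutAt: Optional[float] = None
    lastBargeInAt: Optional[float] = None
    joinAttemptedAt: Optional[float] = None
    leaveReason: Optional[str] = None


def write_status(path: str, status: BotStatus) -> None:
    """Write the status atomically so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(asdict(status), handle)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file and renaming matters when the file is rewritten once a second while another process may be reading it.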

5. Barge-in
   RealtimeSession.cancel_response() sends type='response.cancel' over
   the same WS (lock-guarded so it's safe to call from the caption
   thread while speak() is reading frames). Handles response.cancelled
   as a terminal frame type. _looks_like_human_speaker() gates triggers
   so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel.
   Called from the caption drain loop: when a new caption arrives
   attributed to a real participant while rt.session exists, we fire
   cancel_response() and stamp lastBargeInAt.
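
The barge-in gate could be as small as the sketch below. The signature and the case-insensitive comparison are assumptions; the commit only states that the bot's own name, 'You', 'Unknown', and blanks must not trigger a cancel.

```python
from typing import Optional

# Caption attributions that should never count as a human interrupting.
_NON_HUMAN_LABELS = {"you", "unknown"}


def looks_like_human_speaker(speaker: Optional[str], bot_name: str) -> bool:
    """Return True when a caption is attributed to a real participant."""
    name = (speaker or "").strip()
    if not name:
        return False
    lowered = name.casefold()
    if lowered in _NON_HUMAN_LABELS:
        return False
    if lowered == bot_name.strip().casefold():
        return False
    return True
```

In the caption drain loop this check would run before `cancel_response()`, so the bot's own captions never cut off its own speech.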

Tests: 20 new unit tests across _BotState telemetry, barge-in gating,
admission/denied probe error handling, cancel_response with and without
a connected WS, and `hermes meet install` CLI wiring (flag parsing +
end-to-end subprocess.run verification + Linux-already-installed fast
path). Total 171 passing across all google_meet test files + the
plugin-system regression suite.

E2E verified on Linux: plugin loads, all 5 tools register,
`hermes meet install --realtime --yes` parses, fresh-bot status.json
has every new telemetry key, cancel_response on a disconnected session
returns False without raising, barge-in helper gates the bot's own
name correctly.

Still out of scope (for a future PR, not blocking live test):
mic → Realtime duplex (the agent listening to meeting audio via
WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio.

Docs updated: SKILL.md now lists the installer subcommand, lobby
timeout, barge-in caveat, and the full status-dict reference table.
README.md quick-start uses hermes meet install.
2026-04-27 06:22:25 -07:00


"""Tests for plugins.google_meet.audio_bridge (v2).

Covers the platform gating and pactl / system_profiler plumbing
without actually invoking those tools on the host.
"""
from __future__ import annotations

import subprocess
from unittest.mock import MagicMock, patch

import pytest


@pytest.fixture(autouse=True)
def _isolate_home(tmp_path, monkeypatch):
    hermes_home = tmp_path / ".hermes"
    hermes_home.mkdir()
    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
    yield hermes_home


# ---------------------------------------------------------------------------
# Linux setup / teardown
# ---------------------------------------------------------------------------

def _linux_pactl_result(stdout: str) -> MagicMock:
    """Build a fake CompletedProcess-ish object for subprocess.run."""
    m = MagicMock()
    m.stdout = stdout
    m.stderr = ""
    m.returncode = 0
    return m


def test_setup_linux_loads_null_sink_and_virtual_source():
    from plugins.google_meet.audio_bridge import AudioBridge

    calls: list[list[str]] = []

    def _fake_run(argv, **kwargs):
        calls.append(list(argv))
        # First call = null-sink → module id 42
        # Second call = virtual-source → module id 43
        if "module-null-sink" in argv:
            return _linux_pactl_result("42\n")
        if "module-virtual-source" in argv:
            return _linux_pactl_result("43\n")
        raise AssertionError(f"unexpected pactl invocation: {argv}")

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Linux"), \
         patch("plugins.google_meet.audio_bridge.subprocess.run",
               side_effect=_fake_run):
        br = AudioBridge()
        info = br.setup()

    # Two pactl load-module calls, in order.
    assert len(calls) == 2
    assert calls[0][0] == "pactl" and calls[0][1] == "load-module"
    assert "module-null-sink" in calls[0]
    assert any(a.startswith("sink_name=hermes_meet_sink") for a in calls[0])
    assert calls[1][0] == "pactl" and calls[1][1] == "load-module"
    assert "module-virtual-source" in calls[1]
    assert any(a.startswith("source_name=hermes_meet_src") for a in calls[1])
    assert any("master=hermes_meet_sink.monitor" in a for a in calls[1])

    # Dict shape.
    assert info["platform"] == "linux"
    assert info["device_name"] == "hermes_meet_src"
    assert info["write_target"] == "hermes_meet_sink"
    assert info["sample_rate"] == 48000
    assert info["channels"] == 2
    assert info["module_ids"] == [42, 43]

    # Properties.
    assert br.device_name == "hermes_meet_src"
    assert br.write_target == "hermes_meet_sink"


def test_teardown_linux_unloads_modules_in_reverse_order():
    from plugins.google_meet.audio_bridge import AudioBridge

    def _setup_run(argv, **kwargs):
        if "module-null-sink" in argv:
            return _linux_pactl_result("42\n")
        return _linux_pactl_result("43\n")

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Linux"), \
         patch("plugins.google_meet.audio_bridge.subprocess.run",
               side_effect=_setup_run):
        br = AudioBridge()
        br.setup()

    unload_calls: list[list[str]] = []

    def _teardown_run(argv, **kwargs):
        unload_calls.append(list(argv))
        return _linux_pactl_result("")

    with patch("plugins.google_meet.audio_bridge.subprocess.run",
               side_effect=_teardown_run):
        br.teardown()

    # Two unload calls, in reverse order: 43 (virtual-source) then 42 (sink).
    assert [c[1] for c in unload_calls] == ["unload-module", "unload-module"]
    assert unload_calls[0][2] == "43"
    assert unload_calls[1][2] == "42"

    # Second teardown is a no-op.
    with patch("plugins.google_meet.audio_bridge.subprocess.run") as run_mock:
        br.teardown()
    run_mock.assert_not_called()


def test_setup_linux_parses_module_id_from_multi_line_output():
    """Some pactl builds include trailing whitespace / notices."""
    from plugins.google_meet.audio_bridge import AudioBridge

    def _fake_run(argv, **kwargs):
        if "module-null-sink" in argv:
            return _linux_pactl_result("42 \n")
        return _linux_pactl_result("43\n")

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Linux"), \
         patch("plugins.google_meet.audio_bridge.subprocess.run",
               side_effect=_fake_run):
        br = AudioBridge()
        info = br.setup()

    assert info["module_ids"] == [42, 43]


def test_setup_linux_pactl_missing_raises_clean_error():
    from plugins.google_meet.audio_bridge import AudioBridge

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Linux"), \
         patch("plugins.google_meet.audio_bridge.subprocess.run",
               side_effect=FileNotFoundError("pactl")):
        br = AudioBridge()
        with pytest.raises(RuntimeError, match="pactl"):
            br.setup()


# ---------------------------------------------------------------------------
# macOS setup
# ---------------------------------------------------------------------------

_BH_PRESENT = (
    "Audio:\n"
    " Devices:\n"
    " BlackHole 2ch:\n"
    " Manufacturer: Existential Audio\n"
)

_BH_ABSENT = (
    "Audio:\n"
    " Devices:\n"
    " MacBook Pro Microphone:\n"
    " Default Input: Yes\n"
)


def test_setup_darwin_returns_blackhole_when_present():
    from plugins.google_meet.audio_bridge import AudioBridge

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Darwin"), \
         patch("plugins.google_meet.audio_bridge.subprocess.check_output",
               return_value=_BH_PRESENT) as check:
        br = AudioBridge()
        info = br.setup()

    check.assert_called_once()
    argv = check.call_args.args[0]
    assert argv[0] == "system_profiler"
    assert "SPAudioDataType" in argv

    assert info["platform"] == "darwin"
    assert info["device_name"] == "BlackHole 2ch"
    assert info["write_target"] == "BlackHole 2ch"
    assert info["module_ids"] == []
    assert info["sample_rate"] == 48000
    assert info["channels"] == 2

    # teardown is a no-op on darwin (no modules to unload).
    with patch("plugins.google_meet.audio_bridge.subprocess.run") as run_mock:
        br.teardown()
    run_mock.assert_not_called()


def test_setup_darwin_raises_when_blackhole_missing():
    from plugins.google_meet.audio_bridge import AudioBridge

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Darwin"), \
         patch("plugins.google_meet.audio_bridge.subprocess.check_output",
               return_value=_BH_ABSENT):
        br = AudioBridge()
        with pytest.raises(RuntimeError, match="BlackHole"):
            br.setup()


# ---------------------------------------------------------------------------
# Windows / unsupported
# ---------------------------------------------------------------------------

def test_setup_windows_raises():
    from plugins.google_meet.audio_bridge import AudioBridge

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Windows"):
        br = AudioBridge()
        with pytest.raises(RuntimeError, match="not supported"):
            br.setup()


# ---------------------------------------------------------------------------
# chrome_fake_audio_flags
# ---------------------------------------------------------------------------

def test_chrome_fake_audio_flags_linux():
    from plugins.google_meet.audio_bridge import chrome_fake_audio_flags

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Linux"):
        flags = chrome_fake_audio_flags(
            {"platform": "linux", "device_name": "hermes_meet_src"}
        )
    assert "--use-fake-ui-for-media-stream" in flags


def test_chrome_fake_audio_flags_darwin():
    from plugins.google_meet.audio_bridge import chrome_fake_audio_flags

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Darwin"):
        flags = chrome_fake_audio_flags(
            {"platform": "darwin", "device_name": "BlackHole 2ch"}
        )
    assert "--use-fake-ui-for-media-stream" in flags


def test_chrome_fake_audio_flags_windows_raises():
    from plugins.google_meet.audio_bridge import chrome_fake_audio_flags

    with patch("plugins.google_meet.audio_bridge.platform.system",
               return_value="Windows"):
        with pytest.raises(RuntimeError):
            chrome_fake_audio_flags({"platform": "windows"})


def test_property_access_before_setup_raises():
    from plugins.google_meet.audio_bridge import AudioBridge

    br = AudioBridge()
    with pytest.raises(RuntimeError):
        _ = br.device_name
    with pytest.raises(RuntimeError):
        _ = br.write_target