mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-10 12:18:44 +08:00
Compare commits
1 Commits
plugin-sdk
...
docs/video
| Author | SHA1 | Date | |
|---|---|---|---|
| | bc29596b6c | | |
@@ -46,12 +46,13 @@ Routing happens through OpenRouter under the hood, so model availability and fai

 ### The Nous Tool Gateway

-The same subscription unlocks the [Tool Gateway](/user-guide/features/tool-gateway), which routes Hermes Agent's tool calls through Nous-managed infrastructure. Five backends, one login:
+The same subscription unlocks the [Tool Gateway](/user-guide/features/tool-gateway), which routes Hermes Agent's tool calls through Nous-managed infrastructure. Six backends, one login:

 | Tool | Partner | What it does |
 |------|---------|--------------|
 | **Web search & extract** | Firecrawl | Agent-grade search and full-page extraction. No Firecrawl API key, no rate limit babysitting. |
 | **Image generation** | FAL | Nine models under one endpoint: FLUX 2 Klein 9B, FLUX 2 Pro, Z-Image Turbo, Nano Banana Pro (Gemini 3 Pro Image), GPT Image 1.5, GPT Image 2, Ideogram V3, Recraft V4 Pro, Qwen Image. |
+| **Video generation** | FAL | Text-to-video and image-to-video without a FAL key: Veo 3.1, Pixverse v6, Kling, LTX-2.3. Pick per-generation or set a default with `hermes tools`. |
 | **Text-to-speech** | OpenAI TTS | High-quality TTS without a separate OpenAI key. Enables [voice mode](/user-guide/features/voice-mode) across messaging platforms. |
 | **Cloud browser automation** | Browser Use | Headless Chromium sessions for `browser_navigate`, `browser_click`, `browser_type`, `browser_vision`. No Browserbase account needed. |
 | **Cloud terminal sandbox** | Modal | Serverless terminal sandboxes for code execution (optional add-on). |
@@ -217,6 +218,10 @@ web:
 image_gen:
   provider: nous

+video_gen:
+  provider: fal
+  use_gateway: true
+
 tts:
   provider: nous

@@ -1430,6 +1430,7 @@ You can switch between providers at any time with `hermes model` — no restart
 | Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY`, `FIRECRAWL_API_URL` |
 | Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
 | Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
+| Video generation | [FAL](https://fal.ai/) | `FAL_KEY` |
 | Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
 | OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
 | Mistral TTS + voice transcription | [Mistral](https://console.mistral.ai/) | `MISTRAL_API_KEY` |
@@ -198,7 +198,7 @@ Opt-in toolset (not loaded in the default `hermes-cli` set). Add via `--toolsets
 Backends ship as plugins under `plugins/video_gen/<name>/`:

 - **xAI Grok-Imagine** — text-to-video and image-to-video (SuperGrok OAuth or `XAI_API_KEY`).
-- **FAL.ai** — Veo 3.1, Pixverse v6, Kling O3 (requires `FAL_KEY`).
+- **FAL.ai** — Veo 3.1, Pixverse v6, Seedance 2.0, Kling, LTX-2.3, Happy Horse (direct `FAL_KEY`, or no key at all via the [Nous Tool Gateway](/user-guide/features/tool-gateway) — pick "Nous Subscription" in `hermes tools` → Video Generation, which sets `video_gen.provider: fal` + `video_gen.use_gateway: true`).

 The single `video_generate` tool covers both modalities — pass `image_url` to animate a still, omit it to generate from text alone. The active backend auto-routes to the right endpoint. The tool's description is rebuilt at session start to reflect the active backend's actual capabilities (modalities, aspect ratios, resolutions, duration range, max reference images, audio support). See [Video Generation Provider Plugins](/developer-guide/video-gen-provider-plugin) for backend authoring.

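As a reading aid for the hunk above, here is a minimal Python sketch of the routing rule it describes: a single tool whose modality depends only on whether `image_url` is supplied. The helper name `select_modality` and its return strings are hypothetical; the diff does not show how a backend actually implements the routing.

```python
from typing import Optional


def select_modality(prompt: str, image_url: Optional[str] = None) -> str:
    """Return the modality a video_generate call would use.

    Hypothetical helper: the source only states that passing image_url
    animates a still and omitting it generates from text alone.
    """
    # A non-empty image_url selects image-to-video; anything else is text-to-video.
    return "image-to-video" if image_url else "text-to-video"


if __name__ == "__main__":
    print(select_modality("a fox running through snow"))
    print(select_modality("slow zoom", image_url="https://example.com/still.png"))
```

Keeping the decision in one place mirrors the single-tool design: callers never choose an endpoint themselves, they only decide whether to attach a reference image.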
@@ -1,6 +1,6 @@
 ---
 title: "Nous Tool Gateway"
-description: "One subscription, every tool. Web search, image generation, TTS, and cloud browsers — all routed through Nous Portal with no extra API keys."
+description: "One subscription, every tool. Web search, image generation, video generation, TTS, and cloud browsers — all routed through Nous Portal with no extra API keys."
 sidebar_label: "Tool Gateway"
 sidebar_position: 2
 ---
@@ -9,7 +9,7 @@ sidebar_position: 2

 **One subscription. Every tool built in.**

-The Tool Gateway is included with every paid [Nous Portal](https://portal.nousresearch.com) subscription. It routes Hermes' tool calls — web search, image generation, text-to-speech, and cloud browser automation — through infrastructure Nous already runs, so you don't have to sign up with Firecrawl, FAL, OpenAI, Browser Use, or anyone else just to make your agent useful.
+The Tool Gateway is included with every paid [Nous Portal](https://portal.nousresearch.com) subscription. It routes Hermes' tool calls — web search, image generation, video generation, text-to-speech, and cloud browser automation — through infrastructure Nous already runs, so you don't have to sign up with Firecrawl, FAL, OpenAI, Browser Use, or anyone else just to make your agent useful.

 <div style={{display: 'flex', gap: '1rem', flexWrap: 'wrap', margin: '1.5rem 0'}}>
 <a href="https://portal.nousresearch.com/manage-subscription" style={{background: 'var(--ifm-color-primary)', color: 'white', padding: '0.75rem 1.5rem', borderRadius: '6px', textDecoration: 'none', fontWeight: 'bold'}}>Start or manage subscription →</a>
@@ -21,6 +21,7 @@ The Tool Gateway is included with every paid [Nous Portal](https://portal.nousre
 |---|---|---|
 | 🔍 | **Web search & extract** | Agent-grade web search and full-page extraction via Firecrawl. No rate limits to worry about — the gateway handles scaling. |
 | 🎨 | **Image generation** | Nine models under one endpoint: **FLUX 2 Klein 9B**, **FLUX 2 Pro**, **Z-Image Turbo**, **Nano Banana Pro** (Gemini 3 Pro Image), **GPT Image 1.5**, **GPT Image 2**, **Ideogram V3**, **Recraft V4 Pro**, **Qwen Image**. Pick per-generation with a flag, or let Hermes default to FLUX 2 Klein. |
+| 🎬 | **Video generation** | Text-to-video and image-to-video through FAL — **Veo 3.1**, **Pixverse v6**, **Kling**, **LTX-2.3** — wired into the `video_generate` tool. No FAL key required. Pick a model with `hermes tools` → Video Generation. |
 | 🔊 | **Text-to-speech** | OpenAI TTS voices wired into the `text_to_speech` tool. Drop voice notes into Telegram, generate audio for pipelines, narrate anything. |
 | 🌐 | **Cloud browser automation** | Headless Chromium sessions via Browser Use. `browser_navigate`, `browser_click`, `browser_type`, `browser_vision` — all the agent-driving primitives, no Browserbase account required. |

@@ -114,6 +115,25 @@ The set evolves — `hermes tools` → Image Generation shows the current live l

 ---

+## Using individual video models
+
+Video generation routes through FAL the same way image generation does. Set a default model with `hermes tools` → Video Generation, or pin it in `config.yaml` under `video_gen.model`. Use the short family name (not the raw FAL endpoint):
+
+| Model | `video_gen.model` | Tier | Notes |
+|---|---|---|---|
+| LTX 2.3 (22B) | `ltx-2.3` | cheap | 22B with native audio. Fast (~30-60s). |
+| Pixverse v6 | `pixverse-v6` | cheap | Negative prompts, 1-15s durations. |
+| Veo 3.1 | `veo3.1` | premium | Google DeepMind. Cinematic, native audio, strong prompt adherence. |
+| Seedance 2.0 | `seedance-2.0` | premium | ByteDance. Synchronized audio + lip-sync, 4-15s. |
+| Kling v3 4K | `kling-v3-4k` | premium | 4K output, native audio, 3-15s. |
+| Happy Horse 1.0 | `happy-horse` | premium | Alibaba. |
+
+Every model supports both text-to-video (omit `image_url`) and image-to-video (pass `image_url`); the active backend auto-routes to the right endpoint. The `video_generate` tool description is rebuilt at session start to reflect the chosen model's real capabilities — aspect ratios, resolutions, duration range, audio support.
+
+Which models are enabled on a given subscription is decided gateway-side, not by your config. If a model returns an HTTP 4xx with *"Nous Subscription gateway rejected endpoint … This model may not yet be enabled,"* it isn't allowlisted on your subscription yet — pick another model, or set `FAL_KEY` in `.env` to hit FAL directly and bypass the gateway allowlist entirely.
+
+---
+
 ## Configuration reference

 Most users never need to touch this — `hermes model` and `hermes tools` cover every workflow interactively. This section is for writing config.yaml directly or scripting setups.
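The model table and the fallback advice in the hunk above can be sketched in a few lines of Python. The short names and tiers are copied from the table; everything else is an assumption made for illustration: the names `VIDEO_MODELS`, `check_model`, and `resolve_route` are hypothetical, and the precedence shown (a set `FAL_KEY` wins over the gateway) is inferred from the sentence about bypassing the allowlist rather than confirmed by the diff.

```python
import os
from typing import Mapping

# Short family names and tiers copied from the table in the hunk above.
VIDEO_MODELS = {
    "ltx-2.3": "cheap",
    "pixverse-v6": "cheap",
    "veo3.1": "premium",
    "seedance-2.0": "premium",
    "kling-v3-4k": "premium",
    "happy-horse": "premium",
}


def check_model(name: str) -> str:
    """Return the tier for a short model name, or raise on an unknown name."""
    if name not in VIDEO_MODELS:
        known = ", ".join(sorted(VIDEO_MODELS))
        raise ValueError(f"unknown video_gen.model {name!r}; expected one of: {known}")
    return VIDEO_MODELS[name]


def resolve_route(video_gen: Mapping[str, object],
                  env: Mapping[str, str] = os.environ) -> str:
    """Pick a route for video requests.

    Assumed precedence (not confirmed by the source): a non-empty FAL_KEY
    means direct FAL access, otherwise the gateway if use_gateway is true.
    """
    if env.get("FAL_KEY"):
        return "fal-direct"
    if video_gen.get("use_gateway"):
        return "nous-gateway"
    return "unconfigured"


if __name__ == "__main__":
    print(check_model("veo3.1"))
    print(resolve_route({"provider": "fal", "use_gateway": True}, env={}))
```

A local lookup like this only catches typos in the short name. Whether a model is actually enabled is decided gateway-side, so a valid name can still be rejected with the 4xx described above.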
@@ -130,6 +150,10 @@ web:
 image_gen:
   use_gateway: true

+video_gen:
+  provider: fal
+  use_gateway: true
+
 tts:
   provider: openai
   use_gateway: true
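For anyone scripting setups against this config, the following Python sketch mirrors the YAML from the hunk above as a plain dictionary and shows two small operations on it: listing the sections routed through the gateway, and pinning `video_gen.model`. The helper names `gateway_sections` and `pin_video_model` are hypothetical, and a dict is used instead of a YAML parser only to keep the example dependency-free.

```python
import copy

# Mirror of the YAML shown in the hunk above, as a plain dict.
config = {
    "image_gen": {"use_gateway": True},
    "video_gen": {"provider": "fal", "use_gateway": True},
    "tts": {"provider": "openai", "use_gateway": True},
}


def gateway_sections(cfg):
    """Return the sorted names of sections with use_gateway set to true."""
    return sorted(name for name, section in cfg.items()
                  if section.get("use_gateway") is True)


def pin_video_model(cfg, model):
    """Return a copy of cfg with video_gen.model set; the input is not mutated."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg.setdefault("video_gen", {})["model"] = model
    return new_cfg


if __name__ == "__main__":
    print(gateway_sections(config))  # ['image_gen', 'tts', 'video_gen']
    print(pin_video_model(config, "veo3.1")["video_gen"])
```

Returning a copy from `pin_video_model` keeps the original settings intact, which is convenient when a script wants to compare the before and after state prior to writing anything back.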