Files
hermes-agent/website/docs/user-guide/skills/bundled/mlops/mlops-models-segment-anything.md

525 lines
14 KiB
Markdown
Raw Normal View History

docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
---
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738) Broad drift audit against origin/main (b52b63396). Reference pages (most user-visible drift): - slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer that were missing; drop non-existent /terminal-setup; fix /q footnote (resolves to /queue, not /quit); extend CLI-only list with all 24 CLI-only commands in the registry - cli-commands: add dedicated sections for hermes curator / fallback / hooks (new subcommands not previously documented); remove stale hermes honcho standalone section (the plugin registers dynamically via hermes memory); list curator/fallback/hooks in top-level table; fix completion to include fish - toolsets-reference: document the real 52-toolset count; split browser vs browser-cdp; add discord / discord_admin / spotify / yuanbao; correct hermes-cli tool count from 36 to 38; fix misleading claim that hermes-homeassistant adds tools (it's identical to hermes-cli) - tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao, 2 Discord toolsets; move browser_cdp/browser_dialog to their own browser-cdp toolset section - environment-variables: add 40+ user-facing HERMES_* vars that were undocumented (--yolo, --accept-hooks, --ignore-*, inference model override, agent/stream/checkpoint timeouts, OAuth trace, per-platform batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs, gateway restart/connect timeouts); dedupe the Cron Scheduler section; replace stale QQ_SANDBOX with QQ_PORTAL_HOST User-guide (top level): - cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20) - configuration.md: display.platforms is the canonical per-platform override key; tool_progress_overrides is deprecated and auto-migrated - profiles.md: model.default is the config key, not model.model - sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8 - checkpoints-and-rollback.md: destructive-command list now matches _DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd) - docker.md: the container runs as non-root hermes (UID 10000) via gosu; fix install command (uv pip); add missing --insecure on the dashboard compose example (required for non-loopback bind) - security.md: systemctl danger pattern also matches 'restart' - index.md: built-in tool count 47 -> 68 - integrations/index.md: 6 STT providers, 8 memory providers - integrations/providers.md: drop fictional dashscope/qwen aliases Features: - overview.md: 9 image models (not 8), 9 TTS providers (not 5), 8 memory providers (Supermemory was missing) - tool-gateway.md: 9 image models - tools.md: extend common-toolsets list with search / messaging / spotify / discord / debugging / safe - fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY (lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan, tencent-tokenhub, azure-foundry) - plugins.md: Available Hooks table now includes on_session_finalize, on_session_reset, subagent_stop - built-in-plugins.md: add the 7 bundled plugins the page didn't mention (spotify, google_meet, three image_gen providers, two dashboard examples) - web-dashboard.md: add --insecure and --tui flags - cron.md: hermes cron create takes positional schedule/prompt, not flags Messaging: - telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch. - discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default is 2.0, not 0.1 - dingtalk.md: document DINGTALK_REQUIRE_MENTION / FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL / ALLOW_ALL_USERS that the adapter supports - bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env var; the setting lives in platforms.bluebubbles.extra only - qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and QQ_GROUP_ALLOWED_USERS - wecom-callback.md: replace 'hermes gateway start' (service-only) with 'hermes gateway' for first-time setup Developer-guide: - architecture.md: refresh tool/toolset counts (61/52), terminal backend count (7), line counts for run_agent.py (~13.7k), cli.py (~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py (~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform adapter count 18 -> 20 - agent-loop.md: run_agent.py line count 10.7k -> 13.7k - tools-runtime.md: add vercel_sandbox backend - adding-tools.md: remove stale 'Discovery import added to model_tools.py' checklist item (registry auto-discovery) - adding-platform-adapters.md: mark send_typing / get_chat_info as concrete base methods; only connect/disconnect/send are abstract - acp-internals.md: ACP sessions now persist to SessionDB (~/.hermes/state.db); acp.run_agent call uses use_unstable_protocol=True - cron-internals.md: gateway runs scheduler in a dedicated background thread via _start_cron_ticker, not on a maintenance cycle; locking is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows) - gateway-internals.md: gateway/run.py ~12k lines - provider-runtime.md: cron DOES support fallback (run_job reads fallback_providers from config) - session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations 10 and 11 (trigram FTS, inline-mode FTS5 re-index); add api_call_count column to Sessions DDL; document messages_fts_trigram and state_meta in the architecture tree - context-compression-and-caching.md: remove the obsolete 'context pressure warnings' section (warnings were removed for causing models to give up early) - context-engine-plugin.md: compress() signature now includes focus_topic param - extending-the-cli.md: _build_tui_layout_children signature now includes model_picker_widget; add to default layout Also fixed three pre-existing broken links/anchors the build warned about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and tips#background-tasks, nix-setup.md -> #container-aware-cli). Regenerated per-skill pages via website/scripts/generate-skill-docs.py so catalog tables and sidebar are consistent with current SKILL.md frontmatter. docusaurus build: clean, no broken links or anchors.
2026-04-29 20:55:59 -07:00
title: "Segment Anything Model — SAM: zero-shot image segmentation via points, boxes, masks"
docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
sidebar_label: "Segment Anything Model"
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738) Broad drift audit against origin/main (b52b63396). Reference pages (most user-visible drift): - slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer that were missing; drop non-existent /terminal-setup; fix /q footnote (resolves to /queue, not /quit); extend CLI-only list with all 24 CLI-only commands in the registry - cli-commands: add dedicated sections for hermes curator / fallback / hooks (new subcommands not previously documented); remove stale hermes honcho standalone section (the plugin registers dynamically via hermes memory); list curator/fallback/hooks in top-level table; fix completion to include fish - toolsets-reference: document the real 52-toolset count; split browser vs browser-cdp; add discord / discord_admin / spotify / yuanbao; correct hermes-cli tool count from 36 to 38; fix misleading claim that hermes-homeassistant adds tools (it's identical to hermes-cli) - tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao, 2 Discord toolsets; move browser_cdp/browser_dialog to their own browser-cdp toolset section - environment-variables: add 40+ user-facing HERMES_* vars that were undocumented (--yolo, --accept-hooks, --ignore-*, inference model override, agent/stream/checkpoint timeouts, OAuth trace, per-platform batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs, gateway restart/connect timeouts); dedupe the Cron Scheduler section; replace stale QQ_SANDBOX with QQ_PORTAL_HOST User-guide (top level): - cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20) - configuration.md: display.platforms is the canonical per-platform override key; tool_progress_overrides is deprecated and auto-migrated - profiles.md: model.default is the config key, not model.model - sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8 - checkpoints-and-rollback.md: destructive-command list now matches _DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd) - docker.md: the container runs as non-root hermes (UID 10000) via gosu; fix install command (uv pip); add missing --insecure on the dashboard compose example (required for non-loopback bind) - security.md: systemctl danger pattern also matches 'restart' - index.md: built-in tool count 47 -> 68 - integrations/index.md: 6 STT providers, 8 memory providers - integrations/providers.md: drop fictional dashscope/qwen aliases Features: - overview.md: 9 image models (not 8), 9 TTS providers (not 5), 8 memory providers (Supermemory was missing) - tool-gateway.md: 9 image models - tools.md: extend common-toolsets list with search / messaging / spotify / discord / debugging / safe - fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY (lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan, tencent-tokenhub, azure-foundry) - plugins.md: Available Hooks table now includes on_session_finalize, on_session_reset, subagent_stop - built-in-plugins.md: add the 7 bundled plugins the page didn't mention (spotify, google_meet, three image_gen providers, two dashboard examples) - web-dashboard.md: add --insecure and --tui flags - cron.md: hermes cron create takes positional schedule/prompt, not flags Messaging: - telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch. - discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default is 2.0, not 0.1 - dingtalk.md: document DINGTALK_REQUIRE_MENTION / FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL / ALLOW_ALL_USERS that the adapter supports - bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env var; the setting lives in platforms.bluebubbles.extra only - qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and QQ_GROUP_ALLOWED_USERS - wecom-callback.md: replace 'hermes gateway start' (service-only) with 'hermes gateway' for first-time setup Developer-guide: - architecture.md: refresh tool/toolset counts (61/52), terminal backend count (7), line counts for run_agent.py (~13.7k), cli.py (~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py (~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform adapter count 18 -> 20 - agent-loop.md: run_agent.py line count 10.7k -> 13.7k - tools-runtime.md: add vercel_sandbox backend - adding-tools.md: remove stale 'Discovery import added to model_tools.py' checklist item (registry auto-discovery) - adding-platform-adapters.md: mark send_typing / get_chat_info as concrete base methods; only connect/disconnect/send are abstract - acp-internals.md: ACP sessions now persist to SessionDB (~/.hermes/state.db); acp.run_agent call uses use_unstable_protocol=True - cron-internals.md: gateway runs scheduler in a dedicated background thread via _start_cron_ticker, not on a maintenance cycle; locking is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows) - gateway-internals.md: gateway/run.py ~12k lines - provider-runtime.md: cron DOES support fallback (run_job reads fallback_providers from config) - session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations 10 and 11 (trigram FTS, inline-mode FTS5 re-index); add api_call_count column to Sessions DDL; document messages_fts_trigram and state_meta in the architecture tree - context-compression-and-caching.md: remove the obsolete 'context pressure warnings' section (warnings were removed for causing models to give up early) - context-engine-plugin.md: compress() signature now includes focus_topic param - extending-the-cli.md: _build_tui_layout_children signature now includes model_picker_widget; add to default layout Also fixed three pre-existing broken links/anchors the build warned about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and tips#background-tasks, nix-setup.md -> #container-aware-cli). Regenerated per-skill pages via website/scripts/generate-skill-docs.py so catalog tables and sidebar are consistent with current SKILL.md frontmatter. docusaurus build: clean, no broken links or anchors.
2026-04-29 20:55:59 -07:00
description: "SAM: zero-shot image segmentation via points, boxes, masks"
docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
---
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
# Segment Anything Model
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738) Broad drift audit against origin/main (b52b63396). Reference pages (most user-visible drift): - slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer that were missing; drop non-existent /terminal-setup; fix /q footnote (resolves to /queue, not /quit); extend CLI-only list with all 24 CLI-only commands in the registry - cli-commands: add dedicated sections for hermes curator / fallback / hooks (new subcommands not previously documented); remove stale hermes honcho standalone section (the plugin registers dynamically via hermes memory); list curator/fallback/hooks in top-level table; fix completion to include fish - toolsets-reference: document the real 52-toolset count; split browser vs browser-cdp; add discord / discord_admin / spotify / yuanbao; correct hermes-cli tool count from 36 to 38; fix misleading claim that hermes-homeassistant adds tools (it's identical to hermes-cli) - tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao, 2 Discord toolsets; move browser_cdp/browser_dialog to their own browser-cdp toolset section - environment-variables: add 40+ user-facing HERMES_* vars that were undocumented (--yolo, --accept-hooks, --ignore-*, inference model override, agent/stream/checkpoint timeouts, OAuth trace, per-platform batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs, gateway restart/connect timeouts); dedupe the Cron Scheduler section; replace stale QQ_SANDBOX with QQ_PORTAL_HOST User-guide (top level): - cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20) - configuration.md: display.platforms is the canonical per-platform override key; tool_progress_overrides is deprecated and auto-migrated - profiles.md: model.default is the config key, not model.model - sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8 - checkpoints-and-rollback.md: destructive-command list now matches _DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd) - docker.md: the container runs as non-root hermes (UID 10000) via gosu; fix install command (uv pip); add missing --insecure on the dashboard compose example (required for non-loopback bind) - security.md: systemctl danger pattern also matches 'restart' - index.md: built-in tool count 47 -> 68 - integrations/index.md: 6 STT providers, 8 memory providers - integrations/providers.md: drop fictional dashscope/qwen aliases Features: - overview.md: 9 image models (not 8), 9 TTS providers (not 5), 8 memory providers (Supermemory was missing) - tool-gateway.md: 9 image models - tools.md: extend common-toolsets list with search / messaging / spotify / discord / debugging / safe - fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY (lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan, tencent-tokenhub, azure-foundry) - plugins.md: Available Hooks table now includes on_session_finalize, on_session_reset, subagent_stop - built-in-plugins.md: add the 7 bundled plugins the page didn't mention (spotify, google_meet, three image_gen providers, two dashboard examples) - web-dashboard.md: add --insecure and --tui flags - cron.md: hermes cron create takes positional schedule/prompt, not flags Messaging: - telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch. - discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default is 2.0, not 0.1 - dingtalk.md: document DINGTALK_REQUIRE_MENTION / FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL / ALLOW_ALL_USERS that the adapter supports - bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env var; the setting lives in platforms.bluebubbles.extra only - qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and QQ_GROUP_ALLOWED_USERS - wecom-callback.md: replace 'hermes gateway start' (service-only) with 'hermes gateway' for first-time setup Developer-guide: - architecture.md: refresh tool/toolset counts (61/52), terminal backend count (7), line counts for run_agent.py (~13.7k), cli.py (~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py (~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform adapter count 18 -> 20 - agent-loop.md: run_agent.py line count 10.7k -> 13.7k - tools-runtime.md: add vercel_sandbox backend - adding-tools.md: remove stale 'Discovery import added to model_tools.py' checklist item (registry auto-discovery) - adding-platform-adapters.md: mark send_typing / get_chat_info as concrete base methods; only connect/disconnect/send are abstract - acp-internals.md: ACP sessions now persist to SessionDB (~/.hermes/state.db); acp.run_agent call uses use_unstable_protocol=True - cron-internals.md: gateway runs scheduler in a dedicated background thread via _start_cron_ticker, not on a maintenance cycle; locking is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows) - gateway-internals.md: gateway/run.py ~12k lines - provider-runtime.md: cron DOES support fallback (run_job reads fallback_providers from config) - session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations 10 and 11 (trigram FTS, inline-mode FTS5 re-index); add api_call_count column to Sessions DDL; document messages_fts_trigram and state_meta in the architecture tree - context-compression-and-caching.md: remove the obsolete 'context pressure warnings' section (warnings were removed for causing models to give up early) - context-engine-plugin.md: compress() signature now includes focus_topic param - extending-the-cli.md: _build_tui_layout_children signature now includes model_picker_widget; add to default layout Also fixed three pre-existing broken links/anchors the build warned about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and tips#background-tasks, nix-setup.md -> #container-aware-cli). Regenerated per-skill pages via website/scripts/generate-skill-docs.py so catalog tables and sidebar are consistent with current SKILL.md frontmatter. docusaurus build: clean, no broken links or anchors.
2026-04-29 20:55:59 -07:00
SAM: zero-shot image segmentation via points, boxes, masks.
docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
## Skill metadata
| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/mlops/models/segment-anything` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `segment-anything`, `transformers>=4.30.0`, `torch>=1.7.0` |
| Tags | `Multimodal`, `Image Segmentation`, `Computer Vision`, `SAM`, `Zero-Shot` |
## Reference: full SKILL.md
:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
# Segment Anything Model (SAM)
Comprehensive guide to using Meta AI's Segment Anything Model for zero-shot image segmentation.
## When to use SAM
**Use SAM when:**
- Need to segment any object in images without task-specific training
- Building interactive annotation tools with point/box prompts
- Generating training data for other vision models
- Need zero-shot transfer to new image domains
- Building object detection/segmentation pipelines
- Processing medical, satellite, or domain-specific images
**Key features:**
- **Zero-shot segmentation**: Works on any image domain without fine-tuning
- **Flexible prompts**: Points, bounding boxes, or previous masks
- **Automatic segmentation**: Generate all object masks automatically
- **High quality**: Trained on 1.1 billion masks from 11 million images
- **Multiple model sizes**: ViT-B (fastest), ViT-L, ViT-H (most accurate)
- **ONNX export**: Deploy in browsers and edge devices
**Use alternatives instead:**
- **YOLO/Detectron2**: For real-time object detection with classes
- **Mask2Former**: For semantic/panoptic segmentation with categories
- **GroundingDINO + SAM**: For text-prompted segmentation
- **SAM 2**: For video segmentation tasks
## Quick start
### Installation
```bash
# From GitHub
pip install git+https://github.com/facebookresearch/segment-anything.git
# Optional dependencies
pip install opencv-python pycocotools matplotlib
# Or use HuggingFace transformers
pip install transformers
```
### Download checkpoints
```bash
# ViT-H (largest, most accurate) - 2.4GB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# ViT-L (medium) - 1.2GB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
# ViT-B (smallest, fastest) - 375MB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```
### Basic usage with SamPredictor
```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor
# Load model
sam = sam_model_registry["vit_h"](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/models/segment-anything/checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")
# Create predictor
predictor = SamPredictor(sam)
# Set image (computes embeddings once)
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image)
# Predict with point prompts
input_point = np.array([[500, 375]]) # (x, y) coordinates
input_label = np.array([1]) # 1 = foreground, 0 = background
masks, scores, logits = predictor.predict(
point_coords=input_point,
point_labels=input_label,
multimask_output=True # Returns 3 mask options
)
# Select best mask
best_mask = masks[np.argmax(scores)]
```
### HuggingFace Transformers
```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor
# Load model and processor
model = SamModel.from_pretrained("facebook/sam-vit-huge")
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
model.to("cuda")
# Process image with point prompt
image = Image.open("image.jpg")
input_points = [[[450, 600]]] # Batch of points
inputs = processor(image, input_points=input_points, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
# Generate masks
with torch.no_grad():
outputs = model(**inputs)
# Post-process masks to original size
masks = processor.image_processor.post_process_masks(
outputs.pred_masks.cpu(),
inputs["original_sizes"].cpu(),
inputs["reshaped_input_sizes"].cpu()
)
```
## Core concepts
### Model architecture
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738) Broad drift audit against origin/main (b52b63396). Reference pages (most user-visible drift): - slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer that were missing; drop non-existent /terminal-setup; fix /q footnote (resolves to /queue, not /quit); extend CLI-only list with all 24 CLI-only commands in the registry - cli-commands: add dedicated sections for hermes curator / fallback / hooks (new subcommands not previously documented); remove stale hermes honcho standalone section (the plugin registers dynamically via hermes memory); list curator/fallback/hooks in top-level table; fix completion to include fish - toolsets-reference: document the real 52-toolset count; split browser vs browser-cdp; add discord / discord_admin / spotify / yuanbao; correct hermes-cli tool count from 36 to 38; fix misleading claim that hermes-homeassistant adds tools (it's identical to hermes-cli) - tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao, 2 Discord toolsets; move browser_cdp/browser_dialog to their own browser-cdp toolset section - environment-variables: add 40+ user-facing HERMES_* vars that were undocumented (--yolo, --accept-hooks, --ignore-*, inference model override, agent/stream/checkpoint timeouts, OAuth trace, per-platform batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs, gateway restart/connect timeouts); dedupe the Cron Scheduler section; replace stale QQ_SANDBOX with QQ_PORTAL_HOST User-guide (top level): - cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20) - configuration.md: display.platforms is the canonical per-platform override key; tool_progress_overrides is deprecated and auto-migrated - profiles.md: model.default is the config key, not model.model - sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8 - checkpoints-and-rollback.md: destructive-command list now matches _DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd) - docker.md: the container runs as non-root hermes (UID 10000) via gosu; fix install command (uv pip); add missing --insecure on the dashboard compose example (required for non-loopback bind) - security.md: systemctl danger pattern also matches 'restart' - index.md: built-in tool count 47 -> 68 - integrations/index.md: 6 STT providers, 8 memory providers - integrations/providers.md: drop fictional dashscope/qwen aliases Features: - overview.md: 9 image models (not 8), 9 TTS providers (not 5), 8 memory providers (Supermemory was missing) - tool-gateway.md: 9 image models - tools.md: extend common-toolsets list with search / messaging / spotify / discord / debugging / safe - fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY (lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan, tencent-tokenhub, azure-foundry) - plugins.md: Available Hooks table now includes on_session_finalize, on_session_reset, subagent_stop - built-in-plugins.md: add the 7 bundled plugins the page didn't mention (spotify, google_meet, three image_gen providers, two dashboard examples) - web-dashboard.md: add --insecure and --tui flags - cron.md: hermes cron create takes positional schedule/prompt, not flags Messaging: - telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch. - discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default is 2.0, not 0.1 - dingtalk.md: document DINGTALK_REQUIRE_MENTION / FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL / ALLOW_ALL_USERS that the adapter supports - bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env var; the setting lives in platforms.bluebubbles.extra only - qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and QQ_GROUP_ALLOWED_USERS - wecom-callback.md: replace 'hermes gateway start' (service-only) with 'hermes gateway' for first-time setup Developer-guide: - architecture.md: refresh tool/toolset counts (61/52), terminal backend count (7), line counts for run_agent.py (~13.7k), cli.py (~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py (~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform adapter count 18 -> 20 - agent-loop.md: run_agent.py line count 10.7k -> 13.7k - tools-runtime.md: add vercel_sandbox backend - adding-tools.md: remove stale 'Discovery import added to model_tools.py' checklist item (registry auto-discovery) - adding-platform-adapters.md: mark send_typing / get_chat_info as concrete base methods; only connect/disconnect/send are abstract - acp-internals.md: ACP sessions now persist to SessionDB (~/.hermes/state.db); acp.run_agent call uses use_unstable_protocol=True - cron-internals.md: gateway runs scheduler in a dedicated background thread via _start_cron_ticker, not on a maintenance cycle; locking is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows) - gateway-internals.md: gateway/run.py ~12k lines - provider-runtime.md: cron DOES support fallback (run_job reads fallback_providers from config) - session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations 10 and 11 (trigram FTS, inline-mode FTS5 re-index); add api_call_count column to Sessions DDL; document messages_fts_trigram and state_meta in the architecture tree - context-compression-and-caching.md: remove the obsolete 'context pressure warnings' section (warnings were removed for causing models to give up early) - context-engine-plugin.md: compress() signature now includes focus_topic param - extending-the-cli.md: _build_tui_layout_children signature now includes model_picker_widget; add to default layout Also fixed three pre-existing broken links/anchors the build warned about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and tips#background-tasks, nix-setup.md -> #container-aware-cli). Regenerated per-skill pages via website/scripts/generate-skill-docs.py so catalog tables and sidebar are consistent with current SKILL.md frontmatter. docusaurus build: clean, no broken links or anchors.
2026-04-29 20:55:59 -07:00
<!-- ascii-guard-ignore -->
2026-04-24 12:32:10 -04:00
<!-- ascii-guard-ignore -->
docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
```
SAM Architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Image Encoder │────▶│ Prompt Encoder │────▶│ Mask Decoder │
│ (ViT) │ │ (Points/Boxes) │ │ (Transformer) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
Image Embeddings Prompt Embeddings Masks + IoU
(computed once) (per prompt) predictions
```
2026-04-24 12:32:10 -04:00
<!-- ascii-guard-ignore-end -->
docs: resync reference, user-guide, developer-guide, and messaging pages against code (#17738) Broad drift audit against origin/main (b52b63396). Reference pages (most user-visible drift): - slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer that were missing; drop non-existent /terminal-setup; fix /q footnote (resolves to /queue, not /quit); extend CLI-only list with all 24 CLI-only commands in the registry - cli-commands: add dedicated sections for hermes curator / fallback / hooks (new subcommands not previously documented); remove stale hermes honcho standalone section (the plugin registers dynamically via hermes memory); list curator/fallback/hooks in top-level table; fix completion to include fish - toolsets-reference: document the real 52-toolset count; split browser vs browser-cdp; add discord / discord_admin / spotify / yuanbao; correct hermes-cli tool count from 36 to 38; fix misleading claim that hermes-homeassistant adds tools (it's identical to hermes-cli) - tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao, 2 Discord toolsets; move browser_cdp/browser_dialog to their own browser-cdp toolset section - environment-variables: add 40+ user-facing HERMES_* vars that were undocumented (--yolo, --accept-hooks, --ignore-*, inference model override, agent/stream/checkpoint timeouts, OAuth trace, per-platform batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs, gateway restart/connect timeouts); dedupe the Cron Scheduler section; replace stale QQ_SANDBOX with QQ_PORTAL_HOST User-guide (top level): - cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20) - configuration.md: display.platforms is the canonical per-platform override key; tool_progress_overrides is deprecated and auto-migrated - profiles.md: model.default is the config key, not model.model - sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8 - checkpoints-and-rollback.md: destructive-command list now matches _DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd) - docker.md: the container runs as non-root hermes (UID 10000) via gosu; fix install command (uv pip); add missing --insecure on the dashboard compose example (required for non-loopback bind) - security.md: systemctl danger pattern also matches 'restart' - index.md: built-in tool count 47 -> 68 - integrations/index.md: 6 STT providers, 8 memory providers - integrations/providers.md: drop fictional dashscope/qwen aliases Features: - overview.md: 9 image models (not 8), 9 TTS providers (not 5), 8 memory providers (Supermemory was missing) - tool-gateway.md: 9 image models - tools.md: extend common-toolsets list with search / messaging / spotify / discord / debugging / safe - fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY (lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan, tencent-tokenhub, azure-foundry) - plugins.md: Available Hooks table now includes on_session_finalize, on_session_reset, subagent_stop - built-in-plugins.md: add the 7 bundled plugins the page didn't mention (spotify, google_meet, three image_gen providers, two dashboard examples) - web-dashboard.md: add --insecure and --tui flags - cron.md: hermes cron create takes positional schedule/prompt, not flags Messaging: - telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch. - discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default is 2.0, not 0.1 - dingtalk.md: document DINGTALK_REQUIRE_MENTION / FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL / ALLOW_ALL_USERS that the adapter supports - bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env var; the setting lives in platforms.bluebubbles.extra only - qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and QQ_GROUP_ALLOWED_USERS - wecom-callback.md: replace 'hermes gateway start' (service-only) with 'hermes gateway' for first-time setup Developer-guide: - architecture.md: refresh tool/toolset counts (61/52), terminal backend count (7), line counts for run_agent.py (~13.7k), cli.py (~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py (~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform adapter count 18 -> 20 - agent-loop.md: run_agent.py line count 10.7k -> 13.7k - tools-runtime.md: add vercel_sandbox backend - adding-tools.md: remove stale 'Discovery import added to model_tools.py' checklist item (registry auto-discovery) - adding-platform-adapters.md: mark send_typing / get_chat_info as concrete base methods; only connect/disconnect/send are abstract - acp-internals.md: ACP sessions now persist to SessionDB (~/.hermes/state.db); acp.run_agent call uses use_unstable_protocol=True - cron-internals.md: gateway runs scheduler in a dedicated background thread via _start_cron_ticker, not on a maintenance cycle; locking is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows) - gateway-internals.md: gateway/run.py ~12k lines - provider-runtime.md: cron DOES support fallback (run_job reads fallback_providers from config) - session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations 10 and 11 (trigram FTS, inline-mode FTS5 re-index); add api_call_count column to Sessions DDL; document messages_fts_trigram and state_meta in the architecture tree - context-compression-and-caching.md: remove the obsolete 'context pressure warnings' section (warnings were removed for causing models to give up early) - context-engine-plugin.md: compress() signature now includes focus_topic param - extending-the-cli.md: _build_tui_layout_children signature now includes model_picker_widget; add to default layout Also fixed three pre-existing broken links/anchors the build warned about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and tips#background-tasks, nix-setup.md -> #container-aware-cli). Regenerated per-skill pages via website/scripts/generate-skill-docs.py so catalog tables and sidebar are consistent with current SKILL.md frontmatter. docusaurus build: clean, no broken links or anchors.
2026-04-29 20:55:59 -07:00
<!-- ascii-guard-ignore-end -->
docs(website): dedicated page per bundled + optional skill (#14929) Generates a full dedicated Docusaurus page for every one of the 132 skills (73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/. Each page carries the skill's description, metadata (version, author, license, dependencies, platform gating, tags, related skills cross-linked to their own pages), and the complete SKILL.md body that Hermes loads at runtime. Previously the two catalog pages just listed skills with a one-line blurb and no way to see what the skill actually did — users had to go read the source repo. Now every skill has a browsable, searchable, cross-linked reference in the docs. - website/scripts/generate-skill-docs.py — generator that reads skills/ and optional-skills/, writes per-skill pages, regenerates both catalog indexes, and rewrites the Skills section of sidebars.ts. Handles MDX escaping (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and rewrites relative references/*.md links to point at the GitHub source. - website/docs/reference/skills-catalog.md — regenerated; each row links to the new dedicated page. - website/docs/reference/optional-skills-catalog.md — same. - website/sidebars.ts — Skills section now has Bundled / Optional subtrees with one nested category per skill folder. - .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator before docusaurus build so CI stays in sync with the source SKILL.md files. Build verified locally with `npx docusaurus build`. Only remaining warnings are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
### Model variants
| Model | Checkpoint | Size | Speed | Accuracy |
|-------|------------|------|-------|----------|
| ViT-H | `vit_h` | 2.4 GB | Slowest | Best |
| ViT-L | `vit_l` | 1.2 GB | Medium | Good |
| ViT-B | `vit_b` | 375 MB | Fastest | Good |
### Prompt types
| Prompt | Description | Use Case |
|--------|-------------|----------|
| Point (foreground) | Click on object | Single object selection |
| Point (background) | Click outside object | Exclude regions |
| Bounding box | Rectangle around object | Larger objects |
| Previous mask | Low-res mask input | Iterative refinement |
## Interactive segmentation
### Point prompts
```python
# Single foreground point
input_point = np.array([[500, 375]])
input_label = np.array([1])
masks, scores, logits = predictor.predict(
point_coords=input_point,
point_labels=input_label,
multimask_output=True
)
# Multiple points (foreground + background)
input_points = np.array([[500, 375], [600, 400], [450, 300]])
input_labels = np.array([1, 1, 0]) # 2 foreground, 1 background
masks, scores, logits = predictor.predict(
point_coords=input_points,
point_labels=input_labels,
multimask_output=False # Single mask when prompts are clear
)
```
### Box prompts
```python
# Bounding box [x1, y1, x2, y2]
input_box = np.array([425, 600, 700, 875])
masks, scores, logits = predictor.predict(
box=input_box,
multimask_output=False
)
```
### Combined prompts
```python
# Box + points for precise control
masks, scores, logits = predictor.predict(
point_coords=np.array([[500, 375]]),
point_labels=np.array([1]),
box=np.array([400, 300, 700, 600]),
multimask_output=False
)
```
### Iterative refinement
```python
# Initial prediction
masks, scores, logits = predictor.predict(
point_coords=np.array([[500, 375]]),
point_labels=np.array([1]),
multimask_output=True
)
# Refine with additional point using previous mask
masks, scores, logits = predictor.predict(
point_coords=np.array([[500, 375], [550, 400]]),
point_labels=np.array([1, 0]), # Add background point
mask_input=logits[np.argmax(scores)][None, :, :], # Use best mask
multimask_output=False
)
```
## Automatic mask generation
### Basic automatic segmentation
```python
from segment_anything import SamAutomaticMaskGenerator
# Create generator
mask_generator = SamAutomaticMaskGenerator(sam)
# Generate all masks
masks = mask_generator.generate(image)
# Each mask contains:
# - segmentation: binary mask
# - bbox: [x, y, w, h]
# - area: pixel count
# - predicted_iou: quality score
# - stability_score: robustness score
# - point_coords: generating point
```
### Customized generation
```python
mask_generator = SamAutomaticMaskGenerator(
model=sam,
points_per_side=32, # Grid density (more = more masks)
pred_iou_thresh=0.88, # Quality threshold
stability_score_thresh=0.95, # Stability threshold
crop_n_layers=1, # Multi-scale crops
crop_n_points_downscale_factor=2,
min_mask_region_area=100, # Remove tiny masks
)
masks = mask_generator.generate(image)
```
### Filtering masks
```python
# Sort by area (largest first)
masks = sorted(masks, key=lambda x: x['area'], reverse=True)
# Filter by predicted IoU
high_quality = [m for m in masks if m['predicted_iou'] > 0.9]
# Filter by stability score
stable_masks = [m for m in masks if m['stability_score'] > 0.95]
```
## Batched inference
### Multiple images
```python
# Process multiple images efficiently
images = [cv2.imread(f"image_{i}.jpg") for i in range(10)]
all_masks = []
for image in images:
predictor.set_image(image)
masks, _, _ = predictor.predict(
point_coords=np.array([[500, 375]]),
point_labels=np.array([1]),
multimask_output=True
)
all_masks.append(masks)
```
### Multiple prompts per image
```python
# Process multiple prompts efficiently (one image encoding)
predictor.set_image(image)
# Batch of point prompts
points = [
np.array([[100, 100]]),
np.array([[200, 200]]),
np.array([[300, 300]])
]
all_masks = []
for point in points:
masks, scores, _ = predictor.predict(
point_coords=point,
point_labels=np.array([1]),
multimask_output=True
)
all_masks.append(masks[np.argmax(scores)])
```
## ONNX deployment
### Export model
```bash
python scripts/export_onnx_model.py \
--checkpoint sam_vit_h_4b8939.pth \
--model-type vit_h \
--output sam_onnx.onnx \
--return-single-mask
```
### Use ONNX model
```python
import onnxruntime
# Load ONNX model
ort_session = onnxruntime.InferenceSession("sam_onnx.onnx")
# Run inference (image embeddings computed separately)
masks = ort_session.run(
None,
{
"image_embeddings": image_embeddings,
"point_coords": point_coords,
"point_labels": point_labels,
"mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
"has_mask_input": np.array([0], dtype=np.float32),
"orig_im_size": np.array([h, w], dtype=np.float32)
}
)
```
## Common workflows
### Workflow 1: Annotation tool
```python
import cv2
# Load model
predictor = SamPredictor(sam)
predictor.set_image(image)
def on_click(event, x, y, flags, param):
if event == cv2.EVENT_LBUTTONDOWN:
# Foreground point
masks, scores, _ = predictor.predict(
point_coords=np.array([[x, y]]),
point_labels=np.array([1]),
multimask_output=True
)
# Display best mask
display_mask(masks[np.argmax(scores)])
```
### Workflow 2: Object extraction
```python
def extract_object(image, point):
"""Extract object at point with transparent background."""
predictor.set_image(image)
masks, scores, _ = predictor.predict(
point_coords=np.array([point]),
point_labels=np.array([1]),
multimask_output=True
)
best_mask = masks[np.argmax(scores)]
# Create RGBA output
rgba = np.zeros((image.shape[0], image.shape[1], 4), dtype=np.uint8)
rgba[:, :, :3] = image
rgba[:, :, 3] = best_mask * 255
return rgba
```
### Workflow 3: Medical image segmentation
```python
# Process medical images (grayscale to RGB)
medical_image = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
rgb_image = cv2.cvtColor(medical_image, cv2.COLOR_GRAY2RGB)
predictor.set_image(rgb_image)
# Segment region of interest
masks, scores, _ = predictor.predict(
box=np.array([x1, y1, x2, y2]), # ROI bounding box
multimask_output=True
)
```
## Output format
### Mask data structure
```python
# SamAutomaticMaskGenerator output
{
"segmentation": np.ndarray, # H×W binary mask
"bbox": [x, y, w, h], # Bounding box
"area": int, # Pixel count
"predicted_iou": float, # 0-1 quality score
"stability_score": float, # 0-1 robustness score
"crop_box": [x, y, w, h], # Generation crop region
"point_coords": [[x, y]], # Input point
}
```
### COCO RLE format
```python
from pycocotools import mask as mask_utils
# Encode mask to RLE
rle = mask_utils.encode(np.asfortranarray(mask.astype(np.uint8)))
rle["counts"] = rle["counts"].decode("utf-8")
# Decode RLE to mask
decoded_mask = mask_utils.decode(rle)
```
## Performance optimization
### GPU memory
```python
# Use smaller model for limited VRAM
sam = sam_model_registry["vit_b"](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/models/segment-anything/checkpoint="sam_vit_b_01ec64.pth")
# Process images in batches
# Clear CUDA cache between large batches
torch.cuda.empty_cache()
```
### Speed optimization
```python
# Use half precision
sam = sam.half()
# Reduce points for automatic generation
mask_generator = SamAutomaticMaskGenerator(
model=sam,
points_per_side=16, # Default is 32
)
# Use ONNX for deployment
# Export with --return-single-mask for faster inference
```
## Common issues
| Issue | Solution |
|-------|----------|
| Out of memory | Use ViT-B model, reduce image size |
| Slow inference | Use ViT-B, reduce points_per_side |
| Poor mask quality | Try different prompts, use box + points |
| Edge artifacts | Use stability_score filtering |
| Small objects missed | Increase points_per_side |
## References
- **[Advanced Usage](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/models/segment-anything/references/advanced-usage.md)** - Batching, fine-tuning, integration
- **[Troubleshooting](https://github.com/NousResearch/hermes-agent/blob/main/skills/mlops/models/segment-anything/references/troubleshooting.md)** - Common issues and solutions
## Resources
- **GitHub**: https://github.com/facebookresearch/segment-anything
- **Paper**: https://arxiv.org/abs/2304.02643
- **Demo**: https://segment-anything.com
- **SAM 2 (Video)**: https://github.com/facebookresearch/segment-anything-2
- **HuggingFace**: https://huggingface.co/facebook/sam-vit-huge