mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 06:51:16 +08:00
98 lines
3.0 KiB
Markdown
---
title: "Songsee — Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.)"
sidebar_label: "Songsee"
description: "Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.)"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Songsee

Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/media/songsee` |
| Version | `1.0.0` |
| Author | community |
| License | MIT |
| Tags | `Audio`, `Visualization`, `Spectrogram`, `Music`, `Analysis` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# songsee

Generate spectrograms and multi-panel audio feature visualizations from audio files.

## Prerequisites

Requires [Go](https://go.dev/doc/install):

```bash
go install github.com/steipete/songsee/cmd/songsee@latest
```

Optional: `ffmpeg` for formats beyond WAV/MP3.
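
A minimal sketch for confirming both prerequisites before first use. `check_tool` is a hypothetical helper introduced here, not part of songsee. `go install` places binaries in `$GOBIN`, or in `$(go env GOPATH)/bin` when `GOBIN` is unset, so that directory needs to be on `PATH`.

```bash
# check_tool is a hypothetical helper for this sketch, not a songsee command.
# It prints whether the named program can be found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# The hint is printed literally; run `go env GOPATH` to see the actual directory.
check_tool songsee || echo 'hint: add "$(go env GOPATH)/bin" to PATH'
check_tool ffmpeg || echo "note: without ffmpeg, only WAV and MP3 input is decoded"
```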

## Quick Start

```bash
# Basic spectrogram
songsee track.mp3

# Save to specific file
songsee track.mp3 -o spectrogram.png

# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux

# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg

# From stdin
cat track.mp3 | songsee - --format png -o out.png
```
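
The single-file commands above extend naturally to a whole folder. The following is a sketch under stated assumptions: `render_all` and the `SONGSEE` override variable are hypothetical names introduced here, and only the `songsee <file> -o <path>` form comes from the Quick Start.

```bash
# render_all is a hypothetical wrapper, not part of songsee.
# It writes one PNG next to each MP3/WAV in a directory and skips
# files whose image already exists.
render_all() {
  local dir="$1" f out
  for f in "$dir"/*.mp3 "$dir"/*.wav; do
    [ -e "$f" ] || continue        # an unmatched glob stays literal; skip it
    out="${f%.*}.png"              # track.mp3 -> track.png
    [ -e "$out" ] && continue      # already rendered
    # SONGSEE is an assumed override so the loop can be dry-run with echo.
    "${SONGSEE:-songsee}" "$f" -o "$out"
  done
}

# Example: render_all ./audio
```

Setting `SONGSEE=echo` before the call prints the commands instead of running them, which is a cheap way to check the file list first.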

## Visualization Types

Use `--viz` with comma-separated values:

| Type | Description |
|------|-------------|
| `spectrogram` | Standard frequency spectrogram |
| `mel` | Mel-scaled spectrogram |
| `chroma` | Pitch class distribution |
| `hpss` | Harmonic/percussive separation |
| `selfsim` | Self-similarity matrix |
| `loudness` | Loudness over time |
| `tempogram` | Tempo estimation |
| `mfcc` | Mel-frequency cepstral coefficients |
| `flux` | Spectral flux (onset detection) |

Multiple `--viz` types render as a grid in a single image.
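
When the `--viz` list is assembled by a script, checking it against the table first can catch typos before rendering. This is a minimal sketch: `validate_viz` and `KNOWN_VIZ` are hypothetical names, and the accepted values are copied from the table above rather than queried from songsee.

```bash
# Type names copied from the table above (assumed complete for this sketch).
KNOWN_VIZ="spectrogram mel chroma hpss selfsim loudness tempogram mfcc flux"

# validate_viz is a hypothetical helper, not part of songsee.
# It succeeds only if every comma-separated name in $1 is a known type.
validate_viz() {
  local IFS=, name
  for name in $1; do               # unquoted on purpose: split on commas
    case " $KNOWN_VIZ " in
      *" $name "*) ;;              # known type, keep going
      *) echo "unknown viz type: $name" >&2; return 1 ;;
    esac
  done
}
```

A typical call would be `validate_viz "mel,chroma,mfcc" && songsee track.mp3 --viz mel,chroma,mfcc -o panels.png`.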

## Common Flags

| Flag | Description |
|------|-------------|
| `--viz` | Visualization types (comma-separated) |
| `--style` | Color palette: `classic`, `magma`, `inferno`, `viridis`, `gray` |
| `--width` / `--height` | Output image dimensions |
| `--window` / `--hop` | FFT window and hop size |
| `--min-freq` / `--max-freq` | Frequency range filter |
| `--start` / `--duration` | Time slice of the audio |
| `--format` | Output format: `jpg` or `png` |
| `-o` | Output file path |
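
To pick a palette, it can help to render the same file once per `--style` value. The sketch below assumes the five palette names listed in the table; `render_styles` and the `SONGSEE` override variable are hypothetical names introduced for illustration.

```bash
# render_styles is a hypothetical wrapper, not part of songsee.
# It renders one PNG per documented palette, e.g. track-magma.png.
render_styles() {
  local input="$1" style
  for style in classic magma inferno viridis gray; do
    # SONGSEE is an assumed override so the loop can be dry-run with echo.
    "${SONGSEE:-songsee}" "$input" --style "$style" --format png \
      -o "${input%.*}-$style.png" || return 1
  done
}

# Example: render_styles track.mp3
```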

## Notes

- WAV and MP3 are decoded natively; other formats require `ffmpeg`
- Output images can be inspected with `vision_analyze` for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines
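
For the comparison use case, rendering both files with identical settings keeps the images directly comparable. This is a sketch only: `compare_render`, the `SONGSEE` override variable, the chosen panel list, and the default start and duration values are illustrative assumptions, while the flags themselves come from the tables above.

```bash
# compare_render is a hypothetical helper, not part of songsee.
# Usage: compare_render before.wav after.wav [start_seconds] [duration_seconds]
compare_render() {
  local a="$1" b="$2" start="${3:-0}" dur="${4:-10}" f
  for f in "$a" "$b"; do
    # Same panels and time slice for both inputs; output is <name>-cmp.png.
    "${SONGSEE:-songsee}" "$f" --viz spectrogram,mel,loudness \
      --start "$start" --duration "$dur" --format png \
      -o "${f%.*}-cmp.png" || return 1
  done
}
```

The two resulting PNG files can then be viewed side by side or passed to `vision_analyze`.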