mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-04 09:47:54 +08:00
Broad drift audit against origin/main (b52b63396).
Reference pages (most user-visible drift):
- slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer
that were missing; drop non-existent /terminal-setup; fix /q footnote
(resolves to /queue, not /quit); extend CLI-only list with all 24
CLI-only commands in the registry
- cli-commands: add dedicated sections for hermes curator / fallback /
hooks (new subcommands not previously documented); remove stale
hermes honcho standalone section (the plugin registers dynamically
via hermes memory); list curator/fallback/hooks in top-level table;
fix completion to include fish
- toolsets-reference: document the real 52-toolset count; split browser
vs browser-cdp; add discord / discord_admin / spotify / yuanbao;
correct hermes-cli tool count from 36 to 38; fix misleading claim
that hermes-homeassistant adds tools (it's identical to hermes-cli)
- tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao,
2 Discord toolsets; move browser_cdp/browser_dialog to their own
browser-cdp toolset section
- environment-variables: add 40+ user-facing HERMES_* vars that were
undocumented (--yolo, --accept-hooks, --ignore-*, inference model
override, agent/stream/checkpoint timeouts, OAuth trace, per-platform
batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs,
gateway restart/connect timeouts); dedupe the Cron Scheduler section;
replace stale QQ_SANDBOX with QQ_PORTAL_HOST
User-guide (top level):
- cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20)
- configuration.md: display.platforms is the canonical per-platform
override key; tool_progress_overrides is deprecated and auto-migrated
- profiles.md: model.default is the config key, not model.model
- sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8
- checkpoints-and-rollback.md: destructive-command list now matches
_DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd)
- docker.md: the container runs as non-root hermes (UID 10000) via
gosu; fix install command (uv pip); add missing --insecure on the
dashboard compose example (required for non-loopback bind)
- security.md: systemctl danger pattern also matches 'restart'
- index.md: built-in tool count 47 -> 68
- integrations/index.md: 6 STT providers, 8 memory providers
- integrations/providers.md: drop fictional dashscope/qwen aliases
Features:
- overview.md: 9 image models (not 8), 9 TTS providers (not 5),
8 memory providers (Supermemory was missing)
- tool-gateway.md: 9 image models
- tools.md: extend common-toolsets list with search / messaging /
spotify / discord / debugging / safe
- fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY
(lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan,
tencent-tokenhub, azure-foundry)
- plugins.md: Available Hooks table now includes on_session_finalize,
on_session_reset, subagent_stop
- built-in-plugins.md: add the 7 bundled plugins the page didn't
mention (spotify, google_meet, three image_gen providers, two
dashboard examples)
- web-dashboard.md: add --insecure and --tui flags
- cron.md: hermes cron create takes positional schedule/prompt, not
flags
Messaging:
- telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when
TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it
per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch.
- discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default
is 2.0, not 0.1
- dingtalk.md: document DINGTALK_REQUIRE_MENTION /
FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL /
ALLOW_ALL_USERS that the adapter supports
- bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env
var; the setting lives in platforms.bluebubbles.extra only
- qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and
QQ_GROUP_ALLOWED_USERS
- wecom-callback.md: replace 'hermes gateway start' (service-only)
with 'hermes gateway' for first-time setup
Developer-guide:
- architecture.md: refresh tool/toolset counts (61/52), terminal
backend count (7), line counts for run_agent.py (~13.7k), cli.py
(~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py
(~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform
adapter count 18 -> 20
- agent-loop.md: run_agent.py line count 10.7k -> 13.7k
- tools-runtime.md: add vercel_sandbox backend
- adding-tools.md: remove stale 'Discovery import added to
model_tools.py' checklist item (registry auto-discovery)
- adding-platform-adapters.md: mark send_typing / get_chat_info as
concrete base methods; only connect/disconnect/send are abstract
- acp-internals.md: ACP sessions now persist to SessionDB
(~/.hermes/state.db); acp.run_agent call uses
use_unstable_protocol=True
- cron-internals.md: gateway runs scheduler in a dedicated background
thread via _start_cron_ticker, not on a maintenance cycle; locking
is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows)
- gateway-internals.md: gateway/run.py ~12k lines
- provider-runtime.md: cron DOES support fallback (run_job reads
fallback_providers from config)
- session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations
10 and 11 (trigram FTS, inline-mode FTS5 re-index); add
api_call_count column to Sessions DDL; document messages_fts_trigram
and state_meta in the architecture tree
- context-compression-and-caching.md: remove the obsolete 'context
pressure warnings' section (warnings were removed for causing
models to give up early)
- context-engine-plugin.md: compress() signature now includes
focus_topic param
- extending-the-cli.md: _build_tui_layout_children signature now
includes model_picker_widget; add to default layout
Also fixed three pre-existing broken links/anchors the build warned
about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and
tips#background-tasks, nix-setup.md -> #container-aware-cli).
Regenerated per-skill pages via website/scripts/generate-skill-docs.py
so catalog tables and sidebar are consistent with current SKILL.md
frontmatter.
docusaurus build: clean, no broken links or anchors.
609 lines
16 KiB
Markdown
---
title: "Dspy — DSPy: declarative LM programs, auto-optimize prompts, RAG"
sidebar_label: "Dspy"
description: "DSPy: declarative LM programs, auto-optimize prompts, RAG"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Dspy

DSPy: declarative LM programs, auto-optimize prompts, RAG.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/mlops/research/dspy` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `dspy`, `openai`, `anthropic` |
| Tags | `Prompt Engineering`, `DSPy`, `Declarative Programming`, `RAG`, `Agents`, `Prompt Optimization`, `LM Programming`, `Stanford NLP`, `Automatic Optimization`, `Modular AI` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# DSPy: Declarative Language Model Programming

## When to Use This Skill

Use DSPy when you need to:
- **Build complex AI systems** with multiple components and workflows
- **Program LMs declaratively** instead of manual prompt engineering
- **Optimize prompts automatically** using data-driven methods
- **Create modular AI pipelines** that are maintainable and portable
- **Improve model outputs systematically** with optimizers
- **Build RAG systems, agents, or classifiers** with better reliability

**GitHub Stars**: 22,000+ | **Created By**: Stanford NLP

## Installation

```bash
# Stable release
pip install dspy

# Latest development version
pip install git+https://github.com/stanfordnlp/dspy.git

# With specific LM providers (quoted so the shell does not expand the brackets)
pip install "dspy[openai]"      # OpenAI
pip install "dspy[anthropic]"   # Anthropic Claude
pip install "dspy[all]"         # All providers
```

## Quick Start

### Basic Example: Question Answering

```python
import dspy

# Configure your language model (dspy.LM takes a "provider/model" string)
lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Define a signature (input → output)
class QA(dspy.Signature):
    """Answer questions with short factual answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Create a module
qa = dspy.Predict(QA)

# Use it
response = qa(question="What is the capital of France?")
print(response.answer)  # "Paris"
```

### Chain of Thought Reasoning

```python
import dspy

lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Use ChainOfThought for better reasoning
class MathProblem(dspy.Signature):
    """Solve math word problems."""
    problem = dspy.InputField()
    answer = dspy.OutputField(desc="numerical answer")

# ChainOfThought generates reasoning steps automatically
cot = dspy.ChainOfThought(MathProblem)

response = cot(problem="If John has 5 apples and gives 2 to Mary, how many does he have?")
print(response.reasoning)  # Shows reasoning steps
print(response.answer)     # "3"
```

## Core Concepts

### 1. Signatures

Signatures define the structure of your AI task (inputs → outputs):

```python
# Inline signature (simple)
qa = dspy.Predict("question -> answer")

# Class signature (detailed)
class Summarize(dspy.Signature):
    """Summarize text into key points."""
    text = dspy.InputField()
    summary = dspy.OutputField(desc="bullet points, 3-5 items")

summarizer = dspy.ChainOfThought(Summarize)
```

**When to use each:**
- **Inline**: Quick prototyping, simple tasks
- **Class**: Complex tasks, type hints, better documentation
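
To make the inline form concrete, the sketch below shows how a string such as `"context, question -> answer"` maps onto input and output field names. `parse_inline_signature` is a hypothetical helper written for illustration only; it is not how DSPy parses signatures internally, and it simply discards any `: type` annotation.

```python
def parse_inline_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c: float' into (input names, output names)."""
    if "->" not in spec:
        raise ValueError(f"signature needs '->': {spec!r}")
    left, right = spec.split("->", 1)

    def names(side: str) -> list[str]:
        # Drop any ': type' annotation and surrounding whitespace.
        return [part.split(":", 1)[0].strip() for part in side.split(",") if part.strip()]

    return names(left), names(right)


inputs, outputs = parse_inline_signature("context, question -> answer")
print(inputs, outputs)  # ['context', 'question'] ['answer']
```

Everything left of `->` becomes an input the caller must pass as a keyword argument; everything on the right becomes an attribute on the returned prediction.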

### 2. Modules

Modules are reusable components that transform inputs to outputs:

#### dspy.Predict
Basic prediction module:

```python
predictor = dspy.Predict("context, question -> answer")
result = predictor(context="Paris is the capital of France",
                   question="What is the capital?")
```

#### dspy.ChainOfThought
Generates reasoning steps before answering:

```python
cot = dspy.ChainOfThought("question -> answer")
result = cot(question="Why is the sky blue?")
print(result.reasoning)  # Reasoning steps
print(result.answer)     # Final answer
```

#### dspy.ReAct
Agent-like reasoning with tools:

```python
import dspy
from dspy.predict import ReAct

class SearchQA(dspy.Signature):
    """Answer questions using search."""
    question = dspy.InputField()
    answer = dspy.OutputField()

def search_tool(query: str) -> str:
    """Search Wikipedia."""
    # Placeholder: replace with your search implementation
    results = f"(no search backend configured for: {query})"
    return results

react = ReAct(SearchQA, tools=[search_tool])
result = react(question="When was Python created?")
```

#### dspy.ProgramOfThought
Generates and executes code for reasoning:

```python
pot = dspy.ProgramOfThought("question -> answer")
result = pot(question="What is 15% of 240?")
# Generates: answer = 240 * 0.15
```

### 3. Optimizers

Optimizers improve your modules automatically using training data:

#### BootstrapFewShot
Bootstraps few-shot demonstrations from training examples:

```python
from dspy.teleprompt import BootstrapFewShot

# Training data
trainset = [
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is 3+5?", answer="8").with_inputs("question"),
]

# Define metric
def validate_answer(example, pred, trace=None):
    return example.answer == pred.answer

# Optimize
optimizer = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)
optimized_qa = optimizer.compile(qa, trainset=trainset)

# optimized_qa now carries the demonstrations that passed the metric
```
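
The idea behind bootstrapping can be sketched without DSPy at all: run a teacher over the training examples and keep only the outputs the metric accepts, up to a cap. The code below is a conceptual illustration under that assumption, not DSPy's implementation; `bootstrap_demos`, the toy `teacher`, and the dict-based examples are all hypothetical stand-ins.

```python
from typing import Callable


def bootstrap_demos(
    trainset: list[dict],
    teacher: Callable[[str], str],
    metric: Callable[[dict, str], bool],
    max_demos: int = 3,
) -> list[dict]:
    """Run the teacher on each example and keep outputs the metric accepts."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = teacher(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Toy stand-ins (hypothetical): a "teacher" that only knows one sum.
trainset = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 3+5?", "answer": "8"},
]
teacher = lambda question: "4" if "2+2" in question else "unknown"
metric = lambda example, prediction: example["answer"] == prediction

demos = bootstrap_demos(trainset, teacher, metric)
print(demos)  # [{'question': 'What is 2+2?', 'answer': '4'}]
```

Only the first example survives because the teacher's answer to the second fails the metric. This is why the quality of the metric matters as much as the size of the training set.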

#### MIPRO (Multiprompt Instruction PRoposal Optimizer)
Iteratively proposes and evaluates instructions and few-shot demonstrations:

```python
# MIPROv2 is the class name exported by current dspy.teleprompt
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=validate_answer,
    auto=None,  # disable presets so the manual settings below apply
    num_candidates=10,
    init_temperature=1.0
)

optimized_cot = optimizer.compile(
    cot,
    trainset=trainset,
    num_trials=100
)
```

#### BootstrapFinetune
Creates datasets for model fine-tuning:

```python
from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(metric=validate_answer)
optimized_module = optimizer.compile(qa, trainset=trainset)

# Exports training data for fine-tuning
```

### 4. Building Complex Systems

#### Multi-Stage Pipeline

```python
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Stage 1: Generate search query
        search_query = self.generate_query(question=question).search_query

        # Stage 2: Retrieve context
        passages = self.retrieve(search_query).passages
        context = "\n".join(passages)

        # Stage 3: Generate answer
        answer = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(answer=answer, context=context)

# Use the pipeline
qa_system = MultiHopQA()
result = qa_system(question="Who wrote the book that inspired the movie Blade Runner?")
```

#### RAG System with Optimization

```python
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

# Configure retriever
retriever = ChromadbRM(
    collection_name="documents",
    persist_directory="./chroma_db"
)
# Register it so dspy.Retrieve knows which retriever to call
dspy.settings.configure(rm=retriever)

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Create and optimize
rag = RAG()

# Optimize with training data
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=validate_answer)
optimized_rag = optimizer.compile(rag, trainset=trainset)
```

## LM Provider Configuration

### Anthropic Claude

```python
import dspy

lm = dspy.LM(
    "anthropic/claude-sonnet-4-5-20250929",
    api_key="your-api-key",  # Or set ANTHROPIC_API_KEY env var
    max_tokens=1000,
    temperature=0.7
)
dspy.settings.configure(lm=lm)
```

### OpenAI

```python
lm = dspy.LM(
    "openai/gpt-4",
    api_key="your-api-key",
    max_tokens=1000
)
dspy.settings.configure(lm=lm)
```

### Local Models (Ollama)

```python
lm = dspy.LM(
    "ollama_chat/llama3.1",
    api_base="http://localhost:11434",
    api_key=""  # no key needed for a local server
)
dspy.settings.configure(lm=lm)
```

### Multiple Models

```python
# Different models for different tasks
cheap_lm = dspy.LM("openai/gpt-3.5-turbo")
strong_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")

# Use cheap model for retrieval, strong model for reasoning
with dspy.settings.context(lm=cheap_lm):
    context = retriever(question)

with dspy.settings.context(lm=strong_lm):
    answer = generator(context=context, question=question)
```

## Common Patterns

### Pattern 1: Structured Output

```python
import dspy
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="Current job")

class ExtractPerson(dspy.Signature):
    """Extract person information from text."""
    text: str = dspy.InputField()
    person: PersonInfo = dspy.OutputField()

# dspy.Predict handles typed (Pydantic) output fields directly
extractor = dspy.Predict(ExtractPerson)
result = extractor(text="John Doe is a 35-year-old software engineer.")
print(result.person.name)  # "John Doe"
print(result.person.age)   # 35
```

### Pattern 2: Assertion-Driven Optimization

```python
import dspy
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

def is_number(value) -> bool:
    """Return True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

class MathQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought("problem -> solution: float")

    def forward(self, problem):
        solution = self.solve(problem=problem).solution

        # Assert solution is numeric; a failed assertion triggers a retry
        dspy.Assert(is_number(solution), "Solution must be a number")

        return dspy.Prediction(solution=solution)

# Wrap the module so failed assertions backtrack and retry
# (dspy.Assert belongs to DSPy 2.5 and earlier; DSPy 2.6 replaces it with dspy.Refine)
math_qa = assert_transform_module(MathQA(), backtrack_handler)
```

### Pattern 3: Self-Consistency

```python
import dspy
from collections import Counter

class ConsistentQA(dspy.Module):
    def __init__(self, num_samples=5):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")
        self.num_samples = num_samples

    def forward(self, question):
        # Generate multiple answers
        answers = []
        for _ in range(self.num_samples):
            result = self.qa(question=question)
            answers.append(result.answer)

        # Return most common answer
        most_common = Counter(answers).most_common(1)[0][0]
        return dspy.Prediction(answer=most_common)
```
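
Voting on raw strings can split what is really one answer ("Paris" versus "paris "). The sketch below shows one possible way to normalize before counting; `majority_vote` is a hypothetical helper, and the normalization rules (trim, drop a trailing period, lowercase) are assumptions you would adapt to your task.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer after light normalization."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize so "Paris", "paris " and "Paris." count as the same vote.
    normalized = [a.strip().rstrip(".").lower() for a in answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    # Return the first original spelling that maps to the winning key.
    return next(a for a, n in zip(answers, normalized) if n == winner)


samples = ["Paris", "paris ", "Lyon", "Paris.", "Lyon"]
print(majority_vote(samples))  # Paris
```

Without normalization the five samples above would tie 2-2 between "Lyon" and nothing else reaching a majority, so the vote would go to "Lyon" even though three samples agree on Paris.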

### Pattern 4: Retrieval with Reranking

```python
class RerankedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        self.rerank = dspy.Predict("question, passage -> relevance_score: float")
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve candidates
        passages = self.retrieve(question).passages

        # Rerank passages
        scored = []
        for passage in passages:
            score = float(self.rerank(question=question, passage=passage).relevance_score)
            scored.append((score, passage))

        # Take top 3
        top_passages = [p for _, p in sorted(scored, reverse=True)[:3]]
        context = "\n\n".join(top_passages)

        # Generate answer
        return self.answer(context=context, question=question)
```
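
The reranking step depends on turning model output into a number, and `float(...)` raises if the model returns something like "high". A defensive variant of the scoring and top-k selection might look like the sketch below; `parse_score` and `top_k` are hypothetical helpers, and falling back to a score of 0.0 is an assumption, not a DSPy convention.

```python
def parse_score(raw, default: float = 0.0) -> float:
    """Convert a model-produced score to float, falling back to a default."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return default


def top_k(passages: list[str], raw_scores: list, k: int = 3) -> list[str]:
    """Keep the k highest-scoring passages, preserving retrieval order on ties."""
    scored = [(parse_score(s), i, p) for i, (p, s) in enumerate(zip(passages, raw_scores))]
    # Sort by score descending, then by original index ascending.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [p for _, _, p in scored[:k]]


passages = ["a", "b", "c", "d"]
raw_scores = ["0.2", "high", 0.9, "0.9"]
print(top_k(passages, raw_scores, k=2))  # ['c', 'd']
```

Breaking ties by the original index keeps the retriever's ranking as a secondary signal instead of ordering tied passages alphabetically.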

## Evaluation and Metrics

### Custom Metrics

```python
def exact_match(example, pred, trace=None):
    """Exact match metric."""
    return example.answer.lower() == pred.answer.lower()

def f1_score(example, pred, trace=None):
    """F1 score for text overlap."""
    pred_tokens = set(pred.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())

    # Guard both sides to avoid dividing by zero
    if not pred_tokens or not gold_tokens:
        return 0.0

    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)

    if precision + recall == 0:
        return 0.0

    return 2 * (precision * recall) / (precision + recall)
```
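
As a worked check of the token-overlap F1 idea, the sketch below restates the metric in self-contained form and scores one prediction by hand. It uses `types.SimpleNamespace` as a stand-in for DSPy example and prediction objects, which is an assumption made so the snippet runs without DSPy installed.

```python
from types import SimpleNamespace


def f1_score(example, pred, trace=None):
    """Set-based token F1 between the gold answer and the predicted answer."""
    pred_tokens = set(pred.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


example = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer="Paris France")

# precision = 1/2 (one of two predicted tokens is correct)
# recall    = 1/1 (the single gold token was found)
# F1        = 2 * 0.5 * 1.0 / 1.5 = 2/3
score = f1_score(example, pred)
print(round(score, 3))  # 0.667
```

Exact match would score this prediction 0, while F1 gives partial credit, which is often a smoother signal for an optimizer.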

### Evaluation

```python
from dspy.evaluate import Evaluate

# Create evaluator
evaluator = Evaluate(
    devset=testset,
    metric=exact_match,
    num_threads=4,
    display_progress=True
)

# Evaluate model
score = evaluator(qa_system)
print(f"Accuracy: {score}")

# Compare optimized vs unoptimized
score_before = evaluator(qa)
score_after = evaluator(optimized_qa)
print(f"Improvement: {score_after - score_before:.2%}")
```

## Best Practices

### 1. Start Simple, Iterate

```python
# Start with Predict
qa = dspy.Predict("question -> answer")

# Add reasoning if needed
qa = dspy.ChainOfThought("question -> answer")

# Add optimization when you have data
optimized_qa = optimizer.compile(qa, trainset=data)
```

### 2. Use Descriptive Signatures

```python
# ❌ Bad: Vague
class Task(dspy.Signature):
    input = dspy.InputField()
    output = dspy.OutputField()

# ✅ Good: Descriptive
class SummarizeArticle(dspy.Signature):
    """Summarize news articles into 3-5 key points."""
    article = dspy.InputField(desc="full article text")
    summary = dspy.OutputField(desc="bullet points, 3-5 items")
```

### 3. Optimize with Representative Data

```python
# Create diverse training examples
trainset = [
    dspy.Example(question="factual", answer="...").with_inputs("question"),
    dspy.Example(question="reasoning", answer="...").with_inputs("question"),
    dspy.Example(question="calculation", answer="...").with_inputs("question"),
]

# Define the metric; score it on a held-out validation set
def metric(example, pred, trace=None):
    return example.answer in pred.answer
```
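
Holding out a validation set is easiest if the split is deterministic, so that scores are comparable between runs. The following is a minimal sketch of such a split using only the standard library; `split_examples`, the 20% default, and the dict-based examples are illustrative assumptions rather than anything DSPy provides.

```python
import random


def split_examples(examples: list, val_fraction: float = 0.2, seed: int = 0):
    """Shuffle deterministically and split into (train, validation)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


examples = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]
train, val = split_examples(examples, val_fraction=0.2, seed=42)
print(len(train), len(val))  # 8 2
```

Pass the training portion to the optimizer and reserve the validation portion for evaluation, so that reported gains are not measured on examples the optimizer already saw.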

### 4. Save and Load Optimized Models

```python
# Save
optimized_qa.save("models/qa_v1.json")

# Load
loaded_qa = dspy.ChainOfThought("question -> answer")
loaded_qa.load("models/qa_v1.json")
```

### 5. Monitor and Debug

```python
# Run prediction
result = qa(question="...")

# Print the most recent prompt and response exchanged with the LM
lm.inspect_history(n=1)
```

## Comparison to Other Approaches

| Feature | Manual Prompting | LangChain | DSPy |
|---------|-----------------|-----------|------|
| Prompt Engineering | Manual | Manual | Automatic |
| Optimization | Trial & error | None | Data-driven |
| Modularity | Low | Medium | High |
| Type Safety | No | Limited | Yes (Signatures) |
| Portability | Low | Medium | High |
| Learning Curve | Low | Medium | Medium-High |

**When to choose DSPy:**
- You have training data or can generate it
- You need systematic prompt improvement
- You're building complex multi-stage systems
- You want to optimize across different LMs

**When to choose alternatives:**
- Quick prototypes (manual prompting)
- Simple chains with existing tools (LangChain)
- Custom optimization logic needed

## Resources

- **Documentation**: https://dspy.ai
- **GitHub**: https://github.com/stanfordnlp/dspy (22k+ stars)
- **Discord**: https://discord.gg/XCGy2WDCQB
- **Twitter**: @DSPyOSS
- **Paper**: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"

## See Also

- `references/modules.md` - Detailed module guide (Predict, ChainOfThought, ReAct, ProgramOfThought)
- `references/optimizers.md` - Optimization algorithms (BootstrapFewShot, MIPRO, BootstrapFinetune)
- `references/examples.md` - Real-world examples (RAG, agents, classifiers)