mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-03 17:27:37 +08:00
Broad drift audit against origin/main (b52b63396).
Reference pages (most user-visible drift):
- slash-commands: add /busy, /curator, /footer, /indicator, /redraw, /steer
that were missing; drop non-existent /terminal-setup; fix /q footnote
(resolves to /queue, not /quit); extend CLI-only list with all 24
CLI-only commands in the registry
- cli-commands: add dedicated sections for hermes curator / fallback /
hooks (new subcommands not previously documented); remove stale
hermes honcho standalone section (the plugin registers dynamically
via hermes memory); list curator/fallback/hooks in top-level table;
fix completion to include fish
- toolsets-reference: document the real 52-toolset count; split browser
vs browser-cdp; add discord / discord_admin / spotify / yuanbao;
correct hermes-cli tool count from 36 to 38; fix misleading claim
that hermes-homeassistant adds tools (it's identical to hermes-cli)
- tools-reference: bump tool count 55 -> 68; add 7 Spotify, 5 Yuanbao,
2 Discord toolsets; move browser_cdp/browser_dialog to their own
browser-cdp toolset section
- environment-variables: add 40+ user-facing HERMES_* vars that were
undocumented (--yolo, --accept-hooks, --ignore-*, inference model
override, agent/stream/checkpoint timeouts, OAuth trace, per-platform
batch tuning for Telegram/Discord/Matrix/Feishu/WeCom, cron knobs,
gateway restart/connect timeouts); dedupe the Cron Scheduler section;
replace stale QQ_SANDBOX with QQ_PORTAL_HOST
User-guide (top level):
- cli.md: compression preserves last 20 turns, not 4 (protect_last_n: 20)
- configuration.md: display.platforms is the canonical per-platform
override key; tool_progress_overrides is deprecated and auto-migrated
- profiles.md: model.default is the config key, not model.model
- sessions.md: CLI/TUI session IDs use 6-char hex, gateway uses 8
- checkpoints-and-rollback.md: destructive-command list now matches
_DESTRUCTIVE_PATTERNS (adds rmdir, cp, install, dd)
- docker.md: the container runs as non-root hermes (UID 10000) via
gosu; fix install command (uv pip); add missing --insecure on the
dashboard compose example (required for non-loopback bind)
- security.md: systemctl danger pattern also matches 'restart'
- index.md: built-in tool count 47 -> 68
- integrations/index.md: 6 STT providers, 8 memory providers
- integrations/providers.md: drop fictional dashscope/qwen aliases
Features:
- overview.md: 9 image models (not 8), 9 TTS providers (not 5),
8 memory providers (Supermemory was missing)
- tool-gateway.md: 9 image models
- tools.md: extend common-toolsets list with search / messaging /
spotify / discord / debugging / safe
- fallback-providers.md: add 6 real providers from PROVIDER_REGISTRY
(lmstudio, kimi-coding-cn, stepfun, alibaba-coding-plan,
tencent-tokenhub, azure-foundry)
- plugins.md: Available Hooks table now includes on_session_finalize,
on_session_reset, subagent_stop
- built-in-plugins.md: add the 7 bundled plugins the page didn't
mention (spotify, google_meet, three image_gen providers, two
dashboard examples)
- web-dashboard.md: add --insecure and --tui flags
- cron.md: hermes cron create takes positional schedule/prompt, not
flags
Messaging:
- telegram.md: TELEGRAM_WEBHOOK_SECRET is now REQUIRED when
TELEGRAM_WEBHOOK_URL is set (gateway refuses to start without it
per GHSA-3vpc-7q5r-276h). Biggest user-visible drift in the batch.
- discord.md: HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS default
is 2.0, not 0.1
- dingtalk.md: document DINGTALK_REQUIRE_MENTION /
FREE_RESPONSE_CHATS / MENTION_PATTERNS / HOME_CHANNEL /
ALLOW_ALL_USERS that the adapter supports
- bluebubbles.md: drop fictional BLUEBUBBLES_SEND_READ_RECEIPTS env
var; the setting lives in platforms.bluebubbles.extra only
- qqbot.md: drop dead QQ_SANDBOX; add real QQ_PORTAL_HOST and
QQ_GROUP_ALLOWED_USERS
- wecom-callback.md: replace 'hermes gateway start' (service-only)
with 'hermes gateway' for first-time setup
Developer-guide:
- architecture.md: refresh tool/toolset counts (61/52), terminal
backend count (7), line counts for run_agent.py (~13.7k), cli.py
(~11.5k), main.py (~10.4k), setup.py (~3.5k), gateway/run.py
(~12.2k), mcp_tool.py (~3.1k); add yuanbao adapter, bump platform
adapter count 18 -> 20
- agent-loop.md: run_agent.py line count 10.7k -> 13.7k
- tools-runtime.md: add vercel_sandbox backend
- adding-tools.md: remove stale 'Discovery import added to
model_tools.py' checklist item (registry auto-discovery)
- adding-platform-adapters.md: mark send_typing / get_chat_info as
concrete base methods; only connect/disconnect/send are abstract
- acp-internals.md: ACP sessions now persist to SessionDB
(~/.hermes/state.db); acp.run_agent call uses
use_unstable_protocol=True
- cron-internals.md: gateway runs scheduler in a dedicated background
thread via _start_cron_ticker, not on a maintenance cycle; locking
is cross-process via fcntl.flock (Unix) / msvcrt.locking (Windows)
- gateway-internals.md: gateway/run.py ~12k lines
- provider-runtime.md: cron DOES support fallback (run_job reads
fallback_providers from config)
- session-storage.md: SCHEMA_VERSION = 11 (not 9); add migrations
10 and 11 (trigram FTS, inline-mode FTS5 re-index); add
api_call_count column to Sessions DDL; document messages_fts_trigram
and state_meta in the architecture tree
- context-compression-and-caching.md: remove the obsolete 'context
pressure warnings' section (warnings were removed for causing
models to give up early)
- context-engine-plugin.md: compress() signature now includes
focus_topic param
- extending-the-cli.md: _build_tui_layout_children signature now
includes model_picker_widget; add to default layout
Also fixed three pre-existing broken links/anchors the build warned
about (docker.md -> api-server.md, yuanbao.md -> cron-jobs.md and
tips#background-tasks, nix-setup.md -> #container-aware-cli).
Regenerated per-skill pages via website/scripts/generate-skill-docs.py
so catalog tables and sidebar are consistent with current SKILL.md
frontmatter.
docusaurus build: clean, no broken links or anchors.
---
title: "Outlines — Outlines: structured JSON/regex/Pydantic LLM generation"
sidebar_label: "Outlines"
description: "Outlines: structured JSON/regex/Pydantic LLM generation"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Outlines

Outlines: structured JSON/regex/Pydantic LLM generation.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/mlops/inference/outlines` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `outlines`, `transformers`, `vllm`, `pydantic` |
| Tags | `Prompt Engineering`, `Outlines`, `Structured Generation`, `JSON Schema`, `Pydantic`, `Local Models`, `Grammar-Based Generation`, `vLLM`, `Transformers`, `Type Safety` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
# Outlines: Structured Text Generation

## When to Use This Skill

Use Outlines when you need to:
- **Guarantee valid JSON/XML/code** structure during generation
- **Use Pydantic models** for type-safe outputs
- **Support local models** (Transformers, llama.cpp, vLLM)
- **Maximize inference speed** with zero-overhead structured generation
- **Generate against JSON schemas** automatically
- **Control token sampling** at the grammar level

**GitHub Stars**: 8,000+ | **From**: dottxt.ai (formerly .txt)

## Installation

```bash
# Base installation
pip install outlines

# With specific backends
pip install outlines transformers  # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm  # vLLM for high-throughput
```
## Quick Start

### Basic Example: Classification

```python
import outlines
from typing import Literal

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed one of these)
```

### With Pydantic Models

```python
from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```
## Core Concepts

### 1. Constrained Token Sampling

Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.

**How it works:**
1. Convert schema (JSON/Pydantic/regex) to context-free grammar (CFG)
2. Transform CFG into Finite State Machine (FSM)
3. Filter invalid tokens at each step during generation
4. Fast-forward when only one valid token exists

**Benefits:**
- **Zero overhead**: Filtering happens at token level
- **Speed improvement**: Fast-forward through deterministic paths
- **Guaranteed validity**: Invalid outputs impossible

```python
import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> CFG -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> CFG
# 3. CFG -> FSM
# 4. FSM filters tokens during generation

generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
```
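The filtering step can be illustrated with a toy character-level sketch. This is illustrative only: real Outlines compiles the pattern to an FSM over the tokenizer's vocabulary and masks token IDs in the logits, not characters. Here we hand-write a prefix-viability check for the pattern `[0-9]{3}-[0-9]{4}`:

```python
# Toy sketch of constrained sampling for the pattern ddd-dddd (e.g. "555-1234").
# At each step, only "tokens" that keep the output a viable prefix survive the mask.

def is_viable_prefix(s: str) -> bool:
    """True if `s` can still be extended to match [0-9]{3}-[0-9]{4}."""
    if len(s) > 8:
        return False
    for i, ch in enumerate(s):
        if i == 3:
            if ch != "-":
                return False
        elif not ch.isdigit():
            return False
    return True

def allowed(vocab: list[str], generated: str) -> list[str]:
    """The 'mask': vocabulary entries that keep the output viable."""
    return [t for t in vocab if is_viable_prefix(generated + t)]

vocab = ["5", "1", "-", "a", " "]
print(allowed(vocab, ""))     # ['5', '1'] - must start with a digit
print(allowed(vocab, "555"))  # ['-'] - only one option: fast-forward
```

When exactly one token survives (the `"555"` case), the generator can emit it without sampling at all, which is the fast-forward optimization from step 4.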
### 2. Structured Generators

Outlines provides specialized generators for different output types.

#### Choice Generator

```python
# Multiple choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: One of the three choices
```

#### JSON Generator

```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class '__main__.Product'>
```

#### Regex Generator

```python
# Generate text matching regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match pattern)
```
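Even though the generator guarantees the match, a cheap downstream check with Python's own `re` is reasonable belt-and-braces when the result crosses a system boundary. A sketch using the same phone-number pattern (the `check_phone` helper is hypothetical, not part of Outlines):

```python
import re

# Same pattern the regex generator was constrained with.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")

def check_phone(generated: str) -> str:
    """Re-validate a generated string against the pattern before using it."""
    if PHONE_PATTERN.fullmatch(generated) is None:
        raise ValueError(f"unexpected output: {generated!r}")
    return generated

print(check_phone("555-123-4567"))  # passes through unchanged
```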
#### Integer/Float Generators

```python
# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # Guaranteed float
```

### 3. Model Backends

Outlines supports multiple local and API-based backends.

#### Transformers (Hugging Face)

```python
import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)
```

#### llama.cpp

```python
# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)
```

#### vLLM (High Throughput)

```python
# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)
```

#### OpenAI (Limited Support)

```python
# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: Some features limited with API models
generator = outlines.generate.json(model, YourModel)
```
### 4. Pydantic Integration

Outlines has first-class Pydantic support with automatic schema translation.

#### Basic Models

```python
from pydantic import BaseModel, Field
import outlines

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0
```

#### Nested Models

```python
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"
```

#### Enums and Literals

```python
from enum import Enum
from typing import Literal
from pydantic import BaseModel

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of enum values
    priority: Literal["low", "medium", "high"]  # Must be one of literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
```
## Common Patterns

### Pattern 1: Data Extraction

```python
from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")
```

### Pattern 2: Classification

```python
from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")
```
### Pattern 3: Structured Forms

```python
from pydantic import BaseModel
import outlines

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]
```

### Pattern 4: Multi-Entity Extraction

```python
from typing import Literal
from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")
```

### Pattern 5: Code Generation

```python
from pydantic import BaseModel
import outlines

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
```
### Pattern 6: Batch Processing

```python
from pydantic import BaseModel
import outlines

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")
```
## Backend Configuration

### Transformers

```python
import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
```

### llama.cpp

```python
# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)
```

### vLLM (Production)

```python
# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)
```
## Best Practices

### 1. Use Specific Types

```python
from pydantic import BaseModel

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int
```

### 2. Add Constraints

```python
from pydantic import BaseModel, Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str
```

### 3. Use Enums for Categories

```python
from enum import Enum
from pydantic import BaseModel

# ✅ Good: Enum for fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything
```

### 4. Provide Context in Prompts

```python
# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
```

### 5. Handle Optional Fields

```python
from typing import Optional
from pydantic import BaseModel

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Can succeed even if author/date missing
```
## Comparison to Alternatives

| Feature | Outlines | Instructor | Guidance | LMQL |
|---------|----------|------------|----------|------|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |

**When to choose Outlines:**
- Using local models (Transformers, llama.cpp, vLLM)
- Need maximum inference speed
- Want Pydantic model support
- Require zero-overhead structured generation
- Need to control the token sampling process

**When to choose alternatives:**
- Instructor: Need API models with automatic retrying
- Guidance: Need token healing and complex workflows
- LMQL: Prefer declarative query syntax
## Performance Characteristics

**Speed:**
- **Zero overhead**: Structured generation as fast as unconstrained
- **Fast-forward optimization**: Skips deterministic tokens
- **1.2-2x faster** than post-generation validation approaches

**Memory:**
- FSM compiled once per schema (cached)
- Minimal runtime overhead
- Efficient with vLLM for high throughput

**Accuracy:**
- **100% valid outputs** (guaranteed by FSM)
- No retry loops needed
- Deterministic token filtering
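The "compiled once per schema" point is worth exploiting in application code: build a generator once and reuse it across prompts instead of recompiling per call. A stand-in sketch of the memoization idea (`compile_fsm` is hypothetical, not the Outlines internals):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_fsm(schema_json: str) -> tuple:
    # Stand-in for the expensive schema -> regex -> FSM compilation step.
    return ("fsm-for", schema_json)

compile_fsm('{"type": "integer"}')   # compiled on first use
compile_fsm('{"type": "integer"}')   # served from cache
print(compile_fsm.cache_info().hits)    # 1
print(compile_fsm.cache_info().misses)  # 1
```

In Outlines terms: hoist the `outlines.generate.json(model, Schema)` call out of your request loop, as Pattern 6 above already does.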
## Resources

- **Documentation**: https://outlines-dev.github.io/outlines
- **GitHub**: https://github.com/outlines-dev/outlines (8k+ stars)
- **Discord**: https://discord.gg/R9DSu34mGd
- **Blog**: https://blog.dottxt.co

## See Also

- `references/json_generation.md` - Comprehensive JSON and Pydantic patterns
- `references/backends.md` - Backend-specific configuration
- `references/examples.md` - Production-ready examples