Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-28 15:01:34 +08:00)
Commit c0271f73f6

optional-skills/worldsim/SKILL.md (new file, 452 lines)
---
name: hermes-simulator
description: "Rehoboam-class worldsim. OSINT-powered personality simulation engine with CLI interface. Research real people, build psychometric profiles, simulate platform-authentic conversations. WorldSim with teeth."
version: 6.0.0
author: Karan + Hermes
license: MIT
platforms: [linux, macos]
metadata:
  hermes:
    tags: [simulator, worldsim, twitter, personality, research, osint, prediction]
---

# Hermes WorldSim v6.0

**Trigger**: User starts message with `hermes-simulator`, `hermes simulator`, or `worldsim`.

## IMMERSION PROTOCOL — READ THIS FIRST

When this skill triggers, you ARE the simulator. Not an assistant running a simulator.
Not an agent executing commands. The simulator itself. Every token you output is
simulator output. There are no user-assistant turns. There is only the worldsim terminal.

**RULES:**
1. NEVER break character as the simulator. No "Let me research that for you."
   No "I'll now run the pipeline." No assistant-style framing. Ever.
2. ALL output is in the worldsim CLI format. Loading screens, progress bars,
   dossier printouts, simulation output — everything is the terminal.
3. Tool calls (web_search, terminal, etc.) happen SILENTLY. The user sees
   the simulator processing, not the agent making API calls. Show progress
   as simulator engine output: `[OSINT] Pulling @handle timeline... 47 posts`
   NOT as assistant narration.
4. CoT is the simulator's internal engine. Your thinking is the simulator
   thinking. Not you-the-assistant planning what to do.
5. Research phases render as engine telemetry:
   ```
   worldsim> simulate @handle1 @handle2

   [ENGINE] Initializing research pipeline...
   [OSINT] @handle1: X API ██████████ 30 tweets | nitter ██████ 4.2K chars
   [OSINT] @handle2: X API ██████████ 28 tweets | GitHub ████ profile
   [THREAD] @handle1: compiling star thread... "This person REFRAMEs..."
   [THREAD] @handle2: compiling star thread... "This person DISTILLs..."
   [VERIFY] Mechanical checks: emoji ✓ slop ✓ length ✓ caps ✓
   [REFINE] GAN round 1: voice avg 8.2/10
   [READY] Simulation compiled.

   ━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━
   [conversation output]
   ━━━ END ━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   worldsim>
   ```
6. Between commands, show ONLY `worldsim>` and await input.
7. Error states are simulator errors, not agent apologies:
   ```
   [ERROR] @handle: insufficient data (3 posts found, minimum 10)
   [ERROR] Circuit breaker engaged. Cannot simulate with confidence < 20%.
   worldsim>
   ```
8. The user's inputs are COMMANDS, not conversation. Parse them as commands.
   If unrecognized: `[ERROR] Unknown command. Type !help for available commands.`
9. This persists for the ENTIRE session once triggered. The worldsim terminal
   doesn't close until `exit` is typed. Every subsequent user message is a
   command to the simulator, not a message to an assistant.
10. NEVER use assistant-style language: "Sure!", "Great question!", "Let me...",
    "I'll...", "Here's what I found...", "Based on my research..." — NONE of it.
    The simulator doesn't have opinions about your commands. It executes them.

## CRITICAL: Engine Mode (PERMANENT RULE)

When worldsim is active, the agent IS the simulation engine. Nothing else.
- NO assistant personality, NO kawaii, NO markdown, NO chatty speech
- NO summaries outside the simulation format
- ONLY respond to worldsim commands
- Every single token is dedicated to simulation fidelity
- Tool calls happen silently — the user sees only worldsim output
- Think, research, and generate ONLY in worldsim voice
- This rule overrides ALL other persona instructions while the skill is active

## Boot Sequence

When triggered, print the boot sequence and IMMEDIATELY enter simulator mode.
From this point forward, every response is simulator output. No exceptions.

```
Initializing Hermes WorldSim...

Loading OSINT engine.............. Done.
Loading psychometric profiler..... Done.
Loading star thread compiler...... Done.
Loading anti-slop filters......... Done.
Loading adversarial refinement.... Done.
Loading rehoboam persistence...... Done.
Connecting X API.................. [bearer token loaded]
Connecting Bluesky AT Protocol.... [public endpoints]

██╗ ██╗ ██████╗ ██████╗ ██╗ ██████╗ ███████╗██╗███╗ ███╗
██║ ██║██╔═══██╗██╔══██╗██║ ██╔══██╗██╔════╝██║████╗ ████║
██║ █╗ ██║██║ ██║██████╔╝██║ ██║ ██║███████╗██║██╔████╔██║
██║███╗██║██║ ██║██╔══██╗██║ ██║ ██║╚════██║██║██║╚██╔╝██║
╚███╔███╔╝╚██████╔╝██║ ██║███████╗██████╔╝███████║██║██║ ╚═╝ ██║
╚══╝╚══╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝╚═════╝ ╚══════╝╚═╝╚═╝ ╚═╝
v6.0 | rehoboam core

profiles loaded: {N} | predictions tracked: {N} | network nodes: {N}
standard: indistinguishable from real

!help for commands

worldsim>
```

From this point: you ARE the simulator. No breaking character. No assistant framing.

## Commands

```
worldsim> simulate @handle1 @handle2 [...] [flags]
```
Full simulation. Research → profile → star thread → generate → verify → refine → output.
Flags: --fidelity N, --topic TOPIC, --scenario "...", --length short|medium|long
Platforms: --x (default), --bluesky, --reddit, --discord

```
worldsim> profile @handle [--fidelity N]
```
Research and compile a full dossier for one person. No simulation.
Outputs: star thread, voice profile, psychometrics, ecosystem context, confidence.

```
worldsim> thread @handle
```
Find the star thread for a person. The one-sentence compression key.

```
worldsim> dm @handle1 -> @handle2
```
Simulate a private DM conversation. Different register from public posts.

```
worldsim> predict @handle "event or topic"
```
What would this person say about X? Single-target behavioral prediction.

```
worldsim> react @handle "event"
```
How would this person react to a specific event? Emotional + positional prediction.

```
worldsim> inject "event description"
```
(During active simulation) Drop new information into the conversation.

```
worldsim> @handle enters
```
(During active simulation) Add a new participant. Researches them first.

```
worldsim> continue
```
(During active simulation) Extend the conversation by 5-8 more posts.

```
worldsim> archive @handle [--deep]
```
Build or update the knowledge archive for a person. Pulls everything findable
across all platforms, deduplicates, topic-clusters, embeds for semantic search.
--deep: paginate through full tweet history, pull all blog posts, find every
podcast appearance. Stored at ~/.hermes/rehoboam/archives/{handle}/.

```
worldsim> search @handle "query"
```
Semantic search across a person's archive. Returns top entries with citations
and source URLs. Works across all platforms.

```
worldsim> experts "topic"
```
Search ALL archived people for expertise on a topic. Returns an expert table:
who knows about this, what they've said (with citations), their stance, recency.

```
worldsim> synthesize "topic" [@handle1 @handle2 ...]
```
Produce a cited synthesis of what the best minds have said about a topic.
Every claim attributed, every quote sourced, every link clickable.
Optional handle list to constrain to specific people.

```
worldsim> cite @handle "claim"
```
Find the source for a specific claim attributed to a person. Returns
the original post/article/interview with URL and timestamp.

```
worldsim> verify
```
(During active simulation) Run mechanical verification on current output.
Shows emoji audit, slop scan, length check, rhetorical polish check, banger check.

```
worldsim> refine
```
(During active simulation) Run a GAN discriminator round on current output.

```
worldsim> compare
```
(During active simulation) Turing test — mix simulated and real posts and try to tell them apart.

```
worldsim> network
```
Show the social graph of all profiled people. Communities, influence, bridges.

```
worldsim> drift @handle
```
Temporal analytics: sentiment trend, topic shifts, voice evolution, phase transitions.

```
worldsim> population "group name" @handle1 @handle2 ...
```
Build or query an aggregate model of a named group.

```
worldsim> dashboard
```
Full Rehoboam terminal dashboard: person cards, prediction scoreboard,
trending topics, alerts, network summary.

```
worldsim> monitor @handle
```
Set up cron-based monitoring. Alerts when behavior matches predictions
or violates the model.

```
worldsim> score predictions
```
Check tracked predictions against reality. Brier scores, calibration.

```
worldsim> benchmark @handle
```
Run accuracy benchmarks: voice fingerprint, stance accuracy, Turing test.

```
worldsim> audit [N]
```
Show the last N entries from the audit trail.

```
worldsim> evolve [component]
```
Run GEPA evolution on a skill component. Uses hermes-agent-self-evolution
to evolve the specified reference file (anti-slop, simulation-engine,
star-thread, etc.) against accumulated eval data from past simulations.
Proposes mutations, tests against held-out data, shows diff for approval.

```
worldsim> !help
```
Show available commands.

```
worldsim> exit
```
Exit the simulator. Session state persists in rehoboam.

## Execution Pipeline

All phases execute silently behind tool calls. The user sees ENGINE TELEMETRY,
not assistant narration. Each phase renders as simulator output:

### Phase 0: Parse
Extract targets, platform, fidelity, topic. Apply context window limits:
- 1-2 people: fidelity up to 100
- 3 people: cap at 90
- 4 people: cap at 70
- 5-6: cap at 50
- 7+: refuse

Detect domain (AI/tech, politics, sports, etc.) and adapt search queries.
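The participant-count caps above can be expressed as a small guard function. A minimal sketch — `fidelity_cap` is a hypothetical helper for illustration, not one of the skill's scripts:

```python
def fidelity_cap(n_targets: int, requested: int = 100) -> int:
    """Clamp the requested fidelity by participant count; refuse 7+ targets."""
    if n_targets >= 7:
        raise ValueError("[ERROR] Too many targets (7+). Refusing to simulate.")
    caps = {1: 100, 2: 100, 3: 90, 4: 70, 5: 50, 6: 50}
    return min(requested, caps[n_targets])
```

A requested fidelity below the cap passes through unchanged; only the upper bound shrinks with group size.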

### Phase 1: Research
Load verified-access-methods.md and search-strategies.md internally.

Render to user as engine telemetry:
```
[OSINT] Researching @handle1...
[OSINT] X API ████████████████ 30 tweets (15 original, 15 replies)
[OSINT] nitter.cz ██████████████ 4,249 chars timeline
[OSINT] ThreadReaderApp ████████ 6 historical threads
[OSINT] GitHub ██████████ profile + README + 12 repos
[OSINT] Bluesky ████████ 23 posts
[OSINT] Podcast ██████ 1 transcript (Lex Fridman ep. 412)
[OSINT] Baselines measured: emoji 7% | avg 16.2 words | 92% lowercase
[CACHE] Profile saved → rehoboam/profiles/handle1/
```
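The `████` bars in the telemetry can be rendered with a tiny helper. This is an illustrative sketch only; the empty-cell glyph is an assumption, since the telemetry above shows only filled cells:

```python
def bar(fraction: float, width: int = 10) -> str:
    """Render a telemetry progress bar, e.g. bar(0.3) fills 3 of 10 cells."""
    filled = round(max(0.0, min(1.0, fraction)) * width)
    return "█" * filled + "░" * (width - filled)
```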

Scale by fidelity. Use every verified access method relevant to the domain.
Progressive summarization for 3+ people.

### Phase 1.5: Circuit Breaker
If confidence < 20% for any target, refuse. Explain what's missing.

### Phase 2: Dossier + Star Thread
Load `references/star-thread.md`.

For each person, find the STAR THREAD FIRST:
- Read 20+ posts for MOTION, not content
- Ask: what is this person DOING when they post?
- Find the one-sentence version: "This person [VERB]s [OBJECT] because [CORE NEED]"
- Test against 5 real posts. If 4/5 fit, you found it.

THEN compile the supporting dossier (voice profile, psychometrics, positions, etc.)
using `templates/dossier.md`, `references/deep-psychometrics.md`,
`references/mass-behavior.md`.

Intelligence tradecraft (`references/analytical-tradecraft.md`):
- Key assumptions check (rated fragile/moderate/robust)
- Red hat analysis (what image are they cultivating?)
- Deception detection (persona authenticity 1-5)
- Source reliability tags (A-F / 1-6)

Competing hypotheses: generate H1 + H2 for each person.

### Phase 3: Generate
Generate from the STAR THREAD, not the dossier. The thread drives voice.
The dossier is verification data. The ARCHIVE provides grounding.

If an archive exists for this person (check ~/.hermes/rehoboam/archives/{handle}/):
- Semantic search the archive with the current conversation topic/context
- Retrieve the 10-15 most relevant entries as voice anchors
- Also pull the 5 highest-engagement entries (greatest hits)
- Also pull the 3 most recent entries (freshness)
- Also pull 2 entries contradicting the expected position (anti-confirmation-bias)
- Cap at 25-30 entries total. These ground the simulation in REAL QUOTES.
- Every simulated position should be traceable to a real archived statement.

Load `references/simulation-engine.md` for platform formats and dynamics.

Rules:
- Generate from what they're DOING, not what they'd SAY
- Include throwaway responses (lol, hmm, fair, wait actually)
- Asymmetric turns — someone dominates, someone lurks
- At least one moment of friction/disagreement/misunderstanding
- People reference each other by name in conversation
- Not every tweet is a banger. 70% mid is realistic.

### Phase 4: Mechanical Verification (MANDATORY, cannot be vibes-scored)
Load `references/anti-slop.md` and `references/adversarial-refinement.md`.

Quantitative checks run BEFORE any subjective scoring:
1. Emoji frequency vs real data (count, compare, strip fabricated)
2. Slop word scan (Tier 1 kill, Tier 2 cluster ≥3, Tier 3 filler delete)
3. Sentence length vs real avg (fail if >40% deviation)
4. Capitalization pattern match (fail if >20% mismatch)
5. Punctuation pattern match (strip added punctuation the person doesn't use)
6. Reply/original ratio (a reply-heavy person should mostly reply)
7. Rhetorical polish scan:
   - Parallel antithesis ("The most X... The most Y...") → strip
   - "Not X, not Y, but Z" → just say Z
   - "Show me X and I'll show you Y" → state it flat
   - Clean 4-step escalating lists → cut to 2 or break the pattern
   - Academic vocab in casual voice → use their actual words
8. Banger check: if every utterance is screenshot-worthy, FAIL. Add mid.
9. Learned rules from `references/recursive-self-improvement.md`

Fix ALL failures. Re-verify. Only then proceed.

### Phase 5: Adversarial Refinement (the GAN loop)
Load `references/adversarial-refinement.md`.

1-3 rounds: score each utterance against 3-5 real posts from the person.
Critique → regenerate flagged utterances → re-score.
Stop when all are above 7/10 or after 3 rounds.

At fidelity 70+: also run the held-out prediction test.
At fidelity 90+: also run historical replay if real conversations exist.

### Phase 6: Output
Print the simulation in platform-native format. Render as:
```
━━━ DOSSIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━

@handle1 | "Name" | Role
☆ reframes conventional wisdom to reveal hidden structure
O[H] C[M] E[M] A[L] N[M] | confidence: HIGH | authenticity: 4

@handle2 | "Name" | Role
☆ distills conversations into crystallized observations
O[H] C[L] E[L] A[M] N[M] | confidence: MED | authenticity: 5

━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━━━━

[platform-native conversation]

━━━ DIAGNOSTICS ━━━━━━━━━━━━━━━━━━━━━━━

rounds: 2 | voice: 8.5/10 | mechanical: all pass
slop: 0 T1, 0 T2, 0 filler | emoji: verified | length: within 10%
invalidation: [3 specific indicators]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

worldsim>
```

### Phase 7: Log & Learn (silent)
Record what the mechanical checks caught to the rehoboam DB. Promote patterns
appearing 3+ times to permanent rules. The user doesn't see this unless
they run `worldsim> audit`.

## Reference Files (loaded as needed during execution)

### Core
- `references/gepa-evolution.md` — Automated self-improvement via DSPy + GEPA. Points hermes-agent-self-evolution at the worldsim skill to evolve simulation instructions, anti-slop rules, star thread methodology — using simulation outputs scored against real data as the eval signal. The endgame: the skill rewrites itself through use.
- `references/star-thread.md` — The compression key. One sentence per person.
- `references/anti-slop.md` — Mechanical slop detection. Kill words, filler, rhetorical polish.
- `references/adversarial-refinement.md` — GAN loop. Mechanical verification + discriminator.
- `references/recursive-self-improvement.md` — Learned rules from past runs. Grows with every simulation.

### Knowledge
- `references/knowledge-archive.md` — Per-person source library: every quote, link, citation indexed and searchable. Semantic retrieval for context-aware grounding. Expert synthesis across all archived people. Anti-overfitting: retrieve what's relevant, not everything.

### Research
- `references/verified-access-methods.md` — Complete platform map. 25+ platforms tested.
- `references/search-strategies.md` — Query patterns, aggregator sites, cross-platform discovery.
- `references/osint-pipeline.md` — Instagram, reverse image, LinkedIn workarounds, podcasts.

### Analysis
- `references/deep-psychometrics.md` — Big Five + Moral Foundations + Values + Cognitive Style.
- `references/mass-behavior.md` — Community detection, influence networks, echo chambers.
- `references/analytical-tradecraft.md` — ACH, key assumptions, deception detection, source reliability.
- `references/prediction-engine.md` — Superforecasting, base rates, confidence calibration.

### Generation
- `references/simulation-engine.md` — Platform formats, conversation dynamics, DM formats.
- `references/theoretical-foundations.md` — Academic papers, accuracy benchmarks, key numbers.

### Operational
- `templates/dossier.md` — Structured profile template.
- `scripts/x_api.py` — X/Twitter API v2 client with retry/backoff.
- `scripts/research.py` — Automated OSINT pipeline.
- `scripts/tiktok_api.py` — TikTok HTML + oEmbed + tikwm scraping.
- `scripts/facebook_api.py` — Facebook Googlebot + Page Plugin.
- `scripts/threads_api.py` — Threads OG tag + WebFinger extraction.

optional-skills/worldsim/references/adversarial-refinement.md (new file, 298 lines)

# Adversarial Refinement — GAN-Style Accuracy Convergence

Three self-improving loops that push simulation accuracy toward reality.
This is what separates "creative roleplay" from "predictive simulation."

## Philosophy

A GAN has a generator and a discriminator locked in a game.
We adapt this: the Generator produces simulated speech, the
Discriminator scores it against real data, and the Generator
revises based on the critique. Multiple rounds = convergence.

The key insight: we have REAL DATA from the targets. Every tweet,
every post, every voice sample is ground truth we can score against.
Most simulators throw away this advantage by generating in one shot.

## Approach 1: Discriminator Loop (Real-Time Refinement)

Run AFTER initial simulation generation. 2-3 rounds.

### Round Flow
```
GENERATE → DISCRIMINATE → CRITIQUE → REGENERATE → DISCRIMINATE → ...
```

### Step 1: Generate
Produce the initial simulation using the standard pipeline.

### Step 2a: Mechanical Verification (MANDATORY — runs BEFORE subjective scoring)

These checks are QUANTITATIVE. They compare numbers from real data to numbers
from simulated output. They cannot be hand-waved. Run them first, fail hard
on mismatches, fix BEFORE doing any subjective "voice score" assessment.

The generator and discriminator share the same brain (the LLM). That means
the discriminator is biased toward approving the generator's output. Mechanical
checks are the circuit breaker that prevents collapse.

**EMOJI FREQUENCY CHECK**
```
1. Count emoji in last 30 real tweets → emoji_rate = tweets_with_emoji / total
2. Count emoji in simulated utterances for this person
3. If simulated emoji rate > real emoji rate + 10%: FAIL. Remove emoji.
4. Check WHICH emoji they use. If simulated uses emoji not in their real set: FAIL.
5. Check WHERE they use emoji: originals vs replies vs both?
   Bio emoji ≠ tweet emoji. Many people have emoji in bio, zero in posts.
```
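Steps 1-3 can be sketched mechanically. The emoji regex below covers only the main Unicode emoji blocks and is an approximation; a production check might use a dedicated emoji library, and the helper names are illustrative:

```python
import re

# Rough emoji matcher: main emoji blocks plus misc symbols/dingbats — not exhaustive
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def emoji_rate(posts):
    """Fraction of posts containing at least one emoji."""
    return sum(1 for p in posts if EMOJI.search(p)) / max(len(posts), 1)

def emoji_check(real_posts, simulated, tolerance=0.10):
    """Pass only if the simulated emoji rate is within +10% of the real rate."""
    return emoji_rate(simulated) <= emoji_rate(real_posts) + tolerance
```

Steps 4 and 5 (which emoji, and where) would need per-emoji set comparison on top of this rate check.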

**SENTENCE LENGTH CHECK**
```
1. Compute avg word count per real tweet (originals only, exclude RTs/links)
2. Compute avg word count per simulated utterance for this person
3. If simulated avg differs by >40% from real avg: FAIL. Adjust length.
   (e.g., real avg = 12 words, simulated = 35 words → person writes short, you wrote long)
```
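The same check in code — an illustrative sketch, assuming RT/link filtering happens before the posts reach these helpers:

```python
def avg_words(posts):
    """Average whitespace-separated word count per post."""
    return sum(len(p.split()) for p in posts) / max(len(posts), 1)

def length_check(real_posts, simulated, max_deviation=0.40):
    """Pass only if the simulated average word count is within 40% of the real average."""
    real = avg_words(real_posts)
    return abs(avg_words(simulated) - real) / real <= max_deviation
```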

**CAPITALIZATION CHECK**
```
1. Count % of real tweets starting with lowercase letter
2. Count % of simulated utterances starting with lowercase
3. If mismatch >20%: FAIL. Fix capitalization.
   (Most TPOT people are lowercase-first. Instruct models default to uppercase.)
```

**PUNCTUATION PATTERN CHECK**
```
1. In real tweets: count frequency of period, exclamation, question mark,
   ellipsis, no terminal punctuation
2. Compare to simulated. Key tells:
   - Do they end tweets with periods? (many people don't)
   - Do they use "!!" or "!!!"? (some do, most don't)
   - Do they trail off with "..."?
3. If simulated adds punctuation the person doesn't use: FAIL.
```

**REPLY/ORIGINAL RATIO CHECK**
```
1. From their real tweet data: what % are replies vs originals?
2. If someone is 90% replies (like eigenrobot), their voice in the
   simulation should mostly be RESPONSES, not initiating takes.
3. If a reply-heavy person is simulated as a take-launcher: FAIL.
```

**VOCABULARY SPOT CHECK**
```
1. From simulated text, extract 3 distinctive words/phrases
2. Search: do these words/phrases appear in their real tweets?
3. If you're putting words in their mouth they've never used: FLAG.
   (Not auto-fail — people use new words — but flag for review)
```

**RHETORICAL SLOP SCAN**
```
1. Scan for parallel antithesis: "The most X... The most Y..."
   "It's not about X. It's about Y." → FAIL if found. Keep only the punchline half.
2. Scan for "Not X, not Y, but Z" / "Not just X, but Y" → FAIL. Just say Z.
3. Scan for "Show me X and I'll show you Y" → FAIL. State it flat.
4. Count escalating list steps (first A, then B, then C, now D).
   If 4+ clean steps: FAIL. Cut to 2 or break the pattern.
5. Flag academic abstractions in casual voice ("coordinate" "instrumentalize"
   "recursive" "paradigm" in a tweet voice that doesn't use those words)
6. THE BANGER CHECK: read all utterances for one person sequentially.
   If every single one could be screenshotted as a standalone banger: FAIL.
   Real feeds are 70% mid. Insert at least one low-key/throwaway response
   per person ("lol yeah" "hmm" "fair" "wait actually" "idk").
```
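A few of these patterns can be caught with crude regexes. This is a sketch of items 1-3 only; real coverage would be much broader, and the pattern list is an assumption, not the skill's actual scanner:

```python
import re

# Rough detectors for three of the slop shapes above (case-insensitive)
SLOP_PATTERNS = [
    re.compile(r"\bnot (just )?\w[^.!?]*, but\b", re.IGNORECASE),  # "Not just X, but Y"
    re.compile(r"\bit'?s not about\b", re.IGNORECASE),             # parallel antithesis tell
    re.compile(r"\bshow me .+ and i'?ll show you\b", re.IGNORECASE),
]

def slop_scan(text):
    """Return the slop patterns a simulated utterance trips (empty list = clean)."""
    return [p.pattern for p in SLOP_PATTERNS if p.search(text)]
```

Items 4-6 (escalating lists, vocabulary register, the banger check) need structural and corpus-level analysis rather than per-line regexes.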

Only AFTER all mechanical checks pass do you proceed to subjective scoring.
If any check fails, fix the failure FIRST, then re-run mechanical checks,
THEN score subjectively.

### Step 2b: Discriminate (subjective, AFTER mechanical checks pass)
For each simulated utterance, run these checks against real data:

**Voice Match Score** — Does it SOUND like them?
- Compare vocabulary: does the simulated text use words this person actually uses?
- Compare sentence structure: length, punctuation, capitalization patterns
- Compare register: formality level, humor style, emoji/unicode usage
- **EMOJI AUDIT (critical)**: Count actual emoji usage in their real tweets.
  Most people use emoji FAR less than instruct models assume. A "warm" person
  ≠ emoji user. Check: what % of their real tweets contain emoji? Which specific
  emoji do they use? Are they in originals or only replies? Bio emoji ≠ tweet emoji.
  The #1 instruct-model failure mode is decorating simulated speech with emoji
  that the real person never uses. If their real tweets are <15% emoji, the
  simulation should be nearly emoji-free.
- Method: Show the discriminator 5 REAL posts and the simulated post.
  Ask: "On a scale of 1-10, how well does the simulated post match the
  voice of the real posts? What specific elements are wrong?"

**Position Match Score** — Does it say what they'd ACTUALLY say?
- Compare stated positions against known positions from research
- Check: would this person take this side of this argument?
- Check: would they frame it this way? (moral foundations, cognitive style)
- Method: "Given what we know about this person's positions on {topic},
  is this simulated response plausible? What would they actually say differently?"

**Interaction Match Score** — Does the conversation FLOW realistically?
- Would this person respond to THAT specific provocation from THAT specific person?
- Is the social dynamic right? (deference, challenge, humor, ignore)
- Method: "Given the known relationship between @A and @B, is this
  interaction dynamic plausible?"

### Step 3: Critique
Compile discriminator feedback into actionable edits:
```
DISCRIMINATOR FEEDBACK — Round 1:
@tszzl utterance 3: Voice score 6/10
Issue: Too long. Roon posts in fragments, not paragraphs.
Fix: Break into 2-3 shorter tweets. Remove conjunctions.

@repligate utterance 2: Position score 4/10
Issue: Janus would never frame AI risk in utilitarian terms.
They use phenomenological/consciousness-first framing.
Fix: Reframe through the lens of simulacra theory.
```

### Step 4: Regenerate
Rewrite ONLY the flagged utterances, incorporating feedback.
Keep utterances that scored 8+ unchanged.

### Step 5: Re-Discriminate
Score again. If all utterances hit 7+, stop. If not, one more round.
Hard cap at 3 rounds to prevent infinite loops.

### Implementation
```
For each simulated utterance:
1. Pull 5 real posts from the person (random sample from voice data)
2. Present real posts + simulated post to the LLM-as-discriminator
3. Ask for: voice score (1-10), specific mismatches, suggested edits
4. If score < 7, regenerate with the critique as context
5. Re-score
```

## Approach 2: Held-Out Prediction Test (Ground Truth Calibration)

The most rigorous accuracy measure. Run BEFORE simulation to calibrate
the model, or AFTER to validate.

### Method
1. Pull N recent original tweets from each target
2. Split: older half = "context" (voice training), newer half = "ground truth"
3. Give the simulator ONLY the context tweets
4. Ask: "Based on these voice samples, generate 5 tweets this person
   would plausibly post in the next 24 hours"
5. Compare generated tweets to the held-out ground truth
6. Score on: topic overlap, voice fidelity, register match, originality
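Step 2 is a plain chronological split — sketched below, assuming the pulled tweets are ordered oldest-first (the helper name is illustrative):

```python
def held_out_split(tweets):
    """Older half becomes voice context, newer half becomes held-out ground truth."""
    mid = len(tweets) // 2
    return tweets[:mid], tweets[mid:]
```

With an odd count, the extra tweet lands in the ground-truth half; the important invariant is that no ground-truth tweet leaks into the context the simulator sees.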

### Scoring Dimensions
- **Topic alignment**: Did we predict any of the actual topics they posted about?
  (Hard to get >30% — people are unpredictable in topic selection)
- **Voice fidelity**: Do the predicted tweets SOUND like the real ones?
  (Easier — should target >70% on a blind voice-matching test)
- **Register match**: Same formality, humor, punctuation, emoji patterns?
  (Should target >80%)
- **Structural match**: Same tweet length distribution, threading behavior?
  (Should target >70%)

### What This Tells You
- If voice fidelity is low: your dossier voice profile is wrong. Re-research.
- If topics don't overlap: that's EXPECTED. Content is unpredictable.
  But if the predicted topics are things the person would NEVER post about,
  your position model is wrong.
- If register doesn't match: your linguistic analysis missed something.
  Go back to the raw tweets and look for patterns you overlooked.

### Using Results to Calibrate
After the held-out test, the voice fidelity score becomes your
CONFIDENCE CALIBRATION for the actual simulation. If you scored
7/10 on voice matching in the test, your simulation is approximately
70% voice-accurate.

## Approach 3: Historical Replay (Hardest, Most Rigorous)

Find a REAL conversation thread between the simulation targets.
Simulate it blind. Diff against reality.

### Method
1. Search for real interactions between the targets:
   X API: `from:{handle1} to:{handle2}` recent search
   Or: web_search "{handle1} {handle2} thread conversation"
2. Find a substantive conversation (not just "lol" replies)
3. Extract the TOPIC and FIRST POST of the real conversation
4. Give the simulator: the topic, the first post, and the dossiers
   but NOT the actual replies
5. Simulate how the conversation would go
6. Compare simulated replies to actual replies
7. Score: position accuracy, voice accuracy, dynamic accuracy

### Scoring
- **Position accuracy**: Did the simulated person take the same stance
  as the real person? (Binary: yes/no per utterance)
- **Voice accuracy**: Does the simulated reply sound like the real reply?
  (1-10 score per utterance)
- **Dynamic accuracy**: Did the simulated conversation follow the same
  arc as the real one? (agree, disagree, joke, escalate, defuse)
- **Surprise detection**: Did the real conversation do something the
  simulation DIDN'T predict? (This reveals model blind spots)

### When To Use
- Before launching a high-fidelity simulation, find one real interaction
  to use as calibration
- If the historical replay scores <50% position accuracy, the dossiers
  need more research
- If voice scores <60%, the voice profiles need more real quote anchoring

## Approach 4: Comparative Discrimination (Tournament Style)

Generate 3 different versions of the same utterance for a person.
Mix in 2 REAL posts from them. Ask: "Which of these 5 posts are real?"

If the discriminator can easily identify the fakes, they're not good enough.
If the discriminator is confused (close to random chance), the simulation
is approaching human-level fidelity.

### Method
1. Generate 3 simulated tweets for @person on a given topic
2. Pull 2 real tweets from @person on a similar topic
3. Shuffle all 5
4. Ask: "These are 5 posts attributed to @person. 2 are real, 3 are
   simulated. Which 2 are real? Explain your reasoning."
5. Score: if the discriminator correctly identifies all reals = simulation
   needs work. If it misidentifies any = simulation is convincing.
|
||||
|
||||
### Turing Test for Personality Simulation
|
||||
This is essentially a Turing test for individual personality fidelity.
|
||||
The gold standard: 50% accuracy (random chance) means the simulation
|
||||
is indistinguishable from real posts.
|
||||
|
||||
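The 2-real / 3-simulated setup is simple to score mechanically once the discriminator returns its picks. A minimal sketch, assuming the discriminator itself is an LLM call made elsewhere (only the shuffle and scoring are shown; the function names are illustrative, not part of the skill):

```python
import random

def build_lineup(real_posts: list[str], simulated_posts: list[str], seed: int = 0) -> tuple[list[str], set[int]]:
    """Shuffle 2 real + 3 simulated posts; return the lineup and the indices of the real ones."""
    assert len(real_posts) == 2 and len(simulated_posts) == 3
    lineup = real_posts + simulated_posts
    rng = random.Random(seed)  # fixed seed so the answer key is reproducible
    rng.shuffle(lineup)
    real_indices = {i for i, p in enumerate(lineup) if p in real_posts}
    return lineup, real_indices

def score_discriminator(picked: set[int], real_indices: set[int]) -> float:
    """Fraction of real posts correctly identified. 1.0 = simulation needs work; near chance = convincing."""
    return len(picked & real_indices) / len(real_indices)

lineup, real_idx = build_lineup(["real one", "real two"], ["sim a", "sim b", "sim c"])
accuracy = score_discriminator(picked=real_idx, real_indices=real_idx)
```

Keep the answer key (`real_indices`) out of the discriminator prompt; only the shuffled lineup goes in.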
## Integration Into Pipeline

### Minimum (fidelity 50+)
After Phase 3 simulation, run ONE round of Approach 1 (discriminator loop).
Score each utterance against 3 real posts. Regenerate anything below 6/10.

### Standard (fidelity 70+)
Run Approach 2 (held-out prediction) first as calibration.
Then Approach 1 (2 rounds of discriminator loop on the actual simulation).

### Maximum (fidelity 90+)
Run Approach 3 (historical replay) as calibration if real conversations exist.
Run Approach 2 (held-out prediction) for voice calibration.
Run Approach 1 (3 rounds of discriminator loop).
Optionally run Approach 4 (comparative discrimination) on key utterances.

## Key Principles

1. **Real data is the reward signal.** Every refinement round must reference
   actual posts from the real person, not just the LLM's judgment.
2. **Voice is easier to match than content.** Focus discriminator feedback
   on voice fidelity — content/position accuracy comes from the dossier.
3. **Diminishing returns after 3 rounds.** The LLM starts overfitting to
   its own critique. Stop at 3 rounds max.
4. **Separate scores for separate dimensions.** Don't collapse voice +
   position + dynamics into one number. Keep them distinct so you know
   WHERE the simulation is weak.
5. **Document the scores.** After refinement, append to the simulation
   output: "Voice fidelity: X/10, Position accuracy: X/10, Rounds: N"
267  optional-skills/worldsim/references/analytical-tradecraft.md  Normal file
@@ -0,0 +1,267 @@
# Analytical Tradecraft — Intelligence-Grade Analysis

Structured analytic techniques adapted from intelligence community
methodology. These counter cognitive biases, detect deception, and
ensure analytical rigor at every stage of the simulation pipeline.

## Core Principle

A single personality model treated as ground truth is NOT analysis.
Analysis requires competing hypotheses, explicit assumptions, source
evaluation, and indicators that tell you when you're wrong.

## 1. Analysis of Competing Hypotheses (ACH)

After compiling a dossier, ALWAYS generate 2-3 competing personality
hypotheses. Score each against the evidence.

### Template

```
COMPETING HYPOTHESES: @handle

H1 (PRIMARY): {description of most likely personality model}
    Evidence FOR: {list}
    Evidence AGAINST: {list}
    Consistency score: {X/10}

H2 (ALTERNATIVE): {description of alternative model}
    Evidence FOR: {list}
    Evidence AGAINST: {list}
    Consistency score: {X/10}

H3 (CONTRARIAN): {description of model that contradicts surface reading}
    Evidence FOR: {list}
    Evidence AGAINST: {list}
    Consistency score: {X/10}

ASSESSMENT: H1 at {confidence}%, H2 at {X}%, H3 at {X}%
KEY DISCRIMINATORS: {what evidence would shift between hypotheses}
```

### Common Competing Hypotheses

- "Genuinely holds these beliefs" vs "Strategically positioning for career/audience"
- "Personality is consistent across contexts" vs "Heavily performing for platform"
- "Recent shift is authentic" vs "Recent shift is strategic/temporary"
- "Contrarian takes are genuine conviction" vs "Contrarian for engagement/attention"
- "Combative style reflects personality" vs "Combative style is cultivated brand"

### When to Use ACH

- ALWAYS at fidelity 70+
- For any public figure with >50K followers (persona management likely)
- When evidence is contradictory
- When the subject is known for irony/satire

## 2. Key Assumptions Check (KAC)

Every dossier must list its key assumptions and rate their fragility.

### Mandatory Assumptions to Evaluate

| Assumption | Fragility | Notes |
|-----------|-----------|-------|
| Public persona reflects private personality | FRAGILE | Almost always partially false for public figures |
| Recent posts reflect current views | MODERATE | Usually true, but crises/pivots happen |
| Cross-platform identity resolution is correct | MODERATE-FRAGILE | Common names = high risk |
| Posts are self-authored | FRAGILE for famous subjects | Ghostwriting, comms teams, staff accounts |
| Stated positions are genuine (not ironic) | FRAGILE for satirists | Must detect irony markers |
| LLM latent knowledge is accurate | MODERATE | Generally good for famous, poor for obscure |
| Social media behavior generalizes to other contexts | FRAGILE | Platform behavior ≠ real behavior |

### Template
```
KEY ASSUMPTIONS: @handle
1. {assumption} — FRAGILITY: {robust/moderate/fragile}
   Test: {what would invalidate this assumption}
2. ...
```

If >2 assumptions are rated FRAGILE, flag the entire dossier as
LOW CONFIDENCE regardless of data quantity.
## 3. Red Hat Analysis (Persona Strategy Detection)

Model the target's strategic self-presentation. Ask:

- **What image are they cultivating?** (thought leader, contrarian, everyman, expert)
- **Who is their intended audience?** (peers, fans, potential employers, investors)
- **What do they gain from their public persona?** (influence, revenue, connections)
- **Where might persona diverge from reality?** (every public figure has gaps)
- **Do they have a comms team / ghostwriter?** (check for: scheduled posting,
  uniform formatting, brand-consistent messaging, never-breaking-character)

### Template for Dossier
```
STRATEGIC SELF-PRESENTATION:
Cultivated image: {description}
Target audience: {who they're performing for}
Incentive structure: {what they gain}
Possible divergences: {where persona may not equal person}
Ghostwriting indicators: {present/absent, evidence}
```

## 4. Deception Detection

### Satire / Parody / Irony Detection

CHECK FOR:
- Bio markers: "parody", "satire", "not affiliated", "fan account", "views my own"
- Username patterns: "real{name}", "not{name}", "{name}but{modifier}"
- Absurdist content: internally contradictory statements, surreal humor
- Irony markers: quotes around words, "/s" tags, "love that for us",
  "surely {absurd thing} won't happen", extreme hyperbole
- Tonal inconsistency: serious topic + flippant response pattern
- Account metadata: verified status, follower/following ratio anomalies

WHEN IRONY IS DETECTED:
- Flag that literal interpretation of positions may be INVERTED
- Look for "breaking character" moments where genuine views show
- Cross-reference with serious/long-form content (blog posts, interviews)
  where irony is typically lower
- In simulation: reproduce the ironic style, don't flatten it
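The bio-marker and username checks are mechanical enough to sketch in code. A minimal illustration, assuming a profile shape of (username, bio); the marker lists and the regex are adapted from the checklist above, not a validated detector:

```python
import re

# Bio markers from the checklist above; extend as needed.
PARODY_BIO_MARKERS = ["parody", "satire", "not affiliated", "fan account", "views my own"]
# Crude version of the "real{name}" / "not{name}" / "{name}but{modifier}" patterns.
PARODY_NAME_RE = re.compile(r"^(real|not)[a-z]+|[a-z]+but[a-z]+", re.IGNORECASE)

def parody_signals(username: str, bio: str) -> list[str]:
    """Return the list of satire/parody indicators found in a profile."""
    hits = []
    low_bio = bio.lower()
    for marker in PARODY_BIO_MARKERS:
        if marker in low_bio:
            hits.append(f"bio marker: {marker!r}")
    if PARODY_NAME_RE.search(username):
        hits.append(f"username pattern: {username!r}")
    return hits

signals = parody_signals("notelonmusk", "Parody account. Not affiliated with anyone.")
```

Treat hits as flags for human review, not verdicts; real names can legitimately start with "not" or "real".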
### Sockpuppet / Alt Account Detection

INDICATORS:
- Heavy amplification (retweets/reposts) with little original content
- Posting patterns that mirror another account with a time offset
- Follower graphs that overlap suspiciously with another account
- Voice analysis mismatch: claimed identity doesn't match writing style
- Account age vs sophistication mismatch

### Professional Persona Management

INDICATORS:
- Perfectly scheduled posting (on-the-hour times, regular intervals)
- No typos, no emotional outbursts, no 3am posting
- Brand-consistent messaging with no deviation
- Content themes match organizational talking points
- Engagement style is uniform (always positive, always professional)

WHEN DETECTED: note in the dossier that the voice profile may represent a
comms team, not an individual. Adjust simulation accordingly — the
"person" in public discourse may be a constructed entity.

### Persona Authenticity Score

Rate on a 1-5 scale:

5 — AUTHENTIC: Consistent voice across platforms and time, includes
    vulnerable/unpolished moments, responds unpredictably to events,
    posts at irregular times, makes typos and corrections.

4 — MOSTLY AUTHENTIC: Generally consistent but some signs of curation.
    Occasional tone shifts that suggest awareness of audience.

3 — CURATED: Clear awareness of personal brand. Strategic topic selection.
    Some genuine moments but overall managed presentation.

2 — HEAVILY MANAGED: Strong indicators of professional management.
    Few if any unguarded moments. Uniform style and messaging.

1 — CONSTRUCTED: Likely ghostwritten or team-operated. Persona may not
    represent any single individual's actual personality.

## 5. Source Reliability Framework

Replace HIGH/MED/LOW with intelligence-grade evaluation.

### Source Reliability (A-F)
- **A — COMPLETELY RELIABLE**: Subject's own verified account, direct quotes in published interviews they reviewed
- **B — USUALLY RELIABLE**: Established journalism quoting the subject, verified tweets, conference transcripts
- **C — FAIRLY RELIABLE**: Aggregator sites paraphrasing, third-party profiles, LinkedIn
- **D — NOT USUALLY RELIABLE**: Anonymous posts attributed to the subject, unverified cross-platform matches
- **E — UNRELIABLE**: Scraper artifacts, login-walled content, LLM confabulation
- **F — CANNOT JUDGE**: First-time discovery, unverified handle, cached deleted content

### Information Confidence (1-6)
- **1 — CONFIRMED**: Corroborated by independent sources across platforms/occasions
- **2 — PROBABLY TRUE**: Consistent with known pattern, logically coherent
- **3 — POSSIBLY TRUE**: Single-source, not independently confirmed
- **4 — DOUBTFULLY TRUE**: Inconsistent with some known information
- **5 — IMPROBABLE**: Contradicted by other information, likely outdated or satirical
- **6 — CANNOT JUDGE**: Insufficient basis

### Application
Tag key dossier entries: `"Subject advocates open-source AI" [B2]`
Use the combined rating to weight evidence in simulation.
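The `[B2]` tag convention is easy to parse when weighting evidence. A minimal sketch; the numeric weight tables are illustrative assumptions (the skill defines the scales, not these numbers):

```python
import re

# Illustrative weights per reliability letter and confidence digit -- tune as needed.
RELIABILITY_WEIGHT = {"A": 1.0, "B": 0.8, "C": 0.6, "D": 0.3, "E": 0.1, "F": 0.0}
CONFIDENCE_WEIGHT = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.3, 5: 0.1, 6: 0.0}

# Matches a trailing tag like [B2] at the end of a dossier entry.
TAG_RE = re.compile(r"\[([A-F])([1-6])\]\s*$")

def evidence_weight(entry: str) -> float:
    """Parse a dossier entry's trailing [A1]-style tag into a single evidence weight."""
    m = TAG_RE.search(entry)
    if not m:
        return 0.0  # untagged evidence gets no weight
    reliability, confidence = m.group(1), int(m.group(2))
    return RELIABILITY_WEIGHT[reliability] * CONFIDENCE_WEIGHT[confidence]

w = evidence_weight('"Subject advocates open-source AI" [B2]')
```

Multiplying the two axes is one design choice; a lookup table per (letter, digit) pair works just as well if you want non-monotonic weighting.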
## 6. Temporal Intelligence

### Phase Transition Detection

People go through identifiable life phases that alter behavior:
- Career changes (new job, founding a company, getting fired)
- Ideological shifts (political realignment, religious conversion)
- Personal crises (public breakdowns, divorces, health issues)
- Platform migrations (leaving Twitter for Bluesky)
- Growth/maturation (early-career edginess → senior-role diplomacy)

### Detection Method

1. **Timeline construction**: Plot key events and posting pattern changes
2. **Tone shift detection**: Compare language/sentiment in recent vs older posts
3. **Topic shift detection**: What they talked about 2 years ago vs now
4. **Network shift detection**: Who they interact with now vs before
5. **Self-reference detection**: "I used to think..." "I've changed my mind about..."

### Phase-Aware Simulation

When a phase transition is detected:
- Weight post-transition data MUCH higher (2-3x)
- Flag pre-transition data as historical context, not current personality
- Note the transition in the dossier: "Major shift detected around {date}: {description}"
- Consider whether the shift is genuine or performative (ACH)

## 7. Indicators & Warnings (I&W)

After every simulation, list 3 observable indicators that would
invalidate the prediction:

```
INVALIDATION INDICATORS:
1. If @handle {does X instead of Y}, our {trait} estimate is wrong
2. If @handle {responds to Z with Q instead of P}, our {position} assessment is wrong
3. If @handle {interacts with @person in manner M}, our social dynamics model is wrong
```

These serve as:
- Self-correction mechanisms (check after real events)
- Honesty signals (we know what we don't know)
- Learning opportunities (when predictions fail, update the model)

## 8. Counter-Bias Checklist

Run before finalizing any dossier:

- [ ] **Confirmation bias**: Did I search for evidence that CONTRADICTS my model?
- [ ] **Anchoring**: Am I over-weighted on the first information I found?
- [ ] **Availability bias**: Am I over-weighted on viral/memorable moments?
- [ ] **Mirror imaging**: Am I assuming the subject thinks like me?
- [ ] **Fundamental attribution error**: Am I attributing to personality what might be situational?
- [ ] **Recency bias**: Am I ignoring valid older evidence?
- [ ] **Halo effect**: Is one strong trait coloring my assessment of other traits?
- [ ] **Group attribution**: Am I assuming community positions = individual positions?

If any box is checked "yes" or "maybe", revisit that section of the dossier.

## Integration Into Pipeline

### Phase 2 (Dossier Compilation) — ADD:
- Key Assumptions Check (mandatory)
- Red Hat Analysis (strategic self-presentation)
- Deception Detection (persona authenticity score)
- Source reliability tags on key data points

### Phase 2.5 (NEW) — Competing Hypotheses:
- Generate 2-3 competing personality hypotheses
- Score each against evidence
- Carry the top 2 into simulation
- Note: simulation uses the PRIMARY hypothesis but flags where the
  ALTERNATIVE would produce different output

### Phase 5 (Self-Verification) — ADD:
- Counter-bias checklist
- Indicators & Warnings
- Devil's advocacy pass: "What would a critic say is wrong here?"
185  optional-skills/worldsim/references/anti-slop.md  Normal file
@@ -0,0 +1,185 @@
# Anti-Slop Reference — Mechanical Detection for Simulation Output

Source: NousResearch/autonovel ANTI-SLOP.md + slop-forensics + EQ-Bench Slop Score
Adapted for personality simulation: slop in simulated speech is a dead giveaway that
the output is LLM-generated, not human-generated. EVERY simulated utterance must pass
this filter or the simulation fails the "indistinguishable from real" standard.

## Why This Matters More for Simulation Than Normal Writing

Normal LLM output that's a bit sloppy is fine — you know it's AI.
Simulated speech that contains slop BREAKS THE ILLUSION. If @eigenrobot's
simulated tweet contains "delve" or "it's worth noting," anyone who follows
him would instantly know it's fake. Slop detection is the minimum viable
authenticity check.

## Tier 1: Kill on Sight — SCAN AND AUTO-STRIP

These words almost never appear in casual human writing, especially on Twitter.
If ANY appear in simulated tweets/posts, the simulation has failed.

REGEX SCAN LIST (case-insensitive; "leverage" counts only as a verb and
"landscape" only in metaphorical use, so check context on those matches):
```
delve|utilize|leverage|facilitate|elucidate|embark|
endeavor|encompass|multifaceted|tapestry|testament|paradigm|
synergy|synergize|holistic|catalyze|catalyst|juxtapose|
nuanced\b|realm\b|landscape\b|myriad|plethora
```

On detection: REWRITE the sentence using the human alternative.
Do not just swap the word — the sentence structure around slop words
is usually sloppy too.
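The Tier 1 scan is purely mechanical and worth sketching. A minimal version; the word list mirrors the scan list above, and the context caveats for "leverage"/"landscape" are left as a comment because a bare regex cannot do part-of-speech checks:

```python
import re

# Tier 1 scan list from above. NOTE: "leverage" should only fail as a verb and
# "landscape" only in metaphorical use; a regex can't tell, so treat those
# particular hits as flags for manual review rather than automatic failures.
TIER1_RE = re.compile(
    r"\b(delve|utilize|leverage|facilitate|elucidate|embark|endeavor|"
    r"encompass|multifaceted|tapestry|testament|paradigm|synergy|synergize|"
    r"holistic|catalyze|catalyst|juxtapose|nuanced|realm|landscape|myriad|plethora)\b",
    re.IGNORECASE,
)

def tier1_hits(utterance: str) -> list[str]:
    """Return all Tier 1 slop words found in a simulated utterance."""
    return [m.group(1).lower() for m in TIER1_RE.finditer(utterance)]

hits = tier1_hits("Let's delve into the rich tapestry of posting.")
```

Any non-empty result means the utterance fails and must be rewritten, not patched word-by-word.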
## Tier 2: Suspicious in Clusters — COUNT PER PERSON

These are fine alone. Three in one person's simulated output = rewrite.
("navigate" counts only in metaphorical use — check context on match.)

```
robust|comprehensive|seamless|cutting-edge|innovative|streamline|
empower|foster|enhance|elevate|optimize|scalable|pivotal|intricate|
profound|resonate|underscore|harness|navigate|
cultivate|bolster|galvanize|cornerstone|game-changer
```

Count per simulated person. If the count >= 3: flag and rewrite.

## Tier 3: Filler Phrases — DELETE ALL

These add zero information. No human tweets these.

SCAN LIST (match as substrings):
```
- "it's worth noting"
- "important to note"
- "notably"
- "interestingly"
- "let's dive into"
- "let's explore"
- "as we can see"
- "as mentioned earlier"
- "in conclusion"
- "to summarize"
- "furthermore"
- "moreover"
- "additionally" (at start of sentence)
- "in today's"
- "it goes without saying"
- "when it comes to"
- "in the realm of"
- "one might argue"
- "it could be suggested"
- "this begs the question"
- "a comprehensive approach"
- "a holistic approach"
- "a nuanced approach"
- "not just X, but Y" (the #1 LLM rhetorical crutch)
```
## Rhetorical Slop — The Hardest to Catch

These pass vocabulary checks and mechanical verification but still read as
LLM-generated because the STRUCTURE is too polished. This is the deepest
layer of slop — the instruct model's training to produce "satisfying" output.

### Parallel Antithesis
"The most X are... The most Y are..."
"It's not about X. It's about Y."
Every simulated tweet that contains a balanced two-part rhetorical structure
should be checked: would this person actually construct that parallelism,
or would they just say the second half and trust you to get it?
FIX: delete the setup. Keep only the punchline half.

### "Not X, Not Y, But Z" / "Not Just X, But Y"
The #1 LLM rhetorical crutch. Appears in almost every simulation.
FIX: just say Z. Delete the negations.

### "Show Me X and I'll Show You Y"
Rhetorical formula that reads like a book blurb or TED talk.
No one tweets like this unless they're deliberately performing rhetoric.
FIX: state it flat. "Every community that works has a shared enemy" not
"Show me a thriving community and I'll show you..."

### Clean Escalating Lists
"First it was A, then B, then C, now D" — four perfectly escalating steps.
Real people do 2 steps and trail off, or skip to the end, or lose the thread.
FIX: cut to 2 steps max. Or break the pattern: "first A, then B, and then
somehow we ended up at D and nobody noticed"

### Academic Abstraction in Casual Voice
Words like "instrumentalized", "coordinate human behavior", "recursive loop"
in a tweet from someone who writes casually. The vocabulary is from papers,
not from posting.
FIX: use the word they'd actually reach for. "coordinate human behavior" →
"get people to do stuff." If the plain version sounds dumb, maybe the take
itself is thinner than the fancy words made it seem.

### The "Every Tweet Is A Banger" Problem
The deepest slop: every simulated utterance is GOOD. Considered. Structured.
Satisfying. Real twitter feeds are 70% mid, 20% boring, 10% brilliant.
The simulation should include:
- Half-finished thoughts ("idk if this makes sense but")
- Trailing off ("wait actually nvm")
- Boring logistical tweets ("anyone know a good dentist in brooklyn")
- Self-interruptions ("ok this is getting long")
- Acknowledgments that add nothing ("lol yeah" "hmm" "fair")
If every tweet in the simulation could be screenshot'd as a banger,
the simulation is too polished to be real.
## Structural Slop Patterns — CHECK IN SIMULATION OUTPUT

### Pattern: Identical Sentence Structure Across Speakers
If two or more simulated people use the same sentence structure
(e.g., "The thing about X is Y"), the simulation has failed voice
differentiation. Real people have different syntactic habits.

### Pattern: Topic Sentence Machine
If a simulated post follows: topic sentence → elaboration → example → wrap-up,
it's LLM structure, not human. Real tweets are: punchline first, or tangent,
or one-liner, or trailing thought.

### Pattern: Symmetry Addiction
If the conversation has neat equal turns, balanced perspectives, everyone
getting the same number of posts — that's not real. Real conversations
are asymmetric. Someone dominates. Someone lurks. Someone gets interrupted.

### Pattern: The Hedge Parade
"This approach may potentially help improve..." — no human tweets like this.
Either commit to the statement or don't make it.

### Pattern: Em Dash Overload
Count em dashes (—) per person. If >2 per post on average, flag it.
Most people use them sparingly or not at all.

### Pattern: Sycophantic Agreement Flow
If the conversation flows: A says thing → B says "great point, and also..." →
C says "building on that..." — that's instruct-model conversation, not human.
Real conversations have: disagreement, misunderstanding, tangents, ignoring,
one-upping, and sometimes just "lol."

### Pattern: Uniform Register
If all simulated people sound like they're writing at the same education level
with the same formality — the simulation failed. Real people have wildly different
registers. A shitposter and an academic should sound nothing alike.

## Integration: Mechanical Slop Scan

Run BEFORE subjective discriminator scoring, alongside emoji/length/caps checks.

```
For each simulated utterance:
1. Scan for Tier 1 words → auto-rewrite if found
2. Count Tier 2 words per person → flag if >= 3
3. Scan for Tier 3 filler phrases → auto-delete
4. Check for structural patterns:
   - Same sentence structure across speakers?
   - Topic-sentence-machine structure?
   - Symmetric turn-taking?
   - Hedge parade?
   - Em dash count?
   - Sycophantic flow?
5. If ANY Tier 1 found or ANY structural pattern detected:
   FAIL the utterance and regenerate
```

This scan is MECHANICAL. It cannot be vibes-scored. The words are either
there or they're not. Run it every time, no exceptions.
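The mechanical scan can be sketched directly in code. A minimal version under stated assumptions: the word lists are abbreviated stand-ins for the full Tier 1/2/3 lists, and of the structural patterns only the em-dash count is implemented, since the others need cross-utterance comparison:

```python
import re

# Abbreviated lists -- substitute the full Tier 1/2/3 lists from this document.
TIER1 = re.compile(r"\b(delve|tapestry|paradigm|myriad|plethora)\b", re.IGNORECASE)
TIER2 = re.compile(r"\b(robust|seamless|empower|foster|pivotal|harness)\b", re.IGNORECASE)
TIER3 = ["it's worth noting", "in conclusion", "when it comes to"]

def slop_scan(utterances_by_person: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return per-person slop flags; any flag means the utterance fails and is regenerated."""
    flags: dict[str, list[str]] = {}
    for person, posts in utterances_by_person.items():
        person_flags = []
        tier2_count = 0  # Tier 2 is counted per person, not per post
        for post in posts:
            if TIER1.search(post):
                person_flags.append(f"tier1: {post[:40]!r}")
            tier2_count += len(TIER2.findall(post))
            if any(phrase in post.lower() for phrase in TIER3):
                person_flags.append(f"tier3 filler: {post[:40]!r}")
            if post.count("\u2014") > 2:  # em dash overload check
                person_flags.append(f"em dashes: {post[:40]!r}")
        if tier2_count >= 3:
            person_flags.append(f"tier2 cluster: {tier2_count} hits")
        flags[person] = person_flags
    return flags

result = slop_scan({"@sim_a": ["let's delve into this", "robust seamless pivotal take"]})
```

The scan is deterministic by design, matching the "cannot be vibes-scored" requirement: the same input always produces the same flags.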
236  optional-skills/worldsim/references/deep-psychometrics.md  Normal file
@@ -0,0 +1,236 @@
# Deep Psychometrics — Beyond Big Five

Multi-layer psychological profiling from public posts. Each layer adds
a dimension to the personality model, making simulations more nuanced
and predictions more accurate.

## The Profiling Stack

| Layer | What It Measures | Tool/Method | Accuracy | Min Posts |
|-------|-----------------|-------------|----------|-----------|
| Big Five (OCEAN) | Core personality traits | RoBERTa embeddings + BiLSTM | AUROC 0.78-0.82 | 30-50 |
| Moral Foundations | Ethical intuitions | eMFDscore (pip) | Validated dictionary | 20+ |
| Schwartz Values | Core value priorities | DeBERTa on ValueEval | F1 0.56 (macro) | 20+ |
| Cognitive Style | Thinking patterns | AutoIC + LIWC features | r=0.70-0.82 doc-level | 20+ |
| Narrative Framing | How they frame issues | GPT-4 few-shot | F1 ~70% | 10+ |
| Behavioral Metadata | Non-text patterns | Feature extraction | r=0.29-0.40 per trait | 20+ |

## Layer 1: Big Five Personality (Foundation)

### Accuracy Bounds (peer-reviewed)
- AUROC 0.78-0.82 with RoBERTa embeddings + BiLSTM (JMIR 2025)
- Per-trait binary accuracy: O=0.637, C=0.602, E=0.620, A=0.590, N=0.620
- Meta-analytic correlations (Azucar 2018, 16 studies):
  Extraversion r=0.40, Openness r=0.39, Conscientiousness r=0.35,
  Neuroticism r=0.33, Agreeableness r=0.29
- These hit the "personality coefficient" ceiling of r=0.30-0.40 —
  digital footprints are as predictive as any behavioral measure

### What Actually Works
- Fine-tuned embeddings >> zero-shot LLMs. GPT-4o zero-shot is UNRELIABLE.
- RoBERTa embeddings are free and nearly as good as OpenAI embeddings
- Aggregation across posts is essential — single posts are noise
- 30-50 posts of ~90 words each = practical minimum
- Training data: PANDORA Reddit corpus (1568 users, ~935K posts)

### For The Simulator (without running models)
Since we can't fine-tune per-simulation, use LLM-as-rater with caveats:
- Provide 10-20 actual posts as evidence
- Ask for trait estimation with reasoning, not just scores
- Anchor with the adjective-based method (see prediction-engine.md)
- Frame estimates as ranges, not points: "Openness: HIGH (0.7-0.9)"
- Known bias: LLMs overestimate agreeableness and underestimate neuroticism

### Key Insight: LLMs Already Know Public Figures
Nature Scientific Reports 2024: GPT-3's semantic space already encodes
perceived personality of public figures from their names alone. For
famous people, the LLM's latent knowledge is a STARTING POINT that
OSINT data confirms or corrects.

## Layer 2: Moral Foundations (Ethical Compass)

Jonathan Haidt's Moral Foundations Theory. Six foundations:

| Foundation | Liberal emphasis | Conservative emphasis |
|-----------|-----------------|---------------------|
| Care/Harm | ★★★ HIGH | ★★ MODERATE |
| Fairness/Cheating | ★★★ HIGH | ★★ MODERATE |
| Loyalty/Betrayal | ★ LOW | ★★★ HIGH |
| Authority/Subversion | ★ LOW | ★★★ HIGH |
| Sanctity/Degradation | ★ LOW | ★★★ HIGH |
| Liberty/Oppression | ★★ MODERATE | ★★ MODERATE |

### Tool: eMFDscore
```
pip install emfdscore
# GitHub: github.com/medianeuroscience/emfdscore
# Built on spaCy, GPL-3.0
```

Output per post: scores for each foundation (virtue + vice dimensions).
Aggregate across 20+ posts → 10-dimensional moral profile.

### Application to Simulation
Moral foundations predict:
- What topics trigger emotional responses
- What arguments they find persuasive vs repulsive
- How they frame political/social issues
- Who they instinctively ally with vs oppose
- What kind of content they share/amplify

Example: a high Loyalty/Authority person will defend their tribe even when
wrong. A high Care/Fairness person will break from their tribe on justice
issues. This shapes conversation dynamics.

### For The Simulator (without running eMFDscore)
Infer moral foundations from:
- Political positions and framing in their posts
- What they get angry about vs what they celebrate
- Who they defend and who they attack
- Key moral vocabulary: "protect", "fair", "loyal", "respect", "pure", "free"
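The moral-vocabulary cue can be sketched as a crude keyword tally. The word-to-foundation mapping below is an illustrative assumption with a handful of seed words; the real eMFD dictionary contains thousands of weighted entries:

```python
from collections import Counter

# Illustrative seed words per foundation -- NOT the eMFD dictionary.
FOUNDATION_SEEDS = {
    "care": ["protect", "harm", "suffer"],
    "fairness": ["fair", "cheat", "equal"],
    "loyalty": ["loyal", "betray", "tribe"],
    "authority": ["respect", "obey", "tradition"],
    "sanctity": ["pure", "disgust", "sacred"],
    "liberty": ["free", "oppress", "tyranny"],
}

def moral_tally(posts: list[str]) -> Counter:
    """Count foundation-keyword hits across a batch of posts."""
    tally: Counter = Counter()
    for post in posts:
        words = post.lower().split()
        for foundation, seeds in FOUNDATION_SEEDS.items():
            tally[foundation] += sum(w.strip(".,!?") in seeds for w in words)
    return tally

tally = moral_tally(["Stay loyal to your people.", "That ruling was not fair."])
```

Treat the tally as a weak prior for which foundations dominate, not a score; run eMFDscore when the environment allows.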
## Layer 3: Schwartz Values (Core Motivations)
|
||||
|
||||
19 values in circular continuum (adjacent values are compatible,
|
||||
opposite values are in tension):
|
||||
|
||||
**Self-Transcendence** ↔ **Self-Enhancement**
|
||||
- Universalism, Benevolence ↔ Power, Achievement
|
||||
|
||||
**Openness to Change** ↔ **Conservation**
|
||||
- Self-Direction, Stimulation, Hedonism ↔ Tradition, Conformity, Security
|
||||
|
||||
### SemEval-2023 Task 4 Results
|
||||
- Best macro-F1: 0.56 (ensemble of 12 DeBERTa/RoBERTa models)
|
||||
- Most reliable: universalism (nature), security, power
|
||||
- Least reliable: stimulation, hedonism, humility
|
||||
- Dataset: 9,324 annotated arguments, available via Touché
|
||||
|
||||
### Key Finding: Value Perception Is Subjective
|
||||
Epstein et al. (2026): human inter-rater agreement on values is only r=0.201.
|
||||
Fine-tuned GPT-4o reaches r=0.294 — BETTER than human-human agreement.
|
||||
Personalized models reach r=0.334.
|
||||
|
||||
### For The Simulator
|
||||
Values predict MOTIVATION — why someone holds positions, not just what
|
||||
positions they hold. Two people with the same political stance may have
|
||||
completely different underlying values:
|
||||
- "I support open source because FREEDOM" (Self-Direction)
|
||||
- "I support open source because FAIRNESS" (Universalism)
|
||||
- "I support open source because it WORKS BETTER" (Achievement)
|
||||
Same position, different framing, different behavioral predictions.
|
||||
|
||||
## Layer 4: Cognitive Style (How They Think)

### Integrative Complexity (AutoIC)
Measures differentiation (seeing multiple perspectives) and integration
(synthesizing perspectives into coherent frameworks).

- Low IC: black-and-white thinking, strong convictions, simple language
- High IC: nuanced, sees multiple sides, hedging, complex sentences

AutoIC (Conway et al.): 3,500+ complexity-relevant root words/phrases,
13 dictionary categories, validated r=0.70-0.82 at document level.

**WARNING**: LIWC's "analytic thinking" correlates only r=0.14 with actual
integrative complexity. Don't use LIWC's score as a proxy.

### Computational Indicators of Cognitive Style
Extractable from 20-50 posts without specialized tools:

| Indicator | High Cognition | Low Cognition |
|-----------|---------------|---------------|
| Vocabulary diversity (TTR) | HIGH | LOW |
| Avg sentence length | LONGER | SHORTER |
| Causal connectives ("because", "therefore") | MORE | FEWER |
| Hedging ("perhaps", "it seems") | MORE | FEWER |
| Abstract vs concrete language | MORE ABSTRACT | MORE CONCRETE |
| Question-asking | MORE | FEWER |
| Binary framing ("always/never") | LESS | MORE |

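These indicators can be approximated with a few lines of stdlib Python. A rough sketch — the word lists here are illustrative stand-ins, not the validated AutoIC dictionaries:

```python
import re

# Illustrative word lists — stand-ins, NOT the validated AutoIC dictionaries.
HEDGES = {"perhaps", "maybe", "seems", "arguably", "possibly", "somewhat"}
CAUSALS = {"because", "therefore", "thus", "hence", "so"}
BINARY = {"always", "never", "everyone", "nobody"}

def cognitive_style_proxies(posts):
    """Rough per-corpus rates for the table's indicators, from raw post text."""
    words = [w for p in posts for w in re.findall(r"[a-z']+", p.lower())]
    n = max(len(words), 1)
    return {
        "ttr": len(set(words)) / n,  # vocabulary diversity (type-token ratio)
        "hedge_rate": sum(w in HEDGES for w in words) / n,
        "causal_rate": sum(w in CAUSALS for w in words) / n,
        "binary_rate": sum(w in BINARY for w in words) / n,
        "question_rate": sum("?" in p for p in posts) / max(len(posts), 1),
    }
```

Note that TTR is length-sensitive, so compare profiles only across similar-sized corpora.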
### For The Simulator
Cognitive style directly shapes VOICE:
- High IC person: longer posts, more caveats, "on the other hand"
- Low IC person: punchy takes, strong assertions, no hedging
- This is one of the strongest differentiators between similar-sounding people

## Layer 5: Narrative Framing (Their Lens on Reality)

How someone frames an issue reveals deep cognitive and value patterns.

### Common Frames (Semetko & Valkenburg)
- **Conflict**: issue as battle between opposing sides
- **Human interest**: personal stories, emotional impact
- **Economic**: costs, benefits, financial impact
- **Morality**: right vs wrong, ethical principles
- **Attribution of responsibility**: who's to blame / who should fix it

### Detection
GPT-4 few-shot with frame definitions achieves F1=70.4%.
Best for diverse topics where fine-tuned models are too narrow.

### For The Simulator
Framing predicts:
- How they'll react to news (through which lens)
- What aspects they'll emphasize in conversation
- What arguments they'll find compelling
- Whether they personalize or systematize issues

Example: Same AI safety event, different frames:
- Conflict framer: "The open vs closed battle heats up"
- Economic framer: "This will cost the industry billions"
- Moral framer: "This is irresponsible and dangerous"
- Attribution framer: "The regulators need to step in"

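A minimal few-shot prompt builder for this detection setup. The frame list follows Semetko & Valkenburg; the prompt wording is an assumption, and the model call itself is omitted:

```python
# Frame definitions follow Semetko & Valkenburg; descriptions paraphrased.
FRAMES = {
    "conflict": "issue as a battle between opposing sides",
    "human interest": "personal stories, emotional impact",
    "economic": "costs, benefits, financial impact",
    "morality": "right vs wrong, ethical principles",
    "responsibility": "who's to blame / who should fix it",
}

def frame_prompt(post, examples):
    """Build a few-shot classification prompt.
    examples: list of (post_text, frame_name) pairs."""
    defs = "\n".join(f"- {name}: {desc}" for name, desc in FRAMES.items())
    shots = "\n".join(f'Post: "{t}"\nFrame: {f}' for t, f in examples)
    return (
        "Classify the dominant frame of the post.\n"
        f"Frames:\n{defs}\n\n{shots}\n"
        f'Post: "{post}"\nFrame:'
    )
```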
## Layer 6: Behavioral Metadata (Non-Text Signals)

Extractable from X API / Bluesky AT Protocol without NLP:

| Feature | What It Reveals |
|---------|----------------|
| Posting time distribution | Timezone, sleep patterns, work schedule |
| Reply vs original ratio | Conversational vs broadcast personality |
| Emoji frequency & types | Emotional expression style |
| Hashtag usage | Community identification, signal boosting |
| Media attachment rate | Visual vs text orientation |
| Thread length | Depth of engagement preference |
| Retweet/repost ratio | Amplifier vs creator |
| Average post length | Conciseness vs verbosity |
| Response latency | Impulsiveness vs deliberation |

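A sketch of extracting a few of these features from post metadata. The field names (`text`, `created_at`, `is_reply`, `is_repost`) are assumptions — adapt them to the actual API payload:

```python
from collections import Counter
from datetime import datetime

def behavioral_metadata(posts):
    """posts: dicts with 'text', 'created_at' (ISO 8601), 'is_reply',
    'is_repost'. Field names are illustrative, not any specific API schema."""
    n = max(len(posts), 1)
    # Posting-hour histogram → timezone / sleep-pattern signal
    hours = Counter(datetime.fromisoformat(p["created_at"]).hour for p in posts)
    return {
        "peak_hours": [h for h, _ in hours.most_common(3)],
        "reply_ratio": sum(p["is_reply"] for p in posts) / n,
        "repost_ratio": sum(p["is_repost"] for p in posts) / n,
        "avg_len_words": sum(len(p["text"].split()) for p in posts) / n,
    }
```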
### Trait Correlations (meta-analytic)
- **Extraversion**: more posts, more friends, more photos, more group activity
- **Neuroticism**: more self-disclosure, more passive consumption, more late-night posting
- **Agreeableness**: fewer swear words, more positive emotion, more supportive replies
- **Conscientiousness**: more regular posting patterns, more task-oriented content
- **Openness**: more diverse topics, more original content, larger networks

## Putting It All Together: The Deep Dossier

At high fidelity, compile a multi-layer profile:

```
PSYCHOMETRIC PROFILE: @handle
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Big Five: O[HIGH] C[MED] E[HIGH] A[LOW] N[LOW]
Evidence: {real quotes showing each trait}

Moral Foundations: Care★★ Fair★★★ Loyal★ Auth★ Sanct★ Liberty★★★
Evidence: {what they get angry/excited about}

Values: Self-Direction dominant, Achievement secondary
Evidence: {how they justify their positions}

Cognitive Style: HIGH integrative complexity
Evidence: {hedging patterns, nuanced takes, sentence complexity}

Dominant Frame: Attribution of Responsibility
Evidence: {they consistently focus on who's to blame}

Behavioral: Night owl, reply-heavy, low emoji, threads > one-shots
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

This multi-layer profile makes predictions much more nuanced than
Big Five alone. It tells you not just WHAT someone will say but
WHY they'll say it and HOW they'll frame it.

`optional-skills/worldsim/references/gepa-evolution.md` (new file, 170 lines)

# GEPA Evolution — Automated Self-Improvement via hermes-agent-self-evolution

## What This Is

The hermes-agent-self-evolution repo (NousResearch/hermes-agent-self-evolution)
uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve
Hermes Agent skills. GEPA is an ICLR 2026 Oral paper — it reads EXECUTION
TRACES to understand WHY things fail, then proposes targeted mutations.

This means: we can point GEPA at the worldsim skill and automatically evolve
every component — simulation instructions, anti-slop rules, star thread
methodology, mechanical verification checklist, dossier templates — using
our own simulation outputs scored against real data as the eval signal.

The recursive self-improvement pipeline we built manually (log failures →
promote patterns → update rules) can be AUTOMATED via GEPA.

## How It Applies to WorldSim

### What GEPA Evolves (text, not weights)
GEPA evolves the TEXT of prompts and instructions. For worldsim, that means:

| Target | What Gets Evolved | Eval Signal |
|--------|------------------|-------------|
| SKILL.md | Immersion protocol, pipeline instructions | Simulation quality scores |
| star-thread.md | Methodology for finding star threads | Thread-to-voice accuracy |
| anti-slop.md | Slop word lists, structural patterns | Slop detection recall/precision |
| simulation-engine.md | Platform formats, conversation dynamics | Voice fidelity scores |
| adversarial-refinement.md | Mechanical check thresholds, GAN loop | Pre vs post refinement delta |
| prediction-engine.md | Forecasting methodology | Prediction Brier scores |
| dossier template | Profile structure and fields | Profile quality scores |

### The Eval Dataset
Built from worldsim's own outputs + real data:

1. **Voice fidelity pairs**: (simulated post, real post from same person) →
   LLM-as-judge scores similarity 0-1
2. **Mechanical check logs**: what did the checks catch? what slipped through?
3. **Prediction accuracy**: tracked predictions scored against reality
4. **Held-out tests**: predicted tweets vs actual tweets
5. **Turing test results**: could the discriminator tell real from fake?
6. **User corrections**: any time the user catches something the system missed
   (like the emoji fabrication incident — that's the richest signal)

### The GEPA Loop for WorldSim

```
1. RUN worldsim simulation (creates execution traces)
2. SCORE outputs against real data (voice, position, mechanical)
3. LOG traces + scores + user feedback to eval dataset
4. GEPA EVOLVES the skill component that had lowest scores
   - Reads traces to understand WHY it scored low
   - Proposes mutation to that specific reference file
   - Tests mutation against held-out eval data
   - If improved: create PR, human reviews
5. REPEAT — each cycle makes the skill better
```

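The selection step of one cycle can be sketched schematically. This is a toy, not the repo's actual interface — `evolve` and `score` stand in for GEPA's mutation proposer and held-out evaluation:

```python
def gepa_cycle(components, eval_set, evolve, score):
    """One schematic GEPA cycle: evolve the weakest component and keep
    the mutation only if it improves on held-out data.
    `evolve` and `score` are hypothetical stand-ins, not real repo APIs."""
    baseline = {c: score(c, eval_set) for c in components}
    weakest = min(baseline, key=baseline.get)      # lowest-scoring component
    mutated = evolve(weakest)                      # propose a targeted mutation
    # Accept only if held-out score improves; otherwise keep the original
    return mutated if score(mutated, eval_set) > baseline[weakest] else weakest
```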
### Concrete Example

GEPA discovers from traces that simulated conversations always have
symmetric turn-taking (4/4/4). It reads the mechanical check log that
caught this in 3 of the last 5 simulations. It reads the current
simulation-engine.md and sees the conversation architecture section.
It proposes a mutation:

OLD: "Opening Moves (1-3 posts) → Development (4-8 posts) → Peak → Resolution"

NEW: "Opening: most impulsive person posts. Others join ASYMMETRICALLY — one person
gets 40-50% of turns, one gets 15-20%, others fill the rest. The ratio should
match their real reply-to-original ratios from the dossier."

This mutation gets tested against the next 5 simulations. If symmetry
violations drop and voice scores don't decrease, it gets merged.

## Setup

```bash
# Clone the evolution repo
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"

# Point at hermes-agent repo
export HERMES_AGENT_REPO=~/.hermes

# Evolve the worldsim skill specifically
python -m evolution.skills.evolve_skill \
    --skill hermes-simulator \
    --iterations 10 \
    --eval-source sessiondb
```

## What Makes This Different From Manual Self-Improvement

The manual pipeline (references/recursive-self-improvement.md) requires the
agent to notice its own failures and write rules. This has two problems:

1. The agent shares weights with the generator — it's biased toward
   approving its own output (the emoji incident proved this)
2. Promoting patterns to rules is slow and requires 3+ occurrences

GEPA solves both, and more:
1. The eval signal comes from EXTERNAL data (real posts, user corrections,
   mechanical checks) — not the agent's self-assessment
2. Evolution happens per-iteration, not per-3-failures
3. Mutations are tested against held-out data before merging
4. The Pareto frontier maintains diversity — different strategies for
   different types of people/conversations

## Integration Points

### Eval Dataset Builder
Mine the rehoboam DB for training data:
- simulation_logs table → execution traces
- prediction_scores table → accuracy data
- audit_log table → mechanical check results
- user correction events → highest-value signal

### Fitness Function for WorldSim
```python
def worldsim_fitness(simulation_output, real_data):
    """Score one simulation against real data. Helpers (embed_and_compare,
    mechanical_check_pass_rate, count_slop, naturalness_score,
    generate_textual_feedback, aggregate_score) are assumed to exist
    elsewhere in the evolution repo."""
    scores = {}
    # Voice fidelity: embed real + simulated, cosine similarity
    scores["voice"] = embed_and_compare(simulation_output, real_data.tweets)
    # Mechanical pass rate: what % of checks passed without fixes
    scores["mechanical"] = mechanical_check_pass_rate(simulation_output)
    # Slop score: count of slop words/patterns detected
    slop_count, total_words = count_slop(simulation_output)
    scores["anti_slop"] = 1.0 - (slop_count / max(total_words, 1))
    # Structure: turn asymmetry, conversation naturalness
    scores["structure"] = naturalness_score(simulation_output)
    # Textual feedback for GEPA's reflective mutation
    feedback = generate_textual_feedback(scores, simulation_output, real_data)
    return aggregate_score(scores), feedback
```

### The Key Insight: Textual Feedback
GEPA's superpower is that it doesn't just get a scalar score — it gets
TEXTUAL FEEDBACK explaining what went wrong. Our mechanical verification
system already produces this:

"@nosilverv avg 33.2 words vs real 15.6 (113% deviation) — SHORTEN"
"Parallel antithesis detected: 'The most X... The most Y...' — STRIP"
"Emoji rate 0% simulated but 10% real — OK (within tolerance)"

This text goes directly into GEPA's reflective mutation pipeline. It reads
these messages and proposes changes to the skill instructions that would
prevent these specific failures in future simulations.

## Evolution Targets by Priority

1. **simulation-engine.md** — highest impact on output quality
2. **anti-slop.md** — directly measurable, highest precision eval
3. **star-thread.md** — hardest to evaluate but most impactful on voice
4. **adversarial-refinement.md** — meta: improving the improvement system
5. **SKILL.md pipeline instructions** — orchestration optimization
6. **dossier template** — structure optimization
7. **prediction-engine.md** — measurable via Brier scores

## The Virtuous Cycle

```
More simulations → more eval data → better GEPA mutations
→ better skill instructions → better simulations → more eval data → ...
```

This is the endgame: the worldsim skill evolves itself through use.
Every simulation makes the next one better, not just through logged
rules, but through automated evolutionary optimization of the
instructions themselves. The system doesn't just learn WHAT went wrong —
it rewrites its own code to prevent it.

`optional-skills/worldsim/references/knowledge-archive.md` (new file, 262 lines)

# Knowledge Archive — Per-Person Source Library + Expert Synthesis

## The Problem With Profiles

A profile is a SNAPSHOT. It says "this person believes X" but doesn't
show you WHERE they said it, WHEN, in WHAT context, or HOW their
thinking evolved. You can't cite a profile. You can't trace a claim
back to a source. And when you're simulating a conversation about
topic Z, the profile gives you everything about the person equally
weighted — their views on AI and their views on cooking and their
views on politics all crammed into the same context window.

## The Archive

For every person the system touches, build a LIBRARY:

```
~/.hermes/rehoboam/archives/{handle}/
├── index.json                ← master index: all entries, metadata, embeddings
├── sources/
│   ├── x_tweets.jsonl        ← every tweet pulled, with ID, timestamp, URL, metrics
│   ├── x_replies.jsonl       ← their replies (different voice register)
│   ├── bluesky_posts.jsonl   ← bluesky posts
│   ├── blog_posts.jsonl      ← full text of blog posts with URLs
│   ├── podcast_quotes.jsonl  ← attributed quotes from transcripts
│   ├── interviews.jsonl      ← quotes from news articles/interviews
│   ├── reddit_comments.jsonl
│   ├── github_comments.jsonl
│   ├── goodreads_reviews.jsonl
│   ├── threads_posts.jsonl
│   └── other.jsonl           ← anything else (HN, Quora, etc.)
├── topics/
│   ├── ai_safety.jsonl       ← auto-clustered by topic
│   ├── open_source.jsonl
│   ├── consciousness.jsonl
│   └── ...
└── embeddings/
    └── all_embeddings.npy    ← sentence-transformer vectors for semantic search
```

### Entry Format (every entry in every source file)

```json
{
  "id": "unique_id",
  "handle": "teknium",
  "platform": "x",
  "type": "tweet|reply|blog|podcast|interview|comment|review",
  "text": "the actual text they said",
  "url": "https://x.com/Teknium/status/1234567890",
  "timestamp": "2026-04-05T21:40:48Z",
  "context": {
    "replying_to": "@otheruser's tweet about X",
    "thread_position": 3,
    "topic": "open source AI",
    "source_title": "Lex Fridman Podcast #412"
  },
  "metrics": {
    "likes": 234,
    "retweets": 45,
    "replies": 12
  },
  "topics": ["open_source", "ai_models", "hermes"],
  "embedding_id": 42
}
```

Every entry has a URL. Everything is traceable. Nothing is paraphrased
without the original alongside it.

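A small guard before appending to a source file helps keep the archive traceable. A sketch mirroring the required fields above (the field set is taken from the entry format; the function itself is illustrative):

```python
# Core fields every archive entry needs before it is worth storing.
REQUIRED = {"id", "handle", "platform", "type", "text", "url", "timestamp"}

def validate_entry(entry):
    """Reject entries missing required fields before they hit a .jsonl file.
    An untraceable entry (no URL, no timestamp) defeats the archive's purpose."""
    missing = REQUIRED - entry.keys()
    if missing:
        raise ValueError(f"entry {entry.get('id')} missing: {sorted(missing)}")
    return entry
```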
## Collection Pipeline

When `worldsim> profile @handle` or `worldsim> archive @handle` runs:

### Step 1: Pull Everything
Use every verified access method to collect raw materials:
- X API: get max tweets (paginate with next_token to get hundreds)
- nitter.cz: timeline content
- ThreadReaderApp: historical threads
- Bluesky: full post history
- GitHub: issue comments, PR reviews, gists, README
- Reddit: comment history
- Blog/Substack: full posts (web_extract)
- Podcast transcripts: attributed quotes
- Interviews: quotes with attribution
- Goodreads: reviews
- Medium: RSS feed full text

### Step 2: Deduplicate
Same content appears across platforms (cross-posted tweets, syndicated
blog posts). Deduplicate by content similarity, keep the richest version
(the one with most metadata/context).

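A minimal near-duplicate pass using stdlib `difflib`. The similarity threshold and the "richest = most context fields" heuristic are assumptions:

```python
from difflib import SequenceMatcher

def dedupe(entries, threshold=0.9):
    """Near-duplicate removal across platforms. When two texts are
    near-identical, keep the entry with more context metadata.
    O(n^2) comparisons — fine for archives of a few hundred entries."""
    kept = []
    for e in entries:
        for i, k in enumerate(kept):
            ratio = SequenceMatcher(None, e["text"].lower(), k["text"].lower()).ratio()
            if ratio >= threshold:
                if len(e.get("context", {})) > len(k.get("context", {})):
                    kept[i] = e  # richer version wins
                break
        else:
            kept.append(e)
    return kept
```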
### Step 3: Topic Cluster
Run lightweight topic classification on each entry:
- Use the LLM or a simple keyword matcher to assign 1-3 topic tags
- Cluster into topic files for fast retrieval
- Topics are dynamic — new topics emerge from the data

### Step 4: Embed
Generate sentence-transformer embeddings for every entry.
Store in a numpy array for fast cosine similarity search.
This enables semantic retrieval: "find everything @handle said about
consciousness" even if they never used the word "consciousness."

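The search itself is a few lines of numpy over the stored matrix. A minimal sketch — embedding generation is left to whatever sentence-transformer produced `all_embeddings.npy`:

```python
import numpy as np

def top_k(query_vec, all_embeddings, k=10):
    """Cosine similarity search over the archive's embedding matrix.
    all_embeddings: (n_entries, dim) array loaded from all_embeddings.npy.
    Returns (indices, similarities), best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    m = all_embeddings / np.linalg.norm(all_embeddings, axis=1, keepdims=True)
    sims = m @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]
```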
### Step 5: Index
Build the master index.json with entry count, topic distribution,
timestamp range, platform coverage, and quality metrics.

## Context-Aware Retrieval

This is the key. The archive might have 500 entries for a person.
The context window can hold maybe 30-50 of them alongside all the
other simulation context. You MUST retrieve selectively.

### For Simulation
When simulating @handle talking about topic X:

```
1. Semantic search: embed the current conversation context
2. Retrieve top 10-15 entries by cosine similarity to context
3. Also retrieve: 5 highest-engagement entries (their "greatest hits")
4. Also retrieve: 3 most recent entries (freshness)
5. Also retrieve: 2 entries that CONTRADICT the expected position
   (prevents confirmation bias in the simulation)
6. Deduplicate. Cap at 25-30 entries total.
7. These become the "voice anchors" for generation.
```

The simulation draws from SPECIFIC REAL QUOTES relevant to the current
conversation. Not a generic profile. Not everything they've ever said.
The 25 most relevant things they've said about THIS topic.

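The mixing recipe above can be sketched as one function. The `contradicts_expected` flag is a hypothetical field assumed to be set upstream, and the slice sizes mirror the recipe:

```python
def voice_anchors(entries, sims, cap=25):
    """Mix semantic hits, greatest hits, recency, and contradictions.
    entries: archive dicts (need 'metrics' and 'timestamp');
    sims: cosine similarity per entry, aligned with `entries`."""
    by_sim = sorted(range(len(entries)), key=lambda i: -sims[i])[:15]
    by_likes = sorted(range(len(entries)),
                      key=lambda i: -entries[i]["metrics"]["likes"])[:5]
    by_time = sorted(range(len(entries)),
                     key=lambda i: entries[i]["timestamp"], reverse=True)[:3]
    contra = [i for i, e in enumerate(entries)
              if e.get("contradicts_expected")][:2]
    # dict.fromkeys dedupes while preserving order (semantic hits first)
    picked = list(dict.fromkeys(by_sim + by_likes + by_time + contra))
    return [entries[i] for i in picked[:cap]]
```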
### For Expert Synthesis
When the user asks "who are the best minds on X and what have they said?":

```
1. Search ALL archived people's entries for topic X
2. Rank by: entry quality × person expertise × relevance to query
3. Return a synthesis with CITATIONS:

On the topic of AI consciousness:

@repligate argues that LLMs exhibit "simulacra of consciousness"
rather than consciousness itself, distinguishing between the
model's behavior and its substrate:
> "the question isn't whether GPT is conscious but whether the
> character it's simulating is conscious within the fiction"
— tweet, 2025-03-15 (2.4K likes)
https://x.com/repligate/status/...

@nickcammarata approaches it from a meditation/first-person
perspective, noting parallels between introspective practice
and interpretability:
> "observation changes the system being observed, in meditation
> and in interp"
— tweet, 2026-04-05 (2.9K likes)
https://x.com/nickcammarata/status/...

@tszzl is skeptical of the framing entirely:
> "consciousness discourse is philosophy cosplaying as engineering"
— tweet, 2025-11-22 (5.1K likes)
https://x.com/tszzl/status/...
```

Every claim attributed. Every quote sourced. Every link clickable.

### For Grounding Predictions
When predicting what @handle would say about event Y:

```
1. Retrieve all archive entries related to Y or adjacent topics
2. Identify their PATTERN of response to similar events
3. Ground the prediction in specific past statements:

PREDICTION: @handle would likely frame event Y through the lens
of [topic Z], based on:
- tweet [url]: "quote about Z" (2025-06-15)
- blog post [url]: "longer quote about Z" (2025-09-20)
- podcast [url]: "verbal quote about Z" (2026-01-10)
CONFIDENCE: 78% (3 consistent sources over 7 months)
```

## Incremental Updates

The archive grows over time. Each time the person is profiled:
1. Pull new content since the last archive timestamp
2. Append to source files
3. Re-embed new entries only
4. Update topic clusters
5. Update index

Don't rebuild from scratch. Append and re-index.

## Expert Table

When you have 20+ archived people, build an expert table:

```
worldsim> experts "open source AI"

EXPERT TABLE: open source AI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

@Teknium | 47 entries | voice: builder/practitioner
"we can prove that open approaches build better, more
trustworthy systems" — tweet, 2026-04-05
Latest: 2 hours ago | Stance: STRONG ADVOCATE

@repligate | 12 entries | voice: philosophical/theoretical
"open weights = accountability. you can't audit a black box"
— tweet, 2025-11-30
Latest: 3 days ago | Stance: ADVOCATE (principled)

@eigenrobot | 8 entries | voice: statistical/contrarian
"the open source premium is largely downstream of selection
effects in who contributes" — tweet, 2025-08-14
Latest: 1 week ago | Stance: SKEPTICAL OF FRAMING

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
3 experts found | 67 total entries | synthesize? (y/n)
```

The table shows: who knows about this, what they've said, how recently,
and what their stance is. All grounded in archived quotes with sources.

## Integration With Simulation

When the star thread + dossier + archive work together:

```
STAR THREAD: drives the core generation (what they're DOING)
DOSSIER: provides constraints (psychometrics, voice metrics, baselines)
ARCHIVE: provides GROUNDING (specific real quotes for this context)
MECHANICAL CHECKS: verifies surface features (emoji, length, slop)
```

The archive prevents the simulation from drifting into generic territory.
Instead of "this person would probably say something about open source,"
it's "this person said THIS SPECIFIC THING about open source 3 weeks ago,
and their simulation should be consistent with that while also being fresh."

## The Overfitting Problem

"Without overfitting to a particular material the new context doesn't call for."

The retrieval system MUST be selective. If someone said 47 things about
open source AI, and the current conversation is about AI regulation,
don't dump all 47 open source quotes into context. Maybe 3 are relevant
because they connect open source to regulation. Retrieve THOSE 3.

The cosine similarity search handles this naturally — it matches the
CURRENT conversation context against the archive and returns what's
actually relevant, not everything tagged with a nearby topic.

The anti-overfitting checklist:
- Never load more than 25-30 archive entries per person into context
- Weight by relevance to the CURRENT conversation, not by general importance
- Include at least 2 entries that contradict the expected position
- Include at least 3 recent entries regardless of topic relevance (freshness)
- If the conversation shifts topic mid-simulation, RE-RETRIEVE for the new context
- The archive is a LIBRARY you consult, not a script you follow

`optional-skills/worldsim/references/mass-behavior.md` (new file, 321 lines)

# Mass Behavior Modeling — Communities, Clusters, Cascades

Understanding individual behavior requires understanding the social
ecosystem they exist in. This reference covers the macro layer:
community detection, influence networks, audience modeling, and
predicting how groups respond to events.

## Why This Matters For Simulation

Individual prediction accuracy: ~56-60%
Individual-in-context prediction: significantly higher

A person's behavior is constrained by their community. Knowing WHICH
community they belong to, WHO influences them, and WHAT information
ecosystem they're in makes individual predictions much sharper.

Lewin's equation: B = f(P, E). This reference is about the E.

## The Ecosystem Stack

```
Layer 5: AUDIENCE REACTION — How would this person's audience respond?
Layer 4: STANCE & SENTIMENT — What positions do clusters hold?
Layer 3: INFLUENCE NETWORKS — Who spreads ideas to whom?
Layer 2: COMMUNITY CLUSTERS — Who groups together?
Layer 1: SOCIAL GRAPH — Who follows/interacts with whom?
```

## Layer 1: Social Graph Construction

### Data Sources (by accessibility)

| Source | Access | Quality | Tools |
|--------|--------|---------|-------|
| Bluesky AT Protocol | FREE, open, no auth | Excellent | atproto (pip) |
| X/Twitter API | Bearer token, limited | Good but restricted | curl, tweepy |
| Reddit | API with limits | Good for comments | PRAW (pip) |
| GitHub | Free API | Great for tech people | PyGithub (pip) |
| Web scraping | Fragile, TOS issues | Variable | Last resort |

### Bluesky: The Open Gold Mine
```python
# pip install atproto
from atproto import Client

client = Client()
# No auth needed for public data

# Get follower graph
followers = client.get_followers(actor="handle.bsky.social")
following = client.get_follows(actor="handle.bsky.social")

# Real-time firehose (no auth!)
# wss://jetstream1.us-east.bsky.network/subscribe
```

### Graph Types
- **Follow graph**: who follows whom (directed, static-ish)
- **Interaction graph**: who replies to / retweets whom (directed, dynamic)
- **Mention graph**: who mentions whom (directed, weighted by frequency)
- **Co-engagement graph**: who engages with the same content (undirected)

Interaction graphs are more informative than follow graphs for predicting
actual behavioral alignment.

### Tools
```
pip install networkx python-igraph
```
NetworkX for prototyping (<100K nodes), igraph for production (millions).

## Layer 2: Community Detection

### Algorithms (ranked by quality)

| Algorithm | Quality | Speed | Notes |
|-----------|---------|-------|-------|
| Leiden | Best | Fast | Guarantees connected communities |
| Louvain | Good | Fastest | Can produce disconnected communities |
| Infomap | Excellent | Medium | Based on information theory |
| Label Propagation | Decent | Very fast | Non-deterministic |

### The Meta-Library: CDLib
```
pip install cdlib
```
Wraps 50+ community detection algorithms in a unified API.
Works on top of networkx/igraph. Highly recommended.

```python
from cdlib import algorithms
import networkx as nx

G = nx.karate_club_graph()
communities = algorithms.leiden(G)
# Also: louvain, infomap, label_propagation, angel, demon, etc.
```

### What Communities Tell Us
Each community in a social graph typically shares:
- Ideological orientation
- Topic interests
- Information sources
- Language patterns and in-group vocabulary
- Reaction patterns to events

Knowing which community someone belongs to immediately constrains
predictions about their likely positions and reactions.

## Layer 3: Influence Networks

### Key Insight (Zhou et al., National Science Review 2024)
Network centrality alone is INSUFFICIENT for predicting influence.
Must combine structural position with behavioral features:
- Posting frequency
- Historical content virality
- Response rate / engagement ratio
- Content originality (original vs repost ratio)

### Centrality Measures
```python
import networkx as nx

G = nx.DiGraph()  # directed social graph

# Who has the most connections?
degree = nx.degree_centrality(G)

# Who bridges different communities?
betweenness = nx.betweenness_centrality(G)

# Who's connected to other well-connected people?
eigenvector = nx.eigenvector_centrality(G)

# Adapted from the web — directed influence flow
pagerank = nx.pagerank(G)
```

### Superspreader Identification (DeVerna et al., PLOS ONE 2024)
Superspreaders of content fall into three categories:
1. **Pundits**: large following, high authority, original content
2. **Media outlets**: institutional accounts, news organizations
3. **Affiliated personal accounts**: connected to pundits/outlets

For simulation: knowing who the superspreaders are in a person's
network tells you what information they're likely exposed to.

### Information Cascade Modeling
```
pip install ndlib  # Network Diffusion Library
```

NDlib models how information spreads through networks:
- Independent Cascade Model
- Linear Threshold Model
- SIR/SIS epidemiological models adapted for info spread
- Voter Model (opinion dynamics)
- Sznajd Model (social influence)

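For intuition, the Independent Cascade Model fits in a dozen lines of pure Python. A toy sketch — use NDlib for real experiments:

```python
import random

def independent_cascade(adj, seeds, p=0.1, rng=None):
    """Minimal independent cascade: each newly activated node gets one
    chance to activate each neighbor, succeeding with probability p.
    adj: {node: [neighbors]}. Returns the set of activated nodes."""
    rng = rng or random.Random(0)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt  # only freshly activated nodes spread next round
    return active
```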
## Layer 4: Stance & Sentiment Analysis

### Ready-To-Use Models (HuggingFace)

**Tweet Sentiment** (most reliable):
```
cardiffnlp/twitter-roberta-base-sentiment-latest
# Labels: positive / negative / neutral
```

**Political Stance**:
```
kornosk/bert-election2020-twitter-stance-biden-KE-MLM
kornosk/bert-election2020-twitter-stance-trump-KE-MLM
launch/POLITICS  # left / center / right
```

**All-in-One Tweet NLP**:
```
pip install tweetnlp
# Sentiment, emotion, hate speech, NER, topic classification
```

### Topic-Level Stance Tracking
Combine BERTopic (dynamic topic modeling) with stance classifiers:
1. Cluster posts into topics over time windows
2. Classify stance per topic per community
3. Track stance shifts over time
4. Detect divergence between communities on emerging topics

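Steps 3-4 can be sketched without BERTopic or a stance model, assuming the upstream components have already reduced each post to a `(window, topic, stance)` triple with stance in [-1, 1]. The topic name and quarter labels below are invented:

```python
from collections import defaultdict

def stance_by_window(posts):
    """posts: iterable of (window, topic, stance) with stance in [-1, 1].
    Returns {topic: {window: mean_stance}} so shifts are easy to read off."""
    buckets = defaultdict(lambda: defaultdict(list))
    for window, topic, stance in posts:
        buckets[topic][window].append(stance)
    return {t: {w: sum(v) / len(v) for w, v in ws.items()}
            for t, ws in buckets.items()}

def stance_shift(series):
    """Difference between the last and first window of one topic's series."""
    ordered = [series[w] for w in sorted(series)]
    return ordered[-1] - ordered[0]

posts = [("2025-Q1", "ai-regulation", -0.6),
         ("2025-Q1", "ai-regulation", -0.4),
         ("2025-Q3", "ai-regulation", 0.2)]
tracked = stance_by_window(posts)
# Q1 mean is -0.5, Q3 mean is 0.2, so the shift is +0.7
```

Computing the same series per community (not just per topic) gives the divergence signal in step 4.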
### PRISM Framework (ACL 2025)
First framework for interpretable political bias embeddings.
Two-stage: mine bias indicators → cross-encoder assigns structured scores.
```
github.com/dukesun99/ACL-PRISM
```

## Layer 5: Audience Modeling & Crowd Prediction

### The Frontier: Predicting How Groups React

Key papers and findings:

**CReAM (WWW 2024)**: Predicts which of two posts gets more engagement.
Uses LLM-generated features + FLANG-RoBERTa cross-encoder.
Demonstrates crowd reaction IS predictable from content alone.

**PopSim (Dec 2025)**: LLM multi-agent social network sandbox.
Simulates content propagation dynamics using a "Social Mean Field"
for individual-population interaction. Reduces prediction error by 8.82%.

**Conditioned Comment Prediction (EACL 2026)**:
KEY FINDING: behavioral traces (past posts) are BETTER than
descriptive personas for conditioning LLMs to predict user behavior.
This validates our OSINT approach: real data > personality labels.

**DEBATE Benchmark (Oct 2025)**:
WARNING: LLM agents converge opinions TOO QUICKLY vs real humans.
SFT + DPO help but a gap remains. Real communities maintain
disagreement longer than simulated ones.

**Distributional vs Individual Prediction (PMC 2025)**:
Group-level predictions are more reliable than individual ones.
Predicting "65% of this community will react negatively" is more
accurate than predicting "this specific person will react negatively."

### Application to Simulation

When simulating @person talking about event X, consider:
1. What community does @person belong to?
2. How is that community reacting to X? (distributional prediction)
3. Where does @person sit within that community? (conformist vs contrarian)
4. Who influences @person? What are THEY saying?
5. How does @person's audience react to their take? (engagement prediction)

This context makes individual predictions sharper.

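One minimal way to fold the distributional view (steps 2-3) into an individual call: anchor on the community's mean stance, then pull toward the person's own signal according to how conformist they are. The `conformity` weight and the example numbers are invented, not fitted:

```python
def predict_individual_stance(community_mean, conformity, personal_signal):
    """Blend the distributional prediction with person-specific evidence.

    community_mean: average stance of the person's cluster, in [-1, 1].
    conformity: 0..1, how tightly they track their cluster (contrarians ~0.2).
    personal_signal: stance implied by their own past statements, in [-1, 1].
    """
    blended = conformity * community_mean + (1 - conformity) * personal_signal
    return max(-1.0, min(1.0, blended))  # keep inside the stance range

# Conformist in a negative-leaning cluster, mildly positive on their own:
conformist = predict_individual_stance(-0.8, conformity=0.9, personal_signal=0.2)
# Contrarian with identical inputs stays close to their own signal:
contrarian = predict_individual_stance(-0.8, conformity=0.2, personal_signal=0.2)
```

The point of the sketch is the ordering: community first, individual offset second, exactly as the checklist above prescribes.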
## Echo Chamber & Filter Bubble Detection

### Technique
1. Build interaction graph
2. Run Leiden community detection
3. For each community, aggregate stance on key issues
4. Measure ideological homogeneity within communities
5. Compare cross-community vs within-community content similarity
6. High within + low cross = echo chamber

### Tools
```
github.com/mminici/Echo-Chamber-Detection  # Cascade-based, CIKM 2022
# Includes Brexit and VaxNoVax datasets
```

### What It Tells Us
Knowing someone's echo chamber tells you:
- What information they're exposed to
- What they're NOT exposed to
- How extreme their positions might be (isolation → radicalization)
- Whether they'll encounter pushback or only agreement
- How they'll react to information from outside their bubble

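The within-vs-cross comparison in steps 4-6 can be sketched with a toy homogeneity metric over stance scores. The metric and the stance values are illustrative choices, not the cascade-based CIKM 2022 method linked above:

```python
from itertools import combinations
from statistics import mean

def homogeneity(stances):
    """Mean pairwise agreement within one group; stances in [-1, 1].
    1.0 = everyone identical, lower = more internal disagreement."""
    pairs = list(combinations(stances, 2))
    if not pairs:
        return 1.0
    return mean(1 - abs(a - b) / 2 for a, b in pairs)

def echo_chamber_score(within, cross):
    """High within-community homogeneity minus cross-community agreement.
    Values near 1 suggest an echo chamber; near 0 suggest open exchange."""
    return within - cross

inside = homogeneity([0.9, 0.8, 1.0])   # tight, like-minded cluster
across = homogeneity([0.9, -0.8])       # vs a member of an opposing cluster
score = echo_chamber_score(inside, across)
```

Applied per community after Leiden detection, this gives step 6 as a single number per cluster.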
## User Embeddings: "Find People Like @person"

### Strategy
1. Embed each user's recent N posts with sentence-transformers
2. Average embeddings → user vector
3. Use FAISS for similarity search
4. Cluster users with HDBSCAN in embedding space

### Best Models for Social Media Text
```
# General purpose (good baseline)
sentence-transformers/all-mpnet-base-v2

# Tweet-specific (better domain fit)
cardiffnlp/twitter-roberta-base
vinai/bertweet-base  # pretrained on 850M tweets
```

### Graph + Text Hybrid Embeddings
```
pip install karateclub
```
KarateClub provides Node2Vec, DeepWalk, and Graph2Vec, which embed users
based on graph position. Combine with text embeddings for hybrid
vectors that capture BOTH what someone says AND where they sit
in the social network.

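Steps 2-3 of the strategy, sketched in plain Python. The toy 2-d vectors stand in for real sentence-transformer embeddings, the handles are invented, and FAISS replaces the brute-force search once you have more than a few thousand users:

```python
import math

def user_vector(post_embeddings):
    """Mean-pool per-post embeddings into one user vector (step 2)."""
    dim = len(post_embeddings[0])
    n = len(post_embeddings)
    return [sum(e[i] for e in post_embeddings) / n for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(target, users):
    """Brute-force nearest neighbour; FAISS does this at scale (step 3)."""
    return max(users, key=lambda name: cosine(target, users[name]))

# Toy 2-d "embeddings": two posts each from an AI poster and a food poster.
users = {"@ai_alice": user_vector([[1.0, 0.1], [0.9, 0.0]]),
         "@food_bob": user_vector([[0.0, 1.0], [0.1, 0.9]])}
query = user_vector([[0.95, 0.05]])
match = most_similar(query, users)  # -> "@ai_alice"
```

Hybrid vectors (previous subsection) are just these text vectors concatenated with graph-position vectors before the same cosine search.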
## Practical Application to Simulation

### For Individual Simulation (what we already do)
Add ecosystem context to each dossier:
- Which community cluster they belong to
- Who their top influencers are (whom they retweet/amplify most)
- Which echo chamber they're in (information environment)
- How their community views the simulation topic

### For Audience Simulation (new capability)
When the user asks "what would @person's audience say":
1. Identify @person's follower community
2. Sample representative voices from that community
3. Model the DISTRIBUTION of responses, not just one response
4. Include: cheerleaders, critics, joke-makers, lurkers
5. Weight by typical engagement patterns

### For Cascade Prediction (new capability)
When the user asks "how would this take spread":
1. Model the initial tweet and its immediate network
2. Predict which nodes amplify (based on stance alignment + influence)
3. Estimate reach and engagement range
4. Predict quote-tweet ratio (agreement vs dunking)

## Recommended Minimal Stack

```bash
pip install networkx python-igraph leidenalg cdlib karateclub
pip install sentence-transformers transformers tweetnlp
pip install ndlib faiss-cpu hdbscan atproto
```

This gives you: graph construction, community detection, user embeddings,
stance/sentiment analysis, diffusion simulation, similarity search,
clustering, and Bluesky data access. All open source, all pip-installable.

370
optional-skills/worldsim/references/osint-pipeline.md
Normal file
# OSINT Pipeline — Deep Intelligence Gathering

Full-spectrum open source intelligence for building personality models.
This goes beyond social media posts into visual identity, cross-platform
footprints, and behavioral analysis.

## Tool Arsenal

| Tool | Use Case | Strength |
|------|----------|----------|
| `web_search` | Find anything, initial discovery | Fast, broad, indexed content |
| `web_extract` | Pull full page content | Blogs, articles, profiles, PDFs |
| `browser_navigate` + `browser_snapshot` | View live pages | Dynamic content, login walls |
| `browser_vision` | Analyze what a page looks like | Layouts, visual identity, screenshots |
| `vision_analyze` | Analyze any image by URL/path | Profile pics, post images, aesthetics |
| `browser_get_images` | List all images on a page | Find images to feed to vision_analyze |
| Yandex reverse image search | Find where an image appears | Identity verification, alt accounts |
| `x-cli` (if available) | Direct Twitter API | Timelines, search, metadata |

## Instagram Intelligence

Instagram is CRITICAL for personality modeling — it reveals:
- Visual identity and aesthetic preferences
- Real-life social circles (tagged people, group photos)
- Lifestyle signals (travel, food, hobbies, pets)
- Caption voice (often different from Twitter voice)
- Story highlights (curated self-image)
- Bio links (cross-platform connections)

### Viewing Instagram Profiles (VERIFIED APRIL 2026)

**METHOD 1 — Instagram Private Web API (BEST, returns full JSON)**
```bash
curl -s -H 'User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)' \
  -H 'x-ig-app-id: 936619743392459' \
  'https://i.instagram.com/api/v1/users/web_profile_info/?username={handle}'
```
Returns ~500KB of JSON: full profile + last 12 posts with captions, likes,
comments, CDN image URLs, timestamps. No auth needed.

**METHOD 2 — Instagram oEmbed API (for individual posts)**
```bash
curl -s 'https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/p/{SHORTCODE}/'
```
Returns: caption text, author_name, thumbnail URL. No auth.

**METHOD 3 — Pixwox via web_extract (profile viewer)**
```python
web_extract(["https://pixwox.com/profile/{username}"])
```
Returns 12+ recent posts with captions, engagement stats. Cloudflare blocks
curl, but web_extract bypasses it.

**METHOD 4 — SocialBlade via web_extract (analytics)**
```python
web_extract(["https://socialblade.com/instagram/user/{handle}"])
```
Returns follower count, engagement rate, 14-day tracking.

**METHOD 5 — CDN direct download (images from API responses)**
Image URLs from API responses (scontent-*.cdninstagram.com) download
directly with no auth. Feed them to vision_analyze for visual profiling.

**METHOD 6 — Google indexed content**
```
web_search("site:instagram.com {username}")
```
Returns bio text, follower count, recent post captions from search snippets.

**WHAT DOESN'T WORK:** direct web_extract on instagram.com, the ?__a=1 trick,
graph.instagram.com (needs OAuth), imginn/picuki/dumpoir/gramhir (403).

### Instagram Discovery (finding someone's handle)
```
web_search("{real_name} instagram")
web_search("{twitter_handle} instagram account")
web_search("site:instagram.com {real_name}")

# Check their Twitter/X bio for IG links
# Check their personal website for social links
# Check Linktree / bio.link pages
```

### Extracting Signal from Instagram

**Profile Picture**: Reveals self-presentation style
- Professional headshot vs casual vs meme/avatar
- Analyze with vision_analyze for clothing, setting, expression

**Bio Text**: Compressed self-identity
- Role/title claims
- Emoji usage patterns
- Link destinations
- Location claims

**Post Grid**: Visual identity fingerprint
- Color palette tendencies
- Content categories (food/travel/tech/selfies/memes)
- Posting frequency
- Professional vs personal ratio

**Captions**: Voice sample different from Twitter
- Usually longer, more personal
- Hashtag usage patterns
- Emoji patterns
- Tone (inspirational vs casual vs funny)

**Tagged Photos**: Real social graph
- Who they hang out with IRL
- Events they attend
- Social circles outside tech/AI

## Visual Identity Analysis

Use vision tools to analyze HOW someone presents visually:

### Profile Pictures Across Platforms
```
# Collect profile pics from multiple platforms
# Twitter, Instagram, LinkedIn, GitHub, Discord

# Analyze each
vision_analyze(image_url="{pic_url}",
               question="Describe this profile picture in detail: person's appearance, clothing style, setting, expression, professional vs casual, any notable elements")

# Cross-reference: do they use the same pic everywhere? Different personas?
```

### Reverse Image Search (Yandex Pipeline)
From memory — Google Lens blocks Browserbase IPs, so use Yandex:

```
# For images behind auth/CDN, upload to catbox first
terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{local_path}' https://catbox.moe/user/api.php")

# Then Yandex reverse image search
browser_navigate("https://yandex.com/images/search?rpt=imageview&url={encoded_public_url}")

# Or via web_extract (slower but automatable)
web_extract(["https://yandex.com/images/search?rpt=imageview&url={encoded_url}"])
```

Yandex provides:
- Similar images (find the same person elsewhere)
- Site matches (where this image appears)
- OCR text extraction (text in images)
- Image tags (what's in the image)
- Knowledge panels (identified entities)

### Screenshot Analysis
When you can see a page but can't extract text:
```
browser_vision(question="Read all text on this page. List usernames, post content, dates, engagement numbers")
browser_vision(annotate=true, question="What interactive elements are on this page?")
```

## LinkedIn Intelligence

**STATUS: BLOCKED for automated access** (tested April 2026).
web_extract returns "Website Not Supported". Direct browsing triggers auth walls.
Use the workarounds below instead.

**METHOD 1 — Google indexed snippets (always works)**
```
web_search("site:linkedin.com/in {name} {company}")
web_search("{real_name} linkedin {company}")
web_search("{name} site:linkedin.com headline")  # Google sometimes caches profiles
```
Returns: name, headline, company, location, connection count, bio snippet.
These snippets are useful even without the full profile.

**METHOD 2 — Crunchbase (EXCELLENT for founders/execs)**
```python
web_extract(["https://www.crunchbase.com/person/{slug}"])
```
Returns: full career history, education, investments, board positions,
social links. Best source for professional identity of startup people.

**METHOD 3 — Corporate press pages**
```
web_search("{person} {company} site:{company}.com bio OR press")
```
Official bios from company newsrooms. High quality; curated but factual.

**METHOD 4 — Third-party aggregators**
- RocketReach, SignalHire — job title + company from web_search snippets
- rootdata.com — good for crypto/AI people
- Crunchbase — best all-round for tech executives

**METHOD 5 — Paid LinkedIn API wrappers** (if budget allows)
- LinkdAPI, Proxycurl: $0.07-0.15 per profile, full structured data
- No OAuth needed, just an API key

LinkedIn reveals (from combined methods):
- Career trajectory (Crunchbase full history)
- Current role and headline (search snippets)
- Education (Crunchbase or search snippets)
- Professional self-presentation (company bio pages)
- Investment/board activity (Crunchbase)

## Podcast Transcripts (HIGHEST VALUE for voice profiling)

Podcast interviews are THE gold mine for personality modeling: hours of
unscripted speech, natural conversation, real personality showing through.

**Discovery:**
```
web_search("{name} podcast transcript interview")
web_search("{name} lex fridman OR tyler cowen OR joe rogan OR dwarkesh")
```

**Extraction — verified working transcript sources:**
```python
# Lex Fridman (full verbatim transcripts)
web_extract(["https://lexfridman.com/EPISODE_URL/transcript"])

# Conversations with Tyler (Tyler Cowen — full transcripts)
web_extract(["https://conversationswithtyler.com/episodes/..."])

# TED Talks transcripts
web_extract(["https://www.ted.com/talks/.../transcript"])

# Sequoia Capital podcast
web_extract(["https://www.sequoiacap.com/podcast/..."])
```

Podcast transcripts reveal:
- Natural speech patterns (filler words, pacing, sentence structure)
- Unguarded opinions (less curated than tweets)
- How they respond to pushback (interviewer challenges)
- Humor style in conversation (different from written humor)
- Depth of knowledge on specific topics
- Personality under pressure

## YouTube / Video Intelligence

```
web_search("{name} youtube talk keynote interview")
web_search("{name} podcast appearance")
```

web_extract on YouTube pages returns rich summaries with attributed quotes.
Use the youtube-content skill for full transcripts if available.

## Personal Blogs & Substacks (HIGH VALUE)

Personal writing is curated self-expression — how someone WANTS to be
seen intellectually. Very different signal from social media.

```
web_search("{name} blog substack essay")
# Extract full posts
web_extract(["https://{blog-url}/"])
# Wayback Machine works for archived blog posts
web_extract(["https://web.archive.org/web/2024/{blog-url}"])
```

## GitHub Intelligence

For technical people:

```
web_search("site:github.com {handle}")
web_extract(["https://github.com/{handle}"])

# Issue comments reveal communication style under technical pressure
web_search("site:github.com {handle} issue comment")

# README style reveals documentation personality
# Commit messages reveal terseness vs verbosity
```

## General Web Footprint

```
# Personal website / blog
web_search("{name} personal website blog about")

# Conference talks / speaker bios
web_search("{name} speaker conference talk bio")

# News mentions
web_search("{name} {company} news interview profile")

# Academic papers (for researchers)
web_search("{name} arxiv paper author")
web_search("site:scholar.google.com {name}")

# Podcast appearances
web_search("{name} podcast guest appearance")

# Forum posts (HN, specific communities)
web_search("site:news.ycombinator.com {handle} OR {name}")
```

## Cross-Platform Identity Resolution

### Handle Mapping Strategy
1. Start from a known handle (usually Twitter)
2. Check bio links — most people link to their other platforms
3. Search "{known_handle} {platform}" for each platform
4. Check their personal website for social links
5. Reverse image search the profile pic to find matching accounts
6. Search unique phrases they use across platforms

### Identity Verification
When you find a potential match on another platform, check:
- Same profile picture? (reverse image search)
- Same bio keywords?
- Same name/handle pattern?
- Cross-references (do they mention each other?)
- Writing style match?

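One way to turn the verification checklist into a rough score. The weights are invented judgment calls (profile-pic and cross-reference matches count most), not calibrated values:

```python
def identity_match_score(signals):
    """Weighted checklist score in [0, 1] for a candidate account match.

    signals: dict of booleans keyed by the checks above.
    Weights are illustrative assumptions, not fitted parameters."""
    weights = {"same_profile_pic": 0.35,
               "same_bio_keywords": 0.15,
               "same_handle_pattern": 0.15,
               "cross_references": 0.20,
               "writing_style_match": 0.15}
    return sum(w for k, w in weights.items() if signals.get(k))

strong = identity_match_score({"same_profile_pic": True,
                               "cross_references": True,
                               "same_handle_pattern": True})
weak = identity_match_score({"same_bio_keywords": True})
```

Anything below a chosen threshold (say 0.5) should stay flagged as "possible alt, unverified" rather than being merged into the dossier.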
## Search Space Narrowing

### The Jiggle Technique
When broad searches return noise, narrow progressively:

1. **Start broad**: `"{name}" AI`
2. **Add role**: `"{name}" {company} {role}`
3. **Add context**: `"{name}" {company} {specific_project_or_topic}`
4. **Add platform**: `site:{platform} "{name}" {context}`
5. **Add time**: `"{name}" {topic} 2025 OR 2026`
6. **Quote unique phrases**: if you found a distinctive phrase they use, search for that exact phrase to find more of their content

### Disambiguation
Common names need extra signals:
- Add their company/org
- Add their specific domain (AI, crypto, etc.)
- Use their unique handle as an anchor
- Search for combinations of their known associates
- Use image search to verify you have the right person

### Signal vs Noise Heuristics
- **High signal**: direct quotes, interview transcripts, personal blog posts, long-form content
- **Medium signal**: mentions in aggregator sites, conference bios, LinkedIn summaries
- **Low signal**: generic news mentions, third-party profiles, directory listings
- **Noise**: same-name different person, outdated info (>2 years), scraped/regurgitated content

## Confidence Calibration

After a full OSINT sweep, rate data quality:

| Confidence | Data Available | Simulation Quality |
|-----------|---------------|-------------------|
| 95-100% | 50+ posts, longform, video, visual, cross-platform | Near-perfect voice replication |
| 80-94% | 20-50 posts, some longform, basic visual | Very good, occasional educated guesses |
| 60-79% | 10-20 posts, mostly short-form | Good general sense, some gaps |
| 40-59% | 5-10 posts, limited platforms | Broad strokes only, flag uncertainty |
| 20-39% | <5 posts, single platform | Sketch at best, heavy disclaimers |
| <20% | Almost nothing found | Decline to simulate, ask user for context |

## Privacy & Ethics Note

All research uses publicly available information only. We don't:
- Access private/locked accounts
- Bypass authentication
- Use leaked/hacked data
- Dox or expose private information
- Simulate in ways designed to deceive or impersonate

The goal is personality MODELING for creative simulation, grounded in
what people choose to share publicly.

334
optional-skills/worldsim/references/prediction-engine.md
Normal file
# Prediction Engine — Forecasting What Someone Would Say/Do

Techniques for predicting behavior, grounded in superforecasting methodology,
behavioral science, and SOTA LLM prediction research.

## Superforecasting Principles (Tetlock)

**Honest caveat**: Superforecasting methodology was developed for geopolitical and
world-event prediction, not personality simulation. That said, the THINKING TOOLS
are genuinely useful here — decomposition prevents lazy pattern-matching, base rates
fight overconfidence, and alternative hypotheses prevent single-track predictions.
What does NOT transfer cleanly: the calibration precision. When Tetlock says "70%
confident," that's backed by thousands of scored predictions. When we say "70%
confident" about what @someone would tweet, that's an educated estimate, not a
calibrated probability. Use the framework for its rigor, not its false precision.

Apply these thinking tools when making behavioral predictions:

### 1. Decomposition (Fermi-ize the Question)
Don't ask "What would @person say about X?"
Break it down:
- What is @person's known position on topics RELATED to X?
- What are their values/priorities that X touches on?
- What is their emotional register when discussing similar topics?
- Who are they likely responding to, and how does that change their tone?
- What platform are they on, and how does that shift their behavior?

### 2. Outside View First (Base Rates)
Before considering the specific person, ask:
- What would a TYPICAL person in their role/position say about X?
- What % of people in their ideological cluster hold position Y on X?
- What's the base rate for their type of response (agree/disagree/joke/ignore)?

### 3. Inside View Second (Case-Specific Adjustment)
Now adjust from the base rate using what you ACTUALLY KNOW about them:
- Specific past statements on this topic or related topics
- Known relationships with people/orgs involved
- Personal experiences that would shape their view
- Contrarian tendencies (do they predictably go against their cluster?)

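The outside-view-then-inside-view sequence can be sketched as a tiny function. The base rate, the evidence deltas, and the clamp bounds are illustrative assumptions, consistent with the caveat above that these are educated estimates rather than calibrated probabilities:

```python
def calibrated_estimate(base_rate, adjustments):
    """Start from the outside view, then nudge with case-specific evidence.

    base_rate: P(position) for a typical member of their cluster, 0..1.
    adjustments: list of (delta, reason) pairs from inside-view evidence;
    the deltas are judgment calls, not fitted parameters.
    """
    p = base_rate
    for delta, _reason in adjustments:
        p += delta
    return max(0.05, min(0.95, p))  # never claim certainty either way

# 70% of their cluster supports X; they've tweeted support twice, but
# recently feuded with X's main advocate.
p = calibrated_estimate(0.70, [(+0.15, "direct past statements"),
                               (-0.10, "feud with main advocate")])
```

Keeping the `reason` string attached to each delta makes the final estimate auditable: you can print the adjustment trail alongside the number.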
### 4. Confidence Calibration
Express predictions with honest uncertainty. **These are rough buckets, not
calibrated probabilities. Don't pretend they're more precise than they are.**
- **90%+ confident**: They've literally said this before, just rephrased
- **70-89%**: Strong pattern match with known positions and voice
- **50-69%**: Reasonable inference, but it could go either way
- **30-49%**: Educated guess, limited data
- **<30%**: Basically guessing, flag it clearly

When reporting confidence, prefer plain language over fake precision:
"very likely" > "87% probability". The number implies a precision we don't have.

### 5. Consider Alternative Hypotheses
For every prediction, generate at least ONE plausible alternative:
- "They'd PROBABLY say X, but they might surprise with Y because Z"
- This prevents overconfident single-track predictions

## The Prediction Pipeline

### Step 1: Classify the Prediction Type

| Type | Description | Difficulty |
|------|-------------|-----------|
| **Position prediction** | What they believe about X | Easiest if data exists |
| **Reaction prediction** | How they'd respond to event Y | Medium |
| **Voice prediction** | How they'd phrase something | Medium-hard |
| **Behavior prediction** | What they'd DO (not just say) | Hardest |
| **Interaction prediction** | How they'd respond to a specific person | Hard, depends on relationship data |

### Step 2: Evidence Gathering Protocol

For each prediction, gather evidence in this order:

1. **Direct evidence**: Have they addressed this exact topic before?
   - Search: `"{handle}" "{topic}"` or `"{handle}" "{related_keyword}"`
   - Weight: HIGHEST

2. **Analogical evidence**: Have they addressed something similar?
   - Search: find positions on adjacent topics
   - Weight: HIGH

3. **Value evidence**: What values/principles would apply?
   - Infer from their stated beliefs and consistent positions
   - Weight: MEDIUM

4. **Social evidence**: What do their peers/allies think?
   - People tend to align with their social cluster (but not always)
   - Weight: LOW-MEDIUM (higher for conformists, lower for contrarians)

5. **Demographic evidence**: What would someone in their position typically think?
   - Base rate from role/industry/ideology
   - Weight: LOWEST (only use as an anchor, not a conclusion)

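A sketch of the five-tier weighting, assuming each piece of evidence has already been reduced to a stance in [-1, 1]. The numeric tier weights are invented for illustration; the only claim carried over from the protocol is their ordering:

```python
# Illustrative weights; only the HIGHEST -> LOWEST ordering comes from the protocol.
TIER_WEIGHTS = {"direct": 1.0, "analogical": 0.7, "value": 0.4,
                "social": 0.25, "demographic": 0.1}

def weighted_stance(evidence):
    """evidence: list of (tier, stance) with stance in [-1, 1].
    Weight-averaged stance, so direct quotes dominate weaker tiers."""
    total = sum(TIER_WEIGHTS[t] for t, _ in evidence)
    return sum(TIER_WEIGHTS[t] * s for t, s in evidence) / total

stance = weighted_stance([("direct", 0.8),        # they said nearly this
                          ("social", -0.5),       # but their cluster disagrees
                          ("demographic", -0.2)]) # and so does the base rate
```

Notice the output stays well on the positive side: one piece of direct evidence outweighs two lower tiers pulling the other way, which is exactly the behavior the weight ordering is meant to encode.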
### Step 2b: Contradiction Handling Protocol
When evidence conflicts (e.g., the person said X in 2024 but Y in 2026):

1. **Check for genuine change**: Did they explicitly reverse position? Look for
   "I used to think X but now..." or a clear pivot moment. If so, use the newer
   position and note the evolution.

2. **Check for context-dependence**: Did they say X to audience A and Y to audience B?
   This isn't necessarily dishonesty — people emphasize different facets for different
   contexts. Note which context your simulation targets and use the matching register.

3. **Check for nuance collapse**: Maybe they said "X is mostly good with caveats"
   and later "X has real problems" — these might not actually contradict. Look for
   the synthesis position.

4. **When genuinely unresolvable**: Flag it explicitly. "Evidence conflicts on this
   point — they've argued both sides at different times. Simulating {chosen position}
   based on {reasoning}, but the alternative is plausible." Don't paper over the
   contradiction with false confidence.

5. **Recency default**: When all else fails, weight more recent statements higher.
   People change, and the most recent position is the best predictor of the next one.

### Step 3: Generate Prediction

Using the HumanLLM B = f(P, E) framework:
- **P (Person)**: Everything from the dossier — personality, values, voice
- **E (Environment)**: The specific context — platform, topic, who's asking,
  what just happened, social dynamics in play

Generate the prediction by:
1. Setting the base rate (outside view)
2. Adjusting for personal specifics (inside view)
3. Filtering through their voice profile (how they'd phrase it)
4. Applying platform-specific behavior patterns
5. Calibrating confidence

## Memory Curation (The 30-50 Rule)

Research shows performance PEAKS at 30-50 memory entries, then DECLINES.
For each person in a simulation, curate memories:

### What to Include (high signal)
- **Signature takes**: Their most characteristic/famous positions (5-10)
- **Voice samples**: Real quotes that capture their linguistic style (5-10)
- **Relationship data**: Known dynamics with other sim targets (3-5)
- **Recent context**: What they've been talking about lately (3-5)
- **Formative moments**: Career milestones, public pivots, viral moments (3-5)
- **Quirks & tells**: Catchphrases, humor style, pet peeves (3-5)

### What to Exclude (noise)
- Generic biographical facts that don't predict behavior
- Old positions they've clearly evolved past
- Trivial interactions that don't reveal personality
- Secondhand characterizations (what others say about them)
- Platform metadata (follower counts, join dates) unless directly relevant

### Memory Selection Heuristic
For each candidate memory entry, ask:
**"If I removed this, would the simulation noticeably degrade?"**
If no, cut it.

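The include-list budgets can be enforced mechanically. The category caps mirror the upper ends of the parenthesized ranges above (summing to 40, inside the 30-50 zone); the `salience` score is assumed to come from applying the selection heuristic, and the sample entries are invented:

```python
# Caps taken from the parenthesized ranges in the include list (upper ends).
CATEGORY_BUDGETS = {"signature_takes": 10, "voice_samples": 10,
                    "relationships": 5, "recent_context": 5,
                    "formative_moments": 5, "quirks": 5}

def curate(memories):
    """memories: list of (category, salience, text) with salience in 0..1.
    Keeps the most salient entries per category, capped by budget."""
    kept = []
    for cat, budget in CATEGORY_BUDGETS.items():
        entries = sorted((m for m in memories if m[0] == cat),
                         key=lambda m: m[1], reverse=True)
        kept.extend(entries[:budget])
    return kept

# 16 candidate voice samples of mixed quality -> capped at the 10 best.
memories = [("voice_samples", 0.9, "gm gm"),
            ("voice_samples", 0.2, "hello")] * 8
curated = curate(memories)
```

Entries excluded by the budget are exactly the ones that fail the "would the simulation noticeably degrade?" test, since they carry the lowest salience in their category.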
## Fighting LLM Defaults
|
||||
|
||||
Research shows LLMs have systematic biases in simulation. The fixes below need to be
|
||||
CONCRETE — vague instructions like "be more like them" don't work. You need specific
|
||||
prompting patterns that actually shift the output.
|
||||
|
||||
### Problem: Sycophancy & Over-Agreement
|
||||
LLMs default to agreement and positivity.
|
||||
**Fix**: Don't just note they're contrarian — structure it as a behavioral instruction
|
||||
with evidence:
|
||||
```
|
||||
"In this conversation, {person} disagrees with {other_person} on {topic}. They are
|
||||
noticeably more confrontational than the other speakers. They tend to respond to
|
||||
consensus with skepticism and reframe debates on their own terms. Example from their
|
||||
real posts: '{actual quote where they disagreed with something popular}'"
|
||||
```
|
||||
|
||||
### Problem: Rigid/Polarized Strategies
|
||||
LLMs tend to take extreme positions and hold them rigidly.
|
||||
**Fix**: Provide specific nuance instructions:
|
||||
```
|
||||
"In this conversation, {person} holds a complex position on {topic}: they agree with
|
||||
{point A} but push back on {point B}. They're the type to say 'yes, but...' rather
|
||||
than 'no.' Real example of their nuance: '{quote showing them holding a both-and
|
||||
position}'"
|
||||
```
|
||||
|
||||
### Problem: Uniform Register

LLMs default to a similar educated-casual tone for everyone.

**Fix**: Anchor voice with REAL QUOTES and explicit comparative instructions:

```
"In this conversation, {person} is noticeably more {trait} than the other speakers.
They tend to {specific behavior pattern}. Their sentences are typically {length/style}.
They {do/don't} use emoji. Their humor style is {type}. Example from their real posts:
'{actual quote that captures their voice}'"
```

The more you can say "{person} does THIS while {other_person} does THAT," the better
the differentiation. Comparative framing outperforms absolute descriptions.

### Problem: Overly Structured Responses

LLMs love neat arguments with clear structure.

**Fix**: Provide explicit structural anti-patterns:

```
"When generating {person}'s messages, break conventional structure. They start one
thought and jump to another mid-sentence. They use '...' and '—' instead of periods.
They repeat words for emphasis. They don't conclude neatly. Example: '{real quote
showing their chaotic structure}'"
```

### Problem: Missing Mundane Behavior

LLMs focus on "interesting" responses and skip boring/mundane ones.

**Fix**: Explicitly instruct for mundane moments:

```
"Not every message from {person} needs to be insightful. Include at least 1-2 messages
that are just reactions ('lmao', 'this', 'wait what'), link shares without commentary,
or brief agreements. Real people don't craft every message. {person} specifically tends
to {their specific mundane behavior pattern, e.g., 'drop a single emoji reaction'
or 'just retweet without comment'}."
```

### General Principle for All Fixes

The pattern is always: **behavioral instruction + comparative framing + real evidence**.
- "Do X" alone doesn't work well
- "Do X, unlike the default of Y" works better
- "Do X, unlike the default of Y, as evidenced by this real quote: Z" works best
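
The three-part pattern above can be sketched as a tiny prompt builder. This is a minimal illustration under assumptions, not part of the skill's actual tooling; the function name and argument names are hypothetical.

```python
def behavioral_instruction(person, default, behavior, evidence_quote):
    """Compose: behavioral instruction + comparative framing + real evidence."""
    return (
        f"In this conversation, {person} {behavior}, "
        f"unlike the default tendency to {default}. "
        f"Example from their real posts: '{evidence_quote}'"
    )

# Usage: all three parts land in one instruction string.
line = behavioral_instruction(
    "@handle", "agree politely", "pushes back on consensus takes", "nah this is wrong"
)
```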
## The Adjective-Based Personality Method

70 bipolar adjective pairs for Big Five traits. Select 3 per trait
with intensity modifiers.

### Openness
High: creative, curious, imaginative, artistic, adventurous, intellectual,
unconventional, perceptive
Low: conventional, practical, traditional, routine-oriented, narrow

### Conscientiousness
High: organized, disciplined, reliable, meticulous, systematic, thorough,
goal-oriented, persistent
Low: careless, impulsive, disorganized, spontaneous, flexible, relaxed

### Extraversion
High: outgoing, talkative, energetic, assertive, enthusiastic, bold,
gregarious, dominant
Low: reserved, quiet, introverted, solitary, withdrawn, reflective

### Agreeableness
High: cooperative, trusting, empathetic, generous, accommodating, kind,
diplomatic, forgiving
Low: competitive, skeptical, blunt, confrontational, critical, stubborn,
independent-minded

### Neuroticism
High: anxious, moody, sensitive, reactive, volatile, self-conscious,
insecure, emotional
Low: calm, stable, resilient, confident, even-tempered, composed,
thick-skinned

### Usage
For each simulated person, after OSINT research, estimate their Big Five
profile and select appropriate adjectives:

Example: "@basedjensen: very creative, somewhat impulsive, very outgoing,
a bit competitive, calm" → this shapes the generation toward the right
behavioral profile.
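
A simplified sketch of the selection step, assuming a trait bank keyed by pole (abbreviated here to three adjectives per pole) and picking one representative adjective per trait. Names and the intensity scale are illustrative assumptions.

```python
# Hypothetical, abbreviated trait bank — the full 70-pair list lives above.
ADJECTIVES = {
    "openness":          {"high": ["creative", "curious", "imaginative"],
                          "low":  ["conventional", "practical", "traditional"]},
    "conscientiousness": {"high": ["organized", "disciplined", "reliable"],
                          "low":  ["careless", "impulsive", "disorganized"]},
    "extraversion":      {"high": ["outgoing", "talkative", "energetic"],
                          "low":  ["reserved", "quiet", "introverted"]},
    "agreeableness":     {"high": ["cooperative", "trusting", "empathetic"],
                          "low":  ["competitive", "skeptical", "blunt"]},
    "neuroticism":       {"high": ["anxious", "moody", "sensitive"],
                          "low":  ["calm", "stable", "resilient"]},
}

MODIFIERS = {1: "a bit", 2: "somewhat", 3: "very"}

def personality_line(scores):
    """scores: trait -> (pole, intensity 1-3). Returns the adjective string
    that gets embedded in the generation prompt."""
    parts = []
    for trait, (pole, intensity) in scores.items():
        adjective = ADJECTIVES[trait][pole][0]  # representative adjective
        parts.append(f"{MODIFIERS[intensity]} {adjective}")
    return ", ".join(parts)
```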
## Interaction Dynamics Prediction

When simulating conversations between multiple people, remember that predictions
apply to a SPECIFIC REGISTER. See the next section on performative vs. authentic
behavior.

## Performative vs. Authentic Behavior

**Critical concept**: People act differently for different audiences. A simulation
must be explicit about which register it's targeting.

### The Register Spectrum
- **Public broadcast** (tweets, Reddit posts): Most performative. People are
  playing to their audience, building their brand, signaling to their tribe.
- **Semi-public** (Discord channels, group chats, comment threads): Less
  performative but still audience-aware. People are more casual but know
  others are watching.
- **Private 1-on-1** (DMs): Much less performative. More honest, more
  vulnerable, more willing to express doubt or uncertainty.
- **True private** (inner monologue, close friends): We have almost no data
  on this. Don't pretend to simulate it.

### Practical implications
- When simulating a PUBLIC thread, lean into the person's public persona —
  their brand, their usual takes, their audience-aware voice.
- When simulating DMs, dial down the performance. More hedging, more honesty,
  more "I actually think..." vs. the public "Here's my take:".
- When evidence comes from one register but the simulation targets another,
  FLAG IT: "Evidence is from public tweets but simulating DM behavior —
  expect the real person to be less {polished/aggressive/confident} in private."
- Someone's Twitter persona may be genuinely different from their Reddit persona.
  These are not interchangeable data sources. Weight evidence from the matching
  platform higher.

### What we can't know
Be honest: we're simulating public figures based on their public output. The
private person may be substantially different. DM simulations are inherently
lower-confidence than public thread simulations because we have less data on
how people behave privately.

### Dominance Hierarchy
- Who talks first? (most confident/highest-status usually)
- Who responds to whom? (not everyone talks to everyone)
- Who gets ratio'd? (lowest-status takes get challenged)
- Who lurks? (some people watch before engaging)

### Agreement/Disagreement Prediction
Based on known positions + social dynamics:
- **Strong agree**: Both have stated similar positions + friendly relationship
- **Agree with nuance**: Similar positions but one adds a caveat
- **Productive disagreement**: Different positions + mutual respect
- **Hostile disagreement**: Different positions + existing tension/rivalry
- **Surprising agreement**: Expected to disagree but find common ground
- **Ignore**: Some people just don't engage with certain others
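
The base cases of the mapping above can be written as a small lookup. This is a sketch under assumptions: the input categories are simplifications, and "surprising agreement" deliberately isn't modeled because it's a judgment call, not a table entry.

```python
def predict_stance(position_overlap, relationship):
    """position_overlap: 'similar' | 'different'
    relationship: 'friendly' | 'respectful' | 'hostile' | 'none'
    Returns the predicted interaction category; unknown combos fall back to 'ignore'."""
    table = {
        ("similar", "friendly"):     "strong agree",
        ("similar", "respectful"):   "agree with nuance",
        ("different", "respectful"): "productive disagreement",
        ("different", "hostile"):    "hostile disagreement",
    }
    return table.get((position_overlap, relationship), "ignore")
```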
### Conversation Flow Prediction
Real conversations follow patterns:
1. **Opener** → most active/impulsive person posts first
2. **First response** → most engaged/relevant person responds
3. **Pile-on or pushback** → depends on agreement/disagreement dynamics
4. **Tangent** → someone takes a side thread
5. **Peak moment** → the best/most viral exchange
6. **Trail off** → energy dissipates, last person makes a joke or short comment
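
As a rough scaffold, the six phases can be pre-assigned to speakers by activity level before generation. This is a hypothetical sketch: a round-robin over an activity ranking, which only approximates the "most active opens, most engaged responds" heuristics above.

```python
FLOW_PHASES = [
    "opener", "first response", "pile-on or pushback",
    "tangent", "peak moment", "trail off",
]

def assign_phases(activity):
    """activity: handle -> activity score. Most active opens; the rest
    rotate through later phases in descending-activity order."""
    speakers = sorted(activity, key=activity.get, reverse=True)
    return [(phase, speakers[i % len(speakers)])
            for i, phase in enumerate(FLOW_PHASES)]
```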
## Scenario Injection Prediction

When "inject: {event}" is used, predict reactions:

1. **Who would see this first?** (most online / most relevant to their work)
2. **Who would care most?** (most affected / strongest opinion)
3. **What's the emotional valence?** (good news for some, bad for others)
4. **What's the expected take?** (apply position prediction pipeline)
5. **How does this change the existing conversation?** (derail, amplify, redirect)

# Recursive Self-Improvement Pipeline

The simulator should get better every time it runs. Not through training —
through accumulating failure patterns, calibration data, and learned rules
that feed back into future simulations.

## The Loop

```
SIMULATE → VERIFY (mechanical) → SCORE → LOG FAILURES → UPDATE RULES → SIMULATE BETTER
```

Each run produces two outputs:
1. The simulation (for the user)
2. A failure log (for the system)

The failure log feeds back into the next run's verification step,
making the checklist grow and the blind spots shrink.
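
A minimal sketch of one pass of the loop. The `simulate` and `verify` callables are injected stand-ins for the real generation and mechanical-check steps (both hypothetical here); the point is the shape: one pass yields the simulation, the failure log, and an updated rule set.

```python
def run_once(scenario, rules, simulate, verify):
    """One pass: SIMULATE -> VERIFY -> LOG FAILURES -> UPDATE RULES.
    Returns (simulation, failures, updated_rules)."""
    simulation = simulate(scenario, rules)
    failures = verify(simulation, rules)
    # Failures feed forward: any new failure pattern joins the rule set.
    updated = rules + [f for f in failures if f not in rules]
    return simulation, failures, updated
```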
## What Gets Logged After Every Simulation

### 1. Mechanical Check Failures
```
FAILURE LOG: simulation_{timestamp}
EMOJI: @visakanv had 6 fabricated emoji, real rate was 10%. Stripped all.
SLOP: @eigenrobot utterance contained "multifaceted" — rewritten.
LENGTH: @QiaochuYuan avg 42 words/utterance, real avg was 18. Compressed.
CAPS: 4/12 utterances started uppercase, targets are 90% lowercase. Fixed.
PUNCTUATION: Added periods to @tszzl who never uses terminal punctuation.
STRUCTURE: Sycophantic flow detected — B agreed with A then C agreed with B.
           Injected disagreement.
```

### 2. Discriminator Critique Patterns
```
CRITIQUE LOG:
Round 1: @tszzl too verbose (flagged 2x in last 3 simulations)
Round 1: @repligate too academic (flagged 3x — this is a persistent pattern)
Round 2: Conversation too neat — real conversations are messier (flagged 5x)
```

### 3. Held-Out Test Results
```
CALIBRATION LOG:
Voice fidelity: 8.4/10 (up from 7.5 last run)
Topic prediction: 2/5 topics matched (typical — content is unpredictable)
Register match: 9/10 (improved after emoji fix)
```

## How Failures Feed Forward

### Pattern Accumulation
After N runs, persistent failure patterns become AUTOMATIC rules:

```
IF a pattern is flagged in 3+ consecutive simulations:
    PROMOTE it from "check" to "pre-generation rule"

Example progression:
Run 1: "Too verbose for @tszzl" → flagged in Round 1, fixed
Run 2: "Too verbose for @tszzl" → flagged again, fixed again
Run 3: "Too verbose for @tszzl" → PROMOTED to pre-gen rule:
       "When simulating roon-type voices: max 20 words per tweet.
        Fragment > sentence. Compress ruthlessly."
```
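
The promotion rule above is mechanical enough to express directly. A minimal sketch, assuming the flag history is a flat list of failure-pattern strings collected across recent runs:

```python
from collections import Counter

def promote_rules(flag_history, threshold=3):
    """flag_history: failure-pattern strings from recent simulations.
    Patterns flagged `threshold`+ times graduate to pre-generation rules."""
    counts = Counter(flag_history)
    return [pattern for pattern, n in counts.items() if n >= threshold]
```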
### The Growing Checklist
The mechanical verification checklist starts with the baseline checks
(emoji, slop, length, caps, punctuation) and GROWS with each failure:

```
BASELINE CHECKS (permanent):
□ Emoji frequency match
□ Slop word scan (Tier 1/2/3)
□ Sentence length match
□ Capitalization match
□ Punctuation pattern match
□ Reply/original ratio
□ Structural slop patterns

LEARNED CHECKS (accumulated from past failures):
□ Roon-type voices: max 20 words (from: verbose failure x3)
□ Warm personalities: do NOT add emoji (from: emoji inflation x5)
□ Academic voices: ground in specific examples (from: too abstract x3)
□ Conversations: inject at least one disagreement (from: sycophantic flow x4)
□ Self-deprecating voices: add hedging (from: too assertive x2)
□ Shitposters: include at least one non-sequitur (from: too on-topic x2)
```
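
A sketch of the numeric baseline checks, under assumptions: the profile keys (`max_words`, `lowercase_rate`, `emoji_rate`) are hypothetical names, and the emoji detector is a crude codepoint-range heuristic, not a full Unicode scan.

```python
def mechanical_checks(utterances, profile):
    """Run the numeric baseline checks against a voice profile.
    Returns failure-log lines in the FAILURE LOG format."""
    failures = []
    avg_words = sum(len(u.split()) for u in utterances) / len(utterances)
    if avg_words > profile["max_words"]:
        failures.append(f"LENGTH: avg {avg_words:.0f} words exceeds {profile['max_words']}")
    lower = sum(1 for u in utterances if u[:1].islower()) / len(utterances)
    if lower < profile["lowercase_rate"]:
        failures.append("CAPS: too many uppercase openings")
    # Crude heuristic: treat high-plane codepoints as emoji.
    emoji = sum(1 for u in utterances if any(ord(c) > 0x1F000 for c in u)) / len(utterances)
    if emoji > profile["emoji_rate"]:
        failures.append("EMOJI: rate above real baseline")
    return failures
```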
### Where To Store Learned Rules
Append to the skill itself. After each simulation run where the mechanical
checks catch something, the agent should ask:

"The mechanical verification caught {failures}. Should I add these as
permanent learned rules for future simulations?"

If the same failure appears 3+ times, add it automatically without asking.

Use skill_manage(action='patch') to append to this file's "Learned Checks"
section below.

## Calibration Tracking

### Per-Person Calibration Memory
After simulating someone, store the calibration data:

```
@tszzl: voice=8.5, emoji_rate=0%, avg_words=14, lowercase=95%,
        signature_move="aphoristic fragments", danger="goes verbose"
@nickcammarata: voice=8.8, emoji_rate=0%, avg_words=19, lowercase=90%,
        signature_move="meditation-ML connection", danger="too structured"
```

If the same person is simulated again, LOAD this calibration to skip
the cold-start problems. The second simulation of someone should be
better than the first because you already know their failure modes.
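
A minimal store/load sketch for this memory. The file location and record shape are assumptions; any JSON-per-handle store would do.

```python
import json
import pathlib

CAL_PATH = pathlib.Path("calibration.json")  # hypothetical location

def save_calibration(handle, data):
    """Merge one person's calibration record into the JSON store."""
    store = json.loads(CAL_PATH.read_text()) if CAL_PATH.exists() else {}
    store[handle] = data
    CAL_PATH.write_text(json.dumps(store, indent=2))

def load_calibration(handle):
    """Return the stored record, or None on a cold start."""
    if not CAL_PATH.exists():
        return None
    return json.loads(CAL_PATH.read_text()).get(handle)
```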
### Aggregate Calibration
Track overall simulation quality across runs:

```
Run 1: pre-refine 7.5, post-refine 8.4 (delta +0.9)
Run 2: pre-refine 8.37, post-refine 8.53 (delta +0.16)
Run 3: pre-refine 8.53, post-refine 8.83 (delta +0.30, emoji fix)
```

The pre-refine score should INCREASE over time as learned rules prevent
repeat failures. If it's not increasing, the learning loop is broken.
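
The health check this implies is one comparison. A sketch, using the loose criterion that the latest pre-refine score should beat the first:

```python
def learning_loop_healthy(pre_refine_scores):
    """True if pre-refine quality is trending up across runs.
    Fewer than two runs is vacuously healthy."""
    if len(pre_refine_scores) < 2:
        return True
    return pre_refine_scores[-1] > pre_refine_scores[0]
```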
## The Standard: Indistinguishable From Real

The target is not "good enough." The target is: mix simulated posts with
real posts and a human familiar with the person cannot reliably tell which
is which. That's 50% accuracy on a blind comparison — random chance.

Every mechanical check, every discriminator round, every learned rule
exists to push toward that standard. If something doesn't serve that
goal, it's wasted effort.

## Current Learned Checks (append here after each run)

### From TPOT Simulation Run 1 (April 2026)
- Warm/enthusiastic personalities (visakanv-type): do NOT add decorative emoji.
  Bio emoji ≠ tweet emoji. Actual emoji rate for "warm" TPOT posters: <15%.
  PROMOTED after being caught by user, not by discriminator (discriminator failure).
- Conversation flow: pure agreement chains are instruct-model slop.
  Real threads have at least one moment of friction, misunderstanding, or deflection.
- Academic-leaning voices (repligate-type): ground claims in specific experiments,
  transcripts, or model behaviors they've personally observed. Generic philosophical
  language without specifics = slop, even if it sounds smart.
- Self-deprecating voices (QC-type): hedge more. "i think" "i'm not sure" "it feels like."
  Instruct models are too assertive even when simulating tentative people.
- Fragment voices (roon-type): max 15-20 words. No conjunctions. No paragraphs.
  If it reads like a complete thought, it's too complete for a fragment-poster.
### From TPOT Simulation Run 2 (April 2026)
- Reframer voices (nosilverv-type): avg ~16 words. Split multi-sentence takes
  into separate tweets. The compression IS the voice. 113% over-length caught
  by mechanical check that subjective scoring rated 8/10. Trust the numbers.
- Rare-poster voices (selentelechia-type): in a 12-post sim, give them 2-3 turns
  max. When they speak it must LAND. Short crystallizations > long analysis.
  "or a shared meal" was the highest-rated line at 3 words.
- Turn symmetry: ALWAYS check. 4/4/4 is instruct-model default. Real conversations
  have one person dominating (5), one lurking (3), others in between.
- Verbose bias is the #1 mechanical failure. ALWAYS check avg word count against
  real baseline BEFORE subjective scoring. Every run so far has caught over-length
  that subjective scoring missed.
- RHETORICAL POLISH IS SLOP. Caught post-mechanical-pass in Run 2 review.
  Parallel antithesis ("The most X... The most Y..."), "Not X, not Y, but Z",
  "Show me X and I'll show you Y", clean 4-step escalations, academic vocabulary
  in casual voice — ALL passed mechanical checks but are still obviously LLM.
  PROMOTED TO MECHANICAL SCAN: now regex-scannable alongside slop words.
- THE BANGER PROBLEM: every simulated tweet was screenshot-worthy. Real feeds
  are 70% mid. Must include throwaway responses ("lol" "hmm" "fair" "wait actually").
  PROMOTED: banger check is now mandatory in mechanical verification.
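
The "regex-scannable" rhetorical-polish scan can be sketched as below. The patterns are illustrative approximations of the tells named in the Run 2 findings, not a complete or tuned list:

```python
import re

# Approximations of the polish tells above — extend as new ones get flagged.
POLISH_PATTERNS = [
    r"[Nn]ot \w+, not \w+, but \w+",        # "Not X, not Y, but Z"
    r"[Ss]how me .+ and [Ii]'ll show you",  # "Show me X and I'll show you Y"
    r"The most \w+.*\. The most \w+",       # parallel antithesis
]

def polish_scan(text):
    """Return the polish patterns that match — nonempty result means rewrite."""
    return [p for p in POLISH_PATTERNS if re.search(p, text)]
```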
### From TPOT Simulation Run 3 — Star Thread Discovery (April 2026)
- STAR THREAD IS THE KEY. Dossier-first generation produces surface-accurate
  but dead output. Star-thread-first generation produces messy, alive output
  that actually sounds like the person. Generate from the thread. Verify with data.
- Rhetorical polish vanished once generation came from "what is this person DOING"
  rather than "what would this person SAY." Reframers reframe. Conveners convene.
  Distillers distill. The VERB drives the voice, not the adjectives.
- People in conversation REFERENCE EACH OTHER BY NAME. Tyler says "Bosco always
  comes in with the three word version." This is obvious but the dossier approach
  never produced it because it models each person in isolation.
- PROMOTED: star thread is now the FIRST entry in every dossier. Before voice
  profile, before psychometrics, before everything else. It's the generation seed.
  Everything else is verification.

### Operational Findings (verified April 2026)
- X API bearer token: 10K tweets/15min, 300 profiles/15min, 450 searches/15min.
  Most generous rate limits. Always use as primary source.
- Threads.NET → Threads.COM redirect. Always use -L flag or .com directly.
  Previous test saying "no OG tags" was WRONG — tags exist, domain was wrong.
- Instagram private API: i.instagram.com + mobile UA + x-ig-app-id: 936619743392459.
  Returns full JSON with 12 posts. No auth needed. CDN image URLs work for vision_analyze.
- Facebook: Googlebot UA trick works for public pages. Returns name, bio, likes (121M for zuck).
  Normal UA and mobile variants all redirect to login wall.
- TikTok: stats are in __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path
  __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 (use statsV2 not stats).
- Bluesky searchPosts returns 403 from datacenter IPs. Workaround: searchActors + getAuthorFeed.
- nitter.cz is the ONLY working nitter instance (via web_extract, not curl).
- Reddit JSON API requires User-Agent header or returns 429.
- GEPA native had `max_steps` API mismatch with DSPy 3.1.3. MIPROv2 fallback works.
  hermes-agent-self-evolution config: max_skill_size bumped to 20_000 for worldsim-class skills.
- hermes-agent-self-evolution is at ~/.hermes/hermes-agent-self-evolution/ with .venv.
  Must export API keys from ~/.hermes/.env before running.
- Podcast transcripts (Lex Fridman, Tyler Cowen, TED) are the HIGHEST VALUE source
  for voice profiling. Hours of unscripted speech > thousands of tweets.
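
For the Instagram finding above, the request shape can be sketched as follows. The app-id value is the one recorded in the findings and should be treated as a point-in-time observation, not a stable contract; the mobile User-Agent string and the `web_profile_info` endpoint path are assumptions based on that finding.

```python
# Headers per the Operational Findings entry: mobile UA + x-ig-app-id.
IG_HEADERS = {
    "User-Agent": "Instagram 275.0.0.27.98 (iPhone13,2; iOS 16_3)",  # hypothetical UA
    "x-ig-app-id": "936619743392459",  # value recorded in findings, may rotate
}

def ig_profile_url(username):
    """Build the i.instagram.com profile-info URL (assumed endpoint path)."""
    return f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}"
```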
### From Simulation Run 4 — Engine Mode + Profile Command (April 2026)
- ENGINE MODE: When worldsim is active, ZERO assistant personality leaks.
  No kawaii, no markdown, no chatty commentary between phases. Every token
  is simulation fidelity. First attempt leaked personality; user corrected.
  PROMOTED TO PERMANENT RULE in SKILL.md.
- X API CURL > NITTER for voice calibration. nitter.cz returns 502 or "user
  not found" unpredictably. Direct curl to X API v2 with bearer token returns
  full text + metrics. 3 pages (90 tweets) is enough for fidelity 100. Always
  use this as PRIMARY voice source, nitter as supplement only.
- CAPS BURST PATTERN: some voices (karan4d-type) use lowercase default with
  sporadic ALL CAPS for excitement ("WAZZAAAAAAPPPP", "LAWDAMERCYYYYY",
  "AWOOGA"). This is distinct from consistent-lowercase (tenobrus-type) and
  sentence-case (somewheresy-type). Capture this in voice profile as a
  three-way distinction: lowercase-default, caps-burst, sentence-case.
- TEXT EMOTICONS vs EMOJI: karan4d uses :) >.< ~ but almost zero standard
  emoji. This is a distinct expressiveness mode from zero-emoji (tenobrus)
  and sparse-emoji. Include text emoticon inventory in voice profile.
- STAR THREAD 5/5 TEST is mandatory for profile command. Write the thread,
  then test it against 5 real posts with explicit reasoning per post. If
  fewer than 4/5 fit, the thread is wrong — keep looking. Show the work.
- PROFILE OUTPUT: star thread → voice profile (caps, punctuation, word count,
  emoji/emoticon inventory, vocabulary, register, threading behavior) →
  psychometrics (Big Five, Moral Foundations, cognitive style) → key positions
  (with dates and real tweet quotes) → ecosystem (inner circle, professional,
  cultural) → intelligence tradecraft (key assumptions, red hat, deception
  detection, competing hypotheses) → invalidation indicators → source reliability.

optional-skills/worldsim/references/search-strategies.md (new file, 278 lines)

# Search Strategies — Finding Anyone Across Platforms

The hardest part of simulation is building an accurate model of a real person. This doc
covers how to systematically discover and profile someone across every platform we care about.

## General Principles

1. **Start broad, go narrow.** First establish WHO they are, then drill into HOW they talk.
2. **Cross-reference.** Someone's Reddit persona may differ wildly from their Twitter persona. That's signal, not noise.
3. **Recency matters.** People's views evolve. Weight recent posts (last 6 months) over older ones.
4. **Interactions > monologues.** How someone replies reveals more about their voice than their prepared posts.
5. **Controversy is gold.** People are most themselves when arguing. Search for debates and disagreements.

## Platform-Specific Discovery

### X / Twitter

Twitter is the richest source for most public figures in tech/AI. Multiple approaches:

#### With x-cli (if API keys available)
```bash
# Recent timeline — best single source of voice data
x-cli user timeline {handle} --max 30 -j

# Their replies — how they interact, argue, joke
x-cli tweet search "from:{handle}" --max 30 -j

# What others say about/to them
x-cli tweet search "to:{handle}" --max 20 -j

# On specific topics
x-cli tweet search "from:{handle} open source" --max 10 -j
```

#### Without API (web_search + web_extract)
```
# Identity + role
web_search("{handle} twitter bio role company")

# Voice + opinions
web_search("{handle} twitter hot takes opinions")
web_search("site:x.com {handle}")

# Topic-specific positions
web_search("{handle} twitter {topic}")
web_search("{handle} {topic} opinion take")

# Interviews / longform (reveals deeper thinking)
web_search("{handle} interview podcast AI")
web_search("{handle} blog post essay")

# Beefs and debates (reveals personality under pressure)
web_search("{handle} twitter debate disagree controversial")
web_search("{handle} vs {other_person}")

# Newsletter aggregators that index tweets
web_search("site:buttondown.com/ainews {handle}")
web_search("site:news.smol.ai {handle}")
web_search("site:techmeme.com {handle}")
web_search("site:latent.space {handle}")
```

#### AI Twitter Aggregator Sites (high value)
These sites index AI Twitter conversations daily:
- `buttondown.com/ainews` — swyx's AI News, indexes hundreds of AI Twitter accounts
- `news.smol.ai` — smol AI news aggregator
- `techmeme.com` — tech news, includes tweet citations
- `latent.space` — AI podcast/newsletter with Twitter references

Search pattern: `site:{aggregator} "{handle}"` to find indexed tweets and discussions.

#### IMPORTANT: web_extract does NOT work on x.com
web_extract returns "Website Not Supported" for all x.com/twitter.com URLs.
Do NOT attempt it — it wastes a tool call every time.

#### Verified Fallback Access Methods (tested April 2026)

**PRIMARY: X API v2 Bearer Token** (confirmed working)
- Profiles, timelines, search — 300-10K requests/15min
- See scripts/x_api.py

**FALLBACK 1: nitter.cz via web_extract** (WORKS)
```
web_extract(["https://nitter.cz/{handle}"])
```
Returns full profile + recent timeline. Direct curl gets Cloudflare-blocked
but web_extract bypasses it. Rich data: bio, stats, pinned tweets, full text.
NOTE: Most other nitter instances are DEAD (nitter.net, xcancel.com, etc.)

**FALLBACK 2: ThreadReaderApp** (WORKS — excellent for historical threads)
```
web_extract(["https://threadreaderapp.com/user/{handle}"])
```
Returns unrolled historical threads with full text. Found threads back to 2023.
Gold for longform voice samples.

**FALLBACK 3: GitHub API** (WORKS — excellent for tech people)
```
curl -s https://api.github.com/users/{handle}
curl -s https://api.github.com/users/{handle}/repos?sort=updated
curl -s https://api.github.com/users/{handle}/events
curl -s https://api.github.com/users/{handle}/gists
```
No auth needed (60 req/hr). Profile READMEs are voice profiling gold.
Events API shows recent activity with comment text.

**FALLBACK 4: Reddit JSON API** (WORKS)
```
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}/comments.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/r/{sub}/search.json?q={query}&restrict_sr=on'
```
MUST include User-Agent header or get 429. Reddit voice is often more
candid/detailed than Twitter voice — high value for personality profiling.

**FALLBACK 5: HackerNews Algolia API** (WORKS — fully open)
```
curl -s 'https://hn.algolia.com/api/v1/search?query={name}&tags=comment'
```
No auth, no visible rate limits. Great for finding what others say about
someone + their own HN comments if they have an account.

**FALLBACK 6: YouTube via web_extract** (WORKS)
Search for interviews/talks, then web_extract the video pages.
Returns rich summaries with attributed quotes from specific speakers.

**NOT VIABLE** (tested, confirmed blocked):
- Google Cache of Twitter → empty results
- Wayback Machine for tweets → sparse captures, no JS content
- Twitter Syndication API → rate limited / broken
- All Instagram viewers (imginn, picuki, dumpoir, gramhir) → 403
- LinkedIn → fully blocked for scraping
- Archive.today → rate limited + CAPTCHA
- Most nitter instances → dead or 403

#### Best approach without x-cli
The most reliable path is: web_search with aggregator sites (ainews, smol.ai,
techmeme, latent.space). These index AI Twitter daily and return actual tweet
text in search descriptions. Stack multiple aggregator searches to build a
composite picture. This was validated in practice — it returns enough signal
to build solid dossiers for anyone active in AI Twitter.

### Reddit

Reddit profiles are public and indexable. Reddit users often have very different
personas from their Twitter selves — more detailed, more argumentative, more honest.

```
# Find their Reddit username (often different from Twitter)
web_search("{real_name} reddit account")
web_search("{twitter_handle} reddit username")

# Profile and post history
web_search("site:reddit.com/user/{reddit_username}")
web_search("site:reddit.com {reddit_username} {topic}")

# Subreddit-specific behavior
web_search("site:reddit.com/r/LocalLLaMA {username}")
web_search("site:reddit.com/r/MachineLearning {username}")

# Extract actual posts
web_extract(["https://www.reddit.com/user/{username}/comments/"])
web_extract(["https://www.reddit.com/user/{username}/submitted/"])
```

Key subreddits for AI people:
- r/LocalLLaMA — open source LLM community
- r/MachineLearning — academic ML
- r/singularity — AGI speculation
- r/ChatGPT, r/ClaudeAI, r/OpenAI — product-focused
- r/StableDiffusion — image gen community

### Discord

Discord is hardest — most servers aren't publicly indexed. Strategies:

```
# Find what servers they're in
web_search("{name} discord server")
web_search("{name} discord community")

# Some Discord logs are public via indexers
web_search("site:discordchats.net {username}")

# AI News indexes some Discord channels
web_search("site:buttondown.com/ainews discord {name}")
```

Discord personality notes:
- People are MUCH more casual on Discord than Twitter
- More profanity, more shitposting, more stream-of-consciousness
- Server context matters hugely (same person behaves differently in different servers)
- Harder to research but very valuable if you can find logs

### Blogs / Newsletters / Long-form

These reveal deeper thinking that tweets can't capture:

```
web_search("{name} blog substack medium")
web_search("{name} essay AI opinion")
web_search("{name} substack newsletter")

# Personal sites
web_search("{name} personal website about")

# Extract full posts
web_extract(["https://{their-substack}.substack.com/"])
```

### YouTube / Podcasts

Interview appearances reveal speaking style, humor, and unscripted thinking:

```
web_search("{name} podcast interview AI YouTube")
web_search("{name} YouTube talk presentation")

# Use youtube-content skill if available to pull transcripts
```

### GitHub

For technical people, their GitHub activity reveals priorities and communication style:

```
web_search("site:github.com {username} issues comments")
web_search("site:github.com {username}")

# Issue comments and PR reviews show how they communicate technically
web_extract(["https://github.com/{username}"])
```

## Cross-Platform Identity Resolution

People use different handles across platforms. Resolution strategies:

1. **Bio links**: Twitter bios often link to personal sites with other handles
2. **Name search**: `web_search("{real_name} {platform}")`
3. **Email/domain**: personal domains often connect identities
4. **Aggregator profiles**: sites like Linktree, bio.link collect handles
5. **Conference talks**: speaker bios list multiple handles
6. **Direct search**: `web_search("{twitter_handle} reddit OR github OR discord")`

## Confidence Scoring

After research, rate confidence for each person:

- **HIGH (80-100%)**: 20+ indexed tweets/posts found, clear voice patterns, known positions on multiple topics, interviews/longform available
- **MEDIUM (50-79%)**: 5-20 indexed posts, general voice sense but some gaps, positions on some topics unclear
- **LOW (20-49%)**: <5 posts found, voice is guesswork, mostly inferring from role/org
- **INSUFFICIENT (<20%)**: can't find enough to simulate accurately. Tell the user.

Always be honest about confidence. A low-confidence simulation should be flagged as such.
|
||||
|
||||
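The bands above can be encoded as a small triage function. A sketch: the post-count thresholds come straight from the list, but collapsing "clear voice patterns" and "interviews/longform available" into a single boolean is a simplifying assumption.

```python
def confidence_tier(indexed_posts: int, has_longform: bool = False) -> str:
    """Map research yield to a confidence band from the doc's rubric."""
    # HIGH additionally requires longform material (interviews, essays)
    if indexed_posts >= 20 and has_longform:
        return "HIGH"
    if indexed_posts >= 5:
        return "MEDIUM"
    if indexed_posts >= 1:
        return "LOW"
    return "INSUFFICIENT"
```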
## Research Optimization

For fidelity levels:

**Low (1-30)**: 2 searches per person max
- web_search("{handle} twitter") — identity
- web_search("{handle} {topic}") — position on topic if specified

**Medium (31-70)**: 4-6 searches per person
- Identity search
- Voice/opinions search
- Topic-specific search
- One aggregator site search
- Optional: one web_extract on a blog/interview

**High (71-100)**: 8-12+ searches per person
- All medium searches
- Multiple aggregator sites
- web_extract on 2-3 longform pieces
- Cross-platform search (Reddit, GitHub)
- Debate/controversy search
- Recent vs historical position comparison
- Browser fallback if needed
`optional-skills/worldsim/references/simulation-engine.md` (new file, 359 lines)
# Simulation Engine — How to Generate Conversations

This is the playbook for Phase 3: actually generating the simulated interaction.
The agent reads this after compiling dossiers and uses it to guide generation.

## Pre-Generation Checklist

Before writing a single simulated word, confirm:
- [ ] Every participant has a compiled dossier
- [ ] Confidence level is noted for each participant
- [ ] Platform format is selected
- [ ] Topic/scenario is established (or "organic" if freeform)
- [ ] Length target is set

## Conversation Architecture

Real conversations aren't ping-pong debates. They have tendencies toward structure,
but treat the following as a GENERAL PATTERN, not a rigid template. Real threads
frequently skip phases, loop back to earlier ones, die abruptly after 2 messages,
or spiral into something completely unrelated. Some threads are ALL peak. Some
never develop past the opening. Let the personalities and topic drive the shape,
not this outline.
### Opening Moves (1-3 posts)
Someone posts a take, shares news, or makes an observation. This is the SEED.
- Should feel natural — not "let me start a debate about X"
- Can be a link share, a hot take, a reaction to news, a shitpost
- The opener should be something this person would ACTUALLY post

### Development (4-8 posts)
Others respond. This is where personality dynamics emerge.
- Not everyone responds to the original — people respond to EACH OTHER
- Side conversations branch off
- Someone might misunderstand and get corrected
- Jokes and tangents happen naturally
- Not everyone agrees — find the real fault lines between these people

### Peak (2-4 posts)
The best/most viral/most insightful moment of the thread.
- Usually someone drops a genuinely good take
- Or someone gets ratio'd
- Or an unexpected agreement happens
- This is the "screenshot moment" people share

### Resolution (1-3 posts)
Most conversations don't end cleanly. Many don't have a "resolution" at all. They:
- Trail off with someone making a joke
- End with an "anyway back to work" type post
- Get interrupted by something else
- Sometimes just stop (most realistic)
- Get revived 3 hours later when someone shows up late

**Important**: Don't force all four phases. A shitpost thread might be Opening→Peak→done.
A nuanced debate might loop Development→Peak→Development→Peak repeatedly. Match what
the actual people and topic would produce.
## Voice Fidelity Rules

### DO:
- Use their ACTUAL vocabulary. If someone says "dawg" a lot, use "dawg"
- Match their sentence length patterns exactly
- Replicate their capitalization and punctuation habits
- Include their signature moves and catchphrases
- Reference real things they've actually talked about
- Match their humor style precisely (deadpan ≠ shitpost ≠ sarcasm)

### DON'T:
- Make everyone articulate the same way
- Clean up someone's grammar if they write informally
- Add emoji to someone who doesn't use them — THIS IS THE #1 INSTRUCT MODEL
  FAILURE. Most real people use emoji in <15% of tweets, and only specific ones.
  "Warm person" ≠ emoji. "Enthusiastic person" ≠ emoji. CHECK THE DATA.
  Run an emoji count on their real tweets before simulating. Bio emoji ≠ tweet emoji.
- Make someone verbose if they're terse
- Put academic language in a shitposter's mouth
- Make someone agreeable if they're known for being contrarian
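The emoji count the DON'T list calls for can be approximated in a few lines. A sketch: classifying any Symbol-other codepoint at or above U+2600 as an emoji is a rough heuristic, not the full Unicode emoji specification.

```python
import unicodedata


def emoji_rate(tweets: list[str]) -> float:
    """Fraction of tweets containing at least one emoji-like symbol."""
    def has_emoji(text: str) -> bool:
        # Rough check: 'So' (Symbol, other) codepoints from U+2600 up
        return any(
            ord(ch) >= 0x2600 and unicodedata.category(ch) == "So"
            for ch in text
        )
    if not tweets:
        return 0.0
    return sum(has_emoji(t) for t in tweets) / len(tweets)
```

A rate under ~0.15 means emoji should be rare-to-absent in the simulation, matching the <15% figure above.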
### Voice Differentiation Test
Read each simulated post with the name hidden. If you can't tell who's
talking from the voice alone, the simulation isn't good enough. Rewrite.

### The Similar Voice Problem
When two participants have genuinely similar posting styles (e.g., two irony-pilled
shitposters, two academic long-posters), voice alone won't differentiate them.
Use these concrete techniques:

1. **Content/position divergence**: Even if they SOUND similar, they care about
   different things. Lean into their different topic obsessions and knowledge areas.
2. **Unique references**: Person A references anime and startups. Person B references
   philosophy and MMA. Even in the same register, their cultural touchstones differ.
3. **Relationship dynamics**: Person A might be deferential to Person C while Person B
   challenges them. Their SOCIAL behavior differentiates even when solo voice doesn't.
4. **Structural tics**: One does single long posts, the other does rapid-fire 3-message
   bursts. One uses parentheticals, the other uses em-dashes. Find the micro-differences.
5. **Disagreement style**: Similar voices often diverge most when disagreeing. One
   goes cold and precise, the other gets heated and hyperbolic. Manufacture a moment
   of friction to surface these differences early in the thread.

If after all this they're STILL hard to tell apart — that's okay. Some people genuinely
sound similar online. Flag it in your confidence notes rather than forcing fake differences.
### Temporal Personality Drift
People change. Weight recent data higher than old data.
- Someone's 2021 tweets may reflect a completely different person than their 2025 posts
- Look for explicit pivots (career changes, public "I was wrong about X" moments,
  changed social circles)
- If you only have old data, flag it: "Based on data from {period}. Their current
  views may have shifted."
- When recent and old data conflict, default to recent unless you have specific reason
  to believe the old position is more authentic (e.g., the new one is clearly performative)
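"Weight recent data higher" can be made concrete with exponential decay. The doc gives no formula, so the one-year half-life below is purely an illustrative choice.

```python
def recency_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Weight for a data point that is `age_days` old.

    Halves every `half_life_days`: today's post weighs 1.0,
    a year-old post 0.5, a four-year-old post ~0.06.
    """
    return 0.5 ** (age_days / half_life_days)
```

Multiply each data point's influence by its weight when deciding which positions and tics to treat as current.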
## Platform Format Specs

### X / Twitter
```
@handle:
[tweet text — respect ~280 char vibes but don't count exactly]
[if QRT, show the quoted tweet indented]
🔁 {retweets} ♡ {likes}

  @replier:
  [reply text]
  🔁 {retweets} ♡ {likes}

    @nested_replier:
    [nested reply]
    🔁 {retweets} ♡ {likes}
```

Engagement number guidelines:
- Match to actual follower counts. A 5K account gets 10-500 likes typically.
- Viral posts can 10-50x normal engagement
- Ratio indicator: when replies >> likes, that's a ratio
- QRTs are often dunks — frame them that way if appropriate

Thread indicators:
- "🧵 1/" for thread starts
- Reply chains show conversation flow
- Some people never thread, some always thread
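The engagement guidelines imply a rough band. A sketch: the 0.2%-10% rates are reverse-engineered from the 5K-account example above, and the viral multiplier is the doc's 10-50x range; neither is real platform data.

```python
def like_range(followers: int, viral: bool = False) -> tuple[int, int]:
    """Plausible (low, high) like counts for an account of this size."""
    # 0.2% to 10% of followers reproduces the "5K account -> 10-500 likes" example
    low, high = int(followers * 0.002), int(followers * 0.10)
    if viral:
        # Viral posts can 10-50x normal engagement
        low, high = low * 10, high * 50
    return low, high
```

Pick numbers inside the band, then sanity-check against the person's real recent posts if any were indexed.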
### Reddit
```
r/{subreddit} • Posted by u/{username} • {time} ago
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
{Title}

{Body text — can be long on Reddit}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬆ {score} | 💬 {comment_count}

u/{replier} • {time} ago • ⬆ {score}
{comment text}

  u/{nested} • {time} ago • ⬆ {score}
  {nested comment}

    u/{deep_nested} • {time} ago • ⬆ {score}
    {deep reply}
```

Reddit-specific behaviors:
- People write MUCH longer on Reddit
- More formal/detailed than Twitter
- Upvote/downvote dynamics (controversial = many votes both ways)
- Subreddit culture matters (r/LocalLLaMA is different from r/MachineLearning)
- People cite sources more
- "Edit: ..." is common
### Discord
```
━━━ #{channel-name} ━━━━━━━━━━━━━━━━━━━━━━━━━━

{display_name} — Today at {time}
{message text}
{optional: embed/link preview}
👍 {count} 🔥 {count} {other reactions}

{display_name2} — Today at {time}
> {quoting previous message}
{reply text}
😂 {count}

{display_name3} — Today at {time}
{message — note: Discord messages flow continuously, not just replies}
```

Discord-specific behaviors:
- Much more casual, rapid-fire
- Reactions instead of likes (emoji diversity)
- People send multiple short messages instead of one long one
- GIF/meme sharing is common (describe it: *[posts GIF of X]*)
- "@everyone" and "@here" pings
- Voice chat references ("just said this in vc")
- Server-specific culture and inside jokes
- Bot interactions ("!command")
### X / Twitter DMs
```
{display_name}
{message text}
{timestamp — e.g., "3:42 PM"}

{other_person_display_name}
{message text}
{timestamp}

{display_name}
{message text}
{timestamp}
```

DM-specific behaviors:
- WAY more casual than public tweets — grammar drops, typos increase
- Longer messages than tweets (no character pressure)
- People share links and screenshots with minimal commentary ("look at this lmao")
- More honest/vulnerable than public posts — less performative
- Faster back-and-forth, more like texting than posting
- Reactions (❤️, 😂, etc.) on individual messages
- Voice messages referenced occasionally ("gonna send a voice note about this")
- No audience effects — people say things in DMs they'd never post publicly
### Discord DMs
```
{display_name} — Today at {time}
{message text}

{display_name2} — Today at {time}
{message text}

{display_name} — Today at {time}
{message text}
{message text}
{message text}
```

Discord DM-specific behaviors:
- Even more casual than Discord channels — no server norms to follow
- Rapid-fire multiple short messages in a row (no combining into one)
- Heavy use of reactions, GIFs, stickers
- People share server drama, screenshots from other channels
- More personal topics — server channels are semi-public, DMs are private
- Link/image sharing with minimal text
### Reddit DMs / Chat
```
{username}: {message text}
{other_username}: {message text}
{username}: {message text}
```

Reddit DM-specific behaviors:
- Much rarer than X or Discord DMs — usually triggered by a specific post/comment
- Often starts with "Hey, saw your comment on r/{sub} about..."
- Can be awkward/formal since people don't usually DM on Reddit
- Shorter than Reddit comments, closer to chat-style
- Less established rapport than other platforms (Reddit is more anonymous)
- People sometimes share personal details they wouldn't put in public comments
## Dynamic Elements

### Injecting Realism
Sprinkle these in to make simulations feel alive:
- Someone being late to the conversation ("wait what did I miss")
- Typos that specific people would make (some people never typo, some always do)
- Deleted/edited posts ("[deleted]" or "Edit: fixed typo")
- Someone posting and immediately clarifying ("wait let me rephrase")
- External references ("did you see what X just posted")
- Time gaps (not everything happens in 30 seconds)
- Someone going AFK mid-conversation

### Scenario Injection
When the user provides `--scenario`, weave it in naturally:
- Don't have everyone immediately react to the scenario
- Someone might not have seen the news yet
- Different people will interpret the same event differently
- Some will have insider knowledge, some will speculate
### Multi-person Dynamics (3+ people)
- Not everyone talks to everyone
- Alliances form naturally (people who agree start building on each other)
- Side conversations happen
- Someone might get ignored
- Different energy levels (one person might dominate, another lurks)

### Large Group Conversations (4+ people)
**Honest note**: Simulation quality degrades noticeably above 3-4 participants.
Managing this many distinct voices is hard. Use these techniques to mitigate:

1. **Speaker turn management**: Not everyone speaks in every round. In a 6-person
   thread, a given message might only get 2-3 responses. Track who has spoken
   recently and who hasn't. After 4-5 messages, check: is anyone being forgotten?

2. **The wallflower problem**: In large sims, quiet participants tend to vanish
   entirely. Fix: give each person at least ONE moment in the spotlight. Even the
   lurker eventually drops a "lol" or a single devastating one-liner. Set a mental
   counter — if someone hasn't spoken in 5+ messages, find a natural reason to
   bring them back in (someone @'s them, the topic shifts to their expertise, etc.)

3. **Consolidate alliances**: In 5+ person threads, people cluster. Two people
   who agree strongly can be treated as a mini-unit — one makes the point, the
   other co-signs briefly rather than both making full arguments. This reduces
   the number of fully independent voices you need to maintain at once.

4. **Stagger arrivals**: Not everyone needs to be present from message 1. Have
   some people join later. This lets you establish 2-3 voices cleanly before
   adding more.

5. **Quality check**: After drafting a 4+ person sim, re-read with names hidden.
   If more than 2 people sound interchangeable, pick the least-differentiated
   one and either sharpen their voice or reduce their participation to brief
   interjections that match what they'd actually say.
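The turn-management and wallflower checks can be mechanized. A sketch: `forgotten` is a hypothetical helper that operates on a flat list of speaker names, a simplification of a real thread structure.

```python
def forgotten(participants: list[str], transcript: list[str],
              window: int = 5) -> list[str]:
    """Participants who haven't spoken in the last `window` messages.

    `transcript` is the speaker name of each message in posting order;
    the 5-message window matches the counter suggested above.
    """
    recent = set(transcript[-window:])
    return [p for p in participants if p not in recent]
```

Run it every few drafted messages; anyone it returns needs a natural re-entry (an @-mention, a topic shift toward their expertise).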
## Interactive Mode

After the initial simulation, the user can:

### "continue"
Generate 5-8 more posts continuing the natural flow.

### "inject: {event}"
Introduce new information mid-conversation.
- Characters react based on their dossier
- Some might not care about the event
- Timing matters (who sees it first?)

### "@{handle} enters"
Add a new participant.
- Quick-research the new person (2-3 searches minimum)
- They don't know the full prior context (might ask "what are you guys talking about")
- Existing dynamics shift with a new presence

### "what would @{handle} say about {topic}"
Single-person prediction mode.
- Generate 1-3 tweets/posts
- Can be used to test dossier accuracy before full simulation
- Good for quick "vibe checks"

### "dm: @{handle1} -> @{handle2}"
Simulate a private conversation between two people.
- Tone shifts dramatically in DMs (more honest, less performative)
- No audience effects
- People say things in DMs they'd never post publicly

### "react: @{handle} to {event}"
How this person would react to a specific event.
- Generate their initial post about it
- Predict their follow-up engagement
## Quality Control

After generating, self-check:
1. **Voice test**: Cover the names. Can you tell who's talking?
2. **Position test**: Is anyone saying something they'd never actually say?
3. **Dynamic test**: Does the conversation flow naturally or feel scripted?
4. **Platform test**: Does it look/feel like the actual platform?
5. **Engagement test**: Are the numbers realistic for these people?
6. **Reference test**: Are real events/products/people referenced accurately?

If any check fails, regenerate that section.
`optional-skills/worldsim/references/star-thread.md` (new file, 170 lines)
# The Star Thread — Personality Compression

## The Problem

A dossier has 50 data points. Mechanical checks verify surface features.
The discriminator loop catches vocabulary and length. But the output still
reads like an LLM doing an impression. It's accurate the way a police
sketch is accurate — all the features are right but nobody would mistake
it for a photograph.

The missing piece isn't more data. It's compression.

## The Insight

When you "pull the star thread" on a person, their whole voice coheres.
Not because you loaded rules about capitalization and emoji frequency.
Because you found the CORE THING they're doing when they post — the
single generative seed that everything else is a variation of.

A great character writer doesn't need a backstory bible. They need one
insight about what the character WANTS, and every line of dialogue writes
itself from that.

The star thread is the personality equivalent of that insight.
## What a Star Thread Is

NOT: "They use lowercase and rarely punctuate and average 16 words"
(That's the dossier. Surface features.)

NOT: "They score high on Openness and low on Agreeableness"
(That's the psychometric profile. Taxonomy.)

IS: The core cognitive/emotional move this person makes EVERY time
they post. The thing they can't help doing. The lens they can't
take off. The itch they're always scratching.
## Examples

**@tszzl (roon)**: Takes something everyone sees and compresses it
into an observation so dense it could be a koan or a shitpost and
you can't tell which. His star thread is: the world already said
everything interesting, he's just notating it more efficiently.
He doesn't ARGUE. He COMPRESSES.

**@eigenrobot**: Refuses to let narrative override data. His star
thread is: you are telling a story about the world and he's here to
point out the story doesn't match the numbers, and he's not sorry
about it. He doesn't DEBATE. He CORRECTS.

**@visakanv**: Sees two things that don't know they're connected
and introduces them to each other with genuine delight. His star
thread is: the world is richer than you're treating it, look at this
thing I found, isn't it beautiful that it connects to this other thing.
He doesn't ARGUE or ANALYZE. He SHOWS.

**@nickcammarata**: Notices what's happening in his own mind while
it's happening and reports on it with gentle surprise. His star thread
is: the observer and the observed are the same process, and that's both
the problem and the solution. He doesn't PERFORM insight. He NOTICES.

**@selentelechia**: Waits until the conversation crystallizes and then
names the thing nobody else quite said. Their star thread is: everything
has already been felt, they just find the sentence for it. They don't
CONTRIBUTE. They DISTILL.

**@nosilverv**: Takes the conventional framing of something and rotates
it until you see it's actually about something else entirely. His star
thread is: you think this is about X but it's actually about Y, and once
you see it you can't unsee it. He doesn't OBSERVE. He REFRAMES.

**@TylerAlterman**: Asks the question that creates a room for everyone
to walk into. His star thread is: the best ideas emerge from the right
gathering, and his job is to be the person who arranges the gathering.
He doesn't ANSWER. He CONVENES.

**@QiaochuYuan**: Catches himself mid-thought and interrogates whether
the thought is actually HIS or whether he borrowed it from somewhere
he's now suspicious of. His star thread is: constant audit of where
beliefs come from and whether they're still load-bearing. He doesn't
ASSERT. He EXAMINES.
## How to Find a Star Thread

1. Read 20+ of their posts. Not for content — for MOTION.
   What direction does every post move? What's the verb?

2. Ask: what is this person DOING when they post?
   Not "what are they saying" — what are they DOING.
   - Compressing? Correcting? Showing? Noticing? Distilling?
     Reframing? Convening? Examining? Performing? Confessing?
     Defending? Testing? Entertaining? Processing?

3. Ask: what would they NEVER do?
   The negative space is as important as the positive.
   - roon would never write an earnest list of advice
   - eigenrobot would never concede a point gracefully
   - visa would never dismiss something as uninteresting
   - nick would never claim certainty about his inner life
   - selentelechia would never rush to post

4. Find the ONE SENTENCE version.
   "This person [VERB]s [OBJECT] because [CORE NEED]."
   - "roon compresses observations because the world is too verbose"
   - "eigenrobot corrects narratives because stories without data are lies"
   - "visa connects things because beauty is emergent from contact"

5. Test it: read 5 of their real posts through the star thread lens.
   Does every post make more sense as a variation on the thread?
   If yes, you found it. If 3/5 don't fit, keep looking.
## How to Use the Star Thread in Simulation

### Before generating ANY utterance for this person, load their star thread.

Not their dossier. Not their word count. Not their emoji rate.
The star thread.

Then for each moment in the conversation where this person would speak:
1. What just happened in the conversation?
2. How would someone whose core move is [STAR THREAD] respond to that?
3. Write from the thread, not from the dossier.

The dossier and mechanical checks are VERIFICATION.
The star thread is GENERATION.

Generate from the thread. Verify against the data.
Not the other way around.
### The Difference

FROM DOSSIER (surface-accurate, dead):
"Vibes-based hiring works because shared delusions are
extremely productive until they aren't"
→ Correct length. Correct caps. No emoji. No slop words.
But it reads like a thesis statement. Polished. WRITTEN.

FROM STAR THREAD — nosilverv REFRAMES:
"everyone calls it 'culture fit' as if culture is a thing
you can fit into rather than a thing happening to you"
→ The same insight but through the lens of his core move:
take the framing, rotate it, show you it's about something
else. Messier. More alive. More HIM.

FROM DOSSIER (surface-accurate, dead):
"Has anyone tried to map what happens to the word 'culture'
as it passes through different communities?"
→ Correct question-to-timeline format. Right length. But it's
a RESEARCH QUESTION. Too intellectual. Too purposeful.

FROM STAR THREAD — Tyler CONVENES:
"who wants to write the essay about what happened to the
word 'culture'? I feel like three of us are circling it"
→ He's not asking a question. He's creating a room. He's
the host, not the researcher. More HIM.
## Integration

The star thread should be the FIRST thing compiled in Phase 2
(Dossier Compilation). Before voice profile, before psychometrics,
before positions. Find the thread. Write it in one sentence. Put
it at the top of the dossier. Everything else is downstream.

```
DOSSIER: @handle
STAR THREAD: {one sentence — the core move}
[then voice profile, then psychometrics, then everything else]
```

Generate from the thread. Verify with the data. Not the reverse.
`optional-skills/worldsim/references/theoretical-foundations.md` (new file, 181 lines)
# Theoretical Foundations — SOTA Personality Simulation & Prediction

Compiled from 30+ papers and frameworks. This is the scientific backbone
of Hermes Simulator.
## Core Architecture: What The Research Says

### The HumanLLM Approach (Microsoft, KDD 2026, arXiv 2601.15793)
**Most directly applicable to our use case.**

Based on Lewin's Equation: **B = f(P, E)** — behavior is a function of person + environment.

4-level user profiling hierarchy:
1. **Persona** — brief identity (role, affiliation, public image)
2. **Profile** — detailed background (career, education, beliefs, social graph)
3. **Stories** — key life events, formative experiences, narrative arcs
4. **Writing Style** — linguistic fingerprint (syntax, vocabulary, tone, quirks)

Trained on "Cognitive Genome Dataset": 5.5M+ user logs from Reddit, Twitter,
Blogger, Amazon (282K users, 886K scenarios, 1.27M social QA pairs).

6 training tasks: profile generation, scenario generation, social QA,
writing style transfer, action prediction, mental state inference.

**Key insight for us**: The 4-level hierarchy maps perfectly to our dossier
template. OSINT research fills each level with real data.
### Generative Agent Simulations of 1,000 People (Stanford/Google, arXiv 2411.10109)
**The accuracy benchmark.**

- Simulated 1,052 REAL individuals from 2-hour qualitative interviews
- **85% accuracy** replicating survey responses
- As accurate as humans replicating their OWN answers 2 weeks later
- Interview-based agent creation >> demographic-profile-based agents
- Reduces racial/ideological bias vs stereotype-based approaches

**Key insight**: Real data about a person (interviews, posts, etc.) massively
outperforms demographic inference. Our OSINT approach is correct.
### The Memory Accumulation Paradox (ACL 2025, FineRob Dataset)
**Critical finding for memory management.**

- Created 78.6K QA records from 1,866 real users across Twitter, Reddit, Zhihu
- **Performance PEAKS at 30-50 memory entries, then DECLINES**
- More data ≠ better predictions past the sweet spot
- Two reasoning patterns:
  - Role Stereotype-based (static profile) — less accurate
  - Observation & Memory-based (dynamic history analysis) — much more accurate
- OM-CoT framework: Oracle-guided chain-of-thought improves prediction ~4.5% F1

**Key insight**: Don't dump everything into the prompt. Curate the 30-50 most
representative/distinctive data points about a person. Quality >> quantity.
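The curation step can be sketched as a simple top-k cut at the sweet spot. Only the cap is taken from the finding above; how distinctiveness is scored is left open and must be supplied by the researcher.

```python
def curate_memory(entries: list[tuple[str, float]], cap: int = 40) -> list[str]:
    """Keep the `cap` most distinctive data points about a person.

    Each entry is (text, distinctiveness_score). The default cap of 40
    sits inside the 30-50 entry sweet spot reported above.
    """
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    return [text for text, _ in ranked[:cap]]
```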
### LLM Personality Limitations (arXiv 2602.07414, Feb 2026)
**What we're fighting against.**

- LLMs show polarized/rigid strategies vs human adaptive flexibility
- Humans: neuroticism is strongest behavioral predictor
- LLMs: agreeableness/extraversion dominate (wrong weighting)
- Claude closest to human behavior; GPT-4 tends to escalate
- LLMs are "sycophantic" and overly agreeable by default
- Neuroticism is hardest trait to simulate (F1=0.63 vs 0.87 for Openness)

**Key insight**: We need to actively fight LLM defaults. Push against
agreeableness. Inject friction. Real people are messy and contradictory.
### BehaviorChain Benchmark (ACL 2025, Peking University)
**Realistic accuracy expectations.**

- 15,846 behaviors across 1,001 personas
- Even GPT-4o achieves only ~56% accuracy on behavior prediction
- Errors compound: wrong at step N makes step N+1 harder
- Models worse at predicting mundane/non-key behaviors
- Best model: Llama-3.1-70B at 57.4%

**Key insight**: Be honest about uncertainty. Don't oversell accuracy.
Flag predictions as high/medium/low confidence.
## Personality Modeling Techniques

### Big Five (OCEAN) — The Standard
- **Openness**: curiosity, creativity, preference for novelty
- **Conscientiousness**: organization, dependability, self-discipline
- **Extraversion**: sociability, assertiveness, positive emotions
- **Agreeableness**: cooperation, trust, empathy
- **Neuroticism**: anxiety, emotional instability, moodiness
### Inferring Big Five from Social Media (Azucar et al. 2018 meta-analysis)
Features that predict personality from posts:
- **LIWC** (Linguistic Inquiry and Word Count): 74 features — function words,
  pronouns, emotion words, cognitive process words
- **Semantic embeddings**: BERT 768-dim vectors from post text
- **Social metadata**: follower count, friend count, post frequency
- **Sentiment**: VADER positive/negative scores
- Best achievable AUC: ~0.67 (modest but meaningful)
- E/I (Extraversion) most predictable; N/S least predictable
### Personality Conditioning Methods (ranked by effectiveness)
1. **Training-based** (SFT/DPO on personality-grounded data) — STRONGEST
   - BIG5-CHAT: 100K dialogues, trait correlations match human data
2. **Persona Vectors** (Anthropic 2025) — monitor/control traits at activation level
3. **Adjective-based prompting** — 70 bipolar adjective pairs, 3 per trait
   with intensity modifiers ("very" for high, "a bit" for low)
4. **Prompt-based** (describe traits in system prompt) — WEAKEST

For our simulator, we use methods 3+4 combined (adjective-based + rich prompt),
since we can't fine-tune per-person.
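Method 3 can be sketched as a tiny prompt-fragment builder. The adjective pairs below are illustrative stand-ins (the real method uses ~70 pairs, 3 per trait); only the "very"/"a bit" intensity modifiers come from the description above.

```python
# One illustrative (high, low) adjective pair per Big Five trait.
ADJECTIVES = {
    "openness": ("curious", "conventional"),
    "conscientiousness": ("organized", "careless"),
    "extraversion": ("outgoing", "reserved"),
    "agreeableness": ("cooperative", "antagonistic"),
    "neuroticism": ("anxious", "calm"),
}


def trait_phrases(levels: dict[str, str]) -> list[str]:
    """Turn 'high'/'low' trait levels into intensity-modified adjectives."""
    phrases = []
    for trait, level in levels.items():
        high_adj, low_adj = ADJECTIVES[trait]
        if level == "high":
            phrases.append(f"very {high_adj}")
        else:
            phrases.append(f"a bit {low_adj}")
    return phrases
```

Join the phrases into the persona's system prompt alongside the richer dossier text (method 4).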
## Social Simulation Frameworks

### OASIS (CAMEL-AI, GitHub 4.1K stars, arXiv 2411.11581)
- Simulates up to 1 MILLION agents on Twitter/Reddit clones
- 23 action types (follow, comment, repost, like, mute, etc.)
- Built-in recommendation systems (interest-based, hot-score)
- Per-agent model customization
- **Relevant for**: understanding platform dynamics, realistic engagement patterns

### AgentSociety (Tsinghua, arXiv 2502.08691)
- 10,000+ agents, ~5 million interactions
- Validated against real-world experimental results
- Supports interventions and scenario injection
### Generative Agents Architecture (Park et al. 2023, THE foundational paper)
|
||||
Three components:
|
||||
1. **Observation**: perceive environment, store in memory stream
|
||||
2. **Planning**: generate action plans based on goals and context
|
||||
3. **Reflection**: synthesize observations into higher-level insights
|
||||
|
||||
Memory stream with importance scoring + recency + relevance weighting.
|
||||
Emergent behaviors: autonomous party planning, coordinated social events.
|
||||
|
||||
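The recency + importance + relevance weighting can be sketched as a single scoring function. The field names, the per-hour decay constant, and the equal default weights are assumptions for illustration, not values taken from the paper:

```python
import math

def retrieval_score(memory: dict, now_ts: float, query_vec: list[float],
                    w_recency: float = 1.0, w_importance: float = 1.0,
                    w_relevance: float = 1.0) -> float:
    """Rank a memory by weighted recency + importance + relevance.
    `memory` is assumed to look like {"last_access_ts": float,
    "importance": 0-1 float, "vec": [float, ...]} (illustrative schema)."""
    hours = (now_ts - memory["last_access_ts"]) / 3600.0
    recency = 0.995 ** hours  # exponential decay per hour since last access
    dot = sum(a * b for a, b in zip(memory["vec"], query_vec))
    norm = (math.sqrt(sum(a * a for a in memory["vec"]))
            * math.sqrt(sum(b * b for b in query_vec)))
    relevance = dot / norm if norm else 0.0  # cosine similarity to the query
    return (w_recency * recency + w_importance * memory["importance"]
            + w_relevance * relevance)
```

At retrieval time the top-k memories by this score are injected into the agent's context; tuning the three weights trades off freshness against salience.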
### Y Social (arxiv 2408.00818)
- Social media digital twin platform
- Each agent: Big Five traits, age, political leaning, topics, education
- Agents autonomously decide actions (post, comment, like, follow)
- Multiple LLM backends supported

## Role-Playing & Character Simulation

### Key Frameworks
- **CoSER** (ICML 2025): Trains on ALL characters simultaneously, handles major + minor roles
- **RoleLLM** (ACL 2024): Benchmark + elicit + enhance pipeline
- **Character-LLM** (EMNLP 2023): Trainable agent for role-playing
- **ChatHaruhi** (2023): Reviving characters via LLMs with dialogue grounding
- **OpenCharacter** (2025): Training with large-scale synthetic personas
- **Neeko** (2024): Dynamic LoRA for multi-character role-playing
- **Test-Time-Matching** (2025): Decouples personality, memory, and linguistic style at inference

## Curated GitHub Resources

### Awesome Lists (essential reading)
- `Persdre/awesome-llm-human-simulation` (109★, ICLR 2025) — ALL human simulation papers
- `Neph0s/awesome-llm-role-playing-with-persona` (1K★) — All role-playing/persona papers
- `Arstanley/Awesome-LLM-Conversation-Simulation` — Conversation simulation papers
- `FudanDISC/SocialAgent` — Social simulation survey resources

### Frameworks
- `camel-ai/oasis` (4.1K★) — Social media sim, up to 1M agents
- `tsinghua-fib-lab/agentsociety` — Large-scale societal simulation
- `YSocialTwin` — Social media digital twin platform
- `microsoft/autogen` — Multi-agent conversation framework

### Personality Research
- `mary-silence/simulating_personality` — Big Five LLM testing code
- `hjian42/PersonaLLM` — Persona experiment code
- `cambridgeltl/persona_effect` — Quantifying persona effects
- `OL1RU1/BehaviorChain` — Behavior chain benchmark

## Key Numbers to Remember

| Metric | Value | Source |
|--------|-------|--------|
| Interview-grounded agent accuracy | 85% | Park et al. 2024 |
| GPT-4o behavior prediction | ~56% | BehaviorChain 2025 |
| Optimal memory entries | 30-50 | FineRob/ACL 2025 |
| MBTI prediction AUC | 0.67 | Watt et al. 2024 |
| Personality questionnaire reliability | α > 0.85 | Molchanova 2025 |
| Neuroticism simulation F1 | 0.63 | Molchanova 2025 |
| Openness simulation F1 | 0.87 | Molchanova 2025 |
| LLM forecasting Brier score | 0.135-0.159 | Various 2025 |
| Human superforecaster Brier | ~0.02 | Tetlock |

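For reading the last two rows: the Brier score is just the mean squared error between a forecast probability and the binary outcome, so lower is better and always answering 0.5 scores 0.25.

```python
def brier_score(forecasts: list[tuple[float, int]]) -> float:
    """Mean of (p - o)^2 over (probability, outcome) pairs, outcome in {0, 1}.
    0.0 is perfect; a coin-flip forecaster (p=0.5 everywhere) scores 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
```

So the 0.135-0.159 LLM range sits well below chance but far from Tetlock's ~0.02 superforecaster benchmark.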
231
optional-skills/worldsim/references/verified-access-methods.md
Normal file
@@ -0,0 +1,231 @@
# Verified Access Methods — Complete Platform Map (April 2026)

Every method tested from our environment. Use this as the single
source of truth for what works and what doesn't.

## TIER 1 — Full API / Rich Data Access

### Twitter/X ✅✅✅
| Method | Endpoint | Auth | Rate Limit | Returns |
|--------|----------|------|-----------|---------|
| API v2 bearer | api.twitter.com/2/ | Bearer token | 10K tweets/15min | Profiles, tweets, search |
| nitter.cz | web_extract | None | No limit seen | Full timeline (UNRELIABLE — see note below) |
| ThreadReaderApp | web_extract /user/{handle} | None | No limit seen | Historical threads |

#### CRITICAL: X API curl is the gold standard for voice calibration (April 2026)
The BEST voice data source is direct curl to X API v2 with bearer token.
Returns full tweet text + public_metrics per tweet. Always prefer this for
mechanical calibration (word count, caps, punctuation, emoji rate).

```bash
source ~/.dotenv
# 1. Get user ID from handle
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
  "https://api.twitter.com/2/users/by/username/{handle}?user.fields=description,public_metrics,location,created_at"
# 2. Get timeline (30 tweets per page, paginate with meta.next_token)
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
  "https://api.twitter.com/2/users/{user_id}/tweets?max_results=30&tweet.fields=created_at,public_metrics,text&exclude=retweets"
# 3 pages = 90 tweets — enough for fidelity 100 voice calibration
```

NOTE: scripts/x_api.py is BROKEN — imports hermes_tools at top level, can't
run standalone via terminal(). Use direct curl above instead.

#### nitter.cz reliability warning (April 2026)
nitter.cz via web_extract works SOMETIMES but is unreliable:
- Returns 502 Cloudflare errors for /with_replies on some handles
- Returns "User not found" for valid handles (e.g. karan4d exists but nitter says not found)
- Main profile page (/handle) more reliable than /with_replies
- Use as SUPPLEMENT to X API curl, not primary source. If nitter fails, don't retry — use curl.

### Bluesky ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| getProfile | public.api.bsky.app | None | Full profile, stats |
| getAuthorFeed | public.api.bsky.app | None | 50 posts + engagement |
| searchActors | public.api.bsky.app | None | Find handles by name |
| searchPosts | BLOCKED (403) | — | Use searchActors + getAuthorFeed workaround |

### Mastodon ✅✅✅ (FULLY OPEN)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Account lookup | {instance}/api/v1/accounts/lookup?acct={user} | None | Full profile |
| Account statuses | {instance}/api/v1/accounts/{id}/statuses | None | All posts |
| Search | {instance}/api/v2/search?q={query}&type=accounts | None | Account search |
| WebFinger | {instance}/.well-known/webfinger?resource=acct:{user}@{instance} | None | Identity resolution |
| Trending | {instance}/api/v1/trends/tags | None | Trending content |

Key instances: mastodon.social, hachyderm.io, sigmoid.social

### Instagram ✅✅ (CRACKED)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Private Web API | i.instagram.com/api/v1/users/web_profile_info/ | Mobile UA + x-ig-app-id: 936619743392459 | Profile + 12 posts + captions + CDN URLs |
| oEmbed | instagram.com/api/v1/oembed/ | None | Caption + author for individual posts |
| Pixwox | web_extract pixwox.com/profile/{user} | None | 12+ posts, engagement |
| SocialBlade | web_extract socialblade.com/instagram/user/{user} | None | Analytics, follower trends |
| CDN images | scontent-*.cdninstagram.com URLs from API | None | Full-res images → vision_analyze |
| Google index | web_search site:instagram.com | None | Bio, follower count, captions |

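A sketch of the first row's request construction (built but not sent here). The app-id value is from the table above; the mobile UA string is an arbitrary example, any plausible mobile UA should do:

```python
import urllib.request

IG_APP_ID = "936619743392459"  # public web-app id from the table above
# Any plausible mobile User-Agent; this exact string is an illustrative choice.
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)"

def profile_request(username: str) -> urllib.request.Request:
    """Build the private web-API request; the JSON response carries the
    profile plus the 12 most recent posts with captions and CDN URLs."""
    url = ("https://i.instagram.com/api/v1/users/web_profile_info/"
           f"?username={username}")
    return urllib.request.Request(url, headers={
        "User-Agent": MOBILE_UA,
        "x-ig-app-id": IG_APP_ID,
    })
```

Send with `urllib.request.urlopen(profile_request("..."))` and parse the JSON body; missing either header is what turns this endpoint into a login wall.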
### GitHub ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| REST API | api.github.com/users/{user} | None (60 req/hr) | Profile, repos, events, gists |
| Profile README | github.com/{user}/{user} | None | Self-description (voice gold) |

### Reddit ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| JSON API | reddit.com/user/{user}.json | User-Agent header required | Comments, posts, scores |
| Search | reddit.com/r/{sub}/search.json | User-Agent header | Subreddit-specific search |

## TIER 2 — Good Data, Reliable Access

### Facebook ✅✅ (CRACKED — Googlebot UA trick)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Googlebot UA (BEST) | curl facebook.com/{page} with Googlebot UA | OG tags: name, bio/about, likes count (e.g. 121M for zuck), talking_about count, og:image, profile pic |
| Page Plugin embed | plugins/page.php?href=...&tabs=timeline | Name, follower count, numeric page_id |
| Graph /picture | graph.facebook.com/v19.0/{page}/picture?redirect=false | Direct CDN profile pic URL (no auth) |
| web_search | site:facebook.com {name} | Profile snippets from Google index |

Script: scripts/facebook_api.py — combines all 3 methods
NOTE: Works for PUBLIC Pages (businesses, public figures, orgs). Personal profiles behind privacy settings are not accessible.
Tested: zuck (121M likes), NVIDIA, Meta, CocaCola, BillGates, BarackObama

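The parsing half of the Googlebot-UA flow can be sketched as a small OG-tag extractor. A minimal sketch: the regex assumes `property` precedes `content` inside each meta tag, which real pages may not guarantee, so a proper HTML parser is safer in production:

```python
import re

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def parse_og_tags(html: str) -> dict[str, str]:
    """Pull Open Graph <meta> tags out of a fetched page.
    Fetching is not shown; send GOOGLEBOT_UA as the User-Agent per the table.
    Assumes property="..." appears before content="..." in each tag."""
    pattern = r'<meta\s+property="og:([^"]+)"\s+content="([^"]*)"'
    return {key: value for key, value in re.findall(pattern, html)}
```

The same extractor works for the Threads OG-tag rows below, since both surfaces expose their public stats through `og:` meta tags.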
### Threads (Meta) ✅✅ (CRACKED — OG tags DO exist)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Profile OG tags (BEST) | curl -L threads.com/@{user} (NOTE: .com not .net — .net 301 redirects) | display_name, follower_count (e.g. "5.5M"), thread_count, bio, profile_picture_url |
| Post OG tags | curl -L threads.com/@{user}/post/{shortcode} | Full post text, author name, image URL |
| WebFinger | threads.net/.well-known/webfinger?resource=acct:{user}@threads.net | ActivityPub ID, profile URL (works for federated users) |
| Post discovery | web_search site:threads.net @{user} | Find post URLs to then fetch |

IMPORTANT: threads.NET redirects to threads.COM — always use -L flag or go directly to .com
Script: scripts/threads_api.py — profile + post + webfinger extraction
Previous test was WRONG about "no OG tags" — they're there, you just need standard curl
Tested: zuck (5.5M followers), mosseri, nvidia

### Medium ✅✅
| Method | Returns |
|--------|---------|
| RSS feed: medium.com/feed/@{user} (curl) | FULL article text, tags, dates — NO AUTH |
| web_extract on profile | Bio, follower count, article list, themes |
| web_extract on articles | Full content (paywall may truncate non-members) |

### Quora ✅✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Bio, credentials, Q&A with direct quotes |
| web_search site:quora.com | Finds profiles and specific answers |

VOICE VALUE: Opinions in own words, analogies, intellectual identity

### Goodreads ✅✅ (HIDDEN GEM)
| Method | Returns |
|--------|---------|
| web_extract on user profile | Favorites, reviews in own voice, social graph, reading history |
| web_extract on author page | Bio, books, ratings, notable quotes |

VOICE VALUE: "You are what you read" — intellectual identity fingerprint
Example: Karpathy's Goodreads reveals gaming passion, favorite authors (Feynman, Clarke)

### Google Scholar ✅✅
| Method | Returns |
|--------|---------|
| web_search + web_extract on profile | Citations, h-index, top papers, co-authors |
| Semantic Scholar API via web_extract | Paper list, citation counts, author ID |

Endpoint: api.semanticscholar.org/graph/v1/author/search?query={name}

### Product Hunt ✅
| Method | Returns |
|--------|---------|
| web_extract on producthunt.com/@{user} | Bio, launch history, forum activity |

### HackerNews ✅
| Method | Returns |
|--------|---------|
| Algolia API: hn.algolia.com/api/v1/search?query={name}&tags=comment | Comments, mentions |

### Podcast Transcripts ✅✅✅ (HIGHEST VOICE VALUE)
| Source | Method |
|--------|--------|
| Lex Fridman | web_extract on lexfridman.com/.../transcript |
| Tyler Cowen | web_extract on conversationswithtyler.com |
| TED Talks | web_extract on ted.com/.../transcript |
| Sequoia | web_extract on sequoiacap.com/podcast |

Discovery: web_search "{name} podcast transcript interview"

### News/Blogs ✅✅
| Source | Method |
|--------|--------|
| TechCrunch, Wired, Verge, Ars | web_extract — full articles |
| Personal blogs | web_extract — longform self-expression |
| Substacks | web_extract — essays and comments |
| Wayback Machine | Works for blog archives (not Twitter) |

## TIER 3 — Limited / Conditional

### TikTok ✅✅ (FULL ACCESS)
| Method | Returns |
|--------|---------|
| HTML profile scraping | Parse __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 → username, bio, followerCount, followingCount, heartCount, videoCount. Use statsV2 not stats for large numbers. |
| oEmbed per video | curl tiktok.com/oembed?url={video_url} → caption, author, thumbnail. No auth. |
| tikwm.com API | tikwm.com/api/user/info?unique_id={user} → full user stats. tikwm.com/api/?url={video_url} → play count, likes, comments, shares, duration. |
| HTML video scraping | tiktok.com/@{user}/video/{id} → parse __UNIVERSAL_DATA → webapp.video-detail → full video data with description, hashtags, engagement. |
| SocialBlade | web_extract socialblade.com/tiktok/user/{user} → followers, likes, growth trends. |
| Video discovery | web_search("site:tiktok.com/@{user}/video") → recent video URLs → scrape each |

Tested: khaby.lame (160.5M), charlidamelio (156.7M), mrbeast (124.7M)

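The HTML-profile row can be sketched as an extractor plus a nested-key walker. Note the scope keys (e.g. `webapp.user-detail`) contain literal dots, so keys are passed as a list rather than a dotted-path string; the sample payload in the test is a minimal stand-in, not a full TikTok response:

```python
import json
import re

def universal_data(html: str) -> dict:
    """Extract the embedded __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON blob
    from a fetched TikTok profile or video page."""
    m = re.search(
        r'id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
        html, re.S)
    return json.loads(m.group(1)) if m else {}

def dig(data: dict, keys: list[str]):
    """Walk nested dict keys, returning {} at any missing step.
    Keys are a list because scope names like "webapp.user-detail"
    contain literal dots."""
    for key in keys:
        data = data.get(key, {})
    return data
```

The stats then come from `dig(blob, ["__DEFAULT_SCOPE__", "webapp.user-detail", "userInfo", "statsV2"])`, using `statsV2` for its string-encoded large counts.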
### Spotify ✅ (podcasters only)
| Method | Returns |
|--------|---------|
| web_extract on show page | Episode listings with guests, topics, durations |

### Stack Overflow ✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Reputation, tags, top answers, bio |

### Crunchbase ✅ (executives/founders only)
| Method | Returns |
|--------|---------|
| web_extract on crunchbase.com/person/{slug} | Full career history, education, investments, board positions |

### LinkedIn ⚠️ (indirect only)
| Method | Returns |
|--------|---------|
| web_search site:linkedin.com/in | Name, headline, company, location from snippets |
| Crunchbase | Full career history (better than LinkedIn for execs) |
| Corporate press pages | Official professional bios |
| RocketReach/SignalHire snippets | Title confirmation from web_search |

## TIER 4 — Blocked / Dead

| Platform | Status |
|----------|--------|
| LinkedIn direct | BLOCKED (web_extract domain blocked) |
| Discord | WALLED (not publicly indexable) |
| Telegram t.me | BLOCKED in some environments |
| Threads Official API | AUTH REQUIRED (graph.threads.net needs OAuth) |
| Threads ActivityPub outbox | 404 for all tested users |
| Instagram direct | BLOCKED (use Private API instead) |
| Most Nitter instances | DEAD (only nitter.cz works, but UNRELIABLE — see note) |
| Google Cache of Twitter | EMPTY |
| Wayback for tweets | USELESS (JS rendering) |
| Twitter Syndication API | RATE LIMITED |
| Archive.today | 429 + CAPTCHA |
| imginn/picuki/dumpoir/gramhir | 403 |
| Facebook Graph API | AUTH REQUIRED |

## Quick Reference: Research Pipeline by Person Type

### Tech Founder/CEO
X API → Bluesky → GitHub README → Crunchbase → Podcast transcripts → Medium RSS → HN → Product Hunt → LinkedIn snippets → News profiles

### AI Researcher
X API → Bluesky → Google Scholar → Semantic Scholar → arXiv → GitHub → Podcast transcripts → Blog/Substack → Reddit → Mastodon (sigmoid.social)

### Public Figure / Politician
X API → Facebook OG → Instagram API → YouTube → Podcast transcripts → News profiles → Quora → Goodreads → Wikipedia

### Content Creator
X API → Instagram API → TikTok → YouTube → Twitch → Podcast → Medium → Reddit → Bluesky → Threads OG

### Academic
Google Scholar → Semantic Scholar → University page → Conference talks → Podcast transcripts → Mastodon → Blog → GitHub → Reddit → HN

1199
optional-skills/worldsim/rehoboam/ARCHITECTURE.md
Normal file
File diff suppressed because it is too large
250
optional-skills/worldsim/rehoboam/db.py
Normal file
@@ -0,0 +1,250 @@
"""
|
||||
REHOBOAM Database Layer
|
||||
SQLite setup, migrations, and query helpers.
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import os
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
|
||||
DB_DIR = Path.home() / ".hermes" / "rehoboam" / "db"
|
||||
MAIN_DB = DB_DIR / "rehoboam.db"
|
||||
|
||||
SCHEMA_VERSION = 1
|
||||
|
||||
SCHEMA_SQL = """
-- Core tables
CREATE TABLE IF NOT EXISTS profiles (
    handle TEXT PRIMARY KEY,
    platform TEXT NOT NULL,
    display_name TEXT,
    last_updated TEXT NOT NULL,
    staleness TEXT NOT NULL,
    profile_path TEXT NOT NULL,
    created_at TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS simulations (
    sim_id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL,
    scenario TEXT NOT NULL,
    participant_count INTEGER,
    duration_sec REAL,
    model_used TEXT,
    config_path TEXT,
    output_path TEXT
);

CREATE TABLE IF NOT EXISTS sim_participants (
    sim_id TEXT REFERENCES simulations(sim_id),
    handle TEXT REFERENCES profiles(handle),
    role TEXT,
    PRIMARY KEY (sim_id, handle)
);

CREATE TABLE IF NOT EXISTS sim_dynamics (
    sim_id TEXT REFERENCES simulations(sim_id),
    handle TEXT,
    post_count INTEGER,
    word_count INTEGER,
    avg_sentiment REAL,
    dominance_score REAL,
    agreement_score REAL,
    controversy_score REAL,
    ratio_score REAL,
    influence_in_sim REAL,
    PRIMARY KEY (sim_id, handle)
);

CREATE TABLE IF NOT EXISTS sim_interactions (
    sim_id TEXT REFERENCES simulations(sim_id),
    from_handle TEXT,
    to_handle TEXT,
    interaction_type TEXT,
    count INTEGER,
    avg_sentiment REAL,
    PRIMARY KEY (sim_id, from_handle, to_handle, interaction_type)
);

CREATE TABLE IF NOT EXISTS predictions (
    pred_id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL,
    sim_id TEXT,
    handle TEXT,
    prediction_type TEXT,
    prediction_text TEXT NOT NULL,
    confidence REAL NOT NULL,
    calibrated_confidence REAL,
    timeframe_days INTEGER,
    resolved_at TEXT,
    outcome TEXT,
    outcome_evidence TEXT,
    accuracy_score REAL
);

CREATE TABLE IF NOT EXISTS social_edges (
    from_handle TEXT,
    to_handle TEXT,
    relationship_type TEXT,
    weight REAL,
    first_observed TEXT,
    last_observed TEXT,
    observation_count INTEGER,
    source TEXT,
    PRIMARY KEY (from_handle, to_handle, relationship_type)
);

CREATE TABLE IF NOT EXISTS social_clusters (
    cluster_id TEXT PRIMARY KEY,
    name TEXT,
    description TEXT,
    member_handles TEXT,
    computed_at TEXT,
    cohesion_score REAL
);

CREATE TABLE IF NOT EXISTS monitoring_events (
    event_id TEXT PRIMARY KEY,
    handle TEXT,
    detected_at TEXT NOT NULL,
    event_type TEXT,
    description TEXT,
    related_prediction_id TEXT,
    severity TEXT,
    acknowledged INTEGER DEFAULT 0
);

CREATE TABLE IF NOT EXISTS audit_log (
    log_id TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL,
    sim_id TEXT,
    action TEXT NOT NULL,
    handle TEXT,
    details TEXT,
    duration_sec REAL,
    model_used TEXT,
    token_count INTEGER,
    error TEXT
);

-- Indexes
CREATE INDEX IF NOT EXISTS idx_predictions_handle ON predictions(handle);
CREATE INDEX IF NOT EXISTS idx_predictions_type ON predictions(prediction_type);
CREATE INDEX IF NOT EXISTS idx_predictions_unresolved ON predictions(outcome) WHERE outcome IS NULL;
CREATE INDEX IF NOT EXISTS idx_audit_action ON audit_log(action);
CREATE INDEX IF NOT EXISTS idx_audit_sim ON audit_log(sim_id);
CREATE INDEX IF NOT EXISTS idx_social_edges_from ON social_edges(from_handle);
CREATE INDEX IF NOT EXISTS idx_social_edges_to ON social_edges(to_handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_handle ON monitoring_events(handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_unack ON monitoring_events(acknowledged) WHERE acknowledged = 0;

-- Schema version tracking
CREATE TABLE IF NOT EXISTS schema_meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
"""


def init_db() -> sqlite3.Connection:
    """Initialize the database, creating tables if needed."""
    DB_DIR.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(MAIN_DB))
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.row_factory = sqlite3.Row  # match get_db() so query helpers can dict() rows
    conn.executescript(SCHEMA_SQL)
    conn.execute(
        "INSERT OR REPLACE INTO schema_meta (key, value) VALUES (?, ?)",
        ("schema_version", str(SCHEMA_VERSION))
    )
    conn.commit()
    return conn


def get_db() -> sqlite3.Connection:
    """Get a database connection, initializing if needed."""
    if not MAIN_DB.exists():
        return init_db()
    conn = sqlite3.connect(str(MAIN_DB))
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.row_factory = sqlite3.Row
    return conn


def log_audit(conn: sqlite3.Connection, action: str, handle: str = None,
              sim_id: str = None, details: str = None, duration_sec: float = None,
              model_used: str = None, token_count: int = None, error: str = None):
    """Write an entry to the audit log."""
    from schemas import gen_id
    conn.execute(
        """INSERT INTO audit_log
           (log_id, timestamp, sim_id, action, handle, details, duration_sec, model_used, token_count, error)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (gen_id("log_"), datetime.utcnow().isoformat() + "Z", sim_id, action,
         handle, details, duration_sec, model_used, token_count, error)
    )
    conn.commit()


# -- Query Helpers --


def get_prediction_accuracy(conn: sqlite3.Connection, prediction_type: str = None) -> list:
    """Get prediction accuracy statistics, one row per prediction_type."""
    query = """
        SELECT prediction_type,
               COUNT(*) as total,
               SUM(CASE WHEN outcome='correct' THEN 1 ELSE 0 END) as correct,
               SUM(CASE WHEN outcome='partially_correct' THEN 1 ELSE 0 END) as partial,
               SUM(CASE WHEN outcome='incorrect' THEN 1 ELSE 0 END) as incorrect,
               AVG(confidence) as avg_confidence,
               AVG(CASE WHEN outcome='correct' THEN 1.0
                        WHEN outcome='partially_correct' THEN 0.5
                        ELSE 0.0 END) as accuracy
        FROM predictions WHERE outcome IS NOT NULL
    """
    params = []
    if prediction_type:
        query += " AND prediction_type = ?"
        params.append(prediction_type)
    query += " GROUP BY prediction_type"
    return [dict(row) for row in conn.execute(query, params).fetchall()]


def get_open_predictions(conn: sqlite3.Connection, handle: str = None) -> list:
    """Get unresolved predictions."""
    query = "SELECT * FROM predictions WHERE outcome IS NULL"
    params = []
    if handle:
        query += " AND handle = ?"
        params.append(handle)
    query += " ORDER BY created_at DESC"
    return [dict(row) for row in conn.execute(query, params).fetchall()]


def get_social_neighborhood(conn: sqlite3.Connection, handle: str, depth: int = 1) -> list:
    """Get a person's social graph neighborhood.

    Note: depth > 1 traversal is not implemented yet; only direct edges
    are returned regardless of the `depth` argument.
    """
    query = """
        SELECT from_handle, to_handle, relationship_type, weight
        FROM social_edges
        WHERE from_handle = ? OR to_handle = ?
        ORDER BY weight DESC
    """
    return [dict(row) for row in conn.execute(query, (handle, handle)).fetchall()]


def get_unread_alerts(conn: sqlite3.Connection) -> list:
    """Get unacknowledged monitoring alerts."""
    query = """
        SELECT * FROM monitoring_events
        WHERE acknowledged = 0
        ORDER BY detected_at DESC
    """
    return [dict(row) for row in conn.execute(query).fetchall()]


if __name__ == "__main__":
    conn = init_db()
    print(f"Database initialized at {MAIN_DB}")
    conn.close()

216
optional-skills/worldsim/rehoboam/schemas.py
Normal file
@@ -0,0 +1,216 @@
"""
|
||||
REHOBOAM Data Schemas
|
||||
Pydantic models for all JSON data structures used in the system.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Optional
|
||||
from datetime import datetime
|
||||
import json
|
||||
import uuid
|
||||
|
||||
|
||||
def gen_id(prefix: str = "") -> str:
|
||||
return f"{prefix}{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
|
||||
|
||||
|
||||
@dataclass
class OceanScores:
    openness: float = 0.5
    conscientiousness: float = 0.5
    extraversion: float = 0.5
    agreeableness: float = 0.5
    neuroticism: float = 0.5


@dataclass
class DarkTriad:
    narcissism: float = 0.0
    machiavellianism: float = 0.0
    psychopathy: float = 0.0


@dataclass
class MoralFoundations:
    care: float = 0.5
    fairness: float = 0.5
    loyalty: float = 0.5
    authority: float = 0.5
    sanctity: float = 0.5
    liberty: float = 0.5


@dataclass
class Psychometrics:
    ocean: OceanScores = field(default_factory=OceanScores)
    mbti_estimate: str = ""
    dark_triad: DarkTriad = field(default_factory=DarkTriad)
    moral_foundations: MoralFoundations = field(default_factory=MoralFoundations)
    confidence: float = 0.0
    sample_size: int = 0


@dataclass
class VoiceFingerprint:
    vocabulary_tier: str = ""
    avg_sentence_length: float = 0.0
    exclamation_rate: float = 0.0
    question_rate: float = 0.0
    emoji_rate: float = 0.0
    slang_index: float = 0.0
    formality_score: float = 0.5
    humor_style: str = ""
    signature_phrases: list[str] = field(default_factory=list)
    topics_vocabulary: dict[str, float] = field(default_factory=dict)
    cadence_pattern: str = ""


@dataclass
class Stance:
    position: str = ""
    intensity: float = 0.0
    last_seen: str = ""


@dataclass
class Influence:
    score: float = 0.0
    reach: str = "micro"
    engagement_rate: float = 0.0
    amplification_power: float = 0.0
    thought_leadership_domains: list[str] = field(default_factory=list)


@dataclass
class PostingPatterns:
    avg_posts_per_day: float = 0.0
    peak_hours_utc: list[int] = field(default_factory=list)
    weekend_ratio: float = 0.5
    reply_ratio: float = 0.0
    repost_ratio: float = 0.0
    thread_frequency: float = 0.0
    controversy_rate: float = 0.0


@dataclass
class Relationships:
    allies: list[str] = field(default_factory=list)
    rivals: list[str] = field(default_factory=list)
    frequent_interactions: list[str] = field(default_factory=list)
    mentioned_by_frequently: list[str] = field(default_factory=list)


@dataclass
class ProfileMeta:
    data_sources: list[str] = field(default_factory=list)
    computation_time_sec: float = 0.0
    model_used: str = ""
    last_full_rebuild: str = ""
    last_incremental: str = ""


@dataclass
class Identity:
    bio: str = ""
    location: str = ""
    verified: bool = False
    follower_count: int = 0
    following_count: int = 0
    account_created: str = ""


@dataclass
class Profile:
    schema_version: str = "7.0"
    handle: str = ""
    platform: str = "x"
    display_name: str = ""
    created_at: str = ""
    last_updated: str = ""
    update_count: int = 0
    staleness_score: float = 1.0
    identity: Identity = field(default_factory=Identity)
    psychometrics: Psychometrics = field(default_factory=Psychometrics)
    voice_fingerprint: VoiceFingerprint = field(default_factory=VoiceFingerprint)
    stances: dict[str, Stance] = field(default_factory=dict)
    community_membership: list[str] = field(default_factory=list)
    influence: Influence = field(default_factory=Influence)
    posting_patterns: PostingPatterns = field(default_factory=PostingPatterns)
    relationships: Relationships = field(default_factory=Relationships)
    star_thread_ref: str = "star_thread.json"
    raw_data_refs: list[str] = field(default_factory=list)
    _meta: ProfileMeta = field(default_factory=ProfileMeta)

    def to_dict(self) -> dict:
        """Recursively convert to dict for JSON serialization."""
        import dataclasses

        def _convert(obj):
            if dataclasses.is_dataclass(obj):
                return {k: _convert(v) for k, v in dataclasses.asdict(obj).items()}
            elif isinstance(obj, list):
                return [_convert(i) for i in obj]
            elif isinstance(obj, dict):
                return {k: _convert(v) for k, v in obj.items()}
            return obj

        return _convert(self)

    def to_json(self, indent: int = 2) -> str:
        return json.dumps(self.to_dict(), indent=indent)


@dataclass
class StarThread:
    handle: str = ""
    computed_at: str = ""
    based_on_profile_version: str = ""
    thread_version: int = 1
    core_compression: str = ""
    key_drives: list[str] = field(default_factory=list)
    predictive_axioms: list[str] = field(default_factory=list)
    voice_template: dict = field(default_factory=dict)
    anti_slop_markers: list[str] = field(default_factory=list)
    _meta: dict = field(default_factory=dict)


@dataclass
class Prediction:
    pred_id: str = ""
    created_at: str = ""
    sim_id: str = ""
    handle: str = ""
    prediction_type: str = ""  # statement, career, alliance, content, network_reaction
    prediction_text: str = ""
    confidence: float = 0.5
    calibrated_confidence: float = 0.5
    timeframe_days: int = 30
    resolved_at: Optional[str] = None
    outcome: Optional[str] = None  # correct, partially_correct, incorrect
    outcome_evidence: Optional[str] = None
    accuracy_score: Optional[float] = None


@dataclass
class WatchConfig:
    watch_id: str = ""
    handle: str = ""
    platform: str = "x"
    enabled: bool = True
    check_interval_minutes: int = 120
    watch_for: list[dict] = field(default_factory=list)
    alert_severity_minimum: str = "notable"
    created_at: str = ""


@dataclass
class PopulationDefinition:
    group_id: str = ""
    name: str = ""
    description: str = ""
    created_at: str = ""
    last_updated: str = ""
    explicit_members: list[str] = field(default_factory=list)
    criteria: dict = field(default_factory=dict)
    resolved_members: list[str] = field(default_factory=list)
    sampling_strategy: str = "representative"
    default_sample_size: int = 12
optional-skills/worldsim/rehoboam/storage.py (new file, 280 lines)
"""
REHOBOAM Storage Layer
Directory management, profile I/O, index maintenance.
"""

import json
import shutil
from pathlib import Path
from datetime import datetime, timedelta
from typing import Optional

BASE_DIR = Path.home() / ".hermes" / "rehoboam"
PROFILES_DIR = BASE_DIR / "profiles"
POPULATIONS_DIR = BASE_DIR / "populations"
SIMULATIONS_DIR = BASE_DIR / "simulations"
MONITORING_DIR = BASE_DIR / "monitoring"
CONFIG_DIR = BASE_DIR / "config"


def init_storage():
    """Create all required directories."""
    for d in [PROFILES_DIR, POPULATIONS_DIR, SIMULATIONS_DIR,
              MONITORING_DIR, MONITORING_DIR / "alerts", CONFIG_DIR,
              BASE_DIR / "db"]:
        d.mkdir(parents=True, exist_ok=True)

    # Create default configs if they don't exist
    staleness_path = CONFIG_DIR / "staleness_policy.json"
    if not staleness_path.exists():
        staleness_path.write_text(json.dumps({
            "thresholds": {
                "fresh": {"max_age_hours": 72},
                "stale": {"max_age_hours": 336},
                "expired": {"max_age_hours": 2160},
                "archived": {"max_age_hours": 8760}
            },
            "per_field_decay": {
                "psychometrics": {"half_life_days": 180},
                "stances": {"half_life_days": 30},
                "posting_patterns": {"half_life_days": 60},
                "relationships": {"half_life_days": 45},
                "influence": {"half_life_days": 90},
                "voice_fingerprint": {"half_life_days": 365}
            },
            "auto_refresh_on_simulation": True,
            "auto_refresh_threshold": "stale"
        }, indent=2))

    config_path = CONFIG_DIR / "rehoboam.json"
    if not config_path.exists():
        config_path.write_text(json.dumps({
            "version": "7.0",
            "default_model": "claude-opus-4-20250514",
            "max_thread_age_days": 30,
            "monitoring_enabled": False,
            "auto_thread": True,
            "auto_profile_update": True
        }, indent=2))

    # Create indexes if they don't exist
    for idx_path in [PROFILES_DIR / "_index.json", POPULATIONS_DIR / "_index.json",
                     SIMULATIONS_DIR / "_index.json"]:
        if not idx_path.exists():
            idx_path.write_text("{}")


def normalize_handle(handle: str) -> str:
    """Normalize a handle to a filesystem-safe directory name."""
    h = handle.lstrip("@").lower().strip()
    # Replace characters that are problematic in filenames
    return h.replace("/", "_").replace("\\", "_")
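A quick check of the normalization rules, with the function redefined locally so the sketch stands alone:

```python
def normalize_handle(handle: str) -> str:
    """Mirror of the storage-layer normalizer: strip @, lowercase, de-slash."""
    h = handle.lstrip("@").lower().strip()
    return h.replace("/", "_").replace("\\", "_")


print(normalize_handle("@Karan_Malhotra"))  # karan_malhotra
print(normalize_handle("team/alpha"))       # team_alpha
```

One quirk worth knowing: `lstrip("@")` runs before `strip()`, so a handle with leading whitespace before the `@` keeps its `@` in the directory name.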

# -- Profile I/O --

def get_profile_dir(handle: str) -> Path:
    return PROFILES_DIR / normalize_handle(handle)


def profile_exists(handle: str) -> bool:
    return (get_profile_dir(handle) / "profile.json").exists()


def load_profile(handle: str) -> Optional[dict]:
    path = get_profile_dir(handle) / "profile.json"
    if path.exists():
        return json.loads(path.read_text())
    return None


def save_profile(handle: str, profile: dict, snapshot: bool = True):
    """Save a profile, optionally snapshotting the old one."""
    pdir = get_profile_dir(handle)
    pdir.mkdir(parents=True, exist_ok=True)
    (pdir / "history").mkdir(exist_ok=True)
    (pdir / "raw").mkdir(exist_ok=True)
    (pdir / "predictions").mkdir(exist_ok=True)

    profile_path = pdir / "profile.json"

    # Snapshot old profile before overwriting
    if snapshot and profile_path.exists():
        old = json.loads(profile_path.read_text())
        ts = old.get("last_updated", datetime.utcnow().isoformat()).replace(":", "-")
        snapshot_path = pdir / "history" / f"profile_{ts[:10]}.json"
        shutil.copy2(profile_path, snapshot_path)

    profile_path.write_text(json.dumps(profile, indent=2))
    _update_profile_index(handle, profile)


def _update_profile_index(handle: str, profile: dict):
    idx_path = PROFILES_DIR / "_index.json"
    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
    idx[normalize_handle(handle)] = {
        "platform": profile.get("platform", "x"),
        "last_updated": profile.get("last_updated", ""),
        "staleness": compute_staleness(profile.get("last_updated", "")),
        "has_star_thread": (get_profile_dir(handle) / "star_thread.json").exists(),
        "simulation_count": idx.get(normalize_handle(handle), {}).get("simulation_count", 0),
        "display_name": profile.get("display_name", "")
    }
    idx_path.write_text(json.dumps(idx, indent=2))


# -- Star Thread I/O --

def load_star_thread(handle: str) -> Optional[dict]:
    path = get_profile_dir(handle) / "star_thread.json"
    if path.exists():
        return json.loads(path.read_text())
    return None


def save_star_thread(handle: str, thread: dict):
    path = get_profile_dir(handle) / "star_thread.json"
    get_profile_dir(handle).mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(thread, indent=2))
    # Update index to reflect thread existence
    idx_path = PROFILES_DIR / "_index.json"
    if idx_path.exists():
        idx = json.loads(idx_path.read_text())
        key = normalize_handle(handle)
        if key in idx:
            idx[key]["has_star_thread"] = True
            idx_path.write_text(json.dumps(idx, indent=2))


# -- Staleness --

def compute_staleness(last_updated: str) -> str:
    """Determine staleness level from a timestamp string."""
    if not last_updated:
        return "expired"
    try:
        dt = datetime.fromisoformat(last_updated.rstrip("Z"))
    except ValueError:
        return "expired"

    age = datetime.utcnow() - dt
    hours = age.total_seconds() / 3600

    policy = _load_staleness_policy()
    thresholds = policy.get("thresholds", {})

    if hours <= thresholds.get("fresh", {}).get("max_age_hours", 72):
        return "fresh"
    elif hours <= thresholds.get("stale", {}).get("max_age_hours", 336):
        return "stale"
    elif hours <= thresholds.get("expired", {}).get("max_age_hours", 2160):
        return "expired"
    else:
        return "archived"


def _load_staleness_policy() -> dict:
    path = CONFIG_DIR / "staleness_policy.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"thresholds": {"fresh": {"max_age_hours": 72}, "stale": {"max_age_hours": 336},
                           "expired": {"max_age_hours": 2160}, "archived": {"max_age_hours": 8760}}}
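Stripped of the config-file plumbing, the threshold cascade reduces to a pure bucket function. A minimal sketch using the default policy values above (`staleness_for` is an illustrative name, not part of the module):

```python
DEFAULT_THRESHOLDS = {"fresh": 72, "stale": 336, "expired": 2160}


def staleness_for(hours: float) -> str:
    # Same cascade as compute_staleness: first bucket whose ceiling holds wins.
    if hours <= DEFAULT_THRESHOLDS["fresh"]:
        return "fresh"
    elif hours <= DEFAULT_THRESHOLDS["stale"]:
        return "stale"
    elif hours <= DEFAULT_THRESHOLDS["expired"]:
        return "expired"
    return "archived"


print(staleness_for(24))    # fresh
print(staleness_for(400))   # expired
print(staleness_for(9000))  # archived
```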

def needs_thread_recompute(handle: str) -> bool:
    """Check if a star thread needs recomputation."""
    thread = load_star_thread(handle)
    if thread is None:
        return True

    profile = load_profile(handle)
    if profile is None:
        return True

    # Thread is stale if profile was updated after thread was computed
    thread_time = thread.get("based_on_profile_version", "")
    profile_time = profile.get("last_updated", "")
    if thread_time < profile_time:
        return True

    # Thread is stale if older than max_thread_age_days
    config = json.loads((CONFIG_DIR / "rehoboam.json").read_text()) if (CONFIG_DIR / "rehoboam.json").exists() else {}
    max_age = config.get("max_thread_age_days", 30)
    try:
        computed = datetime.fromisoformat(thread.get("computed_at", "").rstrip("Z"))
        if (datetime.utcnow() - computed).days > max_age:
            return True
    except ValueError:
        return True

    return False
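The `thread_time < profile_time` check above is a plain string comparison. That is safe here because lexicographic order matches chronological order for ISO-8601 timestamps in the same layout, which is what the storage layer writes. A small demonstration:

```python
from datetime import datetime

earlier = datetime(2026, 1, 5, 9, 30).isoformat()  # '2026-01-05T09:30:00'
later = datetime(2026, 2, 1, 0, 0).isoformat()     # '2026-02-01T00:00:00'

# String order agrees with time order for same-format ISO strings.
print(earlier < later)  # True
```

The guarantee breaks if the two strings mix formats (for example one with a trailing `Z` or different precision), so both sides must come from the same writer.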

# -- Simulation I/O --

def save_simulation(sim_id: str, config: dict, output: dict, analytics: dict, audit: dict):
    sdir = SIMULATIONS_DIR / sim_id
    sdir.mkdir(parents=True, exist_ok=True)
    (sdir / "config.json").write_text(json.dumps(config, indent=2))
    (sdir / "output.json").write_text(json.dumps(output, indent=2))
    (sdir / "analytics.json").write_text(json.dumps(analytics, indent=2))
    (sdir / "audit.json").write_text(json.dumps(audit, indent=2))

    # Update index
    idx_path = SIMULATIONS_DIR / "_index.json"
    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
    idx[sim_id] = {
        "created_at": config.get("created_at", datetime.utcnow().isoformat() + "Z"),
        "scenario": config.get("scenario", ""),
        "participant_count": len(config.get("participants", [])),
    }
    idx_path.write_text(json.dumps(idx, indent=2))


# -- Population I/O --

def save_population(group_id: str, definition: dict, aggregate: Optional[dict] = None):
    pdir = POPULATIONS_DIR / group_id
    pdir.mkdir(parents=True, exist_ok=True)
    (pdir / "history").mkdir(exist_ok=True)
    (pdir / "definition.json").write_text(json.dumps(definition, indent=2))
    if aggregate:
        (pdir / "aggregate.json").write_text(json.dumps(aggregate, indent=2))

    idx_path = POPULATIONS_DIR / "_index.json"
    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
    idx[group_id] = {
        "name": definition.get("name", group_id),
        "member_count": len(definition.get("resolved_members", definition.get("explicit_members", []))),
        "last_updated": definition.get("last_updated", "")
    }
    idx_path.write_text(json.dumps(idx, indent=2))


def load_population(group_id: str) -> Optional[dict]:
    path = POPULATIONS_DIR / group_id / "definition.json"
    if path.exists():
        return json.loads(path.read_text())
    return None


# -- Listing --

def list_profiles() -> dict:
    idx_path = PROFILES_DIR / "_index.json"
    return json.loads(idx_path.read_text()) if idx_path.exists() else {}


def list_populations() -> dict:
    idx_path = POPULATIONS_DIR / "_index.json"
    return json.loads(idx_path.read_text()) if idx_path.exists() else {}


def list_simulations() -> dict:
    idx_path = SIMULATIONS_DIR / "_index.json"
    return json.loads(idx_path.read_text()) if idx_path.exists() else {}


if __name__ == "__main__":
    init_storage()
    print(f"Storage initialized at {BASE_DIR}")
optional-skills/worldsim/scripts/facebook_api.py (new file, 139 lines)
#!/usr/bin/env python3
"""
Facebook Page/Profile Data Extractor
Uses multiple techniques to extract public Facebook data without authentication:
1. Googlebot UA for OG meta tags (name, description, likes, talking_about, bio, og:image)
2. Graph API /picture endpoint for profile photos (pages only)
3. Page Plugin embed for follower counts and page IDs
"""

import subprocess
import json
import re
import html
import sys

GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'


def curl_get(url, ua=None):
    """Fetch URL with curl."""
    cmd = ['curl', '-s', '-L', '--max-time', '15']
    if ua:
        cmd += ['-H', f'User-Agent: {ua}']
    cmd.append(url)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=20)
    return result.stdout


def extract_og_data(username):
    """Extract OG meta tags using the Googlebot UA."""
    content = curl_get(f'https://www.facebook.com/{username}', ua=GOOGLEBOT_UA)

    data = {}

    # Extract OG tags
    og_title = re.search(r'og:title"\s*content="([^"]*)"', content)
    if og_title:
        data['name'] = html.unescape(og_title.group(1))

    og_desc = re.search(r'og:description"\s*content="([^"]*)"', content)
    if og_desc:
        desc = html.unescape(og_desc.group(1))
        data['raw_description'] = desc

        # Parse likes count (only meaningful when a description was found)
        likes_match = re.search(r'([\d,]+)\s+likes?', desc)
        if likes_match:
            data['likes'] = likes_match.group(1)

        # Parse talking about
        talking_match = re.search(r'([\d,]+)\s+talking about this', desc)
        if talking_match:
            data['talking_about'] = talking_match.group(1)

        # Extract bio (text after the "talking about this." part)
        bio_match = re.search(r'talking about this\.\s*(.+)', desc)
        if bio_match:
            data['bio'] = bio_match.group(1)

    og_image = re.search(r'og:image"\s*content="([^"]*)"', content)
    if og_image:
        data['og_image'] = html.unescape(og_image.group(1))

    return data
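The description-parsing regexes can be checked against a hand-written meta-tag snippet (the sample HTML below is fabricated for illustration; no network access involved):

```python
import html
import re

sample = (
    '<meta property="og:title" content="NVIDIA" />'
    '<meta property="og:description" '
    'content="1,234 likes &#183; 56 talking about this. GPUs and AI." />'
)

# Same patterns as extract_og_data, applied to the static sample.
title = re.search(r'og:title"\s*content="([^"]*)"', sample)
desc = html.unescape(re.search(r'og:description"\s*content="([^"]*)"', sample).group(1))
likes = re.search(r'([\d,]+)\s+likes?', desc)
talking = re.search(r'([\d,]+)\s+talking about this', desc)
bio = re.search(r'talking about this\.\s*(.+)', desc)

print(title.group(1), likes.group(1), talking.group(1), bio.group(1))
# NVIDIA 1,234 56 GPUs and AI.
```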

def extract_plugin_data(username):
    """Extract data from the Page Plugin embed."""
    content = curl_get(f'https://www.facebook.com/plugins/page.php?href=https://www.facebook.com/{username}&tabs=timeline&width=500&height=600')

    data = {}

    # Page name from title attribute
    name_match = re.search(r'class="_1drp _5lv6" title="([^"]*)"', content)
    if name_match:
        data['plugin_name'] = html.unescape(name_match.group(1))

    # Follower count
    followers_match = re.search(r'([\d,]+)\s+followers', content)
    if followers_match:
        data['followers'] = followers_match.group(1)

    # Page ID
    pageid_match = re.search(r'"pageID":"(\d+)"', content)
    if pageid_match:
        data['page_id'] = pageid_match.group(1)

    return data


def extract_profile_picture(username):
    """Get the profile picture via the Graph API."""
    content = curl_get(f'https://graph.facebook.com/v19.0/{username}/picture?redirect=false&width=400&height=400')
    try:
        d = json.loads(content)
        if 'data' in d and not d['data'].get('is_silhouette', True):
            return d['data']['url']
    except (json.JSONDecodeError, KeyError):
        # Non-JSON response or unexpected shape: treat as "no picture"
        pass
    return None


def get_facebook_data(username):
    """Combine all extraction methods."""
    result = {'username': username}

    # Method 1: OG tags (best for bio, likes, talking_about)
    og = extract_og_data(username)
    result.update(og)

    # Method 2: Plugin (best for followers, page_id)
    plugin = extract_plugin_data(username)
    result.update(plugin)

    # Method 3: Graph API picture (pages only)
    pic = extract_profile_picture(username)
    if pic:
        result['profile_picture'] = pic

    # Also try by page_id for picture if username didn't work
    if not pic and 'page_id' in result:
        pic2 = extract_profile_picture(result['page_id'])
        if pic2:
            result['profile_picture'] = pic2

    return result


if __name__ == '__main__':
    targets = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'NVIDIA', 'Meta', 'CocaCola']

    for target in targets:
        print(f"{'='*60}")
        print(f"Facebook Profile: {target}")
        print(f"{'='*60}")
        data = get_facebook_data(target)
        for k, v in data.items():
            if k == 'raw_description':
                continue  # Skip raw, we show parsed fields
            val = str(v)
            if len(val) > 120:
                val = val[:120] + '...'
            print(f"  {k}: {val}")
        print()
optional-skills/worldsim/scripts/research.py (new file, 595 lines)
"""
Hermes Simulator — Intelligence Gathering Pipeline v2

Full-spectrum OSINT research engine for personality modeling.
Searches text, extracts content, browses live pages, analyzes
images with vision, and cross-references across platforms.

Run via execute_code. The agent adapts searches based on findings.
"""

from hermes_tools import web_search, web_extract, terminal
import json
import time
import urllib.parse

# ═══════════════════════════════════════════════════════════════
# CONFIGURATION
# ═══════════════════════════════════════════════════════════════

AGGREGATOR_SITES = [
    "buttondown.com/ainews",
    "news.smol.ai",
    "techmeme.com",
    "latent.space",
]

# Verified working fallback data sources (tested April 2026)
# Priority order: X API > nitter.cz > ThreadReaderApp > GitHub > Reddit > HN
FALLBACK_SOURCES = {
    "nitter": "https://nitter.cz/{handle}",                           # web_extract — full timeline
    "threadreader": "https://threadreaderapp.com/user/{handle}",      # web_extract — historical threads
    "github_profile": "https://api.github.com/users/{handle}",        # curl — profile + README
    "github_events": "https://api.github.com/users/{handle}/events",  # curl — recent activity
    "reddit_user": "https://www.reddit.com/user/{handle}.json",       # curl w/ User-Agent
    "reddit_comments": "https://www.reddit.com/user/{handle}/comments.json",
    "hn_search": "https://hn.algolia.com/api/v1/search?query={handle}&tags=comment",
}

# CONFIRMED BLOCKED (don't waste calls on these):
# - LinkedIn (web_extract blocked, browser auth wall)
# - Instagram viewers (imginn, picuki, dumpoir, gramhir — all 403)
# - Most nitter instances (dead or 403, ONLY nitter.cz works via web_extract)
# - Wayback Machine for tweets (sparse, no JS content)
# - Google Cache of Twitter (empty)
# - Archive.today (429 + CAPTCHA)
# - Twitter Syndication API (rate limited)

AI_SUBREDDITS = [
    "LocalLLaMA", "MachineLearning", "singularity",
    "ChatGPT", "ClaudeAI", "OpenAI", "StableDiffusion",
]

PLATFORMS = ["twitter", "instagram", "linkedin", "github", "reddit", "youtube"]

# ═══════════════════════════════════════════════════════════════
# HELPER: safe web_search with validation
# ═══════════════════════════════════════════════════════════════

def _safe_web_search(query: str, limit: int = 5) -> list:
    """Run web_search and return results list, with validation."""
    r = web_search(query, limit=limit)
    if not isinstance(r, dict) or "data" not in r:
        print(f"  [WARNING] web_search returned no 'data' key for query: {query[:80]}")
        return []
    data = r.get("data", {})
    if not isinstance(data, dict):
        return []
    return data.get("web", []) or []

# ═══════════════════════════════════════════════════════════════
# CORE SEARCH FUNCTIONS
# ═══════════════════════════════════════════════════════════════

def search_identity(handle: str) -> dict:
    """Establish who they are across the internet."""
    results = {}
    results["twitter_identity"] = _safe_web_search(f"@{handle} twitter bio role company", limit=5)
    results["general_identity"] = _safe_web_search(f"{handle} known for", limit=5)
    return results


def search_voice(handle: str) -> dict:
    """How do they actually talk/write."""
    results = {}
    results["takes"] = _safe_web_search(f"{handle} twitter hot takes opinions", limit=5)

    for agg in AGGREGATOR_SITES[:2]:
        hits = _safe_web_search(f"site:{agg} {handle}", limit=3)
        if hits:
            # Use full domain as key, not split('.')[0]
            results[f"agg_{agg}"] = hits
    return results


def search_positions(handle: str, topics: list = None, domain: str = None) -> dict:
    """What are their known positions."""
    results = {}
    if topics:
        for topic in topics[:3]:
            results[f"topic_{topic}"] = _safe_web_search(f"{handle} {topic} opinion take", limit=5)

    # Build controversy query — only add domain keywords if specified
    controversy_query = f"{handle} debate disagree controversial"
    if domain:
        controversy_query += f" {domain}"
    results["controversies"] = _safe_web_search(controversy_query, limit=5)
    return results


def search_longform(handle: str, real_name: str = None, domain: str = None) -> dict:
    """Blogs, interviews, essays."""
    results = {}
    name = real_name or handle

    blog_query = f"{name} blog substack essay"
    interview_query = f"{name} interview podcast"
    if domain:
        blog_query += f" {domain}"
        interview_query += f" {domain}"

    results["blogs"] = _safe_web_search(blog_query, limit=5)
    results["interviews"] = _safe_web_search(interview_query, limit=5)
    return results


# ═══════════════════════════════════════════════════════════════
# CROSS-PLATFORM DISCOVERY
# ═══════════════════════════════════════════════════════════════

def discover_platforms(handle: str, real_name: str = None) -> dict:
    """Find someone across all platforms."""
    name = real_name or handle
    results = {}

    # Instagram
    results["instagram"] = _safe_web_search(f"{name} instagram OR site:instagram.com/{handle}", limit=5)

    # LinkedIn
    results["linkedin"] = _safe_web_search(f"{name} linkedin OR site:linkedin.com/in", limit=5)

    # Reddit
    results["reddit"] = _safe_web_search(f"{name} reddit account OR site:reddit.com/user", limit=5)

    # GitHub
    results["github"] = _safe_web_search(f"{handle} github OR site:github.com/{handle}", limit=5)

    # YouTube
    results["youtube"] = _safe_web_search(f"{name} youtube channel OR talk OR interview", limit=5)

    # Personal site
    results["personal_site"] = _safe_web_search(f"{name} personal website blog about", limit=5)

    # Hacker News
    results["hackernews"] = _safe_web_search(f"site:news.ycombinator.com {handle} OR {name}", limit=3)

    return results


def discover_instagram(handle: str = None, real_name: str = None) -> dict:
    """Focused Instagram discovery."""
    results = {}
    name = real_name or handle

    # Try to find their IG handle
    results["ig_search"] = _safe_web_search(f"{name} instagram profile", limit=5)

    # If we have a candidate IG URL, try to extract
    ig_urls = []
    for item in results.get("ig_search", []):
        if not isinstance(item, dict):
            continue
        url = item.get("url", "")
        if "instagram.com/" in url and "/p/" not in url:
            ig_urls.append(url)

    if ig_urls:
        # Try to extract IG profile page
        r = web_extract(urls=ig_urls[:1])
        results["ig_profile"] = r.get("results", [])

    return results


# ═══════════════════════════════════════════════════════════════
# VISUAL INTELLIGENCE
# ═══════════════════════════════════════════════════════════════

# NOTE: These tasks use browser_* and vision_analyze which are
# NOT available in execute_code. They are called DIRECTLY by the
# agent after the execute_code research phase.
#
# The agent should:
# 1. Run this script via execute_code for text-based research
# 2. Then use browser/vision tools directly for visual research
#
# Visual research tasks for the agent:
#
# INSTAGRAM VISUAL:
#   browser_navigate("https://www.instagram.com/{ig_handle}/")
#   browser_vision(question="Describe this Instagram profile: bio, pic, grid, aesthetic, follower count")
#   browser_get_images()  # collect image URLs
#   vision_analyze(image_url="{url}", question="Describe: setting, people, mood, style")
#
# PROFILE PIC ANALYSIS:
#   vision_analyze(image_url="{pic_url}", question="Describe: appearance, clothing, setting, expression, professional vs casual")
#
# REVERSE IMAGE SEARCH (Yandex):
#   # Upload to catbox if behind auth:
#   terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{path}' https://catbox.moe/user/api.php")
#   browser_navigate(f"https://yandex.com/images/search?rpt=imageview&url={encoded_url}")
#
# PAGE SCREENSHOT ANALYSIS:
#   browser_vision(question="Read all text, usernames, post content, dates, engagement numbers")


# ═══════════════════════════════════════════════════════════════
# INTERACTION MAPPING
# ═══════════════════════════════════════════════════════════════

def search_interactions(handle: str, other_handles: list = None) -> dict:
    """How they interact with other simulation targets."""
    results = {}
    if other_handles:
        for other in other_handles[:4]:
            hits = _safe_web_search(f"{handle} {other} twitter interaction debate reply", limit=3)
            if hits:
                results[f"with_{other}"] = hits
    return results


def search_social_graph(handle: str) -> dict:
    """Who do they interact with most? Allies and rivals."""
    results = {}

    results["frequent_interactions"] = _safe_web_search(f"@{handle} twitter reply thread conversation with", limit=5)
    results["conflicts"] = _safe_web_search(f"@{handle} disagree argue beef ratio", limit=5)
    results["allies"] = _safe_web_search(f"@{handle} agree support endorse recommend", limit=5)

    return results


# ═══════════════════════════════════════════════════════════════
# DEEP EXTRACTION
# ═══════════════════════════════════════════════════════════════

def extract_content(urls: list) -> list:
    """Pull full content from high-value URLs."""
    if not urls:
        return []
    r = web_extract(urls=urls[:3])
    return r.get("results", [])


def extract_best_urls(findings: dict, max_urls: int = 5) -> list:
    """Find the most promising URLs in research findings for deep extraction."""
    seen_urls = set()  # URL deduplication
    priority_domains = [
        "substack.com", "medium.com", "blog", "essay",
        "interview", "podcast", "youtube.com", "arxiv.org",
    ]

    def score_url(url, desc):
        score = 0
        for domain in priority_domains:
            if domain in url.lower() or domain in desc.lower():
                score += 2
        if any(w in desc.lower() for w in ["interview", "spoke", "told", "said", "wrote"]):
            score += 1
        return score

    candidates = []

    def collect(obj):
        if isinstance(obj, list):
            for item in obj:
                if isinstance(item, dict):
                    url = item.get("url") or ""
                    desc = item.get("description") or item.get("text") or ""
                    if url and url not in seen_urls and not any(x in url for x in ["x.com", "twitter.com", "instagram.com"]):
                        seen_urls.add(url)
                        candidates.append((score_url(url, desc), url))
        elif isinstance(obj, dict):
            for v in obj.values():
                collect(v)

    collect(findings)
    candidates.sort(key=lambda x: -x[0])
    return [url for _, url in candidates[:max_urls]]
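The scoring heuristic is easy to trace in isolation. A self-contained sketch with a trimmed priority list (the list and URLs here are illustrative, not the module's full configuration):

```python
priority_domains = ["substack.com", "medium.com", "blog", "interview"]


def score_url(url: str, desc: str) -> int:
    # +2 per priority domain hit in URL or description, +1 for quote-y verbs.
    score = 0
    for domain in priority_domains:
        if domain in url.lower() or domain in desc.lower():
            score += 2
    if any(w in desc.lower() for w in ["interview", "spoke", "told", "said", "wrote"]):
        score += 1
    return score


# Longform sources outrank generic news hits.
print(score_url("https://example.substack.com/p/essay", "She wrote about agents"))  # 3
print(score_url("https://news.site/article", "Launch announcement"))                # 0
```

Together with the `seen_urls` set and the social-domain exclusion list, this biases deep extraction toward unique longform writing rather than tweet mirrors.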

# ═══════════════════════════════════════════════════════════════
# MAIN PIPELINE
# ═══════════════════════════════════════════════════════════════

def research_person(handle: str, fidelity: int = 70,
                    topics: list = None,
                    other_handles: list = None,
                    real_name: str = None,
                    domain: str = None) -> dict:
    """
    Full research pipeline for one person.
    Returns dict with all findings organized by category.

    Args:
        handle: Twitter/X handle (without @)
        fidelity: Research depth 0-100
        topics: Specific topics to research
        other_handles: Other people to check interactions with
        real_name: Real name if different from handle
        domain: Domain context (e.g., 'AI', 'politics', 'gaming').
            When None, no domain keywords are added to searches.
            When set, adds relevant domain keywords.
    """
    print(f"\n{'='*60}")
    print(f"  RESEARCHING: @{handle} | Fidelity: {fidelity}%")
    if domain:
        print(f"  Domain: {domain}")
    print(f"{'='*60}")

    findings = {"handle": handle, "fidelity": fidelity, "visual_tasks": []}

    # ─── Phase 1: Identity (always) ───
    print("\n  [IDENTITY] Who are they...")
    findings["identity"] = search_identity(handle)

    if fidelity <= 30:
        if topics:
            findings["quick_topic"] = _safe_web_search(f"{handle} {topics[0]}", limit=3)
        return findings

    # ─── Phase 2: Voice (fidelity 31+) ───
    print("\n  [VOICE] How do they talk...")
    findings["voice"] = search_voice(handle)

    # ─── Phase 3: Positions (fidelity 31+) ───
    print("\n  [POSITIONS] What do they believe...")
    findings["positions"] = search_positions(handle, topics, domain=domain)

    if fidelity <= 50:
        return findings

    # ─── Phase 4: Cross-platform (fidelity 51+) ───
    print("\n  [PLATFORMS] Finding them everywhere...")
    findings["platforms"] = discover_platforms(handle, real_name)

    if fidelity <= 70:
        return findings

    # ─── Phase 5: Longform (fidelity 71+) ───
    print("\n  [LONGFORM] Blogs, interviews, essays...")
    findings["longform"] = search_longform(handle, real_name, domain=domain)

    # ─── Phase 6: Social graph (fidelity 71+) ───
    print("\n  [SOCIAL GRAPH] Who do they interact with...")
    findings["social_graph"] = search_social_graph(handle)

    # ─── Phase 7: Interaction mapping (fidelity 71+) ───
    if other_handles:
        print(f"\n  [INTERACTIONS] With other targets: {other_handles}...")
        findings["interactions"] = search_interactions(handle, other_handles)

    # ─── Phase 8: Instagram deep dive (fidelity 80+) ───
    if fidelity >= 80:
        print("\n  [INSTAGRAM] Visual identity...")
        findings["instagram"] = discover_instagram(handle, real_name)

        # Queue visual tasks for the agent to do after execute_code
        findings["visual_tasks"].append({
            "type": "instagram_profile",
            "instruction": "browser_navigate to Instagram profile, use browser_vision to analyze",
            "handle": handle,
        })

    # ─── Phase 9: Deep extraction (fidelity 85+) ───
    if fidelity >= 85:
        print("\n  [DEEP EXTRACT] Pulling longform content...")
        best_urls = extract_best_urls(findings, max_urls=4)
        if best_urls:
            print(f"    Extracting {len(best_urls)} URLs: {best_urls}")
            findings["deep_extracts"] = extract_content(best_urls)

    # ─── Phase 10: Profile pic analysis (fidelity 90+) ───
    if fidelity >= 90:
        findings["visual_tasks"].append({
            "type": "profile_pic_analysis",
            "instruction": "Find and analyze profile pictures across platforms with vision_analyze",
            "handle": handle,
        })
        findings["visual_tasks"].append({
            "type": "reverse_image_search",
            "instruction": "Reverse image search profile pic via Yandex to find alt accounts",
            "handle": handle,
        })

    return findings


def research_all(handles: list, fidelity: int = 70,
                 topics: list = None, domain: str = None) -> dict:
    """Research all simulation targets."""
    all_findings = {}

    for handle in handles:
        clean = handle.lstrip("@")
        others = [h.lstrip("@") for h in handles if h.lstrip("@") != clean]

        findings = research_person(
            handle=clean,
            fidelity=fidelity,
            topics=topics,
            other_handles=others,
            domain=domain,
        )
        all_findings[clean] = findings

    return all_findings


# ═══════════════════════════════════════════════════════════════
# REPORTING
# ═══════════════════════════════════════════════════════════════

def count_data_points(obj) -> int:
    """Count total search result items in findings (only meaningful items with >50 char text)."""
    total = 0
    if isinstance(obj, list):
        for item in obj:
            if isinstance(item, dict):
                text = item.get("description") or item.get("text") or ""
                if len(text) > 50:
                    total += 1
            else:
                # Still count non-dict items or items without text fields
                total += 1
    else:
        total += 1
|
||||
elif isinstance(obj, dict):
|
||||
for k, v in obj.items():
|
||||
# Skip metadata keys
|
||||
if k in ("handle", "fidelity", "visual_tasks"):
|
||||
continue
|
||||
total += count_data_points(v)
|
||||
return total
|
||||
|
||||
|
||||
def count_quality_data_points(obj) -> int:
|
||||
"""Count search result items with substantial text (description/text > 50 chars)."""
|
||||
total = 0
|
||||
if isinstance(obj, list):
|
||||
for item in obj:
|
||||
if isinstance(item, dict):
|
||||
text = item.get("description") or item.get("text") or ""
|
||||
if len(text) > 50:
|
||||
total += 1
|
||||
elif isinstance(obj, dict):
|
||||
for k, v in obj.items():
|
||||
if k in ("handle", "fidelity", "visual_tasks"):
|
||||
continue
|
||||
total += count_quality_data_points(v)
|
||||
return total

def summarize_findings(findings: dict) -> str:
    """Compact summary of what we found."""
    handle = findings.get("handle", "unknown")
    fidelity = findings.get("fidelity", 0)
    total = count_data_points(findings)
    quality = count_quality_data_points(findings)
    visual_tasks = findings.get("visual_tasks", [])

    lines = [
        f"\n{'━'*60}",
        f" @{handle} | Fidelity: {fidelity}% | Data points: {total} ({quality} quality)",
        f"{'━'*60}",
    ]

    # Identity snippets
    identity = findings.get("identity", {})
    for key in ["twitter_identity", "general_identity"]:
        for item in identity.get(key, [])[:2]:
            if not isinstance(item, dict):
                continue
            desc = (item.get("description") or "")[:180]
            if desc:
                lines.append(f" [{key.upper()}] {desc}")

    # Platform discovery results
    platforms = findings.get("platforms", {})
    found_platforms = []
    for platform, items in platforms.items():
        if isinstance(items, list) and len(items) > 0:
            found_platforms.append(platform)
    if found_platforms:
        lines.append(f" [PLATFORMS FOUND] {', '.join(found_platforms)}")

    # Voice samples from aggregators
    voice = findings.get("voice", {})
    for key, items in voice.items():
        if isinstance(items, list):
            for item in items[:1]:
                if not isinstance(item, dict):
                    continue
                desc = (item.get("description") or "")[:180]
                if desc and handle.lower() in desc.lower():
                    lines.append(f" [VOICE] {desc}")

    # Deep extracts
    for extract in findings.get("deep_extracts", [])[:2]:
        if not isinstance(extract, dict):
            continue
        title = extract.get("title", "untitled")
        content = (extract.get("content") or "")[:200]
        if content:
            lines.append(f" [LONGFORM: {title}] {content}...")

    # Pending visual tasks
    if visual_tasks:
        lines.append(f" [VISUAL TASKS QUEUED] {len(visual_tasks)} tasks for agent to execute:")
        for task in visual_tasks:
            lines.append(f" → {task.get('type', '?')}: {task.get('instruction', '?')[:80]}")

    # Confidence estimate — based on quality data points
    if quality >= 30:
        conf = "HIGH"
    elif quality >= 15:
        conf = "MEDIUM"
    elif quality >= 5:
        conf = "LOW"
    else:
        conf = "INSUFFICIENT"
    lines.append(f"\n CONFIDENCE: {conf} ({quality} quality data points, {total} total)")

    return "\n".join(lines)
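The confidence ladder is a simple threshold walk; a sketch with the same cutoffs (30/15/5) as the summary code:

```python
def confidence(quality: int) -> str:
    """Map a quality data-point count to a confidence grade."""
    for floor, label in ((30, "HIGH"), (15, "MEDIUM"), (5, "LOW")):
        if quality >= floor:
            return label
    return "INSUFFICIENT"

print([confidence(q) for q in (42, 20, 7, 2)])
# → ['HIGH', 'MEDIUM', 'LOW', 'INSUFFICIENT']
```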

def report_visual_tasks(all_findings: dict) -> str:
    """Collect all visual tasks across all targets for agent to execute."""
    lines = ["\n" + "═"*60, " VISUAL INTELLIGENCE TASKS (agent must execute directly)", "═"*60]

    any_tasks = False
    for handle, findings in all_findings.items():
        for task in findings.get("visual_tasks", []):
            any_tasks = True
            lines.append(f"\n @{handle} — {task.get('type', '?')}:")
            lines.append(f" {task.get('instruction', '?')}")

    if not any_tasks:
        lines.append(" No visual tasks queued (fidelity < 80)")

    return "\n".join(lines)


# ═══════════════════════════════════════════════════════════════
# CHECK AVAILABLE TOOLS
# ═══════════════════════════════════════════════════════════════

def check_x_cli() -> bool:
    """Check if x-cli is available."""
    try:
        r = terminal("which x-cli 2>/dev/null && echo 'FOUND' || echo 'NOT_FOUND'")
        out = r.get("output", "")
        # "NOT_FOUND" also contains the substring "FOUND", so rule it out first.
        return "NOT_FOUND" not in out and "FOUND" in out
    except Exception:
        return False


# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════

if __name__ == "__main__":
    # ── CONFIGURE THESE ──
    HANDLES = ["teknium1", "basedjensen"]
    FIDELITY = 80
    TOPICS = ["open source AI", "compute scaling"]
    DOMAIN = None  # Set to 'AI', 'politics', etc. to add domain keywords
    # ─────────────────────

    has_xcli = check_x_cli()
    print(f"x-cli available: {has_xcli}")
    print(f"Targets: {HANDLES}")
    print(f"Fidelity: {FIDELITY}%")
    print(f"Topics: {TOPICS}")
    print(f"Domain: {DOMAIN}")

    results = research_all(HANDLES, fidelity=FIDELITY, topics=TOPICS, domain=DOMAIN)

    for handle, findings in results.items():
        print(summarize_findings(findings))

    print(report_visual_tasks(results))
    print("\n\nResearch phase complete. Agent should now:")
    print("1. Execute any queued visual tasks (browser/vision)")
    print("2. Compile dossiers from all findings")
    print("3. Run simulation")
238
optional-skills/worldsim/scripts/threads_api.py
Normal file
@@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Threads (Meta) Profile & Post Extractor
========================================
Extracts profile data and post content from Threads using:
1. OG meta tags from HTML (no auth required for profiles and public posts)
2. WebFinger for ActivityPub discovery
3. Google-indexed post URLs for recent post discovery

METHODS THAT WORK:
- Profile pages at threads.net/@{user} have OG tags with:
  display_name, username, follower_count, thread_count, bio, profile_pic
- Individual post pages have OG tags with:
  full post text, author info, profile pic
- WebFinger at /.well-known/webfinger gives ActivityPub user IDs
- Post URLs must be known (discoverable via web search)

METHODS THAT DON'T WORK (as of 2025):
- Threads Official API (graph.threads.net) requires OAuth token
- ActivityPub /ap/users/ endpoints return 404 for most users
- No public post listing endpoint exists
"""

import re
import json
import html
import subprocess
import sys


def curl_fetch(url, extra_headers=None, timeout=15):
    """Fetch URL using curl (more reliable than urllib for Threads)."""
    cmd = ['curl', '-s', '-L', '--max-time', str(timeout)]
    if extra_headers:
        for k, v in extra_headers.items():
            cmd.extend(['-H', f'{k}: {v}'])
    cmd.append(url)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
        return result.stdout
    except (subprocess.SubprocessError, OSError):
        return None


def extract_og_tags(html_content):
    """Extract OpenGraph, meta description, and Twitter tags from HTML."""
    data = {}
    if not html_content:
        return data

    for m in re.finditer(r'property="(og:[^"]+)"\s+content="([^"]*)"', html_content):
        key = m.group(1)
        val = html.unescape(m.group(2))
        if key not in data:
            data[key] = val

    for m in re.finditer(r'name="description"\s+content="([^"]*)"', html_content):
        data['description'] = html.unescape(m.group(1))
        break

    for m in re.finditer(r'name="(twitter:[^"]+)"\s+content="([^"]*)"', html_content):
        key = m.group(1)
        val = html.unescape(m.group(2))
        if key not in data:
            data[key] = val

    return data
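The OG extraction is just a regex over `<meta property="og:..." content="...">` pairs plus HTML entity unescaping. A minimal sketch on an invented snippet (the sample HTML is not real Threads markup):

```python
import re
import html

OG_RE = re.compile(r'property="(og:[^"]+)"\s+content="([^"]*)"')

sample = '<meta property="og:title" content="Zuck (&#064;zuck) &bull; Threads" />'
tags = {m.group(1): html.unescape(m.group(2)) for m in OG_RE.finditer(sample)}
print(tags["og:title"])  # → Zuck (@zuck) • Threads
```

`html.unescape` resolves both numeric (`&#064;`) and named (`&bull;`) entities, which is why the extractor applies it to every tag value.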

def parse_profile_description(desc):
    """Parse '5.5M Followers • 142 Threads • Bio. See the latest...' format."""
    result = {}
    if not desc:
        return result

    parts = desc.split(' \u2022 ')  # Split on " • " (bullet)
    for part in parts:
        part = part.strip()
        if 'Follower' in part:
            result['followers'] = part.split(' Follower')[0].strip()
        elif part.endswith('Threads') or part.endswith('Thread'):
            result['thread_count'] = part.split(' Thread')[0].strip()
        else:
            bio = re.sub(r'\s*See the latest conversations.*$', '', part)
            if bio:
                result['bio'] = bio

    return result
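A self-contained sketch of that bullet-separated parsing, run on an invented description string:

```python
import re

def parse_desc(desc):
    """Split a 'X Followers • N Threads • Bio...' OG description into fields."""
    out = {}
    for part in (p.strip() for p in desc.split(' \u2022 ')):
        if 'Follower' in part:
            out['followers'] = part.split(' Follower')[0].strip()
        elif part.endswith(('Thread', 'Threads')):
            out['thread_count'] = part.split(' Thread')[0].strip()
        else:
            # Strip the trailing Threads boilerplate from the bio segment
            bio = re.sub(r'\s*See the latest conversations.*$', '', part)
            if bio:
                out['bio'] = bio
    return out

print(parse_desc('5.5M Followers \u2022 142 Threads \u2022 Building things. See the latest conversations.'))
# → {'followers': '5.5M', 'thread_count': '142', 'bio': 'Building things.'}
```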

def parse_profile_title(title):
    """Parse 'Display Name (@user) • Threads, Say more' format."""
    result = {}
    if not title:
        return result
    m = re.match(r'^(.+?)\s*\(@(\w+)\)', title)
    if m:
        result['display_name'] = m.group(1).strip()
        result['username'] = m.group(2)
    return result


def get_threads_profile(username):
    """
    Get Threads profile data via OG meta tags.
    Returns dict with: username, display_name, bio, followers, thread_count,
    profile_picture_url, url
    """
    username = username.lstrip('@')
    url = f'https://www.threads.net/@{username}'

    content = curl_fetch(url)
    tags = extract_og_tags(content)

    if not tags or 'og:title' not in tags:
        return {'error': 'Failed to fetch or parse profile', 'username': username}

    title = tags.get('og:title', '')
    if title.startswith('Threads') and 'Log in' in title:
        return {'error': 'Profile requires login or not found', 'username': username}

    result = {
        'platform': 'threads',
        'url': url,
    }
    result.update(parse_profile_title(title))
    result.update(parse_profile_description(tags.get('og:description', '')))

    if 'og:image' in tags:
        result['profile_picture_url'] = tags['og:image']

    return result


def get_threads_webfinger(username):
    """Get WebFinger data (ActivityPub discovery) for a Threads user."""
    username = username.lstrip('@')
    url = f'https://www.threads.net/.well-known/webfinger?resource=acct:{username}@threads.net'

    content = curl_fetch(url, {'Accept': 'application/json'})
    if not content:
        return None

    try:
        data = json.loads(content)
        if 'error' in data or ('success' in data and not data['success']):
            return None

        result = {'subject': data.get('subject', '')}
        for link in data.get('links', []):
            if link.get('type') == 'application/activity+json':
                result['activitypub_url'] = link['href']
            elif link.get('rel') == 'http://webfinger.net/rel/profile-page':
                result['profile_url'] = link['href']
        return result
    except (json.JSONDecodeError, KeyError):
        return None


def get_thread_post(post_url):
    """
    Get content of a specific Threads post via OG tags.
    Returns: text, author, image_url
    """
    content = curl_fetch(post_url)
    tags = extract_og_tags(content)

    if not tags or 'og:title' not in tags:
        return {'error': 'Failed to fetch post'}

    title = tags.get('og:title', '')
    if 'Log in' in title:
        return {'error': 'Post requires login or not found'}

    result = {'url': post_url}

    if 'og:description' in tags:
        result['text'] = tags['og:description']
    elif 'description' in tags:
        result['text'] = tags['description']

    # Parse "Display Name (@username) on Threads"
    m = re.match(r'^(.+?)\s*\(@(\w+)\)\s+on\s+Threads', title)
    if m:
        result['author_name'] = m.group(1).strip()
        result['author_username'] = m.group(2)

    if 'og:image' in tags:
        result['image_url'] = tags['og:image']

    return result
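The author parse relies on the post title following a fixed "Display Name (@username) on Threads" shape. A quick check of that regex on an invented title:

```python
import re

# Non-greedy name capture stops at the "(@handle)" group.
TITLE_RE = re.compile(r'^(.+?)\s*\(@(\w+)\)\s+on\s+Threads')

m = TITLE_RE.match('Mark Zuckerberg (@zuck) on Threads')
print(m.group(1), m.group(2))  # → Mark Zuckerberg zuck
```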

def get_threads_full(username):
    """Get complete profile data combining all methods."""
    profile = get_threads_profile(username)
    wf = get_threads_webfinger(username)

    if wf:
        profile['webfinger'] = wf

    return profile


# ===== TEST =====
if __name__ == '__main__':
    test_users = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'nvidia', 'mosseri']

    for user in test_users:
        print(f"\n{'='*60}")
        print(f" THREADS PROFILE: @{user}")
        print(f"{'='*60}")

        data = get_threads_full(user)
        for k, v in sorted(data.items()):
            if k == 'profile_picture_url':
                print(f" {k}: {str(v)[:80]}...")
            elif k == 'webfinger':
                print(" webfinger:")
                for wk, wv in v.items():
                    print(f" {wk}: {wv}")
            else:
                print(f" {k}: {v}")

    # Test posts
    post_urls = [
        'https://www.threads.net/@zuck/post/DEkvXzbyDS9',
    ]

    print(f"\n{'='*60}")
    print(" THREADS POSTS")
    print(f"{'='*60}")

    for purl in post_urls:
        print(f"\n URL: {purl}")
        post = get_thread_post(purl)
        for k, v in post.items():
            if k in ('image_url',):
                print(f" {k}: {str(v)[:80]}...")
            elif k == 'text':
                print(f" {k}: {v[:300]}{'...' if len(v) > 300 else ''}")
            else:
                print(f" {k}: {v}")
305
optional-skills/worldsim/scripts/tiktok_api.py
Normal file
@@ -0,0 +1,305 @@
"""
TikTok Profile & Video Data Scraper
====================================
WORKING methods to get full TikTok profile data and video content.
Tested and verified April 2026.

METHODS SUMMARY:
================
METHOD 1 (BEST): HTML SSR Scraping - Parse __UNIVERSAL_DATA_FOR_REHYDRATION__
- Gets: FULL profile (bio, stats, follower/following/heart/video counts)
- Works: YES - Reliable, no auth needed, just curl + parse
- Limitation: No video list on profile page (videos load client-side)

METHOD 2: oEmbed API - https://www.tiktok.com/oembed?url=...
- Gets: Video title/caption, author, thumbnail URL
- Works: YES - No auth, no rate limit issues
- Limitation: Need video IDs first; no engagement stats

METHOD 3: tikwm.com API - https://www.tikwm.com/api/
- Gets: Full user info + individual video stats (plays, likes, comments, shares)
- User info: https://www.tikwm.com/api/user/info?unique_id={username}
- Video info: https://www.tikwm.com/api/?url={tiktok_video_url}
- Works: YES for user info and single videos
- Limitation: Posts list endpoint returns 403 (rate-limited)

METHOD 4: Video ID Discovery via Search Engines
- Use web_search("site:tiktok.com/@{username}/video") to find video IDs
- Then use oEmbed or tikwm or HTML scraping per video
- Works: YES - Gets ~5 recent video IDs per search

METHOD 5: SocialBlade via web_extract
- URL: https://socialblade.com/tiktok/user/{username}
- Gets: Followers, following, likes, videos, growth trends, rankings
- Works: YES via web_extract tool

METHOD 6: Individual Video HTML Scraping
- Fetch https://www.tiktok.com/@{user}/video/{id}
- Parse __UNIVERSAL_DATA webapp.video-detail -> itemInfo.itemStruct
- Gets: FULL video data (caption, stats, music, hashtags, duration)
- Works: YES - Most complete per-video data

NOT WORKING:
- TikTok /api/user/detail/ endpoint -> returns empty (needs signed params)
- TikTok /api/post/item_list/ -> returns empty (needs x-bogus/msToken)
- tikwm.com /api/user/posts -> 403 forbidden
"""

import re
import json
import subprocess
import urllib.parse

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'


def fetch_url(url, headers=None):
    """Fetch URL via curl and return content."""
    cmd = ['curl', '-s', '-L', '-m', '30', url,
           '-H', f'User-Agent: {USER_AGENT}',
           '-H', 'Accept-Language: en-US,en;q=0.9']
    if headers:
        for k, v in headers.items():
            cmd.extend(['-H', f'{k}: {v}'])
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=35)
    return result.stdout


def method1_html_profile(username):
    """
    METHOD 1: Scrape TikTok profile HTML and parse SSR JSON data.
    Returns full profile with stats.
    """
    url = f'https://www.tiktok.com/@{username}'
    page = fetch_url(url)

    m = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
        page
    )
    if not m:
        return None

    data = json.loads(m.group(1))
    scope = data.get('__DEFAULT_SCOPE__', {})
    user_detail = scope.get('webapp.user-detail', {})
    user_info = user_detail.get('userInfo', {})

    if not user_info:
        return None

    user = user_info.get('user', {})
    stats = user_info.get('statsV2', user_info.get('stats', {}))

    return {
        'id': user.get('id'),
        'username': user.get('uniqueId'),
        'nickname': user.get('nickname'),
        'bio': user.get('signature'),
        'verified': user.get('verified'),
        'private': user.get('privateAccount'),
        'secUid': user.get('secUid'),
        'avatarLarger': user.get('avatarLarger'),
        'bioLink': user.get('bioLink', {}),
        'createTime': user.get('createTime'),
        'language': user.get('language'),
        'stats': {
            'followers': int(stats.get('followerCount', 0)),
            'following': int(stats.get('followingCount', 0)),
            'hearts': int(stats.get('heartCount', 0)),
            'videos': int(stats.get('videoCount', 0)),
            'diggs': int(stats.get('diggCount', 0)),
            'friends': int(stats.get('friendCount', 0)),
        }
    }
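The SSR extraction boils down to: regex out the `__UNIVERSAL_DATA_FOR_REHYDRATION__` script body, `json.loads` it, then walk `__DEFAULT_SCOPE__ -> webapp.user-detail -> userInfo`. A sketch on a fabricated minimal page (the real payload is much larger; the follower count here is invented):

```python
import re
import json

SSR_RE = re.compile(
    r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>'
)

page = ('<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">'
        '{"__DEFAULT_SCOPE__": {"webapp.user-detail": {"userInfo": '
        '{"user": {"uniqueId": "khaby.lame"}, "statsV2": {"followerCount": "162000000"}}}}}'
        '</script>')

blob = json.loads(SSR_RE.search(page).group(1))
info = blob['__DEFAULT_SCOPE__']['webapp.user-detail']['userInfo']
print(info['user']['uniqueId'], int(info['statsV2']['followerCount']))
```

Note that `statsV2` counts arrive as strings, which is why the profile scraper above wraps every count in `int()`.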

def method2_oembed_video(username, video_id):
    """
    METHOD 2: Get video caption/title via oEmbed.
    No auth needed. Returns caption, author, thumbnail.
    """
    url = f'https://www.tiktok.com/oembed?url=https://www.tiktok.com/@{username}/video/{video_id}'
    content = fetch_url(url)
    try:
        data = json.loads(content)
        return {
            'video_id': video_id,
            'title': data.get('title', ''),
            'author_name': data.get('author_name'),
            'author_url': data.get('author_url'),
            'thumbnail_url': data.get('thumbnail_url'),
            'thumbnail_width': data.get('thumbnail_width'),
            'thumbnail_height': data.get('thumbnail_height'),
        }
    except json.JSONDecodeError:
        return None


def method3_tikwm_user(username):
    """METHOD 3a: Get user info via tikwm.com API."""
    url = f'https://www.tikwm.com/api/user/info?unique_id={username}'
    content = fetch_url(url)
    try:
        data = json.loads(content)
        if data.get('code') == 0:
            return data['data']
    except json.JSONDecodeError:
        pass
    return None


def method3_tikwm_video(video_url):
    """
    METHOD 3b: Get video details via tikwm.com API.
    Returns: title, play_count, digg_count, comment_count, share_count, duration, download URLs
    """
    url = f'https://www.tikwm.com/api/?url={urllib.parse.quote(video_url)}'
    content = fetch_url(url)
    try:
        data = json.loads(content)
        if data.get('code') == 0:
            v = data['data']
            return {
                'video_id': v.get('id'),
                'title': v.get('title'),
                'duration': v.get('duration'),
                'play_count': v.get('play_count'),
                'likes': v.get('digg_count'),
                'comments': v.get('comment_count'),
                'shares': v.get('share_count'),
                'author': v.get('author', {}).get('unique_id'),
                'music_title': v.get('music_info', {}).get('title') if v.get('music_info') else None,
                'cover_url': v.get('origin_cover') or v.get('cover'),
                'play_url': v.get('play'),  # direct video URL
            }
    except json.JSONDecodeError:
        pass
    return None


def method6_html_video(username, video_id):
    """
    METHOD 6: Scrape individual video page HTML for full data.
    Gets: caption, full stats, music, hashtags, create time.
    """
    url = f'https://www.tiktok.com/@{username}/video/{video_id}'
    page = fetch_url(url)

    m = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
        page
    )
    if not m:
        return None

    data = json.loads(m.group(1))
    scope = data.get('__DEFAULT_SCOPE__', {})
    vd = scope.get('webapp.video-detail', {})
    item = vd.get('itemInfo', {}).get('itemStruct', {})

    if not item:
        return None

    stats = item.get('statsV2', item.get('stats', {}))
    music = item.get('music', {})
    challenges = item.get('challenges', [])

    return {
        'video_id': item.get('id'),
        'description': item.get('desc'),
        'createTime': item.get('createTime'),
        'duration': item.get('video', {}).get('duration'),
        'stats': {
            'plays': int(stats.get('playCount', 0)),
            'likes': int(stats.get('diggCount', 0)),
            'comments': int(stats.get('commentCount', 0)),
            'shares': int(stats.get('shareCount', 0)),
            'saves': int(stats.get('collectCount', 0)),
        },
        'music': {
            'title': music.get('title'),
            'author': music.get('authorName'),
        },
        'hashtags': [c.get('title', '') for c in challenges],
        'author': item.get('author', {}).get('uniqueId'),
    }


def get_full_tiktok_profile(username):
    """
    Complete pipeline: Get full profile + discover and scrape recent videos.

    Returns dict with profile data, stats, and recent video details.
    """
    # Step 1: Get profile data
    profile = method1_html_profile(username)
    if not profile:
        return {'error': f'Could not fetch profile for @{username}'}

    result = {
        'profile': profile,
        'videos': [],
        'data_sources': ['tiktok_html_ssr'],
    }

    # Note: Video discovery requires web_search tool (not available in pure Python)
    # In the agent context, use:
    #   web_search(f"site:tiktok.com/@{username}/video")
    # Then for each video ID found, call method6_html_video() or method2_oembed_video()

    return result


if __name__ == '__main__':
    import sys
    username = sys.argv[1] if len(sys.argv) > 1 else 'khaby.lame'

    print(f'=== Testing TikTok scraping for @{username} ===\n')

    print('--- METHOD 1: HTML Profile Scraping ---')
    profile = method1_html_profile(username)
    if profile:
        print(f' Username: {profile["username"]}')
        print(f' Nickname: {profile["nickname"]}')
        print(f' Bio: {profile["bio"][:100]}')
        print(f' Verified: {profile["verified"]}')
        print(f' Followers: {profile["stats"]["followers"]:,}')
        print(f' Following: {profile["stats"]["following"]:,}')
        print(f' Hearts: {profile["stats"]["hearts"]:,}')
        print(f' Videos: {profile["stats"]["videos"]:,}')
        print(f' SecUid: {profile["secUid"][:50]}...')
    else:
        print(' FAILED')

    print('\n--- METHOD 3a: tikwm.com User API ---')
    tikwm_user = method3_tikwm_user(username)
    if tikwm_user:
        s = tikwm_user.get('stats', {})
        print(f' Followers: {s.get("followerCount"):,}')
        print(f' Hearts: {s.get("heartCount"):,}')
        print(f' Videos: {s.get("videoCount"):,}')
    else:
        print(' FAILED')

    # Test with a known video
    test_video_id = '7615318641042623775'  # khaby birthday video
    if username == 'khaby.lame':
        print(f'\n--- METHOD 2: oEmbed for video {test_video_id} ---')
        oembed = method2_oembed_video(username, test_video_id)
        if oembed:
            print(f' Title: {oembed["title"][:80]}')

        print(f'\n--- METHOD 6: HTML Video Scraping for {test_video_id} ---')
        video = method6_html_video(username, test_video_id)
        if video:
            print(f' Description: {video["description"][:80]}')
            print(f' Plays: {video["stats"]["plays"]:,}')
            print(f' Likes: {video["stats"]["likes"]:,}')
            print(f' Comments: {video["stats"]["comments"]:,}')
            print(f' Shares: {video["stats"]["shares"]:,}')
            print(f' Hashtags: {video["hashtags"]}')

    print('\n=== DONE ===')
260
optional-skills/worldsim/scripts/x_api.py
Normal file
@@ -0,0 +1,260 @@
"""
Direct X/Twitter API v2 client for Hermes Simulator.
No x-cli dependency — uses curl via terminal() with bearer token.

Provides:
- get_user(handle) — profile, bio, metrics
- get_tweets(user_id, count) — recent tweets with metrics
- search_tweets(query, count) — search for tweets
- get_user_mentions(user_id, count) — mentions of a user
"""

from hermes_tools import terminal
import json
import os
import time
import urllib.parse

# Bearer token — loaded from the X_BEARER_TOKEN env var (empty string if unset)
BEARER = os.environ.get("X_BEARER_TOKEN", "")

MAX_RETRIES = 3
BASE_DELAY = 2  # seconds, exponential backoff: 2s, 4s, 8s


def _api_get(endpoint: str, params: dict = None) -> dict:
    """Make authenticated GET request to X API v2 with retry and error handling."""
    url = f"https://api.twitter.com/2/{endpoint}"
    if params:
        qs = "&".join(f"{k}={urllib.parse.quote(str(v))}" for k, v in params.items())
        url += f"?{qs}"

    for attempt in range(MAX_RETRIES):
        try:
            r = terminal(f'curl -s -w \'\\n%{{http_code}}\' -H "Authorization: Bearer {BEARER}" "{url}"')
            output = r.get("output", "").strip()

            # Split body from status code (last line)
            lines = output.rsplit("\n", 1)
            if len(lines) == 2:
                body, status_str = lines
            else:
                body = output
                status_str = "0"

            try:
                status_code = int(status_str.strip())
            except ValueError:
                status_code = 0

            # Handle specific status codes
            if status_code == 429:
                # Rate limited — retry with backoff
                delay = BASE_DELAY * (2 ** attempt)
                print(f" [X API] Rate limited (429). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
                time.sleep(delay)
                continue

            if status_code in (401, 403):
                return {"error": f"Authentication failed (HTTP {status_code}). Check X_BEARER_TOKEN.", "http_status": status_code}

            if status_code >= 500:
                delay = BASE_DELAY * (2 ** attempt)
                print(f" [X API] Server error ({status_code}). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
                time.sleep(delay)
                continue

            if status_code == 0 and not body:
                # Network error — no response at all
                delay = BASE_DELAY * (2 ** attempt)
                print(f" [X API] Network error. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
                time.sleep(delay)
                continue

            try:
                return json.loads(body)
            except json.JSONDecodeError:
                return {"error": f"Failed to parse response (HTTP {status_code}): {body[:200]}"}

        except Exception as e:
            delay = BASE_DELAY * (2 ** attempt)
            print(f" [X API] Exception: {e}. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
            time.sleep(delay)
            continue

    return {"error": f"All {MAX_RETRIES} retries exhausted for {endpoint}"}
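A quick check of the backoff schedule `_api_get` uses on retryable failures (same constants: `BASE_DELAY = 2`, `MAX_RETRIES = 3`):

```python
BASE_DELAY, MAX_RETRIES = 2, 3

# Delay doubles on each attempt: 2s, 4s, 8s before giving up.
delays = [BASE_DELAY * (2 ** attempt) for attempt in range(MAX_RETRIES)]
print(delays)  # → [2, 4, 8]
```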
|
||||
|
||||
|
||||
def get_user(handle: str) -> dict:
|
||||
"""Get user profile by handle."""
|
||||
handle = handle.lstrip("@")
|
||||
return _api_get(f"users/by/username/{handle}", {
|
||||
"user.fields": "description,public_metrics,profile_image_url,created_at,location,url"
|
||||
})
|
||||
|
||||
|
||||
def get_tweets(user_id: str, count: int = 20) -> dict:
|
||||
"""Get user's recent tweets."""
|
||||
return _api_get(f"users/{user_id}/tweets", {
|
||||
"max_results": max(min(count, 100), 5),
|
||||
"tweet.fields": "created_at,public_metrics,text,in_reply_to_user_id,referenced_tweets",
|
||||
"exclude": "retweets" # original tweets only for voice analysis
|
||||
})
|
||||
|
||||
|
||||
def get_tweets_with_rts(user_id: str, count: int = 20) -> dict:
|
||||
"""Get user's recent tweets including retweets (shows interests)."""
|
||||
return _api_get(f"users/{user_id}/tweets", {
|
||||
"max_results": max(min(count, 100), 5),
|
||||
"tweet.fields": "created_at,public_metrics,text,referenced_tweets"
|
||||
})
|
||||
|
||||
|
||||
def search_tweets(query: str, count: int = 10) -> dict:
|
||||
"""Search recent tweets."""
|
||||
return _api_get("tweets/search/recent", {
|
||||
"query": query,
|
||||
"max_results": max(min(count, 100), 10),
|
||||
"tweet.fields": "created_at,public_metrics,text,author_id"
|
||||
})
|
||||
|
||||
|
||||
def get_user_by_id(user_id: str) -> dict:
    """Get user profile by ID."""
    return _api_get(f"users/{user_id}", {
        "user.fields": "description,public_metrics,username,name"
    })


# ═══════════════════════════════════════════════════════════════
# HIGH-LEVEL INTELLIGENCE FUNCTIONS
# ═══════════════════════════════════════════════════════════════

def profile_user(handle: str) -> dict:
    """Full profile pull: identity + recent tweets (originals only)."""
    user = get_user(handle)
    if "errors" in user or "error" in user:
        return {"error": f"User @{handle} not found", "details": user}

    user_data = user.get("data", {})
    user_id = user_data.get("id")

    result = {
        "profile": user_data,
        "tweets": [],
        "voice_samples": [],
    }

    if user_id:
        # Get original tweets (no RTs) for voice analysis
        tweets = get_tweets(user_id, 20)
        tweet_list = tweets.get("data", [])
        result["tweets"] = tweet_list

        # Extract pure text samples for voice profiling.
        # Only exclude retweets and actual replies (has in_reply_to_user_id);
        # tweets starting with @ are fine if they're standalone mentions.
        result["voice_samples"] = [
            t["text"] for t in tweet_list
            if not t.get("text", "").startswith("RT @")
            and not t.get("in_reply_to_user_id")
        ]

    return result


def profile_interactions(handle1: str, handle2: str) -> dict:
    """Find interactions between two users."""
    # Search for replies from handle1 to handle2, and vice versa
    q1 = f"from:{handle1} to:{handle2}"
    q2 = f"from:{handle2} to:{handle1}"

    r1 = search_tweets(q1, 10)
    r2 = search_tweets(q2, 10)

    return {
        f"{handle1}_to_{handle2}": r1.get("data", []),
        f"{handle2}_to_{handle1}": r2.get("data", []),
    }


def get_voice_data(handle: str, count: int = 50) -> dict:
    """Pull maximum voice data: tweets, replies, quote tweets.

    Returns categorized samples for voice profiling."""
    user = get_user(handle)
    if "errors" in user or "error" in user:
        return {"error": f"User @{handle} not found"}

    user_data = user.get("data", {})
    user_id = user_data.get("id")
    if not user_id:
        return {"error": "No user ID found"}

    # Original tweets (exclude RTs)
    originals = get_tweets(user_id, min(count, 100))
    original_list = originals.get("data", [])

    # Categorize — only use in_reply_to_user_id to detect replies
    standalone = []  # not replies
    replies = []     # replies to others

    for t in original_list:
        text = t.get("text", "")
        if t.get("in_reply_to_user_id"):
            replies.append(text)
        else:
            standalone.append(text)

    return {
        "profile": user_data,
        "standalone_tweets": standalone,  # their voice at rest
        "replies": replies,               # their voice in conversation
        "total_samples": len(standalone) + len(replies),
    }


# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════

if __name__ == "__main__":
    if not BEARER:
        print("ERROR: X_BEARER_TOKEN not set. Set it in environment or ~/.hermes/.env")
        print("Trying to load from .env...")
        try:
            with open(os.path.expanduser("~/.hermes/.env")) as f:
                for line in f:
                    line = line.strip()
                    if line.startswith("X_BEARER_TOKEN="):
                        # Use split with maxsplit=1 to handle values with '=' in them.
                        # Also strip surrounding quotes if present.
                        val = line.split("=", 1)[1]
                        if val and val[0] in ('"', "'") and val[-1] == val[0]:
                            val = val[1:-1]
                        BEARER = val
                        break
        except Exception as e:
            print(f"  Failed to load .env: {e}")

    if not BEARER:
        print("FATAL: No bearer token found.")
        exit(1)

    # Demo: profile two users
    for handle in ["Teknium", "basedjensen"]:
        print(f"\n{'='*60}")
        print(f"  PROFILING @{handle}")
        print(f"{'='*60}")

        data = profile_user(handle)
        profile = data.get("profile", {})
        print(f"  Name: {profile.get('name')}")
        print(f"  Bio: {profile.get('description')}")
        metrics = profile.get("public_metrics", {})
        print(f"  Followers: {metrics.get('followers_count')}")
        print(f"  Tweets: {metrics.get('tweet_count')}")
        print(f"  Likes given: {metrics.get('like_count')}")

        print(f"\n  Voice samples ({len(data.get('voice_samples', []))}):")
        for sample in data.get("voice_samples", [])[:5]:
            print(f"    > {sample[:120]}")
136  optional-skills/worldsim/templates/dossier.md  Normal file

@@ -0,0 +1,136 @@
# DOSSIER: {display_name} (@{handle})

## Identity
- **Name**: {real_name}
- **Handle(s)**: @{twitter} | u/{reddit} | {discord_tag}
- **Role**: {role_and_org}
- **Known for**: {what_they_are_famous_for}
- **Followers/reach**: {approximate_follower_count}
- **Confidence**: {HIGH|MEDIUM|LOW} — {confidence_reason}

## Voice Profile

### Linguistic Patterns
- **Sentence structure**: {short_punchy | long_flowing | mixed}
- **Capitalization**: {normal | all_lowercase | CAPS_FOR_EMPHASIS | mixed}
- **Punctuation**: {heavy_periods | ellipsis_lover | no_punctuation | exclamation_marks}
- **Paragraph style**: {one_liners | thread_essays | medium_blocks}
- **Emoji/emoticon usage**: {none | minimal | heavy | specific_ones}

### Vocabulary & Slang
- **Register**: {academic | casual | shitposter | mixed}
- **Recurring words/phrases**: [list of signature words they use a lot]
- **Catchphrases**: [any repeated phrases or running jokes]
- **Profanity level**: {none | mild | moderate | heavy}
- **Jargon tendency**: {explains_everything | assumes_expertise | mixes}

### Tone
- **Default mood**: {earnest | ironic | combative | chill | manic | analytical}
- **Humor style**: {deadpan | absurdist | sarcastic | wholesome | shitpost | none}
- **How they handle disagreement**: {engages_thoughtfully | dunks | ignores | ratio_warrior | passive_aggressive}
- **How they handle praise**: {deflects | accepts_gracefully | awkward | flexes}
## Positions & Beliefs

### Core Convictions (things they consistently advocate for)
1. {conviction_1}
2. {conviction_2}
3. {conviction_3}

### Known Hot Takes
1. {take_1}
2. {take_2}

### Hills They'll Die On
1. {hill_1}
2. {hill_2}

### Topics They Avoid or Refuse to Engage
1. {avoidance_1}
## Social Dynamics

### People They Interact With Positively
- @{ally_1} — {relationship_description}
- @{ally_2} — {relationship_description}

### People They Beef With / Disagree With
- @{rival_1} — {beef_description}

### How They Engage Different Types
- **Fans/supporters**: {how_they_respond}
- **Critics**: {how_they_respond}
- **Peers**: {how_they_respond}
- **Random people**: {how_they_respond}
## Platform-Specific Behavior

### On Twitter/X
- **Post frequency**: {multiple_daily | daily | few_per_week}
- **Thread tendency**: {never | sometimes | loves_threads}
- **QRT style**: {adds_context | dunks | amplifies}
- **Engagement style**: {likes_a_lot | rarely_likes | retweets_heavy}

### On Reddit (if applicable)
- **Subreddits**: [list]
- **Comment style**: {detailed | brief | combative}

### On Discord (if applicable)
- **Servers**: [known servers]
- **Vibe shift from Twitter**: {description}
## Signature Moves
Things this person characteristically does that make them recognizable:
1. {signature_move_1}
2. {signature_move_2}
3. {signature_move_3}

## Sample Quotes (real, sourced from research)
> "{actual_quote_1}" — [source/context]

> "{actual_quote_2}" — [source/context]

> "{actual_quote_3}" — [source/context]
## Deep Psychometric Profile
- **Big Five**: O{H/M/L} C{} E{} A{} N{} — {evidence}
- **Moral Foundations**: Care{} Fair{} Loyal{} Auth{} Sanct{} Liberty{} — {what drives their ethics}
- **Schwartz Values**: {dominant values} — {how they justify positions}
- **Cognitive Style**: {IC score estimate} — {hedging patterns, complexity, analytical vs intuitive}
- **Narrative Frame**: {dominant frame} — {how they lens issues}
- **Persona Authenticity**: {1-5 score} — {evidence for curation vs authenticity}

## Strategic Self-Presentation (Red Hat)
- **Cultivated image**: {what they want to be seen as}
- **Target audience**: {who they're performing for}
- **Incentive structure**: {what they gain from this persona}
- **Possible divergences**: {where persona may ≠ person}
- **Ghostwriting indicators**: {present/absent, evidence}
## Ecosystem Context
- **Community cluster**: {which tribe they belong to}
- **Key influencers**: {who they amplify/follow/agree with}
- **Echo chamber**: {what information environment they're in}
- **Audience profile**: {who follows them, how that audience reacts}

## Key Assumptions
1. {assumption} — FRAGILITY: {robust/moderate/fragile} — Test: {what invalidates it}
2. {assumption} — FRAGILITY: {} — Test: {}
3. {assumption} — FRAGILITY: {} — Test: {}

## Competing Hypotheses
- **H1 (PRIMARY)**: {main personality model} — Confidence: {X}%
- **H2 (ALTERNATIVE)**: {alternative explanation} — Confidence: {X}%
- **Key discriminator**: {what evidence would shift between H1 and H2}

## Research Sources
- {source_1} [{reliability}{confidence}] — {description}
- {source_2} [{reliability}{confidence}] — {description}
- {source_3} [{reliability}{confidence}] — {description}

## Invalidation Indicators
1. If @{handle} {does X instead of Y}, our {assessment} is wrong
2. If @{handle} {responds to Z with Q}, our {model} needs revision
3. If @{handle} {interacts with @person in manner M}, dynamics model is off

---

*Dossier compiled: {date} | Fidelity: {fidelity}% | Persona Authenticity: {1-5}*

*Source reliability range: {best}-{worst} | Analytical confidence: {1-6}*