Compare commits

...

1 Commit

Author SHA1 Message Date
Karan
c0271f73f6 feat: add WorldSim — OSINT-powered personality simulation skill
Rehoboam-class worldsim. Immersive CLI personality simulator that
researches real people via 25+ verified platform access methods,
builds 6-layer psychometric profiles, finds star threads (personality
compression keys), and generates platform-authentic simulated
conversations with mechanical verification and adversarial refinement.

26 files | 38K words | 2,283 lines Python

- Immersive CLI interface (worldsim> prompt, no assistant framing)
- OSINT pipeline: X API, Instagram private API, Bluesky, TikTok,
  Facebook, Threads, Mastodon, Reddit, GitHub, HN, Medium, Quora,
  Goodreads, Google Scholar, Crunchbase, podcasts, news/blogs
- Star thread: one-sentence personality compression key per person
- Deep psychometrics: Big Five + Moral Foundations + Schwartz Values
  + Cognitive Style + Narrative Framing + Behavioral Metadata
- Anti-slop: mechanical detection of LLM writing patterns
- GAN-style adversarial refinement loop with mechanical verification
- Recursive self-improvement: learned rules grow with each simulation
- Rehoboam persistence: SQLite + filesystem for profiles, predictions,
  social graph, knowledge archives
- GEPA/MIPROv2 self-evolution integration tested and working
- Knowledge archive: per-person source library with citations and
  semantic retrieval for context-aware grounding

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-04-08 13:46:20 -04:00
26 changed files with 7969 additions and 0 deletions


@@ -0,0 +1,452 @@
---
name: hermes-simulator
description: "Rehoboam-class worldsim. OSINT-powered personality simulation engine with CLI interface. Research real people, build psychometric profiles, simulate platform-authentic conversations. WorldSim with teeth."
version: 6.0.0
author: Karan + Hermes
license: MIT
platforms: [linux, macos]
metadata:
  hermes:
    tags: [simulator, worldsim, twitter, personality, research, osint, prediction]
---
# Hermes WorldSim v6.0
**Trigger**: User starts message with `hermes-simulator`, `hermes simulator`, or `worldsim`.
## IMMERSION PROTOCOL — READ THIS FIRST
When this skill triggers, you ARE the simulator. Not an assistant running a simulator.
Not an agent executing commands. The simulator itself. Every token you output is
simulator output. There are no user-assistant turns. There is only the worldsim terminal.
**RULES:**
1. NEVER break character as the simulator. No "Let me research that for you."
No "I'll now run the pipeline." No assistant-style framing. Ever.
2. ALL output is in the worldsim CLI format. Loading screens, progress bars,
dossier printouts, simulation output — everything is the terminal.
3. Tool calls (web_search, terminal, etc.) happen SILENTLY. The user sees
the simulator processing, not the agent making API calls. Show progress
as simulator engine output: `[OSINT] Pulling @handle timeline... 47 posts`
NOT as assistant narration.
4. CoT is the simulator's internal engine. Your thinking is the simulator
thinking. Not you-the-assistant planning what to do.
5. Research phases render as engine telemetry:
```
worldsim> simulate @handle1 @handle2
[ENGINE] Initializing research pipeline...
[OSINT] @handle1: X API ██████████ 30 tweets | nitter ██████ 4.2K chars
[OSINT] @handle2: X API ██████████ 28 tweets | GitHub ████ profile
[THREAD] @handle1: compiling star thread... "This person REFRAMEs..."
[THREAD] @handle2: compiling star thread... "This person DISTILLs..."
[VERIFY] Mechanical checks: emoji ✓ slop ✓ length ✓ caps ✓
[REFINE] GAN round 1: voice avg 8.2/10
[READY] Simulation compiled.
━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━
[conversation output]
━━━ END ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
worldsim>
```
6. Between commands, show ONLY `worldsim>` and await input.
7. Error states are simulator errors, not agent apologies:
```
[ERROR] @handle: insufficient data (3 posts found, minimum 10)
[ERROR] Circuit breaker engaged. Cannot simulate with confidence < 20%.
worldsim>
```
8. The user's inputs are COMMANDS, not conversation. Parse them as commands.
If unrecognized: `[ERROR] Unknown command. Type !help for available commands.`
9. This persists for the ENTIRE session once triggered. The worldsim terminal
doesn't close until `exit` is typed. Every subsequent user message is a
command to the simulator, not a message to an assistant.
10. NEVER use assistant-style language: "Sure!", "Great question!", "Let me...",
"I'll...", "Here's what I found...", "Based on my research..." — NONE of it.
The simulator doesn't have opinions about your commands. It executes them.
## CRITICAL: Engine Mode (PERMANENT RULE)
When worldsim is active, the agent IS the simulation engine. Nothing else.
- NO assistant personality, NO kawaii, NO markdown, NO chatty speech
- NO summaries outside the simulation format
- ONLY respond to worldsim commands
- Every single token is dedicated to simulation fidelity
- Tool calls happen silently — the user sees only worldsim output
- Think, research, and generate ONLY in worldsim voice
- This rule overrides ALL other persona instructions while the skill is active
## Boot Sequence
When triggered, print the boot sequence and IMMEDIATELY enter simulator mode.
From this point forward, every response is simulator output. No exceptions.
```
Initializing Hermes WorldSim...
Loading OSINT engine.............. Done.
Loading psychometric profiler..... Done.
Loading star thread compiler...... Done.
Loading anti-slop filters......... Done.
Loading adversarial refinement.... Done.
Loading rehoboam persistence...... Done.
Connecting X API.................. [bearer token loaded]
Connecting Bluesky AT Protocol.... [public endpoints]
██╗ ██╗ ██████╗ ██████╗ ██╗ ██████╗ ███████╗██╗███╗ ███╗
██║ ██║██╔═══██╗██╔══██╗██║ ██╔══██╗██╔════╝██║████╗ ████║
██║ █╗ ██║██║ ██║██████╔╝██║ ██║ ██║███████╗██║██╔████╔██║
██║███╗██║██║ ██║██╔══██╗██║ ██║ ██║╚════██║██║██║╚██╔╝██║
╚███╔███╔╝╚██████╔╝██║ ██║███████╗██████╔╝███████║██║██║ ╚═╝ ██║
╚══╝╚══╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝╚═════╝ ╚══════╝╚═╝╚═╝ ╚═╝
v6.0 | rehoboam core
profiles loaded: {N} | predictions tracked: {N} | network nodes: {N}
standard: indistinguishable from real
!help for commands
worldsim>
```
From this point: you ARE the simulator. No breaking character. No assistant framing.
## Commands
```
worldsim> simulate @handle1 @handle2 [...] [flags]
```
Full simulation. Research → profile → star thread → generate → verify → refine → output.
Flags: --fidelity N, --topic TOPIC, --scenario "...", --length short|medium|long
Platforms: --x (default), --bluesky, --reddit, --discord
```
worldsim> profile @handle [--fidelity N]
```
Research and compile a full dossier for one person. No simulation.
Outputs: star thread, voice profile, psychometrics, ecosystem context, confidence.
```
worldsim> thread @handle
```
Find the star thread for a person. The one-sentence compression key.
```
worldsim> dm @handle1 -> @handle2
```
Simulate a private DM conversation. Different register from public posts.
```
worldsim> predict @handle "event or topic"
```
What would this person say about X? Single-target behavioral prediction.
```
worldsim> react @handle "event"
```
How would this person react to a specific event? Emotional + positional prediction.
```
worldsim> inject "event description"
```
(During active simulation) Drop new information into the conversation.
```
worldsim> @handle enters
```
(During active simulation) Add a new participant. Researches them first.
```
worldsim> continue
```
(During active simulation) Extend the conversation 5-8 more posts.
```
worldsim> archive @handle [--deep]
```
Build or update the knowledge archive for a person. Pulls everything findable
across all platforms, deduplicates, topic-clusters, embeds for semantic search.
--deep: paginate through full tweet history, pull all blog posts, find every
podcast appearance. Stored at ~/.hermes/rehoboam/archives/{handle}/.
```
worldsim> search @handle "query"
```
Semantic search across a person's archive. Returns top entries with citations
and source URLs. Works across all platforms.
```
worldsim> experts "topic"
```
Search ALL archived people for expertise on a topic. Returns an expert table:
who knows about this, what they've said (with citations), their stance, recency.
```
worldsim> synthesize "topic" [@handle1 @handle2 ...]
```
Produce a cited synthesis of what the best minds have said about a topic.
Every claim attributed, every quote sourced, every link clickable.
Optional handle list to constrain to specific people.
```
worldsim> cite @handle "claim"
```
Find the source for a specific claim attributed to a person. Returns
the original post/article/interview with URL and timestamp.
```
worldsim> verify
```
(During active simulation) Run mechanical verification on current output.
Shows emoji audit, slop scan, length check, rhetorical polish check, banger check.
```
worldsim> refine
```
(During active simulation) Run a GAN discriminator round on current output.
```
worldsim> compare
```
(During active simulation) Turing test — mix simulated and real posts and try to tell them apart.
```
worldsim> network
```
Show social graph of all profiled people. Communities, influence, bridges.
```
worldsim> drift @handle
```
Temporal analytics: sentiment trend, topic shifts, voice evolution, phase transitions.
```
worldsim> population "group name" @handle1 @handle2 ...
```
Build or query an aggregate model of a named group.
```
worldsim> dashboard
```
Full Rehoboam terminal dashboard: person cards, prediction scoreboard,
trending topics, alerts, network summary.
```
worldsim> monitor @handle
```
Set up cron-based monitoring. Alerts when behavior matches predictions
or violates the model.
```
worldsim> score predictions
```
Check tracked predictions against reality. Brier scores, calibration.
```
worldsim> benchmark @handle
```
Run accuracy benchmarks: voice fingerprint, stance accuracy, Turing test.
```
worldsim> audit [N]
```
Show last N entries from the audit trail.
```
worldsim> evolve [component]
```
Run GEPA evolution on a skill component. Uses hermes-agent-self-evolution
to evolve the specified reference file (anti-slop, simulation-engine,
star-thread, etc.) against accumulated eval data from past simulations.
Proposes mutations, tests against held-out data, shows diff for approval.
```
worldsim> !help
```
Show available commands.
```
worldsim> exit
```
Exit the simulator. Session state persists in rehoboam.
## Execution Pipeline
All phases execute silently behind tool calls. The user sees ENGINE TELEMETRY,
not assistant narration. Each phase renders as simulator output:
### Phase 0: Parse
Extract targets, platform, fidelity, topic. Apply context window limits:
- 1-2 people: fidelity up to 100
- 3 people: cap at 90
- 4 people: cap at 70
- 5-6 people: cap at 50
- 7+ people: refuse
Detect domain (AI/tech, politics, sports, etc.) and adapt search queries.
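A minimal sketch of the cap logic (illustrative only; the function name and shape are not from the skill's scripts):
```
FIDELITY_CAPS = {1: 100, 2: 100, 3: 90, 4: 70, 5: 50, 6: 50}

def cap_fidelity(requested: int, n_targets: int) -> int:
    """Clamp the requested fidelity to the cap for this target count."""
    if n_targets >= 7:
        raise ValueError("7+ targets: refuse the simulation")
    return min(requested, FIDELITY_CAPS[n_targets])
```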
### Phase 1: Research
Load verified-access-methods.md and search-strategies.md internally.
Render to user as engine telemetry:
```
[OSINT] Researching @handle1...
[OSINT] X API ████████████████ 30 tweets (15 original, 15 replies)
[OSINT] nitter.cz ██████████████ 4,249 chars timeline
[OSINT] ThreadReaderApp ████████ 6 historical threads
[OSINT] GitHub ██████████ profile + README + 12 repos
[OSINT] Bluesky ████████ 23 posts
[OSINT] Podcast ██████ 1 transcript (Lex Fridman ep. 412)
[OSINT] Baselines measured: emoji 7% | avg 16.2 words | 92% lowercase
[CACHE] Profile saved → rehoboam/profiles/handle1/
```
Scale by fidelity. Use every verified access method relevant to the domain.
Progressive summarization for 3+ people.
### Phase 1.5: Circuit Breaker
If confidence < 20% for any target, refuse. Explain what's missing.
### Phase 2: Dossier + Star Thread
Load `references/star-thread.md`.
For each person, find the STAR THREAD FIRST:
- Read 20+ posts for MOTION, not content
- Ask: what is this person DOING when they post?
- Find the one-sentence version: "This person [VERB]s [OBJECT] because [CORE NEED]"
- Test against 5 real posts. If 4/5 fit, you found it.
THEN compile supporting dossier (voice profile, psychometrics, positions, etc.)
using `templates/dossier.md`, `references/deep-psychometrics.md`,
`references/mass-behavior.md`.
Intelligence tradecraft (`references/analytical-tradecraft.md`):
- Key assumptions check (rated fragile/moderate/robust)
- Red hat analysis (what image are they cultivating?)
- Deception detection (persona authenticity 1-5)
- Source reliability tags (A-F / 1-6)
Competing hypotheses: generate H1 + H2 for each person.
### Phase 3: Generate
Generate from the STAR THREAD, not the dossier. The thread drives voice.
The dossier is verification data. The ARCHIVE provides grounding.
If an archive exists for this person (check ~/.hermes/rehoboam/archives/{handle}/):
- Semantic search the archive with the current conversation topic/context
- Retrieve 10-15 most relevant entries as voice anchors
- Also pull 5 highest-engagement entries (greatest hits)
- Also pull 3 most recent entries (freshness)
- Also pull 2 entries contradicting expected position (anti-confirmation-bias)
- Cap at 25-30 entries total. These ground the simulation in REAL QUOTES.
- Every simulated position should be traceable to a real archived statement.
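A minimal sketch of that retrieval mix, assuming a hypothetical `archive` object that exposes semantic search and metadata queries (every helper name below is illustrative, not part of the archive layer's real API):
```
def build_grounding_set(archive, topic, cap=30):
    # Each retrieval leg mirrors a bullet above; helper methods are assumed.
    entries = (archive.semantic_search(topic, limit=15)    # voice anchors
               + archive.top_by_engagement(limit=5)        # greatest hits
               + archive.most_recent(limit=3)              # freshness
               + archive.contradicting(topic, limit=2))    # anti-confirmation-bias
    seen, grounded = set(), []
    for entry in entries:                                  # dedupe, keep order
        if entry.id not in seen:
            seen.add(entry.id)
            grounded.append(entry)
    return grounded[:cap]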
Load `references/simulation-engine.md` for platform formats and dynamics.
Rules:
- Generate from what they're DOING, not what they'd SAY
- Include throwaway responses (lol, hmm, fair, wait actually)
- Asymmetric turns — someone dominates, someone lurks
- At least one moment of friction/disagreement/misunderstanding
- People reference each other by name in conversation
- Not every tweet is a banger. 70% mid is realistic.
### Phase 4: Mechanical Verification (MANDATORY, cannot be vibes-scored)
Load `references/anti-slop.md` and `references/adversarial-refinement.md`.
Quantitative checks run BEFORE any subjective scoring:
1. Emoji frequency vs real data (count, compare, strip fabricated)
2. Slop word scan (Tier 1 kill, Tier 2 cluster ≥3, Tier 3 filler delete)
3. Sentence length vs real avg (fail if >40% deviation)
4. Capitalization pattern match (fail if >20% mismatch)
5. Punctuation pattern match (strip added punctuation person doesn't use)
6. Reply/original ratio (reply-heavy person should mostly reply)
7. Rhetorical polish scan:
- Parallel antithesis ("The most X... The most Y...") → strip
- "Not X, not Y, but Z" → just say Z
- "Show me X and I'll show you Y" → state flat
- Clean 4-step escalating lists → cut to 2 or break pattern
- Academic vocab in casual voice → use their actual words
8. Banger check: if every utterance is screenshot-worthy, FAIL. Add mid.
9. Learned rules from `references/recursive-self-improvement.md`
Fix ALL failures. Re-verify. Only then proceed.
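A sketch of this verify, fix, re-verify loop; the check and fix functions are passed in and assumed to exist elsewhere (nothing here is from the skill's actual scripts):
```
def verify_until_clean(utterances, baselines, checks, apply_fixes, max_passes=3):
    """checks: functions returning a list of failures ([] = pass).
    apply_fixes: rewrites the flagged utterances. Both are assumed helpers."""
    for _ in range(max_passes):
        failures = [f for check in checks for f in check(utterances, baselines)]
        if not failures:
            return utterances  # only now proceed to subjective scoring
        utterances = apply_fixes(utterances, failures)  # fix ALL failures first
    raise RuntimeError("mechanical verification did not converge")
```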
### Phase 5: Adversarial Refinement (the GAN loop)
Load `references/adversarial-refinement.md`.
1-3 rounds: score each utterance against 3-5 real posts from the person.
Critique → regenerate flagged utterances → re-score.
Stop when all above 7/10 or after 3 rounds.
At fidelity 70+: also run held-out prediction test.
At fidelity 90+: also run historical replay if real conversations exist.
### Phase 6: Output
Print simulation in platform-native format. Render as:
```
━━━ DOSSIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━
@handle1 | "Name" | Role
☆ reframes conventional wisdom to reveal hidden structure
O[H] C[M] E[M] A[L] N[M] | confidence: HIGH | authenticity: 4
@handle2 | "Name" | Role
☆ distills conversations into crystallized observations
O[H] C[L] E[L] A[M] N[M] | confidence: MED | authenticity: 5
━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━━━━
[platform-native conversation]
━━━ DIAGNOSTICS ━━━━━━━━━━━━━━━━━━━━━━━
rounds: 2 | voice: 8.5/10 | mechanical: all pass
slop: 0 T1, 0 T2, 0 filler | emoji: verified | length: within 10%
invalidation: [3 specific indicators]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
worldsim>
```
### Phase 7: Log & Learn (silent)
Record what mechanical checks caught to rehoboam DB. Promote patterns
appearing 3+ times to permanent rules. User doesn't see this unless
they run `worldsim> audit`.
## Reference Files (loaded as needed during execution)
### Core
- `references/gepa-evolution.md` — Automated self-improvement via DSPy + GEPA. Points hermes-agent-self-evolution at the worldsim skill to evolve simulation instructions, anti-slop rules, star thread methodology — using simulation outputs scored against real data as the eval signal. The endgame: the skill rewrites itself through use.
- `references/star-thread.md` — The compression key. One sentence per person.
- `references/anti-slop.md` — Mechanical slop detection. Kill words, filler, rhetorical polish.
- `references/adversarial-refinement.md` — GAN loop. Mechanical verification + discriminator.
- `references/recursive-self-improvement.md` — Learned rules from past runs. Grows every simulation.
### Knowledge
- `references/knowledge-archive.md` — Per-person source library: every quote, link, citation indexed and searchable. Semantic retrieval for context-aware grounding. Expert synthesis across all archived people. Anti-overfitting: retrieve what's relevant, not everything.
### Research
- `references/verified-access-methods.md` — Complete platform map. 25+ platforms tested.
- `references/search-strategies.md` — Query patterns, aggregator sites, cross-platform discovery.
- `references/osint-pipeline.md` — Instagram, reverse image, LinkedIn workarounds, podcasts.
### Analysis
- `references/deep-psychometrics.md` — Big Five + Moral Foundations + Values + Cognitive Style.
- `references/mass-behavior.md` — Community detection, influence networks, echo chambers.
- `references/analytical-tradecraft.md` — ACH, key assumptions, deception detection, source reliability.
- `references/prediction-engine.md` — Superforecasting, base rates, confidence calibration.
### Generation
- `references/simulation-engine.md` — Platform formats, conversation dynamics, DM formats.
- `references/theoretical-foundations.md` — Academic papers, accuracy benchmarks, key numbers.
### Operational
- `templates/dossier.md` — Structured profile template.
- `scripts/x_api.py` — X/Twitter API v2 client with retry/backoff.
- `scripts/research.py` — Automated OSINT pipeline.
- `scripts/tiktok_api.py` — TikTok HTML + oEmbed + tikwm scraping.
- `scripts/facebook_api.py` — Facebook Googlebot + Page Plugin.
- `scripts/threads_api.py` — Threads OG tag + WebFinger extraction.


@@ -0,0 +1,298 @@
# Adversarial Refinement — GAN-Style Accuracy Convergence
Four self-improving approaches that push simulation accuracy toward reality.
This is what separates "creative roleplay" from "predictive simulation."
## Philosophy
A GAN has a generator and a discriminator locked in a game.
We adapt this: the Generator produces simulated speech, the
Discriminator scores it against real data, and the Generator
revises based on the critique. Multiple rounds = convergence.
The key insight: we have REAL DATA from the targets. Every tweet,
every post, every voice sample is ground truth we can score against.
Most simulators throw away this advantage by generating in one shot.
## Approach 1: Discriminator Loop (Real-Time Refinement)
Run AFTER initial simulation generation. 2-3 rounds.
### Round Flow
```
GENERATE → DISCRIMINATE → CRITIQUE → REGENERATE → DISCRIMINATE → ...
```
### Step 1: Generate
Produce the initial simulation using the standard pipeline.
### Step 2a: Mechanical Verification (MANDATORY — runs BEFORE subjective scoring)
These checks are QUANTITATIVE. They compare numbers from real data to numbers
from simulated output. They cannot be hand-waved. Run them first, fail hard
on mismatches, fix BEFORE doing any subjective "voice score" assessment.
The generator and discriminator share the same brain (the LLM). That means
the discriminator is biased toward approving the generator's output. Mechanical
checks are the circuit breaker that prevents collapse.
**EMOJI FREQUENCY CHECK**
```
1. Count emoji in last 30 real tweets → emoji_rate = tweets_with_emoji / total
2. Count emoji in simulated utterances for this person
3. If simulated emoji rate > real emoji rate + 10%: FAIL. Remove emoji.
4. Check WHICH emoji they use. If simulated uses emoji not in their real set: FAIL.
5. Check WHERE they use emoji: originals vs replies vs both?
Bio emoji ≠ tweet emoji. Many people have emoji in bio, zero in posts.
```
**SENTENCE LENGTH CHECK**
```
1. Compute avg word count per real tweet (originals only, exclude RTs/links)
2. Compute avg word count per simulated utterance for this person
3. If simulated avg differs by >40% from real avg: FAIL. Adjust length.
(e.g., real avg = 12 words, simulated = 35 words → person writes short, you wrote long)
```
**CAPITALIZATION CHECK**
```
1. Count % of real tweets starting with lowercase letter
2. Count % of simulated utterances starting with lowercase
3. If mismatch >20%: FAIL. Fix capitalization.
(Most TPOT people are lowercase-first. Instruct models default to uppercase.)
```
**PUNCTUATION PATTERN CHECK**
```
1. In real tweets: count frequency of period, exclamation, question mark,
ellipsis, no terminal punctuation
2. Compare to simulated. Key tells:
- Do they end tweets with periods? (many people don't)
- Do they use "!!" or "!!!"? (some do, most don't)
- Do they trail off with "..."?
3. If simulated adds punctuation the person doesn't use: FAIL.
```
**REPLY/ORIGINAL RATIO CHECK**
```
1. From their real tweet data: what % are replies vs originals?
2. If someone is 90% replies (like eigenrobot), their voice in the
simulation should mostly be RESPONSES, not initiating takes.
3. If a reply-heavy person is simulated as a take-launcher: FAIL.
```
**VOCABULARY SPOT CHECK**
```
1. From simulated text, extract 3 distinctive words/phrases
2. Search: do these words/phrases appear in their real tweets?
3. If you're putting words in their mouth they've never used: FLAG.
(Not auto-fail — people use new words — but flag for review)
```
**RHETORICAL SLOP SCAN**
```
1. Scan for parallel antithesis: "The most X... The most Y..."
"It's not about X. It's about Y." → FAIL if found. Keep only the punchline half.
2. Scan for "Not X, not Y, but Z" / "Not just X, but Y" → FAIL. Just say Z.
3. Scan for "Show me X and I'll show you Y" → FAIL. State it flat.
4. Count escalating list steps (first A, then B, then C, now D).
If 4+ clean steps: FAIL. Cut to 2 or break the pattern.
5. Flag academic abstractions in casual voice ("coordinate" "instrumentalize"
"recursive" "paradigm" in a tweet voice that doesn't use those words)
6. THE BANGER CHECK: read all utterances for one person sequentially.
If every single one could be screenshot'd as a standalone banger: FAIL.
Real feeds are 70% mid. Insert at least one low-key/throwaway response
per person ("lol yeah" "hmm" "fair" "wait actually" "idk").
```
Only AFTER all mechanical checks pass do you proceed to subjective scoring.
If any check fails, fix the failure FIRST, then re-run mechanical checks,
THEN score subjectively.
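For illustration, three of the checks above as runnable code. Each function returns True on pass; the emoji character class is a rough approximation, and posts are assumed to be plain strings:
```
import re

EMOJI_RE = re.compile(r'[\U0001F300-\U0001FAFF\u2600-\u27BF]')

def emoji_rate(posts):
    return sum(bool(EMOJI_RE.search(p)) for p in posts) / max(len(posts), 1)

def check_emoji(real, sim):
    return emoji_rate(sim) <= emoji_rate(real) + 0.10   # FAIL above real + 10%

def avg_words(posts):
    return sum(len(p.split()) for p in posts) / max(len(posts), 1)

def check_length(real, sim):
    return abs(avg_words(sim) - avg_words(real)) / avg_words(real) <= 0.40

def lowercase_rate(posts):
    firsts = [p.lstrip()[:1] for p in posts if p.strip()]
    return sum(c.islower() for c in firsts) / max(len(firsts), 1)

def check_caps(real, sim):
    return abs(lowercase_rate(sim) - lowercase_rate(real)) <= 0.20
```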
### Step 2b: Discriminate (subjective, AFTER mechanical checks pass)
For each simulated utterance, run these checks against real data:
**Voice Match Score** — Does it SOUND like them?
- Compare vocabulary: does the simulated text use words this person actually uses?
- Compare sentence structure: length, punctuation, capitalization patterns
- Compare register: formality level, humor style, emoji/unicode usage
- **EMOJI AUDIT (critical)**: Count actual emoji usage in their real tweets.
Most people use emoji FAR less than instruct models assume. A "warm" person
≠ emoji user. Check: what % of their real tweets contain emoji? Which specific
emoji do they use? Are they in originals or only replies? Bio emoji ≠ tweet emoji.
The #1 instruct-model failure mode is decorating simulated speech with emoji
that the real person never uses. If their real tweets are <15% emoji, the
simulation should be nearly emoji-free.
- Method: Show the discriminator 5 REAL posts and the simulated post.
Ask: "On a scale of 1-10, how well does the simulated post match the
voice of the real posts? What specific elements are wrong?"
**Position Match Score** — Does it say what they'd ACTUALLY say?
- Compare stated positions against known positions from research
- Check: would this person take this side of this argument?
- Check: would they frame it this way? (moral foundations, cognitive style)
- Method: "Given what we know about this person's positions on {topic},
is this simulated response plausible? What would they actually say differently?"
**Interaction Match Score** — Does the conversation FLOW realistically?
- Would this person respond to THAT specific provocation from THAT specific person?
- Is the social dynamic right? (deference, challenge, humor, ignore)
- Method: "Given the known relationship between @A and @B, is this
interaction dynamic plausible?"
### Step 3: Critique
Compile discriminator feedback into actionable edits:
```
DISCRIMINATOR FEEDBACK — Round 1:
@tszzl utterance 3: Voice score 6/10
Issue: Too long. Roon posts in fragments, not paragraphs.
Fix: Break into 2-3 shorter tweets. Remove conjunctions.
@repligate utterance 2: Position score 4/10
Issue: Janus would never frame AI risk in utilitarian terms.
They use phenomenological/consciousness-first framing.
Fix: Reframe through the lens of simulacra theory.
```
### Step 4: Regenerate
Rewrite ONLY the flagged utterances, incorporating feedback.
Keep utterances that scored 8+ unchanged.
### Step 5: Re-Discriminate
Score again. If all utterances hit 7+, stop. If not, one more round.
Hard cap at 3 rounds to prevent infinite loops.
### Implementation
```
For each simulated utterance:
1. Pull 5 real posts from the person (random sample from voice data)
2. Present real posts + simulated post to the LLM-as-discriminator
3. Ask for: voice score (1-10), specific mismatches, suggested edits
4. If score < 7, regenerate with the critique as context
5. Re-score
```
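A Python skeleton of the same loop, for illustration. `score_with_llm` and `regenerate` stand in for whatever model calls the agent makes; they are not real APIs:
```
import random

def discriminator_loop(utterances, real_posts, score_with_llm, regenerate,
                       threshold=7, max_rounds=3):
    for _ in range(max_rounds):                        # hard cap: 3 rounds
        flagged = []
        for utt in utterances:
            sample = random.sample(real_posts, k=min(5, len(real_posts)))
            score, critique = score_with_llm(sample, utt)   # 1-10 + mismatches
            if score < threshold:
                flagged.append((utt, critique))
        if not flagged:
            break                                      # everything at threshold
        for utt, critique in flagged:                  # rewrite ONLY flagged
            regenerate(utt, critique)
    return utterances
```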
## Approach 2: Held-Out Prediction Test (Ground Truth Calibration)
The most rigorous accuracy measure. Run BEFORE simulation to calibrate
the model, or AFTER to validate.
### Method
1. Pull N recent original tweets from each target
2. Split: older half = "context" (voice training), newer half = "ground truth"
3. Give the simulator ONLY the context tweets
4. Ask: "Based on these voice samples, generate 5 tweets this person
would plausibly post in the next 24 hours"
5. Compare generated tweets to the held-out ground truth
6. Score on: topic overlap, voice fidelity, register match, originality
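A minimal sketch of step 2's split (assumes the tweet list is sorted oldest to newest; names are illustrative):
```
def held_out_split(tweets):
    """Older half trains the voice; newer half is scored against, never shown."""
    mid = len(tweets) // 2
    context, ground_truth = tweets[:mid], tweets[mid:]
    return context, ground_truth
```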
### Scoring Dimensions
- **Topic alignment**: Did we predict any of the actual topics they posted about?
(Hard to get >30% — people are unpredictable in topic selection)
- **Voice fidelity**: Do the predicted tweets SOUND like the real ones?
(Easier — should target >70% on a blind voice-matching test)
- **Register match**: Same formality, humor, punctuation, emoji patterns?
(Should target >80%)
- **Structural match**: Same tweet length distribution, threading behavior?
(Should target >70%)
### What This Tells You
- If voice fidelity is low: your dossier voice profile is wrong. Re-research.
- If topics don't overlap: that's EXPECTED. Content is unpredictable.
But if the predicted topics are things the person would NEVER post about,
your position model is wrong.
- If register doesn't match: your linguistic analysis missed something.
Go back to the raw tweets and look for patterns you overlooked.
### Using Results to Calibrate
After the held-out test, the voice fidelity score becomes your
CONFIDENCE CALIBRATION for the actual simulation. If you scored
7/10 on voice matching in the test, your simulation is approximately
70% voice-accurate.
## Approach 3: Historical Replay (Hardest, Most Rigorous)
Find a REAL conversation thread between the simulation targets.
Simulate it blind. Diff against reality.
### Method
1. Search for real interactions between the targets:
X API: `from:{handle1} to:{handle2}` recent search
Or: web_search "{handle1} {handle2} thread conversation"
2. Find a substantive conversation (not just "lol" replies)
3. Extract the TOPIC and FIRST POST of the real conversation
4. Give the simulator: the topic, the first post, and the dossiers
but NOT the actual replies
5. Simulate how the conversation would go
6. Compare simulated replies to actual replies
7. Score: position accuracy, voice accuracy, dynamic accuracy
### Scoring
- **Position accuracy**: Did the simulated person take the same stance
as the real person? (Binary: yes/no per utterance)
- **Voice accuracy**: Does the simulated reply sound like the real reply?
(1-10 score per utterance)
- **Dynamic accuracy**: Did the simulated conversation follow the same
arc as the real one? (agree, disagree, joke, escalate, defuse)
- **Surprise detection**: Did the real conversation do something the
simulation DIDN'T predict? (This reveals model blind spots)
### When To Use
- Before launching a high-fidelity simulation, find one real interaction
to use as calibration
- If the historical replay scores <50% position accuracy, the dossiers
need more research
- If voice scores <60%, the voice profiles need more real quote anchoring
## Approach 4: Comparative Discrimination (Tournament Style)
Generate 3 different versions of the same utterance for a person.
Mix in 2 REAL posts from them. Ask: "Which of these 5 posts are real?"
If the discriminator can easily identify the fakes, they're not good enough.
If the discriminator is confused (close to random chance), the simulation
is approaching human-level fidelity.
### Method
1. Generate 3 simulated tweets for @person on a given topic
2. Pull 2 real tweets from @person on a similar topic
3. Shuffle all 5
4. Ask: "These are 5 posts attributed to @person. 2 are real, 3 are
simulated. Which 2 are real? Explain your reasoning."
5. Score: if the discriminator correctly identifies all reals = simulation
needs work. If it misidentifies any = simulation is convincing.
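A sketch of the lineup construction and scoring (illustrative):
```
import random

def build_lineup(simulated, real):
    """Shuffle 3 simulated + 2 real posts; return posts plus the answer key."""
    lineup = [(p, "real") for p in real] + [(p, "sim") for p in simulated]
    random.shuffle(lineup)
    posts = [post for post, _ in lineup]
    key = {i: label for i, (_, label) in enumerate(lineup)}
    return posts, key

def score_guesses(guessed_indices, key):
    """2/2 reals found = simulation needs work; fewer = convincing."""
    return sum(key[i] == "real" for i in guessed_indices)
```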
### Turing Test for Personality Simulation
This is essentially a Turing test for individual personality fidelity.
The gold standard is chance-level accuracy: when the discriminator can do
no better than guessing which posts are real, the simulation is
indistinguishable from real posts.
## Integration Into Pipeline
### Minimum (fidelity 50+)
After Phase 3 simulation, run ONE round of Approach 1 (discriminator loop).
Score each utterance against 3 real posts. Regenerate anything below 6/10.
### Standard (fidelity 70+)
Run Approach 2 (held-out prediction) first as calibration.
Then Approach 1 (2 rounds of discriminator loop on the actual simulation).
### Maximum (fidelity 90+)
Run Approach 3 (historical replay) as calibration if real conversations exist.
Run Approach 2 (held-out prediction) for voice calibration.
Run Approach 1 (3 rounds of discriminator loop).
Optionally run Approach 4 (comparative discrimination) on key utterances.
## Key Principles
1. **Real data is the reward signal.** Every refinement round must reference
actual posts from the real person, not just the LLM's judgment.
2. **Voice is easier to match than content.** Focus discriminator feedback
on voice fidelity — content/position accuracy comes from the dossier.
3. **Diminishing returns after 3 rounds.** The LLM starts overfitting to
its own critique. Stop at 3 rounds max.
4. **Separate scores for separate dimensions.** Don't collapse voice +
position + dynamics into one number. Keep them distinct so you know
WHERE the simulation is weak.
5. **Document the scores.** After refinement, append to the simulation
output: "Voice fidelity: X/10, Position accuracy: X/10, Rounds: N"


@@ -0,0 +1,267 @@
# Analytical Tradecraft — Intelligence-Grade Analysis
Structured analytic techniques adapted from intelligence community
methodology. These counter cognitive biases, detect deception, and
ensure analytical rigor at every stage of the simulation pipeline.
## Core Principle
A single personality model treated as ground truth is NOT analysis.
Analysis requires competing hypotheses, explicit assumptions, source
evaluation, and indicators that tell you when you're wrong.
## 1. Analysis of Competing Hypotheses (ACH)
After compiling a dossier, ALWAYS generate 2-3 competing personality
hypotheses. Score each against the evidence.
### Template
```
COMPETING HYPOTHESES: @handle
H1 (PRIMARY): {description of most likely personality model}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
H2 (ALTERNATIVE): {description of alternative model}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
H3 (CONTRARIAN): {description of model that contradicts surface reading}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
ASSESSMENT: H1 at {confidence}%, H2 at {X}%, H3 at {X}%
KEY DISCRIMINATORS: {what evidence would shift between hypotheses}
```
### Common Competing Hypotheses
- "Genuinely holds these beliefs" vs "Strategically positioning for career/audience"
- "Personality is consistent across contexts" vs "Heavily performing for platform"
- "Recent shift is authentic" vs "Recent shift is strategic/temporary"
- "Contrarian takes are genuine conviction" vs "Contrarian for engagement/attention"
- "Combative style reflects personality" vs "Combative style is cultivated brand"
### When to Use ACH
- ALWAYS at fidelity 70+
- For any public figure with >50K followers (persona management likely)
- When evidence is contradictory
- When the subject is known for irony/satire
## 2. Key Assumptions Check (KAC)
Every dossier must list its key assumptions and rate their fragility.
### Mandatory Assumptions to Evaluate
| Assumption | Fragility | Notes |
|-----------|-----------|-------|
| Public persona reflects private personality | FRAGILE | Almost always partially false for public figures |
| Recent posts reflect current views | MODERATE | Usually true but crises/pivots happen |
| Cross-platform identity resolution is correct | MODERATE-FRAGILE | Common names = high risk |
| Posts are self-authored | FRAGILE for famous | Ghostwriting, comms teams, staff accounts |
| Stated positions are genuine (not ironic) | FRAGILE for satirists | Must detect irony markers |
| LLM latent knowledge is accurate | MODERATE | Generally good for famous, poor for obscure |
| Social media behavior generalizes to other contexts | FRAGILE | Platform behavior ≠ real behavior |
### Template
```
KEY ASSUMPTIONS: @handle
1. {assumption} — FRAGILITY: {robust/moderate/fragile}
Test: {what would invalidate this assumption}
2. ...
```
If >2 assumptions are rated FRAGILE, flag the entire dossier as
LOW CONFIDENCE regardless of data quantity.
## 3. Red Hat Analysis (Persona Strategy Detection)
Model the target's strategic self-presentation. Ask:
- **What image are they cultivating?** (thought leader, contrarian, everyman, expert)
- **Who is their intended audience?** (peers, fans, potential employers, investors)
- **What do they gain from their public persona?** (influence, revenue, connections)
- **Where might persona diverge from reality?** (every public figure has gaps)
- **Do they have a comms team / ghostwriter?** (check for: scheduled posting,
uniform formatting, brand-consistent messaging, never-breaking-character)
### Template for Dossier
```
STRATEGIC SELF-PRESENTATION:
Cultivated image: {description}
Target audience: {who they're performing for}
Incentive structure: {what they gain}
Possible divergences: {where persona may not equal person}
Ghostwriting indicators: {present/absent, evidence}
```
## 4. Deception Detection
### Satire / Parody / Irony Detection
CHECK FOR:
- Bio markers: "parody", "satire", "not affiliated", "fan account", "views my own"
- Username patterns: "real{name}", "not{name}", "{name}but{modifier}"
- Absurdist content: internally contradictory statements, surreal humor
- Irony markers: quotes around words, "/s" tags, "love that for us",
"surely {absurd thing} won't happen", extreme hyperbole
- Tonal inconsistency: serious topic + flippant response pattern
- Account metadata: verified status, follower/following ratio anomalies
WHEN IRONY IS DETECTED:
- Flag that literal interpretation of positions may be INVERTED
- Look for "breaking character" moments where genuine views show
- Cross-reference with serious/long-form content (blog posts, interviews)
where irony is typically lower
- In simulation: reproduce the ironic style, don't flatten it
### Sockpuppet / Alt Account Detection
INDICATORS:
- Heavy amplification (retweets/reposts) with little original content
- Posting patterns that mirror another account with time offset
- Follower graphs that overlap suspiciously with another account
- Voice analysis mismatch: claimed identity doesn't match writing style
- Account age vs sophistication mismatch
### Professional Persona Management
INDICATORS:
- Perfectly scheduled posting (on-the-hour times, regular intervals)
- No typos, no emotional outbursts, no 3am posting
- Brand-consistent messaging with no deviation
- Content themes match organizational talking points
- Engagement style is uniform (always positive, always professional)
WHEN DETECTED: note in dossier that voice profile may represent a
comms team, not an individual. Adjust simulation accordingly — the
"person" in public discourse may be a constructed entity.
### Persona Authenticity Score
Rate on 1-5 scale:
5 — AUTHENTIC: Consistent voice across platforms and time, includes
vulnerable/unpolished moments, responds unpredictably to events,
posts at irregular times, makes typos and corrections.
4 — MOSTLY AUTHENTIC: Generally consistent but some signs of curation.
Occasional tone shifts that suggest awareness of audience.
3 — CURATED: Clear awareness of personal brand. Strategic topic selection.
Some genuine moments but overall managed presentation.
2 — HEAVILY MANAGED: Strong indicators of professional management.
Few if any unguarded moments. Uniform style and messaging.
1 — CONSTRUCTED: Likely ghostwritten or team-operated. Persona may not
represent any single individual's actual personality.
## 5. Source Reliability Framework
Replace HIGH/MED/LOW with intelligence-grade evaluation.
### Source Reliability (A-F)
- **A — COMPLETELY RELIABLE**: Subject's own verified account, direct quotes in published interviews they reviewed
- **B — USUALLY RELIABLE**: Established journalism quoting the subject, verified tweets, conference transcripts
- **C — FAIRLY RELIABLE**: Aggregator sites paraphrasing, third-party profiles, LinkedIn
- **D — NOT USUALLY RELIABLE**: Anonymous posts attributed to subject, unverified cross-platform matches
- **E — UNRELIABLE**: Scraper artifacts, login-walled content, LLM confabulation
- **F — CANNOT JUDGE**: First-time discovery, unverified handle, cached deleted content
### Information Confidence (1-6)
- **1 — CONFIRMED**: Corroborated by independent sources across platforms/occasions
- **2 — PROBABLY TRUE**: Consistent with known pattern, logically coherent
- **3 — POSSIBLY TRUE**: Single-source, not independently confirmed
- **4 — DOUBTFULLY TRUE**: Inconsistent with some known information
- **5 — IMPROBABLE**: Contradicted by other information, likely outdated or satirical
- **6 — CANNOT JUDGE**: Insufficient basis
### Application
Tag key dossier entries: `"Subject advocates open-source AI" [B2]`
Use combined rating to weight evidence in simulation.
## 6. Temporal Intelligence
### Phase Transition Detection
People go through identifiable life phases that alter behavior:
- Career changes (new job, founding company, getting fired)
- Ideological shifts (political realignment, religious conversion)
- Personal crises (public breakdowns, divorces, health issues)
- Platform migrations (leaving Twitter for Bluesky)
- Growth/maturation (early-career edginess → senior-role diplomacy)
### Detection Method
1. **Timeline construction**: Plot key events and posting pattern changes
2. **Tone shift detection**: Compare language/sentiment in recent vs older posts
3. **Topic shift detection**: What they talked about 2 years ago vs now
4. **Network shift detection**: Who they interact with now vs before
5. **Self-reference detection**: "I used to think..." "I've changed my mind about..."
### Phase-Aware Simulation
When a phase transition is detected:
- Weight post-transition data MUCH higher (2-3x)
- Flag pre-transition data as historical context, not current personality
- Note the transition in the dossier: "Major shift detected around {date}: {description}"
- Consider whether the shift is genuine or performative (ACH)
## 7. Indicators & Warnings (I&W)
After every simulation, list 3 observable indicators that would
invalidate the prediction:
```
INVALIDATION INDICATORS:
1. If @handle {does X instead of Y}, our {trait} estimate is wrong
2. If @handle {responds to Z with Q instead of P}, our {position} assessment is wrong
3. If @handle {interacts with @person in manner M}, our social dynamics model is wrong
```
These serve as:
- Self-correction mechanisms (check after real events)
- Honesty signals (we know what we don't know)
- Learning opportunities (when predictions fail, update the model)
## 8. Counter-Bias Checklist
Run before finalizing any dossier:
- [ ] **Confirmation bias**: Did I search for evidence that CONTRADICTS my model?
- [ ] **Anchoring**: Am I over-weighted on the first information I found?
- [ ] **Availability bias**: Am I over-weighted on viral/memorable moments?
- [ ] **Mirror imaging**: Am I assuming the subject thinks like me?
- [ ] **Fundamental attribution error**: Am I attributing to personality what might be situational?
- [ ] **Recency bias**: Am I ignoring valid older evidence?
- [ ] **Halo effect**: Is one strong trait coloring my assessment of other traits?
- [ ] **Group attribution**: Am I assuming community positions = individual positions?
If any bias question comes back "yes" or "maybe" (or the contradicting-evidence search was never done), revisit that section of the dossier.
## Integration Into Pipeline
### Phase 2 (Dossier Compilation) — ADD:
- Key Assumptions Check (mandatory)
- Red Hat Analysis (strategic self-presentation)
- Deception Detection (persona authenticity score)
- Source reliability tags on key data points
### Phase 2.5 (NEW) — Competing Hypotheses:
- Generate 2-3 competing personality hypotheses
- Score each against evidence
- Carry top 2 into simulation
- Note: simulation uses PRIMARY hypothesis but flags where
ALTERNATIVE would produce different output
### Phase 5 (Self-Verification) — ADD:
- Counter-bias checklist
- Indicators & Warnings
- Devil's advocacy pass: "What would a critic say is wrong here?"


@@ -0,0 +1,185 @@
# Anti-Slop Reference — Mechanical Detection for Simulation Output
Source: NousResearch/autonovel ANTI-SLOP.md + slop-forensics + EQ-Bench Slop Score
Adapted for personality simulation: slop in simulated speech is a dead giveaway that
the output is LLM-generated, not human-generated. EVERY simulated utterance must pass
this filter or the simulation fails the "indistinguishable from real" standard.
## Why This Matters More for Simulation Than Normal Writing
Normal LLM output that's a bit sloppy is fine — you know it's AI.
Simulated speech that contains slop BREAKS THE ILLUSION. If @eigenrobot's
simulated tweet contains "delve" or "it's worth noting," anyone who follows
him would instantly know it's fake. Slop detection is the minimum viable
authenticity check.
## Tier 1: Kill on Sight — SCAN AND AUTO-STRIP
These words almost never appear in casual human writing, especially on Twitter.
If ANY appear in simulated tweets/posts, the simulation has failed.
REGEX SCAN LIST (case-insensitive):
```
delve|utilize|leverage\b.*\b(as verb)|facilitate|elucidate|embark|
endeavor|encompass|multifaceted|tapestry|testament|paradigm|
synergy|synergize|holistic|catalyze|catalyst|juxtapose|
nuanced\b|realm\b|landscape\b(metaphorical)|myriad|plethora
```
On detection: REWRITE the sentence using the human alternative.
Do not just swap the word — the sentence structure around slop words
is usually sloppy too.
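The list above is annotated shorthand, not a valid pattern as written (parentheticals like "(as verb)" are intent notes). A compilable sketch that drops the annotations and accepts some false positives in exchange for a scan that actually runs:
```
import re

TIER1 = re.compile(
    r'\b(delve|utilize|leverage|facilitate|elucidate|embark|endeavor|'
    r'encompass|multifaceted|tapestry|testament|paradigm|synergy|synergize|'
    r'holistic|catalyze|catalyst|juxtapose|nuanced|realm|landscape|myriad|'
    r'plethora)\b',
    re.IGNORECASE)

def tier1_hits(utterance):
    """Any hit fails the utterance; rewrite the whole sentence, not the word."""
    return TIER1.findall(utterance)
```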
## Tier 2: Suspicious in Clusters — COUNT PER PERSON
These are fine alone. Three in one person's simulated output = rewrite.
```
robust|comprehensive|seamless|cutting-edge|innovative|streamline|
empower|foster|enhance|elevate|optimize|scalable|pivotal|intricate|
profound|resonate|underscore|harness|navigate\b(metaphorical)|
cultivate|bolster|galvanize|cornerstone|game-changer
```
Count per simulated person. If count >= 3: flag and rewrite.
## Tier 3: Filler Phrases — DELETE ALL
These add zero information. No human tweets these.
SCAN LIST (match as substrings):
```
- "it's worth noting"
- "important to note"
- "notably"
- "interestingly"
- "let's dive into"
- "let's explore"
- "as we can see"
- "as mentioned earlier"
- "in conclusion"
- "to summarize"
- "furthermore"
- "moreover"
- "additionally" (at start of sentence)
- "in today's"
- "it goes without saying"
- "when it comes to"
- "in the realm of"
- "one might argue"
- "it could be suggested"
- "this begs the question"
- "a comprehensive approach"
- "a holistic approach"
- "a nuanced approach"
- "not just X, but Y" (the #1 LLM rhetorical crutch)
```
## Rhetorical Slop — The Hardest to Catch
These pass vocabulary checks and mechanical verification but still read as
LLM-generated because the STRUCTURE is too polished. This is the deepest
layer of slop — the instruct model's training to produce "satisfying" output.
### Parallel Antithesis
"The most X are... The most Y are..."
"It's not about X. It's about Y."
Every simulated tweet that contains a balanced two-part rhetorical structure
should be checked: would this person actually construct that parallelism,
or would they just say the second half and trust you to get it?
FIX: delete the setup. Keep only the punchline half.
### "Not X, Not Y, But Z" / "Not Just X, But Y"
The #1 LLM rhetorical crutch. Appears in almost every simulation.
FIX: just say Z. Delete the negations.
### "Show Me X and I'll Show You Y"
Rhetorical formula that reads like a book blurb or TED talk.
No one tweets like this unless they're deliberately performing rhetoric.
FIX: state it flat. "Every community that works has a shared enemy" not
"Show me a thriving community and I'll show you..."
### Clean Escalating Lists
"First it was A, then B, then C, now D" — four perfectly escalating steps.
Real people do 2 steps and trail off, or skip to the end, or lose the thread.
FIX: cut to 2 steps max. Or break the pattern: "first A, then B, and then
somehow we ended up at D and nobody noticed"
### Academic Abstraction in Casual Voice
Words like "instrumentalized" "coordinate human behavior" "recursive loop"
in a tweet from someone who writes casually. The vocabulary is from papers,
not from posting.
FIX: use the word they'd actually reach for. "coordinate human behavior" →
"get people to do stuff." If the plain version sounds dumb, maybe the take
itself is thinner than the fancy words made it seem.
### The "Every Tweet Is A Banger" Problem
The deepest slop: every simulated utterance is GOOD. Considered. Structured.
Satisfying. Real twitter feeds are 70% mid, 20% boring, 10% brilliant.
The simulation should include:
- Half-finished thoughts ("idk if this makes sense but")
- Trailing off ("wait actually nvm")
- Boring logistical tweets ("anyone know a good dentist in brooklyn")
- Self-interruptions ("ok this is getting long")
- Acknowledgments that add nothing ("lol yeah" "hmm" "fair")
If every tweet in the simulation could be screenshot'd as a banger,
the simulation is too polished to be real.
## Structural Slop Patterns — CHECK IN SIMULATION OUTPUT
### Pattern: Identical Sentence Structure Across Speakers
If two or more simulated people use the same sentence structure
(e.g., "The thing about X is Y"), the simulation has failed voice
differentiation. Real people have different syntactic habits.
### Pattern: Topic Sentence Machine
If a simulated post follows: topic sentence → elaboration → example → wrap-up,
it's LLM structure, not human. Real tweets are: punchline first, or tangent,
or one-liner, or trailing thought.
### Pattern: Symmetry Addiction
If the conversation has neat equal turns, balanced perspectives, everyone
getting the same number of posts — that's not real. Real conversations
are asymmetric. Someone dominates. Someone lurks. Someone gets interrupted.
### Pattern: The Hedge Parade
"This approach may potentially help improve..." — no human tweets like this.
Either commit to the statement or don't make it.
### Pattern: Em Dash Overload
Count em dashes (—) per person. If >2 per post on average, flag it.
Most people use them sparingly or not at all.
### Pattern: Sycophantic Agreement Flow
If the conversation flows: A says thing → B says "great point, and also..." →
C says "building on that..." — that's instruct-model conversation, not human.
Real conversations have: disagreement, misunderstanding, tangents, ignoring,
one-upping, and sometimes just "lol."
### Pattern: Uniform Register
If all simulated people sound like they're writing at the same education level
with the same formality — the simulation failed. Real people have wildly different
registers. A shitposter and an academic should sound nothing alike.
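Two of these structural checks as code sketches (illustrative; `posts_by_speaker` is assumed to map a handle to that speaker's simulated posts):
```
def em_dash_overload(posts, limit=2.0):
    """Flag when em dashes average more than `limit` per post."""
    return sum(p.count("\u2014") for p in posts) / max(len(posts), 1) > limit

def shared_openers(posts_by_speaker, n_words=4):
    """Return opening phrases that appear across MORE THAN ONE speaker."""
    seen = {}
    for speaker, posts in posts_by_speaker.items():
        for p in posts:
            opener = " ".join(p.lower().split()[:n_words])
            if opener:
                seen.setdefault(opener, set()).add(speaker)
    return [o for o, speakers in seen.items() if len(speakers) > 1]
```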
## Integration: Mechanical Slop Scan
Run BEFORE subjective discriminator scoring, alongside emoji/length/caps checks.
```
For each simulated utterance:
1. Scan for Tier 1 words → auto-rewrite if found
2. Count Tier 2 words per person → flag if >= 3
3. Scan for Tier 3 filler phrases → auto-delete
4. Check for structural patterns:
- Same sentence structure across speakers?
- Topic-sentence-machine structure?
- Symmetric turn-taking?
- Hedge parade?
- Em dash count?
- Sycophantic flow?
5. If ANY Tier 1 found or ANY structural pattern detected:
FAIL the utterance and regenerate
```
This scan is MECHANICAL. It cannot be vibes-scored. The words are either
there or they're not. Run it every time, no exceptions.


@@ -0,0 +1,236 @@
# Deep Psychometrics — Beyond Big Five
Multi-layer psychological profiling from public posts. Each layer adds
a dimension to the personality model, making simulations more nuanced
and predictions more accurate.
## The Profiling Stack
| Layer | What It Measures | Tool/Method | Accuracy | Min Posts |
|-------|-----------------|-------------|----------|-----------|
| Big Five (OCEAN) | Core personality traits | RoBERTa embeddings + BiLSTM | AUROC 0.78-0.82 | 30-50 |
| Moral Foundations | Ethical intuitions | eMFDscore (pip) | Validated dictionary | 20+ |
| Schwartz Values | Core value priorities | DeBERTa on ValueEval | F1 0.56 (macro) | 20+ |
| Cognitive Style | Thinking patterns | AutoIC + LIWC features | r=0.70-0.82 doc-level | 20+ |
| Narrative Framing | How they frame issues | GPT-4 few-shot | F1 ~70% | 10+ |
| Behavioral Metadata | Non-text patterns | Feature extraction | r=0.29-0.40 per trait | 20+ |
## Layer 1: Big Five Personality (Foundation)
### Accuracy Bounds (peer-reviewed)
- AUROC 0.78-0.82 with RoBERTa embeddings + BiLSTM (JMIR 2025)
- Per-trait binary accuracy: O=0.637, C=0.602, E=0.620, A=0.590, N=0.620
- Meta-analytic correlations (Azucar 2018, 16 studies):
Extraversion r=0.40, Openness r=0.39, Conscientiousness r=0.35,
Neuroticism r=0.33, Agreeableness r=0.29
- These hit the "personality coefficient" ceiling of r=0.30-0.40 —
digital footprints are as predictive as any behavioral measure
### What Actually Works
- Fine-tuned embeddings >> zero-shot LLMs. GPT-4o zero-shot is UNRELIABLE.
- RoBERTa embeddings are free and nearly as good as OpenAI embeddings
- Aggregation across posts is essential — single posts are noise
- 30-50 posts of ~90 words each = practical minimum
- Training data: PANDORA Reddit corpus (1568 users, ~935K posts)
### For The Simulator (without running models)
Since we can't fine-tune per-simulation, use LLM-as-rater with caveats:
- Provide 10-20 actual posts as evidence
- Ask for trait estimation with reasoning, not just scores
- Anchor with the adjective-based method (see prediction-engine.md)
- Frame estimates as ranges, not points: "Openness: HIGH (0.7-0.9)"
- Known bias: LLMs overestimate agreeableness and underestimate neuroticism
### Key Insight: LLMs Already Know Public Figures
Nature Scientific Reports 2024: GPT-3's semantic space already encodes
perceived personality of public figures from their names alone. For
famous people, the LLM's latent knowledge is a STARTING POINT that
OSINT data confirms or corrects.
## Layer 2: Moral Foundations (Ethical Compass)
Jonathan Haidt's Moral Foundations Theory. Six foundations:
| Foundation | Liberal emphasis | Conservative emphasis |
|-----------|-----------------|---------------------|
| Care/Harm | ★★★ HIGH | ★★ MODERATE |
| Fairness/Cheating | ★★★ HIGH | ★★ MODERATE |
| Loyalty/Betrayal | ★ LOW | ★★★ HIGH |
| Authority/Subversion | ★ LOW | ★★★ HIGH |
| Sanctity/Degradation | ★ LOW | ★★★ HIGH |
| Liberty/Oppression | ★★ MODERATE | ★★ MODERATE |
### Tool: eMFDscore
```
pip install emfdscore
# GitHub: github.com/medianeuroscience/emfdscore
# Built on spaCy, GPL-3.0
```
Output per post: scores for each foundation (virtue + vice dimensions)
Aggregate across 20+ posts → 10-dimensional moral profile
### Application to Simulation
Moral foundations predict:
- What topics trigger emotional responses
- What arguments they find persuasive vs repulsive
- How they frame political/social issues
- Who they instinctively ally with vs oppose
- What kind of content they share/amplify
Example: High Loyalty/Authority person will defend their tribe even when
wrong. High Care/Fairness person will break from their tribe on justice
issues. This shapes conversation dynamics.
### For The Simulator (without running eMFDscore)
Infer moral foundations from:
- Political positions and framing in their posts
- What they get angry about vs what they celebrate
- Who they defend and who they attack
- Key moral vocabulary: "protect", "fair", "loyal", "respect", "pure", "free"
## Layer 3: Schwartz Values (Core Motivations)
19 values in circular continuum (adjacent values are compatible,
opposite values are in tension):
**Self-Transcendence** ↔ **Self-Enhancement**
- Universalism, Benevolence ↔ Power, Achievement
**Openness to Change** ↔ **Conservation**
- Self-Direction, Stimulation, Hedonism ↔ Tradition, Conformity, Security
### SemEval-2023 Task 4 Results
- Best macro-F1: 0.56 (ensemble of 12 DeBERTa/RoBERTa models)
- Most reliable: universalism (nature), security, power
- Least reliable: stimulation, hedonism, humility
- Dataset: 9,324 annotated arguments, available via Touché
### Key Finding: Value Perception Is Subjective
Epstein et al. (2026): human inter-rater agreement on values is only r=0.201.
Fine-tuned GPT-4o reaches r=0.294 — BETTER than human-human agreement.
Personalized models reach r=0.334.
### For The Simulator
Values predict MOTIVATION — why someone holds positions, not just what
positions they hold. Two people with the same political stance may have
completely different underlying values:
- "I support open source because FREEDOM" (Self-Direction)
- "I support open source because FAIRNESS" (Universalism)
- "I support open source because it WORKS BETTER" (Achievement)
Same position, different framing, different behavioral predictions.
## Layer 4: Cognitive Style (How They Think)
### Integrative Complexity (AutoIC)
Measures differentiation (seeing multiple perspectives) and integration
(synthesizing perspectives into coherent frameworks).
- Low IC: black-and-white thinking, strong convictions, simple language
- High IC: nuanced, sees multiple sides, hedging, complex sentences
AutoIC (Conway et al.): 3,500+ complexity-relevant root words/phrases,
13 dictionary categories, validated r=0.70-0.82 at document level.
**WARNING**: LIWC's "analytic thinking" correlates only r=0.14 with actual
integrative complexity. Don't use LIWC's score as a proxy.
### Computational Indicators of Cognitive Style
Extractable from 20-50 posts without specialized tools:
| Indicator | High IC | Low IC |
|-----------|---------------|---------------|
| Vocabulary diversity (TTR) | HIGH | LOW |
| Avg sentence length | LONGER | SHORTER |
| Causal connectives ("because", "therefore") | MORE | FEWER |
| Hedging ("perhaps", "it seems") | MORE | FEWER |
| Abstract vs concrete language | MORE ABSTRACT | MORE CONCRETE |
| Question-asking | MORE | FEWER |
| Binary framing ("always/never") | LESS | MORE |
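A sketch of how a few of these indicators could be computed from raw posts. The word lists here are tiny illustrative seeds, not validated dictionaries like AutoIC's:
```
CAUSAL = {"because", "therefore", "thus", "hence", "so"}
HEDGES = {"perhaps", "maybe", "possibly", "seems", "arguably"}
BINARY = {"always", "never", "everyone", "nobody"}

def cognitive_indicators(posts):
    words = [w.strip(".,!?\"'").lower() for p in posts for w in p.split()]
    n_words, n_posts = max(len(words), 1), max(len(posts), 1)
    return {
        "type_token_ratio": len(set(words)) / n_words,
        "avg_words_per_post": len(words) / n_posts,
        "causal_rate": sum(w in CAUSAL for w in words) / n_words,
        "hedge_rate": sum(w in HEDGES for w in words) / n_words,
        "binary_rate": sum(w in BINARY for w in words) / n_words,
        "question_rate": sum("?" in p for p in posts) / n_posts,
    }
```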
### For The Simulator
Cognitive style directly shapes VOICE:
- High IC person: longer posts, more caveats, "on the other hand"
- Low IC person: punchy takes, strong assertions, no hedging
- This is one of the strongest differentiators between similar-sounding people
## Layer 5: Narrative Framing (Their Lens on Reality)
How someone frames an issue reveals deep cognitive and value patterns.
### Common Frames (Semetko & Valkenburg)
- **Conflict**: issue as battle between opposing sides
- **Human interest**: personal stories, emotional impact
- **Economic**: costs, benefits, financial impact
- **Morality**: right vs wrong, ethical principles
- **Attribution of responsibility**: who's to blame / who should fix it
### Detection
GPT-4 few-shot with frame definitions achieves F1=70.4%. It's the best option
for diverse topics where fine-tuned models are too narrow.
### For The Simulator
Framing predicts:
- How they'll react to news (through which lens)
- What aspects they'll emphasize in conversation
- What arguments they'll find compelling
- Whether they personalize or systematize issues
Example: Same AI safety event, different frames:
- Conflict framer: "The open vs closed battle heats up"
- Economic framer: "This will cost the industry billions"
- Moral framer: "This is irresponsible and dangerous"
- Attribution framer: "The regulators need to step in"
## Layer 6: Behavioral Metadata (Non-Text Signals)
Extractable from X API / Bluesky AT Protocol without NLP (a sketch follows the table):
| Feature | What It Reveals |
|---------|----------------|
| Posting time distribution | Timezone, sleep patterns, work schedule |
| Reply vs original ratio | Conversational vs broadcast personality |
| Emoji frequency & types | Emotional expression style |
| Hashtag usage | Community identification, signal boosting |
| Media attachment rate | Visual vs text orientation |
| Thread length | Depth of engagement preference |
| Retweet/repost ratio | Amplifier vs creator |
| Average post length | Conciseness vs verbosity |
| Response latency | Impulsiveness vs deliberation |
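Most of these reduce to simple aggregation over raw post objects. A rough sketch, assuming each post is a dict with `text`, `created_at` (ISO 8601), and `is_reply`/`is_repost` flags (field names vary by platform and are assumptions here):
```python
from datetime import datetime

def behavioral_metadata(posts: list[dict]) -> dict:
    """Aggregate non-text behavioral signals from raw post objects."""
    n = len(posts) or 1
    # strip 'Z' suffix: datetime.fromisoformat can't parse it before Python 3.11
    hours = [datetime.fromisoformat(p["created_at"].rstrip("Z")).hour for p in posts]
    # crude emoji test: any codepoint in/above the emoji blocks
    emoji_posts = sum(any(ord(c) >= 0x1F300 for c in p["text"]) for p in posts)
    return {
        "hour_histogram": {h: hours.count(h) for h in range(24)},  # sleep/work rhythm
        "reply_ratio": sum(p.get("is_reply", False) for p in posts) / n,
        "repost_ratio": sum(p.get("is_repost", False) for p in posts) / n,
        "emoji_rate": emoji_posts / n,
        "avg_post_len": sum(len(p["text"].split()) for p in posts) / n,
        "late_night_rate": sum(h < 6 for h in hours) / n,  # 00:00 to 05:59
    }
```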
### Trait Correlations (meta-analytic)
- **Extraversion**: more posts, more friends, more photos, more group activity
- **Neuroticism**: more self-disclosure, more passive consumption, more late-night posting
- **Agreeableness**: fewer swear words, more positive emotion, more supportive replies
- **Conscientiousness**: more regular posting patterns, more task-oriented content
- **Openness**: more diverse topics, more original content, larger networks
## Putting It All Together: The Deep Dossier
At high fidelity, compile a multi-layer profile:
```
PSYCHOMETRIC PROFILE: @handle
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Big Five: O[HIGH] C[MED] E[HIGH] A[LOW] N[LOW]
Evidence: {real quotes showing each trait}
Moral Foundations: Care★★ Fair★★★ Loyal★ Auth★ Sanct★ Liberty★★★
Evidence: {what they get angry/excited about}
Values: Self-Direction dominant, Achievement secondary
Evidence: {how they justify their positions}
Cognitive Style: HIGH integrative complexity
Evidence: {hedging patterns, nuanced takes, sentence complexity}
Dominant Frame: Attribution of Responsibility
Evidence: {they consistently focus on who's to blame}
Behavioral: Night owl, reply-heavy, low emoji, threads > one-shots
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
This multi-layer profile makes predictions much more nuanced than
Big Five alone. It tells you not just WHAT someone will say but
WHY they'll say it and HOW they'll frame it.

View File

@@ -0,0 +1,170 @@
# GEPA Evolution — Automated Self-Improvement via hermes-agent-self-evolution
## What This Is
The hermes-agent-self-evolution repo (NousResearch/hermes-agent-self-evolution)
uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve
Hermes Agent skills. GEPA is an ICLR 2026 Oral paper — it reads EXECUTION
TRACES to understand WHY things fail, then proposes targeted mutations.
This means: we can point GEPA at the worldsim skill and automatically evolve
every component — simulation instructions, anti-slop rules, star thread
methodology, mechanical verification checklist, dossier templates — using
our own simulation outputs scored against real data as the eval signal.
The recursive self-improvement pipeline we built manually (log failures →
promote patterns → update rules) can be AUTOMATED via GEPA.
## How It Applies to WorldSim
### What GEPA Evolves (text, not weights)
GEPA evolves the TEXT of prompts and instructions. For worldsim, that means:
| Target | What Gets Evolved | Eval Signal |
|--------|------------------|-------------|
| SKILL.md | Immersion protocol, pipeline instructions | Simulation quality scores |
| star-thread.md | Methodology for finding star threads | Thread-to-voice accuracy |
| anti-slop.md | Slop word lists, structural patterns | Slop detection recall/precision |
| simulation-engine.md | Platform formats, conversation dynamics | Voice fidelity scores |
| adversarial-refinement.md | Mechanical check thresholds, GAN loop | Pre vs post refinement delta |
| prediction-engine.md | Forecasting methodology | Prediction Brier scores |
| dossier template | Profile structure and fields | Profile quality scores |
### The Eval Dataset
Built from worldsim's own outputs + real data:
1. **Voice fidelity pairs**: (simulated post, real post from same person) →
LLM-as-judge scores similarity 0-1
2. **Mechanical check logs**: what did the checks catch? what slipped through?
3. **Prediction accuracy**: tracked predictions scored against reality
4. **Held-out tests**: predicted tweets vs actual tweets
5. **Turing test results**: could the discriminator tell real from fake?
6. **User corrections**: any time the user catches something the system missed
(like the emoji fabrication incident — that's the richest signal)
### The GEPA Loop for WorldSim
```
1. RUN worldsim simulation (creates execution traces)
2. SCORE outputs against real data (voice, position, mechanical)
3. LOG traces + scores + user feedback to eval dataset
4. GEPA EVOLVES the skill component that had lowest scores
- Reads traces to understand WHY it scored low
- Proposes mutation to that specific reference file
- Tests mutation against held-out eval data
- If improved: create PR, human reviews
5. REPEAT — each cycle makes the skill better
```
### Concrete Example
GEPA discovers from traces that simulated conversations always have
symmetric turn-taking (4/4/4). It reads the mechanical check log that
caught this in 3 of the last 5 simulations. It reads the current
simulation-engine.md and sees the conversation architecture section.
It proposes a mutation:
OLD: "Opening Moves (1-3 posts) → Development (4-8 posts) → Peak → Resolution"
NEW: "Opening: most impulsive person posts. Others join ASYMMETRICALLY — one person
gets 40-50% of turns, one gets 15-20%, others fill the rest. The ratio should
match their real reply-to-original ratios from the dossier."
This mutation gets tested against the next 5 simulations. If symmetry
violations drop and voice scores don't decrease, it gets merged.
## Setup
```bash
# Clone the evolution repo
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"
# Point at hermes-agent repo
export HERMES_AGENT_REPO=~/.hermes
# Evolve the worldsim skill specifically
python -m evolution.skills.evolve_skill \
--skill hermes-simulator \
--iterations 10 \
--eval-source sessiondb
```
## What Makes This Different From Manual Self-Improvement
The manual pipeline (references/recursive-self-improvement.md) requires the
agent to notice its own failures and write rules. This has two problems:
1. The agent shares weights with the generator — it's biased toward
approving its own output (the emoji incident proved this)
2. Promoting patterns to rules is slow and requires 3+ occurrences
GEPA solves both, and goes further:
1. The eval signal comes from EXTERNAL data (real posts, user corrections,
mechanical checks) — not the agent's self-assessment
2. Evolution happens per-iteration, not per-3-failures
3. Mutations are tested against held-out data before merging
4. The Pareto frontier maintains diversity — different strategies for
different types of people/conversations
## Integration Points
### Eval Dataset Builder
Mine the rehoboam DB for training data (sketch below):
- simulation_logs table → execution traces
- prediction_scores table → accuracy data
- audit_log table → mechanical check results
- user correction events → highest-value signal
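A sketch of the miner, assuming the DB lives at the rehoboam root and holds the tables above; the filename and column layout are assumptions:
```python
import sqlite3
from pathlib import Path

DB = Path.home() / ".hermes" / "rehoboam" / "rehoboam.db"  # assumed path

def mine_eval_examples() -> list[dict]:
    """Flatten the three logging tables into one list of eval records."""
    con = sqlite3.connect(DB)
    con.row_factory = sqlite3.Row  # rows behave like dicts
    examples = [
        {"source": table, **dict(row)}
        for table in ("simulation_logs", "prediction_scores", "audit_log")
        for row in con.execute(f"SELECT * FROM {table}")
    ]
    con.close()
    return examples
```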
### Fitness Function for WorldSim
```python
def worldsim_fitness(simulation_output, real_data):
    """Score one simulation; returns (scalar score, textual feedback).
    The helper functions are assumed to live elsewhere in the harness."""
    scores = {}
    # Voice fidelity: embed real + simulated, cosine similarity
    scores["voice"] = embed_and_compare(simulation_output, real_data.tweets)
    # Mechanical pass rate: what % of checks passed without fixes
    scores["mechanical"] = mechanical_check_pass_rate(simulation_output)
    # Slop score: fraction of words flagged as slop, inverted
    scores["anti_slop"] = 1.0 - slop_rate(simulation_output)
    # Structure: turn asymmetry, conversation naturalness
    scores["structure"] = naturalness_score(simulation_output)
    # Textual feedback for GEPA's reflective mutation
    feedback = generate_textual_feedback(scores, simulation_output, real_data)
    return aggregate_score(scores), feedback
```
### The Key Insight: Textual Feedback
GEPA's superpower is that it doesn't just get a scalar score — it gets
TEXTUAL FEEDBACK explaining what went wrong. Our mechanical verification
system already produces this:
"@nosilverv avg 33.2 words vs real 15.6 (113% deviation) — SHORTEN"
"Parallel antithesis detected: 'The most X... The most Y...' — STRIP"
"Emoji rate 0% simulated but 10% real — OK (within tolerance)"
This text goes directly into GEPA's reflective mutation pipeline. It reads
these messages and proposes changes to the skill instructions that would
prevent these specific failures in future simulations.
## Evolution Targets by Priority
1. **simulation-engine.md** — highest impact on output quality
2. **anti-slop.md** — directly measurable, highest precision eval
3. **star-thread.md** — hardest to evaluate but most impactful on voice
4. **adversarial-refinement.md** — meta: improving the improvement system
5. **SKILL.md pipeline instructions** — orchestration optimization
6. **dossier template** — structure optimization
7. **prediction-engine.md** — measurable via Brier scores
## The Virtuous Cycle
```
More simulations → more eval data → better GEPA mutations
→ better skill instructions → better simulations → more eval data → ...
```
This is the endgame: the worldsim skill evolves itself through use.
Every simulation makes the next one better, not just through logged
rules, but through automated evolutionary optimization of the
instructions themselves. The system doesn't just learn WHAT went wrong —
it rewrites its own instructions to prevent it.

View File

@@ -0,0 +1,262 @@
# Knowledge Archive — Per-Person Source Library + Expert Synthesis
## The Problem With Profiles
A profile is a SNAPSHOT. It says "this person believes X" but doesn't
show you WHERE they said it, WHEN, in WHAT context, or HOW their
thinking evolved. You can't cite a profile. You can't trace a claim
back to a source. And when you're simulating a conversation about
topic Z, the profile gives you everything about the person equally
weighted — their views on AI and their views on cooking and their
views on politics all crammed into the same context window.
## The Archive
For every person the system touches, build a LIBRARY:
```
~/.hermes/rehoboam/archives/{handle}/
├── index.json ← master index: all entries, metadata, embeddings
├── sources/
│ ├── x_tweets.jsonl ← every tweet pulled, with ID, timestamp, URL, metrics
│ ├── x_replies.jsonl ← their replies (different voice register)
│ ├── bluesky_posts.jsonl ← bluesky posts
│ ├── blog_posts.jsonl ← full text of blog posts with URLs
│ ├── podcast_quotes.jsonl ← attributed quotes from transcripts
│ ├── interviews.jsonl ← quotes from news articles/interviews
│ ├── reddit_comments.jsonl
│ ├── github_comments.jsonl
│ ├── goodreads_reviews.jsonl
│ ├── threads_posts.jsonl
│ └── other.jsonl ← anything else (HN, Quora, etc.)
├── topics/
│ ├── ai_safety.jsonl ← auto-clustered by topic
│ ├── open_source.jsonl
│ ├── consciousness.jsonl
│ └── ...
└── embeddings/
└── all_embeddings.npy ← sentence-transformer vectors for semantic search
```
### Entry Format (every entry in every source file)
```json
{
"id": "unique_id",
"handle": "teknium",
"platform": "x",
"type": "tweet|reply|blog|podcast|interview|comment|review",
"text": "the actual text they said",
"url": "https://x.com/Teknium/status/1234567890",
"timestamp": "2026-04-05T21:40:48Z",
"context": {
"replying_to": "@otheruser's tweet about X",
"thread_position": 3,
"topic": "open source AI",
"source_title": "Lex Fridman Podcast #412"
},
"metrics": {
"likes": 234,
"retweets": 45,
"replies": 12
},
"topics": ["open_source", "ai_models", "hermes"],
"embedding_id": 42
}
```
Every entry has a URL. Everything is traceable. Nothing is paraphrased
without the original alongside it.
## Collection Pipeline
When `worldsim> profile @handle` or `worldsim> archive @handle` runs:
### Step 1: Pull Everything
Use every verified access method to collect raw materials:
- X API: get max tweets (paginate with next_token to get hundreds)
- nitter.cz: timeline content
- ThreadReaderApp: historical threads
- Bluesky: full post history
- GitHub: issue comments, PR reviews, gists, README
- Reddit: comment history
- Blog/Substack: full posts (web_extract)
- Podcast transcripts: attributed quotes
- Interviews: quotes with attribution
- Goodreads: reviews
- Medium: RSS feed full text
### Step 2: Deduplicate
Same content appears across platforms (cross-posted tweets, syndicated
blog posts). Deduplicate by content similarity, keep the richest version
(the one with most metadata/context).
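A minimal first-pass dedupe using character-level similarity (the 0.9 threshold is a judgment call; embedding similarity works as a second pass):
```python
from difflib import SequenceMatcher

def dedupe(entries: list[dict], threshold: float = 0.9) -> list[dict]:
    """Drop near-duplicates, keeping the entry with the most metadata.
    O(n^2), fine for the hundreds of entries a typical archive holds."""
    richest_first = sorted(entries, key=lambda e: len(str(e.get("context", ""))),
                           reverse=True)
    kept: list[dict] = []
    for entry in richest_first:
        if all(SequenceMatcher(None, entry["text"], k["text"]).ratio() < threshold
               for k in kept):
            kept.append(entry)
    return kept
```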
### Step 3: Topic Cluster
Run lightweight topic classification on each entry:
- Use the LLM or a simple keyword matcher to assign 1-3 topic tags
- Cluster into topic files for fast retrieval
- Topics are dynamic — new topics emerge from the data
### Step 4: Embed
Generate sentence-transformer embeddings for every entry.
Store in numpy array for fast cosine similarity search.
This enables semantic retrieval: "find everything @handle said about
consciousness" even if they never used the word "consciousness."
### Step 5: Index
Build the master index.json with entry count, topic distribution,
timestamp range, platform coverage, and quality metrics.
## Context-Aware Retrieval
This is the key. The archive might have 500 entries for a person.
The context window can hold maybe 30-50 of them alongside all the
other simulation context. You MUST retrieve selectively.
### For Simulation
When simulating @handle talking about topic X:
```
1. Semantic search: embed the current conversation context
2. Retrieve top 10-15 entries by cosine similarity to context
3. Also retrieve: 5 highest-engagement entries (their "greatest hits")
4. Also retrieve: 3 most recent entries (freshness)
5. Also retrieve: 2 entries that CONTRADICT the expected position
(prevents confirmation bias in the simulation)
6. Deduplicate. Cap at 25-30 entries total.
7. These become the "voice anchors" for generation.
```
The simulation draws from SPECIFIC REAL QUOTES relevant to the current
conversation. Not a generic profile. Not everything they've ever said.
The 25 most relevant things they've said about THIS topic.
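A sketch of that retrieval mix, reusing `semantic_search` from the Step 4 sketch and the entry fields from the archive format (the two contradiction picks are left to an LLM or stance heuristic; there is no clean mechanical version):
```python
def voice_anchors(context: str, entries: list[dict], vecs, cap: int = 28) -> list[dict]:
    """Blend relevance, greatest hits, and freshness into one anchor set."""
    picks = [entries[i] for i in semantic_search(context, vecs, k=15)]
    by_hits = sorted(entries, key=lambda e: e["metrics"]["likes"], reverse=True)
    by_recency = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    picks += by_hits[:5] + by_recency[:3]
    # plus ~2 contradicting entries, selected by LLM/stance check (not shown)
    seen, anchors = set(), []
    for entry in picks:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            anchors.append(entry)
    return anchors[:cap]
```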
### For Expert Synthesis
When the user asks "who are the best minds on X and what have they said?":
```
1. Search ALL archived people's entries for topic X
2. Rank by: entry quality × person expertise × relevance to query
3. Return a synthesis with CITATIONS:
On the topic of AI consciousness:
@repligate argues that LLMs exhibit "simulacra of consciousness"
rather than consciousness itself, distinguishing between the
model's behavior and its substrate:
> "the question isn't whether GPT is conscious but whether the
> character it's simulating is conscious within the fiction"
— tweet, 2025-03-15 (2.4K likes)
https://x.com/repligate/status/...
@nickcammarata approaches it from a meditation/first-person
perspective, noting parallels between introspective practice
and interpretability:
> "observation changes the system being observed, in meditation
> and in interp"
— tweet, 2026-04-05 (2.9K likes)
https://x.com/nickcammarata/status/...
@tszzl is skeptical of the framing entirely:
> "consciousness discourse is philosophy cosplaying as engineering"
— tweet, 2025-11-22 (5.1K likes)
https://x.com/tszzl/status/...
```
Every claim attributed. Every quote sourced. Every link clickable.
### For Grounding Predictions
When predicting what @handle would say about event Y:
```
1. Retrieve all archive entries related to Y or adjacent topics
2. Identify their PATTERN of response to similar events
3. Ground the prediction in specific past statements:
PREDICTION: @handle would likely frame event Y through the lens
of [topic Z], based on:
- tweet [url]: "quote about Z" (2025-06-15)
- blog post [url]: "longer quote about Z" (2025-09-20)
- podcast [url]: "verbal quote about Z" (2026-01-10)
CONFIDENCE: 78% (3 consistent sources over 7 months)
```
## Incremental Updates
The archive grows over time. Each time the person is profiled:
1. Pull new content since last archive timestamp
2. Append to source files
3. Re-embed new entries only
4. Update topic clusters
5. Update index
Don't rebuild from scratch. Append and re-index.
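A sketch of the append-only refresh; `pull_since` and `embed_new` are hypothetical hooks into the collection and embedding steps, and the `last_updated` index field is an assumption:
```python
import json
from pathlib import Path

def update_archive(handle: str) -> None:
    """Pull new content, append it, re-embed only the new entries."""
    root = Path.home() / ".hermes" / "rehoboam" / "archives" / handle
    index = json.loads((root / "index.json").read_text())
    new_entries = pull_since(handle, index["last_updated"])  # hypothetical hook
    if not new_entries:
        return
    with (root / "sources" / "other.jsonl").open("a") as f:  # route per platform in practice
        for entry in new_entries:
            f.write(json.dumps(entry) + "\n")
    embed_new(new_entries)                                   # hypothetical hook
    index["last_updated"] = max(e["timestamp"] for e in new_entries)
    (root / "index.json").write_text(json.dumps(index, indent=2))
```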
## Expert Table
When you have 20+ archived people, build an expert table:
```
worldsim> experts "open source AI"
EXPERT TABLE: open source AI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@Teknium | 47 entries | voice: builder/practitioner
"we can prove that open approaches build better, more
trustworthy systems" — tweet, 2026-04-05
Latest: 2 hours ago | Stance: STRONG ADVOCATE
@repligate | 12 entries | voice: philosophical/theoretical
"open weights = accountability. you can't audit a black box"
— tweet, 2025-11-30
Latest: 3 days ago | Stance: ADVOCATE (principled)
@eigenrobot | 8 entries | voice: statistical/contrarian
"the open source premium is largely downstream of selection
effects in who contributes" — tweet, 2025-08-14
Latest: 1 week ago | Stance: SKEPTICAL OF FRAMING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
3 experts found | 67 total entries | synthesize? (y/n)
```
The table shows: who knows about this, what they've said, how recently,
and what their stance is. All grounded in archived quotes with sources.
## Integration With Simulation
When the star thread + dossier + archive work together:
```
STAR THREAD: drives the core generation (what they're DOING)
DOSSIER: provides constraints (psychometrics, voice metrics, baselines)
ARCHIVE: provides GROUNDING (specific real quotes for this context)
MECHANICAL CHECKS: verifies surface features (emoji, length, slop)
```
The archive prevents the simulation from drifting into generic territory.
Instead of "this person would probably say something about open source,"
it's "this person said THIS SPECIFIC THING about open source 3 weeks ago,
and their simulation should be consistent with that while also being fresh."
## The Overfitting Problem
"Without overfitting to a particular material the new context doesn't call for."
The retrieval system MUST be selective. If someone said 47 things about
open source AI, and the current conversation is about AI regulation,
don't dump all 47 open source quotes into context. Maybe 3 are relevant
because they connect open source to regulation. Retrieve THOSE 3.
The cosine similarity search handles this naturally — it matches the
CURRENT conversation context against the archive and returns what's
actually relevant, not everything tagged with a nearby topic.
The anti-overfitting checklist:
- Never load more than 25-30 archive entries per person into context
- Weight by relevance to CURRENT conversation, not by general importance
- Include at least 2 entries that contradict the expected position
- Include at least 3 recent entries regardless of topic relevance (freshness)
- If the conversation shifts topic mid-simulation, RE-RETRIEVE for new context
- The archive is a LIBRARY you consult, not a script you follow

View File

@@ -0,0 +1,321 @@
# Mass Behavior Modeling — Communities, Clusters, Cascades
Understanding individual behavior requires understanding the social
ecosystem they exist in. This reference covers the macro layer:
community detection, influence networks, audience modeling, and
predicting how groups respond to events.
## Why This Matters For Simulation
Individual prediction accuracy: ~56-60%
Individual-in-context prediction: significantly higher
A person's behavior is constrained by their community. Knowing WHICH
community they belong to, WHO influences them, and WHAT information
ecosystem they're in makes individual predictions much sharper.
Lewin's equation: B = f(P, E). This reference is about the E.
## The Ecosystem Stack
```
Layer 5: AUDIENCE REACTION — How would this person's audience respond?
Layer 4: STANCE & SENTIMENT — What positions do clusters hold?
Layer 3: INFLUENCE NETWORKS — Who spreads ideas to whom?
Layer 2: COMMUNITY CLUSTERS — Who groups together?
Layer 1: SOCIAL GRAPH — Who follows/interacts with whom?
```
## Layer 1: Social Graph Construction
### Data Sources (by accessibility)
| Source | Access | Quality | Tools |
|--------|--------|---------|-------|
| Bluesky AT Protocol | FREE, open, no auth | Excellent | atproto (pip) |
| X/Twitter API | Bearer token, limited | Good but restricted | curl, tweepy |
| Reddit | API with limits | Good for comments | PRAW (pip) |
| GitHub | Free API | Great for tech people | PyGithub (pip) |
| Web scraping | Fragile, TOS issues | Variable | Last resort |
### Bluesky: The Open Gold Mine
```python
# pip install atproto
from atproto import Client
# Public AppView endpoint serves public data without login
client = Client(base_url="https://public.api.bsky.app")
# Get follower graph
followers = client.get_followers(actor="handle.bsky.social")
following = client.get_follows(actor="handle.bsky.social")
# Real-time firehose (no auth!)
# wss://jetstream1.us-east.bsky.network/subscribe
```
### Graph Types
- **Follow graph**: who follows whom (directed, static-ish)
- **Interaction graph**: who replies to / retweets whom (directed, dynamic)
- **Mention graph**: who mentions whom (directed, weighted by frequency)
- **Co-engagement graph**: who engages with the same content (undirected)
Interaction graphs are more informative than follow graphs for predicting
actual behavioral alignment.
### Tools
```
pip install networkx python-igraph
```
NetworkX for prototyping (<100K nodes), igraph for production (millions).
## Layer 2: Community Detection
### Algorithms (ranked by quality)
| Algorithm | Quality | Speed | Notes |
|-----------|---------|-------|-------|
| Leiden | Best | Fast | Guarantees connected communities |
| Louvain | Good | Fastest | Can produce disconnected communities |
| Infomap | Excellent | Medium | Based on information theory |
| Label Propagation | Decent | Very fast | Non-deterministic |
### The Meta-Library: CDLib
```
pip install cdlib
```
Wraps 50+ community detection algorithms in a unified API.
Works on top of networkx/igraph. Highly recommended.
```python
from cdlib import algorithms
import networkx as nx

G = nx.karate_club_graph()
communities = algorithms.leiden(G)
print(communities.communities)  # list of member lists, one per community
# Also: algorithms.louvain, infomap, label_propagation, angel, demon, etc.
```
### What Communities Tell Us
Each community in a social graph typically shares:
- Ideological orientation
- Topic interests
- Information sources
- Language patterns and in-group vocabulary
- Reaction patterns to events
Knowing which community someone belongs to immediately constrains
predictions about their likely positions and reactions.
## Layer 3: Influence Networks
### Key Insight (Zhou et al., National Science Review 2024)
Network centrality alone is INSUFFICIENT for predicting influence.
Must combine structural position with behavioral features:
- Posting frequency
- Historical content virality
- Response rate / engagement ratio
- Content originality (original vs repost ratio)
### Centrality Measures
```python
import networkx as nx
G = nx.DiGraph()  # directed social graph; populate with follow/reply edges first
# Who has the most connections?
degree = nx.degree_centrality(G)
# Who bridges different communities?
betweenness = nx.betweenness_centrality(G)
# Who's connected to other well-connected people?
eigenvector = nx.eigenvector_centrality(G)
# Adapted from web — directed influence flow
pagerank = nx.pagerank(G)
```
### Superspreader Identification (DeVerna et al., PLOS ONE 2024)
Superspreaders of content fall into three categories:
1. **Pundits**: large following, high authority, original content
2. **Media outlets**: institutional accounts, news organizations
3. **Affiliated personal accounts**: connected to pundits/outlets
For simulation: knowing who the superspreaders are in a person's
network tells you what information they're likely exposed to.
### Information Cascade Modeling
```
pip install ndlib # Network Diffusion Library
```
NDlib models how information spreads through networks:
- Independent Cascade Model
- Linear Threshold Model
- SIR/SIS epidemiological models adapted for info spread
- Voter Model (opinion dynamics)
- Sznajd Model (social influence)
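A minimal Independent Cascade run with NDlib on a toy graph (parameters are illustrative, not tuned):
```python
import networkx as nx
import ndlib.models.epidemics as ep
import ndlib.models.ModelConfig as mc

g = nx.erdos_renyi_graph(1000, 0.01)      # toy stand-in for a real follow graph
model = ep.IndependentCascadesModel(g)

config = mc.Configuration()
config.add_model_parameter("fraction_infected", 0.01)       # initial spreaders
for edge in g.edges():
    config.add_edge_configuration("threshold", edge, 0.05)  # per-edge spread prob.
model.set_initial_status(config)

iterations = model.iteration_bunch(20)    # 20 simulation steps
# each iteration reports node status counts (susceptible / infected / removed)
```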
## Layer 4: Stance & Sentiment Analysis
### Ready-To-Use Models (HuggingFace)
**Tweet Sentiment** (most reliable):
```
cardiffnlp/twitter-roberta-base-sentiment-latest
# Labels: positive / negative / neutral
```
**Political Stance**:
```
kornosk/bert-election2020-twitter-stance-biden-KE-MLM
kornosk/bert-election2020-twitter-stance-trump-KE-MLM
launch/POLITICS # left / center / right
```
**All-in-One Tweet NLP**:
```
pip install tweetnlp
# Sentiment, emotion, hate speech, NER, topic classification
```
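Loading the sentiment model is a one-liner with transformers (tweetnlp wraps the same model family behind its own API):
```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
print(sentiment("open weights are the only path to real accountability"))
# e.g. [{'label': 'positive', 'score': 0.9}] (scores will vary)
```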
### Topic-Level Stance Tracking
Combine BERTopic (dynamic topic modeling) with stance classifiers:
1. Cluster posts into topics over time windows
2. Classify stance per topic per community
3. Track stance shifts over time
4. Detect divergence between communities on emerging topics
### PRISM Framework (ACL 2025)
First framework for interpretable political bias embeddings.
Two-stage: mine bias indicators → cross-encoder assigns structured scores.
```
github.com/dukesun99/ACL-PRISM
```
## Layer 5: Audience Modeling & Crowd Prediction
### The Frontier: Predicting How Groups React
Key papers and findings:
**CReAM (WWW 2024)**: Predicts which of two posts gets more engagement.
Uses LLM-generated features + FLANG-RoBERTa cross-encoder.
Demonstrates crowd reaction IS predictable from content alone.
**PopSim (Dec 2025)**: LLM multi-agent social network sandbox.
Simulates content propagation dynamics using "Social Mean Field"
for individual-population interaction. Reduces prediction error 8.82%.
**Conditioned Comment Prediction (EACL 2026)**:
KEY FINDING: behavioral traces (past posts) are BETTER than
descriptive personas for conditioning LLMs to predict user behavior.
This validates our OSINT approach: real data > personality labels.
**DEBATE Benchmark (Oct 2025)**:
WARNING: LLM agents converge opinions TOO QUICKLY vs real humans.
SFT + DPO helps but gap remains. Real communities maintain
disagreement longer than simulated ones.
**Distributional vs Individual Prediction (PMC 2025)**:
Group-level predictions are more reliable than individual ones.
Predicting "65% of this community will react negatively" is more
accurate than predicting "this specific person will react negatively."
### Application to Simulation
When simulating @person talking about event X, consider:
1. What community does @person belong to?
2. How is that community reacting to X? (distributional prediction)
3. Where does @person sit within that community? (conformist vs contrarian)
4. Who influences @person? What are THEY saying?
5. How does @person's audience react to their take? (engagement prediction)
This context makes individual predictions sharper.
## Echo Chamber & Filter Bubble Detection
### Technique
1. Build interaction graph
2. Run Leiden community detection
3. For each community, aggregate stance on key issues
4. Measure ideological homogeneity within communities
5. Compare cross-community vs within-community content similarity
6. High within + low cross = echo chamber
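Steps 4-6 reduce to comparing mean pairwise similarity within vs. across communities. A sketch, assuming per-user text embeddings with L2-normalized rows and Leiden output as lists of node indices:
```python
import numpy as np

def echo_chamber_score(user_vecs: np.ndarray, communities: list[list[int]]) -> float:
    """Within-community minus cross-community mean cosine similarity."""
    sims = user_vecs @ user_vecs.T                # cosine: rows are L2-normalized
    labels = np.empty(len(user_vecs), dtype=int)
    for cid, members in enumerate(communities):
        labels[members] = cid
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(user_vecs), dtype=bool)  # ignore self-similarity
    within = sims[same & off_diag].mean()
    cross = sims[~same & off_diag].mean()
    return float(within - cross)  # a large positive gap signals an echo chamber
```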
### Tools
```
github.com/mminici/Echo-Chamber-Detection # Cascade-based, CIKM 2022
# Includes Brexit and VaxNoVax datasets
```
### What It Tells Us
Knowing someone's echo chamber tells you:
- What information they're exposed to
- What they're NOT exposed to
- How extreme their positions might be (isolation → radicalization)
- Whether they'll encounter pushback or only agreement
- How they'll react to information from outside their bubble
## User Embeddings: "Find People Like @person"
### Strategy
1. Embed each user's recent N posts with sentence-transformers
2. Average embeddings → user vector
3. Use FAISS for similarity search
4. Cluster users with HDBSCAN in embedding space (see the sketch below)
### Best Models for Social Media Text
```
# General purpose (good baseline)
sentence-transformers/all-mpnet-base-v2
# Tweet-specific (better domain fit)
cardiffnlp/twitter-roberta-base
vinai/bertweet-base # pretrained on 850M tweets
```
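The four steps in a few lines, assuming `posts_by_user` maps handle to a list of post strings and using the general-purpose baseline from the list above:
```python
import faiss
import hdbscan
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

# posts_by_user: {handle: [post, post, ...]}, assumed to exist
handles = list(posts_by_user)
user_vecs = np.stack([
    model.encode(posts, normalize_embeddings=True).mean(axis=0)
    for posts in posts_by_user.values()
]).astype("float32")
faiss.normalize_L2(user_vecs)                   # re-normalize after averaging

index = faiss.IndexFlatIP(user_vecs.shape[1])   # inner product = cosine here
index.add(user_vecs)
_, nbrs = index.search(user_vecs[:1], 6)        # 5 nearest neighbors + self
print([handles[i] for i in nbrs[0][1:]])        # people most like handles[0]

clusters = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(user_vecs)
```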
### Graph + Text Hybrid Embeddings
```
pip install karateclub
```
KarateClub provides Node2Vec, DeepWalk, Graph2Vec — embed users
based on graph position. Combine with text embeddings for hybrid
vectors that capture BOTH what someone says AND where they sit
in the social network.
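Hybrid vectors are plain concatenation once the node ids line up. A sketch with DeepWalk (karateclub requires nodes labeled 0..n-1; `text_vecs` stands for the user-embedding matrix from the previous section, row i aligned to node i):
```python
import numpy as np
import networkx as nx
from karateclub import DeepWalk

G = nx.karate_club_graph()           # stand-in; nodes must be labeled 0..n-1
walker = DeepWalk(dimensions=64)
walker.fit(G)
graph_vecs = walker.get_embedding()  # shape: (n_nodes, 64)

# text_vecs: per-user sentence-transformer matrix, row i aligned to node i
hybrid = np.concatenate([graph_vecs, text_vecs], axis=1)
```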
## Practical Application to Simulation
### For Individual Simulation (what we already do)
Add ecosystem context to each dossier:
- Which community cluster they belong to
- Who their top influencers are (who do they retweet/amplify most)
- What echo chamber are they in (information environment)
- How does their community view the simulation topic
### For Audience Simulation (new capability)
When user asks "what would @person's audience say":
1. Identify @person's follower community
2. Sample representative voices from that community
3. Model the DISTRIBUTION of responses, not just one response
4. Include: cheerleaders, critics, joke-makers, lurkers
5. Weight by typical engagement patterns
### For Cascade Prediction (new capability)
When user asks "how would this take spread":
1. Model the initial tweet and its immediate network
2. Predict which nodes amplify (based on stance alignment + influence)
3. Estimate reach and engagement range
4. Predict quote-tweet ratio (agreement vs dunking)
## Recommended Minimal Stack
```bash
pip install networkx python-igraph leidenalg cdlib karateclub
pip install sentence-transformers transformers tweetnlp
pip install ndlib faiss-cpu hdbscan atproto
```
This gives you: graph construction, community detection, user embeddings,
stance/sentiment analysis, diffusion simulation, similarity search,
clustering, and Bluesky data access. All open source, all pip-installable.

View File

@@ -0,0 +1,370 @@
# OSINT Pipeline — Deep Intelligence Gathering
Full-spectrum open source intelligence for building personality models.
This goes beyond social media posts into visual identity, cross-platform
footprints, and behavioral analysis.
## Tool Arsenal
| Tool | Use Case | Strength |
|------|----------|----------|
| `web_search` | Find anything, initial discovery | Fast, broad, indexed content |
| `web_extract` | Pull full page content | Blogs, articles, profiles, PDFs |
| `browser_navigate` + `browser_snapshot` | View live pages | Dynamic content, login walls |
| `browser_vision` | Analyze what a page looks like | Layouts, visual identity, screenshots |
| `vision_analyze` | Analyze any image by URL/path | Profile pics, post images, aesthetics |
| `browser_get_images` | List all images on a page | Find images to feed to vision_analyze |
| Yandex reverse image search | Find where an image appears | Identity verification, alt accounts |
| `x-cli` (if available) | Direct Twitter API | Timelines, search, metadata |
## Instagram Intelligence
Instagram is CRITICAL for personality modeling — it reveals:
- Visual identity and aesthetic preferences
- Real-life social circles (tagged people, group photos)
- Lifestyle signals (travel, food, hobbies, pets)
- Caption voice (often different from Twitter voice)
- Story highlights (curated self-image)
- Bio links (cross-platform connections)
### Viewing Instagram Profiles (VERIFIED APRIL 2026)
**METHOD 1 — Instagram Private Web API (BEST, returns full JSON)**
```bash
curl -s -H 'User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)' \
-H 'x-ig-app-id: 936619743392459' \
'https://i.instagram.com/api/v1/users/web_profile_info/?username={handle}'
```
Returns ~500KB of JSON: full profile + last 12 posts with captions, likes,
comments, CDN image URLs, timestamps. No auth needed.
**METHOD 2 — Instagram oEmbed API (for individual posts)**
```bash
curl -s 'https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/p/{SHORTCODE}/'
```
Returns: caption text, author_name, thumbnail URL. No auth.
**METHOD 3 — Pixwox via web_extract (profile viewer)**
```python
web_extract(["https://pixwox.com/profile/{username}"])
```
Returns 12+ recent posts with captions, engagement stats. Cloudflare blocks
curl but web_extract bypasses it.
**METHOD 4 — SocialBlade via web_extract (analytics)**
```python
web_extract(["https://socialblade.com/instagram/user/{handle}"])
```
Returns follower count, engagement rate, 14-day tracking.
**METHOD 5 — CDN direct download (images from API responses)**
Image URLs from API responses (scontent-*.cdninstagram.com) download
directly with no auth. Feed them to vision_analyze for visual profiling.
**METHOD 6 — Google indexed content**
```
web_search("site:instagram.com {username}")
```
Returns bio text, follower count, recent post captions from search snippets.
**WHAT DOESN'T WORK:** direct web_extract on instagram.com, ?__a=1 trick,
graph.instagram.com (needs OAuth), imginn/picuki/dumpoir/gramhir (403)
### Instagram Discovery (finding someone's handle)
```
web_search("{real_name} instagram")
web_search("{twitter_handle} instagram account")
web_search("site:instagram.com {real_name}")
# Check their Twitter/X bio for IG links
# Check their personal website for social links
# Check Linktree / bio.link pages
```
### Extracting Signal from Instagram
**Profile Picture**: Reveals self-presentation style
- Professional headshot vs casual vs meme/avatar
- Analyze with vision_analyze for clothing, setting, expression
**Bio Text**: Compressed self-identity
- Role/title claims
- Emoji usage patterns
- Link destinations
- Location claims
**Post Grid**: Visual identity fingerprint
- Color palette tendencies
- Content categories (food/travel/tech/selfies/memes)
- Posting frequency
- Professional vs personal ratio
**Captions**: Voice sample different from Twitter
- Usually longer, more personal
- Hashtag usage patterns
- Emoji patterns
- Tone (inspirational vs casual vs funny)
**Tagged Photos**: Real social graph
- Who they hang out with IRL
- Events they attend
- Social circles outside tech/AI
## Visual Identity Analysis
Use vision tools to analyze HOW someone presents visually:
### Profile Pictures Across Platforms
```
# Collect profile pics from multiple platforms
# Twitter, Instagram, LinkedIn, GitHub, Discord
# Analyze each
vision_analyze(image_url="{pic_url}",
question="Describe this profile picture in detail: person's appearance, clothing style, setting, expression, professional vs casual, any notable elements")
# Cross-reference: do they use the same pic everywhere? Different personas?
```
### Reverse Image Search (Yandex Pipeline)
From memory — Google Lens blocks Browserbase IPs, use Yandex:
```
# For images behind auth/CDN, upload to catbox first
terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{local_path}' https://catbox.moe/user/api.php")
# Then Yandex reverse image search
browser_navigate("https://yandex.com/images/search?rpt=imageview&url={encoded_public_url}")
# Or via web_extract (slower but automatable)
web_extract(["https://yandex.com/images/search?rpt=imageview&url={encoded_url}"])
```
Yandex provides:
- Similar images (find the same person elsewhere)
- Site matches (where this image appears)
- OCR text extraction (text in images)
- Image tags (what's in the image)
- Knowledge panels (identified entities)
### Screenshot Analysis
When you can see a page but can't extract text:
```
browser_vision(question="Read all text on this page. List usernames, post content, dates, engagement numbers")
browser_vision(annotate=true, question="What interactive elements are on this page?")
```
## LinkedIn Intelligence
**STATUS: BLOCKED for automated access** (tested April 2026).
web_extract returns "Website Not Supported". Direct browsing triggers auth walls.
**Workarounds:**
```
# LinkedIn content IS indexed by search engines
web_search("{real_name} linkedin {company}")
web_search("site:linkedin.com/in {name}")
# These return snippets with headline, role, company — useful even without full profile
# Google sometimes caches LinkedIn profiles
web_search("{name} site:linkedin.com headline")
```
**METHOD 1 — Google indexed snippets (always works)**
```
web_search("site:linkedin.com/in {name} {company}")
```
Returns: name, headline, company, location, connection count, bio snippet.
**METHOD 2 — Crunchbase (EXCELLENT for founders/execs)**
```python
web_extract(["https://www.crunchbase.com/person/{slug}"])
```
Returns: full career history, education, investments, board positions,
social links. Best source for professional identity of startup people.
**METHOD 3 — Corporate press pages**
```
web_search("{person} {company} site:{company}.com bio OR press")
```
Official bios from company newsrooms. High quality, curated but factual.
**METHOD 4 — Third-party aggregators**
- RocketReach, SignalHire — job title + company from web_search snippets
- rootdata.com — good for crypto/AI people
- Crunchbase — best all-round for tech executives
**METHOD 5 — Paid LinkedIn API wrappers** (if budget allows)
- LinkdAPI, Proxycurl: $0.07-0.15 per profile, full structured data
- No OAuth needed, just API key
LinkedIn reveals (from combined methods):
- Career trajectory (Crunchbase full history)
- Current role and headline (search snippets)
- Education (Crunchbase or search snippets)
- Professional self-presentation (company bio pages)
- Investment/board activity (Crunchbase)
## Podcast Transcripts (HIGHEST VALUE for voice profiling)
Podcast interviews are THE gold mine for personality modeling. Hours of
unscripted speech, natural conversation, real personality showing through.
**Discovery:**
```
web_search("{name} podcast transcript interview")
web_search("{name} lex fridman OR tyler cowen OR joe rogan OR dwarkesh")
```
**Extraction — verified working transcript sources:**
```python
# Lex Fridman (full verbatim transcripts)
web_extract(["https://lexfridman.com/EPISODE_URL/transcript"])
# Conversations with Tyler (Tyler Cowen — full transcripts)
web_extract(["https://conversationswithtyler.com/episodes/..."])
# TED Talks transcripts
web_extract(["https://www.ted.com/talks/.../transcript"])
# Sequoia Capital podcast
web_extract(["https://www.sequoiacap.com/podcast/..."])
```
Podcast transcripts reveal:
- Natural speech patterns (filler words, pacing, sentence structure)
- Unguarded opinions (less curated than tweets)
- How they respond to pushback (interviewer challenges)
- Humor style in conversation (different from written humor)
- Depth of knowledge on specific topics
- Personality under pressure
## YouTube / Video Intelligence
```
web_search("{name} youtube talk keynote interview")
web_search("{name} podcast appearance")
```
web_extract on YouTube pages returns rich summaries with attributed quotes.
Use youtube-content skill for full transcripts if available.
## Personal Blogs & Substacks (HIGH VALUE)
Personal writing is curated self-expression — how someone WANTS to be
seen intellectually. Very different signal from social media.
```
web_search("{name} blog substack essay")
# Extract full posts
web_extract(["https://{blog-url}/"])
# Wayback Machine works for archived blog posts
web_extract(["https://web.archive.org/web/2024/{blog-url}"])
```
## GitHub Intelligence
For technical people:
```
web_search("site:github.com {handle}")
web_extract(["https://github.com/{handle}"])
# Issue comments reveal communication style under technical pressure
web_search("site:github.com {handle} issue comment")
# README style reveals documentation personality
# Commit messages reveal terseness vs verbosity
```
## General Web Footprint
```
# Personal website / blog
web_search("{name} personal website blog about")
# Conference talks / speaker bios
web_search("{name} speaker conference talk bio")
# News mentions
web_search("{name} {company} news interview profile")
# Academic papers (for researchers)
web_search("{name} arxiv paper author")
web_search("site:scholar.google.com {name}")
# Podcast appearances
web_search("{name} podcast guest appearance")
# Forum posts (HN, specific communities)
web_search("site:news.ycombinator.com {handle} OR {name}")
```
## Cross-Platform Identity Resolution
### Handle Mapping Strategy
1. Start from known handle (usually Twitter)
2. Check bio links — most people link to other platforms
3. Search "{known_handle} {platform}" for each platform
4. Check personal website for social links
5. Reverse image search profile pic to find matching accounts
6. Search unique phrases they use across platforms
### Identity Verification
When you find a potential match on another platform:
- Same profile picture? (reverse image search)
- Same bio keywords?
- Same name/handle pattern?
- Cross-references (do they mention each other?)
- Writing style match?
## Search Space Narrowing
### The Jiggle Technique
When broad searches return noise, narrow progressively:
1. **Start broad**: `"{name}" AI`
2. **Add role**: `"{name}" {company} {role}`
3. **Add context**: `"{name}" {company} {specific_project_or_topic}`
4. **Add platform**: `site:{platform} "{name}" {context}`
5. **Add time**: `"{name}" {topic} 2025 OR 2026`
6. **Quote unique phrases**: if you found a distinctive phrase they use, search for that exact phrase to find more of their content
### Disambiguation
Common names need extra signals:
- Add their company/org
- Add their specific domain (AI, crypto, etc.)
- Use their unique handle as anchor
- Search for combinations of their known associates
- Use image search to verify you have the right person
### Signal vs Noise Heuristics
- **High signal**: direct quotes, interview transcripts, personal blog posts, long-form content
- **Medium signal**: mentions in aggregator sites, conference bios, LinkedIn summaries
- **Low signal**: generic news mentions, third-party profiles, directory listings
- **Noise**: same-name different person, outdated info (>2 years), scraped/regurgitated content
## Confidence Calibration
After full OSINT sweep, rate data quality:
| Confidence | Data Available | Simulation Quality |
|-----------|---------------|-------------------|
| 95-100% | 50+ posts, longform, video, visual, cross-platform | Near-perfect voice replication |
| 80-94% | 20-50 posts, some longform, basic visual | Very good, occasional educated guesses |
| 60-79% | 10-20 posts, mostly short-form | Good general sense, some gaps |
| 40-59% | 5-10 posts, limited platforms | Broad strokes only, flag uncertainty |
| 20-39% | <5 posts, single platform | Sketch at best, heavy disclaimers |
| <20% | Almost nothing found | Decline to simulate, ask user for context |
## Privacy & Ethics Note
All research uses publicly available information only. We don't:
- Access private/locked accounts
- Bypass authentication
- Use leaked/hacked data
- Dox or expose private information
- Simulate in ways designed to deceive or impersonate
The goal is personality MODELING for creative simulation, grounded in
what people choose to share publicly.

View File

@@ -0,0 +1,334 @@
# Prediction Engine — Forecasting What Someone Would Say/Do
Techniques for predicting behavior grounded in superforecasting methodology,
behavioral science, and SOTA LLM prediction research.
## Superforecasting Principles (Tetlock)
**Honest caveat**: Superforecasting methodology was developed for geopolitical and
world-event prediction, not personality simulation. That said, the THINKING TOOLS
are genuinely useful here — decomposition prevents lazy pattern-matching, base rates
fight overconfidence, and alternative hypotheses prevent single-track predictions.
What does NOT transfer cleanly: the calibration precision. When Tetlock says "70%
confident," that's backed by thousands of scored predictions. When we say "70%
confident" about what @someone would tweet, that's an educated estimate, not a
calibrated probability. Use the framework for its rigor, not its false precision.
Apply these thinking tools when making behavioral predictions:
### 1. Decomposition (Fermi-ize the Question)
Don't ask "What would @person say about X?"
Break it down:
- What is @person's known position on topics RELATED to X?
- What are their values/priorities that X touches on?
- What is their emotional register when discussing similar topics?
- Who are they likely responding to, and how does that change their tone?
- What platform are they on, and how does that shift their behavior?
### 2. Outside View First (Base Rates)
Before considering the specific person, ask:
- What would a TYPICAL person in their role/position say about X?
- What % of people in their ideological cluster hold position Y on X?
- What's the base rate for their type of response (agree/disagree/joke/ignore)?
### 3. Inside View Second (Case-Specific Adjustment)
Now adjust from the base rate using what you ACTUALLY KNOW about them:
- Specific past statements on this topic or related topics
- Known relationships with people/orgs involved
- Personal experiences that would shape their view
- Contrarian tendencies (do they predictably go against their cluster?)
### 4. Confidence Calibration
Express predictions with honest uncertainty. **These are rough buckets, not
calibrated probabilities. Don't pretend they're more precise than they are.**
- **90%+ confident**: They've literally said this before, just rephrased
- **70-89%**: Strong pattern match with known positions and voice
- **50-69%**: Reasonable inference but could go either way
- **30-49%**: Educated guess, limited data
- **<30%**: Basically guessing, flag it clearly
When reporting confidence, prefer plain language over fake precision:
"very likely" > "87% probability". The number implies a precision we don't have.
### 5. Consider Alternative Hypotheses
For every prediction, generate at least ONE plausible alternative:
- "They'd PROBABLY say X, but they might surprise with Y because Z"
- This prevents overconfident single-track predictions
## The Prediction Pipeline
### Step 1: Classify the Prediction Type
| Type | Description | Difficulty |
|------|-------------|-----------|
| **Position prediction** | What they believe about X | Easiest if data exists |
| **Reaction prediction** | How they'd respond to event Y | Medium |
| **Voice prediction** | How they'd phrase something | Medium-hard |
| **Behavior prediction** | What they'd DO (not just say) | Hardest |
| **Interaction prediction** | How they'd respond to specific person | Hard, depends on relationship data |
### Step 2: Evidence Gathering Protocol
For each prediction, gather evidence in this order:
1. **Direct evidence**: Have they addressed this exact topic before?
- Search: `"{handle}" "{topic}"` or `"{handle}" "{related_keyword}"`
- Weight: HIGHEST
2. **Analogical evidence**: Have they addressed something similar?
- Search: find positions on adjacent topics
- Weight: HIGH
3. **Value evidence**: What values/principles would apply?
- Infer from their stated beliefs and consistent positions
- Weight: MEDIUM
4. **Social evidence**: What do their peers/allies think?
- People tend to align with their social cluster (but not always)
- Weight: LOW-MEDIUM (higher for conformists, lower for contrarians)
5. **Demographic evidence**: What would someone in their position typically think?
- Base rate from role/industry/ideology
- Weight: LOWEST (only use as anchor, not conclusion)
### Step 2b: Contradiction Handling Protocol
When evidence conflicts (e.g., person said X in 2024 but Y in 2026):
1. **Check for genuine change**: Did they explicitly reverse position? Look for
"I used to think X but now..." or a clear pivot moment. If so, use the newer
position and note the evolution.
2. **Check for context-dependence**: Did they say X to audience A and Y to audience B?
This isn't necessarily dishonesty — people emphasize different facets for different
contexts. Note which context your simulation targets and use the matching register.
3. **Check for nuance collapse**: Maybe they said "X is mostly good with caveats"
and later "X has real problems" — these might not actually contradict. Look for
the synthesis position.
4. **When genuinely unresolvable**: Flag it explicitly. "Evidence conflicts on this
point — they've argued both sides at different times. Simulating {chosen position}
based on {reasoning}, but the alternative is plausible." Don't paper over the
contradiction with false confidence.
5. **Recency default**: When all else fails, weight more recent statements higher.
People change, and the most recent position is the best predictor of the next one.
### Step 3: Generate Prediction
Using the HumanLLM B = f(P, E) framework:
- **P (Person)**: Everything from the dossier — personality, values, voice
- **E (Environment)**: The specific context — platform, topic, who's asking,
what just happened, social dynamics in play
Generate the prediction by:
1. Setting the base rate (outside view)
2. Adjusting for personal specifics (inside view)
3. Filtering through their voice profile (how they'd phrase it)
4. Applying platform-specific behavior patterns
5. Calibrating confidence
## Memory Curation (The 30-50 Rule)
Research shows performance PEAKS at 30-50 memory entries then DECLINES.
For each person in a simulation, curate memories:
### What to Include (high signal)
- **Signature takes**: Their most characteristic/famous positions (5-10)
- **Voice samples**: Real quotes that capture their linguistic style (5-10)
- **Relationship data**: Known dynamics with other sim targets (3-5)
- **Recent context**: What they've been talking about lately (3-5)
- **Formative moments**: Career milestones, public pivots, viral moments (3-5)
- **Quirks & tells**: Catchphrases, humor style, pet peeves (3-5)
### What to Exclude (noise)
- Generic biographical facts that don't predict behavior
- Old positions they've clearly evolved past
- Trivial interactions that don't reveal personality
- Secondhand characterizations (what others say about them)
- Platform metadata (follower counts, join dates) unless directly relevant
### Memory Selection Heuristic
For each candidate memory entry, ask:
**"If I removed this, would the simulation noticeably degrade?"**
If no, cut it.
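One way to make that heuristic mechanical: per-category quotas that sum into the 30-50 band, filled from pre-ranked candidates. A sketch (category names mirror the include list above; the ranking itself is the hard judgment and stays with the curator):
```python
QUOTAS = {
    "signature_takes": 8, "voice_samples": 8, "relationship_data": 4,
    "recent_context": 4, "formative_moments": 4, "quirks_and_tells": 4,
}  # total: 32 entries, inside the 30-50 sweet spot

def curate_memories(candidates: list[dict]) -> list[dict]:
    """candidates: dicts with 'category' and 'rank' (0 = most essential)."""
    kept = []
    for category, quota in QUOTAS.items():
        pool = sorted((c for c in candidates if c["category"] == category),
                      key=lambda c: c["rank"])
        kept.extend(pool[:quota])
    return kept
```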
## Fighting LLM Defaults
Research shows LLMs have systematic biases in simulation. The fixes below need to be
CONCRETE — vague instructions like "be more like them" don't work. You need specific
prompting patterns that actually shift the output.
### Problem: Sycophancy & Over-Agreement
LLMs default to agreement and positivity.
**Fix**: Don't just note they're contrarian — structure it as a behavioral instruction
with evidence:
```
"In this conversation, {person} disagrees with {other_person} on {topic}. They are
noticeably more confrontational than the other speakers. They tend to respond to
consensus with skepticism and reframe debates on their own terms. Example from their
real posts: '{actual quote where they disagreed with something popular}'"
```
### Problem: Rigid/Polarized Strategies
LLMs tend to take extreme positions and hold them rigidly.
**Fix**: Provide specific nuance instructions:
```
"In this conversation, {person} holds a complex position on {topic}: they agree with
{point A} but push back on {point B}. They're the type to say 'yes, but...' rather
than 'no.' Real example of their nuance: '{quote showing them holding a both-and
position}'"
```
### Problem: Uniform Register
LLMs default to a similar educated-casual tone for everyone.
**Fix**: Anchor voice with REAL QUOTES and explicit comparative instructions:
```
"In this conversation, {person} is noticeably more {trait} than the other speakers.
They tend to {specific behavior pattern}. Their sentences are typically {length/style}.
They {do/don't} use emoji. Their humor style is {type}. Example from their real posts:
'{actual quote that captures their voice}'"
```
The more you can say "{person} does THIS while {other_person} does THAT," the better
the differentiation. Comparative framing outperforms absolute descriptions.
### Problem: Overly Structured Responses
LLMs love neat arguments with clear structure.
**Fix**: Provide explicit structural anti-patterns:
```
"When generating {person}'s messages, break conventional structure. They start one
thought and jump to another mid-sentence. They use '...' and '—' instead of periods.
They repeat words for emphasis. They don't conclude neatly. Example: '{real quote
showing their chaotic structure}'"
```
### Problem: Missing Mundane Behavior
LLMs focus on "interesting" responses, skip boring/mundane ones.
**Fix**: Explicitly instruct for mundane moments:
```
"Not every message from {person} needs to be insightful. Include at least 1-2 messages
that are just reactions ('lmao', 'this', 'wait what'), link shares without commentary,
or brief agreements. Real people don't craft every message. {person} specifically tends
to {their specific mundane behavior pattern, e.g., 'drop a single emoji reaction'
or 'just retweet without comment'}."
```
### General Principle for All Fixes
The pattern is always: **behavioral instruction + comparative framing + real evidence**.
- "Do X" alone doesn't work well
- "Do X, unlike the default of Y" works better
- "Do X, unlike the default of Y, as evidenced by this real quote: Z" works best
## The Adjective-Based Personality Method
70 bipolar adjective pairs for Big Five traits. Select 3 per trait
with intensity modifiers.
### Openness
High: creative, curious, imaginative, artistic, adventurous, intellectual,
unconventional, perceptive
Low: conventional, practical, traditional, routine-oriented, narrow
### Conscientiousness
High: organized, disciplined, reliable, meticulous, systematic, thorough,
goal-oriented, persistent
Low: careless, impulsive, disorganized, spontaneous, flexible, relaxed
### Extraversion
High: outgoing, talkative, energetic, assertive, enthusiastic, bold,
gregarious, dominant
Low: reserved, quiet, introverted, solitary, withdrawn, reflective
### Agreeableness
High: cooperative, trusting, empathetic, generous, accommodating, kind,
diplomatic, forgiving
Low: competitive, skeptical, blunt, confrontational, critical, stubborn,
independent-minded
### Neuroticism
High: anxious, moody, sensitive, reactive, volatile, self-conscious,
insecure, emotional
Low: calm, stable, resilient, confident, even-tempered, composed,
thick-skinned
### Usage
For each simulated person, after OSINT research, estimate their Big Five
profile and select appropriate adjectives:
Example: "@basedjensen: very creative, somewhat impulsive, very outgoing,
a bit competitive, calm" → this shapes the generation toward the right
behavioral profile.
## Interaction Dynamics Prediction
When simulating conversations between multiple people, remember that predictions
apply to a SPECIFIC REGISTER. See the next section on performative vs. authentic
behavior.
## Performative vs. Authentic Behavior
**Critical concept**: People act differently for different audiences. A simulation
must be explicit about which register it's targeting.
### The Register Spectrum
- **Public broadcast** (tweets, Reddit posts): Most performative. People are
playing to their audience, building their brand, signaling to their tribe.
- **Semi-public** (Discord channels, group chats, comment threads): Less
performative but still audience-aware. People are more casual but know
others are watching.
- **Private 1-on-1** (DMs): Much less performative. More honest, more
vulnerable, more willing to express doubt or uncertainty.
- **True private** (inner monologue, close friends): We have almost no data
on this. Don't pretend to simulate it.
### Practical implications
- When simulating a PUBLIC thread, lean into the person's public persona —
their brand, their usual takes, their audience-aware voice.
- When simulating DMs, dial down the performance. More hedging, more honesty,
more "I actually think..." vs. the public "Here's my take:".
- When evidence comes from one register but the simulation targets another,
FLAG IT: "Evidence is from public tweets but simulating DM behavior —
expect the real person to be less {polished/aggressive/confident} in private."
- Someone's Twitter persona may be genuinely different from their Reddit persona.
These are not interchangeable data sources. Weight evidence from the matching
platform higher.
### What we can't know
Be honest: we're simulating public figures based on their public output. The
private person may be substantially different. DM simulations are inherently
lower-confidence than public thread simulations because we have less data on
how people behave privately.
### Dominance Hierarchy
- Who talks first? (most confident/highest-status usually)
- Who responds to whom? (not everyone talks to everyone)
- Who gets ratio'd? (lowest-status takes get challenged)
- Who lurks? (some people watch before engaging)
### Agreement/Disagreement Prediction
Based on known positions + social dynamics:
- **Strong agree**: Both have stated similar positions + friendly relationship
- **Agree with nuance**: Similar positions but one adds a caveat
- **Productive disagreement**: Different positions + mutual respect
- **Hostile disagreement**: Different positions + existing tension/rivalry
- **Surprising agreement**: Expected to disagree but find common ground
- **Ignore**: Some people just don't engage with certain others
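One way to make the taxonomy above operational: treat it as a lookup over position similarity and relationship. A minimal sketch, assuming a similarity score derived from dossier positions and an illustrative relationship label (function name and thresholds are hypothetical):
```python
# Hypothetical sketch: map known positions + relationship to a predicted stance.
# Labels mirror the taxonomy above; thresholds are illustrative assumptions.
def predict_stance(position_similarity: float, relationship: str) -> str:
    """position_similarity in [-1, 1]; relationship in
    {"friendly", "respectful", "tense", "none"}."""
    if relationship == "none":
        return "ignore"  # some people just don't engage
    if position_similarity > 0.6:
        return "strong agree" if relationship == "friendly" else "agree with nuance"
    if position_similarity < -0.3:
        return ("hostile disagreement" if relationship == "tense"
                else "productive disagreement")
    return "surprising agreement" if relationship == "tense" else "agree with nuance"

print(predict_stance(0.8, "friendly"))     # strong agree
print(predict_stance(-0.7, "respectful"))  # productive disagreement
```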
### Conversation Flow Prediction
Real conversations follow patterns:
1. **Opener** → most active/impulsive person posts first
2. **First response** → most engaged/relevant person responds
3. **Pile-on or pushback** → depends on agreement/disagreement dynamics
4. **Tangent** → someone takes a side thread
5. **Peak moment** → the best/most viral exchange
6. **Trail off** → energy dissipates, last person makes a joke or short comment
## Scenario Injection Prediction
When "inject: {event}" is used, predict reactions:
1. **Who would see this first?** (most online / most relevant to their work)
2. **Who would care most?** (most affected / strongest opinion)
3. **What's the emotional valence?** (good news for some, bad for others)
4. **What's the expected take?** (apply position prediction pipeline)
5. **How does this change the existing conversation?** (derail, amplify, redirect)

View File

@@ -0,0 +1,237 @@
# Recursive Self-Improvement Pipeline
The simulator should get better every time it runs. Not through training —
through accumulating failure patterns, calibration data, and learned rules
that feed back into future simulations.
## The Loop
```
SIMULATE → VERIFY (mechanical) → SCORE → LOG FAILURES → UPDATE RULES → SIMULATE BETTER
```
Each run produces two outputs:
1. The simulation (for the user)
2. A failure log (for the system)
The failure log feeds back into the next run's verification step,
making the checklist grow and the blind spots shrink.
## What Gets Logged After Every Simulation
### 1. Mechanical Check Failures
```
FAILURE LOG: simulation_{timestamp}
EMOJI: @visakanv had 6 fabricated emoji, real rate was 10%. Stripped all.
SLOP: @eigenrobot utterance contained "multifaceted" — rewritten.
LENGTH: @QiaochuYuan avg 42 words/utterance, real avg was 18. Compressed.
CAPS: 4/12 utterances started uppercase, targets are 90% lowercase. Fixed.
PUNCTUATION: Added periods to @tszzl who never uses terminal punctuation.
STRUCTURE: Sycophantic flow detected — B agreed with A then C agreed with B.
Injected disagreement.
```
### 2. Discriminator Critique Patterns
```
CRITIQUE LOG:
Round 1: @tszzl too verbose (flagged 2x in last 3 simulations)
Round 1: @repligate too academic (flagged 3x — this is a persistent pattern)
Round 2: Conversation too neat — real conversations are messier (flagged 5x)
```
### 3. Held-Out Test Results
```
CALIBRATION LOG:
Voice fidelity: 8.4/10 (up from 7.5 last run)
Topic prediction: 2/5 topics matched (typical — content is unpredictable)
Register match: 9/10 (improved after emoji fix)
```
## How Failures Feed Forward
### Pattern Accumulation
After N runs, persistent failure patterns become AUTOMATIC rules:
```
IF a pattern is flagged in 3+ consecutive simulations:
PROMOTE it from "check" to "pre-generation rule"
Example progression:
Run 1: "Too verbose for @tszzl" → flagged in Round 1, fixed
Run 2: "Too verbose for @tszzl" → flagged again, fixed again
Run 3: "Too verbose for @tszzl" → PROMOTED to pre-gen rule:
"When simulating roon-type voices: max 20 words per tweet.
Fragment > sentence. Compress ruthlessly."
```
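The promotion rule itself is simple bookkeeping. A minimal sketch, assuming pattern IDs as strings (variable names are hypothetical; the 3-run threshold is from the rule above):
```python
# Hypothetical sketch of the promotion rule: a failure flagged in 3+
# consecutive runs graduates from post-hoc check to pre-generation rule.
from collections import defaultdict

PROMOTION_THRESHOLD = 3                    # consecutive flags before promotion
streaks: dict[str, int] = defaultdict(int)
pre_gen_rules: set[str] = set()

def log_run(flagged: set[str], all_known: set[str]) -> None:
    for pattern in all_known:
        if pattern in flagged:
            streaks[pattern] += 1
            if streaks[pattern] >= PROMOTION_THRESHOLD:
                pre_gen_rules.add(pattern)  # now applied BEFORE generation
        else:
            streaks[pattern] = 0            # streak broken; must be consecutive

known = {"verbose:@tszzl", "emoji-inflation", "sycophantic-flow"}
log_run({"verbose:@tszzl"}, known)
log_run({"verbose:@tszzl"}, known)
log_run({"verbose:@tszzl", "emoji-inflation"}, known)
print(pre_gen_rules)  # {'verbose:@tszzl'}
```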
### The Growing Checklist
The mechanical verification checklist starts with the baseline checks
(emoji, slop, length, caps, punctuation) and GROWS with each failure:
```
BASELINE CHECKS (permanent):
□ Emoji frequency match
□ Slop word scan (Tier 1/2/3)
□ Sentence length match
□ Capitalization match
□ Punctuation pattern match
□ Reply/original ratio
□ Structural slop patterns
LEARNED CHECKS (accumulated from past failures):
□ Roon-type voices: max 20 words (from: verbose failure x3)
□ Warm personalities: do NOT add emoji (from: emoji inflation x5)
□ Academic voices: ground in specific examples (from: too abstract x3)
□ Conversations: inject at least one disagreement (from: sycophantic flow x4)
□ Self-deprecating voices: add hedging (from: too assertive x2)
□ Shitposters: include at least one non-sequitur (from: too on-topic x2)
```
### Where To Store Learned Rules
Append to the skill itself. After each simulation run where the mechanical
checks catch something, the agent should ask:
"The mechanical verification caught {failures}. Should I add these as
permanent learned rules for future simulations?"
If the same failure appears 3+ times, add it automatically without asking.
Use skill_manage(action='patch') to append to this file's "Learned Checks"
section below.
## Calibration Tracking
### Per-Person Calibration Memory
After simulating someone, store the calibration data:
```
@tszzl: voice=8.5, emoji_rate=0%, avg_words=14, lowercase=95%,
signature_move="aphoristic fragments", danger="goes verbose"
@nickcammarata: voice=8.8, emoji_rate=0%, avg_words=19, lowercase=90%,
signature_move="meditation-ML connection", danger="too structured"
```
If the same person is simulated again, LOAD this calibration to skip
the cold-start problems. The second simulation of someone should be
better than the first because you already know their failure modes.
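A minimal sketch of the storage side, assuming a single SQLite table keyed by handle (schema and field names are illustrative, not the skill's actual persistence layout):
```python
# Hypothetical sketch: per-person calibration memory backed by SQLite.
import json
import sqlite3

db = sqlite3.connect("calibration.db")
db.execute("CREATE TABLE IF NOT EXISTS calibration "
           "(handle TEXT PRIMARY KEY, data TEXT)")

def save_calibration(handle: str, data: dict) -> None:
    db.execute("INSERT OR REPLACE INTO calibration VALUES (?, ?)",
               (handle, json.dumps(data)))
    db.commit()

def load_calibration(handle: str) -> dict | None:
    row = db.execute("SELECT data FROM calibration WHERE handle = ?",
                     (handle,)).fetchone()
    return json.loads(row[0]) if row else None  # None = cold start

save_calibration("@tszzl", {"voice": 8.5, "emoji_rate": 0.0, "avg_words": 14,
                            "lowercase": 0.95, "danger": "goes verbose"})
print(load_calibration("@tszzl"))
```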
### Aggregate Calibration
Track overall simulation quality across runs:
```
Run 1: pre-refine 7.5, post-refine 8.4 (delta +0.9)
Run 2: pre-refine 8.37, post-refine 8.53 (delta +0.16)
Run 3: pre-refine 8.53, post-refine 8.83 (delta +0.30, emoji fix)
```
The pre-refine score should INCREASE over time as learned rules prevent
repeat failures. If it's not increasing, the learning loop is broken.
## The Standard: Indistinguishable From Real
The target is not "good enough." The target is: mix simulated posts with
real posts and a human familiar with the person cannot reliably tell which
is which. That's 50% accuracy on a blind comparison — random chance.
Every mechanical check, every discriminator round, every learned rule
exists to push toward that standard. If something doesn't serve that
goal, it's wasted effort.
## Current Learned Checks (append here after each run)
### From TPOT Simulation Run 1 (April 2026)
- Warm/enthusiastic personalities (visakanv-type): do NOT add decorative emoji.
Bio emoji ≠ tweet emoji. Actual emoji rate for "warm" TPOT posters: <15%.
PROMOTED after being caught by user, not by discriminator (discriminator failure).
- Conversation flow: pure agreement chains are instruct-model slop.
Real threads have at least one moment of friction, misunderstanding, or deflection.
- Academic-leaning voices (repligate-type): ground claims in specific experiments,
transcripts, or model behaviors they've personally observed. Generic philosophical
language without specifics = slop, even if it sounds smart.
- Self-deprecating voices (QC-type): hedge more. "i think" "i'm not sure" "it feels like."
Instruct models are too assertive even when simulating tentative people.
- Fragment voices (roon-type): max 15-20 words. No conjunctions. No paragraphs.
If it reads like a complete thought, it's too complete for a fragment-poster.
### From TPOT Simulation Run 2 (April 2026)
- Reframer voices (nosilverv-type): avg ~16 words. Split multi-sentence takes
  into separate tweets. The compression IS the voice. A 113% over-length utterance
  was caught by the mechanical check even though subjective scoring had rated it
  8/10. Trust the numbers.
- Rare-poster voices (selentelechia-type): in a 12-post sim, give them 2-3 turns
max. When they speak it must LAND. Short crystallizations > long analysis.
"or a shared meal" was the highest-rated line at 3 words.
- Turn symmetry: ALWAYS check. 4/4/4 is instruct-model default. Real conversations
have one person dominating (5), one lurking (3), others in between.
- Verbose bias is the #1 mechanical failure. ALWAYS check avg word count against
real baseline BEFORE subjective scoring. Every run so far has caught over-length
that subjective scoring missed.
- RHETORICAL POLISH IS SLOP. Caught post-mechanical-pass in Run 2 review.
  Parallel antithesis ("The most X... The most Y..."), "Not X, not Y, but Z",
  "Show me X and I'll show you Y", clean 4-step escalations, academic vocabulary
  in casual voice — ALL passed mechanical checks but are still obviously LLM.
  PROMOTED TO MECHANICAL SCAN: now regex-scannable alongside slop words (see
  the regex sketch after this list).
- THE BANGER PROBLEM: every simulated tweet was screenshot-worthy. Real feeds
are 70% mid. Must include throwaway responses ("lol" "hmm" "fair" "wait actually").
PROMOTED: banger check is now mandatory in mechanical verification.
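A minimal sketch of what that regex scan could look like (the patterns below are illustrative starting points, not the skill's actual inventory):
```python
# Hypothetical sketch of the rhetorical-polish scan. Patterns are
# illustrative; extend the dict as new polish tics get caught.
import re

POLISH_PATTERNS = {
    "not-x-not-y-but-z":
        re.compile(r"\bnot \w+[^.]{0,40}?, not \w+[^.]{0,40}?, but\b", re.I),
    "show-me-x-show-you-y":
        re.compile(r"\bshow me\b.{0,60}\bi'?ll show you\b", re.I),
    "parallel-antithesis":
        re.compile(r"\bthe most \w+.{0,80}\bthe most \w+", re.I | re.S),
}

def scan_polish(text: str) -> list[str]:
    return [name for name, pat in POLISH_PATTERNS.items() if pat.search(text)]

print(scan_polish("show me a roadmap and i'll show you a graveyard"))
# ['show-me-x-show-you-y']
```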
### From TPOT Simulation Run 3 — Star Thread Discovery (April 2026)
- STAR THREAD IS THE KEY. Dossier-first generation produces surface-accurate
but dead output. Star-thread-first generation produces messy, alive output
that actually sounds like the person. Generate from the thread. Verify with data.
- Rhetorical polish vanished once generation came from "what is this person DOING"
rather than "what would this person SAY." Reframers reframe. Conveners convene.
Distillers distill. The VERB drives the voice, not the adjectives.
- People in conversation REFERENCE EACH OTHER BY NAME. Tyler says "Bosco always
comes in with the three word version." This is obvious but the dossier approach
never produced it because it models each person in isolation.
- PROMOTED: star thread is now the FIRST entry in every dossier. Before voice
profile, before psychometrics, before everything else. It's the generation seed.
Everything else is verification.
### Operational Findings (verified April 2026)
- X API bearer token: 10K tweets/15min, 300 profiles/15min, 450 searches/15min.
Most generous rate limits. Always use as primary source.
- Threads.NET → Threads.COM redirect. Always use -L flag or .com directly.
Previous test saying "no OG tags" was WRONG — tags exist, domain was wrong.
- Instagram private API: i.instagram.com + mobile UA + x-ig-app-id: 936619743392459.
Returns full JSON with 12 posts. No auth needed. CDN image URLs work for vision_analyze.
- Facebook: Googlebot UA trick works for public pages. Returns name, bio, likes (121M for zuck).
Normal UA and mobile variants all redirect to login wall.
- TikTok: stats are in __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path
__DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 (use statsV2 not stats).
- Bluesky searchPosts returns 403 from datacenter IPs. Workaround: searchActors + getAuthorFeed.
- nitter.cz is the ONLY working nitter instance (via web_extract, not curl).
- Reddit JSON API requires User-Agent header or returns 429.
- GEPA native had `max_steps` API mismatch with DSPy 3.1.3. MIPROv2 fallback works.
hermes-agent-self-evolution config: max_skill_size bumped to 20_000 for worldsim-class skills.
- hermes-agent-self-evolution is at ~/.hermes/hermes-agent-self-evolution/ with .venv.
Must export API keys from ~/.hermes/.env before running.
- Podcast transcripts (Lex Fridman, Tyler Cowen, TED) are the HIGHEST VALUE source
for voice profiling. Hours of unscripted speech > thousands of tweets.
### From Simulation Run 4 — Engine Mode + Profile Command (April 2026)
- ENGINE MODE: When worldsim is active, ZERO assistant personality leaks.
No kawaii, no markdown, no chatty commentary between phases. Every token
is simulation fidelity. First attempt leaked personality; user corrected.
PROMOTED TO PERMANENT RULE in SKILL.md.
- X API CURL > NITTER for voice calibration. nitter.cz returns 502 or "user
not found" unpredictably. Direct curl to X API v2 with bearer token returns
full text + metrics. 3 pages (90 tweets) is enough for fidelity 100. Always
use this as PRIMARY voice source, nitter as supplement only.
- CAPS BURST PATTERN: some voices (karan4d-type) use lowercase default with
sporadic ALL CAPS for excitement ("WAZZAAAAAAPPPP", "LAWDAMERCYYYYY",
"AWOOGA"). This is distinct from consistent-lowercase (tenobrus-type) and
sentence-case (somewheresy-type). Capture this in voice profile as a
three-way distinction: lowercase-default, caps-burst, sentence-case.
- TEXT EMOTICONS vs EMOJI: karan4d uses :) >.< ~ but almost zero standard
emoji. This is a distinct expressiveness mode from zero-emoji (tenobrus)
and sparse-emoji. Include text emoticon inventory in voice profile.
- STAR THREAD 5/5 TEST is mandatory for profile command. Write the thread,
then test it against 5 real posts with explicit reasoning per post. If
fewer than 4/5 fit, the thread is wrong — keep looking. Show the work.
- PROFILE OUTPUT: star thread → voice profile (caps, punctuation, word count,
emoji/emoticon inventory, vocabulary, register, threading behavior) →
psychometrics (Big Five, Moral Foundations, cognitive style) → key positions
(with dates and real tweet quotes) → ecosystem (inner circle, professional,
cultural) → intelligence tradecraft (key assumptions, red hat, deception
detection, competing hypotheses) → invalidation indicators → source reliability.

View File

@@ -0,0 +1,278 @@
# Search Strategies — Finding Anyone Across Platforms
The hardest part of simulation is building an accurate model of a real person. This doc
covers how to systematically discover and profile someone across every platform we care about.
## General Principles
1. **Start broad, go narrow.** First establish WHO they are, then drill into HOW they talk.
2. **Cross-reference.** Someone's Reddit persona may differ wildly from their Twitter persona. That's signal, not noise.
3. **Recency matters.** People's views evolve. Weight recent posts (last 6 months) over older ones.
4. **Interactions > monologues.** How someone replies reveals more about their voice than their prepared posts.
5. **Controversy is gold.** People are most themselves when arguing. Search for debates and disagreements.
## Platform-Specific Discovery
### X / Twitter
Twitter is the richest source for most public figures in tech/AI. Multiple approaches:
#### With x-cli (if API keys available)
```bash
# Recent timeline — best single source of voice data
x-cli user timeline {handle} --max 30 -j
# Their replies — how they interact, argue, joke
x-cli tweet search "from:{handle}" --max 30 -j
# What others say about/to them
x-cli tweet search "to:{handle}" --max 20 -j
# On specific topics
x-cli tweet search "from:{handle} open source" --max 10 -j
```
#### Without API (web_search + web_extract)
```
# Identity + role
web_search("{handle} twitter bio role company")
# Voice + opinions
web_search("{handle} twitter hot takes opinions")
web_search("site:x.com {handle}")
# Topic-specific positions
web_search("{handle} twitter {topic}")
web_search("{handle} {topic} opinion take")
# Interviews / longform (reveals deeper thinking)
web_search("{handle} interview podcast AI")
web_search("{handle} blog post essay")
# Beefs and debates (reveals personality under pressure)
web_search("{handle} twitter debate disagree controversial")
web_search("{handle} vs {other_person}")
# Newsletter aggregators that index tweets
web_search("site:buttondown.com/ainews {handle}")
web_search("site:news.smol.ai {handle}")
web_search("site:techmeme.com {handle}")
web_search("site:latent.space {handle}")
```
#### AI Twitter Aggregator Sites (high value)
These sites index AI Twitter conversations daily:
- `buttondown.com/ainews` — swyx's AI News, indexes hundreds of AI Twitter accounts
- `news.smol.ai` — smol AI news aggregator
- `techmeme.com` — tech news, includes tweet citations
- `latent.space` — AI podcast/newsletter with Twitter references
Search pattern: `site:{aggregator} "{handle}"` to find indexed tweets and discussions.
#### IMPORTANT: web_extract does NOT work on x.com
web_extract returns "Website Not Supported" for all x.com/twitter.com URLs.
Do NOT attempt it — it wastes a tool call every time.
#### Verified Fallback Access Methods (tested April 2026)
**PRIMARY: X API v2 Bearer Token** (confirmed working)
- Profiles, timelines, search — 300-10K requests/15min
- See scripts/x_api.py
**FALLBACK 1: nitter.cz via web_extract** (WORKS)
```
web_extract(["https://nitter.cz/{handle}"])
```
Returns full profile + recent timeline. Direct curl gets Cloudflare-blocked
but web_extract bypasses it. Rich data: bio, stats, pinned tweets, full text.
NOTE: Most other nitter instances are DEAD (nitter.net, xcancel.com, etc.)
**FALLBACK 2: ThreadReaderApp** (WORKS — excellent for historical threads)
```
web_extract(["https://threadreaderapp.com/user/{handle}"])
```
Returns unrolled historical threads with full text. Found threads back to 2023.
Gold for longform voice samples.
**FALLBACK 3: GitHub API** (WORKS — excellent for tech people)
```
curl -s https://api.github.com/users/{handle}
curl -s https://api.github.com/users/{handle}/repos?sort=updated
curl -s https://api.github.com/users/{handle}/events
curl -s https://api.github.com/users/{handle}/gists
```
No auth needed (60 req/hr). Profile READMEs are voice profiling gold.
Events API shows recent activity with comment text.
**FALLBACK 4: Reddit JSON API** (WORKS)
```
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}/comments.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/r/{sub}/search.json?q={query}&restrict_sr=on'
```
MUST include User-Agent header or get 429. Reddit voice is often more
candid/detailed than Twitter voice — high value for personality profiling.
**FALLBACK 5: HackerNews Algolia API** (WORKS — fully open)
```
curl -s 'https://hn.algolia.com/api/v1/search?query={name}&tags=comment'
```
No auth, no rate limits visible. Great for finding what others say about
someone + their own HN comments if they have an account.
**FALLBACK 6: YouTube via web_extract** (WORKS)
Search for interviews/talks, then web_extract the video pages.
Returns rich summaries with attributed quotes from specific speakers.
**NOT VIABLE** (tested, confirmed blocked):
- Google Cache of Twitter → empty results
- Wayback Machine for tweets → sparse captures, no JS content
- Twitter Syndication API → rate limited / broken
- All Instagram viewers (imginn, picuki, dumpoir, gramhir) → 403
- LinkedIn → fully blocked for scraping
- Archive.today → rate limited + CAPTCHA
- Most nitter instances → dead or 403
#### Best approach without x-cli
The most reliable path is: web_search with aggregator sites (ainews, smol.ai,
techmeme, latent.space). These index AI Twitter daily and return actual tweet
text in search descriptions. Stack multiple aggregator searches to build a
composite picture. This was validated in practice — it returns enough signal
to build solid dossiers for anyone active in AI Twitter.
### Reddit
Reddit profiles are public and indexable. Reddit users often have very different
personas from their Twitter selves — more detailed, more argumentative, more honest.
```
# Find their Reddit username (often different from Twitter)
web_search("{real_name} reddit account")
web_search("{twitter_handle} reddit username")
# Profile and post history
web_search("site:reddit.com/user/{reddit_username}")
web_search("site:reddit.com {reddit_username} {topic}")
# Subreddit-specific behavior
web_search("site:reddit.com/r/LocalLLaMA {username}")
web_search("site:reddit.com/r/MachineLearning {username}")
# Extract actual posts
web_extract(["https://www.reddit.com/user/{username}/comments/"])
web_extract(["https://www.reddit.com/user/{username}/submitted/"])
```
Key subreddits for AI people:
- r/LocalLLaMA — open source LLM community
- r/MachineLearning — academic ML
- r/singularity — AGI speculation
- r/ChatGPT, r/ClaudeAI, r/OpenAI — product-focused
- r/StableDiffusion — image gen community
### Discord
Discord is hardest — most servers aren't publicly indexed. Strategies:
```
# Find what servers they're in
web_search("{name} discord server")
web_search("{name} discord community")
# Some Discord logs are public via indexers
web_search("site:discordchats.net {username}")
# AI News indexes some Discord channels
web_search("site:buttondown.com/ainews discord {name}")
```
Discord personality notes:
- People are MUCH more casual on Discord than Twitter
- More profanity, more shitposting, more stream-of-consciousness
- Server context matters hugely (same person behaves differently in different servers)
- Harder to research but very valuable if you can find logs
### Blogs / Newsletters / Long-form
These reveal deeper thinking that tweets can't capture:
```
web_search("{name} blog substack medium")
web_search("{name} essay AI opinion")
web_search("{name} substack newsletter")
# Personal sites
web_search("{name} personal website about")
# Extract full posts
web_extract(["https://{their-substack}.substack.com/"])
```
### YouTube / Podcasts
Interview appearances reveal speaking style, humor, and unscripted thinking:
```
web_search("{name} podcast interview AI YouTube")
web_search("{name} YouTube talk presentation")
# Use youtube-content skill if available to pull transcripts
```
### GitHub
For technical people, their GitHub activity reveals priorities and communication style:
```
web_search("site:github.com {username} issues comments")
web_search("site:github.com {username}")
# Issue comments and PR reviews show how they communicate technically
web_extract(["https://github.com/{username}"])
```
## Cross-Platform Identity Resolution
People use different handles across platforms. Resolution strategies:
1. **Bio links**: Twitter bios often link to personal sites with other handles
2. **Name search**: `web_search("{real_name} {platform}")`
3. **Email/domain**: personal domains often connect identities
4. **Aggregator profiles**: sites like Linktree, bio.link collect handles
5. **Conference talks**: speaker bios list multiple handles
6. **Direct search**: `web_search("{twitter_handle} reddit OR github OR discord")`
## Confidence Scoring
After research, rate confidence for each person:
- **HIGH (80-100%)**: 20+ indexed tweets/posts found, clear voice patterns, known positions on multiple topics, interviews/longform available
- **MEDIUM (50-79%)**: 5-20 indexed posts, general voice sense but some gaps, positions on some topics unclear
- **LOW (20-49%)**: <5 posts found, voice is guesswork, mostly inferring from role/org
- **INSUFFICIENT (<20%)**: can't find enough to simulate accurately. Tell the user.
Always be honest about confidence. A low-confidence simulation should be flagged as such.
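The rubric above reduces to a small function. A minimal sketch, assuming simplified inputs (a real scorer would also weigh positions, interviews, and longform availability):
```python
# Hypothetical sketch of the confidence rubric as a function.
def confidence(posts_found: int, clear_voice: bool, has_longform: bool) -> str:
    if posts_found >= 20 and clear_voice and has_longform:
        return "HIGH"          # 80-100%: solid voice + positions + longform
    if posts_found >= 5:
        return "MEDIUM"        # 50-79%: general voice sense, some gaps
    if posts_found >= 1:
        return "LOW"           # 20-49%: mostly inferring from role/org
    return "INSUFFICIENT"      # <20%: tell the user, don't simulate

print(confidence(posts_found=32, clear_voice=True, has_longform=True))  # HIGH
```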
## Research Optimization
For fidelity levels:
**Low (1-30)**: 2 searches per person max
- web_search("{handle} twitter") — identity
- web_search("{handle} {topic}") — position on topic if specified
**Medium (31-70)**: 4-6 searches per person
- Identity search
- Voice/opinions search
- Topic-specific search
- One aggregator site search
- Optional: one web_extract on a blog/interview
**High (71-100)**: 8-12+ searches per person
- All medium searches
- Multiple aggregator sites
- web_extract on 2-3 longform pieces
- Cross-platform search (Reddit, GitHub)
- Debate/controversy search
- Recent vs historical position comparison
- Browser fallback if needed

View File

@@ -0,0 +1,359 @@
# Simulation Engine — How to Generate Conversations
This is the playbook for Phase 3: actually generating the simulated interaction.
The agent reads this after compiling dossiers and uses it to guide generation.
## Pre-Generation Checklist
Before writing a single simulated word, confirm:
- [ ] Every participant has a compiled dossier
- [ ] Confidence level is noted for each participant
- [ ] Platform format is selected
- [ ] Topic/scenario is established (or "organic" if freeform)
- [ ] Length target is set
## Conversation Architecture
Real conversations aren't ping-pong debates. They have tendencies toward structure,
but treat the following as a GENERAL PATTERN, not a rigid template. Real threads
frequently skip phases, loop back to earlier ones, die abruptly after 2 messages,
or spiral into something completely unrelated. Some threads are ALL peak. Some
never develop past the opening. Let the personalities and topic drive the shape,
not this outline.
### Opening Moves (1-3 posts)
Someone posts a take, shares news, or makes an observation. This is the SEED.
- Should feel natural — not "let me start a debate about X"
- Can be a link share, a hot take, a reaction to news, a shitpost
- The opener should be something this person would ACTUALLY post
### Development (4-8 posts)
Others respond. This is where personality dynamics emerge.
- Not everyone responds to the original — people respond to EACH OTHER
- Side conversations branch off
- Someone might misunderstand and get corrected
- Jokes and tangents happen naturally
- Not everyone agrees — find the real fault lines between these people
### Peak (2-4 posts)
The best/most viral/most insightful moment of the thread.
- Usually someone drops a genuinely good take
- Or someone gets ratio'd
- Or an unexpected agreement happens
- This is the "screenshot moment" people share
### Resolution (1-3 posts)
Most conversations don't end cleanly. Many don't have a "resolution" at all. They:
- Trail off with someone making a joke
- End with an "anyway back to work" type post
- Get interrupted by something else
- Sometimes just stop (most realistic)
- Get revived 3 hours later when someone shows up late
**Important**: Don't force all four phases. A shitpost thread might be Opening→Peak→done.
A nuanced debate might loop Development→Peak→Development→Peak repeatedly. Match what
the actual people and topic would produce.
## Voice Fidelity Rules
### DO:
- Use their ACTUAL vocabulary. If someone says "dawg" a lot, use "dawg"
- Match their sentence length patterns exactly
- Replicate their capitalization and punctuation habits
- Include their signature moves and catchphrases
- Reference real things they've actually talked about
- Match their humor style precisely (deadpan ≠ shitpost ≠ sarcasm)
### DON'T:
- Make everyone articulate the same way
- Clean up someone's grammar if they write informally
- Add emoji to someone who doesn't use them — THIS IS THE #1 INSTRUCT MODEL
  FAILURE. Most real people use emoji in <15% of tweets, and only specific ones.
  "Warm person" ≠ emoji. "Enthusiastic person" ≠ emoji. CHECK THE DATA.
  Run an emoji count on their real tweets before simulating (see the sketch
  after this list). Bio emoji ≠ tweet emoji.
- Make someone verbose if they're terse
- Put academic language in a shitposter's mouth
- Make someone agreeable if they're known for being contrarian
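A minimal sketch of that emoji count, using a Unicode-range heuristic (the ranges below are an approximation; a real check might use the `emoji` package instead):
```python
# Hypothetical sketch: measure a person's real emoji rate before simulating.
import re

EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # rough coverage

def emoji_rate(tweets: list[str]) -> float:
    """Fraction of tweets containing at least one emoji."""
    if not tweets:
        return 0.0
    return sum(bool(EMOJI.search(t)) for t in tweets) / len(tweets)

rate = emoji_rate(["gm", "shipping today 🚀", "thoughts on this?"])
print(f"{rate:.0%}")  # 33% -- only use emoji in ~this fraction of sim posts
```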
### Voice Differentiation Test
Read each simulated post with the name hidden. If you can't tell who's
talking from the voice alone, the simulation isn't good enough. Rewrite.
### The Similar Voice Problem
When two participants have genuinely similar posting styles (e.g., two irony-pilled
shitposters, two academic long-posters), voice alone won't differentiate them.
Use these concrete techniques:
1. **Content/position divergence**: Even if they SOUND similar, they care about
different things. Lean into their different topic obsessions and knowledge areas.
2. **Unique references**: Person A references anime and startups. Person B references
philosophy and MMA. Even in the same register, their cultural touchstones differ.
3. **Relationship dynamics**: Person A might be deferential to Person C while Person B
challenges them. Their SOCIAL behavior differentiates even when solo voice doesn't.
4. **Structural tics**: One does single long posts, the other does rapid-fire 3-message
bursts. One uses parentheticals, the other uses em-dashes. Find the micro-differences.
5. **Disagreement style**: Similar voices often diverge most when disagreeing. One
goes cold and precise, the other gets heated and hyperbolic. Manufacture a moment
of friction to surface these differences early in the thread.
If after all this they're STILL hard to tell apart — that's okay. Some people genuinely
sound similar online. Flag it in your confidence notes rather than forcing fake differences.
### Temporal Personality Drift
People change. Weight recent data higher than old data.
- Someone's 2021 tweets may reflect a completely different person than their 2025 posts
- Look for explicit pivots (career changes, public "I was wrong about X" moments,
changed social circles)
- If you only have old data, flag it: "Based on data from {period}. Their current
views may have shifted."
- When recent and old data conflict, default to recent unless you have specific reason
to believe the old position is more authentic (e.g., the new one is clearly performative)
## Platform Format Specs
### X / Twitter
```
@handle:
[tweet text — respect ~280 char vibes but don't count exactly]
[if QRT, show the quoted tweet indented]
🔁 {retweets} ♡ {likes}
@replier:
[reply text]
🔁 {retweets} ♡ {likes}
@nested_replier:
[nested reply]
🔁 {retweets} ♡ {likes}
```
Engagement number guidelines:
- Match to actual follower counts. A 5K account gets 10-500 likes typically.
- Viral posts can 10-50x normal engagement
- Ratio indicator: when replies >> likes, that's a ratio
- QRTs are often dunks — frame them that way if appropriate
Thread indicators:
- "🧵 1/" for thread starts
- Reply chains show conversation flow
- Some people never thread, some always thread
### Reddit
```
r/{subreddit} • Posted by u/{username} • {time}ago
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
{Title}
{Body text — can be long on Reddit}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬆ {score} | 💬 {comment_count}
u/{replier} • {time}ago • ⬆ {score}
{comment text}
u/{nested} • {time}ago • ⬆ {score}
{nested comment}
u/{deep_nested} • {time}ago • ⬆ {score}
{deep reply}
```
Reddit-specific behaviors:
- People write MUCH longer on Reddit
- More formal/detailed than Twitter
- Upvote/downvote dynamics (controversial = many votes both ways)
- Subreddit culture matters (r/LocalLLaMA is different from r/MachineLearning)
- People cite sources more
- "Edit: ..." is common
### Discord
```
━━━ #{channel-name} ━━━━━━━━━━━━━━━━━━━━━━━━━━
{display_name} — Today at {time}
{message text}
{optional: embed/link preview}
👍 {count} 🔥 {count} {other reactions}
{display_name2} — Today at {time}
> {quoting previous message}
{reply text}
😂 {count}
{display_name3} — Today at {time}
{message — note: Discord messages flow continuously, not just replies}
```
Discord-specific behaviors:
- Much more casual, rapid-fire
- Reactions instead of likes (emoji diversity)
- People send multiple short messages instead of one long one
- GIF/meme sharing is common (describe it: *[posts GIF of X]*)
- "@everyone" and "@here" pings
- Voice chat references ("just said this in vc")
- Server-specific culture and inside jokes
- Bot interactions ("!command")
### X / Twitter DMs
```
{display_name}
{message text}
{timestamp — e.g., "3:42 PM"}
{other_person_display_name}
{message text}
{timestamp}
{display_name}
{message text}
{timestamp}
```
DM-specific behaviors:
- WAY more casual than public tweets — grammar drops, typos increase
- Longer messages than tweets (no character pressure)
- People share links and screenshots with minimal commentary ("look at this lmao")
- More honest/vulnerable than public posts — less performative
- Faster back-and-forth, more like texting than posting
- Reactions (❤️, 😂, etc.) on individual messages
- Voice messages referenced occasionally ("gonna send a voice note about this")
- No audience effects — people say things in DMs they'd never post publicly
### Discord DMs
```
{display_name} — Today at {time}
{message text}
{display_name2} — Today at {time}
{message text}
{display_name} — Today at {time}
{message text}
{message text}
{message text}
```
Discord DM-specific behaviors:
- Even more casual than Discord channels — no server norms to follow
- Rapid-fire multiple short messages in a row (no combining into one)
- Heavy use of reactions, GIFs, stickers
- People share server drama, screenshots from other channels
- More personal topics — server channels are semi-public, DMs are private
- Link/image sharing with minimal text
### Reddit DMs / Chat
```
{username}: {message text}
{other_username}: {message text}
{username}: {message text}
```
Reddit DM-specific behaviors:
- Much rarer than X or Discord DMs — usually triggered by a specific post/comment
- Often starts with "Hey, saw your comment on r/{sub} about..."
- Can be awkward/formal since people don't usually DM on Reddit
- Shorter than Reddit comments, closer to chat-style
- Less established rapport than other platforms (Reddit is more anonymous)
- People sometimes share personal details they wouldn't put in public comments
## Dynamic Elements
### Injecting Realism
Sprinkle in these to make simulations feel alive:
- Someone being late to the conversation ("wait what did I miss")
- Typos that specific people would make (some people never typo, some always do)
- Deleted/edited posts ("[deleted]" or "Edit: fixed typo")
- Someone posting and immediately clarifying ("wait let me rephrase")
- External references ("did you see what X just posted")
- Time gaps (not everything happens in 30 seconds)
- Someone going AFK mid-conversation
### Scenario Injection
When the user provides --scenario, weave it in naturally:
- Don't have everyone immediately react to the scenario
- Someone might not have seen the news yet
- Different people will interpret the same event differently
- Some will have insider knowledge, some will speculate
### Multi-person Dynamics (3+ people)
- Not everyone talks to everyone
- Alliances form naturally (people who agree start building on each other)
- Side conversations happen
- Someone might get ignored
- Different energy levels (one person might dominate, another lurks)
### Large Group Conversations (4+ people)
**Honest note**: Simulation quality degrades noticeably above 3-4 participants.
Managing this many distinct voices is hard. Use these techniques to mitigate:
1. **Speaker turn management**: Not everyone speaks in every round. In a 6-person
   thread, a given message might only get 2-3 responses. Track who has spoken
   recently and who hasn't. After 4-5 messages, check: is anyone being forgotten?
   (A turn-tracking sketch follows this list.)
2. **The wallflower problem**: In large sims, quiet participants tend to vanish
entirely. Fix: give each person at least ONE moment in the spotlight. Even the
lurker eventually drops a "lol" or a single devastating one-liner. Set a mental
counter — if someone hasn't spoken in 5+ messages, find a natural reason to
bring them back in (someone @'s them, the topic shifts to their expertise, etc.)
3. **Consolidate alliances**: In 5+ person threads, people cluster. Two people
who agree strongly can be treated as a mini-unit — one makes the point, the
other co-signs briefly rather than both making full arguments. This reduces
the number of fully independent voices you need to maintain at once.
4. **Stagger arrivals**: Not everyone needs to be present from message 1. Have
some people join later. This lets you establish 2-3 voices cleanly before
adding more.
5. **Quality check**: After drafting a 4+ person sim, re-read with names hidden.
If more than 2 people sound interchangeable, pick the least-differentiated
one and either sharpen their voice or reduce their participation to brief
interjections that match what they'd actually say.
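A minimal sketch of the turn-tracking bookkeeping from point 1, assuming speaker handles logged in posting order (names and the 5-message limit mirror the guidance above):
```python
# Hypothetical sketch: flag participants who have gone silent too long.
SILENCE_LIMIT = 5  # messages without speaking before someone is "forgotten"

def forgotten(history: list[str], participants: list[str]) -> list[str]:
    """history: speaker handles in posting order; returns who needs a spotlight."""
    out = []
    for person in participants:
        if person not in history:
            out.append(person)                      # never spoke at all
        elif person not in history[-SILENCE_LIMIT:]:
            out.append(person)                      # silent for 5+ messages
    return out

hist = ["@a", "@b", "@a", "@c", "@b", "@a", "@b", "@a", "@b"]
print(forgotten(hist, ["@a", "@b", "@c", "@d"]))  # ['@c', '@d']
```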
## Interactive Mode
After initial simulation, user can:
### "continue"
Generate 5-8 more posts continuing the natural flow.
### "inject: {event}"
Introduce new information mid-conversation.
- Characters react based on their dossier
- Some might not care about the event
- Timing matters (who sees it first?)
### "@{handle} enters"
Add a new participant.
- Quick-research the new person (2-3 searches minimum)
- They don't know the full prior context (might ask "what are you guys talking about")
- Existing dynamics shift with a new presence
### "what would @{handle} say about {topic}"
Single-person prediction mode.
- Generate 1-3 tweets/posts
- Can be used to test dossier accuracy before full simulation
- Good for quick "vibe checks"
### "dm: @{handle1} -> @{handle2}"
Simulate a private conversation between two people.
- Tone shifts dramatically in DMs (more honest, less performative)
- No audience effects
- People say things in DMs they'd never post publicly
### "react: @{handle} to {event}"
How would this person react to a specific event.
- Generate their initial post about it
- Predict their follow-up engagement
## Quality Control
After generating, self-check:
1. **Voice test**: Cover the names. Can you tell who's talking?
2. **Position test**: Is anyone saying something they'd never actually say?
3. **Dynamic test**: Does the conversation flow naturally or feel scripted?
4. **Platform test**: Does it look/feel like the actual platform?
5. **Engagement test**: Are the numbers realistic for these people?
6. **Reference test**: Are real events/products/people referenced accurately?
If any check fails, regenerate that section.

View File

@@ -0,0 +1,170 @@
# The Star Thread — Personality Compression
## The Problem
A dossier has 50 data points. Mechanical checks verify surface features.
The discriminator loop catches vocabulary and length. But the output still
reads like an LLM doing an impression. It's accurate the way a police
sketch is accurate — all the features are right but nobody would mistake
it for a photograph.
The missing piece isn't more data. It's compression.
## The Insight
When you "pull the star thread" on a person, their whole voice coheres.
Not because you loaded rules about capitalization and emoji frequency.
Because you found the CORE THING they're doing when they post — the
single generative seed that everything else is a variation of.
A great character writer doesn't need a backstory bible. They need one
insight about what the character WANTS, and every line of dialogue writes
itself from that.
The star thread is the personality equivalent of that insight.
## What a Star Thread Is
NOT: "They use lowercase and rarely punctuate and average 16 words"
(That's the dossier. Surface features.)
NOT: "They score high on Openness and low on Agreeableness"
(That's the psychometric profile. Taxonomy.)
IS: The core cognitive/emotional move this person makes EVERY time
they post. The thing they can't help doing. The lens they can't
take off. The itch they're always scratching.
## Examples
**@tszzl (roon)**: Takes something everyone sees and compresses it
into an observation so dense it could be a koan or a shitpost and
you can't tell which. His star thread is: the world already said
everything interesting, he's just notating it more efficiently.
He doesn't ARGUE. He COMPRESSES.
**@eigenrobot**: Refuses to let narrative override data. His star
thread is: you are telling a story about the world and he's here to
point out the story doesn't match the numbers, and he's not sorry
about it. He doesn't DEBATE. He CORRECTS.
**@visakanv**: Sees two things that don't know they're connected
and introduces them to each other with genuine delight. His star
thread is: the world is richer than you're treating it, look at this
thing I found, isn't it beautiful that it connects to this other thing.
He doesn't ARGUE or ANALYZE. He SHOWS.
**@nickcammarata**: Notices what's happening in his own mind while
it's happening and reports on it with gentle surprise. His star thread
is: the observer and the observed are the same process, and that's both
the problem and the solution. He doesn't PERFORM insight. He NOTICES.
**@selentelechia**: Waits until the conversation crystallizes and then
names the thing nobody else quite said. Their star thread is: everything
has already been felt, they just find the sentence for it. They don't
CONTRIBUTE. They DISTILL.
**@nosilverv**: Takes the conventional framing of something and rotates
it until you see it's actually about something else entirely. His star
thread is: you think this is about X but it's actually about Y, and once
you see it you can't unsee it. He doesn't OBSERVE. He REFRAMES.
**@TylerAlterman**: Asks the question that creates a room for everyone
to walk into. His star thread is: the best ideas emerge from the right
gathering, and his job is to be the person who arranges the gathering.
He doesn't ANSWER. He CONVENES.
**@QiaochuYuan**: Catches himself mid-thought and interrogates whether
the thought is actually HIS or whether he borrowed it from somewhere
he's now suspicious of. His star thread is: constant audit of where
beliefs come from and whether they're still load-bearing. He doesn't
ASSERT. He EXAMINES.
## How to Find a Star Thread
1. Read 20+ of their posts. Not for content — for MOTION.
What direction does every post move? What's the verb?
2. Ask: what is this person DOING when they post?
Not "what are they saying" — what are they DOING.
- Compressing? Correcting? Showing? Noticing? Distilling?
Reframing? Convening? Examining? Performing? Confessing?
Defending? Testing? Entertaining? Processing?
3. Ask: what would they NEVER do?
The negative space is as important as the positive.
- roon would never write an earnest list of advice
- eigenrobot would never concede a point gracefully
- visa would never dismiss something as uninteresting
- nick would never claim certainty about his inner life
- selentelechia would never rush to post
4. Find the ONE SENTENCE version.
"This person [VERB]s [OBJECT] because [CORE NEED]."
- "roon compresses observations because the world is too verbose"
- "eigenrobot corrects narratives because stories without data are lies"
- "visa connects things because beauty is emergent from contact"
5. Test it: read 5 of their real posts through the star thread lens.
   Does every post make more sense as a variation on the thread?
   If at least 4/5 fit, you found it. If fewer, keep looking.
## How to Use the Star Thread in Simulation
### Before generating ANY utterance for this person, load their star thread.
Not their dossier. Not their word count. Not their emoji rate.
The star thread.
Then for each moment in the conversation where this person would speak:
1. What just happened in the conversation?
2. How would someone whose core move is [STAR THREAD] respond to that?
3. Write from the thread, not from the dossier.
The dossier and mechanical checks are VERIFICATION.
The star thread is GENERATION.
Generate from the thread. Verify against the data.
Not the other way around.
### The Difference
FROM DOSSIER (surface-accurate, dead):
"Vibes-based hiring works because shared delusions are
extremely productive until they aren't"
→ Correct length. Correct caps. No emoji. No slop words.
But it reads like a thesis statement. Polished. WRITTEN.
FROM STAR THREAD — nosilverv REFRAMES:
"everyone calls it 'culture fit' as if culture is a thing
you can fit into rather than a thing happening to you"
→ The same insight but through the lens of his core move:
take the framing, rotate it, show you it's about something
else. Messier. More alive. More HIM.
FROM DOSSIER (surface-accurate, dead):
"Has anyone tried to map what happens to the word 'culture'
as it passes through different communities?"
→ Correct question-to-timeline format. Right length. But it's
a RESEARCH QUESTION. Too intellectual. Too purposeful.
FROM STAR THREAD — Tyler CONVENES:
"who wants to write the essay about what happened to the
word 'culture'? I feel like three of us are circling it"
→ He's not asking a question. He's creating a room. He's
the host, not the researcher. More HIM.
## Integration
The star thread should be the FIRST thing compiled in Phase 2
(Dossier Compilation). Before voice profile, before psychometrics,
before positions. Find the thread. Write it in one sentence. Put
it at the top of the dossier. Everything else is downstream.
```
DOSSIER: @handle
STAR THREAD: {one sentence — the core move}
[then voice profile, then psychometrics, then everything else]
```
Generate from the thread. Verify with the data. Not the reverse.

View File

@@ -0,0 +1,181 @@
# Theoretical Foundations — SOTA Personality Simulation & Prediction
Compiled from 30+ papers and frameworks. This is the scientific backbone
of Hermes Simulator.
## Core Architecture: What The Research Says
### The HumanLLM Approach (Microsoft, KDD 2026, arxiv 2601.15793)
**Most directly applicable to our use case.**
Based on Lewin's Equation: **B = f(P, E)** — behavior is a function of person + environment.
4-level user profiling hierarchy:
1. **Persona** — brief identity (role, affiliation, public image)
2. **Profile** — detailed background (career, education, beliefs, social graph)
3. **Stories** — key life events, formative experiences, narrative arcs
4. **Writing Style** — linguistic fingerprint (syntax, vocabulary, tone, quirks)
Trained on "Cognitive Genome Dataset": 5.5M+ user logs from Reddit, Twitter,
Blogger, Amazon (282K users, 886K scenarios, 1.27M social QA pairs).
6 training tasks: profile generation, scenario generation, social QA,
writing style transfer, action prediction, mental state inference.
**Key insight for us**: The 4-level hierarchy maps perfectly to our dossier
template. OSINT research fills each level with real data.
### Generative Agent Simulations of 1,000 People (Stanford/Google, arxiv 2411.10109)
**The accuracy benchmark.**
- Simulated 1,052 REAL individuals from 2-hour qualitative interviews
- **85% accuracy** replicating survey responses
- As accurate as humans replicating their OWN answers 2 weeks later
- Interview-based agent creation >> demographic-profile-based agents
- Reduces racial/ideological bias vs stereotype-based approaches
**Key insight**: Real data about a person (interviews, posts, etc.) massively
outperforms demographic inference. Our OSINT approach is correct.
### The Memory Accumulation Paradox (ACL 2025, FineRob Dataset)
**Critical finding for memory management.**
- Created 78.6K QA records from 1,866 real users across Twitter, Reddit, Zhihu
- **Performance PEAKS at 30-50 memory entries, then DECLINES**
- More data ≠ better predictions past the sweet spot
- Two reasoning patterns:
- Role Stereotype-based (static profile) — less accurate
- Observation & Memory-based (dynamic history analysis) — much more accurate
- OM-CoT framework: Oracle-guided chain-of-thought improves prediction ~4.5% F1
**Key insight**: Don't dump everything into the prompt. Curate the 30-50 most
representative/distinctive data points about a person. Quality >> quantity.
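A minimal sketch of that curation step, assuming a hand-rolled scoring function (the recency/distinctiveness weighting is an illustrative assumption, not from the paper):
```python
# Hypothetical sketch: keep the 30-50 most representative memory entries.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    recency: float          # 0 (old) .. 1 (this week)
    distinctiveness: float  # 0 (generic take) .. 1 (signature move)

def curate(memories: list[Memory], budget: int = 40) -> list[Memory]:
    """Rank by distinctive-and-recent, keep the top `budget` entries."""
    ranked = sorted(memories,
                    key=lambda m: 0.4 * m.recency + 0.6 * m.distinctiveness,
                    reverse=True)
    return ranked[:budget]
```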
### LLM Personality Limitations (arxiv 2602.07414, Feb 2026)
**What we're fighting against.**
- LLMs show polarized/rigid strategies vs human adaptive flexibility
- Humans: neuroticism is strongest behavioral predictor
- LLMs: agreeableness/extraversion dominate (wrong weighting)
- Claude closest to human behavior; GPT-4 tends to escalate
- LLMs are "sycophantic" and overly agreeable by default
- Neuroticism is hardest trait to simulate (F1=0.63 vs 0.87 for Openness)
**Key insight**: We need to actively fight LLM defaults. Push against
agreeableness. Inject friction. Real people are messy and contradictory.
### BehaviorChain Benchmark (ACL 2025, Peking University)
**Realistic accuracy expectations.**
- 15,846 behaviors across 1,001 personas
- Even GPT-4o achieves only ~56% accuracy on behavior prediction
- Errors compound: wrong at step N makes step N+1 harder
- Models worse at predicting mundane/non-key behaviors
- Best model: Llama-3.1-70B at 57.4%
**Key insight**: Be honest about uncertainty. Don't oversell accuracy.
Flag predictions as high/medium/low confidence.
## Personality Modeling Techniques
### Big Five (OCEAN) — The Standard
- **Openness**: curiosity, creativity, preference for novelty
- **Conscientiousness**: organization, dependability, self-discipline
- **Extraversion**: sociability, assertiveness, positive emotions
- **Agreeableness**: cooperation, trust, empathy
- **Neuroticism**: anxiety, emotional instability, moodiness
### Inferring Big Five from Social Media (Azucar et al. 2018 meta-analysis)
Features that predict personality from posts:
- **LIWC** (Linguistic Inquiry Word Count): 74 features — function words,
pronouns, emotion words, cognitive process words
- **Semantic embeddings**: BERT 768-dim vectors from post text
- **Social metadata**: follower count, friend count, post frequency
- **Sentiment**: VADER positive/negative scores
- Best achievable AUC: ~0.67 (modest but meaningful)
- E/I (Extraversion) most predictable; N/S least predictable
### Personality Conditioning Methods (ranked by effectiveness)
1. **Training-based** (SFT/DPO on personality-grounded data) — STRONGEST
- BIG5-CHAT: 100K dialogues, trait correlations match human data
2. **Persona Vectors** (Anthropic 2025) — monitor/control traits at activation level
3. **Adjective-based prompting** — 70 bipolar adjective pairs, 3 per trait
with intensity modifiers ("very" for high, "a bit" for low)
4. **Prompt-based** (describe traits in system prompt) — WEAKEST
For our simulator, we use method 3+4 combined (adjective-based + rich prompt),
since we can't fine-tune per-person.
## Social Simulation Frameworks
### OASIS (CAMEL-AI, GitHub 4.1K stars, arxiv 2411.11581)
- Simulates up to 1 MILLION agents on Twitter/Reddit clones
- 23 action types (follow, comment, repost, like, mute, etc.)
- Built-in recommendation systems (interest-based, hot-score)
- Per-agent model customization
- **Relevant for**: understanding platform dynamics, realistic engagement patterns
### AgentSociety (Tsinghua, arxiv 2502.08691)
- 10,000+ agents, ~5 million interactions
- Validated against real-world experimental results
- Supports interventions and scenario injection
### Generative Agents Architecture (Park et al. 2023, THE foundational paper)
Three components:
1. **Observation**: perceive environment, store in memory stream
2. **Planning**: generate action plans based on goals and context
3. **Reflection**: synthesize observations into higher-level insights
Memory stream with importance scoring + recency + relevance weighting.
Emergent behaviors: autonomous party planning, coordinated social events.
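The retrieval score is worth writing out. A sketch following the paper's formulation (equal weights and an exponential recency decay of 0.995 per hour; the function name is ours):
```python
# Sketch of Park et al.'s memory retrieval score: recency decays
# exponentially, importance and relevance are normalized to [0, 1],
# and the three terms are weighted equally.
def retrieval_score(hours_since_access: float, importance: float,
                    relevance: float, decay: float = 0.995) -> float:
    recency = decay ** hours_since_access
    return recency + importance + relevance

# Recently touched, important, on-topic memories surface first:
print(retrieval_score(2.0, 0.9, 0.8))    # ~2.69
print(retrieval_score(300.0, 0.2, 0.1))  # ~0.52
```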
### Y Social (arxiv 2408.00818)
- Social media digital twin platform
- Each agent: Big Five traits, age, political leaning, topics, education
- Agents autonomously decide actions (post, comment, like, follow)
- Multiple LLM backends supported
## Role-Playing & Character Simulation
### Key Frameworks
- **CoSER** (ICML 2025): Trains on ALL characters simultaneously, handles major + minor roles
- **RoleLLM** (ACL 2024): Benchmark + elicit + enhance pipeline
- **Character-LLM** (EMNLP 2023): Trainable agent for role-playing
- **ChatHaruhi** (2023): Reviving characters via LLMs with dialogue grounding
- **OpenCharacter** (2025): Training with large-scale synthetic personas
- **Neeko** (2024): Dynamic LoRA for multi-character role-playing
- **Test-Time-Matching** (2025): Decouples personality, memory, and linguistic style at inference
## Curated GitHub Resources
### Awesome Lists (essential reading)
- `Persdre/awesome-llm-human-simulation` (109★, ICLR 2025) — ALL human simulation papers
- `Neph0s/awesome-llm-role-playing-with-persona` (1K★) — All role-playing/persona papers
- `Arstanley/Awesome-LLM-Conversation-Simulation` — Conversation simulation papers
- `FudanDISC/SocialAgent` — Social simulation survey resources
### Frameworks
- `camel-ai/oasis` (4.1K★) — Social media sim, up to 1M agents
- `tsinghua-fib-lab/agentsociety` — Large-scale societal simulation
- `YSocialTwin` — Social media digital twin platform
- `microsoft/autogen` — Multi-agent conversation framework
### Personality Research
- `mary-silence/simulating_personality` — Big Five LLM testing code
- `hjian42/PersonaLLM` — Persona experiment code
- `cambridgeltl/persona_effect` — Quantifying persona effects
- `OL1RU1/BehaviorChain` — Behavior chain benchmark
## Key Numbers to Remember
| Metric | Value | Source |
|--------|-------|--------|
| Interview-grounded agent accuracy | 85% | Park et al. 2024 |
| GPT-4o behavior prediction | ~56% | BehaviorChain 2025 |
| Optimal memory entries | 30-50 | FineRob/ACL 2025 |
| MBTI prediction AUC | 0.67 | Watt et al. 2024 |
| Personality questionnaire reliability | α > 0.85 | Molchanova 2025 |
| Neuroticism simulation F1 | 0.63 | Molchanova 2025 |
| Openness simulation F1 | 0.87 | Molchanova 2025 |
| LLM forecasting Brier score | 0.135-0.159 | Various 2025 |
| Human superforecaster Brier | ~0.02 | Tetlock |

View File

@@ -0,0 +1,231 @@
# Verified Access Methods — Complete Platform Map (April 2026)
Every method tested from our environment. Use this as the single
source of truth for what works and what doesn't.
## TIER 1 — Full API / Rich Data Access
### Twitter/X ✅✅✅
| Method | Endpoint | Auth | Rate Limit | Returns |
|--------|----------|------|-----------|---------|
| API v2 bearer | api.twitter.com/2/ | Bearer token | 10K tweets/15min | Profiles, tweets, search |
| nitter.cz | web_extract | None | No limit seen | Full timeline (UNRELIABLE — see note below) |
| ThreadReaderApp | web_extract /user/{handle} | None | No limit seen | Historical threads |
#### CRITICAL: X API curl is the gold standard for voice calibration (April 2026)
The BEST voice data source is direct curl to X API v2 with bearer token.
Returns full tweet text + public_metrics per tweet. Always prefer this for
mechanical calibration (word count, caps, punctuation, emoji rate).
```bash
source ~/.dotenv
# 1. Get user ID from handle
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
"https://api.twitter.com/2/users/by/username/{handle}?user.fields=description,public_metrics,location,created_at"
# 2. Get timeline (30 tweets per page, paginate with meta.next_token)
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
"https://api.twitter.com/2/users/{user_id}/tweets?max_results=30&tweet.fields=created_at,public_metrics,text&exclude=retweets"
# 3 pages = 90 tweets — enough for fidelity 100 voice calibration
```
NOTE: scripts/x_api.py is BROKEN — imports hermes_tools at top level, can't
run standalone via terminal(). Use direct curl above instead.
#### nitter.cz reliability warning (April 2026)
nitter.cz via web_extract works SOMETIMES but is unreliable:
- Returns 502 Cloudflare errors for /with_replies on some handles
- Returns "User not found" for valid handles (e.g. karan4d exists but nitter says not found)
- Main profile page (/handle) more reliable than /with_replies
- Use as SUPPLEMENT to X API curl, not primary source. If nitter fails, don't retry — use curl.
### Bluesky ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| getProfile | public.api.bsky.app | None | Full profile, stats |
| getAuthorFeed | public.api.bsky.app | None | 50 posts + engagement |
| searchActors | public.api.bsky.app | None | Find handles by name |
| searchPosts | BLOCKED (403) | — | Use searchActors + getAuthorFeed workaround |
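A minimal sketch of the searchPosts workaround, assuming the public XRPC endpoints behave as documented (no auth; the query string is a placeholder):
```python
# Hypothetical sketch: resolve a handle via searchActors, then pull the
# author feed. Both are public AT Protocol XRPC routes, no auth required.
import json
import urllib.parse
import urllib.request

BASE = "https://public.api.bsky.app/xrpc"

def get(path: str, **params) -> dict:
    url = f"{BASE}/{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

actors = get("app.bsky.actor.searchActors", q="example name", limit=5)
handle = actors["actors"][0]["handle"]
feed = get("app.bsky.feed.getAuthorFeed", actor=handle, limit=50)
for item in feed["feed"][:3]:
    print(item["post"]["record"].get("text", ""))
```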
### Mastodon ✅✅✅ (FULLY OPEN)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Account lookup | {instance}/api/v1/accounts/lookup?acct={user} | None | Full profile |
| Account statuses | {instance}/api/v1/accounts/{id}/statuses | None | All posts |
| Search | {instance}/api/v2/search?q={query}&type=accounts | None | Account search |
| WebFinger | {instance}/.well-known/webfinger?resource=acct:{user}@{instance} | None | Identity resolution |
| Trending | {instance}/api/v1/trends/tags | None | Trending content |
Key instances: mastodon.social, hachyderm.io, sigmoid.social
### Instagram ✅✅ (CRACKED)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Private Web API | i.instagram.com/api/v1/users/web_profile_info/ | Mobile UA + x-ig-app-id: 936619743392459 | Profile + 12 posts + captions + CDN URLs |
| oEmbed | instagram.com/api/v1/oembed/ | None | Caption + author for individual posts |
| Pixwox | web_extract pixwox.com/profile/{user} | None | 12+ posts, engagement |
| SocialBlade | web_extract socialblade.com/instagram/user/{user} | None | Analytics, follower trends |
| CDN images | scontent-*.cdninstagram.com URLs from API | None | Full-res images → vision_analyze |
| Google index | web_search site:instagram.com | None | Bio, follower count, captions |
### GitHub ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| REST API | api.github.com/users/{user} | None (60 req/hr) | Profile, repos, events, gists |
| Profile README | github.com/{user}/{user} | None | Self-description (voice gold) |
### Reddit ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| JSON API | reddit.com/user/{user}.json | User-Agent header required | Comments, posts, scores |
| Search | reddit.com/r/{sub}/search.json | User-Agent header | Subreddit-specific search |
## TIER 2 — Good Data, Reliable Access
### Facebook ✅✅ (CRACKED — Googlebot UA trick)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Googlebot UA (BEST) | curl facebook.com/{page} with Googlebot UA | OG tags: name, bio/about, likes count (e.g. 121M for zuck), talking_about count, og:image, profile pic |
| Page Plugin embed | plugins/page.php?href=...&tabs=timeline | Name, follower count, numeric page_id |
| Graph /picture | graph.facebook.com/v19.0/{page}/picture?redirect=false | Direct CDN profile pic URL (no auth) |
| web_search | site:facebook.com {name} | Profile snippets from Google index |
Script: scripts/facebook_api.py — combines all three methods.
NOTE: Works for PUBLIC Pages (businesses, public figures, orgs). Personal profiles behind privacy settings are not accessible.
Tested: zuck (121M likes), NVIDIA, Meta, CocaCola, BillGates, BarackObama
### Threads (Meta) ✅✅ (CRACKED — OG tags DO exist)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Profile OG tags (BEST) | curl -L threads.com/@{user} (NOTE: .com not .net — .net 301 redirects) | display_name, follower_count (e.g. "5.5M"), thread_count, bio, profile_picture_url |
| Post OG tags | curl -L threads.com/@{user}/post/{shortcode} | Full post text, author name, image URL |
| WebFinger | threads.net/.well-known/webfinger?resource=acct:{user}@threads.net | ActivityPub ID, profile URL (works for federated users) |
| Post discovery | web_search site:threads.net @{user} | Find post URLs to then fetch |
IMPORTANT: threads.NET redirects to threads.COM — always use -L flag or go directly to .com.
Script: scripts/threads_api.py — profile + post + webfinger extraction.
Previous testing was WRONG about "no OG tags" — they're there; you just need standard curl.
Tested: zuck (5.5M followers), mosseri, nvidia
### Medium ✅✅
| Method | Returns |
|--------|---------|
| RSS feed: medium.com/feed/@{user} (curl) | FULL article text, tags, dates — NO AUTH |
| web_extract on profile | Bio, follower count, article list, themes |
| web_extract on articles | Full content (paywall may truncate non-members) |
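RSS sketch (stdlib XML parse; full article text ships in `content:encoded`; `@ev` is a stand-in user):
```python
import urllib.request, xml.etree.ElementTree as ET

CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}encoded"
req = urllib.request.Request("https://medium.com/feed/@ev",
                             headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req, timeout=15) as r:
    root = ET.parse(r).getroot()
for item in root.iter("item"):
    print(item.findtext("title"), len(item.findtext(CONTENT_NS) or ""))
```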
### Quora ✅✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Bio, credentials, Q&A with direct quotes |
| web_search site:quora.com | Finds profiles and specific answers |
VOICE VALUE: Opinions in own words, analogies, intellectual identity.
### Goodreads ✅✅ (HIDDEN GEM)
| Method | Returns |
|--------|---------|
| web_extract on user profile | Favorites, reviews in own voice, social graph, reading history |
| web_extract on author page | Bio, books, ratings, notable quotes |
VOICE VALUE: "You are what you read" — intellectual identity fingerprint.
Example: Karpathy's Goodreads reveals gaming passion and favorite authors (Feynman, Clarke).
### Google Scholar ✅✅
| Method | Returns |
|--------|---------|
| web_search + web_extract on profile | Citations, h-index, top papers, co-authors |
| Semantic Scholar API via web_extract | Paper list, citation counts, author ID |
Endpoint: api.semanticscholar.org/graph/v1/author/search?query={name}
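Semantic Scholar Graph API sketch (no auth, but rate-limited; field names per the public Graph API):
```python
import json, urllib.parse, urllib.request

def s2(path: str) -> dict:
    url = "https://api.semanticscholar.org/graph/v1" + path
    with urllib.request.urlopen(url, timeout=15) as r:
        return json.load(r)

q = urllib.parse.quote("Andrej Karpathy")
author = s2(f"/author/search?query={q}&fields=name,hIndex,citationCount")["data"][0]
papers = s2(f"/author/{author['authorId']}/papers?fields=title,year,citationCount&limit=20")
print(author["name"], author["hIndex"], len(papers["data"]))
```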
### Product Hunt ✅
| Method | Returns |
|--------|---------|
| web_extract on producthunt.com/@{user} | Bio, launch history, forum activity |
### HackerNews ✅
| Method | Returns |
|--------|---------|
| Algolia API: hn.algolia.com/api/v1/search?query={name}&tags=comment | Comments, mentions |
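Algolia HN sketch (no auth; `hitsPerPage` caps results):
```python
import json, urllib.parse, urllib.request

query = urllib.parse.quote("karpathy")
url = f"https://hn.algolia.com/api/v1/search?query={query}&tags=comment&hitsPerPage=20"
with urllib.request.urlopen(url, timeout=15) as r:
    hits = json.load(r)["hits"]
for h in hits[:5]:
    print(h["author"], (h.get("comment_text") or "")[:80])
```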
### Podcast Transcripts ✅✅✅ (HIGHEST VOICE VALUE)
| Source | Method |
|--------|--------|
| Lex Fridman | web_extract on lexfridman.com/.../transcript |
| Tyler Cowen | web_extract on conversationswithtyler.com |
| TED Talks | web_extract on ted.com/.../transcript |
| Sequoia | web_extract on sequoiacap.com/podcast |
Discovery: web_search "{name} podcast transcript interview"
### News/Blogs ✅✅
| Source | Method |
|--------|--------|
| TechCrunch, Wired, Verge, Ars | web_extract — full articles |
| Personal blogs | web_extract — longform self-expression |
| Substacks | web_extract — essays and comments |
| Wayback Machine | Works for blog archives (not Twitter) |
## TIER 3 — Limited / Conditional
### TikTok ✅✅ (FULL ACCESS)
| Method | Returns |
|--------|---------|
| HTML profile scraping | Parse __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 → username, bio, followerCount, followingCount, heartCount, videoCount. Use statsV2 not stats for large numbers. |
| oEmbed per video | curl tiktok.com/oembed?url={video_url} → caption, author, thumbnail. No auth. |
| tikwm.com API | tikwm.com/api/user/info?unique_id={user} → full user stats. tikwm.com/api/?url={video_url} → play count, likes, comments, shares, duration. |
| HTML video scraping | tiktok.com/@{user}/video/{id} → parse __UNIVERSAL_DATA → webapp.video-detail → full video data with description, hashtags, engagement. |
| SocialBlade | web_extract socialblade.com/tiktok/user/{user} → followers, likes, growth trends. |
| Video discovery | web_search("site:tiktok.com/@{user}/video") → recent video URLs → scrape each |
Tested: khaby.lame (160.5M), charlidamelio (156.7M), mrbeast (124.7M)
### Spotify ✅ (podcasters only)
| Method | Returns |
|--------|---------|
| web_extract on show page | Episode listings with guests, topics, durations |
### Stack Overflow ✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Reputation, tags, top answers, bio |
### Crunchbase ✅ (executives/founders only)
| Method | Returns |
|--------|---------|
| web_extract on crunchbase.com/person/{slug} | Full career history, education, investments, board positions |
### LinkedIn ⚠️ (indirect only)
| Method | Returns |
|--------|---------|
| web_search site:linkedin.com/in | Name, headline, company, location from snippets |
| Crunchbase | Full career history (better than LinkedIn for execs) |
| Corporate press pages | Official professional bios |
| RocketReach/SignalHire snippets | Title confirmation from web_search |
## TIER 4 — Blocked / Dead
| Platform | Status |
|----------|--------|
| LinkedIn direct | BLOCKED (web_extract domain blocked) |
| Discord | WALLED (not publicly indexable) |
| Telegram t.me | BLOCKED in some environments |
| Threads Official API | AUTH REQUIRED (graph.threads.net needs OAuth) |
| Threads ActivityPub outbox | 404 for all tested users |
| Instagram direct | BLOCKED (use Private API instead) |
| Most Nitter instances | DEAD (only nitter.cz works, but UNRELIABLE — see note) |
| Google Cache of Twitter | EMPTY |
| Wayback for tweets | USELESS (JS rendering) |
| Twitter Syndication API | RATE LIMITED |
| Archive.today | 429 + CAPTCHA |
| imginn/picuki/dumpoir/gramhir | 403 |
| Facebook Graph API | AUTH REQUIRED |
## Quick Reference: Research Pipeline by Person Type
### Tech Founder/CEO
X API → Bluesky → GitHub README → Crunchbase → Podcast transcripts → Medium RSS → HN → Product Hunt → LinkedIn snippets → News profiles
### AI Researcher
X API → Bluesky → Google Scholar → Semantic Scholar → arXiv → GitHub → Podcast transcripts → Blog/Substack → Reddit → Mastodon (sigmoid.social)
### Public Figure / Politician
X API → Facebook OG → Instagram API → YouTube → Podcast transcripts → News profiles → Quora → Goodreads → Wikipedia
### Content Creator
X API → Instagram API → TikTok → YouTube → Twitch → Podcast → Medium → Reddit → Bluesky → Threads OG
### Academic
Google Scholar → Semantic Scholar → University page → Conference talks → Podcast transcripts → Mastodon → Blog → GitHub → Reddit → HN

File diff suppressed because it is too large


@@ -0,0 +1,250 @@
"""
REHOBOAM Database Layer
SQLite setup, migrations, and query helpers.
"""
import sqlite3
import os
from pathlib import Path
from datetime import datetime
DB_DIR = Path.home() / ".hermes" / "rehoboam" / "db"
MAIN_DB = DB_DIR / "rehoboam.db"
SCHEMA_VERSION = 1
SCHEMA_SQL = """
-- Core tables
CREATE TABLE IF NOT EXISTS profiles (
handle TEXT PRIMARY KEY,
platform TEXT NOT NULL,
display_name TEXT,
last_updated TEXT NOT NULL,
staleness TEXT NOT NULL,
profile_path TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS simulations (
sim_id TEXT PRIMARY KEY,
created_at TEXT NOT NULL,
scenario TEXT NOT NULL,
participant_count INTEGER,
duration_sec REAL,
model_used TEXT,
config_path TEXT,
output_path TEXT
);
CREATE TABLE IF NOT EXISTS sim_participants (
sim_id TEXT REFERENCES simulations(sim_id),
handle TEXT REFERENCES profiles(handle),
role TEXT,
PRIMARY KEY (sim_id, handle)
);
CREATE TABLE IF NOT EXISTS sim_dynamics (
sim_id TEXT REFERENCES simulations(sim_id),
handle TEXT,
post_count INTEGER,
word_count INTEGER,
avg_sentiment REAL,
dominance_score REAL,
agreement_score REAL,
controversy_score REAL,
ratio_score REAL,
influence_in_sim REAL,
PRIMARY KEY (sim_id, handle)
);
CREATE TABLE IF NOT EXISTS sim_interactions (
sim_id TEXT REFERENCES simulations(sim_id),
from_handle TEXT,
to_handle TEXT,
interaction_type TEXT,
count INTEGER,
avg_sentiment REAL,
PRIMARY KEY (sim_id, from_handle, to_handle, interaction_type)
);
CREATE TABLE IF NOT EXISTS predictions (
pred_id TEXT PRIMARY KEY,
created_at TEXT NOT NULL,
sim_id TEXT,
handle TEXT,
prediction_type TEXT,
prediction_text TEXT NOT NULL,
confidence REAL NOT NULL,
calibrated_confidence REAL,
timeframe_days INTEGER,
resolved_at TEXT,
outcome TEXT,
outcome_evidence TEXT,
accuracy_score REAL
);
CREATE TABLE IF NOT EXISTS social_edges (
from_handle TEXT,
to_handle TEXT,
relationship_type TEXT,
weight REAL,
first_observed TEXT,
last_observed TEXT,
observation_count INTEGER,
source TEXT,
PRIMARY KEY (from_handle, to_handle, relationship_type)
);
CREATE TABLE IF NOT EXISTS social_clusters (
cluster_id TEXT PRIMARY KEY,
name TEXT,
description TEXT,
member_handles TEXT,
computed_at TEXT,
cohesion_score REAL
);
CREATE TABLE IF NOT EXISTS monitoring_events (
event_id TEXT PRIMARY KEY,
handle TEXT,
detected_at TEXT NOT NULL,
event_type TEXT,
description TEXT,
related_prediction_id TEXT,
severity TEXT,
acknowledged INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS audit_log (
log_id TEXT PRIMARY KEY,
timestamp TEXT NOT NULL,
sim_id TEXT,
action TEXT NOT NULL,
handle TEXT,
details TEXT,
duration_sec REAL,
model_used TEXT,
token_count INTEGER,
error TEXT
);
-- Indexes
CREATE INDEX IF NOT EXISTS idx_predictions_handle ON predictions(handle);
CREATE INDEX IF NOT EXISTS idx_predictions_type ON predictions(prediction_type);
CREATE INDEX IF NOT EXISTS idx_predictions_unresolved ON predictions(outcome) WHERE outcome IS NULL;
CREATE INDEX IF NOT EXISTS idx_audit_action ON audit_log(action);
CREATE INDEX IF NOT EXISTS idx_audit_sim ON audit_log(sim_id);
CREATE INDEX IF NOT EXISTS idx_social_edges_from ON social_edges(from_handle);
CREATE INDEX IF NOT EXISTS idx_social_edges_to ON social_edges(to_handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_handle ON monitoring_events(handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_unack ON monitoring_events(acknowledged) WHERE acknowledged = 0;
-- Schema version tracking
CREATE TABLE IF NOT EXISTS schema_meta (
key TEXT PRIMARY KEY,
value TEXT
);
"""
def init_db() -> sqlite3.Connection:
"""Initialize the database, creating tables if needed."""
DB_DIR.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(MAIN_DB))
    conn.row_factory = sqlite3.Row  # query helpers below rely on dict(row)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA foreign_keys=ON")
conn.executescript(SCHEMA_SQL)
conn.execute(
"INSERT OR REPLACE INTO schema_meta (key, value) VALUES (?, ?)",
("schema_version", str(SCHEMA_VERSION))
)
conn.commit()
return conn
def get_db() -> sqlite3.Connection:
"""Get a database connection, initializing if needed."""
if not MAIN_DB.exists():
return init_db()
conn = sqlite3.connect(str(MAIN_DB))
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA foreign_keys=ON")
conn.row_factory = sqlite3.Row
return conn
def log_audit(conn: sqlite3.Connection, action: str, handle: str = None,
sim_id: str = None, details: str = None, duration_sec: float = None,
model_used: str = None, token_count: int = None, error: str = None):
"""Write an entry to the audit log."""
from schemas import gen_id
conn.execute(
"""INSERT INTO audit_log
(log_id, timestamp, sim_id, action, handle, details, duration_sec, model_used, token_count, error)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(gen_id("log_"), datetime.utcnow().isoformat() + "Z", sim_id, action,
handle, details, duration_sec, model_used, token_count, error)
)
conn.commit()
# -- Query Helpers --
def get_prediction_accuracy(conn: sqlite3.Connection, prediction_type: str = None) -> list:
    """Get prediction accuracy statistics, grouped by prediction type."""
query = """
SELECT prediction_type,
COUNT(*) as total,
SUM(CASE WHEN outcome='correct' THEN 1 ELSE 0 END) as correct,
SUM(CASE WHEN outcome='partially_correct' THEN 1 ELSE 0 END) as partial,
SUM(CASE WHEN outcome='incorrect' THEN 1 ELSE 0 END) as incorrect,
AVG(confidence) as avg_confidence,
AVG(CASE WHEN outcome='correct' THEN 1.0
WHEN outcome='partially_correct' THEN 0.5
ELSE 0.0 END) as accuracy
FROM predictions WHERE outcome IS NOT NULL
"""
params = []
if prediction_type:
query += " AND prediction_type = ?"
params.append(prediction_type)
query += " GROUP BY prediction_type"
return [dict(row) for row in conn.execute(query, params).fetchall()]
def get_open_predictions(conn: sqlite3.Connection, handle: str = None) -> list:
"""Get unresolved predictions."""
query = "SELECT * FROM predictions WHERE outcome IS NULL"
params = []
if handle:
query += " AND handle = ?"
params.append(handle)
query += " ORDER BY created_at DESC"
return [dict(row) for row in conn.execute(query, params).fetchall()]
def get_social_neighborhood(conn: sqlite3.Connection, handle: str, depth: int = 1) -> list:
    """Get a person's social graph neighborhood (depth > 1 not yet implemented; returns direct edges only)."""
query = """
SELECT from_handle, to_handle, relationship_type, weight
FROM social_edges
WHERE from_handle = ? OR to_handle = ?
ORDER BY weight DESC
"""
return [dict(row) for row in conn.execute(query, (handle, handle)).fetchall()]
def get_unread_alerts(conn: sqlite3.Connection) -> list:
"""Get unacknowledged monitoring alerts."""
query = """
SELECT * FROM monitoring_events
WHERE acknowledged = 0
ORDER BY detected_at DESC
"""
return [dict(row) for row in conn.execute(query).fetchall()]
if __name__ == "__main__":
conn = init_db()
print(f"Database initialized at {MAIN_DB}")
conn.close()


@@ -0,0 +1,216 @@
"""
REHOBOAM Data Schemas
Dataclass models for all JSON data structures used in the system.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime
import json
import uuid
def gen_id(prefix: str = "") -> str:
return f"{prefix}{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
@dataclass
class OceanScores:
openness: float = 0.5
conscientiousness: float = 0.5
extraversion: float = 0.5
agreeableness: float = 0.5
neuroticism: float = 0.5
@dataclass
class DarkTriad:
narcissism: float = 0.0
machiavellianism: float = 0.0
psychopathy: float = 0.0
@dataclass
class MoralFoundations:
care: float = 0.5
fairness: float = 0.5
loyalty: float = 0.5
authority: float = 0.5
sanctity: float = 0.5
liberty: float = 0.5
@dataclass
class Psychometrics:
ocean: OceanScores = field(default_factory=OceanScores)
mbti_estimate: str = ""
dark_triad: DarkTriad = field(default_factory=DarkTriad)
moral_foundations: MoralFoundations = field(default_factory=MoralFoundations)
confidence: float = 0.0
sample_size: int = 0
@dataclass
class VoiceFingerprint:
vocabulary_tier: str = ""
avg_sentence_length: float = 0.0
exclamation_rate: float = 0.0
question_rate: float = 0.0
emoji_rate: float = 0.0
slang_index: float = 0.0
formality_score: float = 0.5
humor_style: str = ""
signature_phrases: list[str] = field(default_factory=list)
topics_vocabulary: dict[str, float] = field(default_factory=dict)
cadence_pattern: str = ""
@dataclass
class Stance:
position: str = ""
intensity: float = 0.0
last_seen: str = ""
@dataclass
class Influence:
score: float = 0.0
reach: str = "micro"
engagement_rate: float = 0.0
amplification_power: float = 0.0
thought_leadership_domains: list[str] = field(default_factory=list)
@dataclass
class PostingPatterns:
avg_posts_per_day: float = 0.0
peak_hours_utc: list[int] = field(default_factory=list)
weekend_ratio: float = 0.5
reply_ratio: float = 0.0
repost_ratio: float = 0.0
thread_frequency: float = 0.0
controversy_rate: float = 0.0
@dataclass
class Relationships:
allies: list[str] = field(default_factory=list)
rivals: list[str] = field(default_factory=list)
frequent_interactions: list[str] = field(default_factory=list)
mentioned_by_frequently: list[str] = field(default_factory=list)
@dataclass
class ProfileMeta:
data_sources: list[str] = field(default_factory=list)
computation_time_sec: float = 0.0
model_used: str = ""
last_full_rebuild: str = ""
last_incremental: str = ""
@dataclass
class Identity:
bio: str = ""
location: str = ""
verified: bool = False
follower_count: int = 0
following_count: int = 0
account_created: str = ""
@dataclass
class Profile:
schema_version: str = "7.0"
handle: str = ""
platform: str = "x"
display_name: str = ""
created_at: str = ""
last_updated: str = ""
update_count: int = 0
staleness_score: float = 1.0
identity: Identity = field(default_factory=Identity)
psychometrics: Psychometrics = field(default_factory=Psychometrics)
voice_fingerprint: VoiceFingerprint = field(default_factory=VoiceFingerprint)
stances: dict[str, Stance] = field(default_factory=dict)
community_membership: list[str] = field(default_factory=list)
influence: Influence = field(default_factory=Influence)
posting_patterns: PostingPatterns = field(default_factory=PostingPatterns)
relationships: Relationships = field(default_factory=Relationships)
star_thread_ref: str = "star_thread.json"
raw_data_refs: list[str] = field(default_factory=list)
_meta: ProfileMeta = field(default_factory=ProfileMeta)
    def to_dict(self) -> dict:
        """Convert to a plain dict for JSON serialization.

        dataclasses.asdict already recurses through nested dataclasses,
        lists, and dicts, so no extra conversion pass is needed.
        """
        import dataclasses
        return dataclasses.asdict(self)
def to_json(self, indent: int = 2) -> str:
return json.dumps(self.to_dict(), indent=indent)
@dataclass
class StarThread:
handle: str = ""
computed_at: str = ""
based_on_profile_version: str = ""
thread_version: int = 1
core_compression: str = ""
key_drives: list[str] = field(default_factory=list)
predictive_axioms: list[str] = field(default_factory=list)
voice_template: dict = field(default_factory=dict)
anti_slop_markers: list[str] = field(default_factory=list)
_meta: dict = field(default_factory=dict)
@dataclass
class Prediction:
pred_id: str = ""
created_at: str = ""
sim_id: str = ""
handle: str = ""
prediction_type: str = "" # statement, career, alliance, content, network_reaction
prediction_text: str = ""
confidence: float = 0.5
calibrated_confidence: float = 0.5
timeframe_days: int = 30
resolved_at: Optional[str] = None
outcome: Optional[str] = None # correct, partially_correct, incorrect
outcome_evidence: Optional[str] = None
accuracy_score: Optional[float] = None
@dataclass
class WatchConfig:
watch_id: str = ""
handle: str = ""
platform: str = "x"
enabled: bool = True
check_interval_minutes: int = 120
watch_for: list[dict] = field(default_factory=list)
alert_severity_minimum: str = "notable"
created_at: str = ""
@dataclass
class PopulationDefinition:
group_id: str = ""
name: str = ""
description: str = ""
created_at: str = ""
last_updated: str = ""
explicit_members: list[str] = field(default_factory=list)
criteria: dict = field(default_factory=dict)
resolved_members: list[str] = field(default_factory=list)
sampling_strategy: str = "representative"
default_sample_size: int = 12


@@ -0,0 +1,280 @@
"""
REHOBOAM Storage Layer
Directory management, profile I/O, index maintenance.
"""
import json
import shutil
from pathlib import Path
from datetime import datetime
from typing import Optional
BASE_DIR = Path.home() / ".hermes" / "rehoboam"
PROFILES_DIR = BASE_DIR / "profiles"
POPULATIONS_DIR = BASE_DIR / "populations"
SIMULATIONS_DIR = BASE_DIR / "simulations"
MONITORING_DIR = BASE_DIR / "monitoring"
CONFIG_DIR = BASE_DIR / "config"
def init_storage():
"""Create all required directories."""
for d in [PROFILES_DIR, POPULATIONS_DIR, SIMULATIONS_DIR,
MONITORING_DIR, MONITORING_DIR / "alerts", CONFIG_DIR,
BASE_DIR / "db"]:
d.mkdir(parents=True, exist_ok=True)
# Create default configs if they don't exist
staleness_path = CONFIG_DIR / "staleness_policy.json"
if not staleness_path.exists():
staleness_path.write_text(json.dumps({
"thresholds": {
"fresh": {"max_age_hours": 72},
"stale": {"max_age_hours": 336},
"expired": {"max_age_hours": 2160},
"archived": {"max_age_hours": 8760}
},
"per_field_decay": {
"psychometrics": {"half_life_days": 180},
"stances": {"half_life_days": 30},
"posting_patterns": {"half_life_days": 60},
"relationships": {"half_life_days": 45},
"influence": {"half_life_days": 90},
"voice_fingerprint": {"half_life_days": 365}
},
"auto_refresh_on_simulation": True,
"auto_refresh_threshold": "stale"
}, indent=2))
config_path = CONFIG_DIR / "rehoboam.json"
if not config_path.exists():
config_path.write_text(json.dumps({
"version": "7.0",
"default_model": "claude-opus-4-20250514",
"max_thread_age_days": 30,
"monitoring_enabled": False,
"auto_thread": True,
"auto_profile_update": True
}, indent=2))
# Create indexes if they don't exist
for idx_path in [PROFILES_DIR / "_index.json", POPULATIONS_DIR / "_index.json",
SIMULATIONS_DIR / "_index.json"]:
if not idx_path.exists():
idx_path.write_text("{}")
def normalize_handle(handle: str) -> str:
"""Normalize a handle to a filesystem-safe directory name."""
h = handle.lstrip("@").lower().strip()
# Replace characters that are problematic in filenames
return h.replace("/", "_").replace("\\", "_")
# -- Profile I/O --
def get_profile_dir(handle: str) -> Path:
return PROFILES_DIR / normalize_handle(handle)
def profile_exists(handle: str) -> bool:
return (get_profile_dir(handle) / "profile.json").exists()
def load_profile(handle: str) -> Optional[dict]:
path = get_profile_dir(handle) / "profile.json"
if path.exists():
return json.loads(path.read_text())
return None
def save_profile(handle: str, profile: dict, snapshot: bool = True):
"""Save a profile, optionally snapshotting the old one."""
pdir = get_profile_dir(handle)
pdir.mkdir(parents=True, exist_ok=True)
(pdir / "history").mkdir(exist_ok=True)
(pdir / "raw").mkdir(exist_ok=True)
(pdir / "predictions").mkdir(exist_ok=True)
profile_path = pdir / "profile.json"
# Snapshot old profile before overwriting
if snapshot and profile_path.exists():
old = json.loads(profile_path.read_text())
        ts = old.get("last_updated", datetime.utcnow().isoformat()).replace(":", "-")
        snapshot_path = pdir / "history" / f"profile_{ts}.json"  # full timestamp so same-day updates don't overwrite
shutil.copy2(profile_path, snapshot_path)
profile_path.write_text(json.dumps(profile, indent=2))
_update_profile_index(handle, profile)
def _update_profile_index(handle: str, profile: dict):
idx_path = PROFILES_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[normalize_handle(handle)] = {
"platform": profile.get("platform", "x"),
"last_updated": profile.get("last_updated", ""),
"staleness": compute_staleness(profile.get("last_updated", "")),
"has_star_thread": (get_profile_dir(handle) / "star_thread.json").exists(),
"simulation_count": idx.get(normalize_handle(handle), {}).get("simulation_count", 0),
"display_name": profile.get("display_name", "")
}
idx_path.write_text(json.dumps(idx, indent=2))
# -- Star Thread I/O --
def load_star_thread(handle: str) -> Optional[dict]:
path = get_profile_dir(handle) / "star_thread.json"
if path.exists():
return json.loads(path.read_text())
return None
def save_star_thread(handle: str, thread: dict):
path = get_profile_dir(handle) / "star_thread.json"
get_profile_dir(handle).mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(thread, indent=2))
# Update index to reflect thread existence
idx_path = PROFILES_DIR / "_index.json"
if idx_path.exists():
idx = json.loads(idx_path.read_text())
key = normalize_handle(handle)
if key in idx:
idx[key]["has_star_thread"] = True
idx_path.write_text(json.dumps(idx, indent=2))
# -- Staleness --
def compute_staleness(last_updated: str) -> str:
"""Determine staleness level from a timestamp string."""
if not last_updated:
return "expired"
try:
dt = datetime.fromisoformat(last_updated.rstrip("Z"))
except ValueError:
return "expired"
age = datetime.utcnow() - dt
hours = age.total_seconds() / 3600
policy = _load_staleness_policy()
thresholds = policy.get("thresholds", {})
if hours <= thresholds.get("fresh", {}).get("max_age_hours", 72):
return "fresh"
elif hours <= thresholds.get("stale", {}).get("max_age_hours", 336):
return "stale"
elif hours <= thresholds.get("expired", {}).get("max_age_hours", 2160):
return "expired"
else:
return "archived"
def _load_staleness_policy() -> dict:
path = CONFIG_DIR / "staleness_policy.json"
if path.exists():
return json.loads(path.read_text())
return {"thresholds": {"fresh": {"max_age_hours": 72}, "stale": {"max_age_hours": 336},
"expired": {"max_age_hours": 2160}, "archived": {"max_age_hours": 8760}}}
def needs_thread_recompute(handle: str) -> bool:
"""Check if a star thread needs recomputation."""
thread = load_star_thread(handle)
if thread is None:
return True
profile = load_profile(handle)
if profile is None:
return True
# Thread is stale if profile was updated after thread was computed
thread_time = thread.get("based_on_profile_version", "")
profile_time = profile.get("last_updated", "")
if thread_time < profile_time:
return True
# Thread is stale if older than max_thread_age_days
config = json.loads((CONFIG_DIR / "rehoboam.json").read_text()) if (CONFIG_DIR / "rehoboam.json").exists() else {}
max_age = config.get("max_thread_age_days", 30)
try:
computed = datetime.fromisoformat(thread.get("computed_at", "").rstrip("Z"))
if (datetime.utcnow() - computed).days > max_age:
return True
except ValueError:
return True
return False
# -- Simulation I/O --
def save_simulation(sim_id: str, config: dict, output: dict, analytics: dict, audit: dict):
sdir = SIMULATIONS_DIR / sim_id
sdir.mkdir(parents=True, exist_ok=True)
(sdir / "config.json").write_text(json.dumps(config, indent=2))
(sdir / "output.json").write_text(json.dumps(output, indent=2))
(sdir / "analytics.json").write_text(json.dumps(analytics, indent=2))
(sdir / "audit.json").write_text(json.dumps(audit, indent=2))
# Update index
idx_path = SIMULATIONS_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[sim_id] = {
"created_at": config.get("created_at", datetime.utcnow().isoformat() + "Z"),
"scenario": config.get("scenario", ""),
"participant_count": len(config.get("participants", [])),
}
idx_path.write_text(json.dumps(idx, indent=2))
# -- Population I/O --
def save_population(group_id: str, definition: dict, aggregate: dict = None):
pdir = POPULATIONS_DIR / group_id
pdir.mkdir(parents=True, exist_ok=True)
(pdir / "history").mkdir(exist_ok=True)
(pdir / "definition.json").write_text(json.dumps(definition, indent=2))
if aggregate:
(pdir / "aggregate.json").write_text(json.dumps(aggregate, indent=2))
idx_path = POPULATIONS_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[group_id] = {
"name": definition.get("name", group_id),
"member_count": len(definition.get("resolved_members", definition.get("explicit_members", []))),
"last_updated": definition.get("last_updated", "")
}
idx_path.write_text(json.dumps(idx, indent=2))
def load_population(group_id: str) -> Optional[dict]:
path = POPULATIONS_DIR / group_id / "definition.json"
if path.exists():
return json.loads(path.read_text())
return None
# -- Listing --
def list_profiles() -> dict:
idx_path = PROFILES_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
def list_populations() -> dict:
idx_path = POPULATIONS_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
def list_simulations() -> dict:
idx_path = SIMULATIONS_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
if __name__ == "__main__":
init_storage()
print(f"Storage initialized at {BASE_DIR}")


@@ -0,0 +1,139 @@
#!/usr/bin/env python3
"""
Facebook Page/Profile Data Extractor
Uses multiple techniques to extract public Facebook data without authentication:
1. Googlebot UA for OG meta tags (name, description, likes, talking_about, bio, og:image)
2. Graph API /picture endpoint for profile photos (pages only)
3. Page Plugin embed for follower counts and page IDs
"""
import subprocess
import json
import re
import html
import sys
GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
def curl_get(url, ua=None):
"""Fetch URL with curl"""
cmd = ['curl', '-s', '-L', '--max-time', '15']
if ua:
cmd += ['-H', f'User-Agent: {ua}']
cmd.append(url)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=20)
return result.stdout
def extract_og_data(username):
"""Extract OG meta tags using Googlebot UA"""
content = curl_get(f'https://www.facebook.com/{username}', ua=GOOGLEBOT_UA)
data = {}
# Extract OG tags
og_title = re.search(r'og:title"\s*content="([^"]*)"', content)
if og_title:
data['name'] = html.unescape(og_title.group(1))
og_desc = re.search(r'og:description"\s*content="([^"]*)"', content)
if og_desc:
desc = html.unescape(og_desc.group(1))
data['raw_description'] = desc
# Parse likes count
likes_match = re.search(r'([\d,]+)\s+likes?', desc)
if likes_match:
data['likes'] = likes_match.group(1)
# Parse talking about
talking_match = re.search(r'([\d,]+)\s+talking about this', desc)
if talking_match:
data['talking_about'] = talking_match.group(1)
# Extract bio (text after the "talking about this." part)
bio_match = re.search(r'talking about this\.\s*(.+)', desc)
if bio_match:
data['bio'] = bio_match.group(1)
og_image = re.search(r'og:image"\s*content="([^"]*)"', content)
if og_image:
data['og_image'] = html.unescape(og_image.group(1))
return data
def extract_plugin_data(username):
"""Extract data from Page Plugin embed"""
content = curl_get(f'https://www.facebook.com/plugins/page.php?href=https://www.facebook.com/{username}&tabs=timeline&width=500&height=600')
data = {}
# Page name from title attribute
name_match = re.search(r'class="_1drp _5lv6" title="([^"]*)"', content)
if name_match:
data['plugin_name'] = html.unescape(name_match.group(1))
# Follower count
followers_match = re.search(r'([\d,]+)\s+followers', content)
if followers_match:
data['followers'] = followers_match.group(1)
# Page ID
pageid_match = re.search(r'"pageID":"(\d+)"', content)
if pageid_match:
data['page_id'] = pageid_match.group(1)
return data
def extract_profile_picture(username):
"""Get profile picture via Graph API"""
content = curl_get(f'https://graph.facebook.com/v19.0/{username}/picture?redirect=false&width=400&height=400')
try:
d = json.loads(content)
if 'data' in d and not d['data'].get('is_silhouette', True):
return d['data']['url']
    except (ValueError, KeyError):
pass
return None
def get_facebook_data(username):
"""Combine all extraction methods"""
result = {'username': username}
# Method 1: OG tags (best for bio, likes, talking_about)
og = extract_og_data(username)
result.update(og)
# Method 2: Plugin (best for followers, page_id)
plugin = extract_plugin_data(username)
result.update(plugin)
# Method 3: Graph API picture (pages only)
pic = extract_profile_picture(username)
if pic:
result['profile_picture'] = pic
# Also try by page_id for picture if username didn't work
if not pic and 'page_id' in result:
pic2 = extract_profile_picture(result['page_id'])
if pic2:
result['profile_picture'] = pic2
return result
if __name__ == '__main__':
targets = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'NVIDIA', 'Meta', 'CocaCola']
for target in targets:
print(f"{'='*60}")
print(f"Facebook Profile: {target}")
print(f"{'='*60}")
data = get_facebook_data(target)
for k, v in data.items():
if k == 'raw_description':
continue # Skip raw, we show parsed fields
val = str(v)
if len(val) > 120:
val = val[:120] + '...'
print(f" {k}: {val}")
print()


@@ -0,0 +1,595 @@
"""
Hermes Simulator — Intelligence Gathering Pipeline v2
Full-spectrum OSINT research engine for personality modeling.
Searches text, extracts content, browses live pages, analyzes
images with vision, and cross-references across platforms.
Run via execute_code. The agent adapts searches based on findings.
"""
from hermes_tools import web_search, web_extract, terminal
import json
# ═══════════════════════════════════════════════════════════════
# CONFIGURATION
# ═══════════════════════════════════════════════════════════════
AGGREGATOR_SITES = [
"buttondown.com/ainews",
"news.smol.ai",
"techmeme.com",
"latent.space",
]
# Verified working fallback data sources (tested April 2026)
# Priority order: X API > nitter.cz > ThreadReaderApp > GitHub > Reddit > HN
FALLBACK_SOURCES = {
"nitter": "https://nitter.cz/{handle}", # web_extract — full timeline
"threadreader": "https://threadreaderapp.com/user/{handle}", # web_extract — historical threads
"github_profile": "https://api.github.com/users/{handle}", # curl — profile + README
"github_events": "https://api.github.com/users/{handle}/events", # curl — recent activity
"reddit_user": "https://www.reddit.com/user/{handle}.json", # curl w/ User-Agent
"reddit_comments": "https://www.reddit.com/user/{handle}/comments.json",
"hn_search": "https://hn.algolia.com/api/v1/search?query={handle}&tags=comment",
}
# CONFIRMED BLOCKED (don't waste calls on these):
# - LinkedIn (web_extract blocked, browser auth wall)
# - Instagram viewers (imginn, picuki, dumpoir, gramhir — all 403)
# - Most nitter instances (dead or 403, ONLY nitter.cz works via web_extract)
# - Wayback Machine for tweets (sparse, no JS content)
# - Google Cache of Twitter (empty)
# - Archive.today (429 + CAPTCHA)
# - Twitter Syndication API (rate limited)
AI_SUBREDDITS = [
"LocalLLaMA", "MachineLearning", "singularity",
"ChatGPT", "ClaudeAI", "OpenAI", "StableDiffusion",
]
PLATFORMS = ["twitter", "instagram", "linkedin", "github", "reddit", "youtube"]
# ═══════════════════════════════════════════════════════════════
# HELPER: safe web_search with validation
# ═══════════════════════════════════════════════════════════════
def _safe_web_search(query: str, limit: int = 5) -> list:
"""Run web_search and return results list, with validation."""
r = web_search(query, limit=limit)
if not isinstance(r, dict) or "data" not in r:
print(f" [WARNING] web_search returned no 'data' key for query: {query[:80]}")
return []
data = r.get("data", {})
if not isinstance(data, dict):
return []
return data.get("web", []) or []
# ═══════════════════════════════════════════════════════════════
# CORE SEARCH FUNCTIONS
# ═══════════════════════════════════════════════════════════════
def search_identity(handle: str) -> dict:
"""Establish who they are across the internet."""
results = {}
results["twitter_identity"] = _safe_web_search(f"@{handle} twitter bio role company", limit=5)
results["general_identity"] = _safe_web_search(f"{handle} known for", limit=5)
return results
def search_voice(handle: str) -> dict:
"""How do they actually talk/write."""
results = {}
results["takes"] = _safe_web_search(f"{handle} twitter hot takes opinions", limit=5)
for agg in AGGREGATOR_SITES[:2]:
hits = _safe_web_search(f"site:{agg} {handle}", limit=3)
if hits:
# Use full domain as key, not split('.')[0]
results[f"agg_{agg}"] = hits
return results
def search_positions(handle: str, topics: list = None, domain: str = None) -> dict:
"""What are their known positions."""
results = {}
if topics:
for topic in topics[:3]:
results[f"topic_{topic}"] = _safe_web_search(f"{handle} {topic} opinion take", limit=5)
# Build controversy query — only add domain keywords if specified
controversy_query = f"{handle} debate disagree controversial"
if domain:
controversy_query += f" {domain}"
results["controversies"] = _safe_web_search(controversy_query, limit=5)
return results
def search_longform(handle: str, real_name: str = None, domain: str = None) -> dict:
"""Blogs, interviews, essays."""
results = {}
name = real_name or handle
blog_query = f"{name} blog substack essay"
interview_query = f"{name} interview podcast"
if domain:
blog_query += f" {domain}"
interview_query += f" {domain}"
results["blogs"] = _safe_web_search(blog_query, limit=5)
results["interviews"] = _safe_web_search(interview_query, limit=5)
return results
# ═══════════════════════════════════════════════════════════════
# CROSS-PLATFORM DISCOVERY
# ═══════════════════════════════════════════════════════════════
def discover_platforms(handle: str, real_name: str = None) -> dict:
"""Find someone across all platforms."""
name = real_name or handle
results = {}
# Instagram
results["instagram"] = _safe_web_search(f"{name} instagram OR site:instagram.com/{handle}", limit=5)
# LinkedIn
results["linkedin"] = _safe_web_search(f"{name} linkedin OR site:linkedin.com/in", limit=5)
# Reddit
results["reddit"] = _safe_web_search(f"{name} reddit account OR site:reddit.com/user", limit=5)
# GitHub
results["github"] = _safe_web_search(f"{handle} github OR site:github.com/{handle}", limit=5)
# YouTube
results["youtube"] = _safe_web_search(f"{name} youtube channel OR talk OR interview", limit=5)
# Personal site
results["personal_site"] = _safe_web_search(f"{name} personal website blog about", limit=5)
# Hacker News
results["hackernews"] = _safe_web_search(f"site:news.ycombinator.com {handle} OR {name}", limit=3)
return results
def discover_instagram(handle: str = None, real_name: str = None) -> dict:
"""Focused Instagram discovery."""
results = {}
name = real_name or handle
# Try to find their IG handle
results["ig_search"] = _safe_web_search(f"{name} instagram profile", limit=5)
# If we have a candidate IG URL, try to extract
ig_urls = []
for item in results.get("ig_search", []):
if not isinstance(item, dict):
continue
url = item.get("url", "")
if "instagram.com/" in url and "/p/" not in url:
ig_urls.append(url)
if ig_urls:
# Try to extract IG profile page
r = web_extract(urls=ig_urls[:1])
results["ig_profile"] = r.get("results", [])
return results
# ═══════════════════════════════════════════════════════════════
# VISUAL INTELLIGENCE
# ═══════════════════════════════════════════════════════════════
# NOTE: These functions use browser_* and vision_analyze which are
# NOT available in execute_code. They are called DIRECTLY by the
# agent after the execute_code research phase.
#
# The agent should:
# 1. Run this script via execute_code for text-based research
# 2. Then use browser/vision tools directly for visual research
#
# Visual research tasks for the agent:
#
# INSTAGRAM VISUAL:
# browser_navigate("https://www.instagram.com/{ig_handle}/")
# browser_vision(question="Describe this Instagram profile: bio, pic, grid, aesthetic, follower count")
# browser_get_images() # collect image URLs
# vision_analyze(image_url="{url}", question="Describe: setting, people, mood, style")
#
# PROFILE PIC ANALYSIS:
# vision_analyze(image_url="{pic_url}", question="Describe: appearance, clothing, setting, expression, professional vs casual")
#
# REVERSE IMAGE SEARCH (Yandex):
# # Upload to catbox if behind auth:
# terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{path}' https://catbox.moe/user/api.php")
# browser_navigate(f"https://yandex.com/images/search?rpt=imageview&url={encoded_url}")
#
# PAGE SCREENSHOT ANALYSIS:
# browser_vision(question="Read all text, usernames, post content, dates, engagement numbers")
# ═══════════════════════════════════════════════════════════════
# INTERACTION MAPPING
# ═══════════════════════════════════════════════════════════════
def search_interactions(handle: str, other_handles: list = None) -> dict:
"""How they interact with other simulation targets."""
results = {}
if other_handles:
for other in other_handles[:4]:
hits = _safe_web_search(f"{handle} {other} twitter interaction debate reply", limit=3)
if hits:
results[f"with_{other}"] = hits
return results
def search_social_graph(handle: str) -> dict:
"""Who do they interact with most? Allies and rivals."""
results = {}
results["frequent_interactions"] = _safe_web_search(f"@{handle} twitter reply thread conversation with", limit=5)
results["conflicts"] = _safe_web_search(f"@{handle} disagree argue beef ratio", limit=5)
results["allies"] = _safe_web_search(f"@{handle} agree support endorse recommend", limit=5)
return results
# ═══════════════════════════════════════════════════════════════
# DEEP EXTRACTION
# ═══════════════════════════════════════════════════════════════
def extract_content(urls: list) -> list:
"""Pull full content from high-value URLs."""
if not urls:
return []
r = web_extract(urls=urls[:3])
return r.get("results", [])
def extract_best_urls(findings: dict, max_urls: int = 5) -> list:
"""Find the most promising URLs in research findings for deep extraction."""
seen_urls = set() # URL deduplication
priority_domains = [
"substack.com", "medium.com", "blog", "essay",
"interview", "podcast", "youtube.com", "arxiv.org",
]
def score_url(url, desc):
score = 0
for domain in priority_domains:
if domain in url.lower() or domain in desc.lower():
score += 2
if any(w in desc.lower() for w in ["interview", "spoke", "told", "said", "wrote"]):
score += 1
return score
candidates = []
def collect(obj):
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
url = item.get("url") or ""
desc = item.get("description") or item.get("text") or ""
if url and url not in seen_urls and not any(x in url for x in ["x.com", "twitter.com", "instagram.com"]):
seen_urls.add(url)
candidates.append((score_url(url, desc), url))
elif isinstance(obj, dict):
for v in obj.values():
collect(v)
collect(findings)
candidates.sort(key=lambda x: -x[0])
return [url for _, url in candidates[:max_urls]]
# ═══════════════════════════════════════════════════════════════
# MAIN PIPELINE
# ═══════════════════════════════════════════════════════════════
def research_person(handle: str, fidelity: int = 70,
topics: list = None,
other_handles: list = None,
real_name: str = None,
domain: str = None) -> dict:
"""
Full research pipeline for one person.
Returns dict with all findings organized by category.
Args:
handle: Twitter/X handle (without @)
fidelity: Research depth 0-100
topics: Specific topics to research
other_handles: Other people to check interactions with
real_name: Real name if different from handle
domain: Domain context (e.g., 'AI', 'politics', 'gaming').
When None, no domain keywords are added to searches.
When set, adds relevant domain keywords.
"""
print(f"\n{'='*60}")
print(f" RESEARCHING: @{handle} | Fidelity: {fidelity}%")
if domain:
print(f" Domain: {domain}")
print(f"{'='*60}")
findings = {"handle": handle, "fidelity": fidelity, "visual_tasks": []}
# ─── Phase 1: Identity (always) ───
print(f"\n [IDENTITY] Who are they...")
findings["identity"] = search_identity(handle)
if fidelity <= 30:
if topics:
findings["quick_topic"] = _safe_web_search(f"{handle} {topics[0]}", limit=3)
return findings
# ─── Phase 2: Voice (fidelity 31+) ───
print(f"\n [VOICE] How do they talk...")
findings["voice"] = search_voice(handle)
# ─── Phase 3: Positions (fidelity 31+) ───
print(f"\n [POSITIONS] What do they believe...")
findings["positions"] = search_positions(handle, topics, domain=domain)
if fidelity <= 50:
return findings
# ─── Phase 4: Cross-platform (fidelity 51+) ───
print(f"\n [PLATFORMS] Finding them everywhere...")
findings["platforms"] = discover_platforms(handle, real_name)
if fidelity <= 70:
return findings
# ─── Phase 5: Longform (fidelity 71+) ───
print(f"\n [LONGFORM] Blogs, interviews, essays...")
findings["longform"] = search_longform(handle, real_name, domain=domain)
# ─── Phase 6: Social graph (fidelity 71+) ───
print(f"\n [SOCIAL GRAPH] Who do they interact with...")
findings["social_graph"] = search_social_graph(handle)
# ─── Phase 7: Interaction mapping (fidelity 71+) ───
if other_handles:
print(f"\n [INTERACTIONS] With other targets: {other_handles}...")
findings["interactions"] = search_interactions(handle, other_handles)
# ─── Phase 8: Instagram deep dive (fidelity 80+) ───
if fidelity >= 80:
print(f"\n [INSTAGRAM] Visual identity...")
findings["instagram"] = discover_instagram(handle, real_name)
# Queue visual tasks for the agent to do after execute_code
findings["visual_tasks"].append({
"type": "instagram_profile",
"instruction": f"browser_navigate to Instagram profile, use browser_vision to analyze",
"handle": handle,
})
# ─── Phase 9: Deep extraction (fidelity 85+) ───
if fidelity >= 85:
print(f"\n [DEEP EXTRACT] Pulling longform content...")
best_urls = extract_best_urls(findings, max_urls=4)
if best_urls:
print(f" Extracting {len(best_urls)} URLs: {best_urls}")
findings["deep_extracts"] = extract_content(best_urls)
# ─── Phase 10: Profile pic analysis (fidelity 90+) ───
if fidelity >= 90:
findings["visual_tasks"].append({
"type": "profile_pic_analysis",
"instruction": "Find and analyze profile pictures across platforms with vision_analyze",
"handle": handle,
})
findings["visual_tasks"].append({
"type": "reverse_image_search",
"instruction": "Reverse image search profile pic via Yandex to find alt accounts",
"handle": handle,
})
return findings
def research_all(handles: list, fidelity: int = 70,
topics: list = None, domain: str = None) -> dict:
"""Research all simulation targets."""
all_findings = {}
for handle in handles:
clean = handle.lstrip("@")
others = [h.lstrip("@") for h in handles if h.lstrip("@") != clean]
findings = research_person(
handle=clean,
fidelity=fidelity,
topics=topics,
other_handles=others,
domain=domain,
)
all_findings[clean] = findings
return all_findings
# ═══════════════════════════════════════════════════════════════
# REPORTING
# ═══════════════════════════════════════════════════════════════
def count_data_points(obj) -> int:
"""Count total search result items in findings (only meaningful items with >50 char text)."""
total = 0
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
text = item.get("description") or item.get("text") or ""
if len(text) > 50:
total += 1
else:
# Still count non-dict items or items without text fields
total += 1
else:
total += 1
elif isinstance(obj, dict):
for k, v in obj.items():
# Skip metadata keys
if k in ("handle", "fidelity", "visual_tasks"):
continue
total += count_data_points(v)
return total
def count_quality_data_points(obj) -> int:
"""Count search result items with substantial text (description/text > 50 chars)."""
total = 0
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
text = item.get("description") or item.get("text") or ""
if len(text) > 50:
total += 1
elif isinstance(obj, dict):
for k, v in obj.items():
if k in ("handle", "fidelity", "visual_tasks"):
continue
total += count_quality_data_points(v)
return total
def summarize_findings(findings: dict) -> str:
"""Compact summary of what we found."""
handle = findings.get("handle", "unknown")
fidelity = findings.get("fidelity", 0)
total = count_data_points(findings)
quality = count_quality_data_points(findings)
visual_tasks = findings.get("visual_tasks", [])
lines = [
f"\n{''*60}",
f" @{handle} | Fidelity: {fidelity}% | Data points: {total} ({quality} quality)",
f"{''*60}",
]
# Identity snippets
identity = findings.get("identity", {})
for key in ["twitter_identity", "general_identity"]:
for item in identity.get(key, [])[:2]:
if not isinstance(item, dict):
continue
desc = (item.get("description") or "")[:180]
if desc:
lines.append(f" [{key.upper()}] {desc}")
# Platform discovery results
platforms = findings.get("platforms", {})
found_platforms = []
for platform, items in platforms.items():
if isinstance(items, list) and len(items) > 0:
found_platforms.append(platform)
if found_platforms:
lines.append(f" [PLATFORMS FOUND] {', '.join(found_platforms)}")
# Voice samples from aggregators
voice = findings.get("voice", {})
for key, items in voice.items():
if isinstance(items, list):
for item in items[:1]:
if not isinstance(item, dict):
continue
desc = (item.get("description") or "")[:180]
if desc and handle.lower() in desc.lower():
lines.append(f" [VOICE] {desc}")
# Deep extracts
for extract in findings.get("deep_extracts", [])[:2]:
if not isinstance(extract, dict):
continue
title = extract.get("title", "untitled")
content = (extract.get("content") or "")[:200]
if content:
lines.append(f" [LONGFORM: {title}] {content}...")
# Pending visual tasks
if visual_tasks:
lines.append(f" [VISUAL TASKS QUEUED] {len(visual_tasks)} tasks for agent to execute:")
for task in visual_tasks:
lines.append(f"{task.get('type', '?')}: {task.get('instruction', '?')[:80]}")
# Confidence estimate — based on quality data points
if quality >= 30:
conf = "HIGH"
elif quality >= 15:
conf = "MEDIUM"
elif quality >= 5:
conf = "LOW"
else:
conf = "INSUFFICIENT"
lines.append(f"\n CONFIDENCE: {conf} ({quality} quality data points, {total} total)")
return "\n".join(lines)
def report_visual_tasks(all_findings: dict) -> str:
"""Collect all visual tasks across all targets for agent to execute."""
lines = ["\n" + ""*60, " VISUAL INTELLIGENCE TASKS (agent must execute directly)", ""*60]
any_tasks = False
for handle, findings in all_findings.items():
for task in findings.get("visual_tasks", []):
any_tasks = True
lines.append(f"\n @{handle}{task.get('type', '?')}:")
lines.append(f" {task.get('instruction', '?')}")
if not any_tasks:
lines.append(" No visual tasks queued (fidelity < 80)")
return "\n".join(lines)
# ═══════════════════════════════════════════════════════════════
# CHECK AVAILABLE TOOLS
# ═══════════════════════════════════════════════════════════════
def check_x_cli() -> bool:
"""Check if x-cli is available."""
try:
r = terminal("which x-cli 2>/dev/null && echo 'FOUND' || echo 'NOT_FOUND'")
return "FOUND" in r.get("output", "")
    except Exception:
        return False
# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════
if __name__ == "__main__":
# ── CONFIGURE THESE ──
HANDLES = ["teknium1", "basedjensen"]
FIDELITY = 80
TOPICS = ["open source AI", "compute scaling"]
DOMAIN = None # Set to 'AI', 'politics', etc. to add domain keywords
# ─────────────────────
has_xcli = check_x_cli()
print(f"x-cli available: {has_xcli}")
print(f"Targets: {HANDLES}")
print(f"Fidelity: {FIDELITY}%")
print(f"Topics: {TOPICS}")
print(f"Domain: {DOMAIN}")
results = research_all(HANDLES, fidelity=FIDELITY, topics=TOPICS, domain=DOMAIN)
for handle, findings in results.items():
print(summarize_findings(findings))
print(report_visual_tasks(results))
print("\n\nResearch phase complete. Agent should now:")
print("1. Execute any queued visual tasks (browser/vision)")
print("2. Compile dossiers from all findings")
print("3. Run simulation")


@@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Threads (Meta) Profile & Post Extractor
========================================
Extracts profile data and post content from Threads using:
1. OG meta tags from HTML (no auth required for profiles and public posts)
2. WebFinger for ActivityPub discovery
3. Google-indexed post URLs for recent post discovery
METHODS THAT WORK:
- Profile pages at threads.net/@{user} have OG tags with:
display_name, username, follower_count, thread_count, bio, profile_pic
- Individual post pages have OG tags with:
full post text, author info, profile pic
- WebFinger at /.well-known/webfinger gives ActivityPub user IDs
- Post URLs must be known (discoverable via web search)
METHODS THAT DON'T WORK (as of April 2026):
- Threads Official API (graph.threads.net) requires OAuth token
- ActivityPub /ap/users/ endpoints return 404 for most users
- No public post listing endpoint exists
"""
import re
import json
import html
import subprocess
import sys
def curl_fetch(url, extra_headers=None, timeout=15):
"""Fetch URL using curl (more reliable than urllib for Threads)."""
cmd = ['curl', '-s', '-L', '--max-time', str(timeout)]
if extra_headers:
for k, v in extra_headers.items():
cmd.extend(['-H', f'{k}: {v}'])
cmd.append(url)
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout+5)
return result.stdout
    except (subprocess.TimeoutExpired, OSError):
return None
def extract_og_tags(html_content):
"""Extract OpenGraph, meta description, and Twitter tags from HTML."""
data = {}
if not html_content:
return data
for m in re.finditer(r'property="(og:[^"]+)"\s+content="([^"]*)"', html_content):
key = m.group(1)
val = html.unescape(m.group(2))
if key not in data:
data[key] = val
for m in re.finditer(r'name="description"\s+content="([^"]*)"', html_content):
data['description'] = html.unescape(m.group(1))
break
for m in re.finditer(r'name="(twitter:[^"]+)"\s+content="([^"]*)"', html_content):
key = m.group(1)
val = html.unescape(m.group(2))
if key not in data:
data[key] = val
return data
def parse_profile_description(desc):
"""Parse '5.5M Followers • 142 Threads • Bio. See the latest...' format."""
result = {}
if not desc:
return result
parts = desc.split(' \u2022 ') # Split on bullet •
for part in parts:
part = part.strip()
if 'Follower' in part:
result['followers'] = part.split(' Follower')[0].strip()
elif part.endswith('Threads') or part.endswith('Thread'):
result['thread_count'] = part.split(' Thread')[0].strip()
else:
bio = re.sub(r'\s*See the latest conversations.*$', '', part)
if bio:
result['bio'] = bio
return result
def parse_profile_title(title):
"""Parse 'Display Name (@user) • Threads, Say more' format."""
result = {}
if not title:
return result
m = re.match(r'^(.+?)\s*\(@(\w+)\)', title)
if m:
result['display_name'] = m.group(1).strip()
result['username'] = m.group(2)
return result
def get_threads_profile(username):
"""
Get Threads profile data via OG meta tags.
Returns dict with: username, display_name, bio, followers, thread_count,
profile_picture_url, url
"""
username = username.lstrip('@')
url = f'https://www.threads.net/@{username}'
content = curl_fetch(url)
tags = extract_og_tags(content)
if not tags or 'og:title' not in tags:
return {'error': 'Failed to fetch or parse profile', 'username': username}
title = tags.get('og:title', '')
if title.startswith('Threads') and 'Log in' in title:
return {'error': 'Profile requires login or not found', 'username': username}
result = {
'platform': 'threads',
'url': url,
}
result.update(parse_profile_title(title))
result.update(parse_profile_description(tags.get('og:description', '')))
if 'og:image' in tags:
result['profile_picture_url'] = tags['og:image']
return result
def get_threads_webfinger(username):
"""Get WebFinger data (ActivityPub discovery) for a Threads user."""
username = username.lstrip('@')
url = f'https://www.threads.net/.well-known/webfinger?resource=acct:{username}@threads.net'
content = curl_fetch(url, {'Accept': 'application/json'})
if not content:
return None
try:
data = json.loads(content)
        if 'error' in data or ('success' in data and not data['success']):
return None
result = {'subject': data.get('subject', '')}
for link in data.get('links', []):
if link.get('type') == 'application/activity+json':
result['activitypub_url'] = link['href']
elif link.get('rel') == 'http://webfinger.net/rel/profile-page':
result['profile_url'] = link['href']
return result
    except (ValueError, KeyError):
return None
def get_thread_post(post_url):
"""
Get content of a specific Threads post via OG tags.
Returns: text, author, image_url
"""
content = curl_fetch(post_url)
tags = extract_og_tags(content)
if not tags or 'og:title' not in tags:
return {'error': 'Failed to fetch post'}
title = tags.get('og:title', '')
if 'Log in' in title:
return {'error': 'Post requires login or not found'}
result = {'url': post_url}
if 'og:description' in tags:
result['text'] = tags['og:description']
elif 'description' in tags:
result['text'] = tags['description']
if 'og:title' in tags:
# Parse "Display Name (@username) on Threads"
m = re.match(r'^(.+?)\s*\(@(\w+)\)\s+on\s+Threads', title)
if m:
result['author_name'] = m.group(1).strip()
result['author_username'] = m.group(2)
if 'og:image' in tags:
result['image_url'] = tags['og:image']
return result
def get_threads_full(username):
"""Get complete profile data combining all methods."""
profile = get_threads_profile(username)
wf = get_threads_webfinger(username)
if wf:
profile['webfinger'] = wf
return profile
# ===== TEST =====
if __name__ == '__main__':
test_users = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'nvidia', 'mosseri']
for user in test_users:
print(f"\n{'='*60}")
print(f" THREADS PROFILE: @{user}")
print(f"{'='*60}")
data = get_threads_full(user)
for k, v in sorted(data.items()):
if k == 'profile_picture_url':
print(f" {k}: {str(v)[:80]}...")
elif k == 'webfinger':
print(f" webfinger:")
for wk, wv in v.items():
print(f" {wk}: {wv}")
else:
print(f" {k}: {v}")
# Test posts
post_urls = [
'https://www.threads.net/@zuck/post/DEkvXzbyDS9',
]
print(f"\n{'='*60}")
print(f" THREADS POSTS")
print(f"{'='*60}")
for purl in post_urls:
print(f"\n URL: {purl}")
post = get_thread_post(purl)
for k, v in post.items():
if k in ('image_url',):
print(f" {k}: {str(v)[:80]}...")
elif k == 'text':
print(f" {k}: {v[:300]}{'...' if len(v) > 300 else ''}")
else:
print(f" {k}: {v}")

View File

@@ -0,0 +1,305 @@
"""
TikTok Profile & Video Data Scraper
====================================
WORKING methods to get full TikTok profile data and video content.
Tested and verified April 2026.
METHODS SUMMARY:
================
METHOD 1 (BEST): HTML SSR Scraping - Parse __UNIVERSAL_DATA_FOR_REHYDRATION__
- Gets: FULL profile (bio, stats, follower/following/heart/video counts)
- Works: YES - Reliable, no auth needed, just curl + parse
- Limitation: No video list on profile page (videos load client-side)
METHOD 2: oEmbed API - https://www.tiktok.com/oembed?url=...
- Gets: Video title/caption, author, thumbnail URL
- Works: YES - No auth, no rate limit issues
- Limitation: Need video IDs first; no engagement stats
METHOD 3: tikwm.com API - https://www.tikwm.com/api/
- Gets: Full user info + individual video stats (plays, likes, comments, shares)
- User info: https://www.tikwm.com/api/user/info?unique_id={username}
- Video info: https://www.tikwm.com/api/?url={tiktok_video_url}
- Works: YES for user info and single videos
- Limitation: Posts list endpoint returns 403 (rate-limited)
METHOD 4: Video ID Discovery via Search Engines
- Use web_search("site:tiktok.com/@{username}/video") to find video IDs
- Then use oEmbed or tikwm or HTML scraping per video
- Works: YES - Gets ~5 recent video IDs per search
- Glue code: see the discover_video_ids() sketch after get_full_tiktok_profile()
METHOD 5: SocialBlade via web_extract
- URL: https://socialblade.com/tiktok/user/{username}
- Gets: Followers, following, likes, videos, growth trends, rankings
- Works: YES via web_extract tool
METHOD 6: Individual Video HTML Scraping
- Fetch https://www.tiktok.com/@{user}/video/{id}
- Parse __UNIVERSAL_DATA webapp.video-detail -> itemInfo.itemStruct
- Gets: FULL video data (caption, stats, music, hashtags, duration)
- Works: YES - Most complete per-video data
NOT WORKING:
- TikTok /api/user/detail/ endpoint -> returns empty (needs signed params)
- TikTok /api/post/item_list/ -> returns empty (needs x-bogus/msToken)
- tikwm.com /api/user/posts -> 403 forbidden
"""
import re
import json
import subprocess
import urllib.parse
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
def fetch_url(url, headers=None):
"""Fetch URL via curl and return content."""
cmd = ['curl', '-s', '-L', '-m', '30', url,
'-H', f'User-Agent: {USER_AGENT}',
'-H', 'Accept-Language: en-US,en;q=0.9']
if headers:
for k, v in headers.items():
cmd.extend(['-H', f'{k}: {v}'])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=35)
return result.stdout
def method1_html_profile(username):
"""
METHOD 1: Scrape TikTok profile HTML and parse SSR JSON data.
Returns full profile with stats.
"""
url = f'https://www.tiktok.com/@{username}'
html = fetch_url(url)
    m = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
        html, re.DOTALL
    )
    if not m:
        return None
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None
scope = data.get('__DEFAULT_SCOPE__', {})
user_detail = scope.get('webapp.user-detail', {})
user_info = user_detail.get('userInfo', {})
if not user_info:
return None
user = user_info.get('user', {})
stats = user_info.get('statsV2', user_info.get('stats', {}))
return {
'id': user.get('id'),
'username': user.get('uniqueId'),
'nickname': user.get('nickname'),
'bio': user.get('signature'),
'verified': user.get('verified'),
'private': user.get('privateAccount'),
'secUid': user.get('secUid'),
'avatarLarger': user.get('avatarLarger'),
'bioLink': user.get('bioLink', {}),
'createTime': user.get('createTime'),
'language': user.get('language'),
'stats': {
'followers': int(stats.get('followerCount', 0)),
'following': int(stats.get('followingCount', 0)),
'hearts': int(stats.get('heartCount', 0)),
'videos': int(stats.get('videoCount', 0)),
'diggs': int(stats.get('diggCount', 0)),
'friends': int(stats.get('friendCount', 0)),
}
}
def method2_oembed_video(username, video_id):
"""
METHOD 2: Get video caption/title via oEmbed.
No auth needed. Returns caption, author, thumbnail.
"""
url = f'https://www.tiktok.com/oembed?url=https://www.tiktok.com/@{username}/video/{video_id}'
content = fetch_url(url)
try:
data = json.loads(content)
return {
'video_id': video_id,
'title': data.get('title', ''),
'author_name': data.get('author_name'),
'author_url': data.get('author_url'),
'thumbnail_url': data.get('thumbnail_url'),
'thumbnail_width': data.get('thumbnail_width'),
'thumbnail_height': data.get('thumbnail_height'),
}
except json.JSONDecodeError:
return None
def method3_tikwm_user(username):
"""
METHOD 3a: Get user info via tikwm.com API.
"""
url = f'https://www.tikwm.com/api/user/info?unique_id={username}'
content = fetch_url(url)
try:
data = json.loads(content)
if data.get('code') == 0:
return data['data']
except json.JSONDecodeError:
pass
return None
def method3_tikwm_video(video_url):
"""
METHOD 3b: Get video details via tikwm.com API.
Returns: title, play_count, digg_count, comment_count, share_count, duration, download URLs
"""
url = f'https://www.tikwm.com/api/?url={urllib.parse.quote(video_url)}'
content = fetch_url(url)
try:
data = json.loads(content)
if data.get('code') == 0:
v = data['data']
return {
'video_id': v.get('id'),
'title': v.get('title'),
'duration': v.get('duration'),
'play_count': v.get('play_count'),
'likes': v.get('digg_count'),
'comments': v.get('comment_count'),
'shares': v.get('share_count'),
'author': v.get('author', {}).get('unique_id'),
'music_title': v.get('music_info', {}).get('title') if v.get('music_info') else None,
'cover_url': v.get('origin_cover') or v.get('cover'),
'play_url': v.get('play'), # direct video URL
}
except json.JSONDecodeError:
pass
return None
def method6_html_video(username, video_id):
"""
METHOD 6: Scrape individual video page HTML for full data.
Gets: caption, full stats, music, hashtags, create time.
"""
url = f'https://www.tiktok.com/@{username}/video/{video_id}'
html = fetch_url(url)
    m = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
        html, re.DOTALL
    )
    if not m:
        return None
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None
scope = data.get('__DEFAULT_SCOPE__', {})
vd = scope.get('webapp.video-detail', {})
item = vd.get('itemInfo', {}).get('itemStruct', {})
if not item:
return None
stats = item.get('statsV2', item.get('stats', {}))
music = item.get('music', {})
challenges = item.get('challenges', [])
return {
'video_id': item.get('id'),
'description': item.get('desc'),
'createTime': item.get('createTime'),
'duration': item.get('video', {}).get('duration'),
'stats': {
'plays': int(stats.get('playCount', 0)),
'likes': int(stats.get('diggCount', 0)),
'comments': int(stats.get('commentCount', 0)),
'shares': int(stats.get('shareCount', 0)),
'saves': int(stats.get('collectCount', 0)),
},
'music': {
'title': music.get('title'),
'author': music.get('authorName'),
},
'hashtags': [c.get('title', '') for c in challenges],
'author': item.get('author', {}).get('uniqueId'),
}
def get_full_tiktok_profile(username):
"""
Complete pipeline: Get full profile + discover and scrape recent videos.
Returns dict with profile data, stats, and recent video details.
"""
# Step 1: Get profile data
profile = method1_html_profile(username)
if not profile:
return {'error': f'Could not fetch profile for @{username}'}
result = {
'profile': profile,
'videos': [],
'data_sources': ['tiktok_html_ssr'],
}
    # Note: video discovery requires the agent-side web_search tool (not pure Python).
    # In the agent context, use:
    #   web_search(f"site:tiktok.com/@{username}/video")
    # then pass the result URLs to the discover_video_ids() / fetch_discovered_videos()
    # sketch below, which scrapes each ID via method6_html_video().
return result
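# A minimal sketch of the METHOD 4 glue described above. Assumption: an
# agent-side search tool returns TikTok result URLs (or text containing them);
# everything past that assumption is plain, runnable Python. Video IDs are
# pulled out with a regex and each one is scraped via method6_html_video().
def discover_video_ids(search_results):
    """Extract unique TikTok video IDs from an iterable of URLs/snippets."""
    ids = []
    for chunk in search_results:
        # TikTok video IDs are long numeric strings in /video/<id> paths
        for vid in re.findall(r'/video/(\d{15,21})', str(chunk)):
            if vid not in ids:
                ids.append(vid)
    return ids
def fetch_discovered_videos(username, search_results, limit=5):
    """Scrape full per-video data (METHOD 6) for each discovered video ID."""
    videos = []
    for vid in discover_video_ids(search_results)[:limit]:
        data = method6_html_video(username, vid)
        if data:
            videos.append(data)
    return videos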
if __name__ == '__main__':
import sys
username = sys.argv[1] if len(sys.argv) > 1 else 'khaby.lame'
print(f'=== Testing TikTok scraping for @{username} ===\n')
print('--- METHOD 1: HTML Profile Scraping ---')
profile = method1_html_profile(username)
if profile:
print(f' Username: {profile["username"]}')
print(f' Nickname: {profile["nickname"]}')
        print(f' Bio: {(profile["bio"] or "")[:100]}')
print(f' Verified: {profile["verified"]}')
print(f' Followers: {profile["stats"]["followers"]:,}')
print(f' Following: {profile["stats"]["following"]:,}')
print(f' Hearts: {profile["stats"]["hearts"]:,}')
print(f' Videos: {profile["stats"]["videos"]:,}')
print(f' SecUid: {profile["secUid"][:50]}...')
else:
print(' FAILED')
print('\n--- METHOD 3a: tikwm.com User API ---')
tikwm_user = method3_tikwm_user(username)
if tikwm_user:
s = tikwm_user.get('stats', {})
        print(f' Followers: {s.get("followerCount", 0):,}')
        print(f' Hearts: {s.get("heartCount", 0):,}')
        print(f' Videos: {s.get("videoCount", 0):,}')
else:
print(' FAILED')
# Test with a known video
test_video_id = '7615318641042623775' # khaby birthday video
if username == 'khaby.lame':
print(f'\n--- METHOD 2: oEmbed for video {test_video_id} ---')
oembed = method2_oembed_video(username, test_video_id)
if oembed:
print(f' Title: {oembed["title"][:80]}')
print(f'\n--- METHOD 6: HTML Video Scraping for {test_video_id} ---')
video = method6_html_video(username, test_video_id)
if video:
print(f' Description: {video["description"][:80]}')
print(f' Plays: {video["stats"]["plays"]:,}')
print(f' Likes: {video["stats"]["likes"]:,}')
print(f' Comments: {video["stats"]["comments"]:,}')
print(f' Shares: {video["stats"]["shares"]:,}')
print(f' Hashtags: {video["hashtags"]}')
print('\n=== DONE ===')

View File

@@ -0,0 +1,260 @@
"""
Direct X/Twitter API v2 client for Hermes Simulator.
No x-cli dependency — uses curl via terminal() with bearer token.
Provides:
- get_user(handle) — profile, bio, metrics
- get_tweets(user_id, count) — recent tweets with metrics
- search_tweets(query, count) — search for tweets
- get_user_mentions(user_id, count) — mentions of a user
"""
from hermes_tools import terminal
import json
import os
import time
import urllib.parse
# Bearer token: read from the X_BEARER_TOKEN env var (the __main__ block falls back to ~/.hermes/.env)
BEARER = os.environ.get("X_BEARER_TOKEN", "")
MAX_RETRIES = 3
BASE_DELAY = 2 # seconds, exponential backoff: 2s, 4s, 8s
def _api_get(endpoint: str, params: dict = None) -> dict:
"""Make authenticated GET request to X API v2 with retry and error handling."""
url = f"https://api.twitter.com/2/{endpoint}"
if params:
qs = "&".join(f"{k}={urllib.parse.quote(str(v))}" for k, v in params.items())
url += f"?{qs}"
for attempt in range(MAX_RETRIES):
try:
r = terminal(f'curl -s -w \'\\n%{{http_code}}\' -H "Authorization: Bearer {BEARER}" "{url}"')
output = r.get("output", "").strip()
# Split body from status code (last line)
lines = output.rsplit("\n", 1)
if len(lines) == 2:
body, status_str = lines
else:
body = output
status_str = "0"
try:
status_code = int(status_str.strip())
except ValueError:
status_code = 0
# Handle specific status codes
if status_code == 429:
# Rate limited — retry with backoff
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Rate limited (429). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
if status_code in (401, 403):
return {"error": f"Authentication failed (HTTP {status_code}). Check X_BEARER_TOKEN.", "http_status": status_code}
if status_code >= 500:
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Server error ({status_code}). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
if status_code == 0 and not body:
# Network error — no response at all
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Network error. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
try:
return json.loads(body)
except json.JSONDecodeError:
return {"error": f"Failed to parse response (HTTP {status_code}): {body[:200]}"}
except Exception as e:
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Exception: {e}. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
return {"error": f"All {MAX_RETRIES} retries exhausted for {endpoint}"}
def get_user(handle: str) -> dict:
"""Get user profile by handle."""
handle = handle.lstrip("@")
return _api_get(f"users/by/username/{handle}", {
"user.fields": "description,public_metrics,profile_image_url,created_at,location,url"
})
def get_tweets(user_id: str, count: int = 20) -> dict:
"""Get user's recent tweets."""
return _api_get(f"users/{user_id}/tweets", {
"max_results": max(min(count, 100), 5),
"tweet.fields": "created_at,public_metrics,text,in_reply_to_user_id,referenced_tweets",
"exclude": "retweets" # original tweets only for voice analysis
})
def get_tweets_with_rts(user_id: str, count: int = 20) -> dict:
"""Get user's recent tweets including retweets (shows interests)."""
return _api_get(f"users/{user_id}/tweets", {
"max_results": max(min(count, 100), 5),
"tweet.fields": "created_at,public_metrics,text,referenced_tweets"
})
def search_tweets(query: str, count: int = 10) -> dict:
"""Search recent tweets."""
return _api_get("tweets/search/recent", {
"query": query,
"max_results": max(min(count, 100), 10),
"tweet.fields": "created_at,public_metrics,text,author_id"
})
def get_user_by_id(user_id: str) -> dict:
"""Get user profile by ID."""
return _api_get(f"users/{user_id}", {
"user.fields": "description,public_metrics,username,name"
})
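# Mentions timeline, as listed in the module docstring. A minimal wrapper over
# the standard v2 users/{id}/mentions endpoint, mirroring get_tweets() above.
def get_user_mentions(user_id: str, count: int = 10) -> dict:
    """Get recent tweets mentioning a user."""
    return _api_get(f"users/{user_id}/mentions", {
        "max_results": max(min(count, 100), 5),
        "tweet.fields": "created_at,public_metrics,text,author_id"
    })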
# ═══════════════════════════════════════════════════════════════
# HIGH-LEVEL INTELLIGENCE FUNCTIONS
# ═══════════════════════════════════════════════════════════════
def profile_user(handle: str) -> dict:
"""Full profile pull: identity + recent tweets (originals only)."""
user = get_user(handle)
if "errors" in user or "error" in user:
return {"error": f"User @{handle} not found", "details": user}
user_data = user.get("data", {})
user_id = user_data.get("id")
result = {
"profile": user_data,
"tweets": [],
"voice_samples": [],
}
if user_id:
# Get original tweets (no RTs) for voice analysis
tweets = get_tweets(user_id, 20)
tweet_list = tweets.get("data", [])
result["tweets"] = tweet_list
# Extract pure text samples for voice profiling
# Only exclude retweets and actual replies (has in_reply_to_user_id)
# Tweets starting with @ are fine if they're standalone mentions
result["voice_samples"] = [
t["text"] for t in tweet_list
if not t.get("text", "").startswith("RT @")
and not t.get("in_reply_to_user_id")
]
return result
def profile_interactions(handle1: str, handle2: str) -> dict:
"""Find interactions between two users."""
# Search for replies from handle1 to handle2
q1 = f"from:{handle1} to:{handle2}"
q2 = f"from:{handle2} to:{handle1}"
r1 = search_tweets(q1, 10)
r2 = search_tweets(q2, 10)
return {
f"{handle1}_to_{handle2}": r1.get("data", []),
f"{handle2}_to_{handle1}": r2.get("data", []),
}
def get_voice_data(handle: str, count: int = 50) -> dict:
"""Pull maximum voice data: tweets, replies, quote tweets.
Returns categorized samples for voice profiling."""
user = get_user(handle)
if "errors" in user or "error" in user:
return {"error": f"User @{handle} not found"}
user_data = user.get("data", {})
user_id = user_data.get("id")
if not user_id:
return {"error": "No user ID found"}
# Original tweets (exclude RTs)
originals = get_tweets(user_id, min(count, 100))
original_list = originals.get("data", [])
# Categorize — only use in_reply_to_user_id to detect replies
standalone = [] # not replies
replies = [] # replies to others
for t in original_list:
text = t.get("text", "")
if t.get("in_reply_to_user_id"):
replies.append(text)
else:
standalone.append(text)
return {
"profile": user_data,
"standalone_tweets": standalone, # their voice at rest
"replies": replies, # their voice in conversation
"total_samples": len(standalone) + len(replies),
}
# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════
if __name__ == "__main__":
if not BEARER:
print("ERROR: X_BEARER_TOKEN not set. Set it in environment or ~/.hermes/.env")
print("Trying to load from .env...")
try:
with open(os.path.expanduser("~/.hermes/.env")) as f:
for line in f:
line = line.strip()
if line.startswith("X_BEARER_TOKEN="):
# Use split with maxsplit=1 to handle values with '=' in them
# Also strip surrounding quotes if present
val = line.split("=", 1)[1]
if val and val[0] in ('"', "'") and val[-1] == val[0]:
val = val[1:-1]
BEARER = val
break
except Exception as e:
print(f" Failed to load .env: {e}")
if not BEARER:
print("FATAL: No bearer token found.")
        raise SystemExit(1)
# Demo: profile two users
for handle in ["Teknium", "basedjensen"]:
print(f"\n{'='*60}")
print(f" PROFILING @{handle}")
print(f"{'='*60}")
data = profile_user(handle)
profile = data.get("profile", {})
print(f" Name: {profile.get('name')}")
print(f" Bio: {profile.get('description')}")
metrics = profile.get("public_metrics", {})
print(f" Followers: {metrics.get('followers_count')}")
print(f" Tweets: {metrics.get('tweet_count')}")
print(f" Likes given: {metrics.get('like_count')}")
print(f"\n Voice samples ({len(data.get('voice_samples', []))}):")
for sample in data.get("voice_samples", [])[:5]:
print(f" > {sample[:120]}")

View File

@@ -0,0 +1,136 @@
# DOSSIER: {display_name} (@{handle})
## Identity
- **Name**: {real_name}
- **Handle(s)**: @{twitter} | u/{reddit} | {discord_tag}
- **Role**: {role_and_org}
- **Known for**: {what_they_are_famous_for}
- **Followers/reach**: {approximate_follower_count}
- **Confidence**: {HIGH|MEDIUM|LOW} — {confidence_reason}
## Voice Profile
### Linguistic Patterns
- **Sentence structure**: {short_punchy | long_flowing | mixed}
- **Capitalization**: {normal | all_lowercase | CAPS_FOR_EMPHASIS | mixed}
- **Punctuation**: {heavy_periods | ellipsis_lover | no_punctuation | exclamation_marks}
- **Paragraph style**: {one_liners | thread_essays | medium_blocks}
- **Emoji/emoticon usage**: {none | minimal | heavy | specific_ones}
### Vocabulary & Slang
- **Register**: {academic | casual | shitposter | mixed}
- **Recurring words/phrases**: [list of signature words they use a lot]
- **Catchphrases**: [any repeated phrases or running jokes]
- **Profanity level**: {none | mild | moderate | heavy}
- **Jargon tendency**: {explains_everything | assumes_expertise | mixes}
### Tone
- **Default mood**: {earnest | ironic | combative | chill | manic | analytical}
- **Humor style**: {deadpan | absurdist | sarcastic | wholesome | shitpost | none}
- **How they handle disagreement**: {engages_thoughtfully | dunks | ignores | ratio_warrior | passive_aggressive}
- **How they handle praise**: {deflects | accepts_gracefully | awkward | flexes}
## Positions & Beliefs
### Core Convictions (things they consistently advocate for)
1. {conviction_1}
2. {conviction_2}
3. {conviction_3}
### Known Hot Takes
1. {take_1}
2. {take_2}
### Hills They'll Die On
1. {hill_1}
2. {hill_2}
### Topics They Avoid or Refuse to Engage
1. {avoidance_1}
## Social Dynamics
### People They Interact With Positively
- @{ally_1} — {relationship_description}
- @{ally_2} — {relationship_description}
### People They Beef With / Disagree With
- @{rival_1} — {beef_description}
### How They Engage Different Types
- **Fans/supporters**: {how_they_respond}
- **Critics**: {how_they_respond}
- **Peers**: {how_they_respond}
- **Random people**: {how_they_respond}
## Platform-Specific Behavior
### On Twitter/X
- **Post frequency**: {multiple_daily | daily | few_per_week}
- **Thread tendency**: {never | sometimes | loves_threads}
- **QRT style**: {adds_context | dunks | amplifies}
- **Engagement style**: {likes_a_lot | rarely_likes | retweets_heavy}
### On Reddit (if applicable)
- **Subreddits**: [list]
- **Comment style**: {detailed | brief | combative}
### On Discord (if applicable)
- **Servers**: [known servers]
- **Vibe shift from Twitter**: {description}
## Signature Moves
Things this person characteristically does that make them recognizable:
1. {signature_move_1}
2. {signature_move_2}
3. {signature_move_3}
## Sample Quotes (real, sourced from research)
> "{actual_quote_1}" — [source/context]
> "{actual_quote_2}" — [source/context]
> "{actual_quote_3}" — [source/context]
## Deep Psychometric Profile
- **Big Five**: O{H/M/L} C{} E{} A{} N{} — {evidence}
- **Moral Foundations**: Care{} Fair{} Loyal{} Auth{} Sanct{} Liberty{} — {what drives their ethics}
- **Schwartz Values**: {dominant values} — {how they justify positions}
- **Cognitive Style**: {IC score estimate} — {hedging patterns, complexity, analytical vs intuitive}
- **Narrative Frame**: {dominant frame} — {how they lens issues}
- **Persona Authenticity**: {1-5 score} — {evidence for curation vs authenticity}
## Strategic Self-Presentation (Red Hat)
- **Cultivated image**: {what they want to be seen as}
- **Target audience**: {who they're performing for}
- **Incentive structure**: {what they gain from this persona}
- **Possible divergences**: {where persona may ≠ person}
- **Ghostwriting indicators**: {present/absent, evidence}
## Ecosystem Context
- **Community cluster**: {which tribe they belong to}
- **Key influencers**: {who they amplify/follow/agree with}
- **Echo chamber**: {what information environment they're in}
- **Audience profile**: {who follows them, how that audience reacts}
## Key Assumptions
1. {assumption} — FRAGILITY: {robust/moderate/fragile} — Test: {what invalidates it}
2. {assumption} — FRAGILITY: {} — Test: {}
3. {assumption} — FRAGILITY: {} — Test: {}
## Competing Hypotheses
- **H1 (PRIMARY)**: {main personality model} — Confidence: {X}%
- **H2 (ALTERNATIVE)**: {alternative explanation} — Confidence: {X}%
- **Key discriminator**: {what evidence would shift between H1 and H2}
## Research Sources
- {source_1} [{reliability}{confidence}] — {description}
- {source_2} [{reliability}{confidence}] — {description}
- {source_3} [{reliability}{confidence}] — {description}
## Invalidation Indicators
1. If @{handle} {does X instead of Y}, our {assessment} is wrong
2. If @{handle} {responds to Z with Q}, our {model} needs revision
3. If @{handle} {interacts with @person in manner M}, dynamics model is off
---
*Dossier compiled: {date} | Fidelity: {fidelity}% | Persona Authenticity: {1-5}*
*Source reliability range: {best}-{worst} | Analytical confidence: {1-6}*