Commit Graph

4 Commits

Author SHA1 Message Date
Brooklyn Nicholson
ffa33e53f6 chore(tui): remove dead branch cleanup code
- drop unused TUI helpers, test-only layout scaffolding, and stale public debug exports
- remove an unused profiler import and trim test-only coverage for deleted helpers
2026-04-26 21:54:24 -05:00
Brooklyn Nicholson
82f842277e perf(tui): profile harness gains --loop, --save, --compare
Before: change code → build → run profile → manually compare to a
mental model of the last run.  After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, and prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly.  Ctrl+C stops the loop.

Also added:
  --save LABEL     saves metrics snapshot to /tmp/perf-<LABEL>.json
  --compare LABEL  diffs the current run vs that snapshot
  --extra-flag X   pass-through to node dist/entry.js (prepping for
                   --no-fullscreen below)
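
A minimal sketch of what the `--loop` cycle could look like (the
rebuild command, mtime-polling approach, and function names here are
assumptions for illustration, not the harness's actual implementation):

```python
import subprocess
import time
from pathlib import Path

# Directories named in the commit message; the rebuild command is a guess.
WATCH_DIRS = ["ui-tui/src", "packages/hermes-ink/src"]

def snapshot(dirs):
    """Map every .ts/.tsx file under dirs to its mtime, for change detection."""
    stamps = {}
    for d in dirs:
        for f in Path(d).rglob("*"):
            if f.suffix in (".ts", ".tsx"):
                stamps[str(f)] = f.stat().st_mtime
    return stamps

def watch_loop(run_profile, diff, dirs=WATCH_DIRS, poll_s=0.5):
    """Re-profile on every source change; diff each run against the previous one."""
    prev = run_profile()                  # baseline iteration
    seen = snapshot(dirs)
    while True:                           # Ctrl+C (KeyboardInterrupt) stops the loop
        time.sleep(poll_s)
        now = snapshot(dirs)
        if now == seen:
            continue
        seen = now
        subprocess.run(["npm", "run", "build"], check=True)  # hypothetical build step
        cur = run_profile()
        print(diff(prev, cur))            # side-by-side A/B vs previous iteration
        prev = cur
```

mtime polling is the simplest portable choice; an inotify-based
watcher would react faster but adds a dependency.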

key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings.  format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).
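
The direction-aware diff might look roughly like this (the
metric-name sets and the format_diff() signature are assumptions
based on the description above; metrics not listed as lower-is-better
are treated as higher-is-better):

```python
# Metric direction set; names mirror those in the commit message.
LOWER_IS_BETTER = {"renderer_p99", "renderer_max", "patches", "drain_p99"}

def format_diff(prev, cur):
    """Render a per-metric A/B table with ↑/↓ arrows and a regression verdict."""
    lines = []
    for name in sorted(prev.keys() & cur.keys()):
        a, b = prev[name], cur[name]
        pct = 100.0 * (b - a) / a if a else 0.0
        arrow = "↑" if b > a else ("↓" if b < a else "=")
        if b == a:
            verdict = "no change"
        else:
            # For lower-is-better metrics an increase is a regression;
            # for higher-is-better (fps, gaps_under_16ms) a decrease is.
            worse = (b > a) if name in LOWER_IS_BETTER else (b < a)
            verdict = "regression" if worse else "improvement"
        lines.append(f"{name:18s} {a:10.2f} -> {b:10.2f}  {arrow} {pct:+6.1f}%  {verdict}")
    return "\n".join(lines)
```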

Run-to-run noise on unchanged code is ~5-15% on most metrics — a big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.

Usage during the next perf session:

  # one-shot with a baseline for later comparison
  scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel

  # after porting the wheel handler
  scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel

  # continuous iteration
  scripts/profile-tui.py --seconds 6 --hold wheel_up --loop
2026-04-26 17:08:07 -05:00
Brooklyn Nicholson
f823535db2 perf(tui): instrument stdout drain — rule out terminal parse bottleneck
Adds four fields to FrameEvent.phases and the matching profile
summary:

  optimizedPatches  post-optimize patch count (what's actually
                    written to stdout; the .patches field is
                    pre-optimize)
  writeBytes        UTF-8 byte count of the write this frame
  backpressure      true when Node's stdout.write returned false
                    (Writable buffer full — outer terminal can't
                    keep up)
  prevFrameDrainMs  end-to-end drain time of the PREVIOUS frame's
                    write, captured from stdout.write's 2-arg
                    callback.  Reported on the next frame so the
                    measurement reflects "time until OS flushed
                    the bytes to the terminal fd", not "time until
                    queued in Node".

writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback.  The callback is only attached
on a TTY with diff rendering; piped/non-TTY stdout bypasses flow
control, so it would fire synchronously anyway.
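
Assuming perf.log is a JSON-lines file with these fields nested under
a phases object (the exact layout is a guess), the drain/backpressure
summary could be recomputed with something like:

```python
import json

def drain_summary(log_path):
    """Summarize prevFrameDrainMs and the backpressure rate from a perf.log."""
    drains, backpressured, frames = [], 0, 0
    with open(log_path) as fh:
        for line in fh:
            ev = json.loads(line)
            if ev.get("src") != "frame":   # skip React-profiler events
                continue
            frames += 1
            phases = ev.get("phases", {})
            if "prevFrameDrainMs" in phases:
                drains.append(phases["prevFrameDrainMs"])
            if phases.get("backpressure"):
                backpressured += 1
    drains.sort()
    def pct(p):  # nearest-rank percentile over the sorted samples
        return drains[min(len(drains) - 1, int(p / 100 * len(drains)))] if drains else None
    return {"drain_p50": pct(50), "drain_p99": pct(99),
            "backpressure_rate": backpressured / frames if frames else 0.0}
```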

Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):

  patches total    28,888
  optimized total  16,700   (ratio 0.58 — optimizer cuts ~42%)
  writeBytes       42 KB / 10s = 4.2 KB/s throughput
  drainMs p50      0.14 ms   terminal accepts bytes instantly
  drainMs p99      0.85 ms
  backpressure     0% of frames

This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s.  The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
2026-04-26 17:06:22 -05:00
Brooklyn Nicholson
71eee26640 perf(tui): full-pipeline instrumentation + profiling harness
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.

perfPane.tsx:
  Wires ink's onFrame callback (already plumbed through the fork) into
  the same perf.log as the React.Profiler samples. Captures per-phase
  timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
  optimize, stdout write) plus yoga counters (visited/measured/
  cacheHits/live) and patch counts per frame.  Events are tagged
  {src: 'react'|'frame'} so jq can split them.  logFrameEvent is
  undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
  the callback.

entry.tsx:
  Passes logFrameEvent into render().

types/hermes-ink.d.ts:
  Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
  type-checks against the plumbed-through ink option.

scripts/profile-tui.py:
  New harness. Launches the built TUI under a PTY with the longest
  session in state.db resumed, holds PageUp/PageDown/etc at a
  configurable Hz for N seconds, then parses perf.log and prints
  per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
  beyond stdlib. Exit 2 if nothing was captured (wiring broken).
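
The PTY key-hold driving could be sketched as below (the PageUp
escape is standard xterm; the command, rates, and cleanup details are
illustrative, not the harness's actual code):

```python
import fcntl
import os
import pty
import subprocess
import time

PAGE_UP = b"\x1b[5~"  # xterm escape sequence for the PageUp key

def hold_key(cmd, key=PAGE_UP, hz=30, seconds=6.0):
    """Run cmd under a PTY and repeat a key press at a fixed rate, draining output."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(cmd, stdin=slave, stdout=slave, stderr=slave,
                            close_fds=True)
    os.close(slave)
    # Non-blocking reads so draining the app's output never stalls the key cadence.
    flags = fcntl.fcntl(master, fcntl.F_GETFL)
    fcntl.fcntl(master, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            os.write(master, key)
            try:
                os.read(master, 65536)    # discard output; keep the PTY buffer empty
            except BlockingIOError:
                pass                      # nothing to read this tick
            except OSError:
                break                     # child closed its end of the PTY
            time.sleep(1.0 / hz)
    finally:
        proc.terminate()
        proc.wait()
        os.close(master)
```

Draining the master side matters: if the app fills the PTY buffer and
nobody reads, its writes block and the measured frame rate is bogus.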

Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
  - Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
  - 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
  - One pathological 97ms frame with yoga measuring 70,415 text cells
    and Yoga visiting 225k nodes — the cold-unmeasured-region hit
  - Ink's scroll fast-path (DECSTBM blit from prevScreen) is
    disqualified because our spacer-based virtual history doesn't
    keep heightDelta in sync with scroll.delta, so every PageUp step
    falls through to a full 2000-4800 patch re-render instead of ~40
2026-04-26 16:36:25 -05:00