Compare commits

3 Commits

Author SHA1 Message Date
ethernet
cf71d4d0f7 ci(nix): split auto-fix-main into check-main + apply-main with distinct concurrency
The old auto-fix-main job ran both detection (fix-lockfiles --check
semantics) and repair (--apply + commit + push) in one cancel-in-
progress=true job. When pushes landed on main faster than the nix
build took (~1-2 minutes), the in-flight run would be killed by a
newer run — which is correct for the apply path (only the newest
push's fix should land) but wrong for the check path: cancelling
mid-detection can drop the only signal that ever existed for a given
lockfile delta, leaving main quietly broken until the next push
happens to retouch path-filtered files.

Split into two jobs:

  check-main  — runs fix-lockfiles --check, emits stale=true/false as
                a job output. cancel-in-progress: false, so each
                push's detection signal always runs to completion.

  apply-main  — needs check-main, gated on stale == 'true'. Runs
                --apply, commits via GitHub App token, pushes with
                rebase-retry. cancel-in-progress: true, so rapid
                pushes collapse to only the newest apply attempt
                actually landing on main.

Net effect: detection is never lost; apply races are still
collapsed. All existing safety invariants (file-whitelist check
before commit, GH App token so the commit triggers downstream
nix.yml verification, rebase-retry with abort-on-package-file-change)
are preserved on apply-main.
2026-05-04 17:04:11 -04:00
ethernet
5d6e9d2087 fix(nix): refresh hermes-tui npm deps hash for ui-tui lockfile
Unbreaks `nix build .#tui` on main. Commit d6615d8ec modified
ui-tui/package-lock.json without updating nix/tui.nix's fetchNpmDeps
hash; cachix still served the prior content-addressed artifact under
the old hash, so the auto-fix workflow's detector missed the drift
(see the lib.nix change in the preceding commit for the diagnosis
and fix).

Recomputed via `nix run .#fix-lockfiles -- --apply` using the new
lockfile-drift detection logic.
2026-05-04 17:03:59 -04:00
ethernet
41af3a5078 fix(nix): detect npm lockfile drift in fix-lockfiles via source vs. npm-deps diff
fix-lockfiles previously decided a hash was current if and only if
`nix build .#<attr>.npmDeps` exited 0. Because fetchNpmDeps is a
fixed-output derivation (content-addressed by its declared hash),
Nix/cachix will happily serve a cached store path that matches the
declared hash even when that path contains an older lockfile than the
one currently checked into the source tree. The `stale=false` signal
then lies — the hash is pinned to an artifact that no longer matches
the source package-lock.json, and downstream `buildNpmPackage` fails
at npmConfigHook's source-vs-cache lockfile diff with a message that
reads like a generic build failure.

Root cause on origin/main: a feature PR edited ui-tui/package-lock.json
as an `npm install` side effect without updating nix/tui.nix's hash.
Cachix still held the prior content-addressed artifact under the pinned
hash, so the detector rubber-stamped the drift.

New behavior: after fetchNpmDeps resolves (from cache or a fresh
build), compare the source package-lock.json byte-for-byte
(newline-normalized, matching mkNpmPassthru's patchPhase so the check
mirrors exactly what npmConfigHook sees) against the one inside the
resulting npm-deps store path. On drift, wipe the hash and force a
clean fetchNpmDeps rebuild to surface the real hash the current source
lockfile produces. This sidesteps the --rebuild flag's downside of
always re-running fetchNpmDeps even when the hash is genuinely
correct (the reason --rebuild was removed in 9ac4a2e53).

Also hardens --apply: the verification build after applying the new
hash now targets the real `.#<attr>` package (not just .npmDeps), so
any lockfile drift that would still break the downstream build is
caught before the fix is pushed instead of immediately re-triggering
this same detector on the next push.

Verified against d35efb9898 (current origin/main head at time of
writing) with the stale cachix artifact primed in the local store:
the patched --check correctly reports stale=true with the recomputed
hash sha256-MLcLhjTF6dgdvNBtJWzo8Nh19eNh/ZitD2b07nm61Tc=, and --apply
updates nix/tui.nix and passes a real `.#tui` verification build.
Healthy cases still fast-path to 'ok' with no wipe+rebuild cost.
2026-05-04 17:03:51 -04:00
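The newline-normalized comparison described in 41af3a5078 can be tried on its own. This is a minimal sketch that assumes GNU sed (for `-z`) and bash process substitution. `norm_lockfile` reuses the `sed -z` expression from the `_norm_lockfile` helper in the diff below. `lockfiles_match` and the sample files are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Collapse any run of trailing newlines to exactly one (adding one if the
# file has none). GNU sed -z reads the whole file as a single record, so
# \n*$ can only match at the end of the file.
norm_lockfile() { sed -z 's/\n*$/\n/' "$1"; }

# Hypothetical helper: succeeds when the two files differ at most in
# trailing newlines.
lockfiles_match() {
  diff -q <(norm_lockfile "$1") <(norm_lockfile "$2") >/dev/null 2>&1
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

printf '{"lockfileVersion": 3}\n' > "$tmp/source.json"
printf '{"lockfileVersion": 3}\n\n\n' > "$tmp/cached-same.json"  # extra trailing newlines only
printf '{"lockfileVersion": 2}' > "$tmp/cached-stale.json"       # real content drift

if lockfiles_match "$tmp/source.json" "$tmp/cached-same.json"; then
  echo "cached-same: ok"
fi
if ! lockfiles_match "$tmp/source.json" "$tmp/cached-stale.json"; then
  echo "cached-stale: lockfile drift"
fi
```

GNU sed's `-z` mode treats the whole file as one NUL-delimited record. That lets `\n*$` anchor at the end of the file instead of at every line end. As a result, only trailing newlines are collapsed, and any other byte difference still fails the `diff -q`.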
3 changed files with 133 additions and 35 deletions

@@ -26,27 +26,62 @@ concurrency:
cancel-in-progress: false
jobs:
# ── Auto-fix on main ───────────────────────────────────────────────
# Fires when a push to main touches package.json or package-lock.json
# in ui-tui/ or web/. Runs fix-lockfiles and pushes the hash
# update commit directly to main so Nix builds never stay broken.
# ── Check on main ──────────────────────────────────────────────────
# Fires on every push to main that touches a path-filtered file. Only
# runs fix-lockfiles --check and emits stale=true/false as a job output.
# NEVER cancels in-progress siblings: each push's detection signal must
# run to completion, otherwise a rapid-fire sequence of pushes can drop
# the only signal that ever existed for a given lockfile delta and
# leave main quietly broken.
check-main:
if: github.event_name == 'push'
runs-on: ubuntu-latest
timeout-minutes: 20
concurrency:
group: nix-lockfile-check-main
cancel-in-progress: false
outputs:
stale: ${{ steps.check.outputs.stale }}
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: main
- uses: ./.github/actions/nix-setup
with:
cachix-auth-token: ${{ secrets.CACHIX_AUTH_TOKEN }}
- name: Check lockfile hashes
id: check
# --check is non-fatal here: it writes stale=true/false to
# $GITHUB_OUTPUT and exits 1 when drift is found. We want the value
# to propagate to apply-main as a job output, not to fail this job.
run: nix run .#fix-lockfiles -- --check || true
# ── Apply on main ──────────────────────────────────────────────────
# Depends on check-main. Only runs when stale=true, so the cheap
# detection path isn't gated behind the expensive apply path.
# DOES cancel in-progress siblings: we only want the newest push's
# fix to actually land on main — older in-flight applies are
# computing against outdated state and would race the push anyway.
#
# Safety invariants:
# 1. The fix commit only touches nix/*.nix files, which are NOT in
# the paths filter above, so this cannot re-trigger itself.
# 2. An explicit file-whitelist check before commit aborts if
# fix-lockfiles ever modifies unexpected files.
# 3. Job-level concurrency with cancel-in-progress: true ensures
# back-to-back pushes collapse to the newest; ref: main checkout
# always operates on the latest branch state.
# 4. Uses a GitHub App token (not GITHUB_TOKEN) so the fix commit
# 3. Uses a GitHub App token (not GITHUB_TOKEN) so the fix commit
# triggers downstream nix.yml verification.
auto-fix-main:
if: github.event_name == 'push'
# 4. Rebase-retry-on-push handles the case where main advanced
# with an unrelated commit during the nix build; aborts
# cleanly if package files changed since hash computation.
apply-main:
needs: check-main
if: needs.check-main.outputs.stale == 'true'
runs-on: ubuntu-latest
timeout-minutes: 25
concurrency:
group: auto-fix-main
group: nix-lockfile-apply-main
cancel-in-progress: true
steps:
- name: Generate GitHub App token
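The `check-main` step depends on `fix-lockfiles --check` doing two things on drift:

- writing its verdict to `$GITHUB_OUTPUT`, the file GitHub Actions reads step outputs from as `key=value` lines;
- exiting 1.

That part of the script is outside this diff. The following is therefore a hypothetical sketch of the contract only. `report_stale` and its stale-count argument are illustrative and are not fix-lockfiles' actual code.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical reporting step for a --check run. $1 is the number of
# entries whose pinned hash was found stale. Appends stale=true/false to
# the file named by $GITHUB_OUTPUT (if set) and returns 1 on drift.
report_stale() {
  local stale_count="$1" stale=false
  if [ "$stale_count" -gt 0 ]; then
    stale=true
  fi
  if [ -n "${GITHUB_OUTPUT:-}" ]; then
    echo "stale=$stale" >> "$GITHUB_OUTPUT"
  fi
  [ "$stale" = false ]
}

# On a runner GitHub Actions provides this file; a temp file stands in here.
GITHUB_OUTPUT="$(mktemp)"
trap 'rm -f "$GITHUB_OUTPUT"' EXIT

# `|| true` mirrors the check-main step: a drift exit must not fail the job.
report_stale 1 || true
cat "$GITHUB_OUTPUT"
```

`|| true` swallows every non-zero exit, not only the drift case. So if a run fails before writing its output, `stale` is left empty. `needs.check-main.outputs.stale == 'true'` then evaluates to false, and `apply-main` is skipped rather than attempted.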

@@ -160,32 +160,91 @@
FIXED=0
REPORT=""
# Normalize trailing newlines so a source-vs-cached lockfile diff is
# purely content-driven. Matches the patchPhase normalization in
# mkNpmPassthru, so this mirrors exactly what npmConfigHook sees.
_norm_lockfile() { sed -z 's/\n*$/\n/' "$1"; }
# Force a clean FOD rebuild to surface the real `got:` hash from the
# actual source lockfile, bypassing cachix pinning. Echoes the new
# hash on success; on failure, prints the build-log tail to stderr
# and returns 1 so the caller can abort.
# The NIX_FILE is always restored, even if the rebuild errors out.
_recompute_hash_from_wipe() {
local nix_file="$1" attr="$2"
local bak
bak="$(mktemp)"
cp "$nix_file" "$bak"
sed -i 's|hash = "sha256-[^"]*";|hash = "";|' "$nix_file"
local out rc
out=$(nix build ".#$attr.npmDeps" --no-link 2>&1) && rc=0 || rc=$?
cp "$bak" "$nix_file"
rm -f "$bak"
local new_hash
new_hash=$(echo "$out" | awk '/got:/ {print $2; exit}')
if [ -z "$new_hash" ]; then
echo "$out" | tail -20 >&2
return 1
fi
echo "$new_hash"
}
for entry in "''${ENTRIES[@]}"; do
IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
echo " ok"
continue
fi
NEW_HASH=$(echo "$OUTPUT" | awk '/got:/ {print $2; exit}')
if [ -z "$NEW_HASH" ]; then
# Magic-Nix-Cache occasionally returns HTTP 418 / cache-throttled
# mid-run; nix then prints "outputs not valid, so checking is
# not possible" without a `got:` line. That's an infrastructure
# blip, not a stale lockfile; warn + skip rather than failing
# the lint. A real hash mismatch would still surface in the
# primary `.#$ATTR` build, which is a separate CI job.
if echo "$OUTPUT" | grep -qE "throttled|HTTP error 418|substituter .* is disabled|some outputs of .* are not valid"; then
echo " skipped (transient cache failure; see primary nix build for real status)" >&2
echo "$OUTPUT" | tail -8 >&2
# Build npmDeps (may resolve a cached FOD artifact pinned to a
# prior lockfile) and capture the resulting store path.
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-out-paths --print-build-logs 2>&1)
STATUS=$?
NEW_HASH=""
if [ "$STATUS" -eq 0 ]; then
# --print-out-paths emits the store path(s) on stdout (merged
# with build logs here). Last non-empty line is the npmDeps path.
OUT_PATH=$(echo "$OUTPUT" | awk 'NF' | tail -1)
SRC_LOCK="$FOLDER/package-lock.json"
CACHED_LOCK="$OUT_PATH/package-lock.json"
if [ ! -f "$CACHED_LOCK" ]; then
echo " unexpected: $CACHED_LOCK missing from npmDeps output" >&2
echo "$OUTPUT" | tail -20 >&2
exit 1
fi
if diff -q <(_norm_lockfile "$SRC_LOCK") <(_norm_lockfile "$CACHED_LOCK") >/dev/null 2>&1; then
echo " ok"
continue
fi
echo " build failed with no hash mismatch:" >&2
echo "$OUTPUT" | tail -40 >&2
exit 1
# The pinned FOD hash resolves (cachix handed us a cached
# artifact), but the lockfile inside that artifact no longer
# matches the source lockfile. npmConfigHook in the real .#$ATTR
# build rejects this, so treat it as stale and recompute the hash
# against the current source lockfile.
echo " lockfile drift: source $SRC_LOCK differs from $CACHED_LOCK"
if ! NEW_HASH=$(_recompute_hash_from_wipe "$NIX_FILE" "$ATTR"); then
echo " failed to recompute hash after wipe+rebuild" >&2
exit 1
fi
else
# npmDeps build failed outright (no cached artifact for the
# pinned hash; nix attempted to re-run fetchNpmDeps and reported
# a hash mismatch). Parse got: from the error.
NEW_HASH=$(echo "$OUTPUT" | awk '/got:/ {print $2; exit}')
if [ -z "$NEW_HASH" ]; then
# Magic-Nix-Cache occasionally returns HTTP 418 / cache-throttled
# mid-run; nix then prints "outputs not valid, so checking is
# not possible" without a `got:` line. That's an infrastructure
# blip, not a stale lockfile; warn + skip rather than failing
# the lint. A real hash mismatch would still surface in the
# primary `.#$ATTR` build, which is a separate CI job.
if echo "$OUTPUT" | grep -qE "throttled|HTTP error 418|substituter .* is disabled|some outputs of .* are not valid"; then
echo " skipped (transient cache failure; see primary nix build for real status)" >&2
echo "$OUTPUT" | tail -8 >&2
continue
fi
echo " build failed with no hash mismatch:" >&2
echo "$OUTPUT" | tail -40 >&2
exit 1
fi
fi
HASH_LINE=$(grep -n 'hash = "sha256-' "$NIX_FILE" | head -1 | cut -d: -f1)
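The hunk above relies on three text-parsing steps:

- reading the `got:` hash from a fixed-output hash mismatch;
- taking the last non-empty line of merged `--print-out-paths` output;
- classifying transient cache errors.

All three can be tried against canned text without running Nix. This is a minimal sketch. The awk and grep expressions are copied from the diff. The function names and sample messages are illustrative assumptions: the mismatch text is shaped like Nix's error but was not captured from a real run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Second field of the first line containing "got:" (the recomputed hash).
extract_got_hash() { awk '/got:/ {print $2; exit}'; }

# Last non-empty line of build logs merged with --print-out-paths output.
last_out_path() { awk 'NF' | tail -1; }

# Same pattern fix-lockfiles uses to treat a failure as a cache blip.
is_transient_cache_failure() {
  grep -qE "throttled|HTTP error 418|substituter .* is disabled|some outputs of .* are not valid"
}

# Illustrative input shaped like a Nix fixed-output hash mismatch; the two
# hashes are the old and new values from the hermes-tui hash refresh.
mismatch_log=$(cat <<'EOF'
error: hash mismatch in fixed-output derivation '/nix/store/example-npm-deps.drv':
         specified: sha256-a/HGI9OgVcTnZrMXA7xFMGnFoVxyHe95fulVz+WNYB0=
            got:    sha256-MLcLhjTF6dgdvNBtJWzo8Nh19eNh/ZitD2b07nm61Tc=
EOF
)

# Illustrative merged output: a log line, the store path, then blank lines.
build_log=$'building npm-deps\n/nix/store/example-npm-deps\n\n'

echo "got:  $(extract_got_hash <<< "$mismatch_log")"
echo "path: $(last_out_path <<< "$build_log")"
if is_transient_cache_failure <<< "warning: HTTP error 418 from cache"; then
  echo "transient cache failure: skip"
fi
```

With awk's default field splitting, leading whitespace on a line is ignored. That is why `$2` is the hash no matter how far Nix indents the `got:` line.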
@@ -205,12 +264,16 @@
if [ "$MODE" = "--apply" ]; then
sed -i "s|hash = \"sha256-[^\"]*\";|hash = \"$NEW_HASH\";|" "$NIX_FILE"
if ! nix build ".#$ATTR.npmDeps" --no-link --print-build-logs; then
echo " verification build failed after hash update" >&2
# Verify with the REAL package build (not just .npmDeps). This
# exercises npmConfigHook the same way CI does, so a "fixed" hash
# that would still break the downstream build gets caught here
# rather than after we push the fix.
if ! nix build ".#$ATTR" --no-link --print-build-logs; then
echo " verification build of .#$ATTR failed after hash update" >&2
exit 1
fi
FIXED=1
echo " fixed"
echo " fixed and verified"
fi
done
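The script makes two in-place edits to the Nix file:

- blanking the hash to force a rebuild;
- writing the recomputed hash under `--apply`.

Both can be rehearsed on a scratch copy. This is a minimal sketch that assumes GNU sed's `-i`. The `sed` expression is the one from the diff. `set_npm_hash`, `with_blank_hash`, and the sample file contents are hypothetical, and `false` stands in for the `nix build` call.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Rewrite every `hash = "sha256-...";` assignment in the file. An empty
# $2 blanks the hash, the state used to force a clean FOD rebuild.
set_npm_hash() {
  local nix_file="$1" new_hash="$2"
  sed -i "s|hash = \"sha256-[^\"]*\";|hash = \"$new_hash\";|" "$nix_file"
}

# Hypothetical wrapper: blank the hash, run "$@", then restore the file
# from a backup whatever the command's exit status, and return that status.
with_blank_hash() {
  local nix_file="$1"; shift
  local bak rc
  bak="$(mktemp)"
  cp "$nix_file" "$bak"
  set_npm_hash "$nix_file" ""
  "$@" && rc=0 || rc=$?
  cp "$bak" "$nix_file"
  rm -f "$bak"
  return "$rc"
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
nix_file="$tmp/tui.nix"
cat > "$nix_file" <<'EOF'
npmDeps = pkgs.fetchNpmDeps {
  inherit src;
  hash = "sha256-a/HGI9OgVcTnZrMXA7xFMGnFoVxyHe95fulVz+WNYB0=";
};
EOF

# `false` stands in for a rebuild that errors out; the file still comes back.
if ! with_blank_hash "$nix_file" false; then
  echo "rebuild failed; original hash restored:"
  grep 'hash =' "$nix_file"
fi

# The --apply step: write the recomputed hash in place.
set_npm_hash "$nix_file" "sha256-MLcLhjTF6dgdvNBtJWzo8Nh19eNh/ZitD2b07nm61Tc="
grep 'hash =' "$nix_file"
```

`_recompute_hash_from_wipe` uses the same `&& rc=0 || rc=$?` idiom. In a `set -e` shell like this sketch, the idiom prevents an abort between the wipe and the restore, which guarantees the restore runs. SRI hashes are base64, so they never contain the `|` or `&` characters that would break the `|`-delimited replacement.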

@@ -4,7 +4,7 @@ let
src = ../ui-tui;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-a/HGI9OgVcTnZrMXA7xFMGnFoVxyHe95fulVz+WNYB0=";
hash = "sha256-MLcLhjTF6dgdvNBtJWzo8Nh19eNh/ZitD2b07nm61Tc=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "ui-tui"; attr = "tui"; pname = "hermes-tui"; };