fix(file-tools): broaden dedup-status write guard to cover small wrappers

The write_file guard added in #16223 used strict equality against the
internal dedup status message. In practice, the model sometimes
prepends a short note or appends a trailing comment before calling
write_file, and those writes slipped past the strict check.

Broaden the heuristic: reject writes whose stripped content equals
the status message, or contains it and is at most 2x its length.
Such short, status-dominated writes are treated as corruption;
legitimate docs that quote the message verbatim are expected to be
much longer.

Adds two tests: one for the small-wrapper corruption shape, one
confirming large legitimate files that quote the status still write.
Teknium
2026-04-26 19:03:32 -07:00
committed by Teknium
parent 977d5f56c9
commit ced8f44cd2
2 changed files with 82 additions and 2 deletions


@@ -264,10 +264,34 @@ def _cap_read_tracker_data(task_data: dict) -> None:
 def _is_internal_file_status_text(content: str) -> bool:
-    """Return True when content is an internal file-tool status, not file bytes."""
+    """Return True when content looks like an internal file-tool status, not real file bytes.
+
+    The read_file dedup status message must never be persisted as file
+    content. The obvious shape is the model echoing the message verbatim,
+    but in practice it also wraps it with small framing text (a leading
+    "Note:", a trailing newline + short comment, etc.) before calling
+    write_file. We treat any short-ish write whose body is dominated by
+    the status message as the same class of corruption.
+
+    Heuristic:
+    * Strict equality (after strip): the verbatim shape.
+    * OR the stripped content contains the full status message AND is
+      short enough that the status dominates it (<=2x the message length).
+      Short, status-dominated writes can't plausibly be real files;
+      legitimate docs/notes that happen to quote this internal message
+      are always dramatically longer.
+    """
     if not isinstance(content, str):
         return False
-    return content.strip() == _READ_DEDUP_STATUS_MESSAGE
+    stripped = content.strip()
+    if not stripped:
+        return False
+    if stripped == _READ_DEDUP_STATUS_MESSAGE:
+        return True
+    if _READ_DEDUP_STATUS_MESSAGE in stripped and \
+            len(stripped) <= 2 * len(_READ_DEDUP_STATUS_MESSAGE):
+        return True
+    return False


 def _get_file_ops(task_id: str = "default") -> ShellFileOperations: