feat(harness-guard): FATAL-promotion prerequisite — test matrix + pair-required conflict rule (VERSION 1.4.3)

Builds the false-positive/true-positive proof the plan requires before the guard can be
promoted to blocking, and fixes the one false-positive it surfaced.

- test-harness-guard.sh: 12-case matrix in a throwaway repo, runs the REAL guard, asserts
  WARN/clean for real conflicts/secrets/keys vs legit content (setext underlines, dividers,
  docs that mention a marker, encrypted sops, public keys, .example templates).
- harness-guard.sh: conflict rule now requires a real hunk (BOTH ^<<<<<<< AND ^>>>>>>>),
  dropping the lone =======$ trigger that false-positived on a 7-char setext underline /
  divider. Identical true-positive power (git writes all three markers); FP surface -> 0.
- /self-check: new harness.guard_selftest runs the matrix in an isolated temp repo (read-only
  vs the real tree) so guard correctness is continuously proven.

Verified 12/12 pass, true positives intact, real-tree FP surface = 0. FATAL flip (todo
f1c11d0d, on/after 2026-06-22) is now evidence-backed + one-step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-08 08:41:53 -07:00
parent cfa264947b
commit 512ceb4727
8 changed files with 221 additions and 6 deletions

View File

@@ -659,6 +659,21 @@ check_harness_smoke() {
"Check .claude/scripts/now-phoenix.sh"
fi
fi
# 6. Guard self-test: run the full false-positive/true-positive matrix in an isolated
# temp repo (writes only under mktemp, never the real tree). Proves the guard still
# detects real conflicts/secrets AND does not false-positive on legit content — the
# standing prerequisite for promoting the guard to FATAL.
local gt="$REPO_ROOT/.claude/scripts/test-harness-guard.sh" gres
if [ -f "$gt" ] && command -v git >/dev/null 2>&1; then
gres="$(bash "$gt" 2>/dev/null | grep 'RESULT:' | head -1 | sed 's/^[[:space:]]*RESULT:[[:space:]]*//')"
if echo "$gres" | grep -q 'FAIL 0'; then
emit harness.guard_selftest harness PASS "guard FP/TP matrix clean ($gres)"
elif [ -n "$gres" ]; then
emit harness.guard_selftest harness WARN "guard self-test reported failures ($gres)" \
"Run: bash .claude/scripts/test-harness-guard.sh — a detection case regressed"
fi
fi
}
# ---------------------------------------------------------------------------