Files
claudetools/.claude/harness/CHANGELOG.md
Mike Swanson 512ceb4727 feat(harness-guard): FATAL-promotion prerequisite — test matrix + pair-required conflict rule (VERSION 1.4.3)
Builds the false-positive/true-positive proof the plan requires before the guard can be
promoted to blocking, and fixes the one false-positive it surfaced.

- test-harness-guard.sh: 12-case matrix in a throwaway repo, runs the REAL guard, asserts
  WARN/clean for real conflicts/secrets/keys vs legit content (setext underlines, dividers,
  docs that mention a marker, encrypted sops, public keys, .example templates).
- harness-guard.sh: conflict rule now requires a real hunk (BOTH ^<<<<<<< AND ^>>>>>>>),
  dropping the lone =======$ trigger that false-positived on a 7-char setext underline /
  divider. Identical true-positive power (git writes all three markers); FP surface -> 0.
- /self-check: new harness.guard_selftest runs the matrix in an isolated temp repo (read-only
  vs the real tree) so guard correctness is continuously proven.

Verified 12/12 pass, true positives intact, real-tree FP surface = 0. FATAL flip (todo
f1c11d0d, on/after 2026-06-22) is now evidence-backed + one-step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 08:41:58 -07:00

5.8 KiB

Harness CHANGELOG

The ClaudeTools harness version marker (.claude/harness/VERSION). Bump on every fleet-visible behavioral change so a session can detect whether it is running the new or old harness during a heterogeneous rollout. See specs/claudetools-harness-optimization/.

1.0.0 — 2026-06-08

  • Task 0.5: VERSION marker established (this file).
  • Task 0.6: out-of-band recovery script .claude/scripts/force-pull-raw.sh added.
  • (Earlier) Syncro billing SSOT resolved: add_line_item is normal billing; timers are outlier-only (explicit request).

1.1.0 — 2026-06-08

  • Task 1: submodule-safe sync — sync.sh now unstages submodule gitlinks (unless --with-submodules), eliminating the manual detach-to-pin dance before /save.
  • Task 4: harness-guard.sh wired into sync.sh pre-commit, WARN-ONLY (logs conflict markers / unencrypted sops / private keys to .claude/harness/guard.log; does not block unless HARNESS_GUARD_FATAL=1; SKIP_HARNESS_GUARD=1 bypasses).

1.2.0 — 2026-06-08

  • Task 2: wiki synthesis DECOUPLED from /save (the concurrent-recompile conflict source). /save now only writes the log + syncs and emits the exact /wiki-compile command to run. /wiki-compile is now SERIALIZED (per-article coord lock, TTL orphan-evict, coord-down = warn+proceed) and STAGED (writes .claude/wiki_staging/-.md -> review diff -> apply to live -> commit -> release lock). No blind background auto-merge.

1.3.0 — 2026-06-08

  • Task 6: CLAUDE.md split into lean CORE (1.2k tokens, always loaded) + CLAUDE_EXTENDED.md (full manual, on-demand). Saves ~3.7k tokens per CLAUDE.md injection; nothing lost.
  • Task 9 (P2): delegation re-tuned in CORE — act directly by default; delegate only for high-volume output, blast radius >3 files/layers, domain shift, or parallel work.

1.4.0 — 2026-06-08 (P1+P2+P3 complete)

  • Task 5: one-line registry descriptions on the 8 biggest skills (remediation-tool, gc-audit, packetdial, memory-dream, human-flow, self-check, impeccable, mailprotector). Skill-description injection ~3320 -> ~2123 tokens (~36% cut); keyword triggers preserved; frontmatter valid.
  • Task 7: thinned /save + /sync bodies — they point to sync.sh as the single source instead of re-documenting its internals; load-bearing LLM-judgment parts (Phase 0 save-vs-sync, cross-user note display, exit-75 reporting) kept verbatim. The mechanical sync never depends on an LLM step.
  • Task 10 (P3): session-logs/YYYY-MM/ adopted as a FORWARD convention for new logs (recall = scoped grep over month folders, no monolithic index); existing flat logs untouched (grep covers both). Recall order (wiki -> CONTEXT/log -> coord) already lives in CORE.
  • Deterministic Bash fix: now-phoenix.sh helper added — fixed UTC-7 epoch math, replaces the unreliable TZ=America/Phoenix date (silently returns UTC on Git-Bash). --iso/--date/--datetime/ --fmt formats. post-bot-alert.sh already uses jq -nc --arg (verified, no change needed).
  • Deferred (unchanged): full Python port = separate spec; Task 8 shard command bodies; promote guard to FATAL after a clean warn window; schedule memory-dream --apply-safe per-machine.

1.4.1 — 2026-06-08 (Task 12: self-check smoke tests)

  • /self-check gained a harness category that locks in the 1.4.0 invariants (all read-only): VERSION present + not older than manifest min_version; skill-registry description budget (sum of all SKILL.md description: fields under manifest.harness.registry_desc_budget_chars — WARN on regrowth, the metric that would catch Task 5 bloating back); global deploy targets ~/.claude/skills + ~/.claude/commands populated (the Mac-wipe failure); harness-guard.sh wired into sync.sh; core scripts parse (bash -n on sync/guard/now-phoenix); now-phoenix.sh emits a valid date. Tunables live in baseline/manifest.json harness block. Verified: 9/9 PASS on this machine; budget WARN trips correctly on a synthetic over-budget value.
  • Also reconciled the remaining "GrepAI first" docs (standard + CODING_GUIDELINES) with the wiki-first recall hierarchy (started in CLAUDE_EXTENDED).

1.4.2 — 2026-06-08 (Task 3 leftover: command-restates-standard lint)

  • /self-check gained a consistency category — the command-restates-standard lint. Deterministic half: for each manifest.command_standard_links pair, the standard must still carry its defer-to-SSOT pointer to the owning command; a lost pointer WARNs (the standard likely drifted back into restating the command — the Syncro-timers failure mode). Seeded with the syncro-billing link (time-entry-protocol.md -> /syncro). Semantic contradiction pass (read both, judge actual conflict) delegated to the model in SKILL.md, mirroring the memory pass. Verified PASS; negative- tested (WARN fires when the pointer is removed). New pairs: add to manifest.command_standard_links.

1.4.3 — 2026-06-08 (guard FATAL-promotion prerequisite: test matrix + refinement)

  • Built .claude/scripts/test-harness-guard.sh — a 12-case false-positive/true-positive matrix for harness-guard.sh (spins a throwaway repo, stages synthetic content, runs the REAL guard, asserts WARN/clean). Required by the plan before promoting the guard to FATAL.
  • The matrix surfaced a false-positive vector: the conflict rule's lone =======$ alternative fired on a markdown setext underline / divider of exactly seven =. REFINED harness-guard.sh to require a real hunk — BOTH ^<<<<<<< AND ^>>>>>>> present — which has identical true-positive power (git always writes all three markers) and eliminates the false positive. Verified 12/12 pass; real-tree false-positive surface = 0.
  • Wired the matrix into /self-check as harness.guard_selftest (runs in an isolated temp repo, so the read-only-vs-real-tree contract holds). The eventual FATAL flip is now evidence-backed.