Builds the false-positive/true-positive proof the plan requires before the guard can be promoted to blocking, and fixes the one false positive it surfaced.

- test-harness-guard.sh: 12-case matrix in a throwaway repo; runs the REAL guard and asserts WARN/clean for real conflicts, secrets, and keys versus legitimate content (setext underlines, dividers, docs that mention a marker, encrypted sops files, public keys, .example templates).
- harness-guard.sh: the conflict rule now requires a real hunk (BOTH ^<<<<<<< AND ^>>>>>>>), dropping the lone =======$ trigger that false-positived on a 7-char setext underline / divider. True-positive power is identical (git writes all three markers); FP surface -> 0.
- /self-check: new harness.guard_selftest runs the matrix in an isolated temp repo (read-only vs the real tree), so guard correctness is continuously proven.

Verified 12/12 pass, true positives intact, real-tree FP surface = 0. The FATAL flip (todo f1c11d0d, on or after 2026-06-22) is now evidence-backed and a one-step change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness CHANGELOG
This log tracks the ClaudeTools harness version marker (`.claude/harness/VERSION`). Bump the marker on every fleet-visible behavioral change so that, during a heterogeneous rollout, a session can detect whether it is running the new or the old harness. See `specs/claudetools-harness-optimization/`.
1.0.0 — 2026-06-08
- Task 0.5: VERSION marker established (this file).
- Task 0.6: out-of-band recovery script `.claude/scripts/force-pull-raw.sh` added.
- (Earlier) Syncro billing SSOT resolved: `add_line_item` is normal billing; timers are outlier-only (explicit request).
1.1.0 — 2026-06-08
- Task 1: submodule-safe sync. `sync.sh` now unstages submodule gitlinks (unless `--with-submodules` is given), eliminating the manual detach-to-pin dance before /save.
- Task 4: `harness-guard.sh` wired into the `sync.sh` pre-commit step as WARN-ONLY: it logs conflict markers, unencrypted sops files, and private keys to `.claude/harness/guard.log`, and does not block unless `HARNESS_GUARD_FATAL=1`; `SKIP_HARNESS_GUARD=1` bypasses it.
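The gitlink unstaging in Task 1 can be sketched as below. This is not `sync.sh` itself; it is a Python model under the assumption that staged gitlinks are found by their index mode (`160000`, which git uses for submodule pointers) in `git diff --cached --raw` output and then dropped from the index with `git reset -- <paths>`. The function names are hypothetical, and a caller honoring `--with-submodules` would simply skip `unstage_gitlinks`.

```python
import subprocess
from typing import List

GITLINK_MODE = "160000"  # git's index mode for a submodule pointer (a "gitlink")


def staged_gitlinks(raw_diff: str) -> List[str]:
    """Return the paths whose staged (new) mode is a gitlink.

    Expects `git diff --cached --raw` output without -z, where each line is
    ':<old_mode> <new_mode> <old_sha> <new_sha> <status>\\t<path>' (renames add a
    second tab-separated path; the last one is the staged path). Paths that git
    C-quotes because of unusual characters are not unquoted here.
    """
    paths = []
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue
        meta, *names = line.split("\t")
        fields = meta[1:].split()
        if names and len(fields) >= 2 and fields[1] == GITLINK_MODE:
            paths.append(names[-1])
    return paths


def unstage_gitlinks(repo: str) -> List[str]:
    """Unstage every staged gitlink in `repo` and return the paths it touched."""
    raw = subprocess.run(
        ["git", "-C", repo, "diff", "--cached", "--raw"],
        check=True, capture_output=True, text=True,
    ).stdout
    paths = staged_gitlinks(raw)
    if paths:
        # Resets only those index entries; the working tree and submodules are untouched.
        subprocess.run(["git", "-C", repo, "reset", "-q", "--", *paths], check=True)
    return paths
```

Keying on the mode rather than on `.gitmodules` means a pointer bump is caught even for a submodule whose path has no entry there.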
1.2.0 — 2026-06-08
- Task 2: wiki synthesis DECOUPLED from /save (it was the source of concurrent-recompile conflicts). /save now only writes the log, syncs, and emits the exact /wiki-compile command to run. /wiki-compile is now SERIALIZED (per-article coord lock, TTL-based eviction of orphaned locks; if coord is down, warn and proceed) and STAGED (writes `.claude/wiki_staging/-.md` -> review diff -> apply to live -> commit -> release lock). No blind background auto-merge.
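The locking rules for /wiki-compile (per-article lock, TTL eviction of orphans, warn-and-proceed when the coordinator is down) can be modeled roughly as follows. This is an illustrative in-memory sketch; the real coordinator's API, the TTL value, and the names `ArticleLocks` and `lock_for_compile` are all hypothetical.

```python
import time
import warnings
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ArticleLocks:
    """Hypothetical in-memory stand-in for the coordinator's per-article locks."""

    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic
    # article -> (owner, acquired_at)
    _held: Dict[str, Tuple[str, float]] = field(default_factory=dict, init=False)

    def acquire(self, article: str, owner: str) -> bool:
        now = self.clock()
        held = self._held.get(article)
        if held is not None:
            holder, acquired_at = held
            if holder != owner and now - acquired_at < self.ttl_seconds:
                return False  # a live lock belongs to another session: wait or skip
            # Otherwise the lock is expired (orphaned by a dead session) or already ours.
        self._held[article] = (owner, now)
        return True

    def release(self, article: str, owner: str) -> None:
        held = self._held.get(article)
        if held is not None and held[0] == owner:
            del self._held[article]


def lock_for_compile(locks: Optional[ArticleLocks], article: str, owner: str) -> bool:
    """coord-down = warn + proceed: with no reachable coordinator, compile unlocked."""
    if locks is None:
        warnings.warn(f"coordinator unreachable; compiling {article} without a lock")
        return True
    return locks.acquire(article, owner)
```

In the staged flow, `release` would be called only after the staged article has been reviewed, applied to live, and committed.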
1.3.0 — 2026-06-08
- Task 6: CLAUDE.md split into lean CORE (1.2k tokens, always loaded) + CLAUDE_EXTENDED.md (full manual, on-demand). Saves ~3.7k tokens per CLAUDE.md injection; nothing lost.
- Task 9 (P2): delegation re-tuned in CORE — act directly by default; delegate only for high-volume output, blast radius >3 files/layers, domain shift, or parallel work.
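The delegation rule in CORE is prose for the model, not code. The predicate below only restates it so the boundary is unambiguous (a blast radius of exactly 3 files/layers still means "act directly"). `TaskProfile` and `should_delegate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    high_volume_output: bool      # output large enough to flood the main context
    files_or_layers_touched: int  # blast radius
    domain_shift: bool
    parallelizable: bool


def should_delegate(task: TaskProfile) -> bool:
    """Act directly by default; delegate only when one of the listed triggers fires."""
    return (
        task.high_volume_output
        or task.files_or_layers_touched > 3  # "blast radius >3 files/layers"
        or task.domain_shift
        or task.parallelizable
    )
```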
1.4.0 — 2026-06-08 (P1+P2+P3 complete)
- Task 5: one-line registry descriptions on the 8 biggest skills (remediation-tool, gc-audit, packetdial, memory-dream, human-flow, self-check, impeccable, mailprotector). Skill-description injection ~3320 -> ~2123 tokens (~36% cut); keyword triggers preserved; frontmatter valid.
- Task 7: thinned the `/save` and `/sync` bodies; they point to `sync.sh` as the single source instead of re-documenting its internals. The load-bearing LLM-judgment parts (Phase 0 save-vs-sync, cross-user note display, exit-75 reporting) are kept verbatim. The mechanical sync never depends on an LLM step.
- Task 10 (P3): `session-logs/YYYY-MM/` adopted as a FORWARD convention for new logs (recall = scoped grep over the month folders, no monolithic index); existing flat logs are untouched (grep covers both). The recall order (wiki -> CONTEXT/log -> coord) already lives in CORE.
- Deterministic Bash fix: `now-phoenix.sh` helper added. It uses fixed UTC-7 epoch math, replacing the unreliable `TZ=America/Phoenix date` (which silently returns UTC on Git-Bash), and supports the `--iso`, `--date`, `--datetime`, and `--fmt` formats. `post-bot-alert.sh` already uses `jq -nc --arg` (verified, no change needed).
- Deferred (unchanged): full Python port (separate spec); Task 8 shard command bodies; promote the guard to FATAL after a clean warn window; schedule `memory-dream --apply-safe` per machine.
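Phoenix stays on MST (UTC-7) all year and does not observe daylight saving time, so a fixed offset gives the correct local time without consulting a time-zone database. The sketch below models the idea in Python; it is not the shell helper, and the exact output shapes chosen for each mode are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7: exact for Phoenix because it never switches to daylight time,
# and independent of any TZ-database lookup.
PHOENIX = timezone(timedelta(hours=-7), "MST")


def now_phoenix(epoch: float, mode: str = "iso") -> str:
    """Format a Unix epoch as Phoenix local time via fixed UTC-7 arithmetic.

    'iso', 'date' and 'datetime' loosely mirror the --iso/--date/--datetime
    flags; any other value is treated as a strftime pattern, standing in for --fmt.
    """
    local = datetime.fromtimestamp(epoch, tz=PHOENIX)
    if mode == "iso":
        return local.isoformat(timespec="seconds")
    if mode == "date":
        return local.strftime("%Y-%m-%d")
    if mode == "datetime":
        return local.strftime("%Y-%m-%d %H:%M:%S")
    return local.strftime(mode)
```

For example, `now_phoenix(1_700_000_000)` (22:13:20 UTC on 2023-11-14) yields `2023-11-14T15:13:20-07:00`.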
1.4.1 — 2026-06-08 (Task 12: self-check smoke tests)
- /self-check gained a `harness` category that locks in the 1.4.0 invariants (all checks are read-only):
  - `VERSION` is present and not older than the manifest's `min_version`;
  - skill-registry description budget: the sum of all `SKILL.md` `description:` fields stays under `manifest.harness.registry_desc_budget_chars` (WARN on regrowth; this is the metric that would catch Task 5 bloating back);
  - the global deploy targets `~/.claude/skills` and `~/.claude/commands` are populated (the Mac-wipe failure);
  - `harness-guard.sh` is wired into `sync.sh`;
  - the core scripts parse (`bash -n` on sync/guard/now-phoenix);
  - `now-phoenix.sh` emits a valid date.

  Tunables live in the `harness` block of `baseline/manifest.json`. Verified: 9/9 PASS on this machine; the budget WARN trips correctly on a synthetic over-budget value.
- Also reconciled the remaining "GrepAI first" docs (standard + CODING_GUIDELINES) with the wiki-first recall hierarchy (started in CLAUDE_EXTENDED).
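Two of those checks (VERSION against `min_version`, and the description budget) can be sketched as pure functions. This is a simplified model, not the /self-check code: frontmatter parsing is deliberately naive (single-line `description:` values only), the budget is treated as inclusive, an outdated VERSION is reported as `FAIL` by assumption, and all function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

DESCRIPTION = re.compile(r"^description:\s*(.*)$", re.MULTILINE)


def frontmatter(text: str) -> str:
    """Return the YAML frontmatter between the leading '---' fences, or ''."""
    if not text.startswith("---"):
        return ""
    end = text.find("\n---", 3)
    return text[3:end] if end != -1 else ""


def description_chars(skill_texts: Iterable[str]) -> int:
    """Sum the lengths of single-line `description:` values across SKILL.md texts."""
    total = 0
    for text in skill_texts:
        match = DESCRIPTION.search(frontmatter(text))
        if match:
            total += len(match.group(1).strip())
    return total


def version_tuple(version: str) -> Tuple[int, ...]:
    """'1.4.1' -> (1, 4, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.strip().split("."))


def harness_checks(version: str, min_version: str, desc_chars: int, budget: int) -> Dict[str, str]:
    return {
        "version": "PASS" if version_tuple(version) >= version_tuple(min_version) else "FAIL",
        # Over budget is a WARN, not a failure: it flags description regrowth.
        "desc_budget": "PASS" if desc_chars <= budget else "WARN",
    }
```

Comparing integer tuples matters once a component reaches two digits: as strings, `"1.10.0" < "1.9.0"`.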
1.4.2 — 2026-06-08 (Task 3 leftover: command-restates-standard lint)
- /self-check gained a `consistency` category: the command-restates-standard lint. Deterministic half: for each `manifest.command_standard_links` pair, the standard must still carry its defer-to-SSOT pointer to the owning command; a lost pointer WARNs (the standard has likely drifted back into restating the command, which is the Syncro-timers failure mode). Seeded with the syncro-billing link (`time-entry-protocol.md` -> `/syncro`). The semantic contradiction pass (read both, judge actual conflict) is delegated to the model in `SKILL.md`, mirroring the memory pass. Verified PASS; negative-tested (WARN fires when the pointer is removed). To add new pairs, extend `manifest.command_standard_links`.
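The deterministic half of the lint amounts to a presence check per link. The sketch below assumes a mapping shape for `command_standard_links` (standard filename -> owning command) and treats "carries its pointer" as "mentions the command name". The real lint may look for a specific marker phrase instead, and `lost_pointers` is a hypothetical name.

```python
from typing import List, Mapping


def lost_pointers(links: Mapping[str, str], standards: Mapping[str, str]) -> List[str]:
    """Return WARN lines for standards that no longer point at their owning command.

    links:     standard filename -> owning command, e.g. "time-entry-protocol.md" -> "/syncro"
    standards: standard filename -> current file text (a missing file also WARNs)
    """
    found = []
    for standard, command in links.items():
        text = standards.get(standard, "")
        if command not in text:
            found.append(f"WARN {standard}: lost defer-to-SSOT pointer to {command}")
    return found
```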
1.4.3 — 2026-06-08 (guard FATAL-promotion prerequisite: test matrix + refinement)
- Built `.claude/scripts/test-harness-guard.sh`, a 12-case false-positive/true-positive matrix for `harness-guard.sh` (spins up a throwaway repo, stages synthetic content, runs the REAL guard, asserts WARN/clean). The plan requires it before the guard can be promoted to FATAL.
- The matrix surfaced a false-positive vector: the conflict rule's lone `=======$` alternative fired on a markdown setext underline / divider of exactly seven `=`. REFINED `harness-guard.sh` to require a real hunk, with BOTH `^<<<<<<<` AND `^>>>>>>>` present. This has identical true-positive power (git always writes all three markers) and eliminates the false positive. Verified 12/12 pass; real-tree false-positive surface = 0.
- Wired the matrix into /self-check as `harness.guard_selftest` (it runs in an isolated temp repo, so the read-only-vs-real-tree contract holds). The eventual FATAL flip is now evidence-backed.
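The refined conflict rule can be modeled in Python as follows. The only anchors taken from the changelog are `^<<<<<<<`, `^>>>>>>>`, and the dropped `=======$`. `OLD_RULE` is an approximate reconstruction of the previous alternation, included only to show the false positive, and `has_conflict_hunk` is a hypothetical name, not a function in `harness-guard.sh`.

```python
import re

START = re.compile(r"^<{7}", re.MULTILINE)  # '<<<<<<<' at the start of a line
END = re.compile(r"^>{7}", re.MULTILINE)    # '>>>>>>>' at the start of a line

# Approximation of the old rule: any single marker line, including a bare '======='.
OLD_RULE = re.compile(r"^(<{7}|={7}$|>{7})", re.MULTILINE)


def has_conflict_hunk(text: str) -> bool:
    """Refined rule: flag only when BOTH hunk boundaries are present.

    A lone '=======' line (a setext underline or a divider) no longer triggers.
    """
    return bool(START.search(text)) and bool(END.search(text))
```

A real conflict still trips the refined rule because git writes the opening and closing markers around every hunk, while a setext heading such as `Title` underlined with `=======` matches only `OLD_RULE`.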