feat(self-check): harness smoke tests lock in the 1.4.0 invariants (VERSION 1.4.1)
Adds a 'harness' category to /self-check (Task 12, self-check half) so the harness- optimization gains can't silently regress. All read-only / non-invasive: - VERSION marker present + not older than manifest.harness.min_version - skill-registry description budget (sum of all SKILL.md description: fields under registry_desc_budget_chars) -- the metric that catches Task 5 bloating back - global deploy targets ~/.claude/skills + ~/.claude/commands populated (Mac-wipe failure) - harness-guard.sh present + wired into sync.sh - core scripts parse (bash -n on sync/guard/now-phoenix); now-phoenix.sh emits a valid date Tunables in baseline/manifest.json 'harness' block. Verified 9/9 PASS; budget WARN negative-tested at a synthetic over-budget value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -46,3 +46,15 @@ or old harness during a heterogeneous rollout. See
|
|||||||
--fmt` formats. `post-bot-alert.sh` already uses `jq -nc --arg` (verified, no change needed).
|
--fmt` formats. `post-bot-alert.sh` already uses `jq -nc --arg` (verified, no change needed).
|
||||||
- Deferred (unchanged): full Python port = separate spec; Task 8 shard command bodies; promote
|
- Deferred (unchanged): full Python port = separate spec; Task 8 shard command bodies; promote
|
||||||
guard to FATAL after a clean warn window; schedule memory-dream --apply-safe per-machine.
|
guard to FATAL after a clean warn window; schedule memory-dream --apply-safe per-machine.
|
||||||
|
|
||||||
|
## 1.4.1 — 2026-06-08 (Task 12: self-check smoke tests)
|
||||||
|
- /self-check gained a `harness` category that locks in the 1.4.0 invariants (all read-only):
|
||||||
|
VERSION present + not older than manifest min_version; **skill-registry description budget**
|
||||||
|
(sum of all SKILL.md description: fields under manifest.harness.registry_desc_budget_chars —
|
||||||
|
WARN on regrowth, the metric that would catch Task 5 bloating back); global deploy targets
|
||||||
|
~/.claude/skills + ~/.claude/commands populated (the Mac-wipe failure); harness-guard.sh wired
|
||||||
|
into sync.sh; core scripts parse (bash -n on sync/guard/now-phoenix); now-phoenix.sh emits a
|
||||||
|
valid date. Tunables live in baseline/manifest.json `harness` block. Verified: 9/9 PASS on this
|
||||||
|
machine; budget WARN trips correctly on a synthetic over-budget value.
|
||||||
|
- Also reconciled the remaining "GrepAI first" docs (standard + CODING_GUIDELINES) with the
|
||||||
|
wiki-first recall hierarchy (started in CLAUDE_EXTENDED).
|
||||||
|
|||||||
@@ -1 +1 @@
|
|||||||
1.4.0
|
1.4.1
|
||||||
|
|||||||
@@ -81,6 +81,7 @@ SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|||||||
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
|
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
|
||||||
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
|
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
|
||||||
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
|
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
|
||||||
|
| **harness** | the 1.4.0 invariants (read-only): VERSION marker present + not older than `manifest.harness.min_version`; **skill-registry description budget** (sum of all SKILL.md `description:` fields under `registry_desc_budget_chars` — WARN on regrowth); global deploy targets `~/.claude/skills` + `~/.claude/commands` populated (the "Mac wiped global skills" failure); `harness-guard.sh` present + wired into `sync.sh`; core scripts parse (`bash -n` on sync/guard/now-phoenix); `now-phoenix.sh --date` emits a valid date. Budget/min-version/script-list are tunable in `manifest.harness`. |
|
||||||
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
|
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
|
||||||
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
|
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
|
||||||
|
|
||||||
|
|||||||
@@ -5,6 +5,19 @@
|
|||||||
"derived_at": "2026-06-02",
|
"derived_at": "2026-06-02",
|
||||||
"note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.",
|
"note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.",
|
||||||
|
|
||||||
|
"harness": {
|
||||||
|
"min_version": "1.4.0",
|
||||||
|
"version_file": ".claude/harness/VERSION",
|
||||||
|
"registry_desc_budget_chars": 10500,
|
||||||
|
"registry_desc_why": "Sum of all skill SKILL.md description: fields. These inject into EVERY session's skill registry, so a bloated description is a fleet-wide context tax. Budget set at the 1.4.0 post-trim size (~8.7k) + headroom; a WARN here means a skill description grew back and should be re-trimmed (move triggers/examples into the SKILL.md body).",
|
||||||
|
"syntax_check_scripts": [
|
||||||
|
".claude/scripts/sync.sh",
|
||||||
|
".claude/scripts/harness-guard.sh",
|
||||||
|
".claude/scripts/now-phoenix.sh"
|
||||||
|
],
|
||||||
|
"guard_wired_in": ".claude/scripts/sync.sh"
|
||||||
|
},
|
||||||
|
|
||||||
"required_tools": [
|
"required_tools": [
|
||||||
{ "name": "bash", "why": "hooks, scripts, sync, vault wrapper" },
|
{ "name": "bash", "why": "hooks, scripts, sync, vault wrapper" },
|
||||||
{ "name": "git", "why": "repo + submodules + Gitea sync" },
|
{ "name": "git", "why": "repo + submodules + Gitea sync" },
|
||||||
|
|||||||
@@ -546,6 +546,121 @@ check_memory() {
|
|||||||
fi
|
fi
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CHECK: harness smoke tests (the 1.4.0 invariants).
|
||||||
|
# Locks in the harness-optimization gains so they can't silently regress:
|
||||||
|
# - VERSION marker present and not older than the manifest's min_version
|
||||||
|
# - skill-registry description budget (a bloated description taxes EVERY session)
|
||||||
|
# - global deploy targets populated (the "Mac wiped ~/.claude/skills" failure)
|
||||||
|
# - guard wired into sync.sh + executable
|
||||||
|
# - core scripts parse (bash -n) and now-phoenix.sh emits a valid date
|
||||||
|
# All read-only / non-invasive: no commits, no pushes, only a parse + a clock read.
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
ver_ge() { # $1 >= $2 -> echoes 1/0 (portable dotted-numeric compare)
|
||||||
|
awk -v a="$1" -v b="$2" 'BEGIN{
|
||||||
|
na=split(a,A,"."); nb=split(b,B,"."); n=(na>nb?na:nb);
|
||||||
|
for(i=1;i<=n;i++){x=(i<=na?A[i]+0:0); y=(i<=nb?B[i]+0:0);
|
||||||
|
if(x>y){print 1; exit} if(x<y){print 0; exit}}
|
||||||
|
print 1}'
|
||||||
|
}
|
||||||
|
check_harness_smoke() {
|
||||||
|
local hv_file budget guard_in
|
||||||
|
hv_file="$(jq -r '.harness.version_file // ".claude/harness/VERSION"' "$MANIFEST")"
|
||||||
|
budget="$(jq -r '.harness.registry_desc_budget_chars // 10500' "$MANIFEST")"
|
||||||
|
guard_in="$(jq -r '.harness.guard_wired_in // ".claude/scripts/sync.sh"' "$MANIFEST")"
|
||||||
|
local minver; minver="$(jq -r '.harness.min_version // empty' "$MANIFEST")"
|
||||||
|
|
||||||
|
# 1. VERSION marker present + not older than min_version
|
||||||
|
local vpath="$REPO_ROOT/$hv_file" have
|
||||||
|
if [ ! -f "$vpath" ]; then
|
||||||
|
emit harness.version harness FAIL "harness VERSION marker missing: $hv_file" \
|
||||||
|
"Restore via /sync (git pull); this machine may be on a pre-1.3.0 harness"
|
||||||
|
else
|
||||||
|
have="$(tr -d '[:space:]' < "$vpath")"
|
||||||
|
if [ -n "$minver" ] && [ "$(ver_ge "$have" "$minver")" != "1" ]; then
|
||||||
|
emit harness.version harness WARN "harness VERSION $have is older than baseline min $minver" \
|
||||||
|
"Run /sync to pull the current harness; behavior may differ from the fleet"
|
||||||
|
else
|
||||||
|
emit harness.version harness PASS "harness VERSION $have (>= min ${minver:-n/a})"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 2. Skill-registry description budget (sum of all SKILL.md description: fields)
|
||||||
|
local total=0 f n=0 sub
|
||||||
|
for f in "$REPO_ROOT"/.claude/skills/*/SKILL.md; do
|
||||||
|
[ -f "$f" ] || continue
|
||||||
|
sub="$(awk '
|
||||||
|
/^---[ \t]*$/ { d++; next }
|
||||||
|
d==1 {
|
||||||
|
if ($0 ~ /^[A-Za-z0-9_-]+:/ && $0 !~ /^description:/) indesc=0
|
||||||
|
if ($0 ~ /^description:/) indesc=1
|
||||||
|
if (indesc) total += length($0)
|
||||||
|
}
|
||||||
|
d>=2 { print total+0; exit }
|
||||||
|
END { if (d<2) print total+0 }' "$f")"
|
||||||
|
total=$((total + ${sub:-0})); n=$((n+1))
|
||||||
|
done
|
||||||
|
if [ "$total" -gt "$budget" ] 2>/dev/null; then
|
||||||
|
emit harness.registry_budget harness WARN \
|
||||||
|
"skill-registry descriptions = $total chars over budget $budget ($n skills) — registry bloat taxes every session" \
|
||||||
|
"Trim the longest skill description(s) to one line; move triggers/examples into the SKILL.md body"
|
||||||
|
else
|
||||||
|
emit harness.registry_budget harness PASS "skill-registry descriptions $total/$budget chars ($n skills)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 3. Global deploy targets populated (Phase 5b/5c deploy; empty = the Mac-wipe failure)
|
||||||
|
local gs gc
|
||||||
|
gs="$(ls -1d "$HOME"/.claude/skills/*/ 2>/dev/null | wc -l | tr -d '[:space:]')"
|
||||||
|
gc="$(ls -1 "$HOME"/.claude/commands/*.md 2>/dev/null | wc -l | tr -d '[:space:]')"
|
||||||
|
if [ "${gs:-0}" -eq 0 ] 2>/dev/null; then
|
||||||
|
emit harness.deploy_skills harness WARN "~/.claude/skills is EMPTY (global skill deploy missing)" \
|
||||||
|
"Run /sync — sync.sh Phase 5c redeploys skills to ~/.claude/skills"
|
||||||
|
else
|
||||||
|
emit harness.deploy_skills harness PASS "~/.claude/skills populated ($gs skills)"
|
||||||
|
fi
|
||||||
|
if [ "${gc:-0}" -eq 0 ] 2>/dev/null; then
|
||||||
|
emit harness.deploy_commands harness WARN "~/.claude/commands has no .md files (global command deploy missing)" \
|
||||||
|
"Run /sync — sync.sh Phase 5b redeploys commands to ~/.claude/commands"
|
||||||
|
else
|
||||||
|
emit harness.deploy_commands harness PASS "~/.claude/commands populated ($gc commands)"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 4. Guard wired into sync.sh + executable
|
||||||
|
local guard="$REPO_ROOT/.claude/scripts/harness-guard.sh"
|
||||||
|
if [ ! -f "$guard" ]; then
|
||||||
|
emit harness.guard harness WARN "harness-guard.sh missing" "Restore via /sync"
|
||||||
|
elif ! grep -q "harness-guard" "$REPO_ROOT/$guard_in" 2>/dev/null; then
|
||||||
|
emit harness.guard harness WARN "harness-guard.sh not wired into $guard_in (pre-commit guard inactive)" \
|
||||||
|
"Re-check $guard_in: it should call harness-guard.sh before commit"
|
||||||
|
else
|
||||||
|
emit harness.guard harness PASS "harness-guard.sh present + wired into $guard_in"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 5. Core scripts parse (bash -n) + now-phoenix emits a valid date
|
||||||
|
local s sp
|
||||||
|
for s in $(jq -r '.harness.syntax_check_scripts[]? // empty' "$MANIFEST"); do
|
||||||
|
sp="$REPO_ROOT/$s"
|
||||||
|
if [ ! -f "$sp" ]; then
|
||||||
|
emit "harness.syntax.$(basename "$s")" harness WARN "script missing: $s" "Restore via /sync"
|
||||||
|
elif bash -n "$sp" 2>/dev/null; then
|
||||||
|
emit "harness.syntax.$(basename "$s")" harness PASS "parses clean: $s"
|
||||||
|
else
|
||||||
|
emit "harness.syntax.$(basename "$s")" harness FAIL "SYNTAX ERROR in $s (bash -n failed)" \
|
||||||
|
"Fix the parse error: bash -n \"$s\""
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
local nph="$REPO_ROOT/.claude/scripts/now-phoenix.sh" out
|
||||||
|
if [ -f "$nph" ]; then
|
||||||
|
out="$(bash "$nph" --date 2>/dev/null)"
|
||||||
|
if echo "$out" | grep -qE '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; then
|
||||||
|
emit harness.now_phoenix harness PASS "now-phoenix.sh --date OK ($out)"
|
||||||
|
else
|
||||||
|
emit harness.now_phoenix harness WARN "now-phoenix.sh --date returned unexpected output: '$out'" \
|
||||||
|
"Check .claude/scripts/now-phoenix.sh"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Build the census JSON from accumulated results
|
# Build the census JSON from accumulated results
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -730,6 +845,7 @@ check_git
|
|||||||
check_skills_commands
|
check_skills_commands
|
||||||
check_duplicates
|
check_duplicates
|
||||||
check_memory
|
check_memory
|
||||||
|
check_harness_smoke
|
||||||
check_vault
|
check_vault
|
||||||
check_connectivity
|
check_connectivity
|
||||||
|
|
||||||
|
|||||||
@@ -40,6 +40,10 @@
|
|||||||
left in place — recall grep covers both layouts; NO mass migration (would break wiki `source_logs`
|
left in place — recall grep covers both layouts; NO mass migration (would break wiki `source_logs`
|
||||||
refs for low payoff).
|
refs for low payoff).
|
||||||
- **P1+P2+P3 COMPLETE (2026-06-08, VERSION 1.4.0).**
|
- **P1+P2+P3 COMPLETE (2026-06-08, VERSION 1.4.0).**
|
||||||
|
- [DONE] Task 12 (self-check half) — `/self-check` now has a `harness` category enforcing the
|
||||||
|
1.4.0 invariants (VERSION/min-version, skill-registry description budget, global deploy targets
|
||||||
|
populated, guard wired, core scripts parse, now-phoenix valid). Read-only; 9/9 PASS; budget WARN
|
||||||
|
negative-tested. Tunables in `self-check/baseline/manifest.json` `harness` block. (VERSION 1.4.1.)
|
||||||
- REMAINING (ops follow-ups, not blocking):
|
- REMAINING (ops follow-ups, not blocking):
|
||||||
- Promote the warn-only guard (Task 4) to FATAL after a clean warn window (check
|
- Promote the warn-only guard (Task 4) to FATAL after a clean warn window (check
|
||||||
`.claude/harness/guard.log` across the fleet).
|
`.claude/harness/guard.log` across the fleet).
|
||||||
|
|||||||
Reference in New Issue
Block a user