Files

Mike Swanson b153ff158b feat(self-check): add harness self-diagnosis / fleet conformance skill

New /self-check skill: each machine probes its own ClaudeTools harness wiring
(identity.json paths, required tooling, settings.json hooks, skill/command/script
set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades
RED/AMBER/GREEN against a checked-in provisional baseline manifest.

- Capability-tier model: architectural/OS/hardware differences (e.g. no local
  Ollama) select a fallback ruleset instead of failing.
- Duplicate detection: flags command/skill names that diverge between the repo
  and ~/.claude (the "same /cmd, different behaviour" cross-machine bug);
  CRLF-only diffs ignored.
- Memory check: index + orphan detection, plus a model-driven semantic pass for
  memories that contradict identity/settings.
- V1 is a census tool: --publish writes a per-machine census to coord
  (component selfcheck_<host>); fanout requests the fleet to self-check +
  self-remediate + re-publish; aggregate derives the proposed baseline. No
  machine ever fixes another.

Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the
CRLF false-WARN found and fixed, verified live against the coord API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-02 14:45:42 -07:00

9.1 KiB

Raw Blame History

name, description

name	description
self-check	Self-diagnose a ClaudeTools session's machine: verify the harness is wired the same way as every other instance while allowing for architectural / OS / hardware differences. Checks that identity.json exists and is correct (the map of WHERE things live on this box), required tooling is installed, env/paths resolve, hooks are wired, the skill/command/script set matches the baseline, the vault decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and can publish a census to the coord API so the fleet baseline can be built/refined. Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check", "am I configured right", "is my machine set up correctly", "harness conformance", "fleet conformance", "check my environment", "is everything wired up".

name

description

self-check

Self-diagnose a ClaudeTools session's machine: verify the harness is wired the same way as every other instance while allowing for architectural / OS / hardware differences. Checks that identity.json exists and is correct (the map of WHERE things live on this box), required tooling is installed, env/paths resolve, hooks are wired, the skill/command/script set matches the baseline, the vault decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and can publish a census to the coord API so the fleet baseline can be built/refined. Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check", "am I configured right", "is my machine set up correctly", "harness conformance", "fleet conformance", "check my environment", "is everything wired up".

Self-Check — ClaudeTools Harness Self-Diagnosis

A top-to-bottom evaluation of how this machine's ClaudeTools harness is wired, graded against a checked-in baseline manifest so every machine behaves the same way — while explicitly allowing for architecture, OS, and hardware differences via a capability tier model.

This is the skill the user asked for when a session needs to "make sure everything is as it should be."

The model in one paragraph

identity.json is the foundational, per-machine map of where things live and what this box can do (vault path, repo root, platform, arch, python command, Ollama endpoints). The baseline manifest (baseline/manifest.json) declares what every machine must have — required tools, identity fields, scripts, hook files, the wired settings.json hooks, the canonical skill/command set, and the capability rules that say what to do when a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that is also down, route Tier-0 work to haiku instead of blocking). The probe compares the live machine against the manifest, resolves the machine's capability tier, and grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER. Capability differences are never failures — they select a ruleset.

V1 is a CENSUS tool (read this)

There is no ratified fleet baseline yet. baseline/manifest.json is provisional, generated from a single known-good machine (GURU-5070). So V1's job is to gather ground truth from every machine and help Mike build the real baseline:

Probe — each machine runs the check and produces a structured census.
Publish — --publish PUTs the census to coord as component selfcheck_<host> (state = grade, notes = full JSON). One row per machine = a live fleet conformance view.
Fan out — fanout broadcasts a request to ALL_SESSIONS so every active instance reports.
Aggregate — aggregate reads all censuses back and proposes a baseline (tools/skills/commands present on all machines = "required everywhere"; present on some = "capability-gated"), and lists machines with FAILs.

Mike reviews the aggregate and ratifies manifest.json. From then on the same probe enforces conformance. V1 does not auto-fix anything — it reports the exact fix command for each finding (per the decision on record).

Running it

The probe is scripts/self-check.sh (bash; runs on Git Bash/Windows, macOS, Linux; deps: jq + curl). Always pass a real UTC timestamp:

SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  bash .claude/skills/self-check/scripts/self-check.sh <mode>

Mode	Purpose
`report` (default)	Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED.
`--json`	Structured census JSON to stdout (for piping).
`--publish`	Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`.
`fanout`	Broadcast a census request to ALL_SESSIONS.
`aggregate`	Fleet table + proposed-baseline summary from published censuses.

/self-check is the slash-command runner for the same script.

What it checks

Category	Checks
identity	identity.json exists + valid JSON; all required fields present; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity.
tooling	required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL.
capability	ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the Ollama tier (local / remote / none) and prints the effective Tier-0 ruleset.
files	required scripts + hook files present and executable.
hooks	the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present.
git	origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not).
skills/commands	every skill dir and command file in the baseline is present; extras are reported as census candidates.
duplicates	command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored.
memory	`MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade.
vault	vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired).
connectivity	coord API (required), main API + internal Gitea (advisory; off-network is OK).

Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)

The engine's memory check is deterministic and conservative (index + orphans + declared patterns) so it never produces false alarms. A true contradiction check — "does any memory directly contradict what this machine's settings say?" — is a judgment task, so the model does it (route the prose/classification to Ollama Tier-0 per the house rules; Claude reviews the result):

Read identity.json (where things live + this box's capabilities), settings.json (wired hooks/permissions), and baseline/manifest.json.
Read the memory index .claude/memory/MEMORY.md, then open any memory whose one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch assumptions, tool choices, or model routing.
Flag memories that directly contradict this machine's reality, e.g.:
- prescribes python3/python when identity.python.command is py (or vice-versa),
- hardcodes a repo/vault path that isn't this machine's claudetools_root/vault_path,
- names an endpoint/IP that conflicts with identity.coord_api or the manifest,
- assumes a capability (local Ollama) this machine's tier says is absent.
Report each as: memory file, the contradicting claim, the setting it violates, and a suggested correction. Do not edit memories — surface for the operator (deletions/rewrites go through the human, mirroring memory-dream's posture).

Genuinely machine-specific guidance in a shared memory is the usual culprit — the fix is to scope it ("on Windows…") or split it, not to globally flip it.

Fleet self-remediation loop (machines fix themselves)

We never fix a remote machine. The flow is:

fanout — broadcast asks every instance to self-check + self-fix + re-publish.
Each operator runs /self-check locally, applies the printed fix commands on their own box, re-runs to confirm GREEN, then /self-check --publish.
aggregate — shows who is still RED/AMBER and prints each machine's own fix list. Relay it to that operator; do not run it for them.
Repeat until the fleet is consistently GREEN, then ratify the manifest.

How to interpret a run

After running, summarize for the user:

The grade and the PASS/WARN/FAIL/INFO tallies.
Each FAIL and WARN with its exact fix command. Do not auto-apply.
The capability tier line — confirm the machine knows its Tier-0 fallback.
If publishing/aggregating, note how many machines have reported and which are RED.

Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are expected and must never be reported as broken — they are the whole point of the tier model.

Files

.claude/skills/self-check/
  SKILL.md                  this file
  scripts/self-check.sh     the probe engine (report / --json / --publish / fanout / aggregate)
  baseline/manifest.json    the provisional fleet baseline (single source of truth)
  baseline/README.md        the baseline model + how to refine/ratify it
.claude/commands/self-check.md   the /self-check runner

Extending the baseline

When a new tool/skill/command/hook becomes mandatory fleet-wide, edit baseline/manifest.json, commit, and /sync. Every machine's next self-check enforces it. Capability-only tools go in capability_tools with a matching entry in capability_rules describing the fallback. See baseline/README.md.

9.1 KiB Raw Blame History