Files
claudetools/.claude/skills/self-check/SKILL.md
Mike Swanson b153ff158b feat(self-check): add harness self-diagnosis / fleet conformance skill
New /self-check skill: each machine probes its own ClaudeTools harness wiring
(identity.json paths, required tooling, settings.json hooks, skill/command/script
set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades
RED/AMBER/GREEN against a checked-in provisional baseline manifest.

- Capability-tier model: architectural/OS/hardware differences (e.g. no local
  Ollama) select a fallback ruleset instead of failing.
- Duplicate detection: flags command/skill names that diverge between the repo
  and ~/.claude (the "same /cmd, different behaviour" cross-machine bug);
  CRLF-only diffs ignored.
- Memory check: index + orphan detection, plus a model-driven semantic pass for
  memories that contradict identity/settings.
- V1 is a census tool: --publish writes a per-machine census to coord
  (component selfcheck_<host>); fanout requests the fleet to self-check +
  self-remediate + re-publish; aggregate derives the proposed baseline. No
  machine ever fixes another.

Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the
CRLF false-WARN found and fixed, verified live against the coord API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 14:45:42 -07:00

9.1 KiB

name, description
name description
self-check Self-diagnose a ClaudeTools session's machine: verify the harness is wired the same way as every other instance while allowing for architectural / OS / hardware differences. Checks that identity.json exists and is correct (the map of WHERE things live on this box), required tooling is installed, env/paths resolve, hooks are wired, the skill/command/script set matches the baseline, the vault decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and can publish a census to the coord API so the fleet baseline can be built/refined. Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check", "am I configured right", "is my machine set up correctly", "harness conformance", "fleet conformance", "check my environment", "is everything wired up".

Self-Check — ClaudeTools Harness Self-Diagnosis

A top-to-bottom evaluation of how this machine's ClaudeTools harness is wired, graded against a checked-in baseline manifest so every machine behaves the same way — while explicitly allowing for architecture, OS, and hardware differences via a capability tier model.

This is the skill the user asked for when a session needs to "make sure everything is as it should be."

The model in one paragraph

identity.json is the foundational, per-machine map of where things live and what this box can do (vault path, repo root, platform, arch, python command, Ollama endpoints). The baseline manifest (baseline/manifest.json) declares what every machine must have — required tools, identity fields, scripts, hook files, the wired settings.json hooks, the canonical skill/command set, and the capability rules that say what to do when a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that is also down, route Tier-0 work to haiku instead of blocking). The probe compares the live machine against the manifest, resolves the machine's capability tier, and grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER. Capability differences are never failures — they select a ruleset.

V1 is a CENSUS tool (read this)

There is no ratified fleet baseline yet. baseline/manifest.json is provisional, generated from a single known-good machine (GURU-5070). So V1's job is to gather ground truth from every machine and help Mike build the real baseline:

  1. Probe — each machine runs the check and produces a structured census.
  2. Publish--publish PUTs the census to coord as component selfcheck_<host> (state = grade, notes = full JSON). One row per machine = a live fleet conformance view.
  3. Fan outfanout broadcasts a request to ALL_SESSIONS so every active instance reports.
  4. Aggregateaggregate reads all censuses back and proposes a baseline (tools/skills/commands present on all machines = "required everywhere"; present on some = "capability-gated"), and lists machines with FAILs.

Mike reviews the aggregate and ratifies manifest.json. From then on the same probe enforces conformance. V1 does not auto-fix anything — it reports the exact fix command for each finding (per the decision on record).

Running it

The probe is scripts/self-check.sh (bash; runs on Git Bash/Windows, macOS, Linux; deps: jq + curl). Always pass a real UTC timestamp:

SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  bash .claude/skills/self-check/scripts/self-check.sh <mode>
Mode Purpose
report (default) Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED.
--json Structured census JSON to stdout (for piping).
--publish Run + publish census to coord (component selfcheck_<host>). Softfails to .claude/coord-queue.jsonl.
fanout Broadcast a census request to ALL_SESSIONS.
aggregate Fleet table + proposed-baseline summary from published censuses.

/self-check is the slash-command runner for the same script.

What it checks

Category Checks
identity identity.json exists + valid JSON; all required fields present; claudetools_root exists and equals the running repo; vault_path exists; machine == hostname; git user.name/email match identity.
tooling required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL.
capability ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the Ollama tier (local / remote / none) and prints the effective Tier-0 ruleset.
files required scripts + hook files present and executable.
hooks the three settings.json hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); current-mode present.
git origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not).
skills/commands every skill dir and command file in the baseline is present; extras are reported as census candidates.
duplicates command/skill names present in BOTH the repo and ~/.claude. Divergent content = WARN (the "same /cmd, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored.
memory MEMORY.md index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade.
vault vault repo exists; sops+age present; vault.sh list succeeds (decrypt wired).
connectivity coord API (required), main API + internal Gitea (advisory; off-network is OK).

Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)

The engine's memory check is deterministic and conservative (index + orphans + declared patterns) so it never produces false alarms. A true contradiction check — "does any memory directly contradict what this machine's settings say?" — is a judgment task, so the model does it (route the prose/classification to Ollama Tier-0 per the house rules; Claude reviews the result):

  1. Read identity.json (where things live + this box's capabilities), settings.json (wired hooks/permissions), and baseline/manifest.json.
  2. Read the memory index .claude/memory/MEMORY.md, then open any memory whose one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch assumptions, tool choices, or model routing.
  3. Flag memories that directly contradict this machine's reality, e.g.:
    • prescribes python3/python when identity.python.command is py (or vice-versa),
    • hardcodes a repo/vault path that isn't this machine's claudetools_root/vault_path,
    • names an endpoint/IP that conflicts with identity.coord_api or the manifest,
    • assumes a capability (local Ollama) this machine's tier says is absent.
  4. Report each as: memory file, the contradicting claim, the setting it violates, and a suggested correction. Do not edit memories — surface for the operator (deletions/rewrites go through the human, mirroring memory-dream's posture).

Genuinely machine-specific guidance in a shared memory is the usual culprit — the fix is to scope it ("on Windows…") or split it, not to globally flip it.

Fleet self-remediation loop (machines fix themselves)

We never fix a remote machine. The flow is:

  1. fanout — broadcast asks every instance to self-check + self-fix + re-publish.
  2. Each operator runs /self-check locally, applies the printed fix commands on their own box, re-runs to confirm GREEN, then /self-check --publish.
  3. aggregate — shows who is still RED/AMBER and prints each machine's own fix list. Relay it to that operator; do not run it for them.
  4. Repeat until the fleet is consistently GREEN, then ratify the manifest.

How to interpret a run

After running, summarize for the user:

  • The grade and the PASS/WARN/FAIL/INFO tallies.
  • Each FAIL and WARN with its exact fix command. Do not auto-apply.
  • The capability tier line — confirm the machine knows its Tier-0 fallback.
  • If publishing/aggregating, note how many machines have reported and which are RED.

Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are expected and must never be reported as broken — they are the whole point of the tier model.

Files

.claude/skills/self-check/
  SKILL.md                  this file
  scripts/self-check.sh     the probe engine (report / --json / --publish / fanout / aggregate)
  baseline/manifest.json    the provisional fleet baseline (single source of truth)
  baseline/README.md        the baseline model + how to refine/ratify it
.claude/commands/self-check.md   the /self-check runner

Extending the baseline

When a new tool/skill/command/hook becomes mandatory fleet-wide, edit baseline/manifest.json, commit, and /sync. Every machine's next self-check enforces it. Capability-only tools go in capability_tools with a matching entry in capability_rules describing the fallback. See baseline/README.md.