Files
claudetools/.claude/skills/self-check/SKILL.md
Mike Swanson b153ff158b feat(self-check): add harness self-diagnosis / fleet conformance skill
New /self-check skill: each machine probes its own ClaudeTools harness wiring
(identity.json paths, required tooling, settings.json hooks, skill/command/script
set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades
RED/AMBER/GREEN against a checked-in provisional baseline manifest.

- Capability-tier model: architectural/OS/hardware differences (e.g. no local
  Ollama) select a fallback ruleset instead of failing.
- Duplicate detection: flags command/skill names that diverge between the repo
  and ~/.claude (the "same /cmd, different behaviour" cross-machine bug);
  CRLF-only diffs ignored.
- Memory check: index + orphan detection, plus a model-driven semantic pass for
  memories that contradict identity/settings.
- V1 is a census tool: --publish writes a per-machine census to coord
  (component selfcheck_<host>); fanout requests the fleet to self-check +
  self-remediate + re-publish; aggregate derives the proposed baseline. No
  machine ever fixes another.

Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the
CRLF false-WARN found and fixed, verified live against the coord API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 14:45:42 -07:00

162 lines
9.1 KiB
Markdown

---
name: self-check
description: >-
Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
same way as every other instance while allowing for architectural / OS / hardware
differences. Checks that identity.json exists and is correct (the map of WHERE
things live on this box), required tooling is installed, env/paths resolve,
hooks are wired, the skill/command/script set matches the baseline, the vault
decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
can publish a census to the coord API so the fleet baseline can be built/refined.
Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
"am I configured right", "is my machine set up correctly", "harness conformance",
"fleet conformance", "check my environment", "is everything wired up".
---
# Self-Check — ClaudeTools Harness Self-Diagnosis
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
graded against a checked-in **baseline manifest** so every machine behaves the
same way — while explicitly allowing for architecture, OS, and hardware
differences via a **capability tier** model.
This is the skill the user asked for when a session needs to "make sure
everything is as it should be."
## The model in one paragraph
`identity.json` is the foundational, per-machine map of **where things live and
what this box can do** (vault path, repo root, platform, arch, python command,
Ollama endpoints). The **baseline manifest**
(`baseline/manifest.json`) declares what *every* machine must have — required
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
canonical skill/command set, and the **capability rules** that say what to do when
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
the live machine against the manifest, resolves the machine's capability tier, and
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
Capability differences are **never** failures — they select a ruleset.
## V1 is a CENSUS tool (read this)
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
generated from a single known-good machine (GURU-5070). So V1's job is to gather
ground truth from every machine and help Mike build the real baseline:
1. **Probe** — each machine runs the check and produces a structured census.
2. **Publish**`--publish` PUTs the census to coord as component
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
a live fleet conformance view.
3. **Fan out**`fanout` broadcasts a request to `ALL_SESSIONS` so every active
instance reports.
4. **Aggregate**`aggregate` reads all censuses back and proposes a baseline
(tools/skills/commands present on *all* machines = "required everywhere";
present on *some* = "capability-gated"), and lists machines with FAILs.
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
exact fix command for each finding (per the decision on record).
## Running it
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
Linux; deps: jq + curl). Always pass a real UTC timestamp:
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
bash .claude/skills/self-check/scripts/self-check.sh <mode>
```
| Mode | Purpose |
|------|---------|
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
| `--json` | Structured census JSON to stdout (for piping). |
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
`/self-check` is the slash-command runner for the same script.
## What it checks
| Category | Checks |
|----------|--------|
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
| **files** | required scripts + hook files present and executable. |
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
The engine's memory check is deterministic and conservative (index + orphans +
declared patterns) so it never produces false alarms. A *true* contradiction
check — "does any memory directly contradict what this machine's settings say?"
— is a judgment task, so the model does it (route the prose/classification to
Ollama Tier-0 per the house rules; Claude reviews the result):
1. Read `identity.json` (where things live + this box's capabilities),
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
assumptions, tool choices, or model routing.
3. Flag memories that **directly contradict** this machine's reality, e.g.:
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
- assumes a capability (local Ollama) this machine's tier says is absent.
4. Report each as: memory file, the contradicting claim, the setting it violates,
and a suggested correction. **Do not edit memories** — surface for the operator
(deletions/rewrites go through the human, mirroring memory-dream's posture).
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
## Fleet self-remediation loop (machines fix themselves)
We never fix a remote machine. The flow is:
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
2. Each operator runs `/self-check` locally, applies the printed fix commands on
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
list. Relay it to that operator; do not run it for them.
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
## How to interpret a run
After running, summarize for the user:
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
- If publishing/aggregating, note how many machines have reported and which are RED.
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
expected and must never be reported as broken — they are the whole point of the
tier model.
## Files
```
.claude/skills/self-check/
SKILL.md this file
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
baseline/manifest.json the provisional fleet baseline (single source of truth)
baseline/README.md the baseline model + how to refine/ratify it
.claude/commands/self-check.md the /self-check runner
```
## Extending the baseline
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
enforces it. Capability-only tools go in `capability_tools` with a matching entry
in `capability_rules` describing the fallback. See `baseline/README.md`.