New /self-check skill: each machine probes its own ClaudeTools harness wiring (identity.json paths, required tooling, settings.json hooks, skill/command/script set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades RED/AMBER/GREEN against a checked-in provisional baseline manifest. - Capability-tier model: architectural/OS/hardware differences (e.g. no local Ollama) select a fallback ruleset instead of failing. - Duplicate detection: flags command/skill names that diverge between the repo and ~/.claude (the "same /cmd, different behaviour" cross-machine bug); CRLF-only diffs ignored. - Memory check: index + orphan detection, plus a model-driven semantic pass for memories that contradict identity/settings. - V1 is a census tool: --publish writes a per-machine census to coord (component selfcheck_<host>); fanout requests the fleet to self-check + self-remediate + re-publish; aggregate derives the proposed baseline. No machine ever fixes another. Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the CRLF false-WARN found and fixed, verified live against the coord API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
162 lines
9.1 KiB
Markdown
162 lines
9.1 KiB
Markdown
---
|
|
name: self-check
|
|
description: >-
|
|
Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
|
|
same way as every other instance while allowing for architectural / OS / hardware
|
|
differences. Checks that identity.json exists and is correct (the map of WHERE
|
|
things live on this box), required tooling is installed, env/paths resolve,
|
|
hooks are wired, the skill/command/script set matches the baseline, the vault
|
|
decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
|
|
local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
|
|
can publish a census to the coord API so the fleet baseline can be built/refined.
|
|
Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
|
|
"am I configured right", "is my machine set up correctly", "harness conformance",
|
|
"fleet conformance", "check my environment", "is everything wired up".
|
|
---
|
|
|
|
# Self-Check — ClaudeTools Harness Self-Diagnosis
|
|
|
|
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
|
|
graded against a checked-in **baseline manifest** so every machine behaves the
|
|
same way — while explicitly allowing for architecture, OS, and hardware
|
|
differences via a **capability tier** model.
|
|
|
|
This is the skill the user asked for when a session needs to "make sure
|
|
everything is as it should be."
|
|
|
|
## The model in one paragraph
|
|
|
|
`identity.json` is the foundational, per-machine map of **where things live and
|
|
what this box can do** (vault path, repo root, platform, arch, python command,
|
|
Ollama endpoints). The **baseline manifest**
|
|
(`baseline/manifest.json`) declares what *every* machine must have — required
|
|
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
|
|
canonical skill/command set, and the **capability rules** that say what to do when
|
|
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
|
|
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
|
|
the live machine against the manifest, resolves the machine's capability tier, and
|
|
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
|
|
Capability differences are **never** failures — they select a ruleset.
|
|
|
|
## V1 is a CENSUS tool (read this)
|
|
|
|
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
|
|
generated from a single known-good machine (GURU-5070). So V1's job is to gather
|
|
ground truth from every machine and help Mike build the real baseline:
|
|
|
|
1. **Probe** — each machine runs the check and produces a structured census.
|
|
2. **Publish** — `--publish` PUTs the census to coord as component
|
|
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
|
|
a live fleet conformance view.
|
|
3. **Fan out** — `fanout` broadcasts a request to `ALL_SESSIONS` so every active
|
|
instance reports.
|
|
4. **Aggregate** — `aggregate` reads all censuses back and proposes a baseline
|
|
(tools/skills/commands present on *all* machines = "required everywhere";
|
|
present on *some* = "capability-gated"), and lists machines with FAILs.
|
|
|
|
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
|
|
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
|
|
exact fix command for each finding (per the decision on record).
|
|
|
|
## Running it
|
|
|
|
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
|
|
Linux; deps: jq + curl). Always pass a real UTC timestamp:
|
|
|
|
```bash
|
|
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|
bash .claude/skills/self-check/scripts/self-check.sh <mode>
|
|
```
|
|
|
|
| Mode | Purpose |
|
|
|------|---------|
|
|
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
|
|
| `--json` | Structured census JSON to stdout (for piping). |
|
|
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
|
|
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
|
|
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
|
|
|
|
`/self-check` is the slash-command runner for the same script.
|
|
|
|
## What it checks
|
|
|
|
| Category | Checks |
|
|
|----------|--------|
|
|
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
|
|
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
|
|
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
|
|
| **files** | required scripts + hook files present and executable. |
|
|
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
|
|
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
|
|
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
|
|
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
|
|
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
|
|
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
|
|
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
|
|
|
|
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
|
|
|
|
The engine's memory check is deterministic and conservative (index + orphans +
|
|
declared patterns) so it never produces false alarms. A *true* contradiction
|
|
check — "does any memory directly contradict what this machine's settings say?"
|
|
— is a judgment task, so the model does it (route the prose/classification to
|
|
Ollama Tier-0 per the house rules; Claude reviews the result):
|
|
|
|
1. Read `identity.json` (where things live + this box's capabilities),
|
|
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
|
|
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
|
|
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
|
|
assumptions, tool choices, or model routing.
|
|
3. Flag memories that **directly contradict** this machine's reality, e.g.:
|
|
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
|
|
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
|
|
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
|
|
- assumes a capability (local Ollama) this machine's tier says is absent.
|
|
4. Report each as: memory file, the contradicting claim, the setting it violates,
|
|
and a suggested correction. **Do not edit memories** — surface for the operator
|
|
(deletions/rewrites go through the human, mirroring memory-dream's posture).
|
|
|
|
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
|
|
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
|
|
|
|
## Fleet self-remediation loop (machines fix themselves)
|
|
|
|
We never fix a remote machine. The flow is:
|
|
|
|
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
|
|
2. Each operator runs `/self-check` locally, applies the printed fix commands on
|
|
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
|
|
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
|
|
list. Relay it to that operator; do not run it for them.
|
|
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
|
|
|
|
## How to interpret a run
|
|
|
|
After running, summarize for the user:
|
|
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
|
|
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
|
|
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
|
|
- If publishing/aggregating, note how many machines have reported and which are RED.
|
|
|
|
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
|
|
expected and must never be reported as broken — they are the whole point of the
|
|
tier model.
|
|
|
|
## Files
|
|
|
|
```
|
|
.claude/skills/self-check/
|
|
SKILL.md this file
|
|
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
|
|
baseline/manifest.json the provisional fleet baseline (single source of truth)
|
|
baseline/README.md the baseline model + how to refine/ratify it
|
|
.claude/commands/self-check.md the /self-check runner
|
|
```
|
|
|
|
## Extending the baseline
|
|
|
|
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
|
|
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
|
|
enforces it. Capability-only tools go in `capability_tools` with a matching entry
|
|
in `capability_rules` describing the fallback. See `baseline/README.md`.
|