feat(self-check): add harness self-diagnosis / fleet conformance skill

New /self-check skill: each machine probes its own ClaudeTools harness wiring
(identity.json paths, required tooling, settings.json hooks, skill/command/script
set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades
RED/AMBER/GREEN against a checked-in provisional baseline manifest.

- Capability-tier model: architectural/OS/hardware differences (e.g. no local
  Ollama) select a fallback ruleset instead of failing.
- Duplicate detection: flags command/skill names that diverge between the repo
  and ~/.claude (the "same /cmd, different behaviour" cross-machine bug);
  CRLF-only diffs ignored.
- Memory check: index + orphan detection, plus a model-driven semantic pass for
  memories that contradict identity/settings.
- V1 is a census tool: --publish writes a per-machine census to coord
  (component selfcheck_<host>); fanout requests the fleet to self-check +
  self-remediate + re-publish; aggregate derives the proposed baseline. No
  machine ever fixes another.

Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the
CRLF false-WARN found and fixed, verified live against the coord API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-02 14:45:42 -07:00
parent bf7079383f
commit b153ff158b
5 changed files with 1142 additions and 0 deletions

View File

@@ -0,0 +1,53 @@
---
name: self-check
description: Self-diagnose this ClaudeTools machine's harness wiring (tools, identity, hooks, skills, commands, scripts, capability tier, connectivity) against the fleet baseline, and optionally publish the census to coord.
---
# /self-check — ClaudeTools Harness Self-Diagnosis
Runs the conformance probe in `.claude/skills/self-check/scripts/self-check.sh`,
which grades this machine RED/AMBER/GREEN against the provisional baseline manifest
and accounts for capability differences (e.g. no local Ollama -> remote/none ruleset).
See `.claude/skills/self-check/SKILL.md` for the full model. This command is a thin
runner — invoke the **self-check** skill for interpretation and follow-up.
## Run it
Always pass a real UTC timestamp via `SELFCHECK_TS` (the script falls back to
`date` if available, but pass it explicitly for correctness):
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
bash .claude/skills/self-check/scripts/self-check.sh report
```
Modes:
| Arg | What it does |
|-----|--------------|
| `report` (default) | Human-readable RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
| `--json` | Structured census JSON only (for piping/aggregation). |
| `--publish` | Run + PUT the census to coord as component `selfcheck_<host>`. |
| `fanout` | Broadcast a self-check + self-remediate + re-publish request to ALL_SESSIONS. |
| `aggregate` | Read every machine's published census; print fleet table + proposed baseline. |
## Typical flows
**Just check this box:**
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh report
```
**Build the fleet baseline (V1 census):**
```bash
# 1) From any machine, request the fleet to report:
bash .claude/skills/self-check/scripts/self-check.sh fanout
# 2) Each instance (incl. this one) publishes:
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh --publish
# 3) Aggregate what came back:
bash .claude/skills/self-check/scripts/self-check.sh aggregate
```
After running, summarize the grade and the FAIL/WARN items for the user with the
exact fix commands. Do NOT auto-apply fixes — V1 is report-only by design.

View File

@@ -0,0 +1,161 @@
---
name: self-check
description: >-
Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
same way as every other instance while allowing for architectural / OS / hardware
differences. Checks that identity.json exists and is correct (the map of WHERE
things live on this box), required tooling is installed, env/paths resolve,
hooks are wired, the skill/command/script set matches the baseline, the vault
decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
can publish a census to the coord API so the fleet baseline can be built/refined.
Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
"am I configured right", "is my machine set up correctly", "harness conformance",
"fleet conformance", "check my environment", "is everything wired up".
---
# Self-Check — ClaudeTools Harness Self-Diagnosis
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
graded against a checked-in **baseline manifest** so every machine behaves the
same way — while explicitly allowing for architecture, OS, and hardware
differences via a **capability tier** model.
This is the skill the user asked for when a session needs to "make sure
everything is as it should be."
## The model in one paragraph
`identity.json` is the foundational, per-machine map of **where things live and
what this box can do** (vault path, repo root, platform, arch, python command,
Ollama endpoints). The **baseline manifest**
(`baseline/manifest.json`) declares what *every* machine must have — required
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
canonical skill/command set, and the **capability rules** that say what to do when
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
the live machine against the manifest, resolves the machine's capability tier, and
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
Capability differences are **never** failures — they select a ruleset.
## V1 is a CENSUS tool (read this)
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
generated from a single known-good machine (GURU-5070). So V1's job is to gather
ground truth from every machine and help Mike build the real baseline:
1. **Probe** — each machine runs the check and produces a structured census.
2. **Publish**`--publish` PUTs the census to coord as component
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
a live fleet conformance view.
3. **Fan out**`fanout` broadcasts a request to `ALL_SESSIONS` so every active
instance reports.
4. **Aggregate**`aggregate` reads all censuses back and proposes a baseline
(tools/skills/commands present on *all* machines = "required everywhere";
present on *some* = "capability-gated"), and lists machines with FAILs.
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
exact fix command for each finding (per the decision on record).
## Running it
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
Linux; deps: jq + curl). Always pass a real UTC timestamp:
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
bash .claude/skills/self-check/scripts/self-check.sh <mode>
```
| Mode | Purpose |
|------|---------|
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
| `--json` | Structured census JSON to stdout (for piping). |
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
`/self-check` is the slash-command runner for the same script.
## What it checks
| Category | Checks |
|----------|--------|
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
| **files** | required scripts + hook files present and executable. |
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
The engine's memory check is deterministic and conservative (index + orphans +
declared patterns) so it never produces false alarms. A *true* contradiction
check — "does any memory directly contradict what this machine's settings say?"
— is a judgment task, so the model does it (route the prose/classification to
Ollama Tier-0 per the house rules; Claude reviews the result):
1. Read `identity.json` (where things live + this box's capabilities),
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
assumptions, tool choices, or model routing.
3. Flag memories that **directly contradict** this machine's reality, e.g.:
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
- assumes a capability (local Ollama) this machine's tier says is absent.
4. Report each as: memory file, the contradicting claim, the setting it violates,
and a suggested correction. **Do not edit memories** — surface for the operator
(deletions/rewrites go through the human, mirroring memory-dream's posture).
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
## Fleet self-remediation loop (machines fix themselves)
We never fix a remote machine. The flow is:
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
2. Each operator runs `/self-check` locally, applies the printed fix commands on
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
list. Relay it to that operator; do not run it for them.
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
## How to interpret a run
After running, summarize for the user:
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
- If publishing/aggregating, note how many machines have reported and which are RED.
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
expected and must never be reported as broken — they are the whole point of the
tier model.
## Files
```
.claude/skills/self-check/
SKILL.md this file
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
baseline/manifest.json the provisional fleet baseline (single source of truth)
baseline/README.md the baseline model + how to refine/ratify it
.claude/commands/self-check.md the /self-check runner
```
## Extending the baseline
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
enforces it. Capability-only tools go in `capability_tools` with a matching entry
in `capability_rules` describing the fallback. See `baseline/README.md`.

View File

@@ -0,0 +1,68 @@
# Self-Check Baseline
`manifest.json` is the **single source of truth** for how a ClaudeTools machine
should be wired. Every machine's `/self-check` grades itself against this file.
It is checked into the repo and syncs to all workstations via Gitea, so updating
it here updates the standard for the whole fleet.
## Why a checked-in manifest (the "how do we set a baseline" answer)
The fleet has architectural differences — Windows/macOS/Linux, amd64/arm64, boxes
with and without a local GPU for Ollama. We still want every machine to *behave*
the same. A checked-in manifest gives us that:
- **Required everywhere** lives in `required_*` keys. Missing = a real failure on
any machine, regardless of OS.
- **Allowed-to-differ** lives in `capability_tools` + `capability_rules`. A capability
being absent is never a failure — it selects a *fallback ruleset* so the machine
knows how to behave without it (the canonical example: no local Ollama → use the
remote endpoint; if that is down too, route Tier-0 work to haiku instead of
blocking on it).
`identity.json` (per-machine, gitignored) carries the machine's actual coordinates
and capabilities. The manifest says what's required; identity says where things are
and what this box can do; the probe reconciles the two.
## How the baseline gets built (V1 census → ratified baseline)
The current `manifest.json` is **provisional** — generated from one known-good
machine (GURU-5070, 2026-06-02). It has not been confirmed against the fleet.
The build sequence:
1. `self-check.sh fanout` — ask every active instance to report.
2. Each machine runs `self-check.sh --publish` — publishes its census to coord as
component `selfcheck_<host>`.
3. `self-check.sh aggregate` — reads them all back and proposes:
- items present on **all** reporting machines → **required everywhere** candidates,
- items present on **some** machines → **capability-gated** candidates.
4. Mike reviews the proposal and edits this `manifest.json` to ratify it, then
flips `"status": "provisional"` to `"ratified"` and commits.
Until ratified, treat "extra"/"missing" skill and command findings as **candidates**,
not gospel.
## Manifest structure
| Key | Meaning |
|-----|---------|
| `schema_version`, `status` | manifest version; `provisional` or `ratified`. |
| `required_tools[]` | tools every machine must have on PATH (`name`, `why`). Missing = FAIL. |
| `required_python.any_of[]` | acceptable python launchers (`py`/`python3`/`python`). |
| `capability_tools[]` | optional tools; presence is INFO and toggles a `capability`. |
| `required_identity_fields[]` | dotted identity.json paths that must be set. |
| `required_scripts[]`, `required_hook_files[]` | repo files that must exist (+ executable). |
| `required_settings_hooks[]` | hooks that must be wired in settings.json (`event`, `command_contains`, `why`). |
| `git` | expected remote host, internal IP, post-commit-hook expectation. |
| `skills[]`, `commands[]` | the canonical skill dirs / command files. |
| `connectivity[]` | endpoints to probe (`required` ones FAIL if unreachable). |
| `capability_rules{}` | per-capability fallback behavior (the "different rules" for different hardware). |
## Editing rules
- Add a **required** item only when it should hold on *every* machine, every OS.
If a Windows box legitimately can't have it, it's a capability, not a requirement.
- Every `capability_tools` entry needs a story for what to do without it — encode
it in `capability_rules` (or reference an existing tier) so the probe can print
the effective ruleset.
- Keep paths repo-relative and forward-slashed.
- After editing: commit + `/sync`. The fleet picks it up on next pull.

View File

@@ -0,0 +1,114 @@
{
"schema_version": "1.0.0",
"status": "provisional",
"derived_from": "GURU-5070",
"derived_at": "2026-06-02",
"note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.",
"required_tools": [
{ "name": "bash", "why": "hooks, scripts, sync, vault wrapper" },
{ "name": "git", "why": "repo + submodules + Gitea sync" },
{ "name": "jq", "why": "every hook and coord script parses JSON with jq" },
{ "name": "curl", "why": "coord API, vault, RMM, all HTTP calls" },
{ "name": "sops", "why": "vault decryption (SOPS)" },
{ "name": "age", "why": "SOPS age recipient/decrypt" },
{ "name": "ssh", "why": "infra access; must be system OpenSSH" }
],
"required_python": {
"any_of": ["py", "python3", "python"],
"why": "JSON sanitizer in check-messages.sh, identity migration, skill scripts. The resolved command is recorded in identity.json (.python.command)."
},
"capability_tools": [
{ "name": "ollama", "capability": "ollama_local", "why": "Tier-0 local inference (prose/classification)" },
{ "name": "cargo", "capability": "rust_build", "why": "GuruRMM / GuruConnect Rust builds" },
{ "name": "node", "capability": "node_build", "why": "dashboard / TS builds" },
{ "name": "gh", "capability": "github_cli", "why": "optional GitHub operations" },
{ "name": "docker", "capability": "containers", "why": "optional container workflows" },
{ "name": "op", "capability": "onepassword_cli","why": "1Password fallback credential access" }
],
"required_identity_fields": [
"user", "full_name", "email", "role", "machine",
"vault_path", "claudetools_root", "platform", "architecture",
"python.command", "ollama.endpoint", "ollama.fallback", "ollama.prose_model"
],
"optional_identity_fields": ["coord_api", "last_updated"],
"required_scripts": [
".claude/scripts/vault.sh",
".claude/scripts/sync.sh",
".claude/scripts/sync-memory.sh",
".claude/scripts/check-messages.sh",
".claude/scripts/migrate-identity.sh"
],
"required_hook_files": [
".claude/hooks/block-backslash-winpath.sh",
".claude/hooks/post-commit.template"
],
"required_settings_hooks": [
{ "event": "PreToolUse", "matcher": "Bash", "command_contains": "block-backslash-winpath.sh", "why": "blocks garbled backslash Windows-path redirects in Git Bash" },
{ "event": "UserPromptSubmit", "matcher": "", "command_contains": "check-messages.sh", "why": "injects unread coord messages + dev-mode locks each prompt" },
{ "event": "SessionStart", "matcher": "", "command_contains": "sync-memory.sh", "why": "pulls shared memory at session start" }
],
"git": {
"remote_host_contains": "git.azcomputerguru.com",
"remote_host_internal_ip": "172.16.3.20",
"remote_note": "On-network machines should use the internal Gitea IP (172.16.3.20:3000) to bypass NPM SSL-renewal blips; off-network may use the domain. Either is acceptable; a non-ACG remote is a FAIL.",
"post_commit_hook_expected": true,
"post_commit_hook_note": "HOOKS.md mandates the dev-alerts post-commit hook in the main repo and each initialized submodule. Missing = AMBER (informational; reinstall from .claude/hooks/post-commit.template)."
},
"skills": [
"1password", "b2", "bitdefender", "frontend-design", "gc-audit",
"impeccable", "memory-dream", "remediation-tool", "rmm-audit",
"skill-creator", "stop-slop", "theme-factory", "self-check"
],
"commands": [
"1password", "autotask", "checkpoint", "context", "create-spec",
"feature-request", "forum-post", "gc-feature-request", "import",
"inject-standards", "mailbox", "mode", "recover", "remediation-tool",
"rmm", "save", "scc", "shape-spec", "sync", "syncro-emergency-billing",
"syncro", "wiki-compile", "wiki-lint", "self-check"
],
"connectivity": [
{ "name": "coord_api", "url": "http://172.16.3.30:8001/api/coord/status", "required": true, "why": "live coordination source of truth" },
{ "name": "claudetools_api","url": "http://172.16.3.30:8001/health", "required": false, "why": "main API health" },
{ "name": "gitea_internal", "url": "http://172.16.3.20:3000", "required": false, "why": "internal Gitea (git/API on-network)" }
],
"memory": {
"note": "Deterministic memory checks: MEMORY.md index exists + no orphaned memory files, plus the contradiction_patterns below. A pattern fires ONLY on machines where identity.<when_field> == when_equals, so it flags a memory only where it is actually a contradiction for THIS box. Kept empty in V1 to avoid false positives; the real semantic contradiction analysis (memories vs identity.json + settings.json + this manifest) is done by the model per SKILL.md, optionally via Ollama Tier-0.",
"pattern_schema": {
"when_field": "dotted identity.json path, e.g. python.command",
"when_equals": "value that makes the grep a contradiction, e.g. python3",
"grep": "ERE matched case-insensitively against memory files",
"why": "human explanation shown in the finding"
},
"contradiction_patterns": []
},
"capability_rules": {
"ollama_local": {
"tier0_engine": "local ollama (localhost:11434) for summarize/classify/extract/draft",
"detect": "curl localhost:11434/api/tags reachable",
"fallback_if_unavailable": "ollama_remote"
},
"ollama_remote": {
"tier0_engine": "remote ollama via identity.ollama.fallback (Beast over Tailscale)",
"detect": "localhost unreachable but identity.ollama.fallback reachable",
"fallback_if_unavailable": "ollama_none"
},
"ollama_none": {
"tier0_engine": "NONE - no Tier-0 path. Route low-stakes prose/classification to Tier-1 (haiku) instead of Ollama. Do NOT block work waiting on Ollama.",
"detect": "neither localhost nor fallback reachable",
"fallback_if_unavailable": null
}
}
}

View File

@@ -0,0 +1,746 @@
#!/usr/bin/env bash
# self-check.sh - ClaudeTools harness self-diagnosis / fleet conformance probe.
#
# V1 is a CENSUS tool. Each machine probes its own harness wiring (tools,
# identity, hooks, skills, commands, scripts, connectivity, capability tier),
# grades what it can against the provisional baseline manifest, and can publish
# the result to the coord API so the fleet can be compared and the baseline
# refined from real data. See ../SKILL.md and ../baseline/README.md.
#
# Usage:
# self-check.sh Run checks, print a human report. (default)
# self-check.sh --json Emit the structured census JSON to stdout only.
# self-check.sh --publish Run checks, then PUT the census to coord (component selfcheck_<host>).
# self-check.sh fanout Broadcast a request to ALL_SESSIONS to run /self-check --publish.
# self-check.sh aggregate Read every machine's published census and print a fleet table
# plus a proposed-baseline (intersection/union) summary.
#
# Portable: bash 3.2+ (macOS), Git Bash (Windows), Linux. Deps: jq, curl.
# Read-only. It collects and reports; it changes nothing on the machine.
set -u
# ---------------------------------------------------------------------------
# Bootstrap: resolve repo root, identity, coord API, session id
# ---------------------------------------------------------------------------
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
MANIFEST="$SKILL_DIR/baseline/manifest.json"
if ! command -v jq >/dev/null 2>&1; then
echo "[ERROR] jq is required and not found on PATH. Install jq, then re-run." >&2
exit 2
fi
# Some Windows jq builds (winget) emit CRLF line endings; a trailing \r corrupts
# every `for x in $(jq ...)` word and `read`-from-@tsv field. Strip \r from all
# jq output (it is insignificant JSON whitespace and never wanted in raw values).
jq() { command jq "$@" | tr -d '\r'; }
if ! command -v curl >/dev/null 2>&1; then
echo "[ERROR] curl is required and not found on PATH." >&2
exit 2
fi
if [ ! -f "$MANIFEST" ]; then
echo "[ERROR] Baseline manifest not found: $MANIFEST" >&2
exit 2
fi
# identity.json: prefer repo copy, then ~/.claude (mirrors check-messages.sh).
IDENTITY=""
for c in "$REPO_ROOT/.claude/identity.json" "$HOME/.claude/identity.json"; do
[ -f "$c" ] && { IDENTITY="$c"; break; }
done
idfield() { # dotted.path -> value or empty
[ -n "$IDENTITY" ] && jq -r "$1 // empty" "$IDENTITY" 2>/dev/null
}
HOSTNAME_RAW="$(hostname 2>/dev/null || echo unknown)"
HOST="${HOSTNAME_RAW%.local}"
SESSION="${HOST}/claude-main"
API="$(idfield '.coord_api')"
[ -z "$API" ] && API="http://172.16.3.30:8001"
PLATFORM="$(idfield '.platform')"
[ -z "$PLATFORM" ] && case "$(uname -s)" in
Darwin) PLATFORM="macos" ;; Linux) PLATFORM="linux" ;;
CYGWIN*|MINGW*|MSYS*) PLATFORM="windows" ;; *) PLATFORM="unknown" ;;
esac
ARCH="$(idfield '.architecture')"
[ -z "$ARCH" ] && ARCH="$(uname -m 2>/dev/null || echo unknown)"
# ---------------------------------------------------------------------------
# Results accumulation. Each check appends one compact JSON object.
# status in {PASS, WARN, FAIL, SKIP, INFO}. Grade: any FAIL->RED, WARN->AMBER.
# ---------------------------------------------------------------------------
RESULTS_FILE="$(mktemp 2>/dev/null || echo "${TMPDIR:-/tmp}/selfcheck.$$")"
: > "$RESULTS_FILE"
trap 'rm -f "$RESULTS_FILE" 2>/dev/null' EXIT
emit() { # id category status detail fix
jq -nc --arg id "$1" --arg cat "$2" --arg st "$3" --arg detail "$4" --arg fix "${5:-}" \
'{id:$id,category:$cat,status:$st,detail:$detail,fix:$fix}' >> "$RESULTS_FILE"
}
reachable() { curl -s -o /dev/null -m 4 "$1" 2>/dev/null; } # exit 0 if HTTP responds
# Content-equal ignoring line endings: a repo LF copy and a ~/.claude CRLF copy
# are the SAME content (the cross-machine case this check polices), so compare
# with \r stripped rather than byte-for-byte (cmp would false-flag them).
same_content() { diff -q <(tr -d '\r' < "$1") <(tr -d '\r' < "$2") >/dev/null 2>&1; }
# ---------------------------------------------------------------------------
# CHECK: identity
# ---------------------------------------------------------------------------
check_identity() {
if [ -z "$IDENTITY" ]; then
emit identity.present identity FAIL "identity.json not found (.claude or ~/.claude)" \
"Run onboarding; create .claude/identity.json then 'bash .claude/scripts/migrate-identity.sh'"
return
fi
if ! jq -e . "$IDENTITY" >/dev/null 2>&1; then
emit identity.parse identity FAIL "identity.json is not valid JSON: $IDENTITY" "Fix the JSON syntax"
return
fi
emit identity.present identity PASS "identity.json present and valid: $IDENTITY"
local missing=""
for f in $(jq -r '.required_identity_fields[]' "$MANIFEST"); do
local v; v="$(jq -r ".$f // empty" "$IDENTITY" 2>/dev/null)"
[ -z "$v" ] && missing="$missing $f"
done
if [ -n "$missing" ]; then
emit identity.fields identity WARN "missing/empty identity fields:$missing" \
"bash .claude/scripts/migrate-identity.sh (populates machine-specific fields)"
else
emit identity.fields identity PASS "all required identity fields present"
fi
# --- path fields: identity.json is the map of WHERE things live on this box.
# It is foundational - every later check trusts claudetools_root / vault_path.
# Verify they resolve to real locations and that claudetools_root is in fact
# the repo we are running from (a stale clone path is a silent footgun).
norm() { # path -> lowercase, forward-slash, drive-letter, no trailing slash
local p="$1"
command -v cygpath >/dev/null 2>&1 && p="$(cygpath -m "$p" 2>/dev/null || echo "$p")"
printf '%s' "$p" | tr 'A-Z' 'a-z' | sed 's#\\#/#g; s#/\{1,\}$##'
}
local ctroot; ctroot="$(idfield '.claudetools_root')"
if [ -z "$ctroot" ]; then
emit identity.claudetools_root identity FAIL "identity.claudetools_root not set" \
"Set claudetools_root in identity.json to this repo's absolute path"
elif [ ! -d "$ctroot" ]; then
emit identity.claudetools_root identity FAIL "claudetools_root does not exist: $ctroot" \
"Fix claudetools_root in identity.json (machine moved/renamed the repo?)"
elif [ "$(norm "$ctroot")" != "$(norm "$REPO_ROOT")" ]; then
emit identity.claudetools_root identity WARN \
"claudetools_root ($ctroot) != running repo ($REPO_ROOT)" \
"Reconcile claudetools_root in identity.json with the repo you actually run from"
else
emit identity.claudetools_root identity PASS "claudetools_root resolves to this repo ($ctroot)"
fi
local vpath2; vpath2="$(idfield '.vault_path')"
if [ -z "$vpath2" ]; then
emit identity.vault_path identity FAIL "identity.vault_path not set (cannot locate the SOPS vault)" \
"Set vault_path in identity.json to the cloned vault repo path"
elif [ ! -d "$vpath2" ]; then
emit identity.vault_path identity FAIL "vault_path does not exist: $vpath2" \
"Clone the vault repo and set vault_path in identity.json"
else
emit identity.vault_path identity PASS "vault_path resolves ($vpath2)"
fi
# machine field vs actual hostname
local idmach; idmach="$(idfield '.machine')"
if [ -n "$idmach" ] && [ "$(echo "$idmach" | tr 'A-Z' 'a-z')" != "$(echo "$HOST" | tr 'A-Z' 'a-z')" ]; then
emit identity.hostname identity WARN "identity.machine='$idmach' != actual hostname '$HOST'" \
"Update .machine in identity.json (did you clone onto a new box?)"
else
emit identity.hostname identity PASS "identity.machine matches hostname ($HOST)"
fi
# git config vs identity
local gn ge idn ide
gn="$(git -C "$REPO_ROOT" config user.name 2>/dev/null)"
ge="$(git -C "$REPO_ROOT" config user.email 2>/dev/null)"
idn="$(idfield '.full_name')"; ide="$(idfield '.email')"
if [ -n "$idn" ] && [ "$gn" != "$idn" ]; then
emit identity.git_name identity WARN "git user.name='$gn' != identity.full_name='$idn'" \
"git config user.name \"$idn\""
else
emit identity.git_name identity PASS "git user.name matches identity ($gn)"
fi
if [ -n "$ide" ] && [ "$ge" != "$ide" ]; then
emit identity.git_email identity WARN "git user.email='$ge' != identity.email='$ide'" \
"git config user.email \"$ide\""
else
emit identity.git_email identity PASS "git user.email matches identity ($ge)"
fi
}
# ---------------------------------------------------------------------------
# CHECK: tooling (required + capability-gated)
# ---------------------------------------------------------------------------
toolver() { # best-effort one-line version
"$1" --version 2>/dev/null | head -1 || true
}
check_tools() {
local n why
while IFS=$'\t' read -r n why; do
if command -v "$n" >/dev/null 2>&1; then
emit "tool.$n" tooling PASS "$n present ($(toolver "$n"))"
else
emit "tool.$n" tooling FAIL "$n MISSING (required: $why)" "Install $n and ensure it is on PATH"
fi
done < <(jq -r '.required_tools[] | [.name, .why] | @tsv' "$MANIFEST")
# python: any_of
local pyok="" pc
for pc in $(jq -r '.required_python.any_of[]' "$MANIFEST"); do
if command -v "$pc" >/dev/null 2>&1; then pyok="$pc"; break; fi
done
if [ -n "$pyok" ]; then
local declared; declared="$(idfield '.python.command')"
if [ -n "$declared" ] && ! command -v "$declared" >/dev/null 2>&1; then
emit tool.python tooling WARN "identity.python.command='$declared' not on PATH; '$pyok' is available" \
"Update .python.command in identity.json or re-run migrate-identity.sh"
else
emit tool.python tooling PASS "python available ($pyok; identity declares '${declared:-unset}')"
fi
else
emit tool.python tooling FAIL "no python interpreter found (tried py/python3/python)" "Install Python"
fi
# capability tools - presence only, never FAIL
local cn cap cwhy
while IFS=$'\t' read -r cn cap cwhy; do
if command -v "$cn" >/dev/null 2>&1; then
emit "cap.$cn" capability INFO "$cn present [$cap] ($(toolver "$cn"))"
else
emit "cap.$cn" capability INFO "$cn absent [$cap] - capability off ($cwhy)"
fi
done < <(jq -r '.capability_tools[] | [.name, .capability, .why] | @tsv' "$MANIFEST")
}
# ---------------------------------------------------------------------------
# CHECK: capability tier (ollama) + effective ruleset
# ---------------------------------------------------------------------------
check_capability_tier() {
local declared fb local_ok="" remote_ok="" tier rule
declared="$(idfield '.ollama.endpoint')"
fb="$(idfield '.ollama.fallback')"
reachable "http://localhost:11434/api/tags" && local_ok=1
[ -n "$fb" ] && reachable "${fb%/}/api/tags" && remote_ok=1
if [ -n "$local_ok" ]; then
tier="ollama_local"
elif [ -n "$remote_ok" ]; then
tier="ollama_remote"
else
tier="ollama_none"
fi
rule="$(jq -r ".capability_rules.$tier.tier0_engine" "$MANIFEST")"
# Does the resolved tier agree with what identity declares?
if [ "$tier" = "ollama_none" ]; then
emit captier.ollama capability WARN "Ollama tier = NONE (local + fallback both unreachable). Effective rule: $rule" \
"Confirm this machine is meant to run without Ollama; ensure Tier-0 work routes to haiku, not blocked"
else
local note=""
if [ "$tier" = "ollama_remote" ] && echo "$declared" | grep -q "localhost"; then
note=" (identity declares localhost but local is down; using fallback $fb)"
fi
emit captier.ollama capability PASS "Ollama tier = ${tier#ollama_}${note}. Effective Tier-0: $rule"
fi
}
# ---------------------------------------------------------------------------
# CHECK: required scripts + hook files (exist + executable)
# ---------------------------------------------------------------------------
check_files() {
local rel p
for rel in $(jq -r '.required_scripts[], .required_hook_files[]' "$MANIFEST"); do
p="$REPO_ROOT/$rel"
if [ ! -f "$p" ]; then
emit "file.$rel" files FAIL "missing: $rel" "Restore via /sync (git pull from Gitea)"
elif [ ! -x "$p" ] && echo "$rel" | grep -qE '\.(sh|template)$'; then
emit "file.$rel" files WARN "present but not executable: $rel" "chmod +x \"$rel\""
else
emit "file.$rel" files PASS "present: $rel"
fi
done
}
# ---------------------------------------------------------------------------
# CHECK: settings.json hooks wired correctly
# ---------------------------------------------------------------------------
check_settings_hooks() {
local settings="$REPO_ROOT/.claude/settings.json"
if [ ! -f "$settings" ] || ! jq -e . "$settings" >/dev/null 2>&1; then
emit hooks.settings hooks FAIL "settings.json missing or invalid JSON" "Restore .claude/settings.json via /sync"
return
fi
local ev needle why found
# NB: omit .matcher from the TSV - an empty middle field collapses under tab
# IFS (tab is IFS-whitespace), shifting columns. We do not use the matcher here.
while IFS=$'\t' read -r ev needle why; do
# any hook command under this event containing the needle
found="$(jq -r --arg ev "$ev" --arg n "$needle" \
'(.hooks[$ev] // []) | [.[].hooks[]?.command // ""] | map(select(contains($n))) | length' \
"$settings" 2>/dev/null)"
if [ "${found:-0}" -gt 0 ] 2>/dev/null; then
emit "hook.$ev" hooks PASS "$ev hook wired ($needle)"
else
emit "hook.$ev" hooks FAIL "$ev hook NOT wired (expected command containing '$needle' - $why)" \
"Add the $ev hook to .claude/settings.json (see baseline manifest required_settings_hooks)"
fi
done < <(jq -r '.required_settings_hooks[] | [.event, .command_contains, .why] | @tsv' "$MANIFEST")
# current-mode file (created by UserPromptSubmit hook, but flag if absent)
if [ -f "$REPO_ROOT/.claude/current-mode" ]; then
emit hook.current-mode hooks PASS "current-mode present ($(tr -d '[:space:]' < "$REPO_ROOT/.claude/current-mode"))"
else
emit hook.current-mode hooks WARN "current-mode missing (auto-created on next prompt)" "echo general > .claude/current-mode"
fi
}
# ---------------------------------------------------------------------------
# CHECK: git remote + post-commit hooks
# ---------------------------------------------------------------------------
check_git() {
local url want host_ip
url="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null)"
want="$(jq -r '.git.remote_host_contains' "$MANIFEST")"
host_ip="$(jq -r '.git.remote_host_internal_ip' "$MANIFEST")"
if [ -z "$url" ]; then
emit git.remote git WARN "no 'origin' remote on $REPO_ROOT" "git remote add origin <gitea-url>"
elif echo "$url" | grep -qF "$want" || echo "$url" | grep -qF "$host_ip"; then
emit git.remote git PASS "origin -> $url"
else
emit git.remote git FAIL "origin does not point at ACG Gitea: $url" \
"git remote set-url origin http://<user>@$host_ip:3000/azcomputerguru/claudetools.git"
fi
if [ "$(jq -r '.git.post_commit_hook_expected' "$MANIFEST")" = "true" ]; then
if [ -f "$REPO_ROOT/.git/hooks/post-commit" ]; then
emit git.post_commit git PASS "main-repo post-commit hook installed"
else
emit git.post_commit git WARN "main-repo post-commit hook NOT installed (HOOKS.md mandates dev-alerts hook)" \
"cp .claude/hooks/post-commit.template .git/hooks/post-commit && chmod +x .git/hooks/post-commit"
fi
fi
}
# ---------------------------------------------------------------------------
# CHECK: skills + commands conformance vs manifest
# ---------------------------------------------------------------------------
check_skills_commands() {
local name dir
# skills present
for name in $(jq -r '.skills[]' "$MANIFEST"); do
dir="$REPO_ROOT/.claude/skills/$name"
if [ -d "$dir" ]; then
if [ -f "$dir/SKILL.md" ] || ls "$dir"/*.md >/dev/null 2>&1 || [ -d "$dir/scripts" ]; then
emit "skill.$name" skills PASS "skill present: $name"
else
emit "skill.$name" skills WARN "skill dir present but looks empty: $name" "Restore skill contents via /sync"
fi
else
emit "skill.$name" skills FAIL "skill MISSING: $name" "Restore .claude/skills/$name via /sync"
fi
done
# extra skills not in manifest (drift to report, not fail in V1)
local known
known="|$(jq -r '.skills[]' "$MANIFEST" | tr '\n' '|')"
for dir in "$REPO_ROOT"/.claude/skills/*/; do
[ -d "$dir" ] || continue
name="$(basename "$dir")"
case "$known" in *"|$name|"*) ;; *) emit "skill.extra.$name" skills INFO "skill present but NOT in baseline: $name (census candidate)" ;; esac
done
# commands present
for name in $(jq -r '.commands[]' "$MANIFEST"); do
if [ -f "$REPO_ROOT/.claude/commands/$name.md" ]; then
emit "cmd.$name" commands PASS "command present: /$name"
else
emit "cmd.$name" commands FAIL "command MISSING: /$name" "Restore .claude/commands/$name.md via /sync"
fi
done
# extra commands
known="|$(jq -r '.commands[]' "$MANIFEST" | tr '\n' '|')"
for f in "$REPO_ROOT"/.claude/commands/*.md; do
[ -f "$f" ] || continue
name="$(basename "$f" .md)"
[ "$name" = "README" ] && continue
case "$known" in *"|$name|"*) ;; *) emit "cmd.extra.$name" commands INFO "command present but NOT in baseline: /$name (census candidate)" ;; esac
done
}
# ---------------------------------------------------------------------------
# CHECK: vault decrypt readiness
# ---------------------------------------------------------------------------
check_vault() {
local vpath; vpath="$(idfield '.vault_path')"
if [ -z "$vpath" ]; then
emit vault.path vault WARN "identity.vault_path not set" "Set vault_path in identity.json"
return
fi
if [ ! -d "$vpath" ]; then
emit vault.path vault FAIL "vault_path does not exist: $vpath" "Clone the vault repo and set vault_path"
return
fi
emit vault.path vault PASS "vault repo present: $vpath"
if ! command -v sops >/dev/null 2>&1 || ! command -v age >/dev/null 2>&1; then
emit vault.tools vault FAIL "sops/age missing - cannot decrypt vault" "Install sops and age"
return
fi
# Lightweight readiness: vault.sh list should enumerate entries without error.
if [ -x "$REPO_ROOT/.claude/scripts/vault.sh" ]; then
if bash "$REPO_ROOT/.claude/scripts/vault.sh" list >/dev/null 2>&1; then
emit vault.list vault PASS "vault.sh list succeeded (sops/age wired)"
else
emit vault.list vault WARN "vault.sh list failed - check age key + SOPS_AGE_KEY_FILE" \
"Verify age key at the SOPS recipient path; run: bash .claude/scripts/vault.sh list"
fi
fi
}
# ---------------------------------------------------------------------------
# CHECK: connectivity
# ---------------------------------------------------------------------------
check_connectivity() {
local name url req
while IFS=$'\t' read -r name url req; do
if reachable "$url"; then
emit "net.$name" connectivity PASS "$name reachable ($url)"
elif [ "$req" = "true" ]; then
emit "net.$name" connectivity FAIL "$name UNREACHABLE ($url)" "Check VPN/Tailscale/network to 172.16.3.x"
else
emit "net.$name" connectivity WARN "$name unreachable ($url) - off-network is OK" ""
fi
done < <(jq -r '.connectivity[] | [.name, .url, (.required|tostring)] | @tsv' "$MANIFEST")
}
# ---------------------------------------------------------------------------
# CHECK: duplicate command/skill definitions across search roots.
# Claude Code resolves slash commands and skills from BOTH the repo
# (.claude/commands, .claude/skills) and the user profile (~/.claude/...). When
# the same name exists in both with DIFFERENT content, the harness may resolve a
# different one than you expect - the "same /cmd, different behaviour on the Mac"
# bug. Divergent = WARN; identical = INFO (redundant copy that WILL drift).
# ---------------------------------------------------------------------------
check_duplicates() {
local kind repo_dir user_dir
# commands: compare *.md files by content
for kind in commands skills; do
repo_dir="$REPO_ROOT/.claude/$kind"
user_dir="$HOME/.claude/$kind"
[ -d "$repo_dir" ] || continue
[ -d "$user_dir" ] || { emit "dup.$kind" duplicates PASS "no user-level ~/.claude/$kind (single source: repo)"; continue; }
local name rp up dup_div=0 dup_same=0
if [ "$kind" = "commands" ]; then
for rp in "$repo_dir"/*.md; do
[ -f "$rp" ] || continue
name="$(basename "$rp" .md)"
[ "$name" = "README" ] && continue
up="$user_dir/$name.md"
[ -f "$up" ] || continue
[ "$rp" -ef "$up" ] && continue # symlink to the same file - cannot drift
if same_content "$rp" "$up"; then
dup_same=$((dup_same+1))
else
dup_div=$((dup_div+1))
emit "dup.cmd.$name" duplicates WARN \
"/$name is DIVERGENT: repo and ~/.claude copies differ (harness may run the wrong one)" \
"Reconcile: diff \"$rp\" \"$up\" then make ~/.claude/commands/$name.md match the repo (or remove it)"
fi
done
else
for rp in "$repo_dir"/*/; do
[ -d "$rp" ] || continue
name="$(basename "$rp")"
up="$user_dir/$name"
[ -d "$up" ] || continue
[ "$rp" -ef "$up" ] && continue # symlinked dir - cannot drift
# Only compare when BOTH have a SKILL.md; otherwise not comparable
# (script-only / *.md-only skills) - skip rather than miscount.
if [ -f "$rp/SKILL.md" ] && [ -f "$up/SKILL.md" ]; then
if same_content "$rp/SKILL.md" "$up/SKILL.md"; then
dup_same=$((dup_same+1))
else
dup_div=$((dup_div+1))
emit "dup.skill.$name" duplicates WARN \
"skill '$name' is DIVERGENT: repo and ~/.claude SKILL.md differ" \
"Reconcile ~/.claude/skills/$name with the repo copy (or remove the user-level one)"
fi
fi
done
fi
if [ "$dup_div" -eq 0 ] && [ "$dup_same" -gt 0 ]; then
emit "dup.$kind" duplicates INFO \
"$dup_same $kind exist in BOTH repo and ~/.claude (identical now, but a redundant copy that can drift)" \
"Consider a single source of truth for $kind to prevent future divergence"
elif [ "$dup_div" -eq 0 ] && [ "$dup_same" -eq 0 ]; then
emit "dup.$kind" duplicates PASS "no duplicate $kind across roots"
fi
done
}
# ---------------------------------------------------------------------------
# CHECK: rogue memories that contradict settings/identity.
# Deterministic core only: index integrity + a conservative, manifest-declared
# set of contradiction patterns evaluated against this machine's identity. The
# SEMANTIC contradiction pass (reasoning over all memories vs identity/settings)
# is a judgment task and is delegated to the model in SKILL.md, not grep.
# ---------------------------------------------------------------------------
check_memory() {
local mdir="$REPO_ROOT/.claude/memory" idx="$REPO_ROOT/.claude/memory/MEMORY.md"
if [ ! -d "$mdir" ]; then
emit memory.dir memory WARN "no .claude/memory directory" "Expected the shared memory store; restore via /sync"
return
fi
if [ ! -f "$idx" ]; then
emit memory.index memory WARN "MEMORY.md index missing" "Create .claude/memory/MEMORY.md (the loaded index)"
else
# orphan detection: every *.md (except MEMORY.md) should be referenced in the index
local f base orphans=0
for f in "$mdir"/*.md; do
[ -f "$f" ] || continue
base="$(basename "$f")"
[ "$base" = "MEMORY.md" ] && continue
if ! grep -qF "$base" "$idx" 2>/dev/null; then
orphans=$((orphans+1))
fi
done
if [ "$orphans" -gt 0 ]; then
emit memory.orphans memory WARN "$orphans memory file(s) not referenced in MEMORY.md (orphaned)" \
"Run /memory-dream or add the missing index lines"
else
emit memory.index memory PASS "MEMORY.md index present; no orphaned memory files"
fi
fi
# Manifest-declared contradiction patterns. Each entry:
# { when_field, when_equals, grep, why } - only evaluated when this
# machine's identity.<when_field> == when_equals, so a pattern fires only
# where it is actually a contradiction (e.g. prescribing python3 on a `py` box).
# NB: fields are read via @tsv, so when_equals/grep MUST NOT contain tab chars.
local has; has="$(jq -r '(.memory.contradiction_patterns // []) | length' "$MANIFEST" 2>/dev/null)"
if [ "${has:-0}" -gt 0 ] 2>/dev/null; then
local wf we gx why hits
while IFS=$'\t' read -r wf we gx why; do
[ -n "$wf" ] || continue
if [ "$(idfield ".$wf")" = "$we" ]; then
hits="$(grep -rliE "$gx" "$mdir" 2>/dev/null | grep -vF 'MEMORY.md' | head -5 | tr '\n' ' ')"
if [ -n "$hits" ]; then
emit "memory.contradiction.$wf" memory WARN \
"memory may contradict identity.$wf=$we ($why): $hits" \
"Review the listed memory file(s); correct or delete if they prescribe the wrong behaviour for this machine"
fi
fi
done < <(jq -r '(.memory.contradiction_patterns // [])[] | [.when_field, .when_equals, .grep, .why] | @tsv' "$MANIFEST")
fi
}
# ---------------------------------------------------------------------------
# Build the census JSON from accumulated results
# ---------------------------------------------------------------------------
build_census() {
local fails warns grade
fails="$(jq -s '[.[]|select(.status=="FAIL")]|length' "$RESULTS_FILE")"
warns="$(jq -s '[.[]|select(.status=="WARN")]|length' "$RESULTS_FILE")"
if [ "$fails" -gt 0 ]; then grade="RED"; elif [ "$warns" -gt 0 ]; then grade="AMBER"; else grade="GREEN"; fi
jq -s \
--arg host "$HOST" --arg session "$SESSION" --arg platform "$PLATFORM" --arg arch "$ARCH" \
--arg grade "$grade" --arg ts "$RUN_TS" \
--arg mver "$(jq -r '.schema_version' "$MANIFEST")" \
'{
host:$host, session:$session, platform:$platform, arch:$arch,
grade:$grade, generated_at:$ts, manifest_version:$mver,
summary: { pass:([.[]|select(.status=="PASS")]|length),
warn:([.[]|select(.status=="WARN")]|length),
fail:([.[]|select(.status=="FAIL")]|length),
info:([.[]|select(.status=="INFO")]|length) },
results: .
}' "$RESULTS_FILE"
}
# ---------------------------------------------------------------------------
# Human report
# ---------------------------------------------------------------------------
print_report() {
local census="$1" grade
grade="$(echo "$census" | jq -r .grade)"
echo ""
echo "============================================================"
echo " ClaudeTools self-check - $HOST ($PLATFORM/$ARCH)"
echo " Grade: $grade $(echo "$census" | jq -r '.summary | "PASS \(.pass) WARN \(.warn) FAIL \(.fail) INFO \(.info)"')"
echo " Manifest: $(echo "$census" | jq -r .manifest_version) (provisional) $RUN_TS"
echo "============================================================"
# FAIL then WARN then INFO; PASS summarized per category
echo "$census" | jq -r '
def mark(s): if s=="FAIL" then "[FAIL]" elif s=="WARN" then "[WARN]"
elif s=="INFO" then "[INFO]" elif s=="SKIP" then "[SKIP]" else "[ OK ]" end;
(.results | map(select(.status=="FAIL"))) as $f
| (.results | map(select(.status=="WARN"))) as $w
| (.results | map(select(.status=="INFO"))) as $i
| (if ($f|length)>0 then "\nFAILURES:" else empty end),
($f[] | " [FAIL] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)),
(if ($w|length)>0 then "\nWARNINGS:" else empty end),
($w[] | " [WARN] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)),
(if ($i|length)>0 then "\nINFO / capability:" else empty end),
($i[] | " [INFO] \(.detail)")
'
# per-category PASS counts
echo ""
echo "PASS by category:"
echo "$census" | jq -r '.results | map(select(.status=="PASS")) | group_by(.category)[] | " \(.[0].category): \(length) ok"'
echo "============================================================"
}
# ---------------------------------------------------------------------------
# Publish census to coord API.
# The coord API uses the PATH-PARAM form: PUT /api/coord/components/{pk}/{comp}
# with a body of {state, version, notes, updated_by} (the body form 405s).
# The component segment must be slash-free (a slash 404s, even URL-encoded), so
# the per-machine component is "selfcheck_<host>" (NOT "selfcheck/<host>").
# ---------------------------------------------------------------------------
COMPONENT="selfcheck_$HOST"
publish_census() {
local census="$1" grade compact body path
grade="$(echo "$census" | jq -r .grade)"
compact="$(echo "$census" | jq -c .)"
path="/api/coord/components/claudetools/$COMPONENT"
body="$(jq -nc --arg state "$grade" \
--arg ver "$(jq -r '.schema_version' "$MANIFEST")" --arg notes "$compact" --arg by "$SESSION" \
'{state:$state, version:$ver, notes:$notes, updated_by:$by}')"
if curl -s -m 8 -X PUT "$API$path" -H "Content-Type: application/json" -d "$body" >/dev/null 2>&1; then
echo "[OK] Published census to coord: component $COMPONENT = $grade"
else
# softfail per coordination protocol - queue the SAME path/body so a
# later /sync drain replays a request that actually works.
local q="$REPO_ROOT/.claude/coord-queue.jsonl"
jq -nc --arg path "$path" --argjson b "$body" --arg ts "$RUN_TS" \
'{ts:$ts, method:"PUT", path:$path, body:$b}' >> "$q" 2>/dev/null
echo "[WARN] coord unreachable; census queued to .claude/coord-queue.jsonl"
fi
}
# ---------------------------------------------------------------------------
# Subcommand: fanout - request all instances to run /self-check --publish
# ---------------------------------------------------------------------------
do_fanout() {
local subj body payload
subj="[self-check] Fleet census + self-remediation request"
body="On THIS machine: (1) run /self-check ; (2) apply the suggested fix commands it prints for any FAIL/WARN - fix your OWN machine, locally, with your operator present (nobody fixes you remotely) ; (3) re-run /self-check to confirm GREEN ; (4) run /self-check --publish to report your census (component selfcheck_<host>) to coord. The check is read-only; only --publish writes (your census only). Requested by $SESSION at $RUN_TS."
payload="$(jq -nc --arg from "$SESSION" --arg subj "$subj" --arg body "$body" \
'{from_session:$from, to_session:"ALL_SESSIONS", project_key:"claudetools", subject:$subj, body:$body}')"
if curl -s -m 8 -X POST "$API/api/coord/messages" -H "Content-Type: application/json" -d "$payload" >/dev/null 2>&1; then
echo "[OK] Broadcast census request to ALL_SESSIONS."
else
echo "[ERROR] Failed to broadcast (coord unreachable)." >&2
exit 1
fi
}
# ---------------------------------------------------------------------------
# Subcommand: aggregate - read all published censuses, build fleet view
# ---------------------------------------------------------------------------
do_aggregate() {
local comps
comps="$(curl -s -m 8 "$API/api/coord/components?project_key=claudetools" 2>/dev/null)"
if [ -z "$comps" ]; then echo "[ERROR] coord unreachable." >&2; exit 1; fi
# The coord API returns {states:[...], total:N}; each row's grade is .state and
# the full census JSON is in .notes. Keep selfcheck_* rows with parseable notes.
# (.components / bare-array kept as defensive fallbacks.)
local censuses
censuses="$(echo "$comps" | jq -c '
( .states? // .components? // (if type=="array" then . else [] end) ) as $rows
| ($rows // [])
| map(select((.component? // "") | startswith("selfcheck")))
| map(.notes | try fromjson catch empty)
' 2>/dev/null)"
local n; n="$(echo "$censuses" | jq 'length' 2>/dev/null || echo 0)"
if [ "${n:-0}" -eq 0 ]; then
echo "No published censuses found yet. Run 'self-check.sh fanout', then have each machine run /self-check --publish."
return
fi
echo "============================================================"
echo " Fleet census: $n machine(s) reporting"
echo "============================================================"
echo "$censuses" | jq -r '.[] | " \(.grade)\t\(.host)\t\(.platform)/\(.arch)\tP\(.summary.pass) W\(.summary.warn) F\(.summary.fail)\t\(.generated_at)"' | column -t -s$'\t' 2>/dev/null \
|| echo "$censuses" | jq -r '.[] | " \(.grade) \(.host) \(.platform)/\(.arch) P\(.summary.pass) W\(.summary.warn) F\(.summary.fail)"'
echo ""
echo "Proposed baseline (intersection = required everywhere; symmetric diff = capability-gated):"
# Tools present on every machine vs only some, derived from tool.* PASS results.
echo "$censuses" | jq -r '
[ .[] | { host:.host, tools:( .results | map(select((.id|startswith("tool."))) | select(.status=="PASS") | (.id|sub("^tool.";""))) ) } ] as $m
| ($m|length) as $count
| ([ $m[].tools[] ] | unique) as $all
| " tools on ALL \($count): " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) == $count ) ] | join(", ") ),
" tools on SOME only: " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) < $count ) ] | join(", ") )
' 2>/dev/null
echo ""
echo "Machines that must self-remediate (RED/AMBER) - each fixes ITSELF, then re-runs + re-publishes:"
local needfix
needfix="$(echo "$censuses" | jq -r '
.[] | select(.grade!="GREEN")
| " \(.host) [\(.grade)] should run, in order:\n"
+ ( [ .results[] | select(.status=="FAIL" or .status=="WARN") | select(.fix!="")
| " - \(.fix)" ] | join("\n") )
+ "\n then: /self-check --publish"
' 2>/dev/null)"
if [ -n "$needfix" ]; then
echo "$needfix"
else
echo " (none - whole fleet is GREEN)"
fi
echo "============================================================"
echo "We do NOT fix remote machines. Relay each machine's fix list to its operator;"
echo "they self-remediate locally, re-run /self-check, and re-publish until GREEN."
echo "Once the fleet is reporting consistently, ratify baseline/manifest.json with Mike."
}
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
# RUN_TS is passed in by the caller (SKILL.md instructs a real UTC stamp);
# fall back to `date` if available so the script is runnable standalone.
RUN_TS="${SELFCHECK_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)}"
MODE="${1:-report}"
case "$MODE" in
fanout) do_fanout; exit 0 ;;
aggregate) do_aggregate; exit 0 ;;
esac
# run all checks
check_identity
check_tools
check_capability_tier
check_files
check_settings_hooks
check_git
check_skills_commands
check_duplicates
check_memory
check_vault
check_connectivity
CENSUS="$(build_census)"
case "$MODE" in
--json) echo "$CENSUS" ;;
--publish) print_report "$CENSUS"; publish_census "$CENSUS" ;;
report|*) print_report "$CENSUS" ;;
esac
# exit code reflects grade for scripting (0 GREEN, 1 AMBER, 2 RED)
GR="$(echo "$CENSUS" | jq -r .grade)"
case "$GR" in GREEN) exit 0 ;; AMBER) exit 1 ;; RED) exit 2 ;; *) exit 0 ;; esac