From b153ff158be236cace9957c01f18929e5703f940 Mon Sep 17 00:00:00 2001
From: Mike Swanson <mike@azcomputerguru.com>
Date: Tue, 2 Jun 2026 14:45:42 -0700
Subject: [PATCH] feat(self-check): add harness self-diagnosis / fleet
 conformance skill

New /self-check skill: each machine probes its own ClaudeTools harness wiring
(identity.json paths, required tooling, settings.json hooks, skill/command/script
set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades
RED/AMBER/GREEN against a checked-in provisional baseline manifest.

- Capability-tier model: architectural/OS/hardware differences (e.g. no local
  Ollama) select a fallback ruleset instead of failing.
- Duplicate detection: flags command/skill names that diverge between the repo
  and ~/.claude (the "same /cmd, different behaviour" cross-machine bug);
  CRLF-only diffs ignored.
- Memory check: index + orphan detection, plus a model-driven semantic pass for
  memories that contradict identity/settings.
- V1 is a census tool: --publish writes a per-machine census to coord
  (component selfcheck_<host>); fanout requests the fleet to self-check +
  self-remediate + re-publish; aggregate derives the proposed baseline. No
  machine ever fixes another.

Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the
CRLF false-WARN found and fixed, verified live against the coord API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .claude/commands/self-check.md                |  53 ++
 .claude/skills/self-check/SKILL.md            | 161 ++++
 .claude/skills/self-check/baseline/README.md  |  68 ++
 .../skills/self-check/baseline/manifest.json  | 114 +++
 .../skills/self-check/scripts/self-check.sh   | 746 ++++++++++++++++++
 5 files changed, 1142 insertions(+)
 create mode 100644 .claude/commands/self-check.md
 create mode 100644 .claude/skills/self-check/SKILL.md
 create mode 100644 .claude/skills/self-check/baseline/README.md
 create mode 100644 .claude/skills/self-check/baseline/manifest.json
 create mode 100644 .claude/skills/self-check/scripts/self-check.sh
diff --git a/.claude/commands/self-check.md b/.claude/commands/self-check.md
new file mode 100644
index 0000000..05f0d6d
--- /dev/null
+++ b/.claude/commands/self-check.md
@@ -0,0 +1,53 @@
+---
+name: self-check
+description: Self-diagnose this ClaudeTools machine's harness wiring (tools, identity, hooks, skills, commands, scripts, capability tier, connectivity) against the fleet baseline, and optionally publish the census to coord.
+---
+
+# /self-check — ClaudeTools Harness Self-Diagnosis
+
+Runs the conformance probe in `.claude/skills/self-check/scripts/self-check.sh`,
+which grades this machine RED/AMBER/GREEN against the provisional baseline manifest
+and accounts for capability differences (e.g. no local Ollama -> remote/none ruleset).
+
+See `.claude/skills/self-check/SKILL.md` for the full model. This command is a thin
+runner — invoke the **self-check** skill for interpretation and follow-up.
+
+## Run it
+
+Always pass a real UTC timestamp via `SELFCHECK_TS` (the script falls back to
+`date` if available, but pass it explicitly for correctness):
+
+```bash
+SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+  bash .claude/skills/self-check/scripts/self-check.sh report
+```
+
+Modes:
+
+| Arg | What it does |
+|-----|--------------|
+| `report` (default) | Human-readable RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
+| `--json` | Structured census JSON only (for piping/aggregation). |
+| `--publish` | Run + PUT the census to coord as component `selfcheck_<host>`. |
+| `fanout` | Broadcast a self-check + self-remediate + re-publish request to ALL_SESSIONS. |
+| `aggregate` | Read every machine's published census; print fleet table + proposed baseline. |
+
+## Typical flows
+
+**Just check this box:**
+```bash
+SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh report
+```
+
+**Build the fleet baseline (V1 census):**
+```bash
+# 1) From any machine, request the fleet to report:
+bash .claude/skills/self-check/scripts/self-check.sh fanout
+# 2) Each instance (incl. this one) publishes:
+SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh --publish
+# 3) Aggregate what came back:
+bash .claude/skills/self-check/scripts/self-check.sh aggregate
+```
+
+After running, summarize the grade and the FAIL/WARN items for the user with the
+exact fix commands. Do NOT auto-apply fixes — V1 is report-only by design.
diff --git a/.claude/skills/self-check/SKILL.md b/.claude/skills/self-check/SKILL.md
new file mode 100644
index 0000000..7797edf
--- /dev/null
+++ b/.claude/skills/self-check/SKILL.md
@@ -0,0 +1,161 @@
+---
+name: self-check
+description: >-
+  Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
+  same way as every other instance while allowing for architectural / OS / hardware
+  differences. Checks that identity.json exists and is correct (the map of WHERE
+  things live on this box), required tooling is installed, env/paths resolve,
+  hooks are wired, the skill/command/script set matches the baseline, the vault
+  decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
+  local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
+  can publish a census to the coord API so the fleet baseline can be built/refined.
+  Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
+  "am I configured right", "is my machine set up correctly", "harness conformance",
+  "fleet conformance", "check my environment", "is everything wired up".
+---
+
+# Self-Check — ClaudeTools Harness Self-Diagnosis
+
+A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
+graded against a checked-in **baseline manifest** so every machine behaves the
+same way — while explicitly allowing for architecture, OS, and hardware
+differences via a **capability tier** model.
+
+This is the skill the user asked for when a session needs to "make sure
+everything is as it should be."
+
+## The model in one paragraph
+
+`identity.json` is the foundational, per-machine map of **where things live and
+what this box can do** (vault path, repo root, platform, arch, python command,
+Ollama endpoints). The **baseline manifest**
+(`baseline/manifest.json`) declares what *every* machine must have — required
+tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
+canonical skill/command set, and the **capability rules** that say what to do when
+a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
+is also down, route Tier-0 work to haiku instead of blocking). The probe compares
+the live machine against the manifest, resolves the machine's capability tier, and
+grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
+Capability differences are **never** failures — they select a ruleset.
+
+## V1 is a CENSUS tool (read this)
+
+There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
+generated from a single known-good machine (GURU-5070). So V1's job is to gather
+ground truth from every machine and help Mike build the real baseline:
+
+1. **Probe** — each machine runs the check and produces a structured census.
+2. **Publish** — `--publish` PUTs the census to coord as component
+   `selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
+   a live fleet conformance view.
+3. **Fan out** — `fanout` broadcasts a request to `ALL_SESSIONS` so every active
+   instance reports.
+4. **Aggregate** — `aggregate` reads all censuses back and proposes a baseline
+   (tools/skills/commands present on *all* machines = "required everywhere";
+   present on *some* = "capability-gated"), and lists machines with FAILs.
+
+Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
+probe enforces conformance. **V1 does not auto-fix anything** — it reports the
+exact fix command for each finding (per the decision on record).
+
+## Running it
+
+The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
+Linux; deps: jq + curl). Always pass a real UTC timestamp:
+
+```bash
+SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+  bash .claude/skills/self-check/scripts/self-check.sh <mode>
+```
+
+| Mode | Purpose |
+|------|---------|
+| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
+| `--json` | Structured census JSON to stdout (for piping). |
+| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
+| `fanout` | Broadcast a census request to ALL_SESSIONS. |
+| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
+
+`/self-check` is the slash-command runner for the same script.
+
+## What it checks
+
+| Category | Checks |
+|----------|--------|
+| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
+| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
+| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
+| **files** | required scripts + hook files present and executable. |
+| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
+| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
+| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
+| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
+| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
+| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
+| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
+
+## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
+
+The engine's memory check is deterministic and conservative (index + orphans +
+declared patterns) so it never produces false alarms. A *true* contradiction
+check — "does any memory directly contradict what this machine's settings say?"
+— is a judgment task, so the model does it (route the prose/classification to
+Ollama Tier-0 per the house rules; Claude reviews the result):
+
+1. Read `identity.json` (where things live + this box's capabilities),
+   `settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
+2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
+   one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
+   assumptions, tool choices, or model routing.
+3. Flag memories that **directly contradict** this machine's reality, e.g.:
+   - prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
+   - hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
+   - names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
+   - assumes a capability (local Ollama) this machine's tier says is absent.
+4. Report each as: memory file, the contradicting claim, the setting it violates,
+   and a suggested correction. **Do not edit memories** — surface for the operator
+   (deletions/rewrites go through the human, mirroring memory-dream's posture).
+
+Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
+the fix is to scope it ("on Windows…") or split it, not to globally flip it.
+
+## Fleet self-remediation loop (machines fix themselves)
+
+We never fix a remote machine. The flow is:
+
+1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
+2. Each operator runs `/self-check` locally, applies the printed fix commands on
+   their own box, re-runs to confirm GREEN, then `/self-check --publish`.
+3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
+   list. Relay it to that operator; do not run it for them.
+4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
+
+## How to interpret a run
+
+After running, summarize for the user:
+- The **grade** and the PASS/WARN/FAIL/INFO tallies.
+- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
+- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
+- If publishing/aggregating, note how many machines have reported and which are RED.
+
+Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
+expected and must never be reported as broken — they are the whole point of the
+tier model.
+
+## Files
+
+```
+.claude/skills/self-check/
+  SKILL.md                  this file
+  scripts/self-check.sh     the probe engine (report / --json / --publish / fanout / aggregate)
+  baseline/manifest.json    the provisional fleet baseline (single source of truth)
+  baseline/README.md        the baseline model + how to refine/ratify it
+.claude/commands/self-check.md   the /self-check runner
+```
+
+## Extending the baseline
+
+When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
+`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
+enforces it. Capability-only tools go in `capability_tools` with a matching entry
+in `capability_rules` describing the fallback. See `baseline/README.md`.
diff --git a/.claude/skills/self-check/baseline/README.md b/.claude/skills/self-check/baseline/README.md
new file mode 100644
index 0000000..c0c50c4
--- /dev/null
+++ b/.claude/skills/self-check/baseline/README.md
@@ -0,0 +1,68 @@
+# Self-Check Baseline
+
+`manifest.json` is the **single source of truth** for how a ClaudeTools machine
+should be wired. Every machine's `/self-check` grades itself against this file.
+It is checked into the repo and syncs to all workstations via Gitea, so updating
+it here updates the standard for the whole fleet.
+
+## Why a checked-in manifest (the "how do we set a baseline" answer)
+
+The fleet has architectural differences — Windows/macOS/Linux, amd64/arm64, boxes
+with and without a local GPU for Ollama. We still want every machine to *behave*
+the same. A checked-in manifest gives us that:
+
+- **Required everywhere** lives in `required_*` keys. Missing = a real failure on
+  any machine, regardless of OS.
+- **Allowed-to-differ** lives in `capability_tools` + `capability_rules`. A capability
+  being absent is never a failure — it selects a *fallback ruleset* so the machine
+  knows how to behave without it (the canonical example: no local Ollama → use the
+  remote endpoint; if that is down too, route Tier-0 work to haiku instead of
+  blocking on it).
+
+`identity.json` (per-machine, gitignored) carries the machine's actual coordinates
+and capabilities. The manifest says what's required; identity says where things are
+and what this box can do; the probe reconciles the two.
+
+## How the baseline gets built (V1 census → ratified baseline)
+
+The current `manifest.json` is **provisional** — generated from one known-good
+machine (GURU-5070, 2026-06-02). It has not been confirmed against the fleet.
+The build sequence:
+
+1. `self-check.sh fanout` — ask every active instance to report.
+2. Each machine runs `self-check.sh --publish` — publishes its census to coord as
+   component `selfcheck_<host>`.
+3. `self-check.sh aggregate` — reads them all back and proposes:
+   - items present on **all** reporting machines → **required everywhere** candidates,
+   - items present on **some** machines → **capability-gated** candidates.
+4. Mike reviews the proposal and edits this `manifest.json` to ratify it, then
+   flips `"status": "provisional"` to `"ratified"` and commits.
+
+Until ratified, treat "extra"/"missing" skill and command findings as **candidates**,
+not gospel.
+
+## Manifest structure
+
+| Key | Meaning |
+|-----|---------|
+| `schema_version`, `status` | manifest version; `provisional` or `ratified`. |
+| `required_tools[]` | tools every machine must have on PATH (`name`, `why`). Missing = FAIL. |
+| `required_python.any_of[]` | acceptable python launchers (`py`/`python3`/`python`). |
+| `capability_tools[]` | optional tools; presence is INFO and toggles a `capability`. |
+| `required_identity_fields[]` | dotted identity.json paths that must be set. |
+| `required_scripts[]`, `required_hook_files[]` | repo files that must exist (+ executable). |
+| `required_settings_hooks[]` | hooks that must be wired in settings.json (`event`, `command_contains`, `why`). |
+| `git` | expected remote host, internal IP, post-commit-hook expectation. |
+| `skills[]`, `commands[]` | the canonical skill dirs / command files. |
+| `connectivity[]` | endpoints to probe (`required` ones FAIL if unreachable). |
+| `capability_rules{}` | per-capability fallback behavior (the "different rules" for different hardware). |
+
+## Editing rules
+
+- Add a **required** item only when it should hold on *every* machine, every OS.
+  If a Windows box legitimately can't have it, it's a capability, not a requirement.
+- Every `capability_tools` entry needs a story for what to do without it — encode
+  it in `capability_rules` (or reference an existing tier) so the probe can print
+  the effective ruleset.
+- Keep paths repo-relative and forward-slashed.
+- After editing: commit + `/sync`. The fleet picks it up on next pull.
diff --git a/.claude/skills/self-check/baseline/manifest.json b/.claude/skills/self-check/baseline/manifest.json
new file mode 100644
index 0000000..83893ea
--- /dev/null
+++ b/.claude/skills/self-check/baseline/manifest.json
@@ -0,0 +1,114 @@
+{
+  "schema_version": "1.0.0",
+  "status": "provisional",
+  "derived_from": "GURU-5070",
+  "derived_at": "2026-06-02",
+  "note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.",
+
+  "required_tools": [
+    { "name": "bash",   "why": "hooks, scripts, sync, vault wrapper" },
+    { "name": "git",    "why": "repo + submodules + Gitea sync" },
+    { "name": "jq",     "why": "every hook and coord script parses JSON with jq" },
+    { "name": "curl",   "why": "coord API, vault, RMM, all HTTP calls" },
+    { "name": "sops",   "why": "vault decryption (SOPS)" },
+    { "name": "age",    "why": "SOPS age recipient/decrypt" },
+    { "name": "ssh",    "why": "infra access; must be system OpenSSH" }
+  ],
+
+  "required_python": {
+    "any_of": ["py", "python3", "python"],
+    "why": "JSON sanitizer in check-messages.sh, identity migration, skill scripts. The resolved command is recorded in identity.json (.python.command)."
+  },
+
+  "capability_tools": [
+    { "name": "ollama", "capability": "ollama_local",  "why": "Tier-0 local inference (prose/classification)" },
+    { "name": "cargo",  "capability": "rust_build",     "why": "GuruRMM / GuruConnect Rust builds" },
+    { "name": "node",   "capability": "node_build",     "why": "dashboard / TS builds" },
+    { "name": "gh",     "capability": "github_cli",     "why": "optional GitHub operations" },
+    { "name": "docker", "capability": "containers",     "why": "optional container workflows" },
+    { "name": "op",     "capability": "onepassword_cli","why": "1Password fallback credential access" }
+  ],
+
+  "required_identity_fields": [
+    "user", "full_name", "email", "role", "machine",
+    "vault_path", "claudetools_root", "platform", "architecture",
+    "python.command", "ollama.endpoint", "ollama.fallback", "ollama.prose_model"
+  ],
+  "optional_identity_fields": ["coord_api", "last_updated"],
+
+  "required_scripts": [
+    ".claude/scripts/vault.sh",
+    ".claude/scripts/sync.sh",
+    ".claude/scripts/sync-memory.sh",
+    ".claude/scripts/check-messages.sh",
+    ".claude/scripts/migrate-identity.sh"
+  ],
+
+  "required_hook_files": [
+    ".claude/hooks/block-backslash-winpath.sh",
+    ".claude/hooks/post-commit.template"
+  ],
+
+  "required_settings_hooks": [
+    { "event": "PreToolUse",      "matcher": "Bash", "command_contains": "block-backslash-winpath.sh", "why": "blocks garbled backslash Windows-path redirects in Git Bash" },
+    { "event": "UserPromptSubmit", "matcher": "",     "command_contains": "check-messages.sh",          "why": "injects unread coord messages + dev-mode locks each prompt" },
+    { "event": "SessionStart",    "matcher": "",     "command_contains": "sync-memory.sh",             "why": "pulls shared memory at session start" }
+  ],
+
+  "git": {
+    "remote_host_contains": "git.azcomputerguru.com",
+    "remote_host_internal_ip": "172.16.3.20",
+    "remote_note": "On-network machines should use the internal Gitea IP (172.16.3.20:3000) to bypass NPM SSL-renewal blips; off-network may use the domain. Either is acceptable; a non-ACG remote is a FAIL.",
+    "post_commit_hook_expected": true,
+    "post_commit_hook_note": "HOOKS.md mandates the dev-alerts post-commit hook in the main repo and each initialized submodule. Missing = AMBER (informational; reinstall from .claude/hooks/post-commit.template)."
+  },
+
+  "skills": [
+    "1password", "b2", "bitdefender", "frontend-design", "gc-audit",
+    "impeccable", "memory-dream", "remediation-tool", "rmm-audit",
+    "skill-creator", "stop-slop", "theme-factory", "self-check"
+  ],
+
+  "commands": [
+    "1password", "autotask", "checkpoint", "context", "create-spec",
+    "feature-request", "forum-post", "gc-feature-request", "import",
+    "inject-standards", "mailbox", "mode", "recover", "remediation-tool",
+    "rmm", "save", "scc", "shape-spec", "sync", "syncro-emergency-billing",
+    "syncro", "wiki-compile", "wiki-lint", "self-check"
+  ],
+
+  "connectivity": [
+    { "name": "coord_api",      "url": "http://172.16.3.30:8001/api/coord/status", "required": true,  "why": "live coordination source of truth" },
+    { "name": "claudetools_api","url": "http://172.16.3.30:8001/health",            "required": false, "why": "main API health" },
+    { "name": "gitea_internal", "url": "http://172.16.3.20:3000",                   "required": false, "why": "internal Gitea (git/API on-network)" }
+  ],
+
+  "memory": {
+    "note": "Deterministic memory checks: MEMORY.md index exists + no orphaned memory files, plus the contradiction_patterns below. A pattern fires ONLY on machines where identity.<when_field> == when_equals, so it flags a memory only where it is actually a contradiction for THIS box. Kept empty in V1 to avoid false positives; the real semantic contradiction analysis (memories vs identity.json + settings.json + this manifest) is done by the model per SKILL.md, optionally via Ollama Tier-0.",
+    "pattern_schema": {
+      "when_field": "dotted identity.json path, e.g. python.command",
+      "when_equals": "value that makes the grep a contradiction, e.g. python3",
+      "grep": "ERE matched case-insensitively against memory files",
+      "why": "human explanation shown in the finding"
+    },
+    "contradiction_patterns": []
+  },
+
+  "capability_rules": {
+    "ollama_local": {
+      "tier0_engine": "local ollama (localhost:11434) for summarize/classify/extract/draft",
+      "detect": "curl localhost:11434/api/tags reachable",
+      "fallback_if_unavailable": "ollama_remote"
+    },
+    "ollama_remote": {
+      "tier0_engine": "remote ollama via identity.ollama.fallback (Beast over Tailscale)",
+      "detect": "localhost unreachable but identity.ollama.fallback reachable",
+      "fallback_if_unavailable": "ollama_none"
+    },
+    "ollama_none": {
+      "tier0_engine": "NONE - no Tier-0 path. Route low-stakes prose/classification to Tier-1 (haiku) instead of Ollama. Do NOT block work waiting on Ollama.",
+      "detect": "neither localhost nor fallback reachable",
+      "fallback_if_unavailable": null
+    }
+  }
+}
diff --git a/.claude/skills/self-check/scripts/self-check.sh b/.claude/skills/self-check/scripts/self-check.sh
new file mode 100644
index 0000000..6286508
--- /dev/null
+++ b/.claude/skills/self-check/scripts/self-check.sh
@@ -0,0 +1,746 @@
+#!/usr/bin/env bash
+# self-check.sh - ClaudeTools harness self-diagnosis / fleet conformance probe.
+#
+# V1 is a CENSUS tool. Each machine probes its own harness wiring (tools,
+# identity, hooks, skills, commands, scripts, connectivity, capability tier),
+# grades what it can against the provisional baseline manifest, and can publish
+# the result to the coord API so the fleet can be compared and the baseline
+# refined from real data. See ../SKILL.md and ../baseline/README.md.
+#
+# Usage:
+#   self-check.sh                 Run checks, print a human report. (default)
+#   self-check.sh --json          Emit the structured census JSON to stdout only.
+#   self-check.sh --publish       Run checks, then PUT the census to coord (component selfcheck_<host>).
+#   self-check.sh fanout          Broadcast a request to ALL_SESSIONS to run /self-check --publish.
+#   self-check.sh aggregate       Read every machine's published census and print a fleet table
+#                                 plus a proposed-baseline (intersection/union) summary.
+#
+# Portable: bash 3.2+ (macOS), Git Bash (Windows), Linux. Deps: jq, curl.
+# Read-only. It collects and reports; it changes nothing on the machine.
+
+set -u
+
+# ---------------------------------------------------------------------------
+# Bootstrap: resolve repo root, identity, coord API, session id
+# ---------------------------------------------------------------------------
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
+MANIFEST="$SKILL_DIR/baseline/manifest.json"
+
+if ! command -v jq >/dev/null 2>&1; then
+    echo "[ERROR] jq is required and not found on PATH. Install jq, then re-run." >&2
+    exit 2
+fi
+# Some Windows jq builds (winget) emit CRLF line endings; a trailing \r corrupts
+# every `for x in $(jq ...)` word and `read`-from-@tsv field. Strip \r from all
+# jq output (it is insignificant JSON whitespace and never wanted in raw values).
+jq() { command jq "$@" | tr -d '\r'; }
+if ! command -v curl >/dev/null 2>&1; then
+    echo "[ERROR] curl is required and not found on PATH." >&2
+    exit 2
+fi
+if [ ! -f "$MANIFEST" ]; then
+    echo "[ERROR] Baseline manifest not found: $MANIFEST" >&2
+    exit 2
+fi
+
+# identity.json: prefer repo copy, then ~/.claude (mirrors check-messages.sh).
+IDENTITY=""
+for c in "$REPO_ROOT/.claude/identity.json" "$HOME/.claude/identity.json"; do
+    [ -f "$c" ] && { IDENTITY="$c"; break; }
+done
+
+idfield() { # dotted.path  -> value or empty
+    [ -n "$IDENTITY" ] && jq -r "$1 // empty" "$IDENTITY" 2>/dev/null
+}
+
+HOSTNAME_RAW="$(hostname 2>/dev/null || echo unknown)"
+HOST="${HOSTNAME_RAW%.local}"
+SESSION="${HOST}/claude-main"
+
+API="$(idfield '.coord_api')"
+[ -z "$API" ] && API="http://172.16.3.30:8001"
+
+PLATFORM="$(idfield '.platform')"
+[ -z "$PLATFORM" ] && case "$(uname -s)" in
+    Darwin) PLATFORM="macos" ;; Linux) PLATFORM="linux" ;;
+    CYGWIN*|MINGW*|MSYS*) PLATFORM="windows" ;; *) PLATFORM="unknown" ;;
+esac
+ARCH="$(idfield '.architecture')"
+[ -z "$ARCH" ] && ARCH="$(uname -m 2>/dev/null || echo unknown)"
+
+# ---------------------------------------------------------------------------
+# Results accumulation. Each check appends one compact JSON object.
+# status in {PASS, WARN, FAIL, SKIP, INFO}. Grade: any FAIL->RED, WARN->AMBER.
+# ---------------------------------------------------------------------------
+RESULTS_FILE="$(mktemp 2>/dev/null || echo "${TMPDIR:-/tmp}/selfcheck.$$")"
+: > "$RESULTS_FILE"
+trap 'rm -f "$RESULTS_FILE" 2>/dev/null' EXIT
+
+emit() { # id  category  status  detail  fix
+    jq -nc --arg id "$1" --arg cat "$2" --arg st "$3" --arg detail "$4" --arg fix "${5:-}" \
+        '{id:$id,category:$cat,status:$st,detail:$detail,fix:$fix}' >> "$RESULTS_FILE"
+}
+
+reachable() { curl -s -o /dev/null -m 4 "$1" 2>/dev/null; }   # exit 0 if HTTP responds
+
+# Content-equal ignoring line endings: a repo LF copy and a ~/.claude CRLF copy
+# are the SAME content (the cross-machine case this check polices), so compare
+# with \r stripped rather than byte-for-byte (cmp would false-flag them).
+same_content() { diff -q <(tr -d '\r' < "$1") <(tr -d '\r' < "$2") >/dev/null 2>&1; }
+
+# ---------------------------------------------------------------------------
+# CHECK: identity
+# ---------------------------------------------------------------------------
+check_identity() {
+    if [ -z "$IDENTITY" ]; then
+        emit identity.present identity FAIL "identity.json not found (.claude or ~/.claude)" \
+            "Run onboarding; create .claude/identity.json then 'bash .claude/scripts/migrate-identity.sh'"
+        return
+    fi
+    if ! jq -e . "$IDENTITY" >/dev/null 2>&1; then
+        emit identity.parse identity FAIL "identity.json is not valid JSON: $IDENTITY" "Fix the JSON syntax"
+        return
+    fi
+    emit identity.present identity PASS "identity.json present and valid: $IDENTITY"
+
+    local missing=""
+    for f in $(jq -r '.required_identity_fields[]' "$MANIFEST"); do
+        local v; v="$(jq -r ".$f // empty" "$IDENTITY" 2>/dev/null)"
+        [ -z "$v" ] && missing="$missing $f"
+    done
+    if [ -n "$missing" ]; then
+        emit identity.fields identity WARN "missing/empty identity fields:$missing" \
+            "bash .claude/scripts/migrate-identity.sh  (populates machine-specific fields)"
+    else
+        emit identity.fields identity PASS "all required identity fields present"
+    fi
+
+    # --- path fields: identity.json is the map of WHERE things live on this box.
+    # It is foundational - every later check trusts claudetools_root / vault_path.
+    # Verify they resolve to real locations and that claudetools_root is in fact
+    # the repo we are running from (a stale clone path is a silent footgun).
+    norm() { # path -> lowercase, forward-slash, drive-letter, no trailing slash
+        local p="$1"
+        command -v cygpath >/dev/null 2>&1 && p="$(cygpath -m "$p" 2>/dev/null || echo "$p")"
+        printf '%s' "$p" | tr 'A-Z' 'a-z' | sed 's#\\#/#g; s#/\{1,\}$##'
+    }
+    local ctroot; ctroot="$(idfield '.claudetools_root')"
+    if [ -z "$ctroot" ]; then
+        emit identity.claudetools_root identity FAIL "identity.claudetools_root not set" \
+            "Set claudetools_root in identity.json to this repo's absolute path"
+    elif [ ! -d "$ctroot" ]; then
+        emit identity.claudetools_root identity FAIL "claudetools_root does not exist: $ctroot" \
+            "Fix claudetools_root in identity.json (machine moved/renamed the repo?)"
+    elif [ "$(norm "$ctroot")" != "$(norm "$REPO_ROOT")" ]; then
+        emit identity.claudetools_root identity WARN \
+            "claudetools_root ($ctroot) != running repo ($REPO_ROOT)" \
+            "Reconcile claudetools_root in identity.json with the repo you actually run from"
+    else
+        emit identity.claudetools_root identity PASS "claudetools_root resolves to this repo ($ctroot)"
+    fi
+
+    local vpath2; vpath2="$(idfield '.vault_path')"
+    if [ -z "$vpath2" ]; then
+        emit identity.vault_path identity FAIL "identity.vault_path not set (cannot locate the SOPS vault)" \
+            "Set vault_path in identity.json to the cloned vault repo path"
+    elif [ ! -d "$vpath2" ]; then
+        emit identity.vault_path identity FAIL "vault_path does not exist: $vpath2" \
+            "Clone the vault repo and set vault_path in identity.json"
+    else
+        emit identity.vault_path identity PASS "vault_path resolves ($vpath2)"
+    fi
+
+    # machine field vs actual hostname
+    local idmach; idmach="$(idfield '.machine')"
+    if [ -n "$idmach" ] && [ "$(echo "$idmach" | tr 'A-Z' 'a-z')" != "$(echo "$HOST" | tr 'A-Z' 'a-z')" ]; then
+        emit identity.hostname identity WARN "identity.machine='$idmach' != actual hostname '$HOST'" \
+            "Update .machine in identity.json (did you clone onto a new box?)"
+    else
+        emit identity.hostname identity PASS "identity.machine matches hostname ($HOST)"
+    fi
+
+    # git config vs identity
+    local gn ge idn ide
+    gn="$(git -C "$REPO_ROOT" config user.name 2>/dev/null)"
+    ge="$(git -C "$REPO_ROOT" config user.email 2>/dev/null)"
+    idn="$(idfield '.full_name')"; ide="$(idfield '.email')"
+    if [ -n "$idn" ] && [ "$gn" != "$idn" ]; then
+        emit identity.git_name identity WARN "git user.name='$gn' != identity.full_name='$idn'" \
+            "git config user.name \"$idn\""
+    else
+        emit identity.git_name identity PASS "git user.name matches identity ($gn)"
+    fi
+    if [ -n "$ide" ] && [ "$ge" != "$ide" ]; then
+        emit identity.git_email identity WARN "git user.email='$ge' != identity.email='$ide'" \
+            "git config user.email \"$ide\""
+    else
+        emit identity.git_email identity PASS "git user.email matches identity ($ge)"
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: tooling (required + capability-gated)
+# ---------------------------------------------------------------------------
+toolver() { # best-effort one-line version
+    "$1" --version 2>/dev/null | head -1 || true
+}
+check_tools() {
+    local n why
+    while IFS=$'\t' read -r n why; do
+        if command -v "$n" >/dev/null 2>&1; then
+            emit "tool.$n" tooling PASS "$n present ($(toolver "$n"))"
+        else
+            emit "tool.$n" tooling FAIL "$n MISSING (required: $why)" "Install $n and ensure it is on PATH"
+        fi
+    done < <(jq -r '.required_tools[] | [.name, .why] | @tsv' "$MANIFEST")
+
+    # python: any_of
+    local pyok="" pc
+    for pc in $(jq -r '.required_python.any_of[]' "$MANIFEST"); do
+        if command -v "$pc" >/dev/null 2>&1; then pyok="$pc"; break; fi
+    done
+    if [ -n "$pyok" ]; then
+        local declared; declared="$(idfield '.python.command')"
+        if [ -n "$declared" ] && ! command -v "$declared" >/dev/null 2>&1; then
+            emit tool.python tooling WARN "identity.python.command='$declared' not on PATH; '$pyok' is available" \
+                "Update .python.command in identity.json or re-run migrate-identity.sh"
+        else
+            emit tool.python tooling PASS "python available ($pyok; identity declares '${declared:-unset}')"
+        fi
+    else
+        emit tool.python tooling FAIL "no python interpreter found (tried py/python3/python)" "Install Python"
+    fi
+
+    # capability tools - presence only, never FAIL
+    local cn cap cwhy
+    while IFS=$'\t' read -r cn cap cwhy; do
+        if command -v "$cn" >/dev/null 2>&1; then
+            emit "cap.$cn" capability INFO "$cn present [$cap] ($(toolver "$cn"))"
+        else
+            emit "cap.$cn" capability INFO "$cn absent [$cap] - capability off ($cwhy)"
+        fi
+    done < <(jq -r '.capability_tools[] | [.name, .capability, .why] | @tsv' "$MANIFEST")
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: capability tier (ollama) + effective ruleset
+# ---------------------------------------------------------------------------
+check_capability_tier() {
+    local declared fb local_ok="" remote_ok="" tier rule
+    declared="$(idfield '.ollama.endpoint')"
+    fb="$(idfield '.ollama.fallback')"
+
+    reachable "http://localhost:11434/api/tags" && local_ok=1
+    [ -n "$fb" ] && reachable "${fb%/}/api/tags" && remote_ok=1
+
+    if [ -n "$local_ok" ]; then
+        tier="ollama_local"
+    elif [ -n "$remote_ok" ]; then
+        tier="ollama_remote"
+    else
+        tier="ollama_none"
+    fi
+    rule="$(jq -r ".capability_rules.$tier.tier0_engine" "$MANIFEST")"
+
+    # Does the resolved tier agree with what identity declares?
+    if [ "$tier" = "ollama_none" ]; then
+        emit captier.ollama capability WARN "Ollama tier = NONE (local + fallback both unreachable). Effective rule: $rule" \
+            "Confirm this machine is meant to run without Ollama; ensure Tier-0 work routes to haiku, not blocked"
+    else
+        local note=""
+        if [ "$tier" = "ollama_remote" ] && echo "$declared" | grep -q "localhost"; then
+            note=" (identity declares localhost but local is down; using fallback $fb)"
+        fi
+        emit captier.ollama capability PASS "Ollama tier = ${tier#ollama_}${note}. Effective Tier-0: $rule"
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: required scripts + hook files (exist + executable)
+# ---------------------------------------------------------------------------
+check_files() {
+    local rel p
+    for rel in $(jq -r '.required_scripts[], .required_hook_files[]' "$MANIFEST"); do
+        p="$REPO_ROOT/$rel"
+        if [ ! -f "$p" ]; then
+            emit "file.$rel" files FAIL "missing: $rel" "Restore via /sync (git pull from Gitea)"
+        elif [ ! -x "$p" ] && echo "$rel" | grep -qE '\.(sh|template)$'; then
+            emit "file.$rel" files WARN "present but not executable: $rel" "chmod +x \"$rel\""
+        else
+            emit "file.$rel" files PASS "present: $rel"
+        fi
+    done
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: settings.json hooks wired correctly
+# ---------------------------------------------------------------------------
+check_settings_hooks() {
+    local settings="$REPO_ROOT/.claude/settings.json"
+    if [ ! -f "$settings" ] || ! jq -e . "$settings" >/dev/null 2>&1; then
+        emit hooks.settings hooks FAIL "settings.json missing or invalid JSON" "Restore .claude/settings.json via /sync"
+        return
+    fi
+    local ev needle why found
+    # NB: omit .matcher from the TSV - an empty middle field collapses under tab
+    # IFS (tab is IFS-whitespace), shifting columns. We do not use the matcher here.
+    while IFS=$'\t' read -r ev needle why; do
+        # any hook command under this event containing the needle
+        found="$(jq -r --arg ev "$ev" --arg n "$needle" \
+            '(.hooks[$ev] // []) | [.[].hooks[]?.command // ""] | map(select(contains($n))) | length' \
+            "$settings" 2>/dev/null)"
+        if [ "${found:-0}" -gt 0 ] 2>/dev/null; then
+            emit "hook.$ev" hooks PASS "$ev hook wired ($needle)"
+        else
+            emit "hook.$ev" hooks FAIL "$ev hook NOT wired (expected command containing '$needle' - $why)" \
+                "Add the $ev hook to .claude/settings.json (see baseline manifest required_settings_hooks)"
+        fi
+    done < <(jq -r '.required_settings_hooks[] | [.event, .command_contains, .why] | @tsv' "$MANIFEST")
+
+    # current-mode file (created by UserPromptSubmit hook, but flag if absent)
+    if [ -f "$REPO_ROOT/.claude/current-mode" ]; then
+        emit hook.current-mode hooks PASS "current-mode present ($(tr -d '[:space:]' < "$REPO_ROOT/.claude/current-mode"))"
+    else
+        emit hook.current-mode hooks WARN "current-mode missing (auto-created on next prompt)" "echo general > .claude/current-mode"
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: git remote + post-commit hooks
+# ---------------------------------------------------------------------------
+check_git() {
+    local url want host_ip
+    url="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null)"
+    want="$(jq -r '.git.remote_host_contains' "$MANIFEST")"
+    host_ip="$(jq -r '.git.remote_host_internal_ip' "$MANIFEST")"
+    if [ -z "$url" ]; then
+        emit git.remote git WARN "no 'origin' remote on $REPO_ROOT" "git remote add origin <gitea-url>"
+    elif echo "$url" | grep -qF "$want" || echo "$url" | grep -qF "$host_ip"; then
+        emit git.remote git PASS "origin -> $url"
+    else
+        emit git.remote git FAIL "origin does not point at ACG Gitea: $url" \
+            "git remote set-url origin http://<user>@$host_ip:3000/azcomputerguru/claudetools.git"
+    fi
+
+    if [ "$(jq -r '.git.post_commit_hook_expected' "$MANIFEST")" = "true" ]; then
+        if [ -f "$REPO_ROOT/.git/hooks/post-commit" ]; then
+            emit git.post_commit git PASS "main-repo post-commit hook installed"
+        else
+            emit git.post_commit git WARN "main-repo post-commit hook NOT installed (HOOKS.md mandates dev-alerts hook)" \
+                "cp .claude/hooks/post-commit.template .git/hooks/post-commit && chmod +x .git/hooks/post-commit"
+        fi
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: skills + commands conformance vs manifest
+# ---------------------------------------------------------------------------
+check_skills_commands() {
+    local name dir
+    # skills present
+    for name in $(jq -r '.skills[]' "$MANIFEST"); do
+        dir="$REPO_ROOT/.claude/skills/$name"
+        if [ -d "$dir" ]; then
+            if [ -f "$dir/SKILL.md" ] || ls "$dir"/*.md >/dev/null 2>&1 || [ -d "$dir/scripts" ]; then
+                emit "skill.$name" skills PASS "skill present: $name"
+            else
+                emit "skill.$name" skills WARN "skill dir present but looks empty: $name" "Restore skill contents via /sync"
+            fi
+        else
+            emit "skill.$name" skills FAIL "skill MISSING: $name" "Restore .claude/skills/$name via /sync"
+        fi
+    done
+    # extra skills not in manifest (drift to report, not fail in V1)
+    local known
+    known="|$(jq -r '.skills[]' "$MANIFEST" | tr '\n' '|')"
+    for dir in "$REPO_ROOT"/.claude/skills/*/; do
+        [ -d "$dir" ] || continue
+        name="$(basename "$dir")"
+        case "$known" in *"|$name|"*) ;; *) emit "skill.extra.$name" skills INFO "skill present but NOT in baseline: $name (census candidate)" ;; esac
+    done
+
+    # commands present
+    for name in $(jq -r '.commands[]' "$MANIFEST"); do
+        if [ -f "$REPO_ROOT/.claude/commands/$name.md" ]; then
+            emit "cmd.$name" commands PASS "command present: /$name"
+        else
+            emit "cmd.$name" commands FAIL "command MISSING: /$name" "Restore .claude/commands/$name.md via /sync"
+        fi
+    done
+    # extra commands
+    known="|$(jq -r '.commands[]' "$MANIFEST" | tr '\n' '|')"
+    for f in "$REPO_ROOT"/.claude/commands/*.md; do
+        [ -f "$f" ] || continue
+        name="$(basename "$f" .md)"
+        [ "$name" = "README" ] && continue
+        case "$known" in *"|$name|"*) ;; *) emit "cmd.extra.$name" commands INFO "command present but NOT in baseline: /$name (census candidate)" ;; esac
+    done
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: vault decrypt readiness
+# ---------------------------------------------------------------------------
+check_vault() {
+    local vpath; vpath="$(idfield '.vault_path')"
+    if [ -z "$vpath" ]; then
+        emit vault.path vault WARN "identity.vault_path not set" "Set vault_path in identity.json"
+        return
+    fi
+    if [ ! -d "$vpath" ]; then
+        emit vault.path vault FAIL "vault_path does not exist: $vpath" "Clone the vault repo and set vault_path"
+        return
+    fi
+    emit vault.path vault PASS "vault repo present: $vpath"
+    if ! command -v sops >/dev/null 2>&1 || ! command -v age >/dev/null 2>&1; then
+        emit vault.tools vault FAIL "sops/age missing - cannot decrypt vault" "Install sops and age"
+        return
+    fi
+    # Lightweight readiness: vault.sh list should enumerate entries without error.
+    if [ -x "$REPO_ROOT/.claude/scripts/vault.sh" ]; then
+        if bash "$REPO_ROOT/.claude/scripts/vault.sh" list >/dev/null 2>&1; then
+            emit vault.list vault PASS "vault.sh list succeeded (sops/age wired)"
+        else
+            emit vault.list vault WARN "vault.sh list failed - check age key + SOPS_AGE_KEY_FILE" \
+                "Verify age key at the SOPS recipient path; run: bash .claude/scripts/vault.sh list"
+        fi
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: connectivity
+# ---------------------------------------------------------------------------
+check_connectivity() {
+    local name url req
+    while IFS=$'\t' read -r name url req; do
+        if reachable "$url"; then
+            emit "net.$name" connectivity PASS "$name reachable ($url)"
+        elif [ "$req" = "true" ]; then
+            emit "net.$name" connectivity FAIL "$name UNREACHABLE ($url)" "Check VPN/Tailscale/network to 172.16.3.x"
+        else
+            emit "net.$name" connectivity WARN "$name unreachable ($url) - off-network is OK" ""
+        fi
+    done < <(jq -r '.connectivity[] | [.name, .url, (.required|tostring)] | @tsv' "$MANIFEST")
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: duplicate command/skill definitions across search roots.
+# Claude Code resolves slash commands and skills from BOTH the repo
+# (.claude/commands, .claude/skills) and the user profile (~/.claude/...). When
+# the same name exists in both with DIFFERENT content, the harness may resolve a
+# different one than you expect - the "same /cmd, different behaviour on the Mac"
+# bug. Divergent = WARN; identical = INFO (redundant copy that WILL drift).
+# ---------------------------------------------------------------------------
+check_duplicates() {
+    local kind repo_dir user_dir
+    # commands: compare *.md files by content
+    for kind in commands skills; do
+        repo_dir="$REPO_ROOT/.claude/$kind"
+        user_dir="$HOME/.claude/$kind"
+        [ -d "$repo_dir" ] || continue
+        [ -d "$user_dir" ] || { emit "dup.$kind" duplicates PASS "no user-level ~/.claude/$kind (single source: repo)"; continue; }
+
+        local name rp up dup_div=0 dup_same=0
+        if [ "$kind" = "commands" ]; then
+            for rp in "$repo_dir"/*.md; do
+                [ -f "$rp" ] || continue
+                name="$(basename "$rp" .md)"
+                [ "$name" = "README" ] && continue
+                up="$user_dir/$name.md"
+                [ -f "$up" ] || continue
+                [ "$rp" -ef "$up" ] && continue   # symlink to the same file - cannot drift
+                if same_content "$rp" "$up"; then
+                    dup_same=$((dup_same+1))
+                else
+                    dup_div=$((dup_div+1))
+                    emit "dup.cmd.$name" duplicates WARN \
+                        "/$name is DIVERGENT: repo and ~/.claude copies differ (harness may run the wrong one)" \
+                        "Reconcile: diff \"$rp\" \"$up\"  then make ~/.claude/commands/$name.md match the repo (or remove it)"
+                fi
+            done
+        else
+            for rp in "$repo_dir"/*/; do
+                [ -d "$rp" ] || continue
+                name="$(basename "$rp")"
+                up="$user_dir/$name"
+                [ -d "$up" ] || continue
+                [ "$rp" -ef "$up" ] && continue   # symlinked dir - cannot drift
+                # Only compare when BOTH have a SKILL.md; otherwise not comparable
+                # (script-only / *.md-only skills) - skip rather than miscount.
+                if [ -f "$rp/SKILL.md" ] && [ -f "$up/SKILL.md" ]; then
+                    if same_content "$rp/SKILL.md" "$up/SKILL.md"; then
+                        dup_same=$((dup_same+1))
+                    else
+                        dup_div=$((dup_div+1))
+                        emit "dup.skill.$name" duplicates WARN \
+                            "skill '$name' is DIVERGENT: repo and ~/.claude SKILL.md differ" \
+                            "Reconcile ~/.claude/skills/$name with the repo copy (or remove the user-level one)"
+                    fi
+                fi
+            done
+        fi
+        if [ "$dup_div" -eq 0 ] && [ "$dup_same" -gt 0 ]; then
+            emit "dup.$kind" duplicates INFO \
+                "$dup_same $kind exist in BOTH repo and ~/.claude (identical now, but a redundant copy that can drift)" \
+                "Consider a single source of truth for $kind to prevent future divergence"
+        elif [ "$dup_div" -eq 0 ] && [ "$dup_same" -eq 0 ]; then
+            emit "dup.$kind" duplicates PASS "no duplicate $kind across roots"
+        fi
+    done
+}
+
+# ---------------------------------------------------------------------------
+# CHECK: rogue memories that contradict settings/identity.
+# Deterministic core only: index integrity + a conservative, manifest-declared
+# set of contradiction patterns evaluated against this machine's identity. The
+# SEMANTIC contradiction pass (reasoning over all memories vs identity/settings)
+# is a judgment task and is delegated to the model in SKILL.md, not grep.
+# ---------------------------------------------------------------------------
+check_memory() {
+    local mdir="$REPO_ROOT/.claude/memory" idx="$REPO_ROOT/.claude/memory/MEMORY.md"
+    if [ ! -d "$mdir" ]; then
+        emit memory.dir memory WARN "no .claude/memory directory" "Expected the shared memory store; restore via /sync"
+        return
+    fi
+    if [ ! -f "$idx" ]; then
+        emit memory.index memory WARN "MEMORY.md index missing" "Create .claude/memory/MEMORY.md (the loaded index)"
+    else
+        # orphan detection: every *.md (except MEMORY.md) should be referenced in the index
+        local f base orphans=0
+        for f in "$mdir"/*.md; do
+            [ -f "$f" ] || continue
+            base="$(basename "$f")"
+            [ "$base" = "MEMORY.md" ] && continue
+            if ! grep -qF "$base" "$idx" 2>/dev/null; then
+                orphans=$((orphans+1))
+            fi
+        done
+        if [ "$orphans" -gt 0 ]; then
+            emit memory.orphans memory WARN "$orphans memory file(s) not referenced in MEMORY.md (orphaned)" \
+                "Run /memory-dream or add the missing index lines"
+        else
+            emit memory.index memory PASS "MEMORY.md index present; no orphaned memory files"
+        fi
+    fi
+
+    # Manifest-declared contradiction patterns. Each entry:
+    #   { when_field, when_equals, grep, why }  - only evaluated when this
+    #   machine's identity.<when_field> == when_equals, so a pattern fires only
+    #   where it is actually a contradiction (e.g. prescribing python3 on a `py` box).
+    # NB: fields are read via @tsv, so when_equals/grep MUST NOT contain tab chars.
+    local has; has="$(jq -r '(.memory.contradiction_patterns // []) | length' "$MANIFEST" 2>/dev/null)"
+    if [ "${has:-0}" -gt 0 ] 2>/dev/null; then
+        local wf we gx why hits
+        while IFS=$'\t' read -r wf we gx why; do
+            [ -n "$wf" ] || continue
+            if [ "$(idfield ".$wf")" = "$we" ]; then
+                hits="$(grep -rliE "$gx" "$mdir" 2>/dev/null | grep -vF 'MEMORY.md' | head -5 | tr '\n' ' ')"
+                if [ -n "$hits" ]; then
+                    emit "memory.contradiction.$wf" memory WARN \
+                        "memory may contradict identity.$wf=$we ($why): $hits" \
+                        "Review the listed memory file(s); correct or delete if they prescribe the wrong behaviour for this machine"
+                fi
+            fi
+        done < <(jq -r '(.memory.contradiction_patterns // [])[] | [.when_field, .when_equals, .grep, .why] | @tsv' "$MANIFEST")
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# Build the census JSON from accumulated results
+# ---------------------------------------------------------------------------
+build_census() {
+    local fails warns grade
+    fails="$(jq -s '[.[]|select(.status=="FAIL")]|length' "$RESULTS_FILE")"
+    warns="$(jq -s '[.[]|select(.status=="WARN")]|length' "$RESULTS_FILE")"
+    if [ "$fails" -gt 0 ]; then grade="RED"; elif [ "$warns" -gt 0 ]; then grade="AMBER"; else grade="GREEN"; fi
+
+    jq -s \
+        --arg host "$HOST" --arg session "$SESSION" --arg platform "$PLATFORM" --arg arch "$ARCH" \
+        --arg grade "$grade" --arg ts "$RUN_TS" \
+        --arg mver "$(jq -r '.schema_version' "$MANIFEST")" \
+        '{
+            host:$host, session:$session, platform:$platform, arch:$arch,
+            grade:$grade, generated_at:$ts, manifest_version:$mver,
+            summary: { pass:([.[]|select(.status=="PASS")]|length),
+                       warn:([.[]|select(.status=="WARN")]|length),
+                       fail:([.[]|select(.status=="FAIL")]|length),
+                       info:([.[]|select(.status=="INFO")]|length) },
+            results: .
+         }' "$RESULTS_FILE"
+}
+
+# ---------------------------------------------------------------------------
+# Human report
+# ---------------------------------------------------------------------------
+print_report() {
+    local census="$1" grade
+    grade="$(echo "$census" | jq -r .grade)"
+    echo ""
+    echo "============================================================"
+    echo " ClaudeTools self-check - $HOST ($PLATFORM/$ARCH)"
+    echo " Grade: $grade   $(echo "$census" | jq -r '.summary | "PASS \(.pass)  WARN \(.warn)  FAIL \(.fail)  INFO \(.info)"')"
+    echo " Manifest: $(echo "$census" | jq -r .manifest_version) (provisional)   $RUN_TS"
+    echo "============================================================"
+    # FAIL then WARN then INFO; PASS summarized per category
+    echo "$census" | jq -r '
+        def mark(s): if s=="FAIL" then "[FAIL]" elif s=="WARN" then "[WARN]"
+                     elif s=="INFO" then "[INFO]" elif s=="SKIP" then "[SKIP]" else "[ OK ]" end;
+        (.results | map(select(.status=="FAIL"))) as $f
+        | (.results | map(select(.status=="WARN"))) as $w
+        | (.results | map(select(.status=="INFO"))) as $i
+        | (if ($f|length)>0 then "\nFAILURES:" else empty end),
+          ($f[] | "  [FAIL] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n         fix: \(.fix)" else "" end)),
+          (if ($w|length)>0 then "\nWARNINGS:" else empty end),
+          ($w[] | "  [WARN] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n         fix: \(.fix)" else "" end)),
+          (if ($i|length)>0 then "\nINFO / capability:" else empty end),
+          ($i[] | "  [INFO] \(.detail)")
+    '
+    # per-category PASS counts
+    echo ""
+    echo "PASS by category:"
+    echo "$census" | jq -r '.results | map(select(.status=="PASS")) | group_by(.category)[] | "  \(.[0].category): \(length) ok"'
+    echo "============================================================"
+}
+
+# ---------------------------------------------------------------------------
+# Publish census to coord API.
+# The coord API uses the PATH-PARAM form: PUT /api/coord/components/{pk}/{comp}
+# with a body of {state, version, notes, updated_by} (the body form 405s).
+# The component segment must be slash-free (a slash 404s, even URL-encoded), so
+# the per-machine component is "selfcheck_<host>" (NOT "selfcheck/<host>").
+# ---------------------------------------------------------------------------
+COMPONENT="selfcheck_$HOST"
+publish_census() {
+    local census="$1" grade compact body path
+    grade="$(echo "$census" | jq -r .grade)"
+    compact="$(echo "$census" | jq -c .)"
+    path="/api/coord/components/claudetools/$COMPONENT"
+    body="$(jq -nc --arg state "$grade" \
+        --arg ver "$(jq -r '.schema_version' "$MANIFEST")" --arg notes "$compact" --arg by "$SESSION" \
+        '{state:$state, version:$ver, notes:$notes, updated_by:$by}')"
+    if curl -s -m 8 -X PUT "$API$path" -H "Content-Type: application/json" -d "$body" >/dev/null 2>&1; then
+        echo "[OK] Published census to coord: component $COMPONENT = $grade"
+    else
+        # softfail per coordination protocol - queue the SAME path/body so a
+        # later /sync drain replays a request that actually works.
+        local q="$REPO_ROOT/.claude/coord-queue.jsonl"
+        jq -nc --arg path "$path" --argjson b "$body" --arg ts "$RUN_TS" \
+            '{ts:$ts, method:"PUT", path:$path, body:$b}' >> "$q" 2>/dev/null
+        echo "[WARN] coord unreachable; census queued to .claude/coord-queue.jsonl"
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# Subcommand: fanout - request all instances to run /self-check --publish
+# ---------------------------------------------------------------------------
+do_fanout() {
+    local subj body payload
+    subj="[self-check] Fleet census + self-remediation request"
+    body="On THIS machine: (1) run /self-check ; (2) apply the suggested fix commands it prints for any FAIL/WARN - fix your OWN machine, locally, with your operator present (nobody fixes you remotely) ; (3) re-run /self-check to confirm GREEN ; (4) run /self-check --publish to report your census (component selfcheck_<host>) to coord. The check is read-only; only --publish writes (your census only). Requested by $SESSION at $RUN_TS."
+    payload="$(jq -nc --arg from "$SESSION" --arg subj "$subj" --arg body "$body" \
+        '{from_session:$from, to_session:"ALL_SESSIONS", project_key:"claudetools", subject:$subj, body:$body}')"
+    if curl -s -m 8 -X POST "$API/api/coord/messages" -H "Content-Type: application/json" -d "$payload" >/dev/null 2>&1; then
+        echo "[OK] Broadcast census request to ALL_SESSIONS."
+    else
+        echo "[ERROR] Failed to broadcast (coord unreachable)." >&2
+        exit 1
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# Subcommand: aggregate - read all published censuses, build fleet view
+# ---------------------------------------------------------------------------
+do_aggregate() {
+    local comps
+    comps="$(curl -s -m 8 "$API/api/coord/components?project_key=claudetools" 2>/dev/null)"
+    if [ -z "$comps" ]; then echo "[ERROR] coord unreachable." >&2; exit 1; fi
+    # The coord API returns {states:[...], total:N}; each row's grade is .state and
+    # the full census JSON is in .notes. Keep selfcheck_* rows with parseable notes.
+    # (.components / bare-array kept as defensive fallbacks.)
+    local censuses
+    censuses="$(echo "$comps" | jq -c '
+        ( .states? // .components? // (if type=="array" then . else [] end) ) as $rows
+        | ($rows // [])
+        | map(select((.component? // "") | startswith("selfcheck")))
+        | map(.notes | try fromjson catch empty)
+    ' 2>/dev/null)"
+    local n; n="$(echo "$censuses" | jq 'length' 2>/dev/null || echo 0)"
+    if [ "${n:-0}" -eq 0 ]; then
+        echo "No published censuses found yet. Run 'self-check.sh fanout', then have each machine run /self-check --publish."
+        return
+    fi
+    echo "============================================================"
+    echo " Fleet census: $n machine(s) reporting"
+    echo "============================================================"
+    echo "$censuses" | jq -r '.[] | "  \(.grade)\t\(.host)\t\(.platform)/\(.arch)\tP\(.summary.pass) W\(.summary.warn) F\(.summary.fail)\t\(.generated_at)"' | column -t -s$'\t' 2>/dev/null \
+        || echo "$censuses" | jq -r '.[] | "  \(.grade)  \(.host)  \(.platform)/\(.arch)  P\(.summary.pass) W\(.summary.warn) F\(.summary.fail)"'
+
+    echo ""
+    echo "Proposed baseline (intersection = required everywhere; symmetric diff = capability-gated):"
+    # Tools present on every machine vs only some, derived from tool.* PASS results.
+    echo "$censuses" | jq -r '
+        [ .[] | { host:.host, tools:( .results | map(select((.id|startswith("tool."))) | select(.status=="PASS") | (.id|sub("^tool.";""))) ) } ] as $m
+        | ($m|length) as $count
+        | ([ $m[].tools[] ] | unique) as $all
+        | "  tools on ALL \($count): " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) == $count ) ] | join(", ") ),
+          "  tools on SOME only:    " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) <  $count ) ] | join(", ") )
+    ' 2>/dev/null
+    echo ""
+    echo "Machines that must self-remediate (RED/AMBER) - each fixes ITSELF, then re-runs + re-publishes:"
+    local needfix
+    needfix="$(echo "$censuses" | jq -r '
+        .[] | select(.grade!="GREEN")
+        | "  \(.host) [\(.grade)] should run, in order:\n"
+          + ( [ .results[] | select(.status=="FAIL" or .status=="WARN") | select(.fix!="")
+                | "      - \(.fix)" ] | join("\n") )
+          + "\n      then: /self-check --publish"
+    ' 2>/dev/null)"
+    if [ -n "$needfix" ]; then
+        echo "$needfix"
+    else
+        echo "  (none - whole fleet is GREEN)"
+    fi
+    echo "============================================================"
+    echo "We do NOT fix remote machines. Relay each machine's fix list to its operator;"
+    echo "they self-remediate locally, re-run /self-check, and re-publish until GREEN."
+    echo "Once the fleet is reporting consistently, ratify baseline/manifest.json with Mike."
+}
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+# RUN_TS is passed in by the caller (SKILL.md instructs a real UTC stamp);
+# fall back to `date` if available so the script is runnable standalone.
+RUN_TS="${SELFCHECK_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)}"
+
+MODE="${1:-report}"
+case "$MODE" in
+    fanout)     do_fanout; exit 0 ;;
+    aggregate)  do_aggregate; exit 0 ;;
+esac
+
+# run all checks
+check_identity
+check_tools
+check_capability_tier
+check_files
+check_settings_hooks
+check_git
+check_skills_commands
+check_duplicates
+check_memory
+check_vault
+check_connectivity
+
+CENSUS="$(build_census)"
+
+case "$MODE" in
+    --json)     echo "$CENSUS" ;;
+    --publish)  print_report "$CENSUS"; publish_census "$CENSUS" ;;
+    report|*)   print_report "$CENSUS" ;;
+esac
+
+# exit code reflects grade for scripting (0 GREEN, 1 AMBER, 2 RED)
+GR="$(echo "$CENSUS" | jq -r .grade)"
+case "$GR" in GREEN) exit 0 ;; AMBER) exit 1 ;; RED) exit 2 ;; *) exit 0 ;; esac