From b153ff158be236cace9957c01f18929e5703f940 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Tue, 2 Jun 2026 14:45:42 -0700 Subject: [PATCH] feat(self-check): add harness self-diagnosis / fleet conformance skill New /self-check skill: each machine probes its own ClaudeTools harness wiring (identity.json paths, required tooling, settings.json hooks, skill/command/script set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades RED/AMBER/GREEN against a checked-in provisional baseline manifest. - Capability-tier model: architectural/OS/hardware differences (e.g. no local Ollama) select a fallback ruleset instead of failing. - Duplicate detection: flags command/skill names that diverge between the repo and ~/.claude (the "same /cmd, different behaviour" cross-machine bug); CRLF-only diffs ignored. - Memory check: index + orphan detection, plus a model-driven semantic pass for memories that contradict identity/settings. - V1 is a census tool: --publish writes a per-machine census to coord (component selfcheck_); fanout requests the fleet to self-check + self-remediate + re-publish; aggregate derives the proposed baseline. No machine ever fixes another. Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the CRLF false-WARN found and fixed, verified live against the coord API. Co-Authored-By: Claude Opus 4.8 (1M context) --- .claude/commands/self-check.md | 53 ++ .claude/skills/self-check/SKILL.md | 161 ++++ .claude/skills/self-check/baseline/README.md | 68 ++ .../skills/self-check/baseline/manifest.json | 114 +++ .../skills/self-check/scripts/self-check.sh | 746 ++++++++++++++++++ 5 files changed, 1142 insertions(+) create mode 100644 .claude/commands/self-check.md create mode 100644 .claude/skills/self-check/SKILL.md create mode 100644 .claude/skills/self-check/baseline/README.md create mode 100644 .claude/skills/self-check/baseline/manifest.json create mode 100644 .claude/skills/self-check/scripts/self-check.sh diff --git a/.claude/commands/self-check.md b/.claude/commands/self-check.md new file mode 100644 index 0000000..05f0d6d --- /dev/null +++ b/.claude/commands/self-check.md @@ -0,0 +1,53 @@ +--- +name: self-check +description: Self-diagnose this ClaudeTools machine's harness wiring (tools, identity, hooks, skills, commands, scripts, capability tier, connectivity) against the fleet baseline, and optionally publish the census to coord. +--- + +# /self-check — ClaudeTools Harness Self-Diagnosis + +Runs the conformance probe in `.claude/skills/self-check/scripts/self-check.sh`, +which grades this machine RED/AMBER/GREEN against the provisional baseline manifest +and accounts for capability differences (e.g. no local Ollama -> remote/none ruleset). + +See `.claude/skills/self-check/SKILL.md` for the full model. This command is a thin +runner — invoke the **self-check** skill for interpretation and follow-up. + +## Run it + +Always pass a real UTC timestamp via `SELFCHECK_TS` (the script falls back to +`date` if available, but pass it explicitly for correctness): + +```bash +SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + bash .claude/skills/self-check/scripts/self-check.sh report +``` + +Modes: + +| Arg | What it does | +|-----|--------------| +| `report` (default) | Human-readable RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. | +| `--json` | Structured census JSON only (for piping/aggregation). | +| `--publish` | Run + PUT the census to coord as component `selfcheck_`. | +| `fanout` | Broadcast a self-check + self-remediate + re-publish request to ALL_SESSIONS. | +| `aggregate` | Read every machine's published census; print fleet table + proposed baseline. | + +## Typical flows + +**Just check this box:** +```bash +SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh report +``` + +**Build the fleet baseline (V1 census):** +```bash +# 1) From any machine, request the fleet to report: +bash .claude/skills/self-check/scripts/self-check.sh fanout +# 2) Each instance (incl. this one) publishes: +SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh --publish +# 3) Aggregate what came back: +bash .claude/skills/self-check/scripts/self-check.sh aggregate +``` + +After running, summarize the grade and the FAIL/WARN items for the user with the +exact fix commands. Do NOT auto-apply fixes — V1 is report-only by design. diff --git a/.claude/skills/self-check/SKILL.md b/.claude/skills/self-check/SKILL.md new file mode 100644 index 0000000..7797edf --- /dev/null +++ b/.claude/skills/self-check/SKILL.md @@ -0,0 +1,161 @@ +--- +name: self-check +description: >- + Self-diagnose a ClaudeTools session's machine: verify the harness is wired the + same way as every other instance while allowing for architectural / OS / hardware + differences. Checks that identity.json exists and is correct (the map of WHERE + things live on this box), required tooling is installed, env/paths resolve, + hooks are wired, the skill/command/script set matches the baseline, the vault + decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no + local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and + can publish a census to the coord API so the fleet baseline can be built/refined. + Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check", + "am I configured right", "is my machine set up correctly", "harness conformance", + "fleet conformance", "check my environment", "is everything wired up". +--- + +# Self-Check — ClaudeTools Harness Self-Diagnosis + +A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired, +graded against a checked-in **baseline manifest** so every machine behaves the +same way — while explicitly allowing for architecture, OS, and hardware +differences via a **capability tier** model. + +This is the skill the user asked for when a session needs to "make sure +everything is as it should be." + +## The model in one paragraph + +`identity.json` is the foundational, per-machine map of **where things live and +what this box can do** (vault path, repo root, platform, arch, python command, +Ollama endpoints). The **baseline manifest** +(`baseline/manifest.json`) declares what *every* machine must have — required +tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the +canonical skill/command set, and the **capability rules** that say what to do when +a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that +is also down, route Tier-0 work to haiku instead of blocking). The probe compares +the live machine against the manifest, resolves the machine's capability tier, and +grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER. +Capability differences are **never** failures — they select a ruleset. + +## V1 is a CENSUS tool (read this) + +There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**, +generated from a single known-good machine (GURU-5070). So V1's job is to gather +ground truth from every machine and help Mike build the real baseline: + +1. **Probe** — each machine runs the check and produces a structured census. +2. **Publish** — `--publish` PUTs the census to coord as component + `selfcheck_` (state = grade, notes = full JSON). One row per machine = + a live fleet conformance view. +3. **Fan out** — `fanout` broadcasts a request to `ALL_SESSIONS` so every active + instance reports. +4. **Aggregate** — `aggregate` reads all censuses back and proposes a baseline + (tools/skills/commands present on *all* machines = "required everywhere"; + present on *some* = "capability-gated"), and lists machines with FAILs. + +Mike reviews the aggregate and ratifies `manifest.json`. From then on the same +probe enforces conformance. **V1 does not auto-fix anything** — it reports the +exact fix command for each finding (per the decision on record). + +## Running it + +The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS, +Linux; deps: jq + curl). Always pass a real UTC timestamp: + +```bash +SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + bash .claude/skills/self-check/scripts/self-check.sh +``` + +| Mode | Purpose | +|------|---------| +| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. | +| `--json` | Structured census JSON to stdout (for piping). | +| `--publish` | Run + publish census to coord (component `selfcheck_`). Softfails to `.claude/coord-queue.jsonl`. | +| `fanout` | Broadcast a census request to ALL_SESSIONS. | +| `aggregate` | Fleet table + proposed-baseline summary from published censuses. | + +`/self-check` is the slash-command runner for the same script. + +## What it checks + +| Category | Checks | +|----------|--------| +| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. | +| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. | +| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. | +| **files** | required scripts + hook files present and executable. | +| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. | +| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). | +| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. | +| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. | +| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. | +| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). | +| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). | + +## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check) + +The engine's memory check is deterministic and conservative (index + orphans + +declared patterns) so it never produces false alarms. A *true* contradiction +check — "does any memory directly contradict what this machine's settings say?" +— is a judgment task, so the model does it (route the prose/classification to +Ollama Tier-0 per the house rules; Claude reviews the result): + +1. Read `identity.json` (where things live + this box's capabilities), + `settings.json` (wired hooks/permissions), and `baseline/manifest.json`. +2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose + one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch + assumptions, tool choices, or model routing. +3. Flag memories that **directly contradict** this machine's reality, e.g.: + - prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa), + - hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`, + - names an endpoint/IP that conflicts with `identity.coord_api` or the manifest, + - assumes a capability (local Ollama) this machine's tier says is absent. +4. Report each as: memory file, the contradicting claim, the setting it violates, + and a suggested correction. **Do not edit memories** — surface for the operator + (deletions/rewrites go through the human, mirroring memory-dream's posture). + +Genuinely machine-specific guidance in a *shared* memory is the usual culprit — +the fix is to scope it ("on Windows…") or split it, not to globally flip it. + +## Fleet self-remediation loop (machines fix themselves) + +We never fix a remote machine. The flow is: + +1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish. +2. Each operator runs `/self-check` locally, applies the printed fix commands on + their own box, re-runs to confirm GREEN, then `/self-check --publish`. +3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix + list. Relay it to that operator; do not run it for them. +4. Repeat until the fleet is consistently GREEN, then ratify the manifest. + +## How to interpret a run + +After running, summarize for the user: +- The **grade** and the PASS/WARN/FAIL/INFO tallies. +- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply. +- The **capability tier** line — confirm the machine knows its Tier-0 fallback. +- If publishing/aggregating, note how many machines have reported and which are RED. + +Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are +expected and must never be reported as broken — they are the whole point of the +tier model. + +## Files + +``` +.claude/skills/self-check/ + SKILL.md this file + scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate) + baseline/manifest.json the provisional fleet baseline (single source of truth) + baseline/README.md the baseline model + how to refine/ratify it +.claude/commands/self-check.md the /self-check runner +``` + +## Extending the baseline + +When a new tool/skill/command/hook becomes mandatory fleet-wide, edit +`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check +enforces it. Capability-only tools go in `capability_tools` with a matching entry +in `capability_rules` describing the fallback. See `baseline/README.md`. diff --git a/.claude/skills/self-check/baseline/README.md b/.claude/skills/self-check/baseline/README.md new file mode 100644 index 0000000..c0c50c4 --- /dev/null +++ b/.claude/skills/self-check/baseline/README.md @@ -0,0 +1,68 @@ +# Self-Check Baseline + +`manifest.json` is the **single source of truth** for how a ClaudeTools machine +should be wired. Every machine's `/self-check` grades itself against this file. +It is checked into the repo and syncs to all workstations via Gitea, so updating +it here updates the standard for the whole fleet. + +## Why a checked-in manifest (the "how do we set a baseline" answer) + +The fleet has architectural differences — Windows/macOS/Linux, amd64/arm64, boxes +with and without a local GPU for Ollama. We still want every machine to *behave* +the same. A checked-in manifest gives us that: + +- **Required everywhere** lives in `required_*` keys. Missing = a real failure on + any machine, regardless of OS. +- **Allowed-to-differ** lives in `capability_tools` + `capability_rules`. A capability + being absent is never a failure — it selects a *fallback ruleset* so the machine + knows how to behave without it (the canonical example: no local Ollama → use the + remote endpoint; if that is down too, route Tier-0 work to haiku instead of + blocking on it). + +`identity.json` (per-machine, gitignored) carries the machine's actual coordinates +and capabilities. The manifest says what's required; identity says where things are +and what this box can do; the probe reconciles the two. + +## How the baseline gets built (V1 census → ratified baseline) + +The current `manifest.json` is **provisional** — generated from one known-good +machine (GURU-5070, 2026-06-02). It has not been confirmed against the fleet. +The build sequence: + +1. `self-check.sh fanout` — ask every active instance to report. +2. Each machine runs `self-check.sh --publish` — publishes its census to coord as + component `selfcheck_`. +3. `self-check.sh aggregate` — reads them all back and proposes: + - items present on **all** reporting machines → **required everywhere** candidates, + - items present on **some** machines → **capability-gated** candidates. +4. Mike reviews the proposal and edits this `manifest.json` to ratify it, then + flips `"status": "provisional"` to `"ratified"` and commits. + +Until ratified, treat "extra"/"missing" skill and command findings as **candidates**, +not gospel. + +## Manifest structure + +| Key | Meaning | +|-----|---------| +| `schema_version`, `status` | manifest version; `provisional` or `ratified`. | +| `required_tools[]` | tools every machine must have on PATH (`name`, `why`). Missing = FAIL. | +| `required_python.any_of[]` | acceptable python launchers (`py`/`python3`/`python`). | +| `capability_tools[]` | optional tools; presence is INFO and toggles a `capability`. | +| `required_identity_fields[]` | dotted identity.json paths that must be set. | +| `required_scripts[]`, `required_hook_files[]` | repo files that must exist (+ executable). | +| `required_settings_hooks[]` | hooks that must be wired in settings.json (`event`, `command_contains`, `why`). | +| `git` | expected remote host, internal IP, post-commit-hook expectation. | +| `skills[]`, `commands[]` | the canonical skill dirs / command files. | +| `connectivity[]` | endpoints to probe (`required` ones FAIL if unreachable). | +| `capability_rules{}` | per-capability fallback behavior (the "different rules" for different hardware). | + +## Editing rules + +- Add a **required** item only when it should hold on *every* machine, every OS. + If a Windows box legitimately can't have it, it's a capability, not a requirement. +- Every `capability_tools` entry needs a story for what to do without it — encode + it in `capability_rules` (or reference an existing tier) so the probe can print + the effective ruleset. +- Keep paths repo-relative and forward-slashed. +- After editing: commit + `/sync`. The fleet picks it up on next pull. diff --git a/.claude/skills/self-check/baseline/manifest.json b/.claude/skills/self-check/baseline/manifest.json new file mode 100644 index 0000000..83893ea --- /dev/null +++ b/.claude/skills/self-check/baseline/manifest.json @@ -0,0 +1,114 @@ +{ + "schema_version": "1.0.0", + "status": "provisional", + "derived_from": "GURU-5070", + "derived_at": "2026-06-02", + "note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.", + + "required_tools": [ + { "name": "bash", "why": "hooks, scripts, sync, vault wrapper" }, + { "name": "git", "why": "repo + submodules + Gitea sync" }, + { "name": "jq", "why": "every hook and coord script parses JSON with jq" }, + { "name": "curl", "why": "coord API, vault, RMM, all HTTP calls" }, + { "name": "sops", "why": "vault decryption (SOPS)" }, + { "name": "age", "why": "SOPS age recipient/decrypt" }, + { "name": "ssh", "why": "infra access; must be system OpenSSH" } + ], + + "required_python": { + "any_of": ["py", "python3", "python"], + "why": "JSON sanitizer in check-messages.sh, identity migration, skill scripts. The resolved command is recorded in identity.json (.python.command)." + }, + + "capability_tools": [ + { "name": "ollama", "capability": "ollama_local", "why": "Tier-0 local inference (prose/classification)" }, + { "name": "cargo", "capability": "rust_build", "why": "GuruRMM / GuruConnect Rust builds" }, + { "name": "node", "capability": "node_build", "why": "dashboard / TS builds" }, + { "name": "gh", "capability": "github_cli", "why": "optional GitHub operations" }, + { "name": "docker", "capability": "containers", "why": "optional container workflows" }, + { "name": "op", "capability": "onepassword_cli","why": "1Password fallback credential access" } + ], + + "required_identity_fields": [ + "user", "full_name", "email", "role", "machine", + "vault_path", "claudetools_root", "platform", "architecture", + "python.command", "ollama.endpoint", "ollama.fallback", "ollama.prose_model" + ], + "optional_identity_fields": ["coord_api", "last_updated"], + + "required_scripts": [ + ".claude/scripts/vault.sh", + ".claude/scripts/sync.sh", + ".claude/scripts/sync-memory.sh", + ".claude/scripts/check-messages.sh", + ".claude/scripts/migrate-identity.sh" + ], + + "required_hook_files": [ + ".claude/hooks/block-backslash-winpath.sh", + ".claude/hooks/post-commit.template" + ], + + "required_settings_hooks": [ + { "event": "PreToolUse", "matcher": "Bash", "command_contains": "block-backslash-winpath.sh", "why": "blocks garbled backslash Windows-path redirects in Git Bash" }, + { "event": "UserPromptSubmit", "matcher": "", "command_contains": "check-messages.sh", "why": "injects unread coord messages + dev-mode locks each prompt" }, + { "event": "SessionStart", "matcher": "", "command_contains": "sync-memory.sh", "why": "pulls shared memory at session start" } + ], + + "git": { + "remote_host_contains": "git.azcomputerguru.com", + "remote_host_internal_ip": "172.16.3.20", + "remote_note": "On-network machines should use the internal Gitea IP (172.16.3.20:3000) to bypass NPM SSL-renewal blips; off-network may use the domain. Either is acceptable; a non-ACG remote is a FAIL.", + "post_commit_hook_expected": true, + "post_commit_hook_note": "HOOKS.md mandates the dev-alerts post-commit hook in the main repo and each initialized submodule. Missing = AMBER (informational; reinstall from .claude/hooks/post-commit.template)." + }, + + "skills": [ + "1password", "b2", "bitdefender", "frontend-design", "gc-audit", + "impeccable", "memory-dream", "remediation-tool", "rmm-audit", + "skill-creator", "stop-slop", "theme-factory", "self-check" + ], + + "commands": [ + "1password", "autotask", "checkpoint", "context", "create-spec", + "feature-request", "forum-post", "gc-feature-request", "import", + "inject-standards", "mailbox", "mode", "recover", "remediation-tool", + "rmm", "save", "scc", "shape-spec", "sync", "syncro-emergency-billing", + "syncro", "wiki-compile", "wiki-lint", "self-check" + ], + + "connectivity": [ + { "name": "coord_api", "url": "http://172.16.3.30:8001/api/coord/status", "required": true, "why": "live coordination source of truth" }, + { "name": "claudetools_api","url": "http://172.16.3.30:8001/health", "required": false, "why": "main API health" }, + { "name": "gitea_internal", "url": "http://172.16.3.20:3000", "required": false, "why": "internal Gitea (git/API on-network)" } + ], + + "memory": { + "note": "Deterministic memory checks: MEMORY.md index exists + no orphaned memory files, plus the contradiction_patterns below. A pattern fires ONLY on machines where identity. == when_equals, so it flags a memory only where it is actually a contradiction for THIS box. Kept empty in V1 to avoid false positives; the real semantic contradiction analysis (memories vs identity.json + settings.json + this manifest) is done by the model per SKILL.md, optionally via Ollama Tier-0.", + "pattern_schema": { + "when_field": "dotted identity.json path, e.g. python.command", + "when_equals": "value that makes the grep a contradiction, e.g. python3", + "grep": "ERE matched case-insensitively against memory files", + "why": "human explanation shown in the finding" + }, + "contradiction_patterns": [] + }, + + "capability_rules": { + "ollama_local": { + "tier0_engine": "local ollama (localhost:11434) for summarize/classify/extract/draft", + "detect": "curl localhost:11434/api/tags reachable", + "fallback_if_unavailable": "ollama_remote" + }, + "ollama_remote": { + "tier0_engine": "remote ollama via identity.ollama.fallback (Beast over Tailscale)", + "detect": "localhost unreachable but identity.ollama.fallback reachable", + "fallback_if_unavailable": "ollama_none" + }, + "ollama_none": { + "tier0_engine": "NONE - no Tier-0 path. Route low-stakes prose/classification to Tier-1 (haiku) instead of Ollama. Do NOT block work waiting on Ollama.", + "detect": "neither localhost nor fallback reachable", + "fallback_if_unavailable": null + } + } +} diff --git a/.claude/skills/self-check/scripts/self-check.sh b/.claude/skills/self-check/scripts/self-check.sh new file mode 100644 index 0000000..6286508 --- /dev/null +++ b/.claude/skills/self-check/scripts/self-check.sh @@ -0,0 +1,746 @@ +#!/usr/bin/env bash +# self-check.sh - ClaudeTools harness self-diagnosis / fleet conformance probe. +# +# V1 is a CENSUS tool. Each machine probes its own harness wiring (tools, +# identity, hooks, skills, commands, scripts, connectivity, capability tier), +# grades what it can against the provisional baseline manifest, and can publish +# the result to the coord API so the fleet can be compared and the baseline +# refined from real data. See ../SKILL.md and ../baseline/README.md. +# +# Usage: +# self-check.sh Run checks, print a human report. (default) +# self-check.sh --json Emit the structured census JSON to stdout only. +# self-check.sh --publish Run checks, then PUT the census to coord (component selfcheck_). +# self-check.sh fanout Broadcast a request to ALL_SESSIONS to run /self-check --publish. +# self-check.sh aggregate Read every machine's published census and print a fleet table +# plus a proposed-baseline (intersection/union) summary. +# +# Portable: bash 3.2+ (macOS), Git Bash (Windows), Linux. Deps: jq, curl. +# Read-only. It collects and reports; it changes nothing on the machine. + +set -u + +# --------------------------------------------------------------------------- +# Bootstrap: resolve repo root, identity, coord API, session id +# --------------------------------------------------------------------------- +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)" +MANIFEST="$SKILL_DIR/baseline/manifest.json" + +if ! command -v jq >/dev/null 2>&1; then + echo "[ERROR] jq is required and not found on PATH. Install jq, then re-run." >&2 + exit 2 +fi +# Some Windows jq builds (winget) emit CRLF line endings; a trailing \r corrupts +# every `for x in $(jq ...)` word and `read`-from-@tsv field. Strip \r from all +# jq output (it is insignificant JSON whitespace and never wanted in raw values). +jq() { command jq "$@" | tr -d '\r'; } +if ! command -v curl >/dev/null 2>&1; then + echo "[ERROR] curl is required and not found on PATH." >&2 + exit 2 +fi +if [ ! -f "$MANIFEST" ]; then + echo "[ERROR] Baseline manifest not found: $MANIFEST" >&2 + exit 2 +fi + +# identity.json: prefer repo copy, then ~/.claude (mirrors check-messages.sh). +IDENTITY="" +for c in "$REPO_ROOT/.claude/identity.json" "$HOME/.claude/identity.json"; do + [ -f "$c" ] && { IDENTITY="$c"; break; } +done + +idfield() { # dotted.path -> value or empty + [ -n "$IDENTITY" ] && jq -r "$1 // empty" "$IDENTITY" 2>/dev/null +} + +HOSTNAME_RAW="$(hostname 2>/dev/null || echo unknown)" +HOST="${HOSTNAME_RAW%.local}" +SESSION="${HOST}/claude-main" + +API="$(idfield '.coord_api')" +[ -z "$API" ] && API="http://172.16.3.30:8001" + +PLATFORM="$(idfield '.platform')" +[ -z "$PLATFORM" ] && case "$(uname -s)" in + Darwin) PLATFORM="macos" ;; Linux) PLATFORM="linux" ;; + CYGWIN*|MINGW*|MSYS*) PLATFORM="windows" ;; *) PLATFORM="unknown" ;; +esac +ARCH="$(idfield '.architecture')" +[ -z "$ARCH" ] && ARCH="$(uname -m 2>/dev/null || echo unknown)" + +# --------------------------------------------------------------------------- +# Results accumulation. Each check appends one compact JSON object. +# status in {PASS, WARN, FAIL, SKIP, INFO}. Grade: any FAIL->RED, WARN->AMBER. +# --------------------------------------------------------------------------- +RESULTS_FILE="$(mktemp 2>/dev/null || echo "${TMPDIR:-/tmp}/selfcheck.$$")" +: > "$RESULTS_FILE" +trap 'rm -f "$RESULTS_FILE" 2>/dev/null' EXIT + +emit() { # id category status detail fix + jq -nc --arg id "$1" --arg cat "$2" --arg st "$3" --arg detail "$4" --arg fix "${5:-}" \ + '{id:$id,category:$cat,status:$st,detail:$detail,fix:$fix}' >> "$RESULTS_FILE" +} + +reachable() { curl -s -o /dev/null -m 4 "$1" 2>/dev/null; } # exit 0 if HTTP responds + +# Content-equal ignoring line endings: a repo LF copy and a ~/.claude CRLF copy +# are the SAME content (the cross-machine case this check polices), so compare +# with \r stripped rather than byte-for-byte (cmp would false-flag them). +same_content() { diff -q <(tr -d '\r' < "$1") <(tr -d '\r' < "$2") >/dev/null 2>&1; } + +# --------------------------------------------------------------------------- +# CHECK: identity +# --------------------------------------------------------------------------- +check_identity() { + if [ -z "$IDENTITY" ]; then + emit identity.present identity FAIL "identity.json not found (.claude or ~/.claude)" \ + "Run onboarding; create .claude/identity.json then 'bash .claude/scripts/migrate-identity.sh'" + return + fi + if ! jq -e . "$IDENTITY" >/dev/null 2>&1; then + emit identity.parse identity FAIL "identity.json is not valid JSON: $IDENTITY" "Fix the JSON syntax" + return + fi + emit identity.present identity PASS "identity.json present and valid: $IDENTITY" + + local missing="" + for f in $(jq -r '.required_identity_fields[]' "$MANIFEST"); do + local v; v="$(jq -r ".$f // empty" "$IDENTITY" 2>/dev/null)" + [ -z "$v" ] && missing="$missing $f" + done + if [ -n "$missing" ]; then + emit identity.fields identity WARN "missing/empty identity fields:$missing" \ + "bash .claude/scripts/migrate-identity.sh (populates machine-specific fields)" + else + emit identity.fields identity PASS "all required identity fields present" + fi + + # --- path fields: identity.json is the map of WHERE things live on this box. + # It is foundational - every later check trusts claudetools_root / vault_path. + # Verify they resolve to real locations and that claudetools_root is in fact + # the repo we are running from (a stale clone path is a silent footgun). + norm() { # path -> lowercase, forward-slash, drive-letter, no trailing slash + local p="$1" + command -v cygpath >/dev/null 2>&1 && p="$(cygpath -m "$p" 2>/dev/null || echo "$p")" + printf '%s' "$p" | tr 'A-Z' 'a-z' | sed 's#\\#/#g; s#/\{1,\}$##' + } + local ctroot; ctroot="$(idfield '.claudetools_root')" + if [ -z "$ctroot" ]; then + emit identity.claudetools_root identity FAIL "identity.claudetools_root not set" \ + "Set claudetools_root in identity.json to this repo's absolute path" + elif [ ! -d "$ctroot" ]; then + emit identity.claudetools_root identity FAIL "claudetools_root does not exist: $ctroot" \ + "Fix claudetools_root in identity.json (machine moved/renamed the repo?)" + elif [ "$(norm "$ctroot")" != "$(norm "$REPO_ROOT")" ]; then + emit identity.claudetools_root identity WARN \ + "claudetools_root ($ctroot) != running repo ($REPO_ROOT)" \ + "Reconcile claudetools_root in identity.json with the repo you actually run from" + else + emit identity.claudetools_root identity PASS "claudetools_root resolves to this repo ($ctroot)" + fi + + local vpath2; vpath2="$(idfield '.vault_path')" + if [ -z "$vpath2" ]; then + emit identity.vault_path identity FAIL "identity.vault_path not set (cannot locate the SOPS vault)" \ + "Set vault_path in identity.json to the cloned vault repo path" + elif [ ! -d "$vpath2" ]; then + emit identity.vault_path identity FAIL "vault_path does not exist: $vpath2" \ + "Clone the vault repo and set vault_path in identity.json" + else + emit identity.vault_path identity PASS "vault_path resolves ($vpath2)" + fi + + # machine field vs actual hostname + local idmach; idmach="$(idfield '.machine')" + if [ -n "$idmach" ] && [ "$(echo "$idmach" | tr 'A-Z' 'a-z')" != "$(echo "$HOST" | tr 'A-Z' 'a-z')" ]; then + emit identity.hostname identity WARN "identity.machine='$idmach' != actual hostname '$HOST'" \ + "Update .machine in identity.json (did you clone onto a new box?)" + else + emit identity.hostname identity PASS "identity.machine matches hostname ($HOST)" + fi + + # git config vs identity + local gn ge idn ide + gn="$(git -C "$REPO_ROOT" config user.name 2>/dev/null)" + ge="$(git -C "$REPO_ROOT" config user.email 2>/dev/null)" + idn="$(idfield '.full_name')"; ide="$(idfield '.email')" + if [ -n "$idn" ] && [ "$gn" != "$idn" ]; then + emit identity.git_name identity WARN "git user.name='$gn' != identity.full_name='$idn'" \ + "git config user.name \"$idn\"" + else + emit identity.git_name identity PASS "git user.name matches identity ($gn)" + fi + if [ -n "$ide" ] && [ "$ge" != "$ide" ]; then + emit identity.git_email identity WARN "git user.email='$ge' != identity.email='$ide'" \ + "git config user.email \"$ide\"" + else + emit identity.git_email identity PASS "git user.email matches identity ($ge)" + fi +} + +# --------------------------------------------------------------------------- +# CHECK: tooling (required + capability-gated) +# --------------------------------------------------------------------------- +toolver() { # best-effort one-line version + "$1" --version 2>/dev/null | head -1 || true +} +check_tools() { + local n why + while IFS=$'\t' read -r n why; do + if command -v "$n" >/dev/null 2>&1; then + emit "tool.$n" tooling PASS "$n present ($(toolver "$n"))" + else + emit "tool.$n" tooling FAIL "$n MISSING (required: $why)" "Install $n and ensure it is on PATH" + fi + done < <(jq -r '.required_tools[] | [.name, .why] | @tsv' "$MANIFEST") + + # python: any_of + local pyok="" pc + for pc in $(jq -r '.required_python.any_of[]' "$MANIFEST"); do + if command -v "$pc" >/dev/null 2>&1; then pyok="$pc"; break; fi + done + if [ -n "$pyok" ]; then + local declared; declared="$(idfield '.python.command')" + if [ -n "$declared" ] && ! command -v "$declared" >/dev/null 2>&1; then + emit tool.python tooling WARN "identity.python.command='$declared' not on PATH; '$pyok' is available" \ + "Update .python.command in identity.json or re-run migrate-identity.sh" + else + emit tool.python tooling PASS "python available ($pyok; identity declares '${declared:-unset}')" + fi + else + emit tool.python tooling FAIL "no python interpreter found (tried py/python3/python)" "Install Python" + fi + + # capability tools - presence only, never FAIL + local cn cap cwhy + while IFS=$'\t' read -r cn cap cwhy; do + if command -v "$cn" >/dev/null 2>&1; then + emit "cap.$cn" capability INFO "$cn present [$cap] ($(toolver "$cn"))" + else + emit "cap.$cn" capability INFO "$cn absent [$cap] - capability off ($cwhy)" + fi + done < <(jq -r '.capability_tools[] | [.name, .capability, .why] | @tsv' "$MANIFEST") +} + +# --------------------------------------------------------------------------- +# CHECK: capability tier (ollama) + effective ruleset +# --------------------------------------------------------------------------- +check_capability_tier() { + local declared fb local_ok="" remote_ok="" tier rule + declared="$(idfield '.ollama.endpoint')" + fb="$(idfield '.ollama.fallback')" + + reachable "http://localhost:11434/api/tags" && local_ok=1 + [ -n "$fb" ] && reachable "${fb%/}/api/tags" && remote_ok=1 + + if [ -n "$local_ok" ]; then + tier="ollama_local" + elif [ -n "$remote_ok" ]; then + tier="ollama_remote" + else + tier="ollama_none" + fi + rule="$(jq -r ".capability_rules.$tier.tier0_engine" "$MANIFEST")" + + # Does the resolved tier agree with what identity declares? + if [ "$tier" = "ollama_none" ]; then + emit captier.ollama capability WARN "Ollama tier = NONE (local + fallback both unreachable). Effective rule: $rule" \ + "Confirm this machine is meant to run without Ollama; ensure Tier-0 work routes to haiku, not blocked" + else + local note="" + if [ "$tier" = "ollama_remote" ] && echo "$declared" | grep -q "localhost"; then + note=" (identity declares localhost but local is down; using fallback $fb)" + fi + emit captier.ollama capability PASS "Ollama tier = ${tier#ollama_}${note}. Effective Tier-0: $rule" + fi +} + +# --------------------------------------------------------------------------- +# CHECK: required scripts + hook files (exist + executable) +# --------------------------------------------------------------------------- +check_files() { + local rel p + for rel in $(jq -r '.required_scripts[], .required_hook_files[]' "$MANIFEST"); do + p="$REPO_ROOT/$rel" + if [ ! -f "$p" ]; then + emit "file.$rel" files FAIL "missing: $rel" "Restore via /sync (git pull from Gitea)" + elif [ ! -x "$p" ] && echo "$rel" | grep -qE '\.(sh|template)$'; then + emit "file.$rel" files WARN "present but not executable: $rel" "chmod +x \"$rel\"" + else + emit "file.$rel" files PASS "present: $rel" + fi + done +} + +# --------------------------------------------------------------------------- +# CHECK: settings.json hooks wired correctly +# --------------------------------------------------------------------------- +check_settings_hooks() { + local settings="$REPO_ROOT/.claude/settings.json" + if [ ! -f "$settings" ] || ! jq -e . "$settings" >/dev/null 2>&1; then + emit hooks.settings hooks FAIL "settings.json missing or invalid JSON" "Restore .claude/settings.json via /sync" + return + fi + local ev needle why found + # NB: omit .matcher from the TSV - an empty middle field collapses under tab + # IFS (tab is IFS-whitespace), shifting columns. We do not use the matcher here. + while IFS=$'\t' read -r ev needle why; do + # any hook command under this event containing the needle + found="$(jq -r --arg ev "$ev" --arg n "$needle" \ + '(.hooks[$ev] // []) | [.[].hooks[]?.command // ""] | map(select(contains($n))) | length' \ + "$settings" 2>/dev/null)" + if [ "${found:-0}" -gt 0 ] 2>/dev/null; then + emit "hook.$ev" hooks PASS "$ev hook wired ($needle)" + else + emit "hook.$ev" hooks FAIL "$ev hook NOT wired (expected command containing '$needle' - $why)" \ + "Add the $ev hook to .claude/settings.json (see baseline manifest required_settings_hooks)" + fi + done < <(jq -r '.required_settings_hooks[] | [.event, .command_contains, .why] | @tsv' "$MANIFEST") + + # current-mode file (created by UserPromptSubmit hook, but flag if absent) + if [ -f "$REPO_ROOT/.claude/current-mode" ]; then + emit hook.current-mode hooks PASS "current-mode present ($(tr -d '[:space:]' < "$REPO_ROOT/.claude/current-mode"))" + else + emit hook.current-mode hooks WARN "current-mode missing (auto-created on next prompt)" "echo general > .claude/current-mode" + fi +} + +# --------------------------------------------------------------------------- +# CHECK: git remote + post-commit hooks +# --------------------------------------------------------------------------- +check_git() { + local url want host_ip + url="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null)" + want="$(jq -r '.git.remote_host_contains' "$MANIFEST")" + host_ip="$(jq -r '.git.remote_host_internal_ip' "$MANIFEST")" + if [ -z "$url" ]; then + emit git.remote git WARN "no 'origin' remote on $REPO_ROOT" "git remote add origin " + elif echo "$url" | grep -qF "$want" || echo "$url" | grep -qF "$host_ip"; then + emit git.remote git PASS "origin -> $url" + else + emit git.remote git FAIL "origin does not point at ACG Gitea: $url" \ + "git remote set-url origin http://@$host_ip:3000/azcomputerguru/claudetools.git" + fi + + if [ "$(jq -r '.git.post_commit_hook_expected' "$MANIFEST")" = "true" ]; then + if [ -f "$REPO_ROOT/.git/hooks/post-commit" ]; then + emit git.post_commit git PASS "main-repo post-commit hook installed" + else + emit git.post_commit git WARN "main-repo post-commit hook NOT installed (HOOKS.md mandates dev-alerts hook)" \ + "cp .claude/hooks/post-commit.template .git/hooks/post-commit && chmod +x .git/hooks/post-commit" + fi + fi +} + +# --------------------------------------------------------------------------- +# CHECK: skills + commands conformance vs manifest +# --------------------------------------------------------------------------- +check_skills_commands() { + local name dir + # skills present + for name in $(jq -r '.skills[]' "$MANIFEST"); do + dir="$REPO_ROOT/.claude/skills/$name" + if [ -d "$dir" ]; then + if [ -f "$dir/SKILL.md" ] || ls "$dir"/*.md >/dev/null 2>&1 || [ -d "$dir/scripts" ]; then + emit "skill.$name" skills PASS "skill present: $name" + else + emit "skill.$name" skills WARN "skill dir present but looks empty: $name" "Restore skill contents via /sync" + fi + else + emit "skill.$name" skills FAIL "skill MISSING: $name" "Restore .claude/skills/$name via /sync" + fi + done + # extra skills not in manifest (drift to report, not fail in V1) + local known + known="|$(jq -r '.skills[]' "$MANIFEST" | tr '\n' '|')" + for dir in "$REPO_ROOT"/.claude/skills/*/; do + [ -d "$dir" ] || continue + name="$(basename "$dir")" + case "$known" in *"|$name|"*) ;; *) emit "skill.extra.$name" skills INFO "skill present but NOT in baseline: $name (census candidate)" ;; esac + done + + # commands present + for name in $(jq -r '.commands[]' "$MANIFEST"); do + if [ -f "$REPO_ROOT/.claude/commands/$name.md" ]; then + emit "cmd.$name" commands PASS "command present: /$name" + else + emit "cmd.$name" commands FAIL "command MISSING: /$name" "Restore .claude/commands/$name.md via /sync" + fi + done + # extra commands + known="|$(jq -r '.commands[]' "$MANIFEST" | tr '\n' '|')" + for f in "$REPO_ROOT"/.claude/commands/*.md; do + [ -f "$f" ] || continue + name="$(basename "$f" .md)" + [ "$name" = "README" ] && continue + case "$known" in *"|$name|"*) ;; *) emit "cmd.extra.$name" commands INFO "command present but NOT in baseline: /$name (census candidate)" ;; esac + done +} + +# --------------------------------------------------------------------------- +# CHECK: vault decrypt readiness +# --------------------------------------------------------------------------- +check_vault() { + local vpath; vpath="$(idfield '.vault_path')" + if [ -z "$vpath" ]; then + emit vault.path vault WARN "identity.vault_path not set" "Set vault_path in identity.json" + return + fi + if [ ! -d "$vpath" ]; then + emit vault.path vault FAIL "vault_path does not exist: $vpath" "Clone the vault repo and set vault_path" + return + fi + emit vault.path vault PASS "vault repo present: $vpath" + if ! command -v sops >/dev/null 2>&1 || ! command -v age >/dev/null 2>&1; then + emit vault.tools vault FAIL "sops/age missing - cannot decrypt vault" "Install sops and age" + return + fi + # Lightweight readiness: vault.sh list should enumerate entries without error. + if [ -x "$REPO_ROOT/.claude/scripts/vault.sh" ]; then + if bash "$REPO_ROOT/.claude/scripts/vault.sh" list >/dev/null 2>&1; then + emit vault.list vault PASS "vault.sh list succeeded (sops/age wired)" + else + emit vault.list vault WARN "vault.sh list failed - check age key + SOPS_AGE_KEY_FILE" \ + "Verify age key at the SOPS recipient path; run: bash .claude/scripts/vault.sh list" + fi + fi +} + +# --------------------------------------------------------------------------- +# CHECK: connectivity +# --------------------------------------------------------------------------- +check_connectivity() { + local name url req + while IFS=$'\t' read -r name url req; do + if reachable "$url"; then + emit "net.$name" connectivity PASS "$name reachable ($url)" + elif [ "$req" = "true" ]; then + emit "net.$name" connectivity FAIL "$name UNREACHABLE ($url)" "Check VPN/Tailscale/network to 172.16.3.x" + else + emit "net.$name" connectivity WARN "$name unreachable ($url) - off-network is OK" "" + fi + done < <(jq -r '.connectivity[] | [.name, .url, (.required|tostring)] | @tsv' "$MANIFEST") +} + +# --------------------------------------------------------------------------- +# CHECK: duplicate command/skill definitions across search roots. +# Claude Code resolves slash commands and skills from BOTH the repo +# (.claude/commands, .claude/skills) and the user profile (~/.claude/...). When +# the same name exists in both with DIFFERENT content, the harness may resolve a +# different one than you expect - the "same /cmd, different behaviour on the Mac" +# bug. Divergent = WARN; identical = INFO (redundant copy that WILL drift). +# --------------------------------------------------------------------------- +check_duplicates() { + local kind repo_dir user_dir + # commands: compare *.md files by content + for kind in commands skills; do + repo_dir="$REPO_ROOT/.claude/$kind" + user_dir="$HOME/.claude/$kind" + [ -d "$repo_dir" ] || continue + [ -d "$user_dir" ] || { emit "dup.$kind" duplicates PASS "no user-level ~/.claude/$kind (single source: repo)"; continue; } + + local name rp up dup_div=0 dup_same=0 + if [ "$kind" = "commands" ]; then + for rp in "$repo_dir"/*.md; do + [ -f "$rp" ] || continue + name="$(basename "$rp" .md)" + [ "$name" = "README" ] && continue + up="$user_dir/$name.md" + [ -f "$up" ] || continue + [ "$rp" -ef "$up" ] && continue # symlink to the same file - cannot drift + if same_content "$rp" "$up"; then + dup_same=$((dup_same+1)) + else + dup_div=$((dup_div+1)) + emit "dup.cmd.$name" duplicates WARN \ + "/$name is DIVERGENT: repo and ~/.claude copies differ (harness may run the wrong one)" \ + "Reconcile: diff \"$rp\" \"$up\" then make ~/.claude/commands/$name.md match the repo (or remove it)" + fi + done + else + for rp in "$repo_dir"/*/; do + [ -d "$rp" ] || continue + name="$(basename "$rp")" + up="$user_dir/$name" + [ -d "$up" ] || continue + [ "$rp" -ef "$up" ] && continue # symlinked dir - cannot drift + # Only compare when BOTH have a SKILL.md; otherwise not comparable + # (script-only / *.md-only skills) - skip rather than miscount. + if [ -f "$rp/SKILL.md" ] && [ -f "$up/SKILL.md" ]; then + if same_content "$rp/SKILL.md" "$up/SKILL.md"; then + dup_same=$((dup_same+1)) + else + dup_div=$((dup_div+1)) + emit "dup.skill.$name" duplicates WARN \ + "skill '$name' is DIVERGENT: repo and ~/.claude SKILL.md differ" \ + "Reconcile ~/.claude/skills/$name with the repo copy (or remove the user-level one)" + fi + fi + done + fi + if [ "$dup_div" -eq 0 ] && [ "$dup_same" -gt 0 ]; then + emit "dup.$kind" duplicates INFO \ + "$dup_same $kind exist in BOTH repo and ~/.claude (identical now, but a redundant copy that can drift)" \ + "Consider a single source of truth for $kind to prevent future divergence" + elif [ "$dup_div" -eq 0 ] && [ "$dup_same" -eq 0 ]; then + emit "dup.$kind" duplicates PASS "no duplicate $kind across roots" + fi + done +} + +# --------------------------------------------------------------------------- +# CHECK: rogue memories that contradict settings/identity. +# Deterministic core only: index integrity + a conservative, manifest-declared +# set of contradiction patterns evaluated against this machine's identity. The +# SEMANTIC contradiction pass (reasoning over all memories vs identity/settings) +# is a judgment task and is delegated to the model in SKILL.md, not grep. +# --------------------------------------------------------------------------- +check_memory() { + local mdir="$REPO_ROOT/.claude/memory" idx="$REPO_ROOT/.claude/memory/MEMORY.md" + if [ ! -d "$mdir" ]; then + emit memory.dir memory WARN "no .claude/memory directory" "Expected the shared memory store; restore via /sync" + return + fi + if [ ! -f "$idx" ]; then + emit memory.index memory WARN "MEMORY.md index missing" "Create .claude/memory/MEMORY.md (the loaded index)" + else + # orphan detection: every *.md (except MEMORY.md) should be referenced in the index + local f base orphans=0 + for f in "$mdir"/*.md; do + [ -f "$f" ] || continue + base="$(basename "$f")" + [ "$base" = "MEMORY.md" ] && continue + if ! grep -qF "$base" "$idx" 2>/dev/null; then + orphans=$((orphans+1)) + fi + done + if [ "$orphans" -gt 0 ]; then + emit memory.orphans memory WARN "$orphans memory file(s) not referenced in MEMORY.md (orphaned)" \ + "Run /memory-dream or add the missing index lines" + else + emit memory.index memory PASS "MEMORY.md index present; no orphaned memory files" + fi + fi + + # Manifest-declared contradiction patterns. Each entry: + # { when_field, when_equals, grep, why } - only evaluated when this + # machine's identity. == when_equals, so a pattern fires only + # where it is actually a contradiction (e.g. prescribing python3 on a `py` box). + # NB: fields are read via @tsv, so when_equals/grep MUST NOT contain tab chars. + local has; has="$(jq -r '(.memory.contradiction_patterns // []) | length' "$MANIFEST" 2>/dev/null)" + if [ "${has:-0}" -gt 0 ] 2>/dev/null; then + local wf we gx why hits + while IFS=$'\t' read -r wf we gx why; do + [ -n "$wf" ] || continue + if [ "$(idfield ".$wf")" = "$we" ]; then + hits="$(grep -rliE "$gx" "$mdir" 2>/dev/null | grep -vF 'MEMORY.md' | head -5 | tr '\n' ' ')" + if [ -n "$hits" ]; then + emit "memory.contradiction.$wf" memory WARN \ + "memory may contradict identity.$wf=$we ($why): $hits" \ + "Review the listed memory file(s); correct or delete if they prescribe the wrong behaviour for this machine" + fi + fi + done < <(jq -r '(.memory.contradiction_patterns // [])[] | [.when_field, .when_equals, .grep, .why] | @tsv' "$MANIFEST") + fi +} + +# --------------------------------------------------------------------------- +# Build the census JSON from accumulated results +# --------------------------------------------------------------------------- +build_census() { + local fails warns grade + fails="$(jq -s '[.[]|select(.status=="FAIL")]|length' "$RESULTS_FILE")" + warns="$(jq -s '[.[]|select(.status=="WARN")]|length' "$RESULTS_FILE")" + if [ "$fails" -gt 0 ]; then grade="RED"; elif [ "$warns" -gt 0 ]; then grade="AMBER"; else grade="GREEN"; fi + + jq -s \ + --arg host "$HOST" --arg session "$SESSION" --arg platform "$PLATFORM" --arg arch "$ARCH" \ + --arg grade "$grade" --arg ts "$RUN_TS" \ + --arg mver "$(jq -r '.schema_version' "$MANIFEST")" \ + '{ + host:$host, session:$session, platform:$platform, arch:$arch, + grade:$grade, generated_at:$ts, manifest_version:$mver, + summary: { pass:([.[]|select(.status=="PASS")]|length), + warn:([.[]|select(.status=="WARN")]|length), + fail:([.[]|select(.status=="FAIL")]|length), + info:([.[]|select(.status=="INFO")]|length) }, + results: . + }' "$RESULTS_FILE" +} + +# --------------------------------------------------------------------------- +# Human report +# --------------------------------------------------------------------------- +print_report() { + local census="$1" grade + grade="$(echo "$census" | jq -r .grade)" + echo "" + echo "============================================================" + echo " ClaudeTools self-check - $HOST ($PLATFORM/$ARCH)" + echo " Grade: $grade $(echo "$census" | jq -r '.summary | "PASS \(.pass) WARN \(.warn) FAIL \(.fail) INFO \(.info)"')" + echo " Manifest: $(echo "$census" | jq -r .manifest_version) (provisional) $RUN_TS" + echo "============================================================" + # FAIL then WARN then INFO; PASS summarized per category + echo "$census" | jq -r ' + def mark(s): if s=="FAIL" then "[FAIL]" elif s=="WARN" then "[WARN]" + elif s=="INFO" then "[INFO]" elif s=="SKIP" then "[SKIP]" else "[ OK ]" end; + (.results | map(select(.status=="FAIL"))) as $f + | (.results | map(select(.status=="WARN"))) as $w + | (.results | map(select(.status=="INFO"))) as $i + | (if ($f|length)>0 then "\nFAILURES:" else empty end), + ($f[] | " [FAIL] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)), + (if ($w|length)>0 then "\nWARNINGS:" else empty end), + ($w[] | " [WARN] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)), + (if ($i|length)>0 then "\nINFO / capability:" else empty end), + ($i[] | " [INFO] \(.detail)") + ' + # per-category PASS counts + echo "" + echo "PASS by category:" + echo "$census" | jq -r '.results | map(select(.status=="PASS")) | group_by(.category)[] | " \(.[0].category): \(length) ok"' + echo "============================================================" +} + +# --------------------------------------------------------------------------- +# Publish census to coord API. +# The coord API uses the PATH-PARAM form: PUT /api/coord/components/{pk}/{comp} +# with a body of {state, version, notes, updated_by} (the body form 405s). +# The component segment must be slash-free (a slash 404s, even URL-encoded), so +# the per-machine component is "selfcheck_" (NOT "selfcheck/"). +# --------------------------------------------------------------------------- +COMPONENT="selfcheck_$HOST" +publish_census() { + local census="$1" grade compact body path + grade="$(echo "$census" | jq -r .grade)" + compact="$(echo "$census" | jq -c .)" + path="/api/coord/components/claudetools/$COMPONENT" + body="$(jq -nc --arg state "$grade" \ + --arg ver "$(jq -r '.schema_version' "$MANIFEST")" --arg notes "$compact" --arg by "$SESSION" \ + '{state:$state, version:$ver, notes:$notes, updated_by:$by}')" + if curl -s -m 8 -X PUT "$API$path" -H "Content-Type: application/json" -d "$body" >/dev/null 2>&1; then + echo "[OK] Published census to coord: component $COMPONENT = $grade" + else + # softfail per coordination protocol - queue the SAME path/body so a + # later /sync drain replays a request that actually works. + local q="$REPO_ROOT/.claude/coord-queue.jsonl" + jq -nc --arg path "$path" --argjson b "$body" --arg ts "$RUN_TS" \ + '{ts:$ts, method:"PUT", path:$path, body:$b}' >> "$q" 2>/dev/null + echo "[WARN] coord unreachable; census queued to .claude/coord-queue.jsonl" + fi +} + +# --------------------------------------------------------------------------- +# Subcommand: fanout - request all instances to run /self-check --publish +# --------------------------------------------------------------------------- +do_fanout() { + local subj body payload + subj="[self-check] Fleet census + self-remediation request" + body="On THIS machine: (1) run /self-check ; (2) apply the suggested fix commands it prints for any FAIL/WARN - fix your OWN machine, locally, with your operator present (nobody fixes you remotely) ; (3) re-run /self-check to confirm GREEN ; (4) run /self-check --publish to report your census (component selfcheck_) to coord. The check is read-only; only --publish writes (your census only). Requested by $SESSION at $RUN_TS." + payload="$(jq -nc --arg from "$SESSION" --arg subj "$subj" --arg body "$body" \ + '{from_session:$from, to_session:"ALL_SESSIONS", project_key:"claudetools", subject:$subj, body:$body}')" + if curl -s -m 8 -X POST "$API/api/coord/messages" -H "Content-Type: application/json" -d "$payload" >/dev/null 2>&1; then + echo "[OK] Broadcast census request to ALL_SESSIONS." + else + echo "[ERROR] Failed to broadcast (coord unreachable)." >&2 + exit 1 + fi +} + +# --------------------------------------------------------------------------- +# Subcommand: aggregate - read all published censuses, build fleet view +# --------------------------------------------------------------------------- +do_aggregate() { + local comps + comps="$(curl -s -m 8 "$API/api/coord/components?project_key=claudetools" 2>/dev/null)" + if [ -z "$comps" ]; then echo "[ERROR] coord unreachable." >&2; exit 1; fi + # The coord API returns {states:[...], total:N}; each row's grade is .state and + # the full census JSON is in .notes. Keep selfcheck_* rows with parseable notes. + # (.components / bare-array kept as defensive fallbacks.) + local censuses + censuses="$(echo "$comps" | jq -c ' + ( .states? // .components? // (if type=="array" then . else [] end) ) as $rows + | ($rows // []) + | map(select((.component? // "") | startswith("selfcheck"))) + | map(.notes | try fromjson catch empty) + ' 2>/dev/null)" + local n; n="$(echo "$censuses" | jq 'length' 2>/dev/null || echo 0)" + if [ "${n:-0}" -eq 0 ]; then + echo "No published censuses found yet. Run 'self-check.sh fanout', then have each machine run /self-check --publish." + return + fi + echo "============================================================" + echo " Fleet census: $n machine(s) reporting" + echo "============================================================" + echo "$censuses" | jq -r '.[] | " \(.grade)\t\(.host)\t\(.platform)/\(.arch)\tP\(.summary.pass) W\(.summary.warn) F\(.summary.fail)\t\(.generated_at)"' | column -t -s$'\t' 2>/dev/null \ + || echo "$censuses" | jq -r '.[] | " \(.grade) \(.host) \(.platform)/\(.arch) P\(.summary.pass) W\(.summary.warn) F\(.summary.fail)"' + + echo "" + echo "Proposed baseline (intersection = required everywhere; symmetric diff = capability-gated):" + # Tools present on every machine vs only some, derived from tool.* PASS results. + echo "$censuses" | jq -r ' + [ .[] | { host:.host, tools:( .results | map(select((.id|startswith("tool."))) | select(.status=="PASS") | (.id|sub("^tool.";""))) ) } ] as $m + | ($m|length) as $count + | ([ $m[].tools[] ] | unique) as $all + | " tools on ALL \($count): " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) == $count ) ] | join(", ") ), + " tools on SOME only: " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) < $count ) ] | join(", ") ) + ' 2>/dev/null + echo "" + echo "Machines that must self-remediate (RED/AMBER) - each fixes ITSELF, then re-runs + re-publishes:" + local needfix + needfix="$(echo "$censuses" | jq -r ' + .[] | select(.grade!="GREEN") + | " \(.host) [\(.grade)] should run, in order:\n" + + ( [ .results[] | select(.status=="FAIL" or .status=="WARN") | select(.fix!="") + | " - \(.fix)" ] | join("\n") ) + + "\n then: /self-check --publish" + ' 2>/dev/null)" + if [ -n "$needfix" ]; then + echo "$needfix" + else + echo " (none - whole fleet is GREEN)" + fi + echo "============================================================" + echo "We do NOT fix remote machines. Relay each machine's fix list to its operator;" + echo "they self-remediate locally, re-run /self-check, and re-publish until GREEN." + echo "Once the fleet is reporting consistently, ratify baseline/manifest.json with Mike." +} + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- +# RUN_TS is passed in by the caller (SKILL.md instructs a real UTC stamp); +# fall back to `date` if available so the script is runnable standalone. +RUN_TS="${SELFCHECK_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)}" + +MODE="${1:-report}" +case "$MODE" in + fanout) do_fanout; exit 0 ;; + aggregate) do_aggregate; exit 0 ;; +esac + +# run all checks +check_identity +check_tools +check_capability_tier +check_files +check_settings_hooks +check_git +check_skills_commands +check_duplicates +check_memory +check_vault +check_connectivity + +CENSUS="$(build_census)" + +case "$MODE" in + --json) echo "$CENSUS" ;; + --publish) print_report "$CENSUS"; publish_census "$CENSUS" ;; + report|*) print_report "$CENSUS" ;; +esac + +# exit code reflects grade for scripting (0 GREEN, 1 AMBER, 2 RED) +GR="$(echo "$CENSUS" | jq -r .grade)" +case "$GR" in GREEN) exit 0 ;; AMBER) exit 1 ;; RED) exit 2 ;; *) exit 0 ;; esac