feat(self-check): add harness self-diagnosis / fleet conformance skill
New /self-check skill: each machine probes its own ClaudeTools harness wiring (identity.json paths, required tooling, settings.json hooks, skill/command/script set, vault decrypt, coord/Gitea connectivity, Ollama capability tier) and grades RED/AMBER/GREEN against a checked-in provisional baseline manifest. - Capability-tier model: architectural/OS/hardware differences (e.g. no local Ollama) select a fallback ruleset instead of failing. - Duplicate detection: flags command/skill names that diverge between the repo and ~/.claude (the "same /cmd, different behaviour" cross-machine bug); CRLF-only diffs ignored. - Memory check: index + orphan detection, plus a model-driven semantic pass for memories that contradict identity/settings. - V1 is a census tool: --publish writes a per-machine census to coord (component selfcheck_<host>); fanout requests the fleet to self-check + self-remediate + re-publish; aggregate derives the proposed baseline. No machine ever fixes another. Reviewed twice by the Code Review Agent; three CRITICAL coord-API bugs and the CRLF false-WARN found and fixed, verified live against the coord API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
53
.claude/commands/self-check.md
Normal file
53
.claude/commands/self-check.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
name: self-check
|
||||
description: Self-diagnose this ClaudeTools machine's harness wiring (tools, identity, hooks, skills, commands, scripts, capability tier, connectivity) against the fleet baseline, and optionally publish the census to coord.
|
||||
---
|
||||
|
||||
# /self-check — ClaudeTools Harness Self-Diagnosis
|
||||
|
||||
Runs the conformance probe in `.claude/skills/self-check/scripts/self-check.sh`,
|
||||
which grades this machine RED/AMBER/GREEN against the provisional baseline manifest
|
||||
and accounts for capability differences (e.g. no local Ollama -> remote/none ruleset).
|
||||
|
||||
See `.claude/skills/self-check/SKILL.md` for the full model. This command is a thin
|
||||
runner — invoke the **self-check** skill for interpretation and follow-up.
|
||||
|
||||
## Run it
|
||||
|
||||
Always pass a real UTC timestamp via `SELFCHECK_TS` (the script falls back to
|
||||
`date` if available, but pass it explicitly for correctness):
|
||||
|
||||
```bash
|
||||
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
||||
bash .claude/skills/self-check/scripts/self-check.sh report
|
||||
```
|
||||
|
||||
Modes:
|
||||
|
||||
| Arg | What it does |
|
||||
|-----|--------------|
|
||||
| `report` (default) | Human-readable RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
|
||||
| `--json` | Structured census JSON only (for piping/aggregation). |
|
||||
| `--publish` | Run + PUT the census to coord as component `selfcheck_<host>`. |
|
||||
| `fanout` | Broadcast a self-check + self-remediate + re-publish request to ALL_SESSIONS. |
|
||||
| `aggregate` | Read every machine's published census; print fleet table + proposed baseline. |
|
||||
|
||||
## Typical flows
|
||||
|
||||
**Just check this box:**
|
||||
```bash
|
||||
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh report
|
||||
```
|
||||
|
||||
**Build the fleet baseline (V1 census):**
|
||||
```bash
|
||||
# 1) From any machine, request the fleet to report:
|
||||
bash .claude/skills/self-check/scripts/self-check.sh fanout
|
||||
# 2) Each instance (incl. this one) publishes:
|
||||
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" bash .claude/skills/self-check/scripts/self-check.sh --publish
|
||||
# 3) Aggregate what came back:
|
||||
bash .claude/skills/self-check/scripts/self-check.sh aggregate
|
||||
```
|
||||
|
||||
After running, summarize the grade and the FAIL/WARN items for the user with the
|
||||
exact fix commands. Do NOT auto-apply fixes — V1 is report-only by design.
|
||||
161
.claude/skills/self-check/SKILL.md
Normal file
161
.claude/skills/self-check/SKILL.md
Normal file
@@ -0,0 +1,161 @@
|
||||
---
|
||||
name: self-check
|
||||
description: >-
|
||||
Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
|
||||
same way as every other instance while allowing for architectural / OS / hardware
|
||||
differences. Checks that identity.json exists and is correct (the map of WHERE
|
||||
things live on this box), required tooling is installed, env/paths resolve,
|
||||
hooks are wired, the skill/command/script set matches the baseline, the vault
|
||||
decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
|
||||
local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
|
||||
can publish a census to the coord API so the fleet baseline can be built/refined.
|
||||
Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
|
||||
"am I configured right", "is my machine set up correctly", "harness conformance",
|
||||
"fleet conformance", "check my environment", "is everything wired up".
|
||||
---
|
||||
|
||||
# Self-Check — ClaudeTools Harness Self-Diagnosis
|
||||
|
||||
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
|
||||
graded against a checked-in **baseline manifest** so every machine behaves the
|
||||
same way — while explicitly allowing for architecture, OS, and hardware
|
||||
differences via a **capability tier** model.
|
||||
|
||||
This is the skill the user asked for when a session needs to "make sure
|
||||
everything is as it should be."
|
||||
|
||||
## The model in one paragraph
|
||||
|
||||
`identity.json` is the foundational, per-machine map of **where things live and
|
||||
what this box can do** (vault path, repo root, platform, arch, python command,
|
||||
Ollama endpoints). The **baseline manifest**
|
||||
(`baseline/manifest.json`) declares what *every* machine must have — required
|
||||
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
|
||||
canonical skill/command set, and the **capability rules** that say what to do when
|
||||
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
|
||||
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
|
||||
the live machine against the manifest, resolves the machine's capability tier, and
|
||||
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
|
||||
Capability differences are **never** failures — they select a ruleset.
|
||||
|
||||
## V1 is a CENSUS tool (read this)
|
||||
|
||||
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
|
||||
generated from a single known-good machine (GURU-5070). So V1's job is to gather
|
||||
ground truth from every machine and help Mike build the real baseline:
|
||||
|
||||
1. **Probe** — each machine runs the check and produces a structured census.
|
||||
2. **Publish** — `--publish` PUTs the census to coord as component
|
||||
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
|
||||
a live fleet conformance view.
|
||||
3. **Fan out** — `fanout` broadcasts a request to `ALL_SESSIONS` so every active
|
||||
instance reports.
|
||||
4. **Aggregate** — `aggregate` reads all censuses back and proposes a baseline
|
||||
(tools/skills/commands present on *all* machines = "required everywhere";
|
||||
present on *some* = "capability-gated"), and lists machines with FAILs.
|
||||
|
||||
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
|
||||
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
|
||||
exact fix command for each finding (per the decision on record).
|
||||
|
||||
## Running it
|
||||
|
||||
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
|
||||
Linux; deps: jq + curl). Always pass a real UTC timestamp:
|
||||
|
||||
```bash
|
||||
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
||||
bash .claude/skills/self-check/scripts/self-check.sh <mode>
|
||||
```
|
||||
|
||||
| Mode | Purpose |
|
||||
|------|---------|
|
||||
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
|
||||
| `--json` | Structured census JSON to stdout (for piping). |
|
||||
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
|
||||
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
|
||||
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
|
||||
|
||||
`/self-check` is the slash-command runner for the same script.
|
||||
|
||||
## What it checks
|
||||
|
||||
| Category | Checks |
|
||||
|----------|--------|
|
||||
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
|
||||
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
|
||||
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
|
||||
| **files** | required scripts + hook files present and executable. |
|
||||
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
|
||||
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
|
||||
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
|
||||
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
|
||||
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
|
||||
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
|
||||
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
|
||||
|
||||
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
|
||||
|
||||
The engine's memory check is deterministic and conservative (index + orphans +
|
||||
declared patterns) so it never produces false alarms. A *true* contradiction
|
||||
check — "does any memory directly contradict what this machine's settings say?"
|
||||
— is a judgment task, so the model does it (route the prose/classification to
|
||||
Ollama Tier-0 per the house rules; Claude reviews the result):
|
||||
|
||||
1. Read `identity.json` (where things live + this box's capabilities),
|
||||
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
|
||||
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
|
||||
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
|
||||
assumptions, tool choices, or model routing.
|
||||
3. Flag memories that **directly contradict** this machine's reality, e.g.:
|
||||
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
|
||||
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
|
||||
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
|
||||
- assumes a capability (local Ollama) this machine's tier says is absent.
|
||||
4. Report each as: memory file, the contradicting claim, the setting it violates,
|
||||
and a suggested correction. **Do not edit memories** — surface for the operator
|
||||
(deletions/rewrites go through the human, mirroring memory-dream's posture).
|
||||
|
||||
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
|
||||
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
|
||||
|
||||
## Fleet self-remediation loop (machines fix themselves)
|
||||
|
||||
We never fix a remote machine. The flow is:
|
||||
|
||||
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
|
||||
2. Each operator runs `/self-check` locally, applies the printed fix commands on
|
||||
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
|
||||
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
|
||||
list. Relay it to that operator; do not run it for them.
|
||||
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
|
||||
|
||||
## How to interpret a run
|
||||
|
||||
After running, summarize for the user:
|
||||
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
|
||||
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
|
||||
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
|
||||
- If publishing/aggregating, note how many machines have reported and which are RED.
|
||||
|
||||
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
|
||||
expected and must never be reported as broken — they are the whole point of the
|
||||
tier model.
|
||||
|
||||
## Files
|
||||
|
||||
```
|
||||
.claude/skills/self-check/
|
||||
SKILL.md this file
|
||||
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
|
||||
baseline/manifest.json the provisional fleet baseline (single source of truth)
|
||||
baseline/README.md the baseline model + how to refine/ratify it
|
||||
.claude/commands/self-check.md the /self-check runner
|
||||
```
|
||||
|
||||
## Extending the baseline
|
||||
|
||||
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
|
||||
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
|
||||
enforces it. Capability-only tools go in `capability_tools` with a matching entry
|
||||
in `capability_rules` describing the fallback. See `baseline/README.md`.
|
||||
68
.claude/skills/self-check/baseline/README.md
Normal file
68
.claude/skills/self-check/baseline/README.md
Normal file
@@ -0,0 +1,68 @@
|
||||
# Self-Check Baseline
|
||||
|
||||
`manifest.json` is the **single source of truth** for how a ClaudeTools machine
|
||||
should be wired. Every machine's `/self-check` grades itself against this file.
|
||||
It is checked into the repo and syncs to all workstations via Gitea, so updating
|
||||
it here updates the standard for the whole fleet.
|
||||
|
||||
## Why a checked-in manifest (the "how do we set a baseline" answer)
|
||||
|
||||
The fleet has architectural differences — Windows/macOS/Linux, amd64/arm64, boxes
|
||||
with and without a local GPU for Ollama. We still want every machine to *behave*
|
||||
the same. A checked-in manifest gives us that:
|
||||
|
||||
- **Required everywhere** lives in `required_*` keys. Missing = a real failure on
|
||||
any machine, regardless of OS.
|
||||
- **Allowed-to-differ** lives in `capability_tools` + `capability_rules`. A capability
|
||||
being absent is never a failure — it selects a *fallback ruleset* so the machine
|
||||
knows how to behave without it (the canonical example: no local Ollama → use the
|
||||
remote endpoint; if that is down too, route Tier-0 work to haiku instead of
|
||||
blocking on it).
|
||||
|
||||
`identity.json` (per-machine, gitignored) carries the machine's actual coordinates
|
||||
and capabilities. The manifest says what's required; identity says where things are
|
||||
and what this box can do; the probe reconciles the two.
|
||||
|
||||
## How the baseline gets built (V1 census → ratified baseline)
|
||||
|
||||
The current `manifest.json` is **provisional** — generated from one known-good
|
||||
machine (GURU-5070, 2026-06-02). It has not been confirmed against the fleet.
|
||||
The build sequence:
|
||||
|
||||
1. `self-check.sh fanout` — ask every active instance to report.
|
||||
2. Each machine runs `self-check.sh --publish` — publishes its census to coord as
|
||||
component `selfcheck_<host>`.
|
||||
3. `self-check.sh aggregate` — reads them all back and proposes:
|
||||
- items present on **all** reporting machines → **required everywhere** candidates,
|
||||
- items present on **some** machines → **capability-gated** candidates.
|
||||
4. Mike reviews the proposal and edits this `manifest.json` to ratify it, then
|
||||
flips `"status": "provisional"` to `"ratified"` and commits.
|
||||
|
||||
Until ratified, treat "extra"/"missing" skill and command findings as **candidates**,
|
||||
not gospel.
|
||||
|
||||
## Manifest structure
|
||||
|
||||
| Key | Meaning |
|
||||
|-----|---------|
|
||||
| `schema_version`, `status` | manifest version; `provisional` or `ratified`. |
|
||||
| `required_tools[]` | tools every machine must have on PATH (`name`, `why`). Missing = FAIL. |
|
||||
| `required_python.any_of[]` | acceptable python launchers (`py`/`python3`/`python`). |
|
||||
| `capability_tools[]` | optional tools; presence is INFO and toggles a `capability`. |
|
||||
| `required_identity_fields[]` | dotted identity.json paths that must be set. |
|
||||
| `required_scripts[]`, `required_hook_files[]` | repo files that must exist (+ executable). |
|
||||
| `required_settings_hooks[]` | hooks that must be wired in settings.json (`event`, `command_contains`, `why`). |
|
||||
| `git` | expected remote host, internal IP, post-commit-hook expectation. |
|
||||
| `skills[]`, `commands[]` | the canonical skill dirs / command files. |
|
||||
| `connectivity[]` | endpoints to probe (`required` ones FAIL if unreachable). |
|
||||
| `capability_rules{}` | per-capability fallback behavior (the "different rules" for different hardware). |
|
||||
|
||||
## Editing rules
|
||||
|
||||
- Add a **required** item only when it should hold on *every* machine, every OS.
|
||||
If a Windows box legitimately can't have it, it's a capability, not a requirement.
|
||||
- Every `capability_tools` entry needs a story for what to do without it — encode
|
||||
it in `capability_rules` (or reference an existing tier) so the probe can print
|
||||
the effective ruleset.
|
||||
- Keep paths repo-relative and forward-slashed.
|
||||
- After editing: commit + `/sync`. The fleet picks it up on next pull.
|
||||
114
.claude/skills/self-check/baseline/manifest.json
Normal file
114
.claude/skills/self-check/baseline/manifest.json
Normal file
@@ -0,0 +1,114 @@
|
||||
{
|
||||
"schema_version": "1.0.0",
|
||||
"status": "provisional",
|
||||
"derived_from": "GURU-5070",
|
||||
"derived_at": "2026-06-02",
|
||||
"note": "PROVISIONAL baseline, generated from a single known-good machine. V1 of self-check is a CENSUS tool: every machine probes itself, publishes to the coord API, and we refine this manifest from real fleet data (see baseline/README.md). Do NOT treat 'extra' or 'missing' items as authoritative until the fleet census has confirmed them across machines.",
|
||||
|
||||
"required_tools": [
|
||||
{ "name": "bash", "why": "hooks, scripts, sync, vault wrapper" },
|
||||
{ "name": "git", "why": "repo + submodules + Gitea sync" },
|
||||
{ "name": "jq", "why": "every hook and coord script parses JSON with jq" },
|
||||
{ "name": "curl", "why": "coord API, vault, RMM, all HTTP calls" },
|
||||
{ "name": "sops", "why": "vault decryption (SOPS)" },
|
||||
{ "name": "age", "why": "SOPS age recipient/decrypt" },
|
||||
{ "name": "ssh", "why": "infra access; must be system OpenSSH" }
|
||||
],
|
||||
|
||||
"required_python": {
|
||||
"any_of": ["py", "python3", "python"],
|
||||
"why": "JSON sanitizer in check-messages.sh, identity migration, skill scripts. The resolved command is recorded in identity.json (.python.command)."
|
||||
},
|
||||
|
||||
"capability_tools": [
|
||||
{ "name": "ollama", "capability": "ollama_local", "why": "Tier-0 local inference (prose/classification)" },
|
||||
{ "name": "cargo", "capability": "rust_build", "why": "GuruRMM / GuruConnect Rust builds" },
|
||||
{ "name": "node", "capability": "node_build", "why": "dashboard / TS builds" },
|
||||
{ "name": "gh", "capability": "github_cli", "why": "optional GitHub operations" },
|
||||
{ "name": "docker", "capability": "containers", "why": "optional container workflows" },
|
||||
{ "name": "op", "capability": "onepassword_cli","why": "1Password fallback credential access" }
|
||||
],
|
||||
|
||||
"required_identity_fields": [
|
||||
"user", "full_name", "email", "role", "machine",
|
||||
"vault_path", "claudetools_root", "platform", "architecture",
|
||||
"python.command", "ollama.endpoint", "ollama.fallback", "ollama.prose_model"
|
||||
],
|
||||
"optional_identity_fields": ["coord_api", "last_updated"],
|
||||
|
||||
"required_scripts": [
|
||||
".claude/scripts/vault.sh",
|
||||
".claude/scripts/sync.sh",
|
||||
".claude/scripts/sync-memory.sh",
|
||||
".claude/scripts/check-messages.sh",
|
||||
".claude/scripts/migrate-identity.sh"
|
||||
],
|
||||
|
||||
"required_hook_files": [
|
||||
".claude/hooks/block-backslash-winpath.sh",
|
||||
".claude/hooks/post-commit.template"
|
||||
],
|
||||
|
||||
"required_settings_hooks": [
|
||||
{ "event": "PreToolUse", "matcher": "Bash", "command_contains": "block-backslash-winpath.sh", "why": "blocks garbled backslash Windows-path redirects in Git Bash" },
|
||||
{ "event": "UserPromptSubmit", "matcher": "", "command_contains": "check-messages.sh", "why": "injects unread coord messages + dev-mode locks each prompt" },
|
||||
{ "event": "SessionStart", "matcher": "", "command_contains": "sync-memory.sh", "why": "pulls shared memory at session start" }
|
||||
],
|
||||
|
||||
"git": {
|
||||
"remote_host_contains": "git.azcomputerguru.com",
|
||||
"remote_host_internal_ip": "172.16.3.20",
|
||||
"remote_note": "On-network machines should use the internal Gitea IP (172.16.3.20:3000) to bypass NPM SSL-renewal blips; off-network may use the domain. Either is acceptable; a non-ACG remote is a FAIL.",
|
||||
"post_commit_hook_expected": true,
|
||||
"post_commit_hook_note": "HOOKS.md mandates the dev-alerts post-commit hook in the main repo and each initialized submodule. Missing = AMBER (informational; reinstall from .claude/hooks/post-commit.template)."
|
||||
},
|
||||
|
||||
"skills": [
|
||||
"1password", "b2", "bitdefender", "frontend-design", "gc-audit",
|
||||
"impeccable", "memory-dream", "remediation-tool", "rmm-audit",
|
||||
"skill-creator", "stop-slop", "theme-factory", "self-check"
|
||||
],
|
||||
|
||||
"commands": [
|
||||
"1password", "autotask", "checkpoint", "context", "create-spec",
|
||||
"feature-request", "forum-post", "gc-feature-request", "import",
|
||||
"inject-standards", "mailbox", "mode", "recover", "remediation-tool",
|
||||
"rmm", "save", "scc", "shape-spec", "sync", "syncro-emergency-billing",
|
||||
"syncro", "wiki-compile", "wiki-lint", "self-check"
|
||||
],
|
||||
|
||||
"connectivity": [
|
||||
{ "name": "coord_api", "url": "http://172.16.3.30:8001/api/coord/status", "required": true, "why": "live coordination source of truth" },
|
||||
{ "name": "claudetools_api","url": "http://172.16.3.30:8001/health", "required": false, "why": "main API health" },
|
||||
{ "name": "gitea_internal", "url": "http://172.16.3.20:3000", "required": false, "why": "internal Gitea (git/API on-network)" }
|
||||
],
|
||||
|
||||
"memory": {
|
||||
"note": "Deterministic memory checks: MEMORY.md index exists + no orphaned memory files, plus the contradiction_patterns below. A pattern fires ONLY on machines where identity.<when_field> == when_equals, so it flags a memory only where it is actually a contradiction for THIS box. Kept empty in V1 to avoid false positives; the real semantic contradiction analysis (memories vs identity.json + settings.json + this manifest) is done by the model per SKILL.md, optionally via Ollama Tier-0.",
|
||||
"pattern_schema": {
|
||||
"when_field": "dotted identity.json path, e.g. python.command",
|
||||
"when_equals": "value that makes the grep a contradiction, e.g. python3",
|
||||
"grep": "ERE matched case-insensitively against memory files",
|
||||
"why": "human explanation shown in the finding"
|
||||
},
|
||||
"contradiction_patterns": []
|
||||
},
|
||||
|
||||
"capability_rules": {
|
||||
"ollama_local": {
|
||||
"tier0_engine": "local ollama (localhost:11434) for summarize/classify/extract/draft",
|
||||
"detect": "curl localhost:11434/api/tags reachable",
|
||||
"fallback_if_unavailable": "ollama_remote"
|
||||
},
|
||||
"ollama_remote": {
|
||||
"tier0_engine": "remote ollama via identity.ollama.fallback (Beast over Tailscale)",
|
||||
"detect": "localhost unreachable but identity.ollama.fallback reachable",
|
||||
"fallback_if_unavailable": "ollama_none"
|
||||
},
|
||||
"ollama_none": {
|
||||
"tier0_engine": "NONE - no Tier-0 path. Route low-stakes prose/classification to Tier-1 (haiku) instead of Ollama. Do NOT block work waiting on Ollama.",
|
||||
"detect": "neither localhost nor fallback reachable",
|
||||
"fallback_if_unavailable": null
|
||||
}
|
||||
}
|
||||
}
|
||||
746
.claude/skills/self-check/scripts/self-check.sh
Normal file
746
.claude/skills/self-check/scripts/self-check.sh
Normal file
@@ -0,0 +1,746 @@
|
||||
#!/usr/bin/env bash
|
||||
# self-check.sh - ClaudeTools harness self-diagnosis / fleet conformance probe.
|
||||
#
|
||||
# V1 is a CENSUS tool. Each machine probes its own harness wiring (tools,
|
||||
# identity, hooks, skills, commands, scripts, connectivity, capability tier),
|
||||
# grades what it can against the provisional baseline manifest, and can publish
|
||||
# the result to the coord API so the fleet can be compared and the baseline
|
||||
# refined from real data. See ../SKILL.md and ../baseline/README.md.
|
||||
#
|
||||
# Usage:
|
||||
# self-check.sh Run checks, print a human report. (default)
|
||||
# self-check.sh --json Emit the structured census JSON to stdout only.
|
||||
# self-check.sh --publish Run checks, then PUT the census to coord (component selfcheck_<host>).
|
||||
# self-check.sh fanout Broadcast a request to ALL_SESSIONS to run /self-check --publish.
|
||||
# self-check.sh aggregate Read every machine's published census and print a fleet table
|
||||
# plus a proposed-baseline (intersection/union) summary.
|
||||
#
|
||||
# Portable: bash 3.2+ (macOS), Git Bash (Windows), Linux. Deps: jq, curl.
|
||||
# Read-only. It collects and reports; it changes nothing on the machine.
|
||||
|
||||
set -u
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Bootstrap: resolve repo root, identity, coord API, session id
|
||||
# ---------------------------------------------------------------------------
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
|
||||
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
|
||||
MANIFEST="$SKILL_DIR/baseline/manifest.json"
|
||||
|
||||
if ! command -v jq >/dev/null 2>&1; then
|
||||
echo "[ERROR] jq is required and not found on PATH. Install jq, then re-run." >&2
|
||||
exit 2
|
||||
fi
|
||||
# Some Windows jq builds (winget) emit CRLF line endings; a trailing \r corrupts
|
||||
# every `for x in $(jq ...)` word and `read`-from-@tsv field. Strip \r from all
|
||||
# jq output (it is insignificant JSON whitespace and never wanted in raw values).
|
||||
jq() { command jq "$@" | tr -d '\r'; }
|
||||
if ! command -v curl >/dev/null 2>&1; then
|
||||
echo "[ERROR] curl is required and not found on PATH." >&2
|
||||
exit 2
|
||||
fi
|
||||
if [ ! -f "$MANIFEST" ]; then
|
||||
echo "[ERROR] Baseline manifest not found: $MANIFEST" >&2
|
||||
exit 2
|
||||
fi
|
||||
|
||||
# identity.json: prefer repo copy, then ~/.claude (mirrors check-messages.sh).
|
||||
IDENTITY=""
|
||||
for c in "$REPO_ROOT/.claude/identity.json" "$HOME/.claude/identity.json"; do
|
||||
[ -f "$c" ] && { IDENTITY="$c"; break; }
|
||||
done
|
||||
|
||||
idfield() { # dotted.path -> value or empty
|
||||
[ -n "$IDENTITY" ] && jq -r "$1 // empty" "$IDENTITY" 2>/dev/null
|
||||
}
|
||||
|
||||
HOSTNAME_RAW="$(hostname 2>/dev/null || echo unknown)"
|
||||
HOST="${HOSTNAME_RAW%.local}"
|
||||
SESSION="${HOST}/claude-main"
|
||||
|
||||
API="$(idfield '.coord_api')"
|
||||
[ -z "$API" ] && API="http://172.16.3.30:8001"
|
||||
|
||||
PLATFORM="$(idfield '.platform')"
|
||||
[ -z "$PLATFORM" ] && case "$(uname -s)" in
|
||||
Darwin) PLATFORM="macos" ;; Linux) PLATFORM="linux" ;;
|
||||
CYGWIN*|MINGW*|MSYS*) PLATFORM="windows" ;; *) PLATFORM="unknown" ;;
|
||||
esac
|
||||
ARCH="$(idfield '.architecture')"
|
||||
[ -z "$ARCH" ] && ARCH="$(uname -m 2>/dev/null || echo unknown)"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Results accumulation. Each check appends one compact JSON object.
|
||||
# status in {PASS, WARN, FAIL, SKIP, INFO}. Grade: any FAIL->RED, WARN->AMBER.
|
||||
# ---------------------------------------------------------------------------
|
||||
RESULTS_FILE="$(mktemp 2>/dev/null || echo "${TMPDIR:-/tmp}/selfcheck.$$")"
|
||||
: > "$RESULTS_FILE"
|
||||
trap 'rm -f "$RESULTS_FILE" 2>/dev/null' EXIT
|
||||
|
||||
emit() { # id category status detail fix
|
||||
jq -nc --arg id "$1" --arg cat "$2" --arg st "$3" --arg detail "$4" --arg fix "${5:-}" \
|
||||
'{id:$id,category:$cat,status:$st,detail:$detail,fix:$fix}' >> "$RESULTS_FILE"
|
||||
}
|
||||
|
||||
reachable() { curl -s -o /dev/null -m 4 "$1" 2>/dev/null; } # exit 0 if HTTP responds
|
||||
|
||||
# Content-equal ignoring line endings: a repo LF copy and a ~/.claude CRLF copy
|
||||
# are the SAME content (the cross-machine case this check polices), so compare
|
||||
# with \r stripped rather than byte-for-byte (cmp would false-flag them).
|
||||
same_content() { diff -q <(tr -d '\r' < "$1") <(tr -d '\r' < "$2") >/dev/null 2>&1; }
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: identity
|
||||
# ---------------------------------------------------------------------------
|
||||
check_identity() {
|
||||
if [ -z "$IDENTITY" ]; then
|
||||
emit identity.present identity FAIL "identity.json not found (.claude or ~/.claude)" \
|
||||
"Run onboarding; create .claude/identity.json then 'bash .claude/scripts/migrate-identity.sh'"
|
||||
return
|
||||
fi
|
||||
if ! jq -e . "$IDENTITY" >/dev/null 2>&1; then
|
||||
emit identity.parse identity FAIL "identity.json is not valid JSON: $IDENTITY" "Fix the JSON syntax"
|
||||
return
|
||||
fi
|
||||
emit identity.present identity PASS "identity.json present and valid: $IDENTITY"
|
||||
|
||||
local missing=""
|
||||
for f in $(jq -r '.required_identity_fields[]' "$MANIFEST"); do
|
||||
local v; v="$(jq -r ".$f // empty" "$IDENTITY" 2>/dev/null)"
|
||||
[ -z "$v" ] && missing="$missing $f"
|
||||
done
|
||||
if [ -n "$missing" ]; then
|
||||
emit identity.fields identity WARN "missing/empty identity fields:$missing" \
|
||||
"bash .claude/scripts/migrate-identity.sh (populates machine-specific fields)"
|
||||
else
|
||||
emit identity.fields identity PASS "all required identity fields present"
|
||||
fi
|
||||
|
||||
# --- path fields: identity.json is the map of WHERE things live on this box.
|
||||
# It is foundational - every later check trusts claudetools_root / vault_path.
|
||||
# Verify they resolve to real locations and that claudetools_root is in fact
|
||||
# the repo we are running from (a stale clone path is a silent footgun).
|
||||
norm() { # path -> lowercase, forward-slash, drive-letter, no trailing slash
|
||||
local p="$1"
|
||||
command -v cygpath >/dev/null 2>&1 && p="$(cygpath -m "$p" 2>/dev/null || echo "$p")"
|
||||
printf '%s' "$p" | tr 'A-Z' 'a-z' | sed 's#\\#/#g; s#/\{1,\}$##'
|
||||
}
|
||||
local ctroot; ctroot="$(idfield '.claudetools_root')"
|
||||
if [ -z "$ctroot" ]; then
|
||||
emit identity.claudetools_root identity FAIL "identity.claudetools_root not set" \
|
||||
"Set claudetools_root in identity.json to this repo's absolute path"
|
||||
elif [ ! -d "$ctroot" ]; then
|
||||
emit identity.claudetools_root identity FAIL "claudetools_root does not exist: $ctroot" \
|
||||
"Fix claudetools_root in identity.json (machine moved/renamed the repo?)"
|
||||
elif [ "$(norm "$ctroot")" != "$(norm "$REPO_ROOT")" ]; then
|
||||
emit identity.claudetools_root identity WARN \
|
||||
"claudetools_root ($ctroot) != running repo ($REPO_ROOT)" \
|
||||
"Reconcile claudetools_root in identity.json with the repo you actually run from"
|
||||
else
|
||||
emit identity.claudetools_root identity PASS "claudetools_root resolves to this repo ($ctroot)"
|
||||
fi
|
||||
|
||||
local vpath2; vpath2="$(idfield '.vault_path')"
|
||||
if [ -z "$vpath2" ]; then
|
||||
emit identity.vault_path identity FAIL "identity.vault_path not set (cannot locate the SOPS vault)" \
|
||||
"Set vault_path in identity.json to the cloned vault repo path"
|
||||
elif [ ! -d "$vpath2" ]; then
|
||||
emit identity.vault_path identity FAIL "vault_path does not exist: $vpath2" \
|
||||
"Clone the vault repo and set vault_path in identity.json"
|
||||
else
|
||||
emit identity.vault_path identity PASS "vault_path resolves ($vpath2)"
|
||||
fi
|
||||
|
||||
# machine field vs actual hostname
|
||||
local idmach; idmach="$(idfield '.machine')"
|
||||
if [ -n "$idmach" ] && [ "$(echo "$idmach" | tr 'A-Z' 'a-z')" != "$(echo "$HOST" | tr 'A-Z' 'a-z')" ]; then
|
||||
emit identity.hostname identity WARN "identity.machine='$idmach' != actual hostname '$HOST'" \
|
||||
"Update .machine in identity.json (did you clone onto a new box?)"
|
||||
else
|
||||
emit identity.hostname identity PASS "identity.machine matches hostname ($HOST)"
|
||||
fi
|
||||
|
||||
# git config vs identity
|
||||
local gn ge idn ide
|
||||
gn="$(git -C "$REPO_ROOT" config user.name 2>/dev/null)"
|
||||
ge="$(git -C "$REPO_ROOT" config user.email 2>/dev/null)"
|
||||
idn="$(idfield '.full_name')"; ide="$(idfield '.email')"
|
||||
if [ -n "$idn" ] && [ "$gn" != "$idn" ]; then
|
||||
emit identity.git_name identity WARN "git user.name='$gn' != identity.full_name='$idn'" \
|
||||
"git config user.name \"$idn\""
|
||||
else
|
||||
emit identity.git_name identity PASS "git user.name matches identity ($gn)"
|
||||
fi
|
||||
if [ -n "$ide" ] && [ "$ge" != "$ide" ]; then
|
||||
emit identity.git_email identity WARN "git user.email='$ge' != identity.email='$ide'" \
|
||||
"git config user.email \"$ide\""
|
||||
else
|
||||
emit identity.git_email identity PASS "git user.email matches identity ($ge)"
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: tooling (required + capability-gated)
|
||||
# ---------------------------------------------------------------------------
|
||||
toolver() { # best-effort one-line version
|
||||
"$1" --version 2>/dev/null | head -1 || true
|
||||
}
|
||||
check_tools() {
|
||||
local n why
|
||||
while IFS=$'\t' read -r n why; do
|
||||
if command -v "$n" >/dev/null 2>&1; then
|
||||
emit "tool.$n" tooling PASS "$n present ($(toolver "$n"))"
|
||||
else
|
||||
emit "tool.$n" tooling FAIL "$n MISSING (required: $why)" "Install $n and ensure it is on PATH"
|
||||
fi
|
||||
done < <(jq -r '.required_tools[] | [.name, .why] | @tsv' "$MANIFEST")
|
||||
|
||||
# python: any_of
|
||||
local pyok="" pc
|
||||
for pc in $(jq -r '.required_python.any_of[]' "$MANIFEST"); do
|
||||
if command -v "$pc" >/dev/null 2>&1; then pyok="$pc"; break; fi
|
||||
done
|
||||
if [ -n "$pyok" ]; then
|
||||
local declared; declared="$(idfield '.python.command')"
|
||||
if [ -n "$declared" ] && ! command -v "$declared" >/dev/null 2>&1; then
|
||||
emit tool.python tooling WARN "identity.python.command='$declared' not on PATH; '$pyok' is available" \
|
||||
"Update .python.command in identity.json or re-run migrate-identity.sh"
|
||||
else
|
||||
emit tool.python tooling PASS "python available ($pyok; identity declares '${declared:-unset}')"
|
||||
fi
|
||||
else
|
||||
emit tool.python tooling FAIL "no python interpreter found (tried py/python3/python)" "Install Python"
|
||||
fi
|
||||
|
||||
# capability tools - presence only, never FAIL
|
||||
local cn cap cwhy
|
||||
while IFS=$'\t' read -r cn cap cwhy; do
|
||||
if command -v "$cn" >/dev/null 2>&1; then
|
||||
emit "cap.$cn" capability INFO "$cn present [$cap] ($(toolver "$cn"))"
|
||||
else
|
||||
emit "cap.$cn" capability INFO "$cn absent [$cap] - capability off ($cwhy)"
|
||||
fi
|
||||
done < <(jq -r '.capability_tools[] | [.name, .capability, .why] | @tsv' "$MANIFEST")
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: capability tier (ollama) + effective ruleset
|
||||
# ---------------------------------------------------------------------------
|
||||
check_capability_tier() {
|
||||
local declared fb local_ok="" remote_ok="" tier rule
|
||||
declared="$(idfield '.ollama.endpoint')"
|
||||
fb="$(idfield '.ollama.fallback')"
|
||||
|
||||
reachable "http://localhost:11434/api/tags" && local_ok=1
|
||||
[ -n "$fb" ] && reachable "${fb%/}/api/tags" && remote_ok=1
|
||||
|
||||
if [ -n "$local_ok" ]; then
|
||||
tier="ollama_local"
|
||||
elif [ -n "$remote_ok" ]; then
|
||||
tier="ollama_remote"
|
||||
else
|
||||
tier="ollama_none"
|
||||
fi
|
||||
rule="$(jq -r ".capability_rules.$tier.tier0_engine" "$MANIFEST")"
|
||||
|
||||
# Does the resolved tier agree with what identity declares?
|
||||
if [ "$tier" = "ollama_none" ]; then
|
||||
emit captier.ollama capability WARN "Ollama tier = NONE (local + fallback both unreachable). Effective rule: $rule" \
|
||||
"Confirm this machine is meant to run without Ollama; ensure Tier-0 work routes to haiku, not blocked"
|
||||
else
|
||||
local note=""
|
||||
if [ "$tier" = "ollama_remote" ] && echo "$declared" | grep -q "localhost"; then
|
||||
note=" (identity declares localhost but local is down; using fallback $fb)"
|
||||
fi
|
||||
emit captier.ollama capability PASS "Ollama tier = ${tier#ollama_}${note}. Effective Tier-0: $rule"
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: required scripts + hook files (exist + executable)
|
||||
# ---------------------------------------------------------------------------
|
||||
check_files() {
|
||||
local rel p
|
||||
for rel in $(jq -r '.required_scripts[], .required_hook_files[]' "$MANIFEST"); do
|
||||
p="$REPO_ROOT/$rel"
|
||||
if [ ! -f "$p" ]; then
|
||||
emit "file.$rel" files FAIL "missing: $rel" "Restore via /sync (git pull from Gitea)"
|
||||
elif [ ! -x "$p" ] && echo "$rel" | grep -qE '\.(sh|template)$'; then
|
||||
emit "file.$rel" files WARN "present but not executable: $rel" "chmod +x \"$rel\""
|
||||
else
|
||||
emit "file.$rel" files PASS "present: $rel"
|
||||
fi
|
||||
done
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: settings.json hooks wired correctly
|
||||
# ---------------------------------------------------------------------------
|
||||
check_settings_hooks() {
|
||||
local settings="$REPO_ROOT/.claude/settings.json"
|
||||
if [ ! -f "$settings" ] || ! jq -e . "$settings" >/dev/null 2>&1; then
|
||||
emit hooks.settings hooks FAIL "settings.json missing or invalid JSON" "Restore .claude/settings.json via /sync"
|
||||
return
|
||||
fi
|
||||
local ev needle why found
|
||||
# NB: omit .matcher from the TSV - an empty middle field collapses under tab
|
||||
# IFS (tab is IFS-whitespace), shifting columns. We do not use the matcher here.
|
||||
while IFS=$'\t' read -r ev needle why; do
|
||||
# any hook command under this event containing the needle
|
||||
found="$(jq -r --arg ev "$ev" --arg n "$needle" \
|
||||
'(.hooks[$ev] // []) | [.[].hooks[]?.command // ""] | map(select(contains($n))) | length' \
|
||||
"$settings" 2>/dev/null)"
|
||||
if [ "${found:-0}" -gt 0 ] 2>/dev/null; then
|
||||
emit "hook.$ev" hooks PASS "$ev hook wired ($needle)"
|
||||
else
|
||||
emit "hook.$ev" hooks FAIL "$ev hook NOT wired (expected command containing '$needle' - $why)" \
|
||||
"Add the $ev hook to .claude/settings.json (see baseline manifest required_settings_hooks)"
|
||||
fi
|
||||
done < <(jq -r '.required_settings_hooks[] | [.event, .command_contains, .why] | @tsv' "$MANIFEST")
|
||||
|
||||
# current-mode file (created by UserPromptSubmit hook, but flag if absent)
|
||||
if [ -f "$REPO_ROOT/.claude/current-mode" ]; then
|
||||
emit hook.current-mode hooks PASS "current-mode present ($(tr -d '[:space:]' < "$REPO_ROOT/.claude/current-mode"))"
|
||||
else
|
||||
emit hook.current-mode hooks WARN "current-mode missing (auto-created on next prompt)" "echo general > .claude/current-mode"
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: git remote + post-commit hooks
|
||||
# ---------------------------------------------------------------------------
|
||||
check_git() {
|
||||
local url want host_ip
|
||||
url="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null)"
|
||||
want="$(jq -r '.git.remote_host_contains' "$MANIFEST")"
|
||||
host_ip="$(jq -r '.git.remote_host_internal_ip' "$MANIFEST")"
|
||||
if [ -z "$url" ]; then
|
||||
emit git.remote git WARN "no 'origin' remote on $REPO_ROOT" "git remote add origin <gitea-url>"
|
||||
elif echo "$url" | grep -qF "$want" || echo "$url" | grep -qF "$host_ip"; then
|
||||
emit git.remote git PASS "origin -> $url"
|
||||
else
|
||||
emit git.remote git FAIL "origin does not point at ACG Gitea: $url" \
|
||||
"git remote set-url origin http://<user>@$host_ip:3000/azcomputerguru/claudetools.git"
|
||||
fi
|
||||
|
||||
if [ "$(jq -r '.git.post_commit_hook_expected' "$MANIFEST")" = "true" ]; then
|
||||
if [ -f "$REPO_ROOT/.git/hooks/post-commit" ]; then
|
||||
emit git.post_commit git PASS "main-repo post-commit hook installed"
|
||||
else
|
||||
emit git.post_commit git WARN "main-repo post-commit hook NOT installed (HOOKS.md mandates dev-alerts hook)" \
|
||||
"cp .claude/hooks/post-commit.template .git/hooks/post-commit && chmod +x .git/hooks/post-commit"
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: skills + commands conformance vs manifest
|
||||
# ---------------------------------------------------------------------------
|
||||
check_skills_commands() {
|
||||
local name dir
|
||||
# skills present
|
||||
for name in $(jq -r '.skills[]' "$MANIFEST"); do
|
||||
dir="$REPO_ROOT/.claude/skills/$name"
|
||||
if [ -d "$dir" ]; then
|
||||
if [ -f "$dir/SKILL.md" ] || ls "$dir"/*.md >/dev/null 2>&1 || [ -d "$dir/scripts" ]; then
|
||||
emit "skill.$name" skills PASS "skill present: $name"
|
||||
else
|
||||
emit "skill.$name" skills WARN "skill dir present but looks empty: $name" "Restore skill contents via /sync"
|
||||
fi
|
||||
else
|
||||
emit "skill.$name" skills FAIL "skill MISSING: $name" "Restore .claude/skills/$name via /sync"
|
||||
fi
|
||||
done
|
||||
# extra skills not in manifest (drift to report, not fail in V1)
|
||||
local known
|
||||
known="|$(jq -r '.skills[]' "$MANIFEST" | tr '\n' '|')"
|
||||
for dir in "$REPO_ROOT"/.claude/skills/*/; do
|
||||
[ -d "$dir" ] || continue
|
||||
name="$(basename "$dir")"
|
||||
case "$known" in *"|$name|"*) ;; *) emit "skill.extra.$name" skills INFO "skill present but NOT in baseline: $name (census candidate)" ;; esac
|
||||
done
|
||||
|
||||
# commands present
|
||||
for name in $(jq -r '.commands[]' "$MANIFEST"); do
|
||||
if [ -f "$REPO_ROOT/.claude/commands/$name.md" ]; then
|
||||
emit "cmd.$name" commands PASS "command present: /$name"
|
||||
else
|
||||
emit "cmd.$name" commands FAIL "command MISSING: /$name" "Restore .claude/commands/$name.md via /sync"
|
||||
fi
|
||||
done
|
||||
# extra commands
|
||||
known="|$(jq -r '.commands[]' "$MANIFEST" | tr '\n' '|')"
|
||||
for f in "$REPO_ROOT"/.claude/commands/*.md; do
|
||||
[ -f "$f" ] || continue
|
||||
name="$(basename "$f" .md)"
|
||||
[ "$name" = "README" ] && continue
|
||||
case "$known" in *"|$name|"*) ;; *) emit "cmd.extra.$name" commands INFO "command present but NOT in baseline: /$name (census candidate)" ;; esac
|
||||
done
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: vault decrypt readiness
|
||||
# ---------------------------------------------------------------------------
|
||||
check_vault() {
|
||||
local vpath; vpath="$(idfield '.vault_path')"
|
||||
if [ -z "$vpath" ]; then
|
||||
emit vault.path vault WARN "identity.vault_path not set" "Set vault_path in identity.json"
|
||||
return
|
||||
fi
|
||||
if [ ! -d "$vpath" ]; then
|
||||
emit vault.path vault FAIL "vault_path does not exist: $vpath" "Clone the vault repo and set vault_path"
|
||||
return
|
||||
fi
|
||||
emit vault.path vault PASS "vault repo present: $vpath"
|
||||
if ! command -v sops >/dev/null 2>&1 || ! command -v age >/dev/null 2>&1; then
|
||||
emit vault.tools vault FAIL "sops/age missing - cannot decrypt vault" "Install sops and age"
|
||||
return
|
||||
fi
|
||||
# Lightweight readiness: vault.sh list should enumerate entries without error.
|
||||
if [ -x "$REPO_ROOT/.claude/scripts/vault.sh" ]; then
|
||||
if bash "$REPO_ROOT/.claude/scripts/vault.sh" list >/dev/null 2>&1; then
|
||||
emit vault.list vault PASS "vault.sh list succeeded (sops/age wired)"
|
||||
else
|
||||
emit vault.list vault WARN "vault.sh list failed - check age key + SOPS_AGE_KEY_FILE" \
|
||||
"Verify age key at the SOPS recipient path; run: bash .claude/scripts/vault.sh list"
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: connectivity
|
||||
# ---------------------------------------------------------------------------
|
||||
check_connectivity() {
|
||||
local name url req
|
||||
while IFS=$'\t' read -r name url req; do
|
||||
if reachable "$url"; then
|
||||
emit "net.$name" connectivity PASS "$name reachable ($url)"
|
||||
elif [ "$req" = "true" ]; then
|
||||
emit "net.$name" connectivity FAIL "$name UNREACHABLE ($url)" "Check VPN/Tailscale/network to 172.16.3.x"
|
||||
else
|
||||
emit "net.$name" connectivity WARN "$name unreachable ($url) - off-network is OK" ""
|
||||
fi
|
||||
done < <(jq -r '.connectivity[] | [.name, .url, (.required|tostring)] | @tsv' "$MANIFEST")
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: duplicate command/skill definitions across search roots.
|
||||
# Claude Code resolves slash commands and skills from BOTH the repo
|
||||
# (.claude/commands, .claude/skills) and the user profile (~/.claude/...). When
|
||||
# the same name exists in both with DIFFERENT content, the harness may resolve a
|
||||
# different one than you expect - the "same /cmd, different behaviour on the Mac"
|
||||
# bug. Divergent = WARN; identical = INFO (redundant copy that WILL drift).
|
||||
# ---------------------------------------------------------------------------
|
||||
check_duplicates() {
|
||||
local kind repo_dir user_dir
|
||||
# commands: compare *.md files by content
|
||||
for kind in commands skills; do
|
||||
repo_dir="$REPO_ROOT/.claude/$kind"
|
||||
user_dir="$HOME/.claude/$kind"
|
||||
[ -d "$repo_dir" ] || continue
|
||||
[ -d "$user_dir" ] || { emit "dup.$kind" duplicates PASS "no user-level ~/.claude/$kind (single source: repo)"; continue; }
|
||||
|
||||
local name rp up dup_div=0 dup_same=0
|
||||
if [ "$kind" = "commands" ]; then
|
||||
for rp in "$repo_dir"/*.md; do
|
||||
[ -f "$rp" ] || continue
|
||||
name="$(basename "$rp" .md)"
|
||||
[ "$name" = "README" ] && continue
|
||||
up="$user_dir/$name.md"
|
||||
[ -f "$up" ] || continue
|
||||
[ "$rp" -ef "$up" ] && continue # symlink to the same file - cannot drift
|
||||
if same_content "$rp" "$up"; then
|
||||
dup_same=$((dup_same+1))
|
||||
else
|
||||
dup_div=$((dup_div+1))
|
||||
emit "dup.cmd.$name" duplicates WARN \
|
||||
"/$name is DIVERGENT: repo and ~/.claude copies differ (harness may run the wrong one)" \
|
||||
"Reconcile: diff \"$rp\" \"$up\" then make ~/.claude/commands/$name.md match the repo (or remove it)"
|
||||
fi
|
||||
done
|
||||
else
|
||||
for rp in "$repo_dir"/*/; do
|
||||
[ -d "$rp" ] || continue
|
||||
name="$(basename "$rp")"
|
||||
up="$user_dir/$name"
|
||||
[ -d "$up" ] || continue
|
||||
[ "$rp" -ef "$up" ] && continue # symlinked dir - cannot drift
|
||||
# Only compare when BOTH have a SKILL.md; otherwise not comparable
|
||||
# (script-only / *.md-only skills) - skip rather than miscount.
|
||||
if [ -f "$rp/SKILL.md" ] && [ -f "$up/SKILL.md" ]; then
|
||||
if same_content "$rp/SKILL.md" "$up/SKILL.md"; then
|
||||
dup_same=$((dup_same+1))
|
||||
else
|
||||
dup_div=$((dup_div+1))
|
||||
emit "dup.skill.$name" duplicates WARN \
|
||||
"skill '$name' is DIVERGENT: repo and ~/.claude SKILL.md differ" \
|
||||
"Reconcile ~/.claude/skills/$name with the repo copy (or remove the user-level one)"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
fi
|
||||
if [ "$dup_div" -eq 0 ] && [ "$dup_same" -gt 0 ]; then
|
||||
emit "dup.$kind" duplicates INFO \
|
||||
"$dup_same $kind exist in BOTH repo and ~/.claude (identical now, but a redundant copy that can drift)" \
|
||||
"Consider a single source of truth for $kind to prevent future divergence"
|
||||
elif [ "$dup_div" -eq 0 ] && [ "$dup_same" -eq 0 ]; then
|
||||
emit "dup.$kind" duplicates PASS "no duplicate $kind across roots"
|
||||
fi
|
||||
done
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CHECK: rogue memories that contradict settings/identity.
|
||||
# Deterministic core only: index integrity + a conservative, manifest-declared
|
||||
# set of contradiction patterns evaluated against this machine's identity. The
|
||||
# SEMANTIC contradiction pass (reasoning over all memories vs identity/settings)
|
||||
# is a judgment task and is delegated to the model in SKILL.md, not grep.
|
||||
# ---------------------------------------------------------------------------
|
||||
check_memory() {
|
||||
local mdir="$REPO_ROOT/.claude/memory" idx="$REPO_ROOT/.claude/memory/MEMORY.md"
|
||||
if [ ! -d "$mdir" ]; then
|
||||
emit memory.dir memory WARN "no .claude/memory directory" "Expected the shared memory store; restore via /sync"
|
||||
return
|
||||
fi
|
||||
if [ ! -f "$idx" ]; then
|
||||
emit memory.index memory WARN "MEMORY.md index missing" "Create .claude/memory/MEMORY.md (the loaded index)"
|
||||
else
|
||||
# orphan detection: every *.md (except MEMORY.md) should be referenced in the index
|
||||
local f base orphans=0
|
||||
for f in "$mdir"/*.md; do
|
||||
[ -f "$f" ] || continue
|
||||
base="$(basename "$f")"
|
||||
[ "$base" = "MEMORY.md" ] && continue
|
||||
if ! grep -qF "$base" "$idx" 2>/dev/null; then
|
||||
orphans=$((orphans+1))
|
||||
fi
|
||||
done
|
||||
if [ "$orphans" -gt 0 ]; then
|
||||
emit memory.orphans memory WARN "$orphans memory file(s) not referenced in MEMORY.md (orphaned)" \
|
||||
"Run /memory-dream or add the missing index lines"
|
||||
else
|
||||
emit memory.index memory PASS "MEMORY.md index present; no orphaned memory files"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Manifest-declared contradiction patterns. Each entry:
|
||||
# { when_field, when_equals, grep, why } - only evaluated when this
|
||||
# machine's identity.<when_field> == when_equals, so a pattern fires only
|
||||
# where it is actually a contradiction (e.g. prescribing python3 on a `py` box).
|
||||
# NB: fields are read via @tsv, so when_equals/grep MUST NOT contain tab chars.
|
||||
local has; has="$(jq -r '(.memory.contradiction_patterns // []) | length' "$MANIFEST" 2>/dev/null)"
|
||||
if [ "${has:-0}" -gt 0 ] 2>/dev/null; then
|
||||
local wf we gx why hits
|
||||
while IFS=$'\t' read -r wf we gx why; do
|
||||
[ -n "$wf" ] || continue
|
||||
if [ "$(idfield ".$wf")" = "$we" ]; then
|
||||
hits="$(grep -rliE "$gx" "$mdir" 2>/dev/null | grep -vF 'MEMORY.md' | head -5 | tr '\n' ' ')"
|
||||
if [ -n "$hits" ]; then
|
||||
emit "memory.contradiction.$wf" memory WARN \
|
||||
"memory may contradict identity.$wf=$we ($why): $hits" \
|
||||
"Review the listed memory file(s); correct or delete if they prescribe the wrong behaviour for this machine"
|
||||
fi
|
||||
fi
|
||||
done < <(jq -r '(.memory.contradiction_patterns // [])[] | [.when_field, .when_equals, .grep, .why] | @tsv' "$MANIFEST")
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Build the census JSON from accumulated results
|
||||
# ---------------------------------------------------------------------------
|
||||
build_census() {
|
||||
local fails warns grade
|
||||
fails="$(jq -s '[.[]|select(.status=="FAIL")]|length' "$RESULTS_FILE")"
|
||||
warns="$(jq -s '[.[]|select(.status=="WARN")]|length' "$RESULTS_FILE")"
|
||||
if [ "$fails" -gt 0 ]; then grade="RED"; elif [ "$warns" -gt 0 ]; then grade="AMBER"; else grade="GREEN"; fi
|
||||
|
||||
jq -s \
|
||||
--arg host "$HOST" --arg session "$SESSION" --arg platform "$PLATFORM" --arg arch "$ARCH" \
|
||||
--arg grade "$grade" --arg ts "$RUN_TS" \
|
||||
--arg mver "$(jq -r '.schema_version' "$MANIFEST")" \
|
||||
'{
|
||||
host:$host, session:$session, platform:$platform, arch:$arch,
|
||||
grade:$grade, generated_at:$ts, manifest_version:$mver,
|
||||
summary: { pass:([.[]|select(.status=="PASS")]|length),
|
||||
warn:([.[]|select(.status=="WARN")]|length),
|
||||
fail:([.[]|select(.status=="FAIL")]|length),
|
||||
info:([.[]|select(.status=="INFO")]|length) },
|
||||
results: .
|
||||
}' "$RESULTS_FILE"
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Human report
|
||||
# ---------------------------------------------------------------------------
|
||||
print_report() {
|
||||
local census="$1" grade
|
||||
grade="$(echo "$census" | jq -r .grade)"
|
||||
echo ""
|
||||
echo "============================================================"
|
||||
echo " ClaudeTools self-check - $HOST ($PLATFORM/$ARCH)"
|
||||
echo " Grade: $grade $(echo "$census" | jq -r '.summary | "PASS \(.pass) WARN \(.warn) FAIL \(.fail) INFO \(.info)"')"
|
||||
echo " Manifest: $(echo "$census" | jq -r .manifest_version) (provisional) $RUN_TS"
|
||||
echo "============================================================"
|
||||
# FAIL then WARN then INFO; PASS summarized per category
|
||||
echo "$census" | jq -r '
|
||||
def mark(s): if s=="FAIL" then "[FAIL]" elif s=="WARN" then "[WARN]"
|
||||
elif s=="INFO" then "[INFO]" elif s=="SKIP" then "[SKIP]" else "[ OK ]" end;
|
||||
(.results | map(select(.status=="FAIL"))) as $f
|
||||
| (.results | map(select(.status=="WARN"))) as $w
|
||||
| (.results | map(select(.status=="INFO"))) as $i
|
||||
| (if ($f|length)>0 then "\nFAILURES:" else empty end),
|
||||
($f[] | " [FAIL] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)),
|
||||
(if ($w|length)>0 then "\nWARNINGS:" else empty end),
|
||||
($w[] | " [WARN] \(.category)/\(.id): \(.detail)" + (if .fix!="" then "\n fix: \(.fix)" else "" end)),
|
||||
(if ($i|length)>0 then "\nINFO / capability:" else empty end),
|
||||
($i[] | " [INFO] \(.detail)")
|
||||
'
|
||||
# per-category PASS counts
|
||||
echo ""
|
||||
echo "PASS by category:"
|
||||
echo "$census" | jq -r '.results | map(select(.status=="PASS")) | group_by(.category)[] | " \(.[0].category): \(length) ok"'
|
||||
echo "============================================================"
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Publish census to coord API.
|
||||
# The coord API uses the PATH-PARAM form: PUT /api/coord/components/{pk}/{comp}
|
||||
# with a body of {state, version, notes, updated_by} (the body form 405s).
|
||||
# The component segment must be slash-free (a slash 404s, even URL-encoded), so
|
||||
# the per-machine component is "selfcheck_<host>" (NOT "selfcheck/<host>").
|
||||
# ---------------------------------------------------------------------------
|
||||
COMPONENT="selfcheck_$HOST"
|
||||
publish_census() {
|
||||
local census="$1" grade compact body path
|
||||
grade="$(echo "$census" | jq -r .grade)"
|
||||
compact="$(echo "$census" | jq -c .)"
|
||||
path="/api/coord/components/claudetools/$COMPONENT"
|
||||
body="$(jq -nc --arg state "$grade" \
|
||||
--arg ver "$(jq -r '.schema_version' "$MANIFEST")" --arg notes "$compact" --arg by "$SESSION" \
|
||||
'{state:$state, version:$ver, notes:$notes, updated_by:$by}')"
|
||||
if curl -s -m 8 -X PUT "$API$path" -H "Content-Type: application/json" -d "$body" >/dev/null 2>&1; then
|
||||
echo "[OK] Published census to coord: component $COMPONENT = $grade"
|
||||
else
|
||||
# softfail per coordination protocol - queue the SAME path/body so a
|
||||
# later /sync drain replays a request that actually works.
|
||||
local q="$REPO_ROOT/.claude/coord-queue.jsonl"
|
||||
jq -nc --arg path "$path" --argjson b "$body" --arg ts "$RUN_TS" \
|
||||
'{ts:$ts, method:"PUT", path:$path, body:$b}' >> "$q" 2>/dev/null
|
||||
echo "[WARN] coord unreachable; census queued to .claude/coord-queue.jsonl"
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Subcommand: fanout - request all instances to run /self-check --publish
|
||||
# ---------------------------------------------------------------------------
|
||||
do_fanout() {
|
||||
local subj body payload
|
||||
subj="[self-check] Fleet census + self-remediation request"
|
||||
body="On THIS machine: (1) run /self-check ; (2) apply the suggested fix commands it prints for any FAIL/WARN - fix your OWN machine, locally, with your operator present (nobody fixes you remotely) ; (3) re-run /self-check to confirm GREEN ; (4) run /self-check --publish to report your census (component selfcheck_<host>) to coord. The check is read-only; only --publish writes (your census only). Requested by $SESSION at $RUN_TS."
|
||||
payload="$(jq -nc --arg from "$SESSION" --arg subj "$subj" --arg body "$body" \
|
||||
'{from_session:$from, to_session:"ALL_SESSIONS", project_key:"claudetools", subject:$subj, body:$body}')"
|
||||
if curl -s -m 8 -X POST "$API/api/coord/messages" -H "Content-Type: application/json" -d "$payload" >/dev/null 2>&1; then
|
||||
echo "[OK] Broadcast census request to ALL_SESSIONS."
|
||||
else
|
||||
echo "[ERROR] Failed to broadcast (coord unreachable)." >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Subcommand: aggregate - read all published censuses, build fleet view
|
||||
# ---------------------------------------------------------------------------
|
||||
do_aggregate() {
|
||||
local comps
|
||||
comps="$(curl -s -m 8 "$API/api/coord/components?project_key=claudetools" 2>/dev/null)"
|
||||
if [ -z "$comps" ]; then echo "[ERROR] coord unreachable." >&2; exit 1; fi
|
||||
# The coord API returns {states:[...], total:N}; each row's grade is .state and
|
||||
# the full census JSON is in .notes. Keep selfcheck_* rows with parseable notes.
|
||||
# (.components / bare-array kept as defensive fallbacks.)
|
||||
local censuses
|
||||
censuses="$(echo "$comps" | jq -c '
|
||||
( .states? // .components? // (if type=="array" then . else [] end) ) as $rows
|
||||
| ($rows // [])
|
||||
| map(select((.component? // "") | startswith("selfcheck")))
|
||||
| map(.notes | try fromjson catch empty)
|
||||
' 2>/dev/null)"
|
||||
local n; n="$(echo "$censuses" | jq 'length' 2>/dev/null || echo 0)"
|
||||
if [ "${n:-0}" -eq 0 ]; then
|
||||
echo "No published censuses found yet. Run 'self-check.sh fanout', then have each machine run /self-check --publish."
|
||||
return
|
||||
fi
|
||||
echo "============================================================"
|
||||
echo " Fleet census: $n machine(s) reporting"
|
||||
echo "============================================================"
|
||||
echo "$censuses" | jq -r '.[] | " \(.grade)\t\(.host)\t\(.platform)/\(.arch)\tP\(.summary.pass) W\(.summary.warn) F\(.summary.fail)\t\(.generated_at)"' | column -t -s$'\t' 2>/dev/null \
|
||||
|| echo "$censuses" | jq -r '.[] | " \(.grade) \(.host) \(.platform)/\(.arch) P\(.summary.pass) W\(.summary.warn) F\(.summary.fail)"'
|
||||
|
||||
echo ""
|
||||
echo "Proposed baseline (intersection = required everywhere; symmetric diff = capability-gated):"
|
||||
# Tools present on every machine vs only some, derived from tool.* PASS results.
|
||||
echo "$censuses" | jq -r '
|
||||
[ .[] | { host:.host, tools:( .results | map(select((.id|startswith("tool."))) | select(.status=="PASS") | (.id|sub("^tool.";""))) ) } ] as $m
|
||||
| ($m|length) as $count
|
||||
| ([ $m[].tools[] ] | unique) as $all
|
||||
| " tools on ALL \($count): " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) == $count ) ] | join(", ") ),
|
||||
" tools on SOME only: " + ( [ $all[] | . as $t | select( ([ $m[] | select(.tools|index($t)) ]|length) < $count ) ] | join(", ") )
|
||||
' 2>/dev/null
|
||||
echo ""
|
||||
echo "Machines that must self-remediate (RED/AMBER) - each fixes ITSELF, then re-runs + re-publishes:"
|
||||
local needfix
|
||||
needfix="$(echo "$censuses" | jq -r '
|
||||
.[] | select(.grade!="GREEN")
|
||||
| " \(.host) [\(.grade)] should run, in order:\n"
|
||||
+ ( [ .results[] | select(.status=="FAIL" or .status=="WARN") | select(.fix!="")
|
||||
| " - \(.fix)" ] | join("\n") )
|
||||
+ "\n then: /self-check --publish"
|
||||
' 2>/dev/null)"
|
||||
if [ -n "$needfix" ]; then
|
||||
echo "$needfix"
|
||||
else
|
||||
echo " (none - whole fleet is GREEN)"
|
||||
fi
|
||||
echo "============================================================"
|
||||
echo "We do NOT fix remote machines. Relay each machine's fix list to its operator;"
|
||||
echo "they self-remediate locally, re-run /self-check, and re-publish until GREEN."
|
||||
echo "Once the fleet is reporting consistently, ratify baseline/manifest.json with Mike."
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Main
|
||||
# ---------------------------------------------------------------------------
|
||||
# RUN_TS is passed in by the caller (SKILL.md instructs a real UTC stamp);
|
||||
# fall back to `date` if available so the script is runnable standalone.
|
||||
RUN_TS="${SELFCHECK_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)}"
|
||||
|
||||
MODE="${1:-report}"
|
||||
case "$MODE" in
|
||||
fanout) do_fanout; exit 0 ;;
|
||||
aggregate) do_aggregate; exit 0 ;;
|
||||
esac
|
||||
|
||||
# run all checks
|
||||
check_identity
|
||||
check_tools
|
||||
check_capability_tier
|
||||
check_files
|
||||
check_settings_hooks
|
||||
check_git
|
||||
check_skills_commands
|
||||
check_duplicates
|
||||
check_memory
|
||||
check_vault
|
||||
check_connectivity
|
||||
|
||||
CENSUS="$(build_census)"
|
||||
|
||||
case "$MODE" in
|
||||
--json) echo "$CENSUS" ;;
|
||||
--publish) print_report "$CENSUS"; publish_census "$CENSUS" ;;
|
||||
report|*) print_report "$CENSUS" ;;
|
||||
esac
|
||||
|
||||
# exit code reflects grade for scripting (0 GREEN, 1 AMBER, 2 RED)
|
||||
GR="$(echo "$CENSUS" | jq -r .grade)"
|
||||
case "$GR" in GREEN) exit 0 ;; AMBER) exit 1 ;; RED) exit 2 ;; *) exit 0 ;; esac
|
||||
Reference in New Issue
Block a user