# ClaudeTools Harness Optimization — Implementation Plan (3-way reviewed) > Spec created: 2026-06-08 | Status: P0 in progress | Priority: P1 > Revised per `review-3way.md` (Claude + Grok + Gemini). Fleet-wide changes: every task > must be reversible, version-marked, and rollout-announced via coord. ## Rollout status — 2026-06-08 (harness VERSION 1.1.0) - [DONE] Task 0.5 — `.claude/harness/VERSION` (+ CHANGELOG). - [DONE] Task 0.6 — `.claude/scripts/force-pull-raw.sh` OOB recovery (dry-run default). - [DONE] Task 1 — submodule-safe `sync.sh` (unstages gitlinks unless `--with-submodules`); verified live: ran `/sync` with the submodule ahead of pin -> commit had NO gitlink bump, pointer stayed at the pin. The manual detach-to-pin dance is eliminated. - [DONE] Task 3/3a — Syncro billing SSOT (add_line_item normal; timers outlier-only). - [DONE] Task 4 — `harness-guard.sh` wired pre-commit, WARN-ONLY; verified: detects conflict markers / unencrypted sops / private keys; warns+exit0 by default, exit1 under `HARNESS_GUARD_FATAL=1`; `SKIP_HARNESS_GUARD=1` bypasses. - [DONE] Task 2 — wiki synthesis DECOUPLED from /save (v1.2.0); /wiki-compile now serialized (per-article coord lock, TTL evict, coord-down=warn+proceed) + STAGED (`.claude/wiki_staging/` -> review diff -> apply -> commit -> release). No blind merge. - **P0 COMPLETE (2026-06-08, VERSION 1.2.0).** Fleet broadcast sent. - [DONE] Task 6 — CLAUDE.md split: lean CORE (~1.2k tokens, was ~4.9k) + `CLAUDE_EXTENDED.md` (full original preserved verbatim). All safety-critical rules retained in CORE (v1.3.0). - [DONE] Task 9 (P2) — delegation re-tune folded into CORE: act-directly-by-default; delegate only for (a) high-volume output, (b) blast radius >3 files across layers, (c) domain shift, (d) parallel. Single-agent-for-coupled-tasks kept. - [DONE] Task 5 — one-line registry descriptions on the 8 biggest skills (remediation-tool, gc-audit, packetdial, memory-dream, human-flow, self-check, impeccable, mailprotector); skill-description injection ~3320 -> ~2123 tokens (~36% cut), keyword triggers preserved, frontmatter still valid. - [DONE] Task 7 — thinned `/save` + `/sync` command bodies: they now point to `sync.sh` as the single source instead of re-documenting its internals; load-bearing LLM-judgment parts (Phase 0 save-vs-sync, cross-user note display, exit-75 reporting) kept verbatim. Mechanical sync never depends on an LLM step. - [DONE] Deterministic Bash fixes (from the "deferred to separate spec" carve-out): `now-phoenix.sh` helper added (fixed UTC-7 epoch math — no `TZ=America/Phoenix` Git-Bash bug); `post-bot-alert.sh` already builds JSON via `jq -nc --arg` (verified). Full Python port remains a separate future spec. - [DONE] Task 10 (P3) — knowledge tiering: recall order already in CORE (wiki -> CONTEXT/log -> coord); adopted `session-logs/YYYY-MM/` as a FORWARD convention for new logs (verified: nested path still matches the Phase 0 log-detection grep and wiki slug derivation). Existing flat logs left in place — recall grep covers both layouts; NO mass migration (would break wiki `source_logs` refs for low payoff). - **P1+P2+P3 COMPLETE (2026-06-08, VERSION 1.4.0).** - REMAINING (ops follow-ups, not blocking): - Promote the warn-only guard (Task 4) to FATAL after a clean warn window (check `.claude/harness/guard.log` across the fleet). - Schedule `memory-dream --apply-safe` per-machine (deliberate per-box ops setup; default is read-only/proposals, so unattended --apply-safe is a judgment call left to the operator). - Optional later: migrate existing flat session-logs into month folders if/when the flat dir becomes unwieldy (low priority; grep recall already covers both). - Task 8 (shard big command bodies) stays DEFERRED per Grok/Gemini. ## Task 0: Commit this spec Commit `specs/claudetools-harness-optimization/` (incl. `review-3way.md`). ## Task 0.5: Harness VERSION marker [safety prerequisite — Grok] Files: `.claude/harness/VERSION` (new), CLAUDE.md CORE reference. - Single-line version string, bumped on every fleet-visible behavioral change (each P0 task). Sessions read it to detect partial rollout and give clear errors ("your save is on the pre-Task-2 wiki path"). Enables conditional behavior during heterogeneous rollout. ## Task 0.6: Out-of-band fleet recovery [safety prerequisite — Gemini] Files: `.claude/scripts/force-pull-raw.sh` (new), distributed BEFORE any P0 change. - A raw, un-intercepted `git fetch + reset --hard origin/main` rescue path so a node can recover if a P0 change bricks `sync.sh`/`save.sh` (the normal update vector). No hooks, no guards, minimal deps. Broadcast its existence to the fleet first. --- ## P0 — Reliability footguns ## Task 1+4 (co-developed on one branch — both touch staging) ### Task 1: Submodule-safe sync [P0] Files: `.claude/scripts/sync.sh`. - Replace blanket `git add -A` with submodule-aware staging: stage changes, then unstage any submodule gitlink unless `--with-submodules` is passed (`git reset -q HEAD projects/msp-tools/guru-rmm projects/msp-tools/guru-connect`). Operator must NEVER need to detach the submodule to its pin before `/save`. - Verify: submodule on `main` (ahead of pin) -> `/sync` commits log/wiki/config but NOT a gitlink bump; clean status after. ### Task 4: Pre-commit guard — WARN-ONLY first [P0, highest risk] Files: `.claude/scripts/harness-guard.sh` (new, standalone), later wired into `sync.sh`. - Detects: conflict markers in tracked files; unflagged submodule gitlink change; vault-plaintext / obvious-secret patterns. - **Mode = WARN-ONLY** for the first 7-14 days fleet-wide (logs to a local file, never blocks). Explicit `SKIP_HARNESS_GUARD=1` bypass that is logged + coord-broadcast when used. Test matrix: synthetic bad inputs (incl. legitimate `<<<<<<<` in test files, new SOPS edge cases) + the last N real commits, zero false positives, BEFORE promoting to fatal. Promote to blocking only after a clean warn window. - Verify: warn fires on a planted conflict marker; clean commits pass; bypass logs. ## Task 2: Decouple + stage wiki synthesis [P0] Files: `.claude/commands/save.md` (drop inline recompile), `.claude/commands/wiki-compile.md`, new staged runner. - Sequence (do NOT excise the old path until the new one is proven): 1. Build the serialized synthesis path: gated by a coord lock (`lock claim claudetools wiki/`) with **orphan-lock eviction** (TTL + dead-session reclaim) and a coord-down fallback (skip + warn, never block the save). 2. **Staging pattern** (Gemini): the job writes proposed article updates to `.claude/wiki_staging/.md` + emits a notification; an operator or a focused validation agent reviews the diff and applies it — NO blind background auto-merge into the live article (avoids progressive structural corruption). Preserve Patterns/History; update `last_verified` + `source_logs`. 3. Prove it under concurrent saves from 2 machines (no conflict, one clean update). 4. ONLY THEN remove the inline Sonnet recompile from `/save` (save = write log + push). - Verify: two near-simultaneous `/save`s -> no wiki conflict markers; staged diff reviewed + applied cleanly. ## Task 3a: Confirm billing SSOT truth [RESOLVED 2026-06-08] - DECISION (Mike): normal billing uses `add_line_item`; timers are NOT part of any normal billing loop. The timer capability stays available ONLY for explicit outlier requests ("if I ever ask for a timer"). So the `/syncro` command is the operational truth; the `time-entry-protocol` standard's "always timer_entry" was wrong. ## Task 3: Single source of truth — Syncro billing [DONE 2026-06-08] Files: `.claude/commands/syncro.md`, `.claude/standards/syncro/time-entry-protocol.md`. - Done: rewrote the standard to DEFER to the /syncro command (the SSOT for billing mechanics) and to state timers are outlier-only; aligned the command's two absolute "no timers" lines to "outlier, only on explicit request." One rule, two consistent references; the contradiction is gone. - Still TODO (later): a "command-restates-standard" lint in `/self-check`. --- ## P1 — Context diet ## Task 5a: "Where is text injected" inventory [P1 prerequisite — Grok] - One-off pass cataloguing every repeated injection: CLAUDE.md, the skill/command registry, system-reminders, MEMORY.md, CONTEXT.md loads, hook-injected prose, big command bodies. Measure tokens each. This scopes Tasks 5/6/7 so the diet does not under-deliver. One-off measurement (Grok: skip ongoing telemetry). ## Task 5: One-line registry descriptions [P1] (biggest token win) Files: every `.claude/skills/*/SKILL.md` `description:`; every `.claude/commands/*.md` description. - Truncate each registry description to ONE sentence; move verbose triggers/examples into the SKILL.md body (loads only on activation). Worst first: rmm-audit, gc-audit, packetdial, bitdefender, mailprotector, remediation-tool. - Verify: registry-list injection size cut >=50% (measured per Task 5a); skills still invoke. ## Task 6: CLAUDE.md -> CORE + EXTENDED [P1] Files: `.claude/CLAUDE.md` (lean CORE) + `.claude/CLAUDE_EXTENDED.md` (on-demand). - CORE (<=1.5k tokens): identity/multi-user, security/vault, model routing, delegation thresholds (Task 9), coord-message-in-system-reminder rule, harness VERSION ref. Everything else -> EXTENDED, loaded when relevant. Add a discoverability pointer so on-demand detail is findable (Grok). - Verify: CORE <=1.5k tokens; nothing safety-critical lost. ## Task 7: Thin `/save` and `/sync` [P1] Files: `.claude/commands/save.md`, `.claude/commands/sync.md`. - The command file becomes "run the script, report the summary"; the procedure lives in `sync.sh`/`save.sh`. **Keep any LLM/narrative step OUT of the critical mechanical path** (Grok) — log narrative generation stays optional/separate, never a new failure domain in the hot save/sync loop. - Verify: command-body token size cut substantially; behavior unchanged; save/sync never blocked by an LLM call. ## Task 8: Shard big command bodies [P2 — DEFERRED] - Deferred past P1 (Grok): one-line descriptions + CLAUDE split + thin wrappers deliver most of the win. Revisit only with a real lazy-load gate (Gemini: otherwise models glob-load the appendices anyway). Not in the P0/P1 cut. --- ## P2 — Coordinator model ## Task 9: Re-tune delegation defaults [P2, gated on harness VERSION] Files: CLAUDE.md CORE. - "Act directly by default; delegate only for (a) high-volume tool output, (b) blast radius >3 files across layers, (c) genuine domain shift, (d) parallelizable independent work." Keep single-agent-for-coupled-tasks. Behavioral-contract change -> gate on fleet VERSION so behavior is consistent once all machines sync. --- ## P3 — Knowledge architecture ## Task 10: Tier the knowledge layers [P3] Files: CLAUDE.md (recall order), `session-logs/YYYY-MM/` reorg, `memory-dream` schedule, wiki frontmatter. - Recall order: scoped log grep (date-foldered) -> wiki (CONTEXT.md/article) -> memory. Reorganize logs into `session-logs/YYYY-MM/` and recall via scoped grep — **no monolithic INDEX** (Gemini: it recreates bloat). - Memory: `memory-dream --apply-safe` on a schedule; `/graduate` finished-project memory into wiki/standards then delete. (No arbitrary file-count target — Grok.) - Wiki: `last_verified` + `source_logs`; merge via the Task 2 staging pattern. - Verify: a recovery query hits the right layer first; no memory restates a wiki article. --- ## Deferred to a SEPARATE future spec (not this one) - **Python port of the core harness** (sync/save/hooks). This spec lands only the deterministic Bash fixes that are cheap + high-value: `jq -n --arg` for ALL shell JSON (fixes `post-bot-alert.sh`), a `now-phoenix` timestamp helper (PowerShell/Python, never `TZ=America/Phoenix date`), and the skills-deploy parity (already shipped 2026-06-08). A full port is its own spec (Gemini). --- ## Task 12: Verification + fleet rollout (consolidated) - Single P0/P1 rollout checklist (replaces per-task verify repetition): `/self-check` green; real `/save` dry-run on a scratch branch; bump `VERSION`; coord-broadcast; other machines pull + re-sync. - Fleet-level success criteria: 2 consecutive days across ALL machines with zero conflict markers and zero submodule gitlink accidents; registry-injection tokens down >=50%; CLAUDE CORE <=1.5k. - Add harness smoke tests to `/self-check`: sync/save dry-run, skills-deploy dir non-empty, registry-size budget, guard self-test.