Task 3/3a of the harness-optimization spec. Mike confirmed normal billing uses add_line_item; timers stay available only for explicit outlier requests, never the normal loop. Rewrote time-entry-protocol.md to defer to the /syncro command (SSOT for billing mechanics) and state timers are outlier-only; aligned the command's two absolute "no timers" lines. Contradiction removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8.9 KiB
ClaudeTools Harness Optimization — Implementation Plan (3-way reviewed)
Spec created: 2026-06-08 | Status: not started | Priority: P1 Revised per
review-3way.md(Claude + Grok + Gemini). Fleet-wide changes: every task must be reversible, version-marked, and rollout-announced via coord.
Task 0: Commit this spec
Commit specs/claudetools-harness-optimization/ (incl. review-3way.md).
Task 0.5: Harness VERSION marker [safety prerequisite — Grok]
Files: .claude/harness/VERSION (new), CLAUDE.md CORE reference.
- Single-line version string, bumped on every fleet-visible behavioral change (each P0 task). Sessions read it to detect partial rollout and give clear errors ("your save is on the pre-Task-2 wiki path"). Enables conditional behavior during heterogeneous rollout.
Task 0.6: Out-of-band fleet recovery [safety prerequisite — Gemini]
Files: .claude/scripts/force-pull-raw.sh (new), distributed BEFORE any P0 change.
- A raw, un-intercepted
git fetch + reset --hard origin/mainrescue path so a node can recover if a P0 change brickssync.sh/save.sh(the normal update vector). No hooks, no guards, minimal deps. Broadcast its existence to the fleet first.
P0 — Reliability footguns
Task 1+4 (co-developed on one branch — both touch staging)
Task 1: Submodule-safe sync [P0]
Files: .claude/scripts/sync.sh.
- Replace blanket
git add -Awith submodule-aware staging: stage changes, then unstage any submodule gitlink unless--with-submodulesis passed (git reset -q HEAD projects/msp-tools/guru-rmm projects/msp-tools/guru-connect). Operator must NEVER need to detach the submodule to its pin before/save. - Verify: submodule on
main(ahead of pin) ->/synccommits log/wiki/config but NOT a gitlink bump; clean status after.
Task 4: Pre-commit guard — WARN-ONLY first [P0, highest risk]
Files: .claude/scripts/harness-guard.sh (new, standalone), later wired into sync.sh.
- Detects: conflict markers in tracked files; unflagged submodule gitlink change; vault-plaintext / obvious-secret patterns.
- Mode = WARN-ONLY for the first 7-14 days fleet-wide (logs to a local file, never
blocks). Explicit
SKIP_HARNESS_GUARD=1bypass that is logged + coord-broadcast when used. Test matrix: synthetic bad inputs (incl. legitimate<<<<<<<in test files, new SOPS edge cases) + the last N real commits, zero false positives, BEFORE promoting to fatal. Promote to blocking only after a clean warn window. - Verify: warn fires on a planted conflict marker; clean commits pass; bypass logs.
Task 2: Decouple + stage wiki synthesis [P0]
Files: .claude/commands/save.md (drop inline recompile), .claude/commands/wiki-compile.md,
new staged runner.
- Sequence (do NOT excise the old path until the new one is proven):
- Build the serialized synthesis path: gated by a coord lock (
lock claim claudetools wiki/<slug>) with orphan-lock eviction (TTL + dead-session reclaim) and a coord-down fallback (skip + warn, never block the save). - Staging pattern (Gemini): the job writes proposed article updates to
.claude/wiki_staging/<slug>.md+ emits a notification; an operator or a focused validation agent reviews the diff and applies it — NO blind background auto-merge into the live article (avoids progressive structural corruption). Preserve Patterns/History; updatelast_verified+source_logs. - Prove it under concurrent saves from 2 machines (no conflict, one clean update).
- ONLY THEN remove the inline Sonnet recompile from
/save(save = write log + push).
- Build the serialized synthesis path: gated by a coord lock (
- Verify: two near-simultaneous
/saves -> no wiki conflict markers; staged diff reviewed + applied cleanly.
Task 3a: Confirm billing SSOT truth [RESOLVED 2026-06-08]
- DECISION (Mike): normal billing uses
add_line_item; timers are NOT part of any normal billing loop. The timer capability stays available ONLY for explicit outlier requests ("if I ever ask for a timer"). So the/syncrocommand is the operational truth; thetime-entry-protocolstandard's "always timer_entry" was wrong.
Task 3: Single source of truth — Syncro billing [DONE 2026-06-08]
Files: .claude/commands/syncro.md, .claude/standards/syncro/time-entry-protocol.md.
- Done: rewrote the standard to DEFER to the /syncro command (the SSOT for billing mechanics) and to state timers are outlier-only; aligned the command's two absolute "no timers" lines to "outlier, only on explicit request." One rule, two consistent references; the contradiction is gone.
- Still TODO (later): a "command-restates-standard" lint in
/self-check.
P1 — Context diet
Task 5a: "Where is text injected" inventory [P1 prerequisite — Grok]
- One-off pass cataloguing every repeated injection: CLAUDE.md, the skill/command registry, system-reminders, MEMORY.md, CONTEXT.md loads, hook-injected prose, big command bodies. Measure tokens each. This scopes Tasks 5/6/7 so the diet does not under-deliver. One-off measurement (Grok: skip ongoing telemetry).
Task 5: One-line registry descriptions [P1] (biggest token win)
Files: every .claude/skills/*/SKILL.md description:; every .claude/commands/*.md
description.
- Truncate each registry description to ONE sentence; move verbose triggers/examples into the SKILL.md body (loads only on activation). Worst first: rmm-audit, gc-audit, packetdial, bitdefender, mailprotector, remediation-tool.
- Verify: registry-list injection size cut >=50% (measured per Task 5a); skills still invoke.
Task 6: CLAUDE.md -> CORE + EXTENDED [P1]
Files: .claude/CLAUDE.md (lean CORE) + .claude/CLAUDE_EXTENDED.md (on-demand).
- CORE (<=1.5k tokens): identity/multi-user, security/vault, model routing, delegation thresholds (Task 9), coord-message-in-system-reminder rule, harness VERSION ref. Everything else -> EXTENDED, loaded when relevant. Add a discoverability pointer so on-demand detail is findable (Grok).
- Verify: CORE <=1.5k tokens; nothing safety-critical lost.
Task 7: Thin /save and /sync [P1]
Files: .claude/commands/save.md, .claude/commands/sync.md.
- The command file becomes "run the script, report the summary"; the procedure lives in
sync.sh/save.sh. Keep any LLM/narrative step OUT of the critical mechanical path (Grok) — log narrative generation stays optional/separate, never a new failure domain in the hot save/sync loop. - Verify: command-body token size cut substantially; behavior unchanged; save/sync never blocked by an LLM call.
Task 8: Shard big command bodies [P2 — DEFERRED]
- Deferred past P1 (Grok): one-line descriptions + CLAUDE split + thin wrappers deliver most of the win. Revisit only with a real lazy-load gate (Gemini: otherwise models glob-load the appendices anyway). Not in the P0/P1 cut.
P2 — Coordinator model
Task 9: Re-tune delegation defaults [P2, gated on harness VERSION]
Files: CLAUDE.md CORE.
- "Act directly by default; delegate only for (a) high-volume tool output, (b) blast radius >3 files across layers, (c) genuine domain shift, (d) parallelizable independent work." Keep single-agent-for-coupled-tasks. Behavioral-contract change -> gate on fleet VERSION so behavior is consistent once all machines sync.
P3 — Knowledge architecture
Task 10: Tier the knowledge layers [P3]
Files: CLAUDE.md (recall order), session-logs/YYYY-MM/ reorg, memory-dream schedule,
wiki frontmatter.
- Recall order: scoped log grep (date-foldered) -> wiki (CONTEXT.md/article) -> memory.
Reorganize logs into
session-logs/YYYY-MM/and recall via scoped grep — no monolithic INDEX (Gemini: it recreates bloat). - Memory:
memory-dream --apply-safeon a schedule;/graduatefinished-project memory into wiki/standards then delete. (No arbitrary file-count target — Grok.) - Wiki:
last_verified+source_logs; merge via the Task 2 staging pattern. - Verify: a recovery query hits the right layer first; no memory restates a wiki article.
Deferred to a SEPARATE future spec (not this one)
- Python port of the core harness (sync/save/hooks). This spec lands only the
deterministic Bash fixes that are cheap + high-value:
jq -n --argfor ALL shell JSON (fixespost-bot-alert.sh), anow-phoenixtimestamp helper (PowerShell/Python, neverTZ=America/Phoenix date), and the skills-deploy parity (already shipped 2026-06-08). A full port is its own spec (Gemini).
Task 12: Verification + fleet rollout (consolidated)
- Single P0/P1 rollout checklist (replaces per-task verify repetition):
/self-checkgreen; real/savedry-run on a scratch branch; bumpVERSION; coord-broadcast; other machines pull + re-sync. - Fleet-level success criteria: 2 consecutive days across ALL machines with zero conflict markers and zero submodule gitlink accidents; registry-injection tokens down >=50%; CLAUDE CORE <=1.5k.
- Add harness smoke tests to
/self-check: sync/save dry-run, skills-deploy dir non-empty, registry-size budget, guard self-test.