Files
claudetools/specs/claudetools-harness-optimization/shape.md
Mike Swanson bb7dc147ca spec: ClaudeTools harness optimization (3-way reviewed)
Optimize the harness (not projects) for accuracy/completeness with context pressure
as a first-class constraint; token efficiency secondary. Authored as a Claude+Grok+
Gemini review (see review-3way.md): P0 reliability footguns (submodule-safe sync,
serialized/staged wiki synthesis, syncro SSOT, warn-only guard), P1 context diet
(one-line registry descriptions, CLAUDE CORE/EXTENDED, thin save/sync), P2 delegation
re-tune, P3 knowledge tiering. Adds harness VERSION marker + OOB recovery as rollout
safety. Python port split to a separate future spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 07:32:45 -07:00

5.0 KiB

ClaudeTools Harness Optimization — Shape & Constraints

Spec created: 2026-06-08. Authored as a 3-way review (Claude + Grok + Gemini); see review-3way.md for each model's independent input and the points of consensus.

What this is

Optimize the ClaudeTools harness itself (not its projects) for day-to-day MSP ops. Primary goal: accuracy + completeness, with context pressure as a first-class constraint. Secondary: token efficiency. The harness is strong on capability surface (skills, coord, wiki, vault) but weak on two axes the three reviewers independently agreed on:

  • Distributed-write safetygit add -A + a git submodule + concurrent multi-machine wiki recompiles produce footguns (stale gitlinks, committed conflict markers).
  • Context economics — large static prose (the skill/command registry with paragraph descriptions, CLAUDE.md ~4.9k tokens, multi-K command bodies) is injected repeatedly, crowding attention and inflating tokens.

Scope (prioritized)

P0 — Reliability footguns (do first):

  1. Submodule-safe sync.sh (eliminate the manual detach-to-pin/restore dance).
  2. Decouple wiki recompile from /save; serialize synthesis (one machine / lock) and merge-not-rewrite.
  3. Single source of truth for command-vs-standard (the /syncro add_line_item vs the timer_entry standard contradiction on a billing path).
  4. Pre-commit guard: block conflict markers, unflagged submodule gitlink bumps, and vault-plaintext patterns.

P1 — Context diet:

  1. Trim every skill/command registry description to one line; move detail into the SKILL.md body (loads only on activation).
  2. CLAUDE.md -> lean CORE (identity, security/vault, routing, delegation thresholds; target <=1.5k tokens) + on-demand EXTENDED.
  3. Make /save and /sync thin wrappers around deterministic scripts that return a short summary, not multi-K instruction bodies injected on every invoke.
  4. Shard the largest command bodies (syncro 1139 lines, rmm 783) into a core flow
    • lazy reference sections.

P2 — Coordinator/sub-agent model: re-tune from "delegate almost everything; code/db = ALWAYS delegate" to "act directly by default; delegate only for high-volume output, broad blast radius (>3 files/layers), genuine domain shift, or parallel work."

P3 — Knowledge architecture: tier logs (write-only archive) / wiki (synthesized, dated, cited, on-demand) / memory (small ephemeral + quirks, ~40-60 files). Recall order log-index -> wiki -> memory; add a session-logs/INDEX; graduate finished-project memories into wiki/standards.

Deterministic Bash fixes (in this spec): jq -n --arg for ALL shell JSON (fixes the post-bot-alert.sh 400), a now-phoenix timestamp helper (never TZ=America/Phoenix date on Git Bash), skills-deploy parity (shipped 2026-06-08).

P4 — Code-over-prompt Python port (MOVED OUT — separate future spec): porting the core harness (sync/save/hooks) to one cross-platform language is its own spec per the 3-way review; this spec lands the cheap deterministic Bash fixes above, not a rewrite.

What this is NOT (out of scope)

  • NOT touching any project (GuruRMM, GuruConnect, Dataforth, client work) — harness only.
  • NOT changing the SOPS vault, the coordination API, or Gitea hosting.
  • NOT removing any capability — skills/commands stay; only their CONTEXT footprint and reliability change.
  • P4 (Python port) is a stretch goal; v1 may land the deterministic fixes in Bash and port incrementally. Do not block P0-P3 on a full rewrite.
  • NOT a redesign of the multi-user identity or work-mode model.

Hard constraints

  • Changes are FLEET-WIDE (every machine syncs them) — each change must be safe to roll out via the existing sync, reversible, and announced via coord broadcast.
  • Must not break the existing /save /sync contracts mid-flight (other sessions rely on them); change behavior behind the same entry points.
  • No secrets in context or logs; keep the SOPS-vault discipline.
  • NO EMOJIS; ASCII markers.

Key decisions

  • Accuracy first, tokens second (per Mike): never compress safety rules, billing SSOT, or vault discipline to save tokens. Cut redundancy and lazy-load reference, not guardrails.
  • Determinism over prompt for mechanical workflows: anything needing >500 tokens of English to run correctly should be a script the model calls.
  • Single source of truth: a fact lives in exactly one layer (CORE rule, standard, command, wiki, or memory) and is referenced, not duplicated.
  • Serialize distributed writes: wiki synthesis and any shared-file regeneration run behind a coord lock or on one designated node.

Priority

P1 (this is the operator's daily experience). Reviewer consensus: P0 removes real operational risk; P1 buys back the context headroom that makes everything else land.

Provenance

Independent reviews by Claude (firsthand session friction), Grok (xAI), and Gemini (Google) on 2026-06-08 converged on the same P0/P1/P2/P3 priorities. Full per-model input + consensus/dissent in review-3way.md.