spec: ClaudeTools harness optimization (3-way reviewed)
Optimize the harness (not projects) for accuracy/completeness with context pressure as a first-class constraint; token efficiency secondary. Authored as a Claude+Grok+ Gemini review (see review-3way.md): P0 reliability footguns (submodule-safe sync, serialized/staged wiki synthesis, syncro SSOT, warn-only guard), P1 context diet (one-line registry descriptions, CLAUDE CORE/EXTENDED, thin save/sync), P2 delegation re-tune, P3 knowledge tiering. Adds harness VERSION marker + OOB recovery as rollout safety. Python port split to a separate future spec. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
98
specs/claudetools-harness-optimization/shape.md
Normal file
98
specs/claudetools-harness-optimization/shape.md
Normal file
@@ -0,0 +1,98 @@
|
||||
# ClaudeTools Harness Optimization — Shape & Constraints
|
||||
|
||||
> Spec created: 2026-06-08. Authored as a 3-way review (Claude + Grok + Gemini); see
|
||||
> `review-3way.md` for each model's independent input and the points of consensus.
|
||||
|
||||
## What this is
|
||||
|
||||
Optimize the **ClaudeTools harness itself** (not its projects) for day-to-day MSP ops.
|
||||
Primary goal: **accuracy + completeness**, with **context pressure** as a first-class
|
||||
constraint. Secondary: **token efficiency**. The harness is strong on capability
|
||||
surface (skills, coord, wiki, vault) but weak on two axes the three reviewers
|
||||
independently agreed on:
|
||||
|
||||
- **Distributed-write safety** — `git add -A` + a git submodule + concurrent multi-machine
|
||||
wiki recompiles produce footguns (stale gitlinks, committed conflict markers).
|
||||
- **Context economics** — large static prose (the skill/command registry with
|
||||
paragraph descriptions, CLAUDE.md ~4.9k tokens, multi-K command bodies) is injected
|
||||
repeatedly, crowding attention and inflating tokens.
|
||||
|
||||
## Scope (prioritized)
|
||||
|
||||
**P0 — Reliability footguns (do first):**
|
||||
1. Submodule-safe `sync.sh` (eliminate the manual detach-to-pin/restore dance).
|
||||
2. Decouple wiki recompile from `/save`; serialize synthesis (one machine / lock) and
|
||||
merge-not-rewrite.
|
||||
3. Single source of truth for command-vs-standard (the `/syncro` add_line_item vs the
|
||||
timer_entry standard contradiction on a billing path).
|
||||
4. Pre-commit guard: block conflict markers, unflagged submodule gitlink bumps, and
|
||||
vault-plaintext patterns.
|
||||
|
||||
**P1 — Context diet:**
|
||||
1. Trim every skill/command registry description to one line; move detail into the
|
||||
`SKILL.md` body (loads only on activation).
|
||||
2. CLAUDE.md -> lean CORE (identity, security/vault, routing, delegation thresholds;
|
||||
target <=1.5k tokens) + on-demand EXTENDED.
|
||||
3. Make `/save` and `/sync` thin wrappers around deterministic scripts that return a
|
||||
short summary, not multi-K instruction bodies injected on every invoke.
|
||||
4. Shard the largest command bodies (`syncro` 1139 lines, `rmm` 783) into a core flow
|
||||
+ lazy reference sections.
|
||||
|
||||
**P2 — Coordinator/sub-agent model:** re-tune from "delegate almost everything;
|
||||
code/db = ALWAYS delegate" to "act directly by default; delegate only for high-volume
|
||||
output, broad blast radius (>3 files/layers), genuine domain shift, or parallel work."
|
||||
|
||||
**P3 — Knowledge architecture:** tier logs (write-only archive) / wiki (synthesized,
|
||||
dated, cited, on-demand) / memory (small ephemeral + quirks, ~40-60 files). Recall
|
||||
order log-index -> wiki -> memory; add a `session-logs/INDEX`; graduate finished-project
|
||||
memories into wiki/standards.
|
||||
|
||||
**Deterministic Bash fixes (in this spec):** `jq -n --arg` for ALL shell JSON (fixes the
|
||||
`post-bot-alert.sh` 400), a `now-phoenix` timestamp helper (never `TZ=America/Phoenix
|
||||
date` on Git Bash), skills-deploy parity (shipped 2026-06-08).
|
||||
|
||||
**P4 — Code-over-prompt Python port (MOVED OUT — separate future spec):** porting the
|
||||
core harness (sync/save/hooks) to one cross-platform language is its own spec per the
|
||||
3-way review; this spec lands the cheap deterministic Bash fixes above, not a rewrite.
|
||||
|
||||
## What this is NOT (out of scope)
|
||||
|
||||
- NOT touching any project (GuruRMM, GuruConnect, Dataforth, client work) — harness only.
|
||||
- NOT changing the SOPS vault, the coordination API, or Gitea hosting.
|
||||
- NOT removing any capability — skills/commands stay; only their CONTEXT footprint and
|
||||
reliability change.
|
||||
- P4 (Python port) is a stretch goal; v1 may land the deterministic fixes in Bash and
|
||||
port incrementally. Do not block P0-P3 on a full rewrite.
|
||||
- NOT a redesign of the multi-user identity or work-mode model.
|
||||
|
||||
## Hard constraints
|
||||
|
||||
- Changes are FLEET-WIDE (every machine syncs them) — each change must be safe to roll
|
||||
out via the existing sync, reversible, and announced via coord broadcast.
|
||||
- Must not break the existing `/save` `/sync` contracts mid-flight (other sessions rely
|
||||
on them); change behavior behind the same entry points.
|
||||
- No secrets in context or logs; keep the SOPS-vault discipline.
|
||||
- NO EMOJIS; ASCII markers.
|
||||
|
||||
## Key decisions
|
||||
|
||||
- **Accuracy first, tokens second** (per Mike): never compress safety rules, billing
|
||||
SSOT, or vault discipline to save tokens. Cut redundancy and lazy-load reference, not
|
||||
guardrails.
|
||||
- **Determinism over prompt** for mechanical workflows: anything needing >500 tokens of
|
||||
English to run correctly should be a script the model calls.
|
||||
- **Single source of truth**: a fact lives in exactly one layer (CORE rule, standard,
|
||||
command, wiki, or memory) and is referenced, not duplicated.
|
||||
- **Serialize distributed writes**: wiki synthesis and any shared-file regeneration run
|
||||
behind a coord lock or on one designated node.
|
||||
|
||||
## Priority
|
||||
|
||||
P1 (this is the operator's daily experience). Reviewer consensus: P0 removes real
|
||||
operational risk; P1 buys back the context headroom that makes everything else land.
|
||||
|
||||
## Provenance
|
||||
|
||||
Independent reviews by Claude (firsthand session friction), Grok (xAI), and Gemini
|
||||
(Google) on 2026-06-08 converged on the same P0/P1/P2/P3 priorities. Full per-model
|
||||
input + consensus/dissent in `review-3way.md`.
|
||||
Reference in New Issue
Block a user