spec: ClaudeTools harness optimization (3-way reviewed)
Optimize the harness (not projects) for accuracy/completeness with context pressure as a first-class constraint; token efficiency secondary. Authored as a Claude+Grok+ Gemini review (see review-3way.md): P0 reliability footguns (submodule-safe sync, serialized/staged wiki synthesis, syncro SSOT, warn-only guard), P1 context diet (one-line registry descriptions, CLAUDE CORE/EXTENDED, thin save/sync), P2 delegation re-tune, P3 knowledge tiering. Adds harness VERSION marker + OOB recovery as rollout safety. Python port split to a separate future spec. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
161
specs/claudetools-harness-optimization/plan.md
Normal file
161
specs/claudetools-harness-optimization/plan.md
Normal file
@@ -0,0 +1,161 @@
|
||||
# ClaudeTools Harness Optimization — Implementation Plan (3-way reviewed)
|
||||
|
||||
> Spec created: 2026-06-08 | Status: not started | Priority: P1
|
||||
> Revised per `review-3way.md` (Claude + Grok + Gemini). Fleet-wide changes: every task
|
||||
> must be reversible, version-marked, and rollout-announced via coord.
|
||||
|
||||
## Task 0: Commit this spec
|
||||
Commit `specs/claudetools-harness-optimization/` (incl. `review-3way.md`).
|
||||
|
||||
## Task 0.5: Harness VERSION marker [safety prerequisite — Grok]
|
||||
Files: `.claude/harness/VERSION` (new), CLAUDE.md CORE reference.
|
||||
- Single-line version string, bumped on every fleet-visible behavioral change (each P0
|
||||
task). Sessions read it to detect partial rollout and give clear errors ("your save is
|
||||
on the pre-Task-2 wiki path"). Enables conditional behavior during heterogeneous rollout.
|
||||
|
||||
## Task 0.6: Out-of-band fleet recovery [safety prerequisite — Gemini]
|
||||
Files: `.claude/scripts/force-pull-raw.sh` (new), distributed BEFORE any P0 change.
|
||||
- A raw, un-intercepted `git fetch + reset --hard origin/main` rescue path so a node can
|
||||
recover if a P0 change bricks `sync.sh`/`save.sh` (the normal update vector). No hooks,
|
||||
no guards, minimal deps. Broadcast its existence to the fleet first.
|
||||
|
||||
---
|
||||
|
||||
## P0 — Reliability footguns
|
||||
|
||||
## Task 1+4 (co-developed on one branch — both touch staging)
|
||||
|
||||
### Task 1: Submodule-safe sync [P0]
|
||||
Files: `.claude/scripts/sync.sh`.
|
||||
- Replace blanket `git add -A` with submodule-aware staging: stage changes, then unstage
|
||||
any submodule gitlink unless `--with-submodules` is passed
|
||||
(`git reset -q HEAD projects/msp-tools/guru-rmm projects/msp-tools/guru-connect`).
|
||||
Operator must NEVER need to detach the submodule to its pin before `/save`.
|
||||
- Verify: submodule on `main` (ahead of pin) -> `/sync` commits log/wiki/config but NOT
|
||||
a gitlink bump; clean status after.
|
||||
|
||||
### Task 4: Pre-commit guard — WARN-ONLY first [P0, highest risk]
|
||||
Files: `.claude/scripts/harness-guard.sh` (new, standalone), later wired into `sync.sh`.
|
||||
- Detects: conflict markers in tracked files; unflagged submodule gitlink change;
|
||||
vault-plaintext / obvious-secret patterns.
|
||||
- **Mode = WARN-ONLY** for the first 7-14 days fleet-wide (logs to a local file, never
|
||||
blocks). Explicit `SKIP_HARNESS_GUARD=1` bypass that is logged + coord-broadcast when
|
||||
used. Test matrix: synthetic bad inputs (incl. legitimate `<<<<<<<` in test files, new
|
||||
SOPS edge cases) + the last N real commits, zero false positives, BEFORE promoting to
|
||||
fatal. Promote to blocking only after a clean warn window.
|
||||
- Verify: warn fires on a planted conflict marker; clean commits pass; bypass logs.
|
||||
|
||||
## Task 2: Decouple + stage wiki synthesis [P0]
|
||||
Files: `.claude/commands/save.md` (drop inline recompile), `.claude/commands/wiki-compile.md`,
|
||||
new staged runner.
|
||||
- Sequence (do NOT excise the old path until the new one is proven):
|
||||
1. Build the serialized synthesis path: gated by a coord lock (`lock claim claudetools
|
||||
wiki/<slug>`) with **orphan-lock eviction** (TTL + dead-session reclaim) and a
|
||||
coord-down fallback (skip + warn, never block the save).
|
||||
2. **Staging pattern** (Gemini): the job writes proposed article updates to
|
||||
`.claude/wiki_staging/<slug>.md` + emits a notification; an operator or a focused
|
||||
validation agent reviews the diff and applies it — NO blind background auto-merge
|
||||
into the live article (avoids progressive structural corruption). Preserve
|
||||
Patterns/History; update `last_verified` + `source_logs`.
|
||||
3. Prove it under concurrent saves from 2 machines (no conflict, one clean update).
|
||||
4. ONLY THEN remove the inline Sonnet recompile from `/save` (save = write log + push).
|
||||
- Verify: two near-simultaneous `/save`s -> no wiki conflict markers; staged diff
|
||||
reviewed + applied cleanly.
|
||||
|
||||
## Task 3a: Confirm billing SSOT truth [P0 prerequisite — Grok]
|
||||
- Hard prerequisite before Task 3 edits: confirm with Mike which is operational truth —
|
||||
`/syncro` command's `add_line_item` (current practice) vs the `time-entry-protocol`
|
||||
standard's `timer_entry`. Picking wrong bakes a permanent contradictory money rule.
|
||||
|
||||
## Task 3: Single source of truth — Syncro billing [P0]
|
||||
Files: `.claude/commands/syncro.md`, `.claude/standards/syncro/time-entry-protocol.md`.
|
||||
- Per Task 3a's decision, make one the source and the other a reference; delete the
|
||||
contradictory/duplicated prose. Add a "command-restates-standard" lint to `/self-check`.
|
||||
- Verify: one rule, two consistent references.
|
||||
|
||||
---
|
||||
|
||||
## P1 — Context diet
|
||||
|
||||
## Task 5a: "Where is text injected" inventory [P1 prerequisite — Grok]
|
||||
- One-off pass cataloguing every repeated injection: CLAUDE.md, the skill/command
|
||||
registry, system-reminders, MEMORY.md, CONTEXT.md loads, hook-injected prose, big
|
||||
command bodies. Measure tokens each. This scopes Tasks 5/6/7 so the diet does not
|
||||
under-deliver. One-off measurement (Grok: skip ongoing telemetry).
|
||||
|
||||
## Task 5: One-line registry descriptions [P1] (biggest token win)
|
||||
Files: every `.claude/skills/*/SKILL.md` `description:`; every `.claude/commands/*.md`
|
||||
description.
|
||||
- Truncate each registry description to ONE sentence; move verbose triggers/examples into
|
||||
the SKILL.md body (loads only on activation). Worst first: rmm-audit, gc-audit,
|
||||
packetdial, bitdefender, mailprotector, remediation-tool.
|
||||
- Verify: registry-list injection size cut >=50% (measured per Task 5a); skills still invoke.
|
||||
|
||||
## Task 6: CLAUDE.md -> CORE + EXTENDED [P1]
|
||||
Files: `.claude/CLAUDE.md` (lean CORE) + `.claude/CLAUDE_EXTENDED.md` (on-demand).
|
||||
- CORE (<=1.5k tokens): identity/multi-user, security/vault, model routing, delegation
|
||||
thresholds (Task 9), coord-message-in-system-reminder rule, harness VERSION ref.
|
||||
Everything else -> EXTENDED, loaded when relevant. Add a discoverability pointer so
|
||||
on-demand detail is findable (Grok).
|
||||
- Verify: CORE <=1.5k tokens; nothing safety-critical lost.
|
||||
|
||||
## Task 7: Thin `/save` and `/sync` [P1]
|
||||
Files: `.claude/commands/save.md`, `.claude/commands/sync.md`.
|
||||
- The command file becomes "run the script, report the summary"; the procedure lives in
|
||||
`sync.sh`/`save.sh`. **Keep any LLM/narrative step OUT of the critical mechanical path**
|
||||
(Grok) — log narrative generation stays optional/separate, never a new failure domain
|
||||
in the hot save/sync loop.
|
||||
- Verify: command-body token size cut substantially; behavior unchanged; save/sync never
|
||||
blocked by an LLM call.
|
||||
|
||||
## Task 8: Shard big command bodies [P2 — DEFERRED]
|
||||
- Deferred past P1 (Grok): one-line descriptions + CLAUDE split + thin wrappers deliver
|
||||
most of the win. Revisit only with a real lazy-load gate (Gemini: otherwise models
|
||||
glob-load the appendices anyway). Not in the P0/P1 cut.
|
||||
|
||||
---
|
||||
|
||||
## P2 — Coordinator model
|
||||
|
||||
## Task 9: Re-tune delegation defaults [P2, gated on harness VERSION]
|
||||
Files: CLAUDE.md CORE.
|
||||
- "Act directly by default; delegate only for (a) high-volume tool output, (b) blast
|
||||
radius >3 files across layers, (c) genuine domain shift, (d) parallelizable independent
|
||||
work." Keep single-agent-for-coupled-tasks. Behavioral-contract change -> gate on
|
||||
fleet VERSION so behavior is consistent once all machines sync.
|
||||
|
||||
---
|
||||
|
||||
## P3 — Knowledge architecture
|
||||
|
||||
## Task 10: Tier the knowledge layers [P3]
|
||||
Files: CLAUDE.md (recall order), `session-logs/YYYY-MM/` reorg, `memory-dream` schedule,
|
||||
wiki frontmatter.
|
||||
- Recall order: scoped log grep (date-foldered) -> wiki (CONTEXT.md/article) -> memory.
|
||||
Reorganize logs into `session-logs/YYYY-MM/` and recall via scoped grep — **no
|
||||
monolithic INDEX** (Gemini: it recreates bloat).
|
||||
- Memory: `memory-dream --apply-safe` on a schedule; `/graduate` finished-project memory
|
||||
into wiki/standards then delete. (No arbitrary file-count target — Grok.)
|
||||
- Wiki: `last_verified` + `source_logs`; merge via the Task 2 staging pattern.
|
||||
- Verify: a recovery query hits the right layer first; no memory restates a wiki article.
|
||||
|
||||
---
|
||||
|
||||
## Deferred to a SEPARATE future spec (not this one)
|
||||
- **Python port of the core harness** (sync/save/hooks). This spec lands only the
|
||||
deterministic Bash fixes that are cheap + high-value: `jq -n --arg` for ALL shell JSON
|
||||
(fixes `post-bot-alert.sh`), a `now-phoenix` timestamp helper (PowerShell/Python, never
|
||||
`TZ=America/Phoenix date`), and the skills-deploy parity (already shipped 2026-06-08).
|
||||
A full port is its own spec (Gemini).
|
||||
|
||||
---
|
||||
|
||||
## Task 12: Verification + fleet rollout (consolidated)
|
||||
- Single P0/P1 rollout checklist (replaces per-task verify repetition): `/self-check`
|
||||
green; real `/save` dry-run on a scratch branch; bump `VERSION`; coord-broadcast; other
|
||||
machines pull + re-sync.
|
||||
- Fleet-level success criteria: 2 consecutive days across ALL machines with zero conflict
|
||||
markers and zero submodule gitlink accidents; registry-injection tokens down >=50%;
|
||||
CLAUDE CORE <=1.5k.
|
||||
- Add harness smoke tests to `/self-check`: sync/save dry-run, skills-deploy dir
|
||||
non-empty, registry-size budget, guard self-test.
|
||||
56
specs/claudetools-harness-optimization/references.md
Normal file
56
specs/claudetools-harness-optimization/references.md
Normal file
@@ -0,0 +1,56 @@
|
||||
# ClaudeTools Harness Optimization — Code References
|
||||
|
||||
Harness files (the ClaudeTools repo `.claude/` + root), with their role and current cost.
|
||||
|
||||
## Always / repeatedly in context (the diet targets)
|
||||
- `.claude/CLAUDE.md` — 351 lines / ~4.9k tokens. Injected every session; observed
|
||||
re-injected mid-session as a full system-reminder. -> Task 6 (CORE/EXTENDED split).
|
||||
- Skill/command REGISTRY — each `.claude/skills/*/SKILL.md` `description:` frontmatter +
|
||||
each `.claude/commands/*.md` description. The full list injects repeatedly (~4x in one
|
||||
session). Paragraph-length offenders: rmm-audit (669 ln), gc-audit (669), b2 (261),
|
||||
bitdefender, packetdial, mailprotector. -> Task 5 (one-line descriptions).
|
||||
- `.claude/memory/MEMORY.md` — 114 lines, auto-loaded each session. -> Task 10.
|
||||
|
||||
## Injected on invoke (command bodies)
|
||||
- `.claude/commands/syncro.md` — 1139 lines. -> Task 8 (shard) + Task 3 (SSOT vs standard).
|
||||
- `.claude/commands/rmm.md` — 783 lines. -> Task 8.
|
||||
- `.claude/commands/save.md`, `.claude/commands/sync.md` — multi-K bodies injected every
|
||||
use. -> Task 7 (thin wrappers).
|
||||
|
||||
## Workflow / write-safety
|
||||
- `.claude/scripts/sync.sh` — `git add -A` auto-commit + rebase + push + deploy
|
||||
commands/skills to global + vault sync. The `git add -A` + submodule = the gitlink
|
||||
footgun. Phase 5b (commands->global) + the new Phase 5c (skills->global, added
|
||||
2026-06-08). -> Task 1, Task 4, Task 11.
|
||||
- `/save` flow — writes a session log, then a Sonnet wiki recompile, then `sync.sh`. The
|
||||
inline recompile is the 3-way-conflict source. -> Task 2.
|
||||
- `.claude/commands/wiki-compile.md` (403 ln) — full re-synthesis; should merge + lock.
|
||||
-> Task 2.
|
||||
- The git submodule: `projects/msp-tools/guru-rmm` (and `guru-connect`). Parent pins a
|
||||
commit that intentionally lags `main`. -> Task 1.
|
||||
|
||||
## Standards / source-of-truth
|
||||
- `.claude/standards/` (19 files, ~12k tokens, on-demand) + `index.yml`. The
|
||||
`syncro/time-entry-protocol.md` vs `commands/syncro.md` contradiction. -> Task 3.
|
||||
|
||||
## Hooks
|
||||
- `.claude/settings.json` hooks: PreToolUse, SessionStart, UserPromptSubmit.
|
||||
`.claude/hooks/` (3 entries, incl. `block-backslash-winpath.sh`). -> Task 4
|
||||
(pre-commit guard), Task 12 (SessionStart smoke checks).
|
||||
|
||||
## Knowledge layers
|
||||
- `wiki/` — 57 articles, compiled from session logs (the synthesized layer).
|
||||
- `session-logs/` — 283 logs (write-only archive). -> Task 10 (INDEX).
|
||||
- `.claude/memory/` — 97 files + MEMORY.md. `memory-dream` skill lints/dedupes. -> Task 10.
|
||||
|
||||
## Cross-OS fragility (Task 11 targets)
|
||||
- `TZ=America/Phoenix date` silently returns UTC on Git Bash (mis-dates logs).
|
||||
- `.claude/scripts/post-bot-alert.sh` — 400s on em-dash/`$` (no JSON escaping).
|
||||
- `/tmp` resolves differently in the Write tool vs Git Bash (documented; heredocs used).
|
||||
- The skills-deploy gap (global `~/.claude/skills` was empty until Phase 5c).
|
||||
|
||||
## Existing tooling to lean on
|
||||
- `coord` skill / API — inter-session locks + broadcast (use for serialized wiki writes,
|
||||
fleet rollout announcements).
|
||||
- `self-check` skill — harness conformance; extend with smoke tests (Task 12).
|
||||
- `memory-dream` skill — memory dedupe/lint (Task 10).
|
||||
87
specs/claudetools-harness-optimization/review-3way.md
Normal file
87
specs/claudetools-harness-optimization/review-3way.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# ClaudeTools Harness Optimization — 3-Way Review Record
|
||||
|
||||
Independent reviews on 2026-06-08: Claude (firsthand session friction), Grok (xAI),
|
||||
Gemini (Google). The analysis converged on the same P0/P1/P2/P3 priorities; the spec
|
||||
draft was then put back to Grok and Gemini for a review pass. This file records each
|
||||
model's input, the consensus, the dissent + how it was reconciled, and the resulting
|
||||
revisions applied to `shape.md`/`plan.md`.
|
||||
|
||||
## Consensus (all three, analysis phase)
|
||||
- Two core weaknesses: distributed-write safety (`git add -A` + submodule + concurrent
|
||||
wiki recompiles) and context economics (repeated injection of the skill/command
|
||||
registry, CLAUDE.md, big command bodies).
|
||||
- P0 = reliability footguns (submodule-safe sync, decouple/serialize wiki, SSOT for
|
||||
command-vs-standard, guards). P1 = context diet (one-line descriptions, CLAUDE
|
||||
CORE/EXTENDED, thin save/sync). P2 = re-tune over-delegation. P3 = tier the knowledge
|
||||
layers. Accuracy first; never compress safety/billing/vault rules for tokens.
|
||||
|
||||
## Grok — spec review highlights
|
||||
- GAPS: no fleet rollout/escape-hatch under heterogeneity; no harness VERSION marker;
|
||||
coord-lock failure modes unanalyzed; incomplete "where is text injected" inventory;
|
||||
discoverability after the diet; no fleet-level success criteria.
|
||||
- SEQUENCING: Task 3 "confirm with Mike" must be a prerequisite, not inside the edit
|
||||
task. Task 2 — build the gated/serialized path and prove it under concurrent load
|
||||
BEFORE excising the inline recompile. Task 1+4 touch the same staging path -> same
|
||||
branch. Task 9 is a behavioral-contract change under heterogeneity.
|
||||
- HIGHEST RISK: Task 4 (pre-commit guard) can fail-closed and brick every machine's
|
||||
`/save`. De-risk: standalone script, WARN-ONLY for 7-14 days, `SKIP_HARNESS_GUARD=1`
|
||||
bypass (logged+broadcast), test matrix, wire to fatal only after a clean warn period.
|
||||
- CUT/SIMPLIFY: defer Task 8 (shard syncro/rmm) past P1; consolidate per-task verifies
|
||||
into one rollout checklist; drop ongoing token telemetry (one-off before/after is
|
||||
enough); drop the arbitrary 40-60 memory number.
|
||||
- ADD: Task 0.5 — `.claude/harness/VERSION` + CORE reference, bumped on every
|
||||
fleet-visible behavioral change, so sessions can detect partial rollout.
|
||||
|
||||
## Gemini — spec review highlights
|
||||
- GAPS: coord-lock lifecycle/orphan-eviction undefined; the lazy-load mechanism for
|
||||
sharded command files is undefined (models will glob-load them anyway).
|
||||
- SEQUENCING: put the guard (Task 4) in EARLY so you don't commit the very footguns you
|
||||
are fixing while developing Tasks 1-2.
|
||||
- HIGHEST RISK: Task 2 "wiki merge-not-rewrite" via automation -> progressive structural
|
||||
corruption / duplicated headers / hallucinated artifacts. De-risk: a STAGING pattern
|
||||
— background job writes proposed updates to `.claude/wiki_staging/` + notifies; an
|
||||
operator or focused validation agent reviews + applies the diff.
|
||||
- CUT/SIMPLIFY: cut Task 11 (Python port) from this spec -> a separate future spec; fix
|
||||
Bash determinism first. Simplify Task 10 — NO monolithic `session-logs/INDEX.md`
|
||||
(recreates the bloat); use date-folder structure (`session-logs/YYYY-MM/`) + scoped
|
||||
grep instead.
|
||||
- ADD: an out-of-band recovery script (`force-pull-raw.sh`) distributed BEFORE P0, so a
|
||||
node can rescue itself if a P0 change bricks `sync.sh`/`save.sh` (the update vector).
|
||||
|
||||
## Dissent + reconciliation
|
||||
1. **Pre-commit guard timing** — Grok: wire late, warn-only (it can brick saves).
|
||||
Gemini: put it in first (catch footguns during dev). RESOLVED: build it EARLY as a
|
||||
standalone script in **WARN-ONLY** mode (present during Task 1-2 dev, non-fatal),
|
||||
with a logged bypass; promote to fatal only after a clean warn period. Satisfies both.
|
||||
2. **Highest-risk task** — Grok: Task 4 (guard fail-closed). Gemini: Task 2 (wiki
|
||||
auto-merge corruption). RESOLVED: both are top risks; address both — guard is
|
||||
warn-only, and wiki uses the staging-review pattern (no blind auto-merge).
|
||||
3. **Session-log INDEX** — Claude proposed a monolithic INDEX; Gemini: that recreates
|
||||
bloat. RESOLVED: drop the monolithic index; reorganize logs into `YYYY-MM/` folders
|
||||
and recall via scoped grep.
|
||||
4. **Python port** — Grok accepted it as a P4 stretch; Gemini: cut to a separate spec.
|
||||
RESOLVED: this spec lands the deterministic Bash fixes (jq JSON, phoenix-date helper,
|
||||
skills-deploy parity) only; a full Python port is a SEPARATE future spec.
|
||||
|
||||
## Revisions applied to the spec (see plan.md)
|
||||
- NEW Task 0.5: harness `VERSION` marker (+ CORE reference) for partial-rollout detection.
|
||||
- NEW Task 0.6: OOB `force-pull-raw.sh` recovery script, distributed before any P0 change.
|
||||
- Task 3 split: Task 3a (confirm the billing SSOT truth with Mike) is a hard prerequisite.
|
||||
- Task 2 rewritten: staging pattern (`.claude/wiki_staging/` + review/apply) + prove
|
||||
under concurrent load + coord-lock lifecycle (orphan eviction, coord-down fallback)
|
||||
BEFORE removing the inline recompile.
|
||||
- Task 4 rewritten: standalone guard, WARN-ONLY first, bypass flag, test matrix; built
|
||||
early, co-developed with Task 1 on one branch, promoted to fatal only after a clean
|
||||
warn window.
|
||||
- Task 7 constraint: keep any LLM/narrative step OUT of the critical mechanical save/sync
|
||||
path (no new failure domain in the hot loop).
|
||||
- Task 8 (shard syncro/rmm) demoted to P2/deferred; needs a real lazy-load gate to be
|
||||
worth it.
|
||||
- Task 9 gated on fleet harness VERSION (behavioral-contract change).
|
||||
- Task 10: date-folder logs + scoped grep (no monolithic index); drop the 40-60 number;
|
||||
keep graduate + memory-dream.
|
||||
- Task 11 (Python port) moved OUT of this spec to a future one; deterministic Bash fixes
|
||||
stay (jq, phoenix-date, skills-deploy).
|
||||
- Verification consolidated into one P0/P1 rollout checklist with fleet-level success
|
||||
criteria (e.g., 2 consecutive days, zero conflict markers / submodule accidents).
|
||||
- Added a "where is text injected" inventory pass as a prerequisite to the diet (P1).
|
||||
98
specs/claudetools-harness-optimization/shape.md
Normal file
98
specs/claudetools-harness-optimization/shape.md
Normal file
@@ -0,0 +1,98 @@
|
||||
# ClaudeTools Harness Optimization — Shape & Constraints
|
||||
|
||||
> Spec created: 2026-06-08. Authored as a 3-way review (Claude + Grok + Gemini); see
|
||||
> `review-3way.md` for each model's independent input and the points of consensus.
|
||||
|
||||
## What this is
|
||||
|
||||
Optimize the **ClaudeTools harness itself** (not its projects) for day-to-day MSP ops.
|
||||
Primary goal: **accuracy + completeness**, with **context pressure** as a first-class
|
||||
constraint. Secondary: **token efficiency**. The harness is strong on capability
|
||||
surface (skills, coord, wiki, vault) but weak on two axes the three reviewers
|
||||
independently agreed on:
|
||||
|
||||
- **Distributed-write safety** — `git add -A` + a git submodule + concurrent multi-machine
|
||||
wiki recompiles produce footguns (stale gitlinks, committed conflict markers).
|
||||
- **Context economics** — large static prose (the skill/command registry with
|
||||
paragraph descriptions, CLAUDE.md ~4.9k tokens, multi-K command bodies) is injected
|
||||
repeatedly, crowding attention and inflating tokens.
|
||||
|
||||
## Scope (prioritized)
|
||||
|
||||
**P0 — Reliability footguns (do first):**
|
||||
1. Submodule-safe `sync.sh` (eliminate the manual detach-to-pin/restore dance).
|
||||
2. Decouple wiki recompile from `/save`; serialize synthesis (one machine / lock) and
|
||||
merge-not-rewrite.
|
||||
3. Single source of truth for command-vs-standard (the `/syncro` add_line_item vs the
|
||||
timer_entry standard contradiction on a billing path).
|
||||
4. Pre-commit guard: block conflict markers, unflagged submodule gitlink bumps, and
|
||||
vault-plaintext patterns.
|
||||
|
||||
**P1 — Context diet:**
|
||||
1. Trim every skill/command registry description to one line; move detail into the
|
||||
`SKILL.md` body (loads only on activation).
|
||||
2. CLAUDE.md -> lean CORE (identity, security/vault, routing, delegation thresholds;
|
||||
target <=1.5k tokens) + on-demand EXTENDED.
|
||||
3. Make `/save` and `/sync` thin wrappers around deterministic scripts that return a
|
||||
short summary, not multi-K instruction bodies injected on every invoke.
|
||||
4. Shard the largest command bodies (`syncro` 1139 lines, `rmm` 783) into a core flow
|
||||
+ lazy reference sections.
|
||||
|
||||
**P2 — Coordinator/sub-agent model:** re-tune from "delegate almost everything;
|
||||
code/db = ALWAYS delegate" to "act directly by default; delegate only for high-volume
|
||||
output, broad blast radius (>3 files/layers), genuine domain shift, or parallel work."
|
||||
|
||||
**P3 — Knowledge architecture:** tier logs (write-only archive) / wiki (synthesized,
|
||||
dated, cited, on-demand) / memory (small ephemeral + quirks, ~40-60 files). Recall
|
||||
order log-index -> wiki -> memory; add a `session-logs/INDEX`; graduate finished-project
|
||||
memories into wiki/standards.
|
||||
|
||||
**Deterministic Bash fixes (in this spec):** `jq -n --arg` for ALL shell JSON (fixes the
|
||||
`post-bot-alert.sh` 400), a `now-phoenix` timestamp helper (never `TZ=America/Phoenix
|
||||
date` on Git Bash), skills-deploy parity (shipped 2026-06-08).
|
||||
|
||||
**P4 — Code-over-prompt Python port (MOVED OUT — separate future spec):** porting the
|
||||
core harness (sync/save/hooks) to one cross-platform language is its own spec per the
|
||||
3-way review; this spec lands the cheap deterministic Bash fixes above, not a rewrite.
|
||||
|
||||
## What this is NOT (out of scope)
|
||||
|
||||
- NOT touching any project (GuruRMM, GuruConnect, Dataforth, client work) — harness only.
|
||||
- NOT changing the SOPS vault, the coordination API, or Gitea hosting.
|
||||
- NOT removing any capability — skills/commands stay; only their CONTEXT footprint and
|
||||
reliability change.
|
||||
- P4 (Python port) is a stretch goal; v1 may land the deterministic fixes in Bash and
|
||||
port incrementally. Do not block P0-P3 on a full rewrite.
|
||||
- NOT a redesign of the multi-user identity or work-mode model.
|
||||
|
||||
## Hard constraints
|
||||
|
||||
- Changes are FLEET-WIDE (every machine syncs them) — each change must be safe to roll
|
||||
out via the existing sync, reversible, and announced via coord broadcast.
|
||||
- Must not break the existing `/save` `/sync` contracts mid-flight (other sessions rely
|
||||
on them); change behavior behind the same entry points.
|
||||
- No secrets in context or logs; keep the SOPS-vault discipline.
|
||||
- NO EMOJIS; ASCII markers.
|
||||
|
||||
## Key decisions
|
||||
|
||||
- **Accuracy first, tokens second** (per Mike): never compress safety rules, billing
|
||||
SSOT, or vault discipline to save tokens. Cut redundancy and lazy-load reference, not
|
||||
guardrails.
|
||||
- **Determinism over prompt** for mechanical workflows: anything needing >500 tokens of
|
||||
English to run correctly should be a script the model calls.
|
||||
- **Single source of truth**: a fact lives in exactly one layer (CORE rule, standard,
|
||||
command, wiki, or memory) and is referenced, not duplicated.
|
||||
- **Serialize distributed writes**: wiki synthesis and any shared-file regeneration run
|
||||
behind a coord lock or on one designated node.
|
||||
|
||||
## Priority
|
||||
|
||||
P1 (this is the operator's daily experience). Reviewer consensus: P0 removes real
|
||||
operational risk; P1 buys back the context headroom that makes everything else land.
|
||||
|
||||
## Provenance
|
||||
|
||||
Independent reviews by Claude (firsthand session friction), Grok (xAI), and Gemini
|
||||
(Google) on 2026-06-08 converged on the same P0/P1/P2/P3 priorities. Full per-model
|
||||
input + consensus/dissent in `review-3way.md`.
|
||||
Reference in New Issue
Block a user