sync: auto-sync from GURU-5070 at 2026-06-05 20:02:53
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-05 20:02:53
This commit is contained in:
@@ -0,0 +1,83 @@
|
||||
# 2026-06-05 — GuruConnect three-way review + multi-session git isolation
|
||||
|
||||
## User
|
||||
- **User:** Mike Swanson (mike)
|
||||
- **Machine:** GURU-5070
|
||||
- **Role:** admin
|
||||
|
||||
## Session Summary
|
||||
|
||||
Two major threads. First, ran an independent **three-way code review of the GuruConnect (GC) project** (Claude synthesis + Gemini 3.1 Pro + Grok 4.3). Started by answering whether GC's in-product audit feature is functional (verdict: the audit-event write path works and is viewable per-machine in the dashboard, but there's no global/searchable audit view and a few event types are stubbed). Then partitioned the GC codebase into three subsystems — server (42 Rust files), agent (34), dashboard (66 TS/TSX) — and ran Gemini and Grok over each via their `review-files` modes, six reviews in parallel to report files under `reports/review-2026-06-05/`. Synthesized everything into `SYNTHESIS-three-way.md` with Claude cross-checks, then a phased `REMEDIATION-PLAN.md`.
|
||||
|
||||
While kicking off those reviews I hit (again) the path-resolution gotcha in the agy/grok skills — `review-files` resolves relative paths only against CWD or `$CLAUDETOOLS_ROOT`, never a submodule subdir, so submodule-relative paths fail "file not found." Mike asked to document it; added a `[!WARNING]` callout to both SKILL.md files, fixed the misleading "repo-relative" table wording, and added inline GOTCHA comments at every resolution site in both scripts (committed `ec411f4`).
|
||||
|
||||
The review's top convergent finding (CRITICAL on all three surfaces, both models independently): **secrets/tokens in WebSocket URL query strings** (agent creds + dashboard viewer token). Pushed all review artifacts to the GC repo (`ded99c5`), wrote the remediation plan (P0–P5), and operationalized it: 6 coord todos (`project_key=guruconnect`) + 13 Gitea issues (#10–#22, one per CRITICAL/HIGH) — the latter blocked briefly by a **Gitea outage** (502, port 3000 down) that self-resolved.
|
||||
|
||||
Second thread: Mike asked how to isolate 3–4 concurrent Claude sessions so session data is captured and submodule **pointers stop stepping on each other** (a real collision happened earlier: a concurrent session's commits blocked a clean submodule-pointer bump). Laid out a layered plan (L1 explicit staging/lock, L2 sibling repos + junctions, L3 per-session worktrees) and implemented **L1**: a per-machine concurrency lock in `sync.sh` (atomic mkdir in `.git/`, exit 75 on contention, stale/dead-PID reclaim with re-verify-before-clear), mandatory per-session-unique log filenames, then extended the lock to **all** commit paths by extracting it into a sourceable lib (`sync-lock.sh`) used by `/scc` and `/checkpoint`, and finally a **best-effort coord visibility signal** (`git_sync_<machine>` component, syncing→idle/degraded). Every script change went Coding Agent → Code Review (which caught two real TOCTOU races and a residual race before they shipped). All verified live on the closing `/sync`.
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- **GC review scoped by subsystem, models read files themselves.** Codebase (~1.2 MB) far exceeds CLI embed/argv limits, so used `review-files` (each model `read_file`s) over server/agent/dashboard rather than embedding diffs. Grok's server pass was split A/B because 42 files exceeded its read-file capacity (Cancelled + 256 KB embed-fallback cap < 548 KB server tree).
|
||||
- **The WS-query-string finding is partly by-design** — GC's `CLAUDE.md` mandates the viewer token in a query param. Both external models reject it, so the remediation's first step is changing that spec rule, not just code. Logged as remediation P1 / issue #10.
|
||||
- **Downgraded the bootstrap-secret finding from CRITICAL to HIGH** — GC server deploys on Linux, so the `0o600` chmod is effective; Grok's "Windows ACL no-op" framing doesn't apply to the deployed server. Residual risk is on-disk plaintext + the `info!` log fallback.
|
||||
- **Isolation L1 over literal "drop git add -A".** Mike approved "patch sync.sh/save to drop git add -A." Found that blind removal would regress untracked-file auto-capture (per `project_sync_script_bug` memory). Instead: serialize with a lock + per-session-unique filenames, keeping a now-serialized `add -A` as the catch-all. Full per-session commit isolation deferred to a tracked todo.
|
||||
- **Coord enforces nothing; the file lock does.** Coord is advisory, network-dependent, and softfail-offline, so it must not gate the git critical section. Used coord only for best-effort cross-machine *visibility*, never as a blocker — the sync never depends on coord being reachable.
|
||||
- **Lock keyed per-repo (`<repo>/.git/claudetools-sync.lock`)** so the shared lib mutually excludes with `sync.sh` in ClaudeTools and self-scopes in any project repo (lets `/checkpoint` work generically).
|
||||
- **`/scc` routed through `sync.sh`** rather than reimplementing rebase-safe push; accepted losing its custom `scc:` commit message in favour of one locked git path.
|
||||
- **Deferred the ClaudeTools submodule-pointer bump** for guru-connect — entangled with another live session's unpushed commits; left for auto-sync to reconcile rather than publishing another session's WIP.
|
||||
|
||||
## Problems Encountered
|
||||
|
||||
- **agy/grok `review-files` "file not found"** — file lists were submodule-relative but the driver ran from the reports dir. Fixed by regenerating lists with absolute paths; then documented the gotcha in both skills (the user's actual ask).
|
||||
- **Grok server review failed first pass** — 42 files Cancelled and the embed fallback (256 KB cap) skipped the 548 KB tree. Split the server list A/B (21+21); both then succeeded.
|
||||
- **Transient `ask-grok.sh` syntax error** during one background run — a running grok process parsed the script mid-Edit. Not a real bug (`bash -n` clean; comment-only edits).
|
||||
- **Gitea outage (502, `172.16.3.20:3000` down)** blocked issue creation. Agent correctly refused to fabricate issue numbers. Self-resolved minutes later (v1.25.2); re-ran and created #10–#22.
|
||||
- **Two real TOCTOU races in the sync lock**, caught by Code Review before shipping: (1) non-atomic `rm -rf`+`mkdir` stale recovery → fixed with atomic rename-aside; (2) residual window where a loser renames-away a winner's freshly-acquired live lock → fixed with re-verify-staleness-immediately-before-`mv` + a `sleep 1` busy-spin guard.
|
||||
- **SendMessage tool unavailable** to resume a paused background agent — spawned fresh agents instead.
|
||||
|
||||
## Configuration Changes
|
||||
|
||||
ClaudeTools repo (D:\claudetools):
|
||||
- `ec411f4` — `.claude/skills/agy/SKILL.md`, `.claude/skills/grok/SKILL.md`, `.claude/skills/agy/scripts/ask-gemini.sh`, `.claude/skills/grok/scripts/ask-grok.sh` (path-gotcha docs + inline comments)
|
||||
- `6b0ce9a` — `.claude/scripts/sync.sh` (concurrency lock), `.claude/commands/save.md`, `.claude/commands/sync.md`
|
||||
- `353ba63` — NEW `.claude/scripts/sync-lock.sh` (shared lock lib + `run` wrapper); `.claude/scripts/sync.sh` (sources it); `.claude/commands/scc.md`, `.claude/commands/checkpoint.md` (lock-protected)
|
||||
- `f174d1b` — `.claude/scripts/sync.sh` (best-effort coord `git_sync_<machine>` signal)
|
||||
|
||||
GuruConnect submodule (azcomputerguru/guru-connect):
|
||||
- `ded99c5` — `reports/review-2026-06-05/` (6 per-model reviews, SYNTHESIS-three-way.md, REMEDIATION-PLAN.md, file lists, run-review.sh driver)
|
||||
|
||||
Deferred (untouched): `projects/msp-tools/guru-connect` submodule pointer bump in the ClaudeTools parent.
|
||||
|
||||
## Credentials & Secrets
|
||||
|
||||
None created or rotated this session. Gitea token used by the Gitea Agent lives at vault `services/gitea.sops.yaml` — note the field path is `credentials.api.api-token` (the bare `token` field returns null); the SSH URI on port 2222 is also in that entry.
|
||||
|
||||
## Infrastructure & Servers
|
||||
|
||||
- Coord API: `http://172.16.3.30:8001/api/coord` (up; DB softfail 503-aware)
|
||||
- Gitea: internal `http://172.16.3.20:3000` (preferred on-network), public `https://git.azcomputerguru.com` (NPM/openresty front). Had a ~minutes 502 outage this session; recovered to v1.25.2.
|
||||
- GC deploy target (per GC CLAUDE.md): server on Linux/Ubuntu 22.04 @ 172.16.3.30:3002 → connect.azcomputerguru.com; agent = Windows.
|
||||
|
||||
## Commands & Outputs
|
||||
|
||||
- Per-subsystem review driver: `reports/review-2026-06-05/run-review.sh <gemini|grok> <subsystem> <abs-file-list> <out.md>` (uses `ask-{gemini,grok}.sh review-files -i "<instr>" <files>`).
|
||||
- File lists MUST be absolute for `review-files`: `find "$(pwd)/server/src" -name '*.rs'` from inside the submodule.
|
||||
- Coord component signal (validated, HTTP 200): `PUT /api/coord/components/claudetools/git_sync_<MACHINE>` with `{state,version,notes,updated_by}`.
|
||||
- Coord todos created via `urllib` POST to `/api/coord/todos` (fields: text, project_key, auto_created, source_context, created_by_user, created_by_machine).
|
||||
- Closing `/sync` confirmed: lock acquired/released, submodule pointer skipped (not committed), coord signal `git_sync_GURU-5070 = idle ... (Mike Swanson)`.
|
||||
|
||||
## Pending / Incomplete Tasks
|
||||
|
||||
- **GuruConnect remediation** — coord todos P0–P5 (`project_key=guruconnect`); Gitea issues #10–#22. Suggested: ship P0 (issues #11, #12, #15, #16, #17, #19 — small, no protocol changes) on a `feat/spec-019` branch; kick off P1/#10 (WS auth handshake) design review in parallel (longest lead).
|
||||
- **Multi-session git isolation follow-ups** — coord todo `e84ac521` (`project_key=claudetools`): (1) drop blind `git add -A` for explicit per-session staging (true per-session commit isolation), (2) L2 sibling repos + gitignored junctions to remove submodule pointers, (3) L3 per-session worktrees.
|
||||
- **ClaudeTools submodule-pointer bump** for guru-connect (`→ ded99c5`) still deferred; auto-sync will reconcile, or push manually once the other session's stack lands.
|
||||
- **Rollout note:** each machine gets the locked sync only from its *next* sync after pulling these commits (the pulling sync itself runs unlocked).
|
||||
- Left a `git_sync_GURU-5070` test note earlier; overwritten by the closing real sync.
|
||||
|
||||
## Reference Information
|
||||
|
||||
- Review artifacts: `projects/msp-tools/guru-connect/reports/review-2026-06-05/` (`SYNTHESIS-three-way.md`, `REMEDIATION-PLAN.md`, `gemini-{server,agent,dashboard}.md`, `grok-server-a.md`, `grok-server-b.md`, `grok-agent.md`, `grok-dashboard.md`)
|
||||
- Gitea issues: https://git.azcomputerguru.com/azcomputerguru/guru-connect/issues/10 … /22 (C1=#10 … H8=#22)
|
||||
- Coord todos: guruconnect P0 `5817feff`, P1 `1735bbbb`, P2 `d5d7e50d`, P3 `929de40b`, P4 `fe6b8166`, P5 `c96b71ab`; claudetools isolation `e84ac521`
|
||||
- Commits — ClaudeTools: `ec411f4`, `6b0ce9a`, `353ba63`, `f174d1b`. guru-connect: `ded99c5`.
|
||||
- Lock files: `.claude/scripts/sync-lock.sh` (lib + `bash sync-lock.sh run <cmd>` wrapper), lock dir `<repo>/.git/claudetools-sync.lock`, env knobs `SYNC_LOCK_WAIT` (120s), `SYNC_LOCK_STALE` (600s); contention exit code 75 (EX_TEMPFAIL).
|
||||
Reference in New Issue
Block a user