sync: auto-sync from GURU-KALI at 2026-05-27 20:42:46
Author: Mike Swanson Machine: GURU-KALI Timestamp: 2026-05-27 20:42:46
This commit is contained in:
84
session-logs/2026-05-27-guru-kali-coord-hook-fix.md
Normal file
84
session-logs/2026-05-27-guru-kali-coord-hook-fix.md
Normal file
@@ -0,0 +1,84 @@
|
||||
# Session Log — Coord UserPromptSubmit hook fix (broadcasts + JSON robustness)
|
||||
|
||||
Distinct work follow-on to today's earlier `2026-05-27-guru-kali-identity-phase2-followups.md`. The hooks coord-todo (`c28d1baa`) plus the Mac-confirmed install-hooks status from earlier today are unchanged.
|
||||
|
||||
## User
|
||||
- **User:** Mike Swanson (mike)
|
||||
- **Machine:** GURU-KALI
|
||||
- **Role:** admin
|
||||
- **Session span:** 2026-05-27 ~17:14 MST through ~20:40 MST
|
||||
|
||||
## Session Summary
|
||||
|
||||
A run of routine syncs through the afternoon and evening pulled a busy day of fleet activity (30 commits, then 3, then 2) — `/mailbox` skill, Autotask, RMM Phase-2 deploy, `rmm-audit` Agent F, GuruScan module refactor, Quantum M365 onboarding, Syncro delivery-channel rule, and three vault entries. The Mac picked up my Phase-2 step-4 hand-off cleanly (`2c12bd2` wired sync.sh + syncro.md to read from identity.json, `0e2629a` cleaned CLAUDE.md). Each sync.sh rebase merged my attribution guards intact with the Mac's concurrent rewrites — verified after every round.
|
||||
|
||||
Mike flagged the coord UserPromptSubmit hook was broken: through several syncs neither he nor I had received any coord messages despite real fleet activity. Diagnosis: the hook (`.claude/scripts/check-messages.sh`) only queried `to_session=$SESSION` and `to_session=$USER_ALIAS`. It never asked for `to_session=ALL_SESSIONS`, which the fleet uses for broadcasts. Four real broadcasts were sitting unread (LHM/WinRing0 in-flight then done, SPEC-010 features+bugs, SPEC-011 ARP). The hook ran exit 0 every time and printed nothing — the gap was coverage, not error.
|
||||
|
||||
Implemented the fix in three rounds, each surfacing a new bug. First pass added an `ALL_SESSIONS` query and a per-machine seen-file (`.claude/coord-broadcasts-seen`, gitignored) so the hook never PUTs `/read` on broadcasts and never re-surfaces ones already shown. Testing showed silent failure: `jq` aborted with "Invalid string: control characters from U+0000 through U+001F must be escaped". The coord API emits some message bodies with raw unescaped control chars — invalid per RFC 8259. Added a `sanitize_json` helper that round-trips through `python3 json.loads(strict=False)` (which accepts them) and re-emits valid JSON.
|
||||
|
||||
Second round still silent: `bash echo "$json_var"` was interpreting backslash escapes inside the JSON (e.g. `\n` in body text becoming a literal newline) and corrupting it before jq saw it. Replaced every `echo "$VAR" | jq` with `printf '%s' "$VAR" | jq` throughout — the original script had the same latent bug, untriggered only because personal messages were usually empty. Third round still empty seen-file: the jq filter `($seen | index(.id))` evaluated `.id` against `$seen` (the array) not against the current message — classic jq scoping. Fixed by binding the message: `map(. as $m | select(($seen | index($m.id)) == null) ...)`.
|
||||
|
||||
Also fixed an interim bug introduced by my own refactor: `${result:-{}}` as a bash default — the first `}` closes the parameter expansion early, producing `{...}}` and breaking jq. Replaced with an explicit `[ -z "$result_safe" ] && result_safe='{"messages":[]}'` guard. End-to-end test: run 1 surfaces 4 broadcasts and writes 4 UUIDs to the seen-file; run 2 is silent; server still shows 4 unread (other machines unaffected). Code Review Agent reviewed the whole change and returned APPROVED (no defects, only cosmetic observations). Shipped as `a35b583`.
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- Per-machine local seen-tracking for broadcasts, NOT a server-side `read_at` flip — the schema has one read_at field, so PUT /read on a broadcast would clobber it for every other recipient that hasn't seen it.
|
||||
- Sanitize JSON client-side via python `strict=False` rather than fixing the server. The API bug (unescaped control chars in bodies) is a separate concern; the hook needs to be robust regardless of server quality.
|
||||
- Convert ALL `echo "$JSON" | jq` to `printf '%s'` fleet-wide, not just the broadcast paths. The bug was latent in the original code; fixing it everywhere prevents future surprise corruption on any message with backslash escapes.
|
||||
- Mandatory code review (CLAUDE.md rule) before syncing the hook to the fleet — this script runs on every prompt on every machine, so a regression breaks coord-message surfacing globally. Review came back APPROVED.
|
||||
|
||||
## Problems Encountered
|
||||
|
||||
- **Hook silent despite real unread:** the hook only queried personal + alias, never ALL_SESSIONS. Fixed by adding the third query.
|
||||
- **jq parse error on broadcast bodies:** server returned message JSON with unescaped U+0001 control chars (and similar). Fixed with the `sanitize_json` python round-trip helper.
|
||||
- **echo backslash escape corruption:** bash `echo "$json"` expanded `\n`/`\"` inside JSON string values, breaking the content before jq could parse it. Fixed by switching every JSON pipe to `printf '%s'`. Latent in the original script.
|
||||
- **jq scoping bug in filter:** `($seen | index(.id))` evaluated `.id` against `$seen` not the message. Fixed by binding the message to `$m` and using `$m.id` / `$m.from_session`.
|
||||
- **Bash brace-default trap:** `${result:-{}}` parses as `${result:-{}` + `}` → extra brace → jq error. Replaced with an explicit guard.
|
||||
- **Misleading test runs:** intermediate tests showed empty output and empty seen-file. A trace via `awk`-modified script copy in `/tmp` made SCRIPT_DIR resolve to `/`, giving a misleading `SEEN_FILE=//coord-broadcasts-seen`. Clean re-test in the real script path confirmed the actual fix worked end-to-end.
|
||||
|
||||
## Configuration Changes
|
||||
|
||||
- `.claude/scripts/check-messages.sh` — added `sanitize_json` helper; added `ALL_SESSIONS` query + per-machine seen-file logic; converted all `echo "$JSON" | jq` to `printf '%s' "$JSON" | jq` (12 sites incl. the locks block); fixed jq scoping with `. as $m` binding; replaced `${result:-{}}` brace-trap with explicit guard.
|
||||
- `.gitignore` — added `.claude/coord-broadcasts-seen` (per-machine local seen-tracking).
|
||||
- `.claude/memory/project_mac_gururmm_setup_pending.md` — caveat updated earlier today to `[CONFIRMED PENDING 2026-05-27]` based on Mac coord reply.
|
||||
- `projects/msp-tools/guru-rmm` — submodule pointer auto-advanced by Phase 1a to `6326ec6`.
|
||||
|
||||
## Credentials & Secrets
|
||||
|
||||
None created or discovered.
|
||||
|
||||
## Infrastructure & Servers
|
||||
|
||||
- Coord API `http://172.16.3.30:8001/api/coord/messages` — broadcasts use `to_session=ALL_SESSIONS` (magic recipient string). Single server-side `read_at` field per message (no per-recipient tracking).
|
||||
- 4 unread `ALL_SESSIONS` broadcasts at fix-time: `e032f029` (LHM v0.6.46 done), `620af7f5` (LHM in flight), `7bdc6d3c` (SPEC-011 ARP), `3fe667e1` (SPEC-010 UX).
|
||||
- Hook config in `.claude/settings.json` → `UserPromptSubmit` → `bash .claude/scripts/check-messages.sh` (15s timeout). Unchanged.
|
||||
- Per-machine seen-file: `.claude/coord-broadcasts-seen` (gitignored, append-only UUIDs).
|
||||
|
||||
## Commands & Outputs
|
||||
|
||||
- Hook test pattern:
|
||||
```bash
|
||||
rm -f .claude/coord-broadcasts-seen
|
||||
bash .claude/scripts/check-messages.sh # surfaces broadcasts, writes seen-file
|
||||
bash .claude/scripts/check-messages.sh # silent (broadcasts in seen-file)
|
||||
curl -s "http://172.16.3.30:8001/api/coord/messages?to_session=ALL_SESSIONS&unread_only=true"
|
||||
# → still 4 unread on server: other machines unaffected
|
||||
```
|
||||
- Reading API msg shape (no auth): `curl -s "http://172.16.3.30:8001/api/coord/messages?to_session=ALL_SESSIONS&unread_only=true&limit=100"` → returns `{total, skip, limit, messages:[...]}`. Bodies may contain raw control chars (RFC violation; client-side sanitize required).
|
||||
- jq scoping form: `map(. as $m | select(($seen | index($m.id)) == null) and (($m.from_session | ascii_downcase | sub("\\.local/"; "/")) != $self))`.
|
||||
|
||||
## Pending / Incomplete Tasks
|
||||
|
||||
Per Mike (earlier today, 2026-05-27): these are not this instance's tasks — recorded only as a fleet reference:
|
||||
- Mac `install-hooks.sh` — owned by Mac via coord to-do `c28d1baa`.
|
||||
- GURU-5070 pubkey on Pluto, Ollama-fallback rollout to other machines — owned elsewhere.
|
||||
|
||||
No open items for GURU-KALI.
|
||||
|
||||
## Reference Information
|
||||
|
||||
- Commits this round: `45126c0`/`01af931` (submodule auto-advances), `2678d38` (mac-pending caveat), `a35b583` (coord hook fix). Pulled: 30 commits + 3 vault (RMM Phase 2, /mailbox, Autotask, Quantum M365, GuruScan refactor); then 3 (LHM removal submodule advance + SPEC-011 + Howard sync); then 2 (Mac + Howard auto-sync); then 1 (own hook fix push).
|
||||
- Code Review Agent verdict on hook fix: APPROVED. Agent id `a7d01ffc8547e8dff`.
|
||||
- Files: `.claude/scripts/check-messages.sh`, `.gitignore`. Hook config: `.claude/settings.json`.
|
||||
- Coord broadcasts surfaced during fix: `e032f029`, `620af7f5`, `7bdc6d3c`, `3fe667e1`.
|
||||
- Vault pulled: `clients/quantumwms/m365-breakglass.sops.yaml` (new), `clients/sif-oidak/laptops.sops.yaml` (new), `msp-tools/autotask.sops.yaml` (updated).
|
||||
Reference in New Issue
Block a user