Session Log — Coord UserPromptSubmit hook fix (broadcasts + JSON robustness)
Follow-on work, distinct from today's earlier log 2026-05-27-guru-kali-identity-phase2-followups.md. The hooks coord-todo (c28d1baa) and the Mac-confirmed install-hooks status from earlier today are unchanged.
User
- User: Mike Swanson (mike)
- Machine: GURU-KALI
- Role: admin
- Session span: 2026-05-27 ~17:14 MST through ~20:40 MST
Session Summary
A run of routine syncs through the afternoon and evening pulled a busy day of fleet activity (30 commits, then 3, then 2) — /mailbox skill, Autotask, RMM Phase-2 deploy, rmm-audit Agent F, GuruScan module refactor, Quantum M365 onboarding, Syncro delivery-channel rule, and three vault entries. The Mac picked up my Phase-2 step-4 hand-off cleanly (2c12bd2 wired sync.sh + syncro.md to read from identity.json, 0e2629a cleaned CLAUDE.md). Each sync.sh rebase merged my attribution guards intact with the Mac's concurrent rewrites — verified after every round.
Mike flagged the coord UserPromptSubmit hook was broken: through several syncs neither he nor I had received any coord messages despite real fleet activity. Diagnosis: the hook (.claude/scripts/check-messages.sh) only queried to_session=$SESSION and to_session=$USER_ALIAS. It never asked for to_session=ALL_SESSIONS, which the fleet uses for broadcasts. Four real broadcasts were sitting unread (LHM/WinRing0 in-flight then done, SPEC-010 features+bugs, SPEC-011 ARP). The hook ran exit 0 every time and printed nothing — the gap was coverage, not error.
Implemented the fix in three rounds, each surfacing a new bug. First pass added an ALL_SESSIONS query and a per-machine seen-file (.claude/coord-broadcasts-seen, gitignored) so the hook never PUTs /read on broadcasts and never re-surfaces ones already shown. Testing showed silent failure: jq aborted with "Invalid string: control characters from U+0000 through U+001F must be escaped". The coord API emits some message bodies with raw unescaped control chars — invalid per RFC 8259. Added a sanitize_json helper that round-trips through python3 json.loads(strict=False) (which accepts them) and re-emits valid JSON.
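The sanitize round-trip described above might look roughly like the following. This is a minimal sketch rather than the shipped helper: it assumes python3 is on PATH, and the sample body with a raw U+0001 byte is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes python3 is available; the real sanitize_json may differ.
sanitize_json() {
  # strict=False lets json.loads accept raw control characters inside strings;
  # json.dump then re-emits them as \uXXXX escapes, which jq can parse.
  python3 -c 'import json, sys; json.dump(json.loads(sys.stdin.read(), strict=False), sys.stdout)'
}

# Illustrative payload: a body containing a raw U+0001 byte (invalid per RFC 8259).
raw="$(printf '{"body":"a\001b"}')"
clean="$(printf '%s' "$raw" | sanitize_json)"
printf '%s\n' "$clean"   # {"body": "a\u0001b"}
```

The sketch has no fallback for input that is not JSON at all; in that case python3 exits non-zero and the caller would need its own guard.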
Second round still silent: bash echo "$json_var" was interpreting backslash escapes inside the JSON (e.g. \n in body text becoming a literal newline) and corrupting it before jq saw it. Replaced every echo "$VAR" | jq with printf '%s' "$VAR" | jq throughout — the original script had the same latent bug, untriggered only because personal messages were usually empty. Third round still empty seen-file: the jq filter ($seen | index(.id)) evaluated .id against $seen (the array) not against the current message — classic jq scoping. Fixed by binding the message: map(. as $m | select(($seen | index($m.id)) == null) ...).
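The echo versus printf difference can be reproduced in isolation. Whether echo expands backslash escapes depends on the shell and its options (POSIX leaves it implementation-defined), so this sketch uses printf '%b' to stand in for an escape-interpreting echo; the sample JSON is made up for illustration.

```shell
#!/usr/bin/env bash
json='{"body":"line1\nline2"}'   # \n here is the two-character JSON escape

# printf '%s' passes the bytes through untouched.
safe="$(printf '%s' "$json")"

# printf '%b' expands backslash escapes, reproducing what an
# escape-interpreting echo does to the same string.
expanded="$(printf '%b' "$json")"

[ "$safe" = "$json" ] && echo "printf %s: unchanged"
[ "$expanded" != "$json" ] && echo "escape expansion: raw newline now inside the string"
```

After expansion the JSON string value contains a literal newline, which is exactly the kind of raw control character a strict parser such as jq rejects.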
Also fixed an interim bug introduced by my own refactor: ${result:-{}} as a bash default — the first } closes the parameter expansion early, producing {...}} and breaking jq. Replaced with an explicit [ -z "$result_safe" ] && result_safe='{"messages":[]}' guard. End-to-end test: run 1 surfaces 4 broadcasts and writes 4 UUIDs to the seen-file; run 2 is silent; server still shows 4 unread (other machines unaffected). Code Review Agent reviewed the whole change and returned APPROVED (no defects, only cosmetic observations). Shipped as a35b583.
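The brace-default trap is easy to reproduce on its own. A minimal sketch, using the variable names from the log and a made-up value for result:

```shell
#!/usr/bin/env bash
result='{"messages":[]}'

# The first } ends the parameter expansion, so the default is just "{"
# and the second } is appended as a literal character.
broken="${result:-{}}"
printf '%s\n' "$broken"        # {"messages":[]}}

# Explicit guard, in the shape the log describes.
result_safe="$result"
[ -z "$result_safe" ] && result_safe='{"messages":[]}'
printf '%s\n' "$result_safe"   # {"messages":[]}
```

The trap is easy to miss because an unset or empty variable yields "{" plus "}", which happens to be valid JSON; the stray brace only appears when the variable has a value.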
Key Decisions
- Per-machine local seen-tracking for broadcasts, NOT a server-side read_at flip: the schema has one read_at field, so PUT /read on a broadcast would clobber it for every other recipient that hasn't seen it.
- Sanitize JSON client-side via python strict=False rather than fixing the server. The API bug (unescaped control chars in bodies) is a separate concern; the hook needs to be robust regardless of server quality.
- Convert ALL echo "$JSON" | jq to printf '%s' fleet-wide, not just the broadcast paths. The bug was latent in the original code; fixing it everywhere prevents future surprise corruption on any message with backslash escapes.
- Mandatory code review (CLAUDE.md rule) before syncing the hook to the fleet: this script runs on every prompt on every machine, so a regression breaks coord-message surfacing globally. Review came back APPROVED.
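The per-machine seen-tracking decision can be sketched as follows. This is not the shipped hook code: the function names is_seen and surface_once are hypothetical, a temp file stands in for .claude/coord-broadcasts-seen, and the short IDs stand in for full UUIDs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of local seen-tracking; names and layout are assumptions.
SEEN_FILE="$(mktemp)"   # stand-in for .claude/coord-broadcasts-seen

is_seen() {
  # Exact whole-line match against the append-only list of message IDs.
  grep -Fxq -- "$1" "$SEEN_FILE"
}

surface_once() {
  # Show a broadcast only the first time it is encountered, then record it
  # locally instead of marking it read on the server.
  local id="$1"
  if ! is_seen "$id"; then
    printf 'new broadcast: %s\n' "$id"
    printf '%s\n' "$id" >> "$SEEN_FILE"
  fi
}

first_run="$(surface_once e032f029; surface_once 620af7f5)"
second_run="$(surface_once e032f029; surface_once 620af7f5)"
printf 'first run:\n%s\nsecond run: [%s]\n' "$first_run" "$second_run"
```

Because nothing is written back to the server, other machines still see the same broadcasts as unread, which is the property the single read_at field cannot provide.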
Problems Encountered
- Hook silent despite real unread: the hook only queried personal + alias, never ALL_SESSIONS. Fixed by adding the third query.
- jq parse error on broadcast bodies: server returned message JSON with unescaped U+0001 control chars (and similar). Fixed with the sanitize_json python round-trip helper.
- echo backslash escape corruption: bash echo "$json" expanded \n / \" inside JSON string values, breaking the content before jq could parse it. Fixed by switching every JSON pipe to printf '%s'. Latent in the original script.
- jq scoping bug in filter: ($seen | index(.id)) evaluated .id against $seen, not the message. Fixed by binding the message to $m and using $m.id / $m.from_session.
- Bash brace-default trap: ${result:-{}} parses as ${result:-{} followed by a literal }, leaving an extra brace that makes jq error out. Replaced with an explicit guard.
- Misleading test runs: intermediate tests showed empty output and empty seen-file. A trace via an awk-modified script copy in /tmp made SCRIPT_DIR resolve to /, giving a misleading SEEN_FILE=//coord-broadcasts-seen. Clean re-test in the real script path confirmed the actual fix worked end-to-end.
Configuration Changes
- .claude/scripts/check-messages.sh: added sanitize_json helper; added ALL_SESSIONS query + per-machine seen-file logic; converted all echo "$JSON" | jq to printf '%s' "$JSON" | jq (12 sites incl. the locks block); fixed jq scoping with . as $m binding; replaced ${result:-{}} brace-trap with explicit guard.
- .gitignore: added .claude/coord-broadcasts-seen (per-machine local seen-tracking).
- .claude/memory/project_mac_gururmm_setup_pending.md: caveat updated earlier today to [CONFIRMED PENDING 2026-05-27] based on Mac coord reply.
- projects/msp-tools/guru-rmm: submodule pointer auto-advanced by Phase 1a to 6326ec6.
Credentials & Secrets
None created or discovered.
Infrastructure & Servers
- Coord API http://172.16.3.30:8001/api/coord/messages: broadcasts use to_session=ALL_SESSIONS (magic recipient string). Single server-side read_at field per message (no per-recipient tracking).
- 4 unread ALL_SESSIONS broadcasts at fix-time: e032f029 (LHM v0.6.46 done), 620af7f5 (LHM in flight), 7bdc6d3c (SPEC-011 ARP), 3fe667e1 (SPEC-010 UX).
- Hook config in .claude/settings.json → UserPromptSubmit → bash .claude/scripts/check-messages.sh (15s timeout). Unchanged.
- Per-machine seen-file: .claude/coord-broadcasts-seen (gitignored, append-only UUIDs).
Commands & Outputs
- Hook test pattern:
    rm -f .claude/coord-broadcasts-seen
    bash .claude/scripts/check-messages.sh   # surfaces broadcasts, writes seen-file
    bash .claude/scripts/check-messages.sh   # silent (broadcasts in seen-file)
    curl -s "http://172.16.3.30:8001/api/coord/messages?to_session=ALL_SESSIONS&unread_only=true"
    # → still 4 unread on server: other machines unaffected
- Reading API msg shape (no auth): curl -s "http://172.16.3.30:8001/api/coord/messages?to_session=ALL_SESSIONS&unread_only=true&limit=100" → returns {total, skip, limit, messages:[...]}. Bodies may contain raw control chars (RFC violation; client-side sanitize required).
- jq scoping form, with both conditions inside select: map(. as $m | select((($seen | index($m.id)) == null) and (($m.from_session | ascii_downcase | sub("\\.local/"; "/")) != $self))).
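The jq scoping fix can be tried in isolation. A minimal sketch, assuming jq is installed; the sample IDs and payload are made up, and only the seen-file condition is shown, not the from_session check.

```shell
#!/usr/bin/env bash
# Sketch only: requires jq; sample data is illustrative, not real coord output.
msgs='{"messages":[{"id":"a1"},{"id":"b2"},{"id":"c3"}]}'
seen='["a1"]'

# Inside ($seen | ...), "." is the $seen array, so the current message must be
# captured as $m first; otherwise .id would be looked up on the array.
unseen="$(printf '%s' "$msgs" | jq -c --argjson seen "$seen" \
  '.messages | map(. as $m | select(($seen | index($m.id)) == null)) | map(.id)')"
printf '%s\n' "$unseen"   # ["b2","c3"]
```

Without the . as $m binding, jq evaluates .id against the array and fails with an indexing error rather than filtering anything.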
Pending / Incomplete Tasks
Per Mike (earlier today, 2026-05-27): these are not this instance's tasks — recorded only as a fleet reference:
- Mac install-hooks.sh: owned by Mac via coord to-do c28d1baa.
- GURU-5070 pubkey on Pluto, Ollama-fallback rollout to other machines: owned elsewhere.
No open items for GURU-KALI.
Reference Information
- Commits this round: 45126c0 / 01af931 (submodule auto-advances), 2678d38 (mac-pending caveat), a35b583 (coord hook fix). Pulled: 30 commits + 3 vault (RMM Phase 2, /mailbox, Autotask, Quantum M365, GuruScan refactor); then 3 (LHM removal submodule advance + SPEC-011 + Howard sync); then 2 (Mac + Howard auto-sync); then 1 (own hook fix push).
- Code Review Agent verdict on hook fix: APPROVED. Agent id a7d01ffc8547e8dff.
- Files: .claude/scripts/check-messages.sh, .gitignore. Hook config: .claude/settings.json.
- Coord broadcasts surfaced during fix: e032f029, 620af7f5, 7bdc6d3c, 3fe667e1.
- Vault pulled: clients/quantumwms/m365-breakglass.sops.yaml (new), clients/sif-oidak/laptops.sops.yaml (new), msp-tools/autotask.sops.yaml (updated).