diff --git a/session-logs/2026-05-24-GURU-KALI-session.md b/session-logs/2026-05-24-GURU-KALI-session.md
new file mode 100644
index 0000000..72ae24b
--- /dev/null
+++ b/session-logs/2026-05-24-GURU-KALI-session.md
@@ -0,0 +1,121 @@
+# Session Log — 2026-05-24 (GURU-KALI, continued)
+
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-KALI
+- **Role:** admin
+
+> Continues `2026-05-24-session.md`. Namespaced per-machine: the shared dated log
+> conflicted repeatedly today across GURU-KALI / GURU-5070 / MacBook (concurrent
+> EOF appends), so this machine's later updates live here to avoid merge conflicts.
+## Update: 12:16 MST — Merges live, fleet auto-update confirmed, ProtectSystem bugs, repo hygiene, straggler fix
+
+### Session Summary
+
+Merged the three GuruRMM PRs to production main, confirmed the fleet auto-update path end to end, fixed two ProtectSystem=strict bugs the hardened unit introduced, hardened ClaudeTools against recurring garbled-filename cruft, and cleared a stuck agent.
+
+PRs #13 (Linux tray IPC+GTK), #14 (Phase 4 peer-cred authz + real actions), and #21 (self-update ReadWritePaths) were all merged to `azcomputerguru/gururmm` main. SSH access to Saturn (172.16.3.30) was established from GURU-KALI by generating an ed25519 key and authorizing it via the vaulted SSH password, enabling direct build-server diagnostics. The pipeline published 0.6.37 (from #13/#14) and GURU-KALI auto-updated 0.6.29 -> 0.6.37 (130 MB dev build replaced by the 4 MB signed release), confirming the full chain.
+
+Two ProtectSystem=strict bugs surfaced on GURU-KALI's recently-hardened unit: (1) the IPC socket needed RuntimeDirectory=gururmm (fixed in #13), and (2) the self-updater could not write the backup (/etc/gururmm) or the replaced binary (/usr/local/bin) — "Failed to backup current binary". Fixed by widening ReadWritePaths to `/var/log /usr/local/bin /etc/gururmm` (GURU-KALI unit patched directly; template fix = Issue #20 + PR #21, merged).
+
+In ClaudeTools, removed 5 garbled tracked files (Windows paths stored with Unicode PUA substitutes) and added a purge guard to sync.sh, plus a PreToolUse(Bash) hook (block-backslash-winpath.sh) that blocks backslashed-Windows-path redirects — the root cause (Git Bash strips backslashes and PUA-substitutes ':'). jq was installed on GURU-KALI (hooks need it).
+
+The "fleet remediation" turned out unnecessary: only GURU-KALI had the hardened unit; Saturn/ix/Jupiter run old ProtectSystem=false units and updated fine. SL-SERVER (Scileppi, dead 5 months, now a Synology) was deleted from the RMM. The #21 merge build was debounced (a concurrent auto-bump build held the lock), so an empty commit (fee5d7e) re-fired the webhook -> building 0.6.38 with the fix. Finally, the one online straggler RECEPTIONIST-PC (Cascades, stuck on 0.6.29 due to a flaky WebSocket that misses auto-dispatch windows) was force-updated via the API to 0.6.37.
+
+### Key Decisions
+
+- **Fixed my own SSH access** (per user instruction) via ed25519 key + vaulted password rather than asking for credentials — gives durable key auth to Saturn for build diagnostics.
+- **ReadWritePaths needs BOTH /usr/local/bin AND /etc/gururmm** — verified from the updater code (backup -> config_dir=/etc/gururmm, replace -> /usr/local/bin); the /usr/local/bin-only first attempt still failed.
+- **Did NOT push a fleet remediation** — investigation showed only GURU-KALI had the broken hardened unit; the others (ProtectSystem=false) self-update fine. Pushing a needless restart to client servers was avoided.
+- **Backslash hook ignores quoted strings** — it quote-strips the command first so the pattern inside commit messages / echoes doesn't false-trigger (the hook caught its own commit message during testing); dropped a second check that re-introduced the false positive.
+- **Empty commit to re-trigger the debounced build** (vs manual build-agents.sh) — keeps it pipeline-driven and traceable; auto-bump then correctly produced 0.6.38.
+- **Force-triggered RECEPTIONIST-PC** rather than waiting — confirmed the update mechanism works for it and that the real issue is WS instability, not policy/server.
+
+### Problems Encountered
+
+- **Session log had committed merge-conflict markers** (multiple machines appended 2026-05-24-session.md). Resolved by removing the 3 marker lines (union of both sides kept).
+- **kill -0 false "zombie lock"** — checking a root-owned build PID as `guru` returned permission-denied, misread as dead. The build was alive (waiting on Pluto). Corrected; no lock was cleared.
+- **GURU-KALI auto-update failed twice** ("Failed to backup current binary") under ProtectSystem=strict — fixed by widening ReadWritePaths (both paths).
+- **#21 merge produced no rollout** — its webhook was debounced by a concurrent build; the published 0.6.37 lacked the fix (confirmed via `strings` on the binary). Re-triggered with an empty commit -> 0.6.38.
+- **pkill aborting compound bash commands** (exit 144) repeatedly — ran affected steps separately.
+- **Foreground `sleep` blocked by the harness** mid-poll — switched to immediate checks / background watchers.
+
+### Configuration Changes (this update)
+
+GuruRMM (`azcomputerguru/gururmm`):
+- PR #21 merged: `agent/src/main.rs` systemd unit template ReadWritePaths -> `/var/log /usr/local/bin /etc/gururmm` (merge 175b7f5).
+- Empty commit `fee5d7e` to re-trigger the build (0.6.38).
+
+ClaudeTools (committed a3c7064, 6d065cf):
+- Removed 5 garbled tracked files (C:\ProgramData\gp_user.txt, D:\...\current-mode, 3 script fragments).
+- `.claude/scripts/sync.sh` — added `purge_garbled_paths()` before each `git add -A`.
+- `.claude/hooks/block-backslash-winpath.sh` (new) — PreToolUse(Bash) guard.
+- `.claude/settings.json` — wired the PreToolUse hook (matcher Bash).
+- `.claude/CLAUDE.md` — Windows mode-write note.
+- `.claude/machines/guru-kali.md` — Rust, GTK build libs, passwordless sudo, gururmm clone, jq, enrolled-agent note.
+
+Machine-level (GURU-KALI):
+- `~/.ssh/id_ed25519` (new keypair, no passphrase) authorized on guru@172.16.3.30.
+- jq 1.8.1 installed (apt).
+- `/etc/systemd/system/gururmm-agent.service` — ReadWritePaths widened + RuntimeDirectory (applied directly); agent now on 0.6.37.
+
+RMM data:
+- Deleted agent SL-SERVER (id 2585f6d5-3887-412e-a586-1dec030f0a40).
+- Force-update RECEPTIONIST-PC (9c91d324...) 0.6.29 -> 0.6.37.
+
+### Credentials & Secrets (this update)
+
+- **GURU-KALI SSH key**: `~/.ssh/id_ed25519` (ed25519, no passphrase). Public key:
+  `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHd5ZblkziRIOI+57C4y7OkV3DvxlqmAe7VyBgPIYsyy guru@GURU-KALI`
+  Authorized in `guru@172.16.3.30:~/.ssh/authorized_keys`.
+- All other creds used are vaulted (paths only, no values transcribed):
+  - Saturn SSH/sudo password: `infrastructure/gururmm-server.sops.yaml` -> `credentials.password`.
+  - RMM API admin: same file -> `credentials.gururmm-api.admin-email` (claude-api@azcomputerguru.com) / `credentials.gururmm-api.admin-password`.
+  - Gitea API token: `services/gitea.sops.yaml` -> `credentials.api.api-token`.
+
+### Infrastructure & Servers (this update)
+
+- **Saturn** `172.16.3.30` = hostname `gururmm`, Ubuntu 22.04 (kernel 5.15) — build server + RMM API (`:3001`) + PostgreSQL `gururmm` + ClaudeTools MariaDB + coord API (`:8001`). SSH key auth (guru) now works.
+- **Pluto** `172.16.3.36` (Administrator) — Windows build + Authenticode signing; build-agents.sh SSHes to it (5 cargo targets + WiX MSI). Reachable.
+- **Build pipeline**: webhook (`172.16.3.30:9000` / `/webhook/build`) -> `/opt/gururmm/build-agents.sh`; lock `/var/run/gururmm-build.lock`; log `/var/log/gururmm-build.log`; artifacts `/var/www/gururmm/downloads/` (e.g. `gururmm-agent-linux-amd64-`, `-latest` symlink). Per-component auto-version bump.
+- **Auto-update**: server scanner every 300s; dispatched over WebSocket on heartbeat; gated on effective_policy `auto_update` (default on when policy null). Linux config_dir `/etc/gururmm` (backup + pending-update.json); binary `/usr/local/bin/gururmm-agent`; temp download via PrivateTmp.
+- **RECEPTIONIST-PC** (Cascades of Tucson / site CascadesTucson): agent `9c91d324-1073-449c-8cc0-45c5bccfc218`, Windows 11 (22631) amd64. Flaky WS ("Connection reset without closing handshake"), ~3 connects/6h.
+
+### Commands & Outputs (this update)
+
+```bash
+# SSH access self-fix
+ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519 -C "guru@GURU-KALI"
+SSHPASS="$(vault get-field infrastructure/gururmm-server.sops.yaml credentials.password)" \
+  sshpass -e ssh-copy-id -i ~/.ssh/id_ed25519.pub guru@172.16.3.30 # Number of key(s) added: 1
+
+# RMM API (auth + enumerate + force update)
+TOKEN=$(curl -s -X POST $API/auth/login -d '{"email":..,"password":..}' | jq -r .token)
+curl -s $API/agents -H "Authorization: Bearer $TOKEN" | jq ... # 56 -> 55 agents
+curl -s -X DELETE $API/agents/2585f6d5-... -H "Authorization: Bearer $TOKEN" # HTTP 204 (SL-SERVER)
+curl -s -X POST $API/agents/9c91d324-.../update -H "Authorization: Bearer $TOKEN"
+# -> {"success":true,"target_version":"0.6.37","message":"Update triggered: 0.6.29 -> 0.6.37"}
+# server: reconnected after update: 0.6.29 -> 0.6.37 (landed in ~37s despite a WS reset)
+
+# Build re-trigger (debounced merge)
+git commit --allow-empty -m "chore: re-trigger agent build ..." && git push origin main # fee5d7e
+# build log: "Agent: 0.6.37 -> 0.6.38" / "Building version: 0.6.38"
+
+# Session-log conflict cleanup
+sed -i '228d;451d;534d' session-logs/2026-05-24-session.md # removed <<<<<<< ======= >>>>>>>
+```
+
+### Pending / Incomplete Tasks (this update)
+
+- **Fleet watcher running** (`/tmp/gururmm-fleet-watch.sh`, ~60 min): waits for 0.6.38 to publish, then confirms fleet convergence; will report + flag laggards.
+- **RECEPTIONIST-PC WS instability** (Cascades site) — durable fix pending; it will likely lag 0.6.38 too unless its WS is up. Force-trigger again as needed, or investigate the site firewall/NAT killing the long-lived WebSocket.
+- **Open GuruRMM issues**: #15 pipeline tray build, #16 Windows IPC peer authz, #17 logind console user, #18 macOS tray, #19 subscriber broadcast. (#20 closed by #21.)
+- **Session-log multi-machine conflicts**: 2026-05-24-session.md conflicted across machines; consider per-machine namespacing for same-day logs.
+
+### Reference Information (this update)
+
+- GuruRMM merges: PR #13, #14, #21 (#21 commit 72b8510, merge 175b7f5; closed Issue #20). Re-trigger empty commit fee5d7e. Building 0.6.38.
+- Versions: 0.6.37 published/fleet; 0.6.38 building.
+- Agent ids: GURU-KALI `a73ba38e-cd02-4331-b8bf-474cd899ec22`, Saturn `8cd0440f-a65c-4ed2-9fa8-9c6de83492a4`, RECEPTIONIST-PC `9c91d324-1073-449c-8cc0-45c5bccfc218`. Deleted SL-SERVER `2585f6d5-3887-412e-a586-1dec030f0a40`.
+- ClaudeTools commits: a3c7064 (garbled cleanup + sync.sh guard), 6d065cf (backslash hook + CLAUDE.md note).
+- Issues/PRs: https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/{15..19}, /pulls/{13,14,21}.
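
Reviewer note on the `kill -0` false "zombie lock" entry under Problems Encountered: `kill -0` fails both when a PID no longer exists and when the PID belongs to another user (here, a root-owned build checked as `guru`), so a bare failure cannot be read as "dead". The following is a minimal sketch of a liveness check that separates the two cases. It assumes Linux with a default `/proc` mount (no `hidepid`); `pid_alive` is a hypothetical helper and is not part of `build-agents.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): report whether a PID is alive,
# even when the caller is not allowed to signal it.
# Returns 0 = alive, 1 = not running, 2 = argument is not a usable PID.
pid_alive() {
    local pid="$1"
    # Reject empty input, 0 (which would target the process group), and non-digits.
    case "$pid" in
        ''|0|*[!0-9]*) return 2 ;;
    esac
    # kill -0 succeeds only if the process exists AND we may signal it.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    # kill -0 also fails (EPERM) for a live process owned by another user,
    # so fall back to /proc, which lists the PID regardless of owner.
    [ -d "/proc/$pid" ]
}
```

A lock check could then treat the lock as stale only when `pid_alive "$build_pid"` returns 1, where `build_pid` stands for however the lock owner's PID is obtained; the log does not record the lock file's format, so that step is left out.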