session-log: GURU-KALI 2026-05-24 continued (merges, fleet auto-update, ProtectSystem bugs, repo hygiene, straggler) — namespaced to avoid shared-log conflicts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`session-logs/2026-05-24-GURU-KALI-session.md` (new file, 121 lines)

# Session Log — 2026-05-24 (GURU-KALI, continued)

## User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-KALI
- **Role:** admin

> Continues `2026-05-24-session.md`. Namespaced per machine: the shared dated log
> conflicted repeatedly today across GURU-KALI / GURU-5070 / MacBook (concurrent
> EOF appends), so this machine's later updates live here to avoid merge conflicts.

## Update: 12:16 MST — Merges live, fleet auto-update confirmed, ProtectSystem bugs, repo hygiene, straggler fix

### Session Summary

Merged the three GuruRMM PRs to production main, confirmed the fleet auto-update path end to end, fixed two `ProtectSystem=strict` bugs the hardened unit introduced, hardened ClaudeTools against recurring garbled-filename cruft, and cleared a stuck agent.

PRs #13 (Linux tray IPC+GTK), #14 (Phase 4 peer-cred authz + real actions), and #21 (self-update ReadWritePaths) were all merged to `azcomputerguru/gururmm` main. SSH access to Saturn (172.16.3.30) was established from GURU-KALI by generating an ed25519 key and authorizing it with the vaulted SSH password, enabling direct build-server diagnostics. The pipeline published 0.6.37 (from #13/#14), and GURU-KALI auto-updated from 0.6.29 to 0.6.37 (the 130 MB dev build was replaced by the 4 MB signed release), confirming the full chain.

Two `ProtectSystem=strict` bugs surfaced on GURU-KALI's recently hardened unit: (1) the IPC socket needed `RuntimeDirectory=gururmm` (fixed in #13), and (2) the self-updater could not write the backup (`/etc/gururmm`) or the new binary (`/usr/local/bin`), failing with "Failed to backup current binary". The second was fixed by widening `ReadWritePaths` to `/var/log /usr/local/bin /etc/gururmm` (GURU-KALI unit patched directly; template fix = Issue #20 + PR #21, merged).

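
A systemd drop-in is one way to carry both fixes without editing the unit file itself. This is a sketch under assumptions, not what was applied (GURU-KALI's unit was patched directly): the drop-in directory follows from the unit path `/etc/systemd/system/gururmm-agent.service`, while the function name `write_gururmm_dropin` and the file name `10-gururmm-paths.conf` are hypothetical. The unit directory is a parameter so the function can be dry-run into a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in adding the two directives the hardened
# gururmm-agent unit was missing. $1 is the systemd unit directory
# (default /etc/systemd/system); pass a scratch directory to dry-run.
write_gururmm_dropin() {
  local unit_dir=${1:-/etc/systemd/system}
  local dropin_dir="$unit_dir/gururmm-agent.service.d"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-gururmm-paths.conf" <<'EOF'
[Service]
# Per-unit runtime dir (/run/gururmm), assumed to hold the IPC socket (bug 1).
RuntimeDirectory=gururmm
# Backup goes to /etc/gururmm, the new binary to /usr/local/bin (bug 2).
ReadWritePaths=/var/log /usr/local/bin /etc/gururmm
EOF
  # Report where the drop-in landed.
  printf '%s\n' "$dropin_dir/10-gururmm-paths.conf"
}
```

On a live host this would be followed by `systemctl daemon-reload` and a restart of `gururmm-agent`. List settings such as `ReadWritePaths=` in a drop-in add to the unit's existing list rather than replacing it, so the drop-in does not need to repeat paths the unit already grants.
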
In ClaudeTools, removed 5 garbled tracked files (Windows paths stored with Unicode PUA substitutes), added a purge guard to sync.sh, and added a PreToolUse(Bash) hook (block-backslash-winpath.sh) that blocks redirects to backslashed Windows paths. Those redirects were the root cause: Git Bash strips the backslashes and substitutes a PUA character for ':'. jq was installed on GURU-KALI because the hooks need it.

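
The purge guard's job can be sketched as below. This is a reconstruction under assumptions, not the contents of `sync.sh`. It relies on Cygwin-derived shells such as Git Bash mapping characters that are illegal in Windows filenames into the private-use range U+F000 to U+F0FF (':' becomes U+F03A), so any path containing a code point in that range is treated as garbled. `is_garbled_path` is a hypothetical helper, and `purge_garbled_paths` here is only a stand-in for the real function of that name.

```bash
#!/usr/bin/env bash
# Succeeds if $1 contains a code point in U+F000..U+F0FF. In UTF-8 those are
# EF 80 80 .. EF 83 BF, so the byte pairs EF 80..EF 83 identify them exactly.
is_garbled_path() {
  printf '%s' "$1" | LC_ALL=C grep -qF \
    -e $'\xef\x80' -e $'\xef\x81' -e $'\xef\x82' -e $'\xef\x83'
}

# Unstage and delete garbled paths (tracked, or untracked and not ignored)
# so a following `git add -A` cannot commit them. Run from the repo root.
purge_garbled_paths() {
  local path
  while IFS= read -r -d '' path; do
    if is_garbled_path "$path"; then
      printf 'purging garbled path: %s\n' "$path" >&2
      # --literal-pathspecs: never treat the odd name as a glob.
      git --literal-pathspecs rm --cached --quiet --ignore-unmatch -- "$path"
      rm -f -- "$path"
    fi
  done < <(git ls-files -z --cached --others --exclude-standard)
}
```

Calling `purge_garbled_paths` immediately before each `git add -A` matches the placement described in the configuration changes. `-z` keeps names with unusual bytes intact instead of relying on Git's quoted output.
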
The "fleet remediation" turned out to be unnecessary: only GURU-KALI had the hardened unit; Saturn, ix, and Jupiter run the older `ProtectSystem=false` units and updated fine. SL-SERVER (Scileppi, dead for 5 months, now a Synology) was deleted from the RMM. The #21 merge build was debounced (a concurrent auto-bump build held the lock), so an empty commit (fee5d7e) re-fired the webhook, which is now building 0.6.38 with the fix. Finally, the one online straggler, RECEPTIONIST-PC (Cascades, stuck on 0.6.29 because a flaky WebSocket misses the auto-dispatch windows), was force-updated to 0.6.37 via the API.

### Key Decisions

- **Fixed my own SSH access** (per user instruction) with an ed25519 key authorized via the vaulted password rather than asking for credentials; this gives durable key auth to Saturn for build diagnostics.
- **ReadWritePaths needs BOTH /usr/local/bin AND /etc/gururmm**: verified from the updater code (backup -> config_dir=/etc/gururmm, replace -> /usr/local/bin); the first attempt, with /usr/local/bin only, still failed.
- **Did NOT push a fleet remediation**: investigation showed only GURU-KALI had the broken hardened unit; the others (ProtectSystem=false) self-update fine, so a needless restart of client servers was avoided.
- **Backslash hook ignores quoted strings**: it strips quoted text from the command first, so the pattern inside commit messages or echoes doesn't false-trigger (the hook caught its own commit message during testing). A second check that reintroduced the false positive was dropped.
- **Empty commit to re-trigger the debounced build** (vs. a manual build-agents.sh run): keeps it pipeline-driven and traceable; the auto-bump then correctly produced 0.6.38.
- **Force-triggered RECEPTIONIST-PC** rather than waiting: confirmed the update mechanism works for it and that the real issue is WS instability, not policy or the server.

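
The hook's quote-stripping approach can be sketched as follows. Assumptions: Claude Code passes the PreToolUse event as JSON on stdin with the Bash command at `.tool_input.command`, and a hook that exits with status 2 blocks the call and feeds its stderr back to the model. The function names and the drive-letter regex are illustrative, not copied from `block-backslash-winpath.sh`.

```bash
#!/usr/bin/env bash
# Remove single- and double-quoted spans so text inside commit messages or
# echo arguments cannot trigger the check. (Naive: ignores escaped quotes.)
strip_quoted() {
  sed -e "s/'[^']*'//g" -e 's/"[^"]*"//g'
}

# Succeeds if the unquoted part of $1 redirects (> or >>) to a drive-letter
# path written with backslashes, e.g. > C:\ProgramData\x.txt
has_backslash_winpath_redirect() {
  printf '%s\n' "$1" | strip_quoted | grep -Eq '>>?[[:space:]]*[A-Za-z]:\\'
}

# Hook entry point: reads the event JSON on stdin (requires jq) and returns
# 2 to block. A hook script would end with:  hook_main; exit $?
hook_main() {
  local cmd
  cmd=$(jq -r '.tool_input.command // empty') || return 0
  if has_backslash_winpath_redirect "$cmd"; then
    echo "Blocked: redirect to a backslashed Windows path (Git Bash strips the backslashes and mangles ':'). Use forward slashes instead." >&2
    return 2
  fi
  return 0
}
```

Stripping quotes before matching is what keeps `git commit -m "... > C:\temp ..."` from tripping the guard. The trade-off is that a redirect hidden inside an `eval "..."` string is no longer caught.
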
### Problems Encountered

- **Session log had committed merge-conflict markers** (multiple machines appended to 2026-05-24-session.md). Resolved by removing the 3 marker lines, keeping the union of both sides.
- **kill -0 false "zombie lock"**: checking a root-owned build PID as `guru` returned permission denied, which was misread as a dead process. The build was alive (waiting on Pluto). Corrected; no lock was cleared.
- **GURU-KALI auto-update failed twice** ("Failed to backup current binary") under ProtectSystem=strict; fixed by widening ReadWritePaths (both paths).
- **#21 merge produced no rollout**: its webhook was debounced by a concurrent build, so the published 0.6.37 lacked the fix (confirmed via `strings` on the binary). Re-triggered with an empty commit -> 0.6.38.
- **pkill aborting compound bash commands** (exit 144) repeatedly; ran the affected steps separately.
- **Foreground `sleep` blocked by the harness** mid-poll; switched to immediate checks / background watchers.

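
The `kill -0` trap is that the check fails both when the PID is gone (ESRCH) and when it exists but belongs to another user (EPERM). Below is a minimal, Linux-only sketch that tells the two apart by also consulting `/proc`. `pid_state` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Classify a PID without root. kill -0 only proves the caller may signal the
# process; a failure can mean "no such process" or "not permitted".
# Linux only: relies on /proc, and /proc mounted with hidepid would hide
# other users' processes and make them look dead.
pid_state() {
  local pid=$1
  case $pid in
    ''|*[!0-9]*) echo invalid; return 1 ;;
  esac
  if kill -0 "$pid" 2>/dev/null; then
    echo alive
  elif [ -e "/proc/$pid" ]; then
    echo alive-other-user   # e.g. a root-owned build checked as guru
  else
    echo dead
  fi
}
```

Only `dead` should justify clearing a lock. A usage line such as `pid_state "$(cat /var/run/gururmm-build.lock)"` assumes the lock file stores the builder's PID, which the log does not confirm.
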
### Configuration Changes (this update)

GuruRMM (`azcomputerguru/gururmm`):

- PR #21 merged: `agent/src/main.rs` systemd unit template ReadWritePaths -> `/var/log /usr/local/bin /etc/gururmm` (merge 175b7f5).
- Empty commit `fee5d7e` to re-trigger the build (0.6.38).

ClaudeTools (committed a3c7064, 6d065cf):

- Removed 5 garbled tracked files (`C:\ProgramData\gp_user.txt`, `D:\...\current-mode`, 3 script fragments).
- `.claude/scripts/sync.sh`: added `purge_garbled_paths()` before each `git add -A`.
- `.claude/hooks/block-backslash-winpath.sh` (new): PreToolUse(Bash) guard.
- `.claude/settings.json`: wired the PreToolUse hook (matcher Bash).
- `.claude/CLAUDE.md`: Windows mode-write note.
- `.claude/machines/guru-kali.md`: Rust, GTK build libs, passwordless sudo, gururmm clone, jq, enrolled-agent note.

Machine-level (GURU-KALI):

- `~/.ssh/id_ed25519` (new keypair, no passphrase) authorized on guru@172.16.3.30.
- jq 1.8.1 installed (apt).
- `/etc/systemd/system/gururmm-agent.service`: ReadWritePaths widened and RuntimeDirectory added (applied directly); agent now on 0.6.37.

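
To confirm that an installed unit carries both paths the updater needs, compare its effective `ReadWritePaths` against the required list. This is a sketch. `missing_rw_paths` is a hypothetical helper that takes the value as an argument, so it can be fed from `systemctl show` on a live host. It only recognizes exact matches: a parent directory such as `/usr` would also grant access but is reported as missing here.

```bash
#!/usr/bin/env bash
# Print each directory the self-updater must write that is absent from a
# space-separated ReadWritePaths value; succeed only if none are missing.
missing_rw_paths() {
  local have=" $1 " need missing=0
  for need in /usr/local/bin /etc/gururmm; do
    case $have in
      # Also accept the "-/path" (ignore-if-missing) form.
      *" $need "*|*" -$need "*) ;;
      *) echo "$need"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

Usage on GURU-KALI: `missing_rw_paths "$(systemctl show -p ReadWritePaths --value gururmm-agent)"`. With only `/usr/local/bin` widened (the failed first attempt), it prints `/etc/gururmm` and returns non-zero.
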
RMM data:

- Deleted agent SL-SERVER (id 2585f6d5-3887-412e-a586-1dec030f0a40).
- Force-updated RECEPTIONIST-PC (9c91d324...) 0.6.29 -> 0.6.37.

### Credentials & Secrets (this update)

- **GURU-KALI SSH key**: `~/.ssh/id_ed25519` (ed25519, no passphrase). Public key:
  `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHd5ZblkziRIOI+57C4y7OkV3DvxlqmAe7VyBgPIYsyy guru@GURU-KALI`
  Authorized in `guru@172.16.3.30:~/.ssh/authorized_keys`.
- All other creds used are vaulted (paths only, no values transcribed):
  - Saturn SSH/sudo password: `infrastructure/gururmm-server.sops.yaml` -> `credentials.password`.
  - RMM API admin: same file -> `credentials.gururmm-api.admin-email` (claude-api@azcomputerguru.com) / `credentials.gururmm-api.admin-password`.
  - Gitea API token: `services/gitea.sops.yaml` -> `credentials.api.api-token`.

### Infrastructure & Servers (this update)

- **Saturn** `172.16.3.30` = hostname `gururmm`, Ubuntu 22.04 (kernel 5.15): build server + RMM API (`:3001`) + PostgreSQL `gururmm` + ClaudeTools MariaDB + coord API (`:8001`). SSH key auth (guru) now works.
- **Pluto** `172.16.3.36` (Administrator): Windows build + Authenticode signing; build-agents.sh SSHes to it (5 cargo targets + WiX MSI). Reachable.
- **Build pipeline**: webhook (`172.16.3.30:9000` / `/webhook/build`) -> `/opt/gururmm/build-agents.sh`; lock `/var/run/gururmm-build.lock`; log `/var/log/gururmm-build.log`; artifacts in `/var/www/gururmm/downloads/` (e.g. `gururmm-agent-linux-amd64-<ver>`, plus a `-latest` symlink). Per-component auto-version bump.
- **Auto-update**: server scanner every 300s; dispatched over WebSocket on heartbeat; gated on effective_policy `auto_update` (on by default when the policy is null). Linux config_dir `/etc/gururmm` (backup + pending-update.json); binary `/usr/local/bin/gururmm-agent`; temp download via PrivateTmp.
- **RECEPTIONIST-PC** (Cascades of Tucson / site CascadesTucson): agent `9c91d324-1073-449c-8cc0-45c5bccfc218`, Windows 11 (22631) amd64. Flaky WS ("Connection reset without closing handshake"), ~3 connects/6h.

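
The `strings` check that showed the published 0.6.37 lacked the #21 fix can be scripted against the artifacts directory. This is a sketch under assumptions. It presumes the unit template is embedded in the agent binary as a string literal containing the exact line `ReadWritePaths=/var/log /usr/local/bin /etc/gururmm`, and `binary_has_rw_fix` is a hypothetical name. `grep -a` stands in for `strings`, so binutils is not required.

```bash
#!/usr/bin/env bash
# Succeed if the agent binary embeds the widened ReadWritePaths line.
# -a scans the binary as text; -F matches the line literally.
binary_has_rw_fix() {
  LC_ALL=C grep -aqF -e 'ReadWritePaths=/var/log /usr/local/bin /etc/gururmm' -- "$1"
}
```

Example: `binary_has_rw_fix /var/www/gururmm/downloads/gururmm-agent-linux-amd64-<ver>`, with `<ver>` being the version under test. If the template builds that line from separate pieces at runtime, this check gives a false negative.
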
### Commands & Outputs (this update)

```bash
# SSH access self-fix
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519 -C "guru@GURU-KALI"
SSHPASS="$(vault get-field infrastructure/gururmm-server.sops.yaml credentials.password)" \
  sshpass -e ssh-copy-id -i ~/.ssh/id_ed25519.pub guru@172.16.3.30   # Number of key(s) added: 1

# RMM API (auth + enumerate + force update); $API = RMM API base URL (Saturn, :3001)
TOKEN=$(curl -s -X POST "$API/auth/login" -H "Content-Type: application/json" \
  -d '{"email":..,"password":..}' | jq -r .token)
curl -s "$API/agents" -H "Authorization: Bearer $TOKEN" | jq ...   # 56 -> 55 agents
curl -s -X DELETE "$API/agents/2585f6d5-..." -H "Authorization: Bearer $TOKEN"   # HTTP 204 (SL-SERVER)
curl -s -X POST "$API/agents/9c91d324-.../update" -H "Authorization: Bearer $TOKEN"
# -> {"success":true,"target_version":"0.6.37","message":"Update triggered: 0.6.29 -> 0.6.37"}
# server: reconnected after update: 0.6.29 -> 0.6.37 (landed in ~37s despite a WS reset)

# Build re-trigger (debounced merge)
git commit --allow-empty -m "chore: re-trigger agent build ..." && git push origin main   # fee5d7e
# build log: "Agent: 0.6.37 -> 0.6.38" / "Building version: 0.6.38"

# Session-log conflict cleanup (line numbers valid only for that file at that moment;
# sed addresses refer to input lines, so deleting all three in one pass is safe)
sed -i '228d;451d;534d' session-logs/2026-05-24-session.md   # removed <<<<<<< ======= >>>>>>>
```

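
The hard-coded `sed` line numbers only fit that file at that moment. Below is a pattern-based sketch of the same union resolution: drop the marker lines and keep both sides. `strip_conflict_markers` is a hypothetical helper that assumes GNU `sed -i` and Git's default (non-diff3) marker style. A bare `=======` line is also valid Markdown (a setext heading underline), so list the matches with `grep -n` before editing in place.

```bash
#!/usr/bin/env bash
# Union-resolve a conflicted text file in place: delete Git's conflict
# marker lines and keep the content from both sides.
strip_conflict_markers() {
  sed -i -e '/^<<<<<<< /d' -e '/^=======$/d' -e '/^>>>>>>> /d' "$1"
}
```

Git's built-in `union` merge driver (for example `session-logs/*.md merge=union` in `.gitattributes`) applies the same keep-both-sides rule at merge time. Concurrent end-of-file appends would then merge without markers.
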
### Pending / Incomplete Tasks (this update)

- **Fleet watcher running** (`/tmp/gururmm-fleet-watch.sh`, ~60 min): waits for 0.6.38 to publish, then confirms fleet convergence; will report and flag laggards.
- **RECEPTIONIST-PC WS instability** (Cascades site): durable fix pending; it will likely lag on 0.6.38 too unless its WS is up. Force-trigger again as needed, or investigate whether the site firewall/NAT is killing the long-lived WebSocket.
- **Open GuruRMM issues**: #15 pipeline tray build, #16 Windows IPC peer authz, #17 logind console user, #18 macOS tray, #19 subscriber broadcast. (#20 closed by #21.)
- **Session-log multi-machine conflicts**: 2026-05-24-session.md conflicted across machines; consider per-machine namespacing for same-day logs.

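
Flagging laggards needs a version comparison that orders 0.6.9 before 0.6.10, which plain string comparison gets wrong. This is a sketch using GNU `sort -V`. `version_lt` and `list_laggards` are hypothetical. The input format (one `hostname version` pair per line) is also an assumption, since the log does not show the `$API/agents` response shape.

```bash
#!/usr/bin/env bash
# Succeed if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Read "hostname version" lines on stdin; print hosts below the target.
list_laggards() {
  local target=$1 host ver
  while read -r host ver; do
    [ -n "$host" ] || continue
    if version_lt "$ver" "$target"; then
      printf '%s %s\n' "$host" "$ver"
    fi
  done
}
```

To feed it from the API, something like `jq -r '.[] | "\(.hostname) \(.version)"'` could produce the input lines. The field names there are guesses and would need checking against a real response.
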
### Reference Information (this update)

- GuruRMM merges: PR #13, #14, #21 (#21 commit 72b8510, merge 175b7f5; closed Issue #20). Re-trigger empty commit fee5d7e. Building 0.6.38.
- Versions: 0.6.37 published/fleet; 0.6.38 building.
- Agent ids: GURU-KALI `a73ba38e-cd02-4331-b8bf-474cd899ec22`, Saturn `8cd0440f-a65c-4ed2-9fa8-9c6de83492a4`, RECEPTIONIST-PC `9c91d324-1073-449c-8cc0-45c5bccfc218`. Deleted SL-SERVER `2585f6d5-3887-412e-a586-1dec030f0a40`.
- ClaudeTools commits: a3c7064 (garbled cleanup + sync.sh guard), 6d065cf (backslash hook + CLAUDE.md note).
- Issues/PRs: https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/{15..19}, /pulls/{13,14,21}.