From 63e46eacdf34af3f2ff5c6b1c8762ce848ebd257 Mon Sep 17 00:00:00 2001 From: Howard Enos Date: Sun, 31 May 2026 15:22:51 -0700 Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-05-31 15:22:41 Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-05-31 15:22:41 --- .../2026-05-31-howard-gururmm-roadmap.md | 170 ++++++++++++++++++ 1 file changed, 170 insertions(+) create mode 100644 session-logs/2026-05-31-howard-gururmm-roadmap.md diff --git a/session-logs/2026-05-31-howard-gururmm-roadmap.md b/session-logs/2026-05-31-howard-gururmm-roadmap.md new file mode 100644 index 0000000..f8fd328 --- /dev/null +++ b/session-logs/2026-05-31-howard-gururmm-roadmap.md @@ -0,0 +1,170 @@ +# Session Log — 2026-05-31 — Howard — GuruRMM roadmap: BUG-015, onboarding jq fix, SSE auth, agent IP capture + +## User +- **User:** Howard Enos (howard) +- **Machine:** Howard-Home +- **Role:** tech + +## Session Summary + +Worked through the GuruRMM roadmap/bug/todo backlog and shipped four items. Started by pulling +live coord state (no active locks; reviewed the pending gururmm todos) and the FEATURE_ROADMAP. +Confirmed the previously-assigned quick-wins todo (`15a5440f`) was already largely merged +(`def0d34` on main; BUG-008/009/010/011 marked Fixed), leaving BUG-015 as the only survivor from +that tier. Submodule was even with `origin/main` at `6f31d22`. + +Four items were completed, each on its own branch + PR (except the onboarding fix, which went +straight to main), each gated through a Coding Agent → Code Review Agent flow with a coordination +lock claimed/released around the work: + +1. **BUG-015 / SPEC-011** — agent now registers in Windows Programs & Features. Installer-only WiX + change: added an `<Icon>` element (Icon table) + 6 ARP properties + pinned the `AgentBinary` + component GUID, plus a generated multi-resolution `gururmm-agent.ico`. PR #30. +2. 
**Onboarding diagnostic jq bug (`cc5dbdfa`)** — PowerShell `ConvertTo-Json` collapses + single-element arrays to bare objects/strings, which silently dropped the Fixed Volumes table + and errored the network-adapter/local-admin/software-diff lines on single-volume/NIC/admin + machines. Fixed jq-side in the runner (backward-compatible with already-written immutable + baselines). Merged to main (`a735d8c`). Todo marked done. +3. **SSE auth (`06c16144`)** — the public `/api/agents/status-stream` SSE endpoint leaked every + agent UUID + online/offline state. Added a dedicated `SseAuth` extractor that accepts the JWT + via a `?token=` query param (EventSource can't set headers), deliberately NOT broadening the + global `AuthUser` extractor. Three dashboard EventSource callers updated. PR #31. +4. **Agent IP capture (`7459428e`)** — added `agents.local_ips` (JSONB, extracted server-side from + the inventory the agent already sends — no agent change) and `agents.external_ip` (TEXT, stamped + at WS auth from X-Forwarded-For but only when the TCP peer is a configured trusted proxy). + Migration 048 + API + Agent Detail UI. PR #32. + +Three PRs (#30, #31, #32) are stacked for Mike's review. The onboarding fix is live on main. + +## Key Decisions + +- **BUG-015 icon via `<Icon>` element, not `<File>`** — SPEC-011 proposed setting `ARPPRODUCTICON` + to a `<File>` Id and installing the .ico to Program Files. In MSI, `ARPPRODUCTICON` must + reference a row in the **Icon table**, which `<File>` does not populate, so the spec's approach + would leave it dangling. Used the WiX-canonical `<Icon>` element (embeds into the MSI, no disk + install, no extra component). Code Review confirmed this is the materially-correct approach. 
+- **Onboarding fix jq-side, not PS-side** — followed the todo's decision: fixing the runner's jq + is backward-compatible with already-written immutable baselines (which may carry the collapsed + shape), whereas a PS `@()` wrap would only fix future baselines and the diff against old ones + would still break. Used the universal idiom `if type=="array" then . elif .==null then [] else + [.] end`. Extended the fix to inner per-adapter `.ip`/`.dns` arrays (single-IP adapters — the + common case — collapse to a bare string and break `join`), beyond the todo's named sites. +- **SSE auth via dedicated extractor, not global AuthUser** — adding a `?token=` fallback to the + global `AuthUser` would make every authenticated endpoint accept tokens in the URL, broadening + token-in-URL leakage (proxy logs, history, referrers). Scoped a separate `SseAuth` extractor to + the one stream instead. `SseAuth` still tries the Authorization header first for non-browser + clients. Kept scope at "require valid JWT" (closes the public leak); per-org stream filtering is + a separate future change. +- **Agent IP capture needs no agent change** — the agent already ships NIC IPs in + `InventoryReport.network_interfaces`, so `local_ips` is extracted server-side from existing data + (loopback/link-local filtered, deduped). Avoids an agent rebuild/redeploy. `external_ip` is + server-stamped. Better than the todo's literal "agent heartbeat enumerates NICs" — IPs change + rarely so inventory cadence suffices, and `Heartbeat` is a unit variant anyway. +- **external_ip must be trusted-proxy gated** — blindly trusting X-Forwarded-For is spoofable. + Added `RMM_TRUSTED_PROXIES` config (default `172.16.3.20`/Jupiter NPM); XFF is only trusted when + the TCP peer is a known proxy, taking the rightmost-untrusted hop (relies on NPM appending XFF). + Untrusted peers fall back to the direct peer IP. Mirrors GuruConnect's `CONNECT_TRUSTED_PROXIES`. 
+- **Left `06c16144` and `7459428e` todos pending** (not done) — they are in open PRs (#31, #32), + not yet merged/deployed; whoever merges should close them. Matches how BUG-015 was handled. + +## Problems Encountered + +- **SSE-auth Coding Agent drifted** — the first Coding Agent for the SSE fix returned a + *security review* of its own diff instead of an implementation report, and never committed, + pushed, or opened a PR (branch `fix/sse-auth-status-stream` had only uncommitted working-tree + edits). Caught it by inspecting the repo directly. Verified the implementation myself end-to-end + (`cargo check --tests` clean, `cargo test auth` 11/11, `tsc --noEmit` exit 0), then committed + + PR'd via a Gitea Agent. Subsequent agent briefs were given an explicit "IMPLEMENT/COMMIT/PUSH/PR + — do not substitute analysis for delivery" instruction. +- **New clippy warning from SSE fix** — `SseAuth(pub AuthUser)`'s inner field was unread (handler + takes `_auth: SseAuth`), producing a `field 0 is never read` dead-code warning. Resolved with a + scoped `#[allow(dead_code)]` + comment (the AuthUser is retained for future per-org filtering), + keeping the crate's no-new-warnings standard. +- **Gitea PR API needs the internal URL** — Cloudflare fronts the public Gitea host and blocks + non-browser API calls. PRs were opened against `http://172.16.3.20:3000` using the shared + `azcomputerguru` api-token from the SOPS vault (`services/gitea.sops.yaml`, + `credentials.api.api-token`). `gh` does not work (this is Gitea, not GitHub). +- **Branch checkout swapped auth/mod.rs** — creating `feat/agent-ip-capture` off main reverted the + working-tree `auth/mod.rs` to the main version (no SseAuth). This is expected and correct: the + SSE changes live on their own branch (PR #31); the two PRs stay cleanly separated. 
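The trusted-proxy rule for `external_ip` described under Key Decisions can be illustrated with a minimal Rust sketch. This is not the code from PR #32: the function name `resolve_external_ip_sketch`, its signature, and the choice to skip unparseable hops are all assumptions made for illustration. It shows only the stated policy: ignore X-Forwarded-For unless the TCP peer is a trusted proxy, then take the rightmost hop that is not itself trusted.

```rust
use std::net::IpAddr;

/// Hypothetical sketch of the trusted-proxy rule; name and signature are
/// illustrative and not taken from the actual `resolve_external_ip` helper.
fn resolve_external_ip_sketch(peer: IpAddr, xff: Option<&str>, trusted: &[IpAddr]) -> IpAddr {
    // An untrusted peer cannot vouch for X-Forwarded-For, so use the socket address.
    if !trusted.contains(&peer) {
        return peer;
    }
    // Walk the header right to left (proxies append), skipping entries that do
    // not parse (an assumption of this sketch), and return the first hop that
    // is not one of our own proxies. Fall back to the peer if none qualifies.
    xff.into_iter()
        .flat_map(|header| header.rsplit(','))
        .filter_map(|hop| hop.trim().parse::<IpAddr>().ok())
        .find(|ip| !trusted.contains(ip))
        .unwrap_or(peer)
}
```

With `172.16.3.20` as the only trusted proxy, a connection from that proxy carrying `X-Forwarded-For: 10.9.9.9, 203.0.113.7` would resolve to `203.0.113.7`, while the same header arriving directly from an untrusted peer would be ignored in favor of the peer address.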
+ +## Configuration Changes + +Files modified/created across three GuruRMM PRs (submodule `projects/msp-tools/guru-rmm`) and one +ClaudeTools-repo fix: + +**PR #30 — `fix/bug-015-arp-programs-features`:** +- `installer/gururmm-agent.wxs` (modified) — `<Icon>` + 6 ARP `<Property>` + AgentBinary GUID +- `installer/gururmm-agent.ico` (new, binary, 8279 bytes, 16/32/48/256) +- `docs/FEATURE_ROADMAP.md` (modified) — BUG-015 status → Fixed (pending merge) + +**Merged to main — onboarding fix (`a735d8c`):** +- `.claude/scripts/run-onboarding-diagnostic.sh` (modified, ClaudeTools repo) — 6 jq sites normalized + +**PR #31 — `fix/sse-auth-status-stream`:** +- `server/src/auth/mod.rs` (modified) — `authenticate_token` helper + `SseAuth` extractor + tests +- `server/src/api/agents.rs` (modified) — `_auth: SseAuth` on `agent_status_stream`, doc update +- `dashboard/src/pages/{AgentDetail,Agents,SiteDetail}.tsx` (modified) — `?token=` on EventSource + +**PR #32 — `feat/agent-ip-capture`:** +- `server/migrations/048_agent_ip_addresses.sql` (new) — local_ips JSONB, external_ip TEXT, index +- `server/src/config.rs` (modified) — `trusted_proxies` + `parse_trusted_proxies` + tests + warn +- `server/src/db/agents.rs` (modified) — fields on Agent/AgentResponse/AgentWithDetails + helpers +- `server/src/ws/mod.rs` (modified) — `resolve_external_ip` helper + ws_handler wiring + stamping +- `dashboard/src/api/client.ts` (modified) — `local_ips`/`external_ip` on Agent type +- `dashboard/src/pages/AgentDetail.tsx` (modified) — External IP + Local IPs display + +## Credentials & Secrets + +- No new credentials created or discovered this session. +- Gitea PR creation used the existing shared api-token: vault `services/gitea.sops.yaml` field + `credentials.api.api-token` (account: `azcomputerguru`), against internal `http://172.16.3.20:3000`. + +## Infrastructure & Servers + +- GuruRMM server (Pluto): `172.16.3.30` — API/WS `:3001`, coord API `:8001`. 
Dashboard + `rmm.azcomputerguru.com`; API `rmm-api.azcomputerguru.com`. +- Jupiter NPM/openresty reverse proxy: `172.16.3.20` — the trusted proxy for X-Forwarded-For; also + hosts internal Gitea on `:3000`. New `RMM_TRUSTED_PROXIES` config defaults to `172.16.3.20`. +- Builds: agent MSI builds on Pluto via webhook pipeline; `wix` is not installed on Howard-Home + (WiX change validated by XML well-formedness + review only, not a local build). + +## Commands & Outputs + +- `cargo check --tests` (server) — clean, only pre-existing dead-code warnings (~80). +- `cargo test --bin gururmm-server auth` — 11 passed (incl. 4 new `authenticate_logic` tests). +- `cargo test ... external_ip_tests ip_tests trusted_proxy_tests` — 9 passed. +- `npx tsc --noEmit` (dashboard) — exit 0. +- Toolchain on Howard-Home: cargo 1.95.0, node v24.15.0. Server uses runtime sqlx + (`SQLX_OFFLINE=true`, no DB needed to compile). + +## Pending / Incomplete Tasks + +- **PRs #30, #31, #32 awaiting Mike's review/merge.** None merged. On merge, close coord todos + `06c16144` (SSE auth) and `7459428e` (agent IP capture), and flip BUG-015's roadmap checklist box. +- **Post-merge verification:** + - PR #30: after CI builds the MSI on Pluto, verify "GuruRMM Agent" appears in Programs & Features + on a Win10/Win11 test VM before any client rollout. Existing agents only get the ARP entry on + the next MSI upgrade, not via binary auto-update. + - PR #31: verify the deployed dashboard's live agent-status badges still update (EventSource must + carry a valid `?token=`); test a hard refresh. + - PR #32: set `RMM_TRUSTED_PROXIES` if the proxy IP ever differs from `172.16.3.20`. +- **Follow-up todos filed this session:** + - LOW defense-in-depth (assigned howard): onboarding runner `findings[]` consumers still use the + old `// []` pattern — safe only because the probe always emits multiple findings. 
+ - PR #31 body notes: EventSource retries 401 ~every 3s for logged-out/expired users (consider + gating on `useAuth().isAuthenticated`); proxy logs the `?token=` (consider access-log scrubbing + or a short-lived SSE ticket). +- **Not started:** `windows-update-mvp` shape spec (P1, 8 tasks, build-ready) — the remaining big item. + +## Reference Information + +- PRs: #30 (BUG-015), #31 (SSE auth), #32 (agent IP capture) — `azcomputerguru/gururmm` on Gitea. +- Commits: onboarding fix `a735d8c` (main); PR #30 `a27147a`+`41ec355`; PR #31 `fdf39f1`; + PR #32 `a06f4fa`+`791a2df`. +- Coord todos: `cc5dbdfa` (done), `06c16144` (pending/PR #31), `7459428e` (pending/PR #32), + `15a5440f` (largely merged), plus new findings defense-in-depth follow-up. +- Roadmap/specs: `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md`, `docs/specs/SPEC-011-*`, + shape spec `specs/windows-update-mvp/` (plan.md is build-ready, Task 1 = migration). +- Migration sequence: latest applied is 047; PR #32 adds 048. +- Submodule base for all branches: `6f31d22` (= origin/main at session start).
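The server-side `local_ips` extraction noted under Key Decisions (loopback/link-local filtered, deduped) might look roughly like the sketch below. It is an illustration only: `extract_local_ips_sketch` and `is_link_local` are hypothetical names, and the input is assumed to be a flat list of address strings rather than the real `InventoryReport.network_interfaces` shape, which this log does not spell out.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical helper: true for IPv4 169.254.0.0/16 and IPv6 fe80::/10.
fn is_link_local(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_link_local(),
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}

/// Hypothetical sketch: parse address strings, drop loopback and link-local
/// addresses, and remove duplicates while keeping first-seen order.
fn extract_local_ips_sketch(raw: &[&str]) -> Vec<IpAddr> {
    let mut seen = BTreeSet::new();
    let ips: Vec<IpAddr> = raw
        .iter()
        // Entries that do not parse as an IP address are skipped (assumption).
        .filter_map(|s| s.trim().parse::<IpAddr>().ok())
        .filter(|ip| !ip.is_loopback() && !is_link_local(ip))
        // `insert` returns false for an address already seen, which drops it.
        .filter(|ip| seen.insert(*ip))
        .collect();
    ips
}
```

Because the filtering runs on data the agent already reports, a sketch like this fits the "no agent change" decision: only the server needs to be rebuilt.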