claudetools/session-logs/2026-05-31-howard-gururmm-roadmap.md
Howard Enos 63e46eacdf sync: auto-sync from HOWARD-HOME at 2026-05-31 15:22:41
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-31 15:22:41
2026-05-31 15:22:53 -07:00

Session Log — 2026-05-31 — Howard — GuruRMM roadmap: BUG-015, onboarding jq fix, SSE auth, agent IP capture

User

  • User: Howard Enos (howard)
  • Machine: Howard-Home
  • Role: tech

Session Summary

Worked through the GuruRMM roadmap/bug/todo backlog and completed four items: one merged to main and three in open PRs. Started by pulling live coord state (no active locks; reviewed the pending gururmm todos) and the FEATURE_ROADMAP. Confirmed the previously-assigned quick-wins todo (15a5440f) was already largely merged (def0d34 on main; BUG-008/009/010/011 marked Fixed), leaving BUG-015 as the only open item from that tier. Submodule was even with origin/main at 6f31d22.

Each item got its own branch and PR, except the onboarding fix, which went straight to main. Each was gated through a Coding Agent → Code Review Agent flow, with a coordination lock claimed before the work and released after it:

  1. BUG-015 / SPEC-011 — agent now registers in Windows Programs & Features. Installer-only WiX change: added an <Icon> element (Icon table) + 6 ARP properties + pinned the AgentBinary component GUID, plus a generated multi-resolution gururmm-agent.ico. PR #30.
  2. Onboarding diagnostic jq bug (cc5dbdfa) — PowerShell ConvertTo-Json collapses single-element arrays to bare objects/strings, which silently dropped the Fixed Volumes table and errored the network-adapter/local-admin/software-diff lines on single-volume/NIC/admin machines. Fixed jq-side in the runner (backward-compatible with already-written immutable baselines). Merged to main (a735d8c). Todo marked done.
  3. SSE auth (06c16144) — the public /api/agents/status-stream SSE endpoint leaked every agent UUID + online/offline state. Added a dedicated SseAuth extractor that accepts the JWT via a ?token= query param (EventSource can't set headers), deliberately NOT broadening the global AuthUser extractor. Three dashboard EventSource callers updated. PR #31.
  4. Agent IP capture (7459428e) — added agents.local_ips (JSONB, extracted server-side from the inventory the agent already sends — no agent change) and agents.external_ip (TEXT, stamped at WS auth from X-Forwarded-For but only when the TCP peer is a configured trusted proxy). Migration 048 + API + Agent Detail UI. PR #32.

Three PRs (#30, #31, #32) are open and awaiting Mike's review; each branches independently off main. The onboarding fix is live on main.

Key Decisions

  • BUG-015 icon via <Icon> element, not <File> — SPEC-011 proposed setting ARPPRODUCTICON to a <File> Id and installing the .ico to Program Files. In MSI, ARPPRODUCTICON must reference a row in the Icon table, which <File> does not populate, so the spec's approach would leave it dangling. Used the WiX-canonical <Icon> element (embeds into the MSI, no disk install, no extra component). Code Review confirmed this is the materially-correct approach.
  • Onboarding fix jq-side, not PS-side — followed the todo's decision: fixing the runner's jq is backward-compatible with already-written immutable baselines (which may carry the collapsed shape), whereas a PS @() wrap would only fix future baselines and the diff against old ones would still break. Used the universal idiom if type=="array" then . elif .==null then [] else [.] end. Extended the fix to inner per-adapter .ip/.dns arrays (single-IP adapters — the common case — collapse to a bare string and break join), beyond the todo's named sites.
  • SSE auth via dedicated extractor, not global AuthUser — adding a ?token= fallback to the global AuthUser would make every authenticated endpoint accept tokens in the URL, broadening token-in-URL leakage (proxy logs, history, referrers). Scoped a separate SseAuth extractor to the one stream instead. SseAuth still tries the Authorization header first for non-browser clients. Kept scope at "require valid JWT" (closes the public leak); per-org stream filtering is a separate future change.
  • Agent IP capture needs no agent change — the agent already ships NIC IPs in InventoryReport.network_interfaces, so local_ips is extracted server-side from existing data (loopback/link-local filtered, deduped). Avoids an agent rebuild/redeploy. external_ip is server-stamped. Better than the todo's literal "agent heartbeat enumerates NICs" — IPs change rarely so inventory cadence suffices, and Heartbeat is a unit variant anyway.
  • external_ip must be trusted-proxy gated — blindly trusting X-Forwarded-For is spoofable. Added RMM_TRUSTED_PROXIES config (default 172.16.3.20/Jupiter NPM); XFF is only trusted when the TCP peer is a known proxy, taking the rightmost-untrusted hop (relies on NPM appending XFF). Untrusted peers fall back to the direct peer IP. Mirrors GuruConnect's CONNECT_TRUSTED_PROXIES.
  • Left 06c16144 and 7459428e todos pending (not done) — they are in open PRs (#31, #32), not yet merged/deployed; whoever merges should close them. Matches how BUG-015 was handled.
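
The trusted-proxy gating described above can be sketched as follows. This is a minimal illustration, not the code in PR #32: the function name resolve_external_ip_sketch, its signature, and the exact-match list of proxy addresses are assumptions made for the example (the real RMM_TRUSTED_PROXIES parsing may differ, for instance by accepting CIDR ranges).

```rust
use std::net::IpAddr;

/// Hypothetical sketch: choose the IP to record for a connection.
/// `peer` is the TCP peer address, `xff` the raw X-Forwarded-For value,
/// and `trusted` the configured proxy addresses (exact match only here).
fn resolve_external_ip_sketch(peer: IpAddr, xff: Option<&str>, trusted: &[IpAddr]) -> IpAddr {
    // An untrusted peer could have forged the header, so ignore it entirely.
    if !trusted.contains(&peer) {
        return peer;
    }
    let Some(header) = xff else { return peer };

    // Walk the hops right to left. Entries appended by trusted proxies are
    // skipped; the first address that is not a trusted proxy is the client.
    for hop in header.split(',').rev() {
        match hop.trim().parse::<IpAddr>() {
            Ok(ip) if trusted.contains(&ip) => continue,
            Ok(ip) => return ip,
            // Stop at a malformed hop: anything further left is client-supplied.
            Err(_) => break,
        }
    }
    // Header was empty, malformed, or listed only trusted proxies.
    peer
}
```

With 172.16.3.20 as the only trusted proxy, a header of "1.2.3.4, 203.0.113.7" arriving from that proxy resolves to 203.0.113.7, because the leftmost entry could have been supplied by the client. The same header arriving from any other peer is ignored and the peer address is recorded instead.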

Problems Encountered

  • SSE-auth Coding Agent drifted — the first Coding Agent for the SSE fix returned a security review of its own diff instead of an implementation report, and never committed, pushed, or opened a PR (branch fix/sse-auth-status-stream had only uncommitted working-tree edits). Caught it by inspecting the repo directly. Verified the implementation myself end-to-end (cargo check --tests clean, cargo test auth 11/11, tsc --noEmit exit 0), then committed + PR'd via a Gitea Agent. Subsequent agent briefs were given an explicit "IMPLEMENT/COMMIT/PUSH/PR — do not substitute analysis for delivery" instruction.
  • New clippy warning from SSE fix: the inner field of SseAuth(pub AuthUser) was unread (the handler takes _auth: SseAuth), producing a "field 0 is never read" dead-code warning. Resolved with a scoped #[allow(dead_code)] plus a comment (the AuthUser is retained for future per-org filtering), keeping the crate's no-new-warnings standard.
  • Gitea PR API needs the internal URL — Cloudflare fronts the public Gitea host and blocks non-browser API calls. PRs were opened against http://172.16.3.20:3000 using the shared azcomputerguru api-token from the SOPS vault (services/gitea.sops.yaml, credentials.api.api-token). gh does not work (this is Gitea, not GitHub).
  • Branch checkout swapped auth/mod.rs — creating feat/agent-ip-capture off main reverted the working-tree auth/mod.rs to the main version (no SseAuth). This is expected and correct: the SSE changes live on their own branch (PR #31); the two PRs stay cleanly separated.

Configuration Changes

Files modified/created across three GuruRMM PRs (submodule projects/msp-tools/guru-rmm) and one ClaudeTools-repo fix:

PR #30 — fix/bug-015-arp-programs-features:

  • installer/gururmm-agent.wxs (modified) — <Icon> + 6 ARP <Property> + AgentBinary GUID
  • installer/gururmm-agent.ico (new, binary, 8279 bytes, 16/32/48/256)
  • docs/FEATURE_ROADMAP.md (modified) — BUG-015 status → Fixed (pending merge)

Merged to main — onboarding fix (a735d8c):

  • .claude/scripts/run-onboarding-diagnostic.sh (modified, ClaudeTools repo) — 6 jq sites normalized
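
The actual fix is written in jq inside the runner script. For readers more at home in the server's language, the same normalization rule (array stays as is, null becomes empty, anything else is wrapped) can be sketched in Rust. The Json enum and the helper names here are hypothetical stand-ins for a real JSON type and do not appear in the repository.

```rust
/// Hypothetical stand-in for a JSON value; a real runner would use a full JSON type.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Str(String),
    Array(Vec<Json>),
}

/// Mirror of the jq idiom:
///   if type=="array" then . elif .==null then [] else [.] end
fn as_array(value: Json) -> Vec<Json> {
    match value {
        Json::Array(items) => items, // already a list: keep it
        Json::Null => Vec::new(),    // missing value: empty list
        other => vec![other],        // collapsed single element: re-wrap it
    }
}

/// Join string entries after normalizing, so a bare string and a
/// one-element array produce the same output.
fn join_strings(value: Json, sep: &str) -> String {
    as_array(value)
        .into_iter()
        .filter_map(|v| match v {
            Json::Str(s) => Some(s),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join(sep)
}
```

Because the normalization happens on the reading side, a baseline written with the collapsed shape and one written with a proper array are handled identically, which is why the runner-side fix stays compatible with baselines that already exist.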

PR #31 — fix/sse-auth-status-stream:

  • server/src/auth/mod.rs (modified) — authenticate_token helper + SseAuth extractor + tests
  • server/src/api/agents.rs (modified) — _auth: SseAuth on agent_status_stream, doc update
  • dashboard/src/pages/{AgentDetail,Agents,SiteDetail}.tsx (modified) — ?token= on EventSource
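
The token lookup order for the stream (Authorization header first, then the ?token= query parameter) can be sketched with plain string handling. This is an illustration only: sse_token is a hypothetical helper, the real SseAuth extractor is built on the server's web framework, and JWT validation (the authenticate_token step) is deliberately left out.

```rust
/// Hypothetical sketch: find the raw token for an SSE request.
/// `authorization` is the Authorization header value, `query` the raw
/// query string without the leading '?'. Validation happens elsewhere.
fn sse_token(authorization: Option<&str>, query: Option<&str>) -> Option<String> {
    // Non-browser clients can set headers, so prefer the Authorization header.
    if let Some(token) = authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        let token = token.trim();
        if !token.is_empty() {
            return Some(token.to_string());
        }
    }
    // Browser EventSource cannot set headers, so fall back to ?token=.
    // No percent-decoding is done here; that is a simplification of this sketch.
    query?
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, value)| *key == "token" && !value.is_empty())
        .map(|(_, value)| value.to_string())
}
```

Keeping this lookup in a dedicated extractor means only the status stream accepts a token in the URL; every other endpoint continues to require the header.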

PR #32 — feat/agent-ip-capture:

  • server/migrations/048_agent_ip_addresses.sql (new) — local_ips JSONB, external_ip TEXT, index
  • server/src/config.rs (modified) — trusted_proxies + parse_trusted_proxies + tests + warn
  • server/src/db/agents.rs (modified) — fields on Agent/AgentResponse/AgentWithDetails + helpers
  • server/src/ws/mod.rs (modified) — resolve_external_ip helper + ws_handler wiring + stamping
  • dashboard/src/api/client.ts (modified) — local_ips/external_ip on Agent type
  • dashboard/src/pages/AgentDetail.tsx (modified) — External IP + Local IPs display
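
The server-side extraction of local_ips (loopback and link-local addresses filtered out, duplicates removed) can be sketched as below. The function names and the input shape are assumptions: the example takes a flat list of address strings, whereas the real code reads them from InventoryReport.network_interfaces, whose structure is not shown in this log.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// True for addresses worth showing: not loopback and not link-local.
fn is_reportable(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => !(v4.is_loopback() || v4.is_link_local()),
        IpAddr::V6(v6) => {
            // fe80::/10 is the IPv6 link-local range.
            let link_local = (v6.segments()[0] & 0xffc0) == 0xfe80;
            !(v6.is_loopback() || link_local)
        }
    }
}

/// Hypothetical sketch: parse, filter, and dedupe addresses,
/// keeping the first-seen order. Unparseable entries are skipped.
fn extract_local_ips(raw: &[&str]) -> Vec<IpAddr> {
    let mut seen = HashSet::new();
    raw.iter()
        .filter_map(|s| s.trim().parse::<IpAddr>().ok())
        .filter(is_reportable)
        .filter(|ip| seen.insert(*ip)) // insert returns false for repeats
        .collect()
}
```

Given the inputs 127.0.0.1, 192.168.1.10, 169.254.3.4, 192.168.1.10, fe80::1 and 10.0.0.5, the function returns 192.168.1.10 followed by 10.0.0.5.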

Credentials & Secrets

  • No new credentials created or discovered this session.
  • Gitea PR creation used the existing shared api-token: vault services/gitea.sops.yaml field credentials.api.api-token (account: azcomputerguru), against internal http://172.16.3.20:3000.

Infrastructure & Servers

  • GuruRMM server (Pluto): 172.16.3.30 — API/WS :3001, coord API :8001. Dashboard rmm.azcomputerguru.com; API rmm-api.azcomputerguru.com.
  • Jupiter NPM/openresty reverse proxy: 172.16.3.20 — the trusted proxy for X-Forwarded-For; also hosts internal Gitea on :3000. New RMM_TRUSTED_PROXIES config defaults to 172.16.3.20.
  • Builds: agent MSI builds on Pluto via webhook pipeline; wix is not installed on Howard-Home (WiX change validated by XML well-formedness + review only, not a local build).

Commands & Outputs

  • cargo check --tests (server) — clean, only pre-existing dead-code warnings (~80).
  • cargo test --bin gururmm-server auth — 11 passed (incl. 4 new authenticate_logic tests).
  • cargo test ... external_ip_tests ip_tests trusted_proxy_tests — 9 passed.
  • npx tsc --noEmit (dashboard) — exit 0.
  • Toolchain on Howard-Home: cargo 1.95.0, node v24.15.0. Server uses runtime sqlx (SQLX_OFFLINE=true, no DB needed to compile).

Pending / Incomplete Tasks

  • PRs #30, #31, #32 awaiting Mike's review/merge. None merged. On merge, close coord todos 06c16144 (SSE auth) and 7459428e (agent IP capture), and flip BUG-015's roadmap checklist box.
  • Post-merge verification:
    • PR #30: after CI builds the MSI on Pluto, verify "GuruRMM Agent" appears in Programs & Features on a Win10/Win11 test VM before any client rollout. Existing agents only get the ARP entry on the next MSI upgrade, not via binary auto-update.
    • PR #31: verify the deployed dashboard's live agent-status badges still update (EventSource must carry a valid ?token=); test a hard refresh.
    • PR #32: set RMM_TRUSTED_PROXIES if the proxy IP ever differs from 172.16.3.20.
  • Follow-up todos filed this session:
    • LOW defense-in-depth (assigned howard): onboarding runner findings[] consumers still use the old // [] pattern — safe only because the probe always emits multiple findings.
    • PR #31 body notes: EventSource retries 401 ~every 3s for logged-out/expired users (consider gating on useAuth().isAuthenticated); proxy logs the ?token= (consider access-log scrubbing or a short-lived SSE ticket).
  • Not started: windows-update-mvp shape spec (P1, 8 tasks, build-ready) — the remaining big item.

Reference Information

  • PRs: #30 (BUG-015), #31 (SSE auth), #32 (agent IP capture) — azcomputerguru/gururmm on Gitea.
  • Commits: onboarding fix a735d8c (main); PR #30 a27147a+41ec355; PR #31 fdf39f1; PR #32 a06f4fa+791a2df.
  • Coord todos: cc5dbdfa (done), 06c16144 (pending/PR #31), 7459428e (pending/PR #32), 15a5440f (largely merged), plus the new findings[] defense-in-depth follow-up filed this session.
  • Roadmap/specs: projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md, docs/specs/SPEC-011-*, shape spec specs/windows-update-mvp/ (plan.md is build-ready, Task 1 = migration).
  • Migration sequence: latest applied is 047; PR #32 adds 048.
  • Submodule base for all branches: 6f31d22 (= origin/main at session start).