From 89a3369097bb5cfb137c74ab16fb0a53ec46da74 Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Sat, 30 May 2026 17:59:44 -0700
Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-05-30 17:59:38

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-30 17:59:38
---
 session-logs/2026-05-30-session.md | 79 ++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/session-logs/2026-05-30-session.md b/session-logs/2026-05-30-session.md
index 22e4715..9aeb1f8 100644
--- a/session-logs/2026-05-30-session.md
+++ b/session-logs/2026-05-30-session.md
@@ -411,3 +411,82 @@ Client-IP investigation: the relay logged repeated agent rejections "from 172.16
 - guru-connect submodule HEAD: abc55ab. Server component: deployed v0.2.1.
 - Deploy memory: .claude/memory/project_guruconnect_deploy.md.
 - Verified reject log post-fix: "Agent connection rejected: 795cbc06-... from 98.172.64.243 - invalid API key".
+
+---
+
+## Update: 00:56 PT — GuruConnect/GuruRMM feature specs, RMM CI docs-guard, GC v2 sprint planning
+
+### Session Summary
+
+Started with a clean `/sync` (both repos already in sync). Then handled an infra request: a Pavon machine was hammering the GuruConnect relay with auth failures. Used `/rmm` to identify the offending endpoint — only `DESKTOP-I66IM5Q` (Pavon/Raiders, external IP 98.172.64.243) carried the GuruConnect client; the Curves box did not. Removed it cleanly (killed the running `guruconnect-pavon-raidersreef` process, deleted the GeoVision HKCU Run-key entry, the desktop launcher, and the `C:\Program Files\GuruConnect` copy). Mike pushed back twice that I should match the offending IP to the agent rather than reconning every candidate; the RMM agent record carries no IP fields at all, which became GuruRMM todo `7459428e` (capture local + external agent IPs). Saved a feedback memory.
+
+Answered a Claude Code question (Windows Snipping-Tool clipboard images no longer paste with Alt+V — a confirmed DIB-vs-CF_HDROP regression; copied image files still work). Drafted a `/feedback` writeup and a GitHub issue; Mike submitted the feedback.
+
+Filed a large batch of feature requests as researched specs. GuruConnect (via `/gc-feature-request`): SPEC-003 machine inventory, SPEC-004 stable machine identity + session lifecycle reaping + operator removal, SPEC-005 machines list view (dual Host/Guest indicators + rich rows), SPEC-006 universal machine search, SPEC-007 managed-agent installer builder, SPEC-008 valuable error messages, SPEC-009 feature-rich documented API. GuruRMM (via `/feature-request`): SPEC-018 valuable error messages, SPEC-019 feature-rich documented API, SPEC-020 migrate CI/CD from webhook+shell to Gitea Actions. Pushed everything to Gitea.
+
+Implemented the SPEC-020 Phase-0 interim fix live: added a docs-only build guard to the GuruRMM build webhook handler (`/opt/gururmm/webhook-handler.py` on 172.16.3.30) so pushes touching only `docs/`, `*.md`, `.claude/`, `session-logs/` skip the build. Patched on-host (a local `/tmp` path-mapping bug made the edit round-trip unreliable), backed up the original, unit-tested 12 cases + syntax-checked before deploy, restarted the service, and verified live (a real docs push and a test POST both returned "build skipped", no build locks). Recorded a project memory.
+
+Closed with GC sprint planning. Mike chose "v2 reset first." While scoping Sprint 0 (the 3 relay-auth CRITICAL hotfix), discovered from the git log and the running server that v2 Phase 1 (secure-session-core Tasks 1-7) is ALREADY implemented and DEPLOYED, and the 3 CRITICALs are already closed in production. The roadmap banner written minutes earlier (claiming the bypasses were live) was wrong; corrected it and re-baselined. Created a 5-task tracked list for the actual remainder (verification + code review, not building).
+
+### Key Decisions
+
+- Accepted Mike's correction to identify the offending Pavon endpoint by matching the known external IP rather than reconning all candidates; root-caused that GuruRMM stores no agent IPs and filed todo 7459428e.
+- For SPEC-004, made stable machine-derived identity (deterministic `machine_uid`, MachineGuid-based, bound to the per-agent key) the PRIMARY fix per Mike — reaping/removal became defense-in-depth. Flagged that a client-asserted hash is spoofable and must be auth-bound.
+- RMM CI: chose the minimal host-script path guard (Phase 0) over migrating to Gitea Actions immediately; the full migration is SPEC-020. Guard is fail-safe toward building (skips only when every changed file is provably non-buildable).
+- GC direction: v2 reset first (Mike). Then corrected course on discovering Phase 1 is already done — the planned Sprint 0 CRITICAL hotfix was a no-op. Re-scoped to verification.
+- Did NOT patch the stale repo copy `scripts/webhook-handler.py` (109 lines vs deployed 206) — would have triggered a wasteful build and implied maintenance it lacks. Host is source of truth until SPEC-020.
+
+### Problems Encountered
+
+- Local `/tmp` path mismatch: the editor tools and the Bash shell resolved `/tmp/webhook-handler.py` to different physical files, so `pscp` uploaded the un-edited copy. Resolved by patching the file on-host via a Python script piped over SSH stdin.
+- `py_compile` failed writing to root-owned `/tmp/__pycache__` — used `python3 -B` / `ast.parse` instead.
+- importlib refused a `.py.new` extension; tested via `exec(open(...).read(), ns)` into a namespace.
+- Roadmap banner factual error (CRITICALs "live") — self-introduced from the stale 2026-05-29 audit narrative; caught by reading the actual relay code + git log, then corrected.
+
+### Configuration Changes
+
+- guru-connect repo: added `docs/specs/SPEC-003..009*.md`; edited `docs/FEATURE_ROADMAP.md` (entries, v2-first banner, then v2 re-baseline correction).
+- guru-rmm repo: added `docs/specs/SPEC-018/019/020*.md`; edited `docs/FEATURE_ROADMAP.md`.
+- claudetools (parent): submodule pointer bumps for both; new memories `.claude/memory/feedback_rmm_identify_by_ip.md`, `.claude/memory/project_rmm_webhook_docs_guard.md`; updated `.claude/memory/MEMORY.md`.
+- BUILD HOST 172.16.3.30 (NOT in git): `/opt/gururmm/webhook-handler.py` patched with the docs-only guard; backup `/opt/gururmm/webhook-handler.py.bak-20260530-guard`.
+- Endpoint `DESKTOP-I66IM5Q`: removed GuruConnect client (Run-key, desktop exe, Program Files dir).
+
+### Credentials & Secrets
+
+- Build/host SSH used: `guru@172.16.3.30:22` — already vaulted at `infrastructure/gururmm-server.sops.yaml` (sudo password same as SSH). No new secrets created.
+- RMM API admin creds: `infrastructure/gururmm-server.sops.yaml` `credentials.gururmm-api.*`.
+- Gitea webhook secret `gururmm-build-secret`: `projects/gururmm/ci-cd.sops.yaml`.
+
+### Infrastructure & Servers
+
+- 172.16.3.30 (Ubuntu 22.04) — hosts BOTH GuruConnect (`guruconnect.service`, listening :3002, deployed checkout `abc55ab`) and GuruRMM (server :3001, build host). GuruRMM build webhook: `gururmm-webhook.service` → `/opt/gururmm/webhook-handler.py` (binds 127.0.0.1:9000, nginx proxies `/webhook/build`); per-platform builds via `build-shared.sh` + `build-{linux,windows,mac}.sh`; Pluto (172.16.3.36) does the Windows/MSI build over SSH.
+- Gitea internal: `http://172.16.3.20:3000` (preferred on-network).
+- GC v2 secure-session-core: Tasks 1-7 committed; CRITICALs closed in deployed prod (verified `abc55ab` descends from CRITICAL#1 fix `a453e79` + Task 7 `f9bdecb`).
+
+### Commands & Outputs
+
+- RMM GuruConnect removal verified: `guruconnect procs running: none`, Run-key gone, files deleted.
+- Webhook guard live test: docs-only POST → `Docs-only change -- build skipped`; non-main ref → `Ignored push`; no build locks; `last-built-commit` unchanged (`ef0830f`).
+- GC prod check: `guruconnect.service active running`, `ss -tlnp` shows `:3002 guruconnect-ser pid 1287186`.
+
+### Pending / Incomplete Tasks
+
+Tracked list (TaskCreate #1-5) — the real GC Phase-1-exit remainder:
+1. Code-review secure-session-core Tasks 3-5 (pending review; written without a compiler, since built+deployed). Highest priority.
+2. Security re-audit — `/gc-audit --pass=security` + 4 manual CRITICAL checks.
+3. Functional verification — consent flow, key fidelity (Win+R/clipboard/Ctrl+Alt+Del/no stuck modifiers), rate limiting, fresh-DB migrations. Needs a real Windows desktop.
+4. Live HW-H.264 validation — GPU needed on the AGENT (encode: QuickSync/NVENC/AMF) and the VIEWER (decode); server needs NO GPU. Then flip `DEFAULT_PREFER_H264`. Non-blocking (raw is default).
+5. Retire deprecated shared `AGENT_API_KEY` fallback — GATED on confirming zero agents depend on it.
+
+Other open threads:
+- SPEC-020 (RMM CI → Gitea Actions) staged as a spec; Phase-0 guard is live. Ratify as RMM ADR-009 when started.
+- GuruRMM todo `7459428e` — capture agent local + external IPs.
+- GC SPEC-003..009 fold into v2 Phase 2/3 (annotated on the roadmap).
+
+### Reference Information
+
+- GC commits: SPEC-003 `abf499c` → SPEC-009 `7ab8738`; roadmap v2 banner `03f62d4`; v2 re-baseline `786d3e4`.
+- RMM commits: SPEC-018 `be2b6f0`, SPEC-019 `ef0830f`, SPEC-020 `950fa08`.
+- GC secure-session-core: plan at `specs/v2-secure-session-core/plan.md`; Tasks: 1 `fef8111`, 2 `41691bf`, 3 `0f25878`, CRITICAL#1 split `a453e79`, 4 `bfcdbb5`, 5 `9082e11`, 6 `bb73ba6`, 7 `f9bdecb`.
+- Pavon/Raiders endpoint: `DESKTOP-I66IM5Q`, WAN 98.172.64.243; Curves: `DESKTOP-VRBQ6LM`, WAN 174.78.94.186 / LAN 192.168.1.128 / MAC 04:42:1A:0C:8C:A6.
+- Claude Code clipboard regression: Alt+V is the correct Windows binding; DIB bitmap (Snipping Tool) fails, CF_HDROP file paste works; CLI v2.1.158.
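
The docs-only guard described in this log (skip the build only when every changed path sits under `docs/`, `.claude/`, `session-logs/`, or matches `*.md`) could be sketched roughly as follows. This is not the deployed `/opt/gururmm/webhook-handler.py`: the function and constant names are hypothetical, and the sketch assumes the push payload lists changed paths under each commit's `added`, `removed`, and `modified` keys.

```python
from fnmatch import fnmatchcase

# Prefixes and pattern come from the session log; the names are illustrative.
DOC_PREFIXES = ("docs/", ".claude/", "session-logs/")
DOC_PATTERNS = ("*.md",)


def is_non_buildable(path):
    """Return True only when a path is provably documentation-only."""
    if not isinstance(path, str) or not path:
        return False
    if path.startswith(DOC_PREFIXES):
        return True
    # fnmatchcase lets '*' span '/' so nested markdown files also match.
    return any(fnmatchcase(path, pattern) for pattern in DOC_PATTERNS)


def changed_paths(payload):
    """Collect every path named by the push payload's commits.

    Assumption: each commit dict carries 'added', 'removed' and
    'modified' lists of repository-relative paths.
    """
    paths = []
    for commit in payload.get("commits") or []:
        for key in ("added", "removed", "modified"):
            paths.extend(commit.get(key) or [])
    return paths


def should_skip_build(payload):
    """Skip only when at least one path changed and all are docs-only.

    An empty change list returns False, so the guard fails safe
    toward building.
    """
    paths = changed_paths(payload)
    return bool(paths) and all(is_non_buildable(p) for p in paths)
```

Under these assumptions, a push that modifies only `docs/specs/SPEC-020.md` and `README.md` would be skipped, while one that also touches a source file, or one whose change list is empty or missing, would still build. That matches the fail-safe-toward-building decision recorded in the log.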