sync: auto-sync from GURU-5070 at 2026-05-30 17:59:38

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-30 17:59:38
2026-05-30 17:59:44 -07:00
parent fac3cad672
commit 89a3369097

@@ -411,3 +411,82 @@ Client-IP investigation: the relay logged repeated agent rejections "from 172.16
- guru-connect submodule HEAD: abc55ab. Server component: deployed v0.2.1.
- Deploy memory: .claude/memory/project_guruconnect_deploy.md.
- Verified reject log post-fix: "Agent connection rejected: 795cbc06-... from 98.172.64.243 - invalid API key".
---
## Update: 00:56 PT — GuruConnect/GuruRMM feature specs, RMM CI docs-guard, GC v2 sprint planning
### Session Summary
Started with a clean `/sync` (both repos already in sync). Then handled an infra request: a Pavon machine was hammering the GuruConnect relay with auth failures. Used `/rmm` to identify the offending endpoint — only `DESKTOP-I66IM5Q` (Pavon/Raiders, external IP 98.172.64.243) carried the GuruConnect client; the Curves box did not. Removed it cleanly (killed the running `guruconnect-pavon-raidersreef` process, deleted the GeoVision HKCU Run-key entry, the desktop launcher, and the `C:\Program Files\GuruConnect` copy). Mike pushed back twice that I should match the offending IP to the agent rather than reconning every candidate; the RMM agent record carries no IP fields at all, which became GuruRMM todo `7459428e` (capture local + external agent IPs). Saved a feedback memory.
Answered a Claude Code question (Windows Snipping-Tool clipboard images no longer paste with Alt+V — a confirmed DIB-vs-CF_HDROP regression; copied image files still work). Drafted a `/feedback` writeup and a GitHub issue; Mike submitted the feedback.
Filed a large batch of feature requests as researched specs. GuruConnect (via `/gc-feature-request`): SPEC-003 machine inventory, SPEC-004 stable machine identity + session lifecycle reaping + operator removal, SPEC-005 machines list view (dual Host/Guest indicators + rich rows), SPEC-006 universal machine search, SPEC-007 managed-agent installer builder, SPEC-008 valuable error messages, SPEC-009 feature-rich documented API. GuruRMM (via `/feature-request`): SPEC-018 valuable error messages, SPEC-019 feature-rich documented API, SPEC-020 migrate CI/CD from webhook+shell to Gitea Actions. Pushed everything to Gitea.
Implemented the SPEC-020 Phase-0 interim fix live: added a docs-only build guard to the GuruRMM build webhook handler (`/opt/gururmm/webhook-handler.py` on 172.16.3.30) so pushes touching only `docs/`, `*.md`, `.claude/`, `session-logs/` skip the build. Patched on-host (a local `/tmp` path-mapping bug made the edit round-trip unreliable), backed up the original, unit-tested 12 cases + syntax-checked before deploy, restarted the service, and verified live (a real docs push and a test POST both returned "build skipped", no build locks). Recorded a project memory.
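The guard's decision rule can be sketched roughly as follows. This is a minimal illustration, not the deployed handler: the function names are hypothetical, and it assumes a Gitea-style push payload whose `commits` entries carry `added`, `modified`, and `removed` path lists. Only the four path patterns come from the notes above.

```python
from fnmatch import fnmatchcase

# Path patterns taken from the session notes; everything else here is assumed.
NON_BUILD_PREFIXES = ("docs/", ".claude/", "session-logs/")
NON_BUILD_GLOBS = ("*.md",)


def is_non_buildable(path):
    """True when a changed path can never affect a build artifact."""
    if path.startswith(NON_BUILD_PREFIXES):
        return True
    return any(fnmatchcase(path, pattern) for pattern in NON_BUILD_GLOBS)


def changed_files(payload):
    """Collect every touched path from an (assumed) Gitea-style push payload."""
    files = []
    for commit in payload.get("commits") or []:
        for key in ("added", "modified", "removed"):
            files.extend(commit.get(key) or [])
    return files


def should_skip_build(payload):
    """Skip only when every changed file is provably non-buildable.

    An empty or missing file list is treated as "unknown", so the build
    still runs: the guard fails safe toward building.
    """
    files = changed_files(payload)
    return bool(files) and all(is_non_buildable(f) for f in files)
```

Treating an empty change list as "build" is the conservative choice: a malformed or truncated payload costs one unnecessary build rather than a missed one.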
Closed with GC sprint planning. Mike chose "v2 reset first." While scoping Sprint 0 (the hotfix for the 3 relay-auth CRITICALs), discovered from the git log and the running server that v2 Phase 1 (secure-session-core Tasks 1-7) is ALREADY implemented and DEPLOYED, and the 3 CRITICALs are already closed in production. The roadmap banner written minutes earlier (claiming the bypasses were live) was wrong; corrected it and re-baselined. Created a 5-task tracked list for the actual remainder (verification + code review, not building).
### Key Decisions
- Accepted Mike's correction to identify the offending Pavon endpoint by matching the known external IP rather than reconning all candidates; root-caused that GuruRMM stores no agent IPs and filed todo 7459428e.
- For SPEC-004, made stable machine-derived identity (deterministic `machine_uid`, MachineGuid-based, bound to the per-agent key) the PRIMARY fix per Mike — reaping/removal became defense-in-depth. Flagged that a client-asserted hash is spoofable and must be auth-bound.
- RMM CI: chose the minimal host-script path guard (Phase 0) over migrating to Gitea Actions immediately; the full migration is SPEC-020. Guard is fail-safe toward building (skips only when every changed file is provably non-buildable).
- GC direction: v2 reset first (Mike). Then corrected course on discovering Phase 1 is already done — the planned Sprint 0 CRITICAL hotfix was a no-op. Re-scoped to verification.
- Did NOT patch the stale repo copy `scripts/webhook-handler.py` (109 lines vs deployed 206): doing so would have triggered a wasteful build and implied the copy is maintained, which it is not. Host is source of truth until SPEC-020.
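To illustrate the SPEC-004 decision above (a deterministic `machine_uid` that must be auth-bound rather than merely client-asserted), here is a hedged sketch. The hashing scheme, the namespace string, and the `IdentityRegistry` class are all hypothetical; the spec itself may define these differently.

```python
import hashlib
import hmac


def derive_machine_uid(machine_guid):
    """Derive a stable ID from a Windows MachineGuid (assumed scheme).

    Normalizing case and braces keeps the result deterministic across
    the different ways the GUID may be read or formatted.
    """
    normalized = machine_guid.strip().strip("{}").lower()
    # The namespace prefix is a made-up constant for this sketch.
    data = b"guruconnect-machine-uid:" + normalized.encode("ascii")
    return hashlib.sha256(data).hexdigest()


class IdentityRegistry:
    """Server-side binding of machine_uid to an authenticated per-agent key.

    A bare hash sent by the client proves nothing, so the server only
    accepts a machine_uid for the key it was first bound to.
    """

    def __init__(self):
        self._bound = {}  # agent key id -> machine_uid

    def check(self, agent_key_id, claimed_uid):
        bound = self._bound.setdefault(agent_key_id, claimed_uid)
        return hmac.compare_digest(bound, claimed_uid)
```

In this sketch the first authenticated connection establishes the binding; a later connection presenting the same key with a different `machine_uid` is rejected.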
### Problems Encountered
- Local `/tmp` path mismatch: the editor tools and the Bash shell resolved `/tmp/webhook-handler.py` to different physical files, so `pscp` uploaded the un-edited copy. Resolved by patching the file on-host via a Python script piped over SSH stdin.
- `py_compile` failed writing to root-owned `/tmp/__pycache__` — used `python3 -B` / `ast.parse` instead.
- importlib refused a `.py.new` extension; tested via `exec(open(...).read(), ns)` into a namespace.
- Roadmap banner factual error (CRITICALs "live") — self-introduced from the stale 2026-05-29 audit narrative; caught by reading the actual relay code + git log, then corrected.
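The two Python workarounds above (syntax-checking without writing bytecode, and loading a file whose extension importlib rejects) can be combined as in the sketch below. The function names are illustrative, not taken from the session.

```python
import ast


def syntax_check(path):
    """Parse a file without compiling to disk, so no __pycache__ write occurs.

    Raises SyntaxError if the source is invalid; returns the source text.
    """
    with open(path, encoding="utf-8") as fh:
        source = fh.read()
    ast.parse(source, filename=path)
    return source


def load_namespace(path):
    """Execute a Python file of any extension into a fresh namespace.

    __name__ is set to a non-"__main__" value so that an
    `if __name__ == "__main__":` block in the file does not run.
    """
    source = syntax_check(path)
    namespace = {"__name__": "candidate"}
    exec(compile(source, path, "exec"), namespace)
    return namespace
```

Functions defined in the candidate file can then be pulled out of the returned dict and unit-tested before the file replaces the live one. Note that `exec` runs the file's top-level code, so this is only appropriate for files you already trust.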
### Configuration Changes
- guru-connect repo: added `docs/specs/SPEC-003..009*.md`; edited `docs/FEATURE_ROADMAP.md` (entries, v2-first banner, then v2 re-baseline correction).
- guru-rmm repo: added `docs/specs/SPEC-018/019/020*.md`; edited `docs/FEATURE_ROADMAP.md`.
- claudetools (parent): submodule pointer bumps for both; new memories `.claude/memory/feedback_rmm_identify_by_ip.md`, `.claude/memory/project_rmm_webhook_docs_guard.md`; updated `.claude/memory/MEMORY.md`.
- BUILD HOST 172.16.3.30 (NOT in git): `/opt/gururmm/webhook-handler.py` patched with the docs-only guard; backup `/opt/gururmm/webhook-handler.py.bak-20260530-guard`.
- Endpoint `DESKTOP-I66IM5Q`: removed GuruConnect client (Run-key, desktop exe, Program Files dir).
### Credentials & Secrets
- Build/host SSH used: `guru@172.16.3.30:22` — already vaulted at `infrastructure/gururmm-server.sops.yaml` (sudo password same as SSH). No new secrets created.
- RMM API admin creds: `infrastructure/gururmm-server.sops.yaml` `credentials.gururmm-api.*`.
- Gitea webhook secret `gururmm-build-secret`: `projects/gururmm/ci-cd.sops.yaml`.
### Infrastructure & Servers
- 172.16.3.30 (Ubuntu 22.04): hosts BOTH GuruConnect (`guruconnect.service`, listening :3002, deployed checkout `abc55ab`) and GuruRMM (server :3001, build host). GuruRMM build webhook: `gururmm-webhook.service` → `/opt/gururmm/webhook-handler.py` (binds 127.0.0.1:9000, nginx proxies `/webhook/build`); per-platform builds via `build-shared.sh` + `build-{linux,windows,mac}.sh`; Pluto (172.16.3.36) does the Windows/MSI build over SSH.
- Gitea internal: `http://172.16.3.20:3000` (preferred on-network).
- GC v2 secure-session-core: Tasks 1-7 committed; CRITICALs closed in deployed prod (verified `abc55ab` descends from CRITICAL#1 fix `a453e79` + Task 7 `f9bdecb`).
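The session does not record which command was used for the ancestry check above. One way to repeat it is `git merge-base --is-ancestor`, wrapped here in a small helper. The helper name and the injectable `run` parameter are illustrative only.

```python
import subprocess


def is_ancestor(ancestor, descendant, repo=".", run=subprocess.run):
    """Return True if `ancestor` is reachable from `descendant`.

    `git merge-base --is-ancestor` exits 0 when it is an ancestor and 1
    when it is not; any other status is an error (for example, an
    unknown commit).
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", ancestor, descendant]
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

Run against the guru-connect checkout, `is_ancestor("a453e79", "abc55ab")` and `is_ancestor("f9bdecb", "abc55ab")` should both hold if the deployed commit contains those fixes.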
### Commands & Outputs
- RMM GuruConnect removal verified: `guruconnect procs running: none`, Run-key gone, files deleted.
- Webhook guard live test: docs-only POST → `Docs-only change -- build skipped`; non-main ref → `Ignored push`; no build locks; `last-built-commit` unchanged (`ef0830f`).
- GC prod check: `guruconnect.service active running`, `ss -tlnp` shows `:3002 guruconnect-ser pid 1287186`.
### Pending / Incomplete Tasks
Tracked list (TaskCreate #1-5) — the real GC Phase-1-exit remainder:
1. Code-review secure-session-core Tasks 3-5 (review still pending; the code was written without a compiler and has since been built and deployed). Highest priority.
2. Security re-audit — `/gc-audit --pass=security` + 4 manual CRITICAL checks.
3. Functional verification — consent flow, key fidelity (Win+R/clipboard/Ctrl+Alt+Del/no stuck modifiers), rate limiting, fresh-DB migrations. Needs a real Windows desktop.
4. Live HW-H.264 validation — GPU needed on the AGENT (encode: QuickSync/NVENC/AMF) and the VIEWER (decode); server needs NO GPU. Then flip `DEFAULT_PREFER_H264`. Non-blocking (raw is default).
5. Retire deprecated shared `AGENT_API_KEY` fallback — GATED on confirming zero agents depend on it.
Other open threads:
- SPEC-020 (RMM CI → Gitea Actions) staged as a spec; Phase-0 guard is live. Ratify as RMM ADR-009 when started.
- GuruRMM todo `7459428e` — capture agent local + external IPs.
- GC SPEC-003..009 fold into v2 Phase 2/3 (annotated on the roadmap).
### Reference Information
- GC commits: SPEC-003 `abf499c` → SPEC-009 `7ab8738`; roadmap v2 banner `03f62d4`; v2 re-baseline `786d3e4`.
- RMM commits: SPEC-018 `be2b6f0`, SPEC-019 `ef0830f`, SPEC-020 `950fa08`.
- GC secure-session-core: plan at `specs/v2-secure-session-core/plan.md`; Tasks: 1 `fef8111`, 2 `41691bf`, 3 `0f25878`, CRITICAL#1 split `a453e79`, 4 `bfcdbb5`, 5 `9082e11`, 6 `bb73ba6`, 7 `f9bdecb`.
- Pavon/Raiders endpoint: `DESKTOP-I66IM5Q`, WAN 98.172.64.243; Curves: `DESKTOP-VRBQ6LM`, WAN 174.78.94.186 / LAN 192.168.1.128 / MAC 04:42:1A:0C:8C:A6.
- Claude Code clipboard regression: Alt+V is the correct Windows binding; DIB bitmap (Snipping Tool) fails, CF_HDROP file paste works; CLI v2.1.158.