sync: auto-sync from GURU-5070 at 2026-05-30 15:26:54

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-05-30 15:26:54
2026-05-30 15:27:00 -07:00
parent 40a2eb4c60
commit 3895aa363c
2 changed files with 64 additions and 2 deletions

@@ -41,8 +41,11 @@ GOTCHAS (all hit on the 2026-05-30 deploy):
`WatchdogSec=30s` — so do NOT run `setup-systemd.sh` / copy the repo unit, or v2 restart-loops
every 30s. Unit: User=guru, EnvironmentFile=server/.env, WorkingDirectory=server/, ProtectSystem=strict.
- **`CONNECT_TRUSTED_PROXIES`** is a v2 env var (comma-separated IPs; defaults to loopback fail-closed).
NPM proxies from `172.16.3.30`, so set `CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.30` in
`server/.env` or client-IP extraction (rate-limit-per-IP, logging) is wrong. Only `JWT_SECRET` is hard-required.
Public `connect.azcomputerguru.com` ingresses through **NPM on Jupiter (172.16.3.20)**, which forwards to
the relay on 172.16.3.30:3002. So set `CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20` in `server/.env`
(the Jupiter NPM hop, NOT the relay host .30 — that was a wrong first guess). Without trusting 172.16.3.20
the relay logs every public agent as 172.16.3.20 instead of reading X-Forwarded-For; with it, the real client
IP shows (verified: a Pavon agent logged its true public IP 98.172.64.243). Only `JWT_SECRET` is hard-required.
- **NULL tags bug:** `connect_machines.tags` is `text[]` nullable with no default; v2 decodes it as
non-`Option`, so rows with NULL tags throw "unexpected null" at reconcile (and likely the Machines
list). Mitigated with `UPDATE connect_machines SET tags='{}' WHERE tags IS NULL`. Real fix is a

@@ -352,3 +352,62 @@ A shortcut's icon only brands the `.lnk` itself. Once `wt.exe` is the running pr
- Restore WT settings: copy `settings.json.bak` over `settings.json` (path: `%LOCALAPPDATA%\Packages\Microsoft.WindowsTerminal_8wekyb3d8bbwe\LocalState\`).
- Revert shortcut to inline launch: set `Claude.lnk` Args back to `-d "D:\claudetools" "C:\Users\guru\.local\bin\claude.exe" --chrome --dangerously-skip-permissions`.
- Icon source PNG: `C:\Users\guru\.local\bin\claude-icon.png` (safe to delete if profile removed).
---
## Update: 15:25 PT — GuruConnect v2 production deploy + hotfixes + client-IP investigation
### Session Summary (this update)
Deployed GuruConnect v2 to production (172.16.3.30, public connect.azcomputerguru.com) and resolved three post-deploy issues. The deploy was manual: the server (hostname gururmm) builds its own Linux binary (rust 1.94.1 + node 20 are present via a login shell). Recon established the deployment model: systemd service guruconnect, EnvironmentFile server/.env, sqlx migrations embedded in the binary at compile time and run automatically on startup, SPA served from server/static/app. The server's local git main had diverged from origin (the v2 greenfield rewrote history), so it was reset with git reset --hard origin/main to 96b4fd7. Backed up the DB first, then built the SPA + binary, set CONNECT_TRUSTED_PROXIES, and confirmed the installed unit has no WatchdogSec (so v2, which lacks sd_notify, won't restart-loop; do NOT run setup-systemd.sh, which would re-add it). After the restart, migrations 004/005/006 applied and smoke tests passed on both the local and public paths.
Hotfix 1 (tags decode bug): a startup WARN "Failed to reconcile managed sessions: column tags unexpected null" was a real bug — connect_machines.tags (text[] nullable, no default) had 6 NULL rows, decoded by the derived FromRow as non-Option Vec<String>. Hot-patched the data (NULL->'{}'), then fixed properly: manual FromRow making all nullable-non-Option columns NULL-tolerant + migration 007 (backfill, DEFAULT '{}', NOT NULL). Code Review APPROVE, committed abc55ab, redeployed; migration 007 confirmed applied (tags nullable=NO default='{}').
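The decode failure and the NULL-tolerant fix described above can be illustrated outside Rust. The following is a minimal Python sketch under stated assumptions, not the actual `server/src/db/machines.rs` code: `Machine`, `from_row_strict`, and `from_row_tolerant` are hypothetical names, and plain dicts stand in for database rows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Machine:
    # Hypothetical stand-in for the server's machine record.
    hostname: str
    tags: List[str] = field(default_factory=list)


def from_row_strict(row: Dict[str, Any]) -> Machine:
    """Mimics a non-Option decode: a NULL (None) in tags is an error."""
    if row["tags"] is None:
        raise ValueError("column tags: unexpected null")
    return Machine(hostname=row["hostname"], tags=list(row["tags"]))


def from_row_tolerant(row: Dict[str, Any]) -> Machine:
    """Mimics a NULL-tolerant decode: NULL tags become an empty list."""
    tags = row["tags"] if row["tags"] is not None else []
    return Machine(hostname=row["hostname"], tags=list(tags))


# A row with NULL tags, like the six rows found before the backfill.
legacy_row = {"hostname": "example-host", "tags": None}
print(from_row_tolerant(legacy_row))
```

The tolerant decode keeps old rows readable, while the migration (backfill, default, NOT NULL) is what stops new NULLs from appearing; the two fixes cover different failure windows.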
Login problem: neither admin nor howard could log into the portal. Diagnosed — both accounts were enabled with valid Argon2 hashes and the login API worked end-to-end (verified 401 on wrong password via localhost AND public NPM), so it was wrong passwords, not a code/flow bug. With Mike's authorization, reset both passwords using python argon2-cffi (verify is param-agnostic, so server-compatible), verified login returns 200 local + public.
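One likely reason verification is param-agnostic is that an Argon2 hash in PHC string format carries its own variant, version, and cost parameters, so a verifier can read them from the stored hash instead of from its own configuration. The sketch below only parses that layout with the standard library; `parse_argon2_phc` and `PhcHash` are hypothetical names, and the sample string is a format placeholder, not a real account hash or a valid digest.

```python
from typing import Dict, NamedTuple


class PhcHash(NamedTuple):
    variant: str
    version: int
    params: Dict[str, int]
    salt_b64: str
    digest_b64: str


def parse_argon2_phc(encoded: str) -> PhcHash:
    """Split an Argon2 PHC string into its self-describing fields."""
    parts = encoded.split("$")
    # Expected layout: ['', variant, 'v=N', 'm=..,t=..,p=..', salt, digest]
    if len(parts) != 6 or parts[0] != "" or not parts[2].startswith("v="):
        raise ValueError("not an Argon2 PHC string with a version field")
    params: Dict[str, int] = {}
    for item in parts[3].split(","):
        key, _, value = item.partition("=")
        params[key] = int(value)
    return PhcHash(parts[1], int(parts[2][2:]), params, parts[4], parts[5])


# Placeholder for illustration only; the salt and digest are made up.
SAMPLE = "$argon2id$v=19$m=19456,t=2,p=1$c29tZXNhbHQ$ZmFrZWRpZ2VzdA"
parsed = parse_argon2_phc(SAMPLE)
print(parsed.variant, parsed.params)
```

Because the memory, time, and parallelism costs travel with the hash, a hash generated by a different tool with different costs can still be checked, provided the variant is one the server accepts.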
Client-IP investigation: the relay logged repeated agent rejections "from 172.16.3.20" (agent_id 795cbc06, invalid API key, every ~5s). Ruled out the gururmm-agent container (stop-test: rejects continued). Mike identified DESKTOP-I66IM5Q as the Pavon (Raiders site) external client machine (GeoVision box, public 98.172.64.243). Root cause: the Pavon agent connects via the public URL through Jupiter's NPM (172.16.3.20); since 172.16.3.20 was not in CONNECT_TRUSTED_PROXIES (had wrongly trusted .30), the relay logged the proxy hop instead of reading X-Forwarded-For. Fixed: set CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20, redeployed; reject log now shows the real client IP 98.172.64.243. Remaining: the Pavon agent's old shared key is invalid post-cutover and needs re-enrollment (client-side todo filed).
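The misattribution follows from how trusted-proxy resolution usually works: X-Forwarded-For is honored only when the direct peer is a trusted proxy. The sketch below shows one common approach (walk the header right to left and skip trusted hops). It is an assumption about the general technique, not the relay's actual code, and `resolve_client_ip` is a hypothetical name.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Optional, Set, Union

IPAddr = Union[IPv4Address, IPv6Address]


def resolve_client_ip(peer: str, xff: Optional[str], trusted: Set[IPAddr]) -> IPAddr:
    """Return the client IP, honoring X-Forwarded-For only from trusted peers."""
    peer_ip = ip_address(peer)
    # An untrusted peer could forge the header, so ignore it entirely.
    if peer_ip not in trusted or not xff:
        return peer_ip
    # Walk hops from nearest to farthest; the first untrusted hop is the client.
    for hop in reversed([h.strip() for h in xff.split(",")]):
        try:
            hop_ip = ip_address(hop)
        except ValueError:
            return peer_ip  # malformed header: fall back to the direct peer
        if hop_ip not in trusted:
            return hop_ip
    return peer_ip


wrong = {ip_address(a) for a in ("127.0.0.1", "::1", "172.16.3.30")}
fixed = {ip_address(a) for a in ("127.0.0.1", "::1", "172.16.3.20")}

# The Pavon agent arrives via Jupiter NPM, so the direct peer is 172.16.3.20.
print(resolve_client_ip("172.16.3.20", "98.172.64.243", wrong))  # 172.16.3.20
print(resolve_client_ip("172.16.3.20", "98.172.64.243", fixed))  # 98.172.64.243
```

With the wrong list, the peer itself is untrusted, so the header is never consulted and every public agent appears to come from the proxy.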
### Key Decisions (this update)
- Built on the server itself (rust + node present) rather than cross-building. Used reset --hard origin/main because the server's divergent commit (1bfd476) was stale and local-only; its SHA was saved for rollback.
- Did NOT run setup-systemd.sh: the installed unit lacks WatchdogSec, which is correct for v2 (no sd_notify); the repo unit would re-add it and cause a 30s restart loop.
- Built the binary while v1 stayed running (atomic rename is safe) and used the restart as the cutover, so a build failure would not cause downtime.
- Reset passwords via python argon2-cffi rather than a server CLI (none exists); argon2 verify is param-agnostic so any argon2id PHC hash is accepted.
- CONNECT_TRUSTED_PROXIES must trust Jupiter NPM (172.16.3.20), not the relay host (.30) — the public ingress proxy is on Jupiter.
- Tags migration 007 sets NOT NULL after backfill: safe because no writer inserts NULL (upsert omits tags -> default; metadata update binds a non-null array).
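The CONNECT_TRUSTED_PROXIES decision above depends on the value being a comma-separated list of IPs that falls back to loopback when unset. A minimal sketch of that parsing, assuming the loopback default covers both 127.0.0.1 and ::1 and that an invalid entry should raise rather than be skipped (both are assumptions; `parse_trusted_proxies` is a hypothetical name, not the server's function):

```python
from ipaddress import ip_address
from typing import Optional

# Assumed fail-closed default: trust only loopback when nothing is configured.
LOOPBACK_DEFAULT = frozenset({ip_address("127.0.0.1"), ip_address("::1")})


def parse_trusted_proxies(raw: Optional[str]) -> frozenset:
    """Parse a comma-separated IP list; unset or blank means loopback only."""
    if raw is None or not raw.strip():
        return LOOPBACK_DEFAULT
    # ip_address raises ValueError on a bad entry, surfacing typos early.
    return frozenset(ip_address(item.strip()) for item in raw.split(",") if item.strip())


trusted = parse_trusted_proxies("127.0.0.1,::1,172.16.3.20")
print(ip_address("172.16.3.20") in trusted)  # True: the Jupiter NPM hop
print(ip_address("172.16.3.30") in trusted)  # False: the relay host itself
```

Setting the variable replaces the default rather than extending it in this sketch, which is why the loopback entries are repeated in the configured value.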
### Problems Encountered (this update)
- git pull --ff-only refused (divergent history from the greenfield respec) -> git reset --hard origin/main (clean tree, rollback SHA saved).
- npm ci/build failed initially because the tree was still v1 (pull had aborted); resolved after the reset to v2.
- NULL-tags reconcile WARN -> data hot-patch + code fix (007) -> redeploy.
- Login appeared broken; proved the API works (401 on bad pw, both paths) -> it was wrong passwords -> reset.
- Reject "from 172.16.3.20" misattribution: not gururmm-agent (stop-test), not a Jupiter VM (Mike rejected the suspend), and no matching NPM HTTP/stream config found by grep. Root cause was the trusted-proxy misconfig: the relay did not trust the Jupiter NPM hop, so it logged that hop instead of the client IP from X-Forwarded-For. Fixed by trusting 172.16.3.20.
- Bash CWD drifted into the submodule (SSH/git work) breaking the relative whoami-block.sh path; re-ran from /d/claudetools.
### Configuration Changes (this update)
- guru-connect submodule: abc55ab (fix: NULL-tags decode + migration 007). Files: server/src/db/machines.rs (manual FromRow), server/migrations/007_fix_machine_tags_null.sql.
- Parent claudetools pointer commits (local, unpushed): 56ce575 (Users view, earlier), 40a2eb4 (tags fix). Plus this session log + memory edits.
- Server (172.16.3.30) /home/guru/guru-connect: reset to abc55ab; server/.env CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20; rebuilt binary; SPA at server/static/app; service restarted (now v0.2.1).
- DB: migration 007 applied; connect_machines.tags now NOT NULL DEFAULT '{}'; admin+howard password_hash reset.
- Memory: .claude/memory/project_guruconnect_deploy.md created + corrected (trusted-proxy = Jupiter 172.16.3.20).
### Credentials & Secrets (this update)
- GuruConnect portal (connect.azcomputerguru.com), reset 2026-05-30 (change on first login):
- admin : WNwKn-qp4eW-jkXAs
- howard : iWACa-Ks5PP-nrP6x
### Infrastructure & Servers (this update)
- GuruConnect server: VM "GuruRMM" = host gururmm = 172.16.3.30:3002 (systemd guruconnect, WorkingDirectory /home/guru/guru-connect/server, EnvironmentFile server/.env, binary target/x86_64-unknown-linux-gnu/release/guruconnect-server). Postgres 14 db guruconnect on localhost.
- Public ingress: connect.azcomputerguru.com -> Jupiter NPM (172.16.3.20, jc21/nginx-proxy-manager) -> 172.16.3.30:3002. Trust 172.16.3.20 for client-IP extraction.
- Jupiter (172.16.3.20, Unraid): runs NPM + VMs (GuruRMM, GuruConnect, Claude-Builder=Pluto, Unifi, OwnCloud) + the gururmm-agent container (host networking; NOT the reject source).
- DB backup: /home/guru/backups/guruconnect/pre-v2-cutover-20260530-213507.sql.gz; v1 binary ~/guruconnect-server.v1.bak; rollback commit /tmp/gc-rollback-commit.txt (1bfd476).
- Pavon/Raiders client machine DESKTOP-I66IM5Q: public 98.172.64.243, private 192.168.0.10, GeoVision box, GuruConnect agent_id 795cbc06 (stale key).
### Pending / Incomplete Tasks (this update)
- Re-enroll/re-key the Pavon DESKTOP-I66IM5Q GuruConnect agent (old shared key invalid post-cutover) - todo filed.
- Mike to log into connect.azcomputerguru.com and validate all four dashboard views against live data.
- claudetools parent: unpushed pointer commits (56ce575, 40a2eb4) + memory edits + this log -> this /save pushes them.
- Open server todos: viewer-token revocation on logout, multi-instance DB single-use gate, agent clippy in CI, support-code expires_at, users authz gaps (self-demotion/last-admin guards), H.264 go-live gating.
### Reference Information (this update)
- guru-connect submodule HEAD: abc55ab. Server component: deployed v0.2.1.
- Deploy memory: .claude/memory/project_guruconnect_deploy.md.
- Verified reject log post-fix: "Agent connection rejected: 795cbc06-... from 98.172.64.243 - invalid API key".