diff --git a/.claude/memory/project_guruconnect_deploy.md b/.claude/memory/project_guruconnect_deploy.md
index 1b3ed4c..f766039 100644
--- a/.claude/memory/project_guruconnect_deploy.md
+++ b/.claude/memory/project_guruconnect_deploy.md
@@ -41,8 +41,11 @@ GOTCHAS (all hit on the 2026-05-30 deploy):
   `WatchdogSec=30s` — so do NOT run `setup-systemd.sh` / copy the repo unit, or v2 restart-loops every 30s.
   Unit: User=guru, EnvironmentFile=server/.env, WorkingDirectory=server/, ProtectSystem=strict.
 - **`CONNECT_TRUSTED_PROXIES`** is a v2 env var (comma-separated IPs; defaults to loopback fail-closed).
-  NPM proxies from `172.16.3.30`, so set `CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.30` in
-  `server/.env` or client-IP extraction (rate-limit-per-IP, logging) is wrong. Only `JWT_SECRET` is hard-required.
+  Public `connect.azcomputerguru.com` ingresses through **NPM on Jupiter (172.16.3.20)**, which forwards to
+  the relay on 172.16.3.30:3002. So set `CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20` in `server/.env`
+  (the Jupiter NPM hop, NOT the relay host .30 — that was a wrong first guess). Without trusting 172.16.3.20
+  the relay logs every public agent as 172.16.3.20 instead of reading X-Forwarded-For; with it, the real client
+  IP shows (verified: a Pavon agent logged its true public IP 98.172.64.243). Only `JWT_SECRET` is hard-required.
 - **NULL tags bug:** `connect_machines.tags` is `text[]` nullable with no default; v2 decodes it as non-`Option`, so rows with NULL tags throw "unexpected null" at reconcile (and likely the Machines list). Mitigated with `UPDATE connect_machines SET tags='{}' WHERE tags IS NULL`. Real fix is a
diff --git a/session-logs/2026-05-30-session.md b/session-logs/2026-05-30-session.md
index 1e5ee6e..22e4715 100644
--- a/session-logs/2026-05-30-session.md
+++ b/session-logs/2026-05-30-session.md
@@ -352,3 +352,62 @@ A shortcut's icon only brands the `.lnk` itself. Once `wt.exe` is the running pr
 - Restore WT settings: copy `settings.json.bak` over `settings.json` (path: `%LOCALAPPDATA%\Packages\Microsoft.WindowsTerminal_8wekyb3d8bbwe\LocalState\`).
 - Revert shortcut to inline launch: set `Claude.lnk` Args back to `-d "D:\claudetools" "C:\Users\guru\.local\bin\claude.exe" --chrome --dangerously-skip-permissions`.
 - Icon source PNG: `C:\Users\guru\.local\bin\claude-icon.png` (safe to delete if profile removed).
+
+---
+
+## Update: 15:25 PT — GuruConnect v2 production deploy + hotfixes + client-IP investigation
+
+### Session Summary (this update)
+Deployed GuruConnect v2 to production (172.16.3.30, public connect.azcomputerguru.com) and resolved three post-deploy issues. The deploy was manual: the server (hostname gururmm) builds its own Linux binary (rust 1.94.1 + node 20 available via the login shell). Recon established the deployment model (systemd service guruconnect, EnvironmentFile server/.env, sqlx migrations embedded in the binary at compile time and auto-run on startup, SPA served from server/static/app). The server's local git main had diverged from origin (the v2 greenfield rewrote history), so it was reset with git reset --hard origin/main to 96b4fd7. Backed up the DB first, built the SPA + binary, set CONNECT_TRUSTED_PROXIES, confirmed the installed unit has no WatchdogSec (so v2, which does not call sd_notify, won't restart-loop; do NOT run setup-systemd.sh, which would re-add it), restarted, confirmed migrations 004/005/006 applied, and smoke-tested locally and via the public URL.
+
+Hotfix 1 (tags decode bug): the startup WARN "Failed to reconcile managed sessions: column tags unexpected null" was a real bug. connect_machines.tags (text[], nullable, no default) had 6 NULL rows, which the derived FromRow decoded into a non-Option Vec. Hot-patched the data (NULL -> '{}'), then fixed it properly: a manual FromRow that makes every nullable column mapped to a non-Option field NULL-tolerant, plus migration 007 (backfill, DEFAULT '{}', NOT NULL). Code Review: APPROVE; committed abc55ab and redeployed; migration 007 confirmed applied (tags nullable=NO, default='{}').
+
+Login problem: neither admin nor howard could log into the portal. Diagnosis: both accounts were enabled with valid Argon2 hashes and the login API worked end-to-end (a wrong password returned 401 via localhost AND via the public NPM path), so the cause was wrong passwords, not a code/flow bug. With Mike's authorization, reset both passwords using Python argon2-cffi (Argon2 verification reads the cost parameters from the stored hash, so the new hashes are server-compatible) and verified login returns 200 locally and publicly.
+
+Client-IP investigation: the relay logged repeated agent rejections "from 172.16.3.20" (agent_id 795cbc06, invalid API key, every ~5s). Ruled out the gururmm-agent container (stop-test: rejects continued while it was stopped). Mike identified DESKTOP-I66IM5Q as the Pavon (Raiders site) external client machine (GeoVision box, public 98.172.64.243). Root cause: the Pavon agent connects via the public URL through Jupiter's NPM (172.16.3.20); because 172.16.3.20 was not in CONNECT_TRUSTED_PROXIES (.30 had been trusted by mistake), the relay logged the proxy hop instead of reading X-Forwarded-For. Fixed: set CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20 and redeployed; the reject log now shows the real client IP 98.172.64.243. Remaining: the Pavon agent's old shared key is invalid post-cutover and the agent needs re-enrollment (client-side todo filed).
+
+### Key Decisions (this update)
+- Build on the server itself (rust + node present) rather than cross-compile; git reset --hard origin/main, since the server's divergent local commit (1bfd476) was a stale local-only commit (SHA saved).
+- Did NOT run setup-systemd.sh: the installed unit lacks WatchdogSec, which is correct for v2 (no sd_notify); the repo unit would re-add it and cause a 30s restart loop.
+- Build the binary while v1 keeps running (replacing it via atomic rename is safe), then restart as the cutover, so a build failure causes no downtime.
+- Reset passwords via Python argon2-cffi rather than a server CLI (none exists); Argon2 verification reads the parameters from the PHC string, so any valid argon2id PHC hash is accepted.
+- CONNECT_TRUSTED_PROXIES must trust Jupiter NPM (172.16.3.20), not the relay host (.30): the public ingress proxy runs on Jupiter.
+- Tags migration 007 sets NOT NULL after backfill: safe because no writer inserts NULL (upsert omits tags -> default; metadata update binds a non-null array).
+
+### Problems Encountered (this update)
+- git pull --ff-only refused (divergent history from the greenfield respec) -> git reset --hard origin/main (clean tree, rollback SHA saved).
+- npm ci/build failed initially because the tree was still v1 (the pull had aborted); resolved after the reset to v2.
+- NULL-tags reconcile WARN -> data hot-patch + code fix (007) -> redeploy.
+- Login appeared broken; proved the API works (401 on bad pw, both paths) -> it was wrong passwords -> reset.
+- Reject "from 172.16.3.20" misattribution: not gururmm-agent (stop-test), not a Jupiter VM (Mike rejected the suspend), no matching NPM HTTP/stream config found by grep; root cause was the trusted-proxy misconfig, which hid the real client IP behind the Jupiter NPM hop. Fixed by trusting 172.16.3.20.
+- Bash CWD drifted into the submodule during SSH/git work, breaking the relative whoami-block.sh path; re-ran from /d/claudetools.
+
+### Configuration Changes (this update)
+- guru-connect submodule: abc55ab (fix: NULL-tags decode + migration 007). Files: server/src/db/machines.rs (manual FromRow), server/migrations/007_fix_machine_tags_null.sql.
+- Parent claudetools pointer commits (local, unpushed): 56ce575 (Users view, earlier), 40a2eb4 (tags fix). Plus this session log + memory edits.
+- Server (172.16.3.30) /home/guru/guru-connect: reset to abc55ab; server/.env CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20; rebuilt binary; SPA at server/static/app; service restarted (now v0.2.1).
+- DB: migration 007 applied; connect_machines.tags now NOT NULL DEFAULT '{}'; admin+howard password_hash reset.
+- Memory: .claude/memory/project_guruconnect_deploy.md created + corrected (trusted-proxy = Jupiter 172.16.3.20).
+
+### Credentials & Secrets (this update)
+- GuruConnect portal (connect.azcomputerguru.com), reset 2026-05-30 (change on first login):
+  - admin : WNwKn-qp4eW-jkXAs
+  - howard : iWACa-Ks5PP-nrP6x
+
+### Infrastructure & Servers (this update)
+- GuruConnect server: VM "GuruRMM" = host gururmm = 172.16.3.30:3002 (systemd guruconnect, WorkingDirectory /home/guru/guru-connect/server, EnvironmentFile server/.env, binary target/x86_64-unknown-linux-gnu/release/guruconnect-server). Postgres 14 db guruconnect on localhost.
+- Public ingress: connect.azcomputerguru.com -> Jupiter NPM (172.16.3.20, jc21/nginx-proxy-manager) -> 172.16.3.30:3002. Trust 172.16.3.20 for client-IP extraction.
+- Jupiter (172.16.3.20, Unraid): runs NPM + VMs (GuruRMM, GuruConnect, Claude-Builder=Pluto, Unifi, OwnCloud) + the gururmm-agent container (host networking; NOT the reject source).
+- DB backup: /home/guru/backups/guruconnect/pre-v2-cutover-20260530-213507.sql.gz; v1 binary ~/guruconnect-server.v1.bak; rollback commit /tmp/gc-rollback-commit.txt (1bfd476).
+- Pavon/Raiders client machine DESKTOP-I66IM5Q: public 98.172.64.243, private 192.168.0.10, GeoVision box, GuruConnect agent_id 795cbc06 (stale key).
+
+### Pending / Incomplete Tasks (this update)
+- Re-enroll/re-key the Pavon DESKTOP-I66IM5Q GuruConnect agent (old shared key invalid post-cutover) - todo filed.
+- Mike to log into connect.azcomputerguru.com and validate all four dashboard views against live data.
+- claudetools parent: unpushed pointer commits (56ce575, 40a2eb4) + memory edits + this log -> this /save pushes them.
+- Open server todos: viewer-token revocation on logout, multi-instance DB single-use gate, agent clippy in CI, support-code expires_at, users authz gaps (self-demotion/last-admin guards), H.264 go-live gating.
+
+### Reference Information (this update)
+- guru-connect submodule HEAD: abc55ab. Server component: deployed v0.2.1.
+- Deploy memory: .claude/memory/project_guruconnect_deploy.md.
+- Verified reject log post-fix: "Agent connection rejected: 795cbc06-... from 98.172.64.243 - invalid API key".
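
The CONNECT_TRUSTED_PROXIES fix described above can be illustrated with a minimal Python sketch of trusted-proxy client-IP extraction. It is not GuruConnect's Rust implementation; `parse_trusted` and `resolve_client_ip` are hypothetical names, and the sketch assumes plain IP lists (no CIDR ranges) and the common "rightmost untrusted X-Forwarded-For entry" convention.

```python
from ipaddress import ip_address


def parse_trusted(env_value):
    """Parse a comma-separated CONNECT_TRUSTED_PROXIES-style value into IPs.

    Hypothetical helper: assumes plain addresses, no CIDR ranges.
    """
    return {ip_address(part.strip()) for part in env_value.split(",") if part.strip()}


def resolve_client_ip(peer, xff, trusted):
    """Return the IP to log / rate-limit for one request (sketch only).

    peer    -- TCP peer address the relay actually saw (e.g. the NPM hop)
    xff     -- raw X-Forwarded-For header value, or None
    trusted -- set of trusted proxy IPs from parse_trusted()
    """
    peer_ip = ip_address(peer)
    # Fail closed: only honor X-Forwarded-For when the peer is a trusted
    # proxy, because any client can send a forged header.
    if peer_ip not in trusted or not xff:
        return peer_ip
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    # Each proxy appends the address it received the request from, so walk
    # right to left and skip trusted hops; the first untrusted entry is the
    # client address as recorded by the outermost trusted proxy.
    for hop in reversed(hops):
        try:
            hop_ip = ip_address(hop)
        except ValueError:
            return peer_ip  # malformed entry: fall back to the peer address
        if hop_ip not in trusted:
            return hop_ip
    return peer_ip  # every hop was trusted: nothing better than the peer
```

With `parse_trusted("127.0.0.1,::1,172.16.3.20")`, a request from peer 172.16.3.20 carrying `X-Forwarded-For: 98.172.64.243` resolves to 98.172.64.243; with the original `.30` value the peer is untrusted, the header is ignored, and the result is 172.16.3.20, which matches the misattributed reject log. Taking the rightmost untrusted entry rather than the leftmost one means a client cannot spoof its address by pre-filling the header.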