diff --git a/.claude/machines/guru-kali.md b/.claude/machines/guru-kali.md index 4ed0359..fe56a56 100644 --- a/.claude/machines/guru-kali.md +++ b/.claude/machines/guru-kali.md @@ -1,7 +1,7 @@ # Machine: GURU-KALI **Hostname:** GURU-KALI -**Last Updated:** 2026-05-24 +**Last Updated:** 2026-05-25 --- @@ -32,7 +32,7 @@ | mysql/mariadb client | 11.8.6 | | nmap | 7.99 (Kali security tooling) | | GuruRMM build dev libs | libgtk-3-dev, libayatana-appindicator3-dev, libxdo-dev, libssl-dev, pkg-config (for agent + tray builds) — added 2026-05-24 | -| NVIDIA driver | nouveau (open-source) — NO proprietary driver / CUDA | +| NVIDIA driver | nvidia-open 595.71.05 (open kernel modules + CUDA, DKMS) — built/signed for kernel 6.19 via the NVIDIA CUDA debian13 repo; nouveau blacklisted; **ACTIVE AFTER REBOOT**. Added 2026-05-25. | | jq | 1.8.1 (added 2026-05-24, needed by hooks) | | gh / docker / age / op / grepai / ollama | NOT installed | @@ -42,8 +42,8 @@ None — Ollama not installed. If installed, `qwen3:8b` (5.2 GB) fits the 8 GB VRAM fully (mirrors DESKTOP-0O8A1RL prose model); qwen3.6 / codestral / qwen3:14b would -split to CPU. GPU acceleration requires the proprietary NVIDIA driver first (currently -nouveau). See `.claude/OLLAMA.md`. +split to CPU. GPU acceleration is available via nvidia-open 595.71.05 (CUDA) once the +pending reboot activates the driver. See `.claude/OLLAMA.md`. --- @@ -97,7 +97,7 @@ Verified 2026-05-24: coord API `172.16.3.30:8001` -> HTTP 200, remote Ollama - [ ] Ollama LOCAL (not installed — would add offline Tier 0) - [ ] GrepAI semantic search (not installed) - [ ] 1Password CLI (op not installed) -- [ ] NVIDIA CUDA compute (nouveau driver — no CUDA) +- [~] NVIDIA CUDA compute — nvidia-open 595.71.05 installed (DKMS built on kernel 6.19); ACTIVE after reboot - [ ] Docker --- 
@@ -105,8 +105,8 @@ Verified 2026-05-24: coord API `172.16.3.30:8001` -> HTTP 200, remote Ollama ## Notes - **Strongest raw hardware in the fleet** for AI inference (i9-14900HX, 31 GB RAM, - RTX 4070 8 GB) — but GPU compute is blocked on the nouveau -> proprietary-NVIDIA - driver swap (needs a package install + reboot on Kali rolling). + RTX 4070 8 GB). NVIDIA driver now installed (see 2026-05-25 note) — GPU/CUDA compute + available after the pending reboot. - **Field/mobile laptop.** On wifi off the company LAN, but Tailscale (added 2026-05-24) bridges to internal services, so coord API/DB and remote Ollama work. A local Ollama would still add value for *offline* use (away from any network). 
@@ -124,3 +124,19 @@ Verified 2026-05-24: coord API `172.16.3.30:8001` -> HTTP 200, remote Ollama Screen may still blank/screensave on idle but does NOT prompt for a password. Do NOT re-enable. Persisted in `~/.config/xfce4/xfconf/xfce-perchannel-xml/xfce4-screensaver.xml` (machine-local, not in the repo). +- **2026-05-24: idle-suspend (sleep) DISABLED on AC *and* battery (user request).** + xfce4-power-manager `inactivity-on-ac=0`, `inactivity-on-battery=0` (+ sleep-mode actions 0). + Why both: it was sleeping while plugged in because the ACPI AC-adapter state goes **stale** on + this Legion + kernel 6.19 (the EC doesn't reliably send AC plug/unplug events, esp. across S3 + resume), so the OS intermittently thought it was on battery and applied the 10-min battery + suspend. `udevadm trigger --action=change /sys/class/power_supply/ADP0` forces a re-read + (online flips back to 1). Disabling idle-suspend on both states makes the flaky detection + harmless. Do NOT re-enable. Persisted in `xfce4-power-manager.xml` (machine-local). Proper + upstream fix = Lenovo BIOS/EC update. +- **2026-05-25: NVIDIA driver installed — nvidia-open 595.71.05** (open kernel modules + CUDA), + via the NVIDIA CUDA `debian13` repo (cuda-keyring) since Kali only packages 550 (too old for + kernel 6.19). DKMS module built + signed for `6.19.14+kali-amd64`; nouveau blacklisted + (`/etc/modprobe.d/nvidia.conf`) + initramfs rebuilt. **Requires one reboot to activate** (swap + the dGPU off nouveau). Hybrid graphics: Intel i915 stays the display, NVIDIA dGPU for + CUDA/offload. Secure Boot off. Verify post-reboot: `nvidia-smi` (expect RTX 4070 Mobile, driver + 595.71.05). Unblocks the local-Ollama-on-GPU option. diff --git a/session-logs/2026-05-25-session.md b/session-logs/2026-05-25-session.md index 4a5fae6..5bd0bbd 100644 --- a/session-logs/2026-05-25-session.md +++ b/session-logs/2026-05-25-session.md 
@@ -145,3 +145,157 @@ None. Recovery complete. - Complete removal: 190efe95-a11a-4960-869d-8be778e129bf - Final reboot: 8628dce8-8755-4a49-9904-c684455de70f - Blacklist creation: f6737830-4ca9-4ed3-b616-d3305a445f10 +# Session Log -- 2026-05-25 + +## User +- **User:** Mike Swanson (mike) +- **Machine:** DESKTOP-0O8A1RL (GURU-5070) +- **Role:** admin +- **Session span:** ~19:42 PT (2026-05-24) -- 04:59 PT (2026-05-25) + +--- + +## Session Summary + +Session opened with three completed tasks carrying over from the prior context: Pluto machine doc, rmm-audit skill update, and session save. Those were completed and synced before this session started (see 2026-05-24 session log updates). + +The MacBook's in-progress auto-update re-dispatch fix was picked up. The MacBook session had identified that agents BB-SERVER and RECEPTIONIST-PC were stuck on v0.6.37 while the fleet was on v0.6.38, and had left uncommitted changes to `server/src/ws/mod.rs`. Since those changes were not committed, the fix was reimplemented from scratch against the live server code. The Coding Agent implemented `db::get_pending_update()` check before `needs_update()` in the reconnect handler, using the original `update_id` for re-dispatch with semver guard and URL/checksum validation. A bonus discovery: migrations 042-044 (`agent_mspbackups_mapping` and related) had not been applied to production and the `.sqlx` offline cache was stale -- both fixed in the same commit (c8d5af6). Service deployed and confirmed active. Both agents confirmed on 0.6.38 with `status=completed` update records within minutes of deploy. + +Tucson Golden Corral was onboarded as a new GuruRMM client. Client "Tucson Golden Corral" and site "Co-Located" were created via the GuruRMM API (auth via admin JWT). Site enrollment key vaulted at `clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml`. 
The IEX installer one-liner was requested -- it already existed at the dashboard installer page (`irm 'https://rmm.azcomputerguru.com/install/INNER-STORM-2733/windows' | iex`); this was not checked before asking. + +TGC-SERVER enrolled immediately after the installer was run. Metrics pulled via RMM showed: online, v0.6.38, Windows Server 2016 (build 14393), 16 GB RAM at 45.6%, 1.8 TB disk at 36.2%, CPU at 23.8%, uptime ~5 hours. Process list indicated DNS, Active Directory, SQL Server, IIS (with Certify the Web/Let's Encrypt), ScreenConnect, Hyper-V, and Chrome running as Administrator on a DC. A PowerShell command was dispatched via the RMM to enumerate installed Windows roles; result confirmed: Hyper-V installed with two VMs (MAS90 -- Running, MAS90.old -- Off) and a full RDS stack (Connection Broker, Gateway, Licensing, Session Host, Web Access). User confirmed Hyper-V should not be on this server; RDS is expected. MAS90 = Sage 100 ERP. Disposition of the VMs not yet decided -- session ended before resolution. + +--- + +## Key Decisions + +- **Reimplement from scratch rather than recover MacBook draft**: MacBook changes were uncommitted and inaccessible from DESKTOP. Reimplementation from session log description + live code produced a cleaner result than the MacBook draft which had gone through two rejection cycles. +- **Bundle migrations with fix commit**: Migrations 042-044 were a pre-existing production blocker (next CI server build would have failed silently). Bundling avoids a separate emergency fix. +- **Vault TGC enrollment key immediately on site creation**: Consistent with practice for all other clients. Key is a shared secret for agent enrollment; losing it means re-generating and updating all agents. + +--- + +## Problems Encountered + +- **Wrong field name on auth login**: Sent `username` instead of `email` field. API returned deserialization error. Fixed by reading the error message. 
+- **Commands endpoint field mismatch**: Sent `command_text` instead of `command` field. Discovered correct field name by reading the `SendCommandRequest` struct in `server/src/api/commands.rs`. +- **JSON escaping in bash heredoc**: Shell escaping of PowerShell dollar signs in JSON payload caused empty responses from curl. Resolved by using PowerShell's `Invoke-RestMethod` with a here-string for the command body. +- **Checked wrong IEX installer URL**: Asked if an `irm | iex` endpoint existed before checking the dashboard installer page, which already displayed it. The URL (`/install/INNER-STORM-2733/windows`) uses site_code not site_id UUID. + +--- + +## Configuration Changes + +**New files (vault repo):** +- `clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml` -- GuruRMM enrollment key for TGC Co-Located site + +**Modified files (gururmm repo, pushed to Gitea):** +- `server/src/ws/mod.rs` -- added `use semver::Version;` + pending update re-dispatch logic +- `.sqlx/` -- regenerated offline query cache after applying migrations 042-044 + +**Applied DB migrations (production gururmm PostgreSQL on 172.16.3.30):** +- Migration 042 -- agent_mspbackups_mapping table +- Migration 043 -- (mspbackups related) +- Migration 044 -- (mspbackups related) + +--- + +## Credentials & Secrets + +**Tucson Golden Corral -- Co-Located site:** +- Enrollment API key: `grmm_p4g5z7Oj1-rE6GjjjrQqWBouk9BGl4v3` +- Vault: `clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml` + +**GuruRMM admin (already in vault):** +- Email: `admin@azcomputerguru.com` +- Password: `GuruRMM2025` +- Vault: `projects/gururmm/dashboard.sops.yaml` + +--- + +## Infrastructure & Servers + +| Host | IP | Notes | +|------|-----|-------| +| GuruRMM server | 172.16.3.30 | gururmm-server restarted after re-dispatch fix deploy | +| TGC-SERVER | public IP 98.181.90.163 | New GuruRMM client; Windows Server 2016 build 14393; DC+DNS+SQL+IIS+RDS+Hyper-V | + +**TGC-SERVER details:** +- Agent ID: 
1275daa1-3996-4ecf-a1db-c82e88f757b4 +- OS: Windows Server 2016 (build 14393), extended support ends Jan 2027 +- Roles confirmed installed: Hyper-V, RDS (full stack), AD DS, DNS +- Hyper-V VMs: MAS90 (Running -- Sage 100 ERP), MAS90.old (Off -- prior snapshot/backup) +- Other services: SQL Server, IIS + Certify the Web (Let's Encrypt), ScreenConnect client +- Administrator logged in, idle since boot, running Chrome on a DC (security concern) +- RDS expected per customer; Hyper-V NOT expected per customer + +**New GuruRMM client/site:** +- Client: Tucson Golden Corral (ID: 3248bdec-cbc3-45df-ba63-c8cdc9395e58) +- Site: Co-Located (ID: e5caa88f-f395-40e3-befa-f54e035f4293, code: INNER-STORM-2733) + +--- + +## Commands & Outputs + +`powershell +# GuruRMM API auth +POST http://172.16.3.30:3001/api/auth/login +{"email":"admin@azcomputerguru.com","password":"GuruRMM2025"} + +# Create client +POST http://172.16.3.30:3001/api/clients +{"name":"Tucson Golden Corral"} +# -> id: 3248bdec-cbc3-45df-ba63-c8cdc9395e58 + +# Create site +POST http://172.16.3.30:3001/api/sites +{"name":"Co-Located","client_id":"3248bdec-cbc3-45df-ba63-c8cdc9395e58"} +# -> site_id: e5caa88f, site_code: INNER-STORM-2733, api_key: grmm_p4g5z7Oj1-rE6GjjjrQqWBouk9BGl4v3 + +# Windows installer one-liner (already on dashboard installer page) +irm 'https://rmm.azcomputerguru.com/install/INNER-STORM-2733/windows' | iex + +# RMM command dispatched to TGC-SERVER (command ID: e4d372fb) +# Checked installed Hyper-V + RDS roles and running VMs +# Result: Hyper-V + full RDS stack installed; VMs: MAS90 (Running), MAS90.old (Off) + +# Verify BB-SERVER/RECEPTIONIST-PC update completion +SELECT hostname, old_version, target_version, status, completed_at +FROM agent_updates JOIN agents ON agents.id = agent_updates.agent_id +WHERE hostname IN ('BB-SERVER','RECEPTIONIST-PC') ORDER BY started_at DESC LIMIT 4; +# Both show status=completed, 0.6.37->0.6.38, ~00:13-00:14 UTC 2026-05-25 +` + +--- + +## Pending / Incomplete 
Tasks + +- **TGC-SERVER Hyper-V disposition**: MAS90 (Sage 100 ERP) is running in a Hyper-V VM on TGC-SERVER. Customer says Hyper-V should not be on this box. Options: (1) migrate MAS90 VM to dedicated Hyper-V host, (2) P2V or migrate MAS90 to run natively. Decision not made -- needs customer input on hardware and MAS90 usage pattern. +- **TGC-SERVER Chrome-on-DC**: Administrator account actively browsing from a domain controller. Should be flagged to customer and remediated (dedicated admin workstation or jump server). +- **TGC-SERVER OS age**: Windows Server 2016 -- extended support Jan 2027. Not urgent but should be in the planning queue. +- **MSPBackups Phase 2**: The mspbackups mapping migrations (042-044) were applied to production but no backup status data has been pulled yet for TGC or other clients. + +--- + +## Reference Information + +**gururmm commits:** +- `c8d5af6` -- fix(server): re-dispatch pending updates on agent reconnect + sqlx migrate + .sqlx cache + +**Agents confirmed updated:** +- BB-SERVER: agent_id 6c02baa7, now 0.6.38, completed_at 2026-05-25 00:14 UTC +- RECEPTIONIST-PC: agent_id 9c91d324, now 0.6.38, completed_at 2026-05-25 00:13 UTC + +**TGC RMM command result (e4d372fb):** +- Hyper-V, RSAT-Hyper-V-Tools, Hyper-V-Tools, Hyper-V-PowerShell -- all Installed +- Remote-Desktop-Services, RDS-Connection-Broker, RDS-Gateway, RDS-Licensing, RDS-RD-Server, RDS-Web-Access -- all Installed +- MAS90 VM: Running, Operating normally +- MAS90.old VM: Off, Operating normally + +**IEX installer:** +irm 'https://rmm.azcomputerguru.com/install/INNER-STORM-2733/windows' | iex + +**Vault paths:** +- TGC enrollment key: clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml +- GuruRMM admin: projects/gururmm/dashboard.sops.yaml +- GuruRMM API JWT secret: projects/gururmm/api-server.sops.yaml
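Review bot: In `.claude/machines/guru-kali.md`, the pending-reboot state is written into four separate places: the tool table row ("**ACTIVE AFTER REBOOT**"), the Ollama paragraph ("once the pending reboot activates the driver"), the capability checklist (`[~] NVIDIA CUDA compute`) and the first Notes bullet ("available after the pending reboot"). All four go stale the moment the machine reboots, and each needs a separate edit to correct. Keep the transient "needs reboot" flag only in the dated 2026-05-25 note and have the table, checklist and Ollama paragraph state just the installed driver and version. The `[~]` marker is also new to this checklist, where every other visible entry uses `[ ]`; either define it once in the file or leave the box unchecked until `nvidia-smi` confirms the driver.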
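Review bot: In the `@@ -124,3 +124,19 @@` hunk of `.claude/machines/guru-kali.md`, the 2026-05-25 note says the DKMS module was "built + signed" and, a few lines later, "Secure Boot off", without saying which key signed the module or whether that key is enrolled. As written, a reader cannot tell whether the driver would still load if Secure Boot were turned on, and the same file describes this machine as a field/mobile laptop, where Secure Boot being off is worth recording as a deliberate choice with its reason. Suggest stating the signing key and its enrollment status (or dropping "signed" if it has no effect here), and noting whether Secure Boot is off by decision or only as a leftover of the install. The post-reboot check lists only `nvidia-smi`; adding a check that nouveau is no longer loaded would confirm the blacklist in `/etc/modprobe.d/nvidia.conf` and the rebuilt initramfs took effect. The idle-suspend bullet above it is dated 2026-05-24 but lands in a change that bumps Last Updated to 2026-05-25, which is fine, though "Do NOT re-enable" would be easier to revisit if it named the condition that lifts it (the Lenovo BIOS/EC update mentioned at the end).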
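Review bot: In `session-logs/2026-05-25-session.md`, the "Credentials & Secrets" section commits the Tucson Golden Corral enrollment API key and the GuruRMM admin password in plaintext, and both values appear again in the "Commands & Outputs" block (the `/api/auth/login` request body and the comment after the create-site call). The same section already gives the vault locations (`clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml` and `projects/gururmm/dashboard.sops.yaml`), so the log should carry only those paths and a redaction marker in place of each value. Because the values are now in the repository history, both should be rotated rather than just deleted from the file; the log's own Key Decisions entry describes the enrollment key as a shared secret for agent enrollment, which is the reason it belongs only in the vault. Two smaller points in the same hunk: the login and client/site creation calls are shown against `http://172.16.3.30:3001`, so the log should say what protects that hop or the calls should go over TLS, and the new `# Session Log -- 2026-05-25` heading is appended after the end of an existing log in the same file (`@@ -145,3 +145,157 @@`), which leaves two top-level session logs with different machines in one document; a separator or a per-machine file name would keep them distinguishable.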