sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-15 21:14:51
Author: Mike Swanson Machine: DESKTOP-0O8A1RL Timestamp: 2026-05-15 21:14:51
@@ -949,3 +949,121 @@ docker run -d \
- GuruRMM Debug site ID: d6b8233a-6cc1-4a44-888d-01ee49123fba
- AZ Computer Guru client ID: 417420f4-c3f4-482a-acd4-d6f63c8cddde
- DB migration applied: server/migrations/032_vm_detection.sql
---
## Update: 21:11 PT — Jupiter hypervisor wiring, Pluto VM detection, watchdog fix, dashboard terminal layout
### User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session span:** ~03:00-21:15 UTC 2026-05-16 (continued from previous context window)
### Session Summary
Picked up from a context-compacted prior window in which three bugs had been identified: Jupiter's Docker container lacked /dev/kvm, Pluto's Windows agent was not detecting itself as a VM, and the is_container and is_unraid columns were both missing from the database. The prior window had already applied DB migration 033 and pushed a KVM detection fix (30016da), but neither could be verified until this session resumed.

The Jupiter container was recreated with /dev/kvm mounted, and the inventory API then reported is_hypervisor: true. However, hosted_vm_uuids remained empty because virsh was not installed in the Docker image (only ca-certificates and docker.io were present). Added libvirt-clients to the Dockerfile and pushed to Gitea, and the build pipeline produced a new Docker image. Recreating the container with the new image still yielded empty hosted_vm_uuids because the libvirt socket was not mounted. The libvirt socket path on Unraid (/var/run/libvirt/libvirt-sock) was located and mounted. virsh then enumerated 7 hosted VMs and hosted_vm_uuids populated correctly. Server-side host-guest UUID matching then linked the GuruRMM server agent (gururmm) to Jupiter as its hypervisor host.

For Pluto, the WMI detection fix had been compiled into v0.6.21, but Pluto was already running v0.6.21 from a prior build, so the auto-updater skipped re-delivery. The agent version was bumped to 0.6.22 and the pipeline rebuilt. Pluto received the update command on reconnect, but its GuruRMM service went offline and did not recover automatically for ~25 minutes. Investigation over SSH via paramiko (sshpass was unavailable; the VPN was connected, giving direct access to 172.16.3.36) found GuruRMMAgent stopped. The agent log showed a two-stage failure: the watchdog received the RestartMainService IPC message, but service.stop() via the windows-service SCM API returned access denied; the watchdog then sat in suppression mode for 14 minutes instead of resuming monitoring immediately. The service was started manually and came back online on v0.6.22 with is_virtual_machine: true, hypervisor_type: KVM, and hypervisor_host linked back to Jupiter.

Three watchdog bugs were patched: (1) RestartMainService falls back to sc.exe stop when the SCM API call fails; (2) suppress_until is set to Instant::now() on restart failure so monitoring resumes immediately; (3) the PerformUpdate warning was demoted to debug, since the updater handles its own binary swap without watchdog involvement. The v0.6.22 changelog was generated (the generate-changelog.sh script existed but was not wired into build-agents.sh) and the pipeline hook was added.

Finally, a layout bug in the dashboard Terminal tab was fixed: NativeSelect was applying the caller's className to the inner select while the outer wrapper div had a hardcoded w-full, so the div claimed the entire flex row and squeezed the command input to zero width. The fix moved className to the outer div; twMerge ensures caller width classes override the default w-full. The terminal output panel was also enlarged from h-80 to h-[28rem].
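Watchdog fixes (1) and (2) are easier to follow as control flow. The sketch below is a small Python model of that flow, not the Rust code in `agent/src/watchdog/monitor.rs`. The `Watchdog` class, its callables, and the assumption that suppression is armed before the restart attempt are all illustrative; the 14-minute default only mirrors the duration seen in the log.

```python
import time
from typing import Callable


class Watchdog:
    """Toy model of the IPC-triggered restart path (all names hypothetical)."""

    def __init__(
        self,
        stop_via_scm: Callable[[], None],
        stop_via_sc_exe: Callable[[], None],
        start_service: Callable[[], None],
        suppress_seconds: float = 14 * 60,  # assumption: mirrors the logged duration
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.stop_via_scm = stop_via_scm
        self.stop_via_sc_exe = stop_via_sc_exe
        self.start_service = start_service
        self.suppress_seconds = suppress_seconds
        self.clock = clock
        self.suppress_until = 0.0

    def should_poll(self) -> bool:
        """Polling is skipped while the suppression window is open."""
        return self.clock() >= self.suppress_until

    def restart_main_service(self) -> bool:
        # Assumed behaviour: silence polling while the restart is in flight.
        self.suppress_until = self.clock() + self.suppress_seconds
        try:
            try:
                self.stop_via_scm()
            except PermissionError:
                # Fix 1: the SCM call was denied, so fall back to sc.exe stop.
                self.stop_via_sc_exe()
            self.start_service()
            return True
        except Exception:
            # Fix 2: on failure, close the window so monitoring resumes at once.
            self.suppress_until = self.clock()
            return False
```

Before the fix, the failure branch left the window open, which matches the "suppression active, skipping poll" lines that repeated for 14 minutes.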
### Key Decisions
- **libvirt socket path vs. libvirtd TCP**: Mounted /var/run/libvirt/libvirt-sock (Unix socket) rather than configuring libvirtd to listen on TCP. Unix socket is safer and avoids reconfiguring Unraid libvirtd.
- **Version bump to 0.6.22 instead of content-addressing**: The auto-updater compares version strings; it cannot detect a same-version binary with different content. Bumping was the only reliable way to force re-delivery.
- **paramiko over sshpass**: sshpass not installed. paramiko handles password SSH from Python without an interactive TTY.
- **NativeSelect className to outer div**: All callers pass only width classNames. Moving className to the outer div is safe; twMerge resolves the conflict with the default w-full. The inner select always uses w-full to fill its container.
- **Changelog wired into build-agents.sh**: Called just before the "Build complete" log line, keeping it atomic with the build.
- **Live terminal deferred**: xterm.js/PTY bridge is a future feature. Current command-dispatch model is sufficient.
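The version-bump decision follows from how a version comparison behaves. A minimal sketch, assuming the updater parses dotted numeric versions and updates only when the offered version is strictly newer; `parse_version`, `needs_update`, and `needs_update_by_digest` are hypothetical names, and the agent's real comparison may differ:

```python
import hashlib


def parse_version(text: str) -> tuple:
    """Turn '0.6.22' or 'v0.6.22' into (0, 6, 22) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def needs_update(running: str, offered: str) -> bool:
    """Version-only check: a rebuilt binary with the same version is invisible."""
    return parse_version(offered) > parse_version(running)


def needs_update_by_digest(running_sha256: str, offered_sha256: str) -> bool:
    """Content-addressed alternative (not adopted): any byte change triggers."""
    return running_sha256 != offered_sha256


# Same version, different build: skipped by the version check.
print(needs_update("0.6.21", "0.6.21"))  # False
# After the bump the update is delivered.
print(needs_update("0.6.21", "0.6.22"))  # True
# A digest comparison would have caught the rebuilt binary without a bump.
old = hashlib.sha256(b"build A").hexdigest()
new = hashlib.sha256(b"build B").hexdigest()
print(needs_update_by_digest(old, new))  # True
```

Comparing parsed tuples rather than raw strings also keeps 0.6.10 ordered after 0.6.9.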
### Problems Encountered
- **virsh not in Docker image**: hosted_vm_uuids empty after /dev/kvm mount. Fix: added libvirt-clients to Dockerfile, rebuilt image.
- **libvirt socket not mounted**: virsh failed inside the container with "No such file or directory: /var/run/libvirt/libvirt-sock". Confirmed the socket location under /var/run/libvirt/ on Jupiter and bind-mounted that path into the container.
- **Pluto auto-update stuck for 25 minutes**: watchdog received RestartMainService but SCM service.stop() returned access denied, then suppression mode held 14 minutes. Unblocked by manual Start-Service via paramiko. Root cause fixed in watchdog.
- **SSH password auth non-interactive**: ssh prompts for a password but cannot receive one in a non-interactive shell; sshpass not installed. Resolved with paramiko.
- **NativeSelect outer div w-full**: Wrapper div claimed full flex width regardless of w-32 passed by callers. ChevronDown appeared at absolute right-2 of the full-width div (far right of page). Fixed by moving className to outer div.
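The manual recovery over SSH can be sketched as follows. This assumes the third-party `paramiko` package is installed and that the remote OpenSSH default shell is `cmd.exe`, so the cmdlet is wrapped in `powershell`. `build_service_command` and `run_remote` are hypothetical helper names, and the exact commands run during the session are not recorded in this log.

```python
import re

# Map of supported actions to PowerShell cmdlets.
_CMDLETS = {"start": "Start-Service", "stop": "Stop-Service", "status": "Get-Service"}


def build_service_command(action: str, service: str) -> str:
    """Build a one-shot PowerShell command for a Windows service."""
    if action not in _CMDLETS:
        raise ValueError(f"unsupported action: {action!r}")
    # Reject anything that could break out of the quoted command string.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", service):
        raise ValueError(f"unsafe service name: {service!r}")
    return f'powershell -NoProfile -Command "{_CMDLETS[action]} -Name {service}"'


def run_remote(host: str, username: str, password: str, command: str,
               timeout: float = 15.0):
    """Run one command over password SSH; returns (exit_code, stdout, stderr)."""
    import paramiko  # imported lazily so the helper above works without it

    client = paramiko.SSHClient()
    # AutoAddPolicy trusts unknown host keys; only reasonable on a trusted LAN.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(host, username=username, password=password,
                       look_for_keys=False, allow_agent=False, timeout=timeout)
        _stdin, stdout, stderr = client.exec_command(command, timeout=timeout)
        exit_code = stdout.channel.recv_exit_status()
        return (exit_code,
                stdout.read().decode(errors="replace"),
                stderr.read().decode(errors="replace"))
    finally:
        client.close()


# Example call (password read at runtime, e.g. with getpass.getpass()):
# run_remote("172.16.3.36", "Administrator", password,
#            build_service_command("start", "GuruRMMAgent"))
```

Reading the password at runtime keeps it out of the script, and the pending SSH-key task would remove the need for password auth entirely.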
### Configuration Changes
**gururmm repo (172.16.3.20:azcomputerguru/gururmm.git):**

- `agent/Dockerfile`: added libvirt-clients apt package
- `agent/Cargo.toml`: version 0.6.21 to 0.6.22
- `agent/src/watchdog/monitor.rs`: sc.exe fallback for stop, suppress_until cleared on failure, PerformUpdate debug
- `scripts/build-agents.sh`: wired generate-changelog.sh before "Build complete" log line
- `changelogs/agent/v0.6.22.md`: new file
- `changelogs/LATEST_AGENT.md`: updated to v0.6.22
- `dashboard/src/components/Select.tsx`: NativeSelect className to outer div, removed inline-block
- `dashboard/src/components/CommandTerminal.tsx`: NativeSelect shrink-0, output panel h-[28rem]

**claudetools repo (local):**

- `projects/msp-tools/guru-rmm/agent/Dockerfile`: libvirt-clients (submodule copy)
- `projects/msp-tools/guru-rmm/docs/unraid-ca-template.xml`: added /dev/kvm and libvirt-sock mount entries

**Jupiter (172.16.3.20), container:**

- Final run command: `docker run -d --name gururmm-agent --network host --restart unless-stopped -v /mnt/user/appdata/gururmm:/config -v /sys:/sys:ro -v /proc:/proc:ro -v /dev/kvm:/dev/kvm:ro -v /var/run/docker.sock:/var/run/docker.sock -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro -e GURURMM_CONFIG=/config/config.toml localhost:3000/azcomputerguru/gururmm-agent:latest`
### Credentials & Secrets
- **GuruRMM dashboard admin:** admin@azcomputerguru.com / GuruRMM2025 — vault: projects/gururmm/dashboard.sops.yaml
- **Pluto Administrator SSH:** Paper123!@# — NOT IN VAULT. Needs infrastructure/pluto-build-server.sops.yaml
- **GuruRMM API JWT secret:** vault: projects/gururmm/api-server.sops.yaml
### Infrastructure & Servers
- **Jupiter (172.16.3.20):** Unraid 7.2.5, KVM hypervisor, Docker 29.3. libvirtd socket: /var/run/libvirt/libvirt-sock. 7 hosted VMs. Agent ID: 443bfabb-9213-4157-8be6-2b6d5d3113b2
- **Pluto (172.16.3.36):** Windows Server 2019 VM on Jupiter (virsh name: Claude-Builder, UUID: 2087a53f-1aa1-3eca-41a9-2139bf9d57d4). Agent v0.6.22. Agent ID: 5316f56f-a1b3-4ac5-97ac-71ddf6a74d2e
- **GuruRMM server (172.16.3.30):** Agent ID 8cd0440f-a65c-4ed2-9fa8-9c6de83492a4, hostname "gururmm". KVM guest on Jupiter.
- **Gitea (172.16.3.20:3000):** Docker registry for gururmm-agent image
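How the hypervisor link falls out of the data listed above can be sketched in Python. Several parts are assumptions: the agent's actual virsh invocation is not shown in this log (the sketch parses output shaped like `virsh list --all --uuid`, one UUID per line), and `parse_virsh_uuids`, `link_guests`, and the `machine_uuid` field are hypothetical names. Only `hosted_vm_uuids` comes from the inventory API described here.

```python
import uuid


def parse_virsh_uuids(output: str) -> list:
    """Extract well-formed UUIDs from one-per-line virsh output."""
    found = []
    for line in output.splitlines():
        candidate = line.strip()
        if not candidate:
            continue
        try:
            # uuid.UUID validates the text and normalises it to lowercase.
            found.append(str(uuid.UUID(candidate)))
        except ValueError:
            continue  # ignore headers, warnings, or other noise
    return found


def link_guests(hypervisors: list, agents: list) -> dict:
    """Return {guest_agent_id: hypervisor_agent_id} by matching machine UUIDs."""
    owner = {}
    for hv in hypervisors:
        for vm_uuid in hv["hosted_vm_uuids"]:
            owner[vm_uuid.lower()] = hv["agent_id"]
    links = {}
    for agent in agents:
        machine_uuid = (agent.get("machine_uuid") or "").lower()
        host_id = owner.get(machine_uuid)
        # An agent is never linked to itself as its own hypervisor.
        if host_id and host_id != agent["agent_id"]:
            links[agent["agent_id"]] = host_id
    return links
```

Fed Jupiter's agent ID with Pluto's virsh UUID in `hosted_vm_uuids`, and Pluto's agent reporting the same UUID, `link_guests` maps Pluto's agent ID to Jupiter's. Comparison is case-insensitive because tools differ in how they print UUIDs.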
### Commands & Outputs
```bash
# Jupiter inventory VM fields (confirmed working)
# is_hypervisor: true, hosted_vm_uuids: [7 UUIDs], guest_vms: [{gururmm}]

# Jupiter final container run
docker run -d --name gururmm-agent --network host --restart unless-stopped \
  -v /mnt/user/appdata/gururmm:/config -v /sys:/sys:ro -v /proc:/proc:ro \
  -v /dev/kvm:/dev/kvm:ro -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro \
  -e GURURMM_CONFIG=/config/config.toml \
  localhost:3000/azcomputerguru/gururmm-agent:latest

# Pluto inventory after v0.6.22
# is_virtual_machine: true, hypervisor_type: KVM, hypervisor_host: {Jupiter}

# v0.6.22 build: 03:15:59 - === Build complete: v0.6.22 total 363s ===
# Windows agent: /var/www/gururmm/downloads/gururmm-agent-windows-amd64-0.6.22.exe

# Watchdog failure log (before fix)
# ERROR watchdog: IPC-triggered restart failed: Failed to stop main service
# INFO watchdog: suppression active, skipping poll (repeated 14 min)
```
### Pending / Incomplete Tasks
- **Pluto vault entry**: Paper123!@# needs infrastructure/pluto-build-server.sops.yaml
- **Pluto SSH key**: Add DESKTOP-0O8A1RL pubkey to Pluto authorized_keys
- **macOS agent**: No Docker/install path. build-agents.sh has TODO-MACOS
- **Live terminal**: xterm.js + PTY bridge deferred to future feature
- **Policy wiring plan**: ticklish-questing-stallman.md plan exists, deferred
- **BB-SERVER enrollment loop**: Pre-existing duplicate key constraint, not addressed
- **PowerShell command_type bug**: Agent prepends -OutputEncoding UTF8 -Command incorrectly on Windows PS 5.1
- **Dashboard VM badges**: Data now correct in API; verify dashboard VM/Hypervisor badge renders on both Jupiter and Pluto agent detail pages
### Reference Information
- **v0.6.22 changelog:** gururmm repo changelogs/agent/v0.6.22.md
- **Watchdog fix commit:** a29007c
- **NativeSelect + terminal fix commit:** 8551120
- **Changelog pipeline commit:** 41d841a
- **Jupiter agent ID:** 443bfabb-9213-4157-8be6-2b6d5d3113b2
- **Pluto agent ID:** 5316f56f-a1b3-4ac5-97ac-71ddf6a74d2e
- **GuruRMM server agent ID:** 8cd0440f-a65c-4ed2-9fa8-9c6de83492a4
- **Pluto virsh name:** Claude-Builder (UUID: 2087a53f-1aa1-3eca-41a9-2139bf9d57d4)
- **libvirt socket on Unraid:** /var/run/libvirt/libvirt-sock