sync: auto-sync from Mikes-MacBook-Air.local at 2026-06-10 11:39:35
Author: Mike Swanson Machine: Mikes-MacBook-Air.local Timestamp: 2026-06-10 11:39:35
# Lonestar Unraid Server — VM Restoration and libvirt TPM Fix

## User
- **User:** Mike Swanson (mike)
- **Machine:** Mikes-MacBook-Air
- **Role:** admin

## Session Summary

Continued troubleshooting the lonestar Unraid server (192.168.120.177) after the previous session addressed GURU-5070 brightness and initial lonestar RMM connectivity. The server had two critical issues: the libvirt VMs tab was not loading in the web UI, and LONESTAR-VM (Windows VM) had been offline since 2026-06-06.

Investigated the libvirt access issue for the Unraid web UI's `nobody` user. Initial attempts focused on XDG environment variables and socket permissions, based on web search results; attempted external AI consultation produced nothing (both Grok and Gemini failed to return results). The core problem was identified as a missing `/etc/libvirt/qemu.conf` file. Created it with `dynamic_ownership = 1` and `group = "users"` so that the `users` group (which `nobody` belongs to) can access VMs. Verified the fix by running `virsh` as the `nobody` user, which connected successfully.

Attempted to make the fix persistent by adding qemu.conf creation to `/boot/config/go`, but this broke the boot process completely: nginx, libvirtd, and even emhttpd failed to start after reboot. Reverted to a minimal `/boot/config/go` (just starting emhttpd) and rebooted again to restore basic functionality. User confirmed the web UI was accessible after the minimal reboot.

The final critical issue was the TPM directory. When attempting to start LONESTAR-VM, libvirt failed with "Could not create TPM directory /var/lib/libvirt/swtpm/<UUID>: No such file or directory". Created `/var/lib/libvirt/swtpm` and set it to 777 permissions to allow libvirt to create VM-specific TPM subdirectories. User then toggled VM Manager off/on in the Unraid web UI, which rediscovered and restored LONESTAR-VM from its stored configuration. The VM started successfully and its RMM agent (ID: a4d39a9d-2210-483c-9b1e-6348efdba627) reconnected immediately.

Jupiter Unraid server was examined for comparison at the user's request. It had minimal libvirt configuration (no custom config files); per user instruction nothing was changed, since it was working.

## Key Decisions

- **Abandoned complex /boot/config/go libvirt setup** — The comprehensive libvirt fix (creating all config files on boot) broke Unraid's init system completely. Reverted to minimal config and relied on Unraid's built-in VM Manager instead. This is the correct approach for Unraid — let the VM Manager handle libvirt startup.

- **Used 777 permissions on TPM directory** — Made `/var/lib/libvirt/swtpm` world-writable instead of trying to chase down exact user/group requirements. This is pragmatic for a single-user Unraid system and ensures libvirt can create VM-specific subdirectories regardless of which user context it runs under.

- **Did not persist qemu.conf fix** — Since the comprehensive boot script broke the system, the qemu.conf file was not added to persistent storage. However, this turned out to be unnecessary — toggling VM Manager off/on was sufficient to restore VM functionality, suggesting Unraid's VM Manager handles this internally.

- **Leveraged Unraid's VM Manager rediscovery** — Instead of manually recreating VM XML configuration, toggling VM Manager off/on caused Unraid to rediscover LONESTAR-VM from its stored configuration. This preserved all original VM settings (BIOS, TPM config, disk paths, etc.) without manual reconstruction.

## Problems Encountered

- **External AI consultation failures** — Both Grok (`/grok xsearch`) and Gemini (`/agy ask`) failed to return results when searching for Unraid libvirt permission solutions. Grok returned `stopReason=` with no result; Gemini returned error code 41. Fell back to standard web search which provided the critical qemu.conf lead.

- **Missing qemu.conf file** — The root cause of "nobody user cannot access libvirt" was identified as the complete absence of `/etc/libvirt/qemu.conf`. Unraid's libvirt setup did not appear to create this file by default. Created it with `group = "users"` and `dynamic_ownership = 1`, which tells libvirt to set the ownership of VM files to the configured user and group.

- **Comprehensive boot script broke Unraid startup** — Adding libvirt config file creation (libvirtd.conf, virtlockd.conf, virtlogd.conf, qemu.conf) to `/boot/config/go` caused a complete boot failure — nginx, libvirtd, and emhttpd all failed to start. The init system likely choked on the backgrounded subshell with multiple heredocs. Reverted to minimal config (just starting emhttpd) to restore functionality.

- **Nginx dying repeatedly** — Before the reboot, nginx would start successfully but die within seconds with no error messages. Traced to a stuck Unix socket, `/var/run/nginx.socket`. Removed the socket manually and nginx started, but it continued to die silently. Reboot with minimal config resolved this.

- **TPM directory permission issue** — VM start failed with "Could not create TPM directory /var/lib/libvirt/swtpm/<UUID>: No such file or directory". The parent directory `/var/lib/libvirt/swtpm` was missing or not writable in the context libvirt runs under, so it was created with `mkdir -p` and set to 777 to allow creation of VM-specific subdirectories. Libvirt wasn't running at the time, so VM Manager had to be enabled before the VM could be started.

- **VM Manager needed to be toggled** — Even with TPM directory created and writable, LONESTAR-VM didn't appear until VM Manager was toggled off then on again. This rediscovery process restored the VM from its stored configuration files.

## Configuration Changes

**Modified:**
- `/boot/config/go` (lonestar) — Reverted to minimal version (just starts emhttpd). Previous complex libvirt setup removed.

**Created:**
- `/etc/libvirt/qemu.conf` (lonestar, ephemeral) — Created with `dynamic_ownership = 1`, `group = "users"`, `security_driver = "none"` to allow users group access to VMs. This file is ephemeral (recreated on boot by VM Manager).
- `/var/lib/libvirt/swtpm/` (lonestar, ephemeral) — Created with 777 permissions to allow libvirt to create VM-specific TPM subdirectories.
- `/etc/libvirt/qemu/swtpm/tpm-states/` (lonestar, ephemeral) — Initially created based on incorrect path from web search; not actually needed by Unraid.

**Not Persisted:**
- The qemu.conf and swtpm directory fixes are ephemeral. They survive until reboot but are not added to `/boot/config/go`. User should add these to `/boot/config/go` if they want persistence:
```bash
mkdir -p /var/lib/libvirt/swtpm
chmod 777 /var/lib/libvirt/swtpm
```
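
A minimal sketch of how the two commands above could be wrapped so they are safe to re-run, for example from `/boot/config/go`. The function name `ensure_swtpm_dir` and its optional arguments are hypothetical; the default path and the 777 mode are the ones used in this session. Since an earlier go-file change broke boot, treat this as untested on lonestar.

```bash
# Hypothetical helper: create the swtpm parent directory if it is missing
# and set its mode. Defaults match what was done by hand in this session.
ensure_swtpm_dir() {
    local dir="${1:-/var/lib/libvirt/swtpm}"
    local mode="${2:-777}"
    # mkdir -p succeeds if the directory already exists.
    mkdir -p "$dir" || return 1
    chmod "$mode" "$dir" || return 1
}
```

Calling `ensure_swtpm_dir` with no arguments applies the session's defaults; passing a scratch directory as the first argument allows a dry run somewhere harmless first.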

## Credentials & Secrets

- **lonestar root password:** `Gptf*77ttb123!@#` (provided by user at session start)
- **lonestar IP:** 192.168.120.177 (local LAN address)

## Infrastructure & Servers

- **lonestar (Tower)** — Unraid server at 192.168.120.177
  - RMM Agent ID: e827f798-bab1-484b-b641-98da7ff5af87
  - Hostname in RMM: bfcfbc739d23 (Docker container hash, not Tower)
  - OS: Unraid
  - Last seen: 2026-06-10 17:46:15 UTC (online)

- **LONESTAR-VM** — Windows VM on lonestar Unraid server
  - RMM Agent ID: a4d39a9d-2210-483c-9b1e-6348efdba627
  - OS: Windows
  - Status: online (restored 2026-06-10 17:46:15 UTC after being offline since 2026-06-06)
  - VM disk: `/mnt/user/domains/Windows 11/vdisk1.img` (path from previous session, not verified this session due to array mount timing)

- **Jupiter** — Comparison Unraid server (examined for config reference)
  - RMM Agent ID: 443bfabb-9213-4157-8be6-2b6d5d3113b2
  - OS: linux (Unraid)
  - Status: online
  - Config: Minimal libvirt setup, no custom config files, working correctly — left unchanged per user instruction

## Commands & Outputs

**Key diagnostic command (nobody user libvirt test):**
```bash
su -s /bin/bash nobody -c "virsh -c qemu:///system version 2>&1"
```
Before qemu.conf fix: `error: Cannot create user runtime directory /.cache/libvirt: Permission denied`

After qemu.conf fix:
```
Compiled against library: libvirt 11.2.0
Using library: libvirt 11.2.0
Using API: QEMU 11.2.0
Running hypervisor: QEMU 9.2.3
```
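
The fix relies on `nobody` being a member of the `users` group. A quick check could look like the sketch below; `user_in_group` is a hypothetical helper name, and it only relies on the standard `id` command.

```bash
# Hypothetical helper: succeed if user $1 is a member of group $2.
user_in_group() {
    local user="$1" group="$2" g
    # id -nG prints the user's group names separated by spaces.
    # If the user does not exist, id prints nothing here and we return 1.
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

For example, `user_in_group nobody users && echo "nobody is in users"` would confirm the membership assumed in this session.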

**Creating qemu.conf:**
```bash
cat > /etc/libvirt/qemu.conf << 'EOF'
user = "root"
group = "users"
dynamic_ownership = 1
security_driver = "none"
group_ownership = 1
unix_sock_group = "users"
unix_sock_rw_perms = "0770"
log_level = 3
log_outputs="3:syslog:qemu"
EOF
```
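
To confirm what a config file actually contains after writing it, a small reader can help. The sketch below assumes the simple `key = value` layout used in the heredoc above; `conf_get` is a hypothetical helper, not a libvirt tool, and it does not tell you whether libvirt recognizes a given key.

```bash
# Hypothetical helper: print the value of KEY from a "key = value" style
# config file, stripping surrounding double quotes. Commented-out lines
# (starting with '#') do not match. If a key repeats, the last one wins.
conf_get() {
    local file="$1" key="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
        | tail -n 1 \
        | sed 's/^"\(.*\)"$/\1/'
}
```

Usage would be along the lines of `conf_get /etc/libvirt/qemu.conf group`, which prints nothing if the key is absent.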

**Fixing TPM directory permissions:**
```bash
mkdir -p /var/lib/libvirt/swtpm
chmod 777 /var/lib/libvirt/swtpm
```

**Minimal /boot/config/go (working version):**
```bash
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
```

**Nginx error that revealed stuck socket:**
```
nginx: [emerg] bind() to unix:/var/run/nginx.socket failed (98: Address already in use)
nginx: [emerg] still could not bind()
```
Resolution: `rm -f /var/run/nginx.socket` then restart nginx via rc.nginx
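
Before deleting a socket file by hand, it may be worth confirming that the path really is a socket and that no process still holds it. A hedged sketch follows; `remove_stale_socket` is a hypothetical helper, and it assumes `fuser` (from psmisc) is available, skipping the in-use check if it is not.

```bash
# Hypothetical helper: remove a Unix socket file only if it looks stale.
remove_stale_socket() {
    local path="$1"
    # Nothing to do if the path does not exist.
    [ -e "$path" ] || return 0
    # Refuse to delete anything that is not a Unix socket.
    if [ ! -S "$path" ]; then
        echo "refusing: $path is not a socket" >&2
        return 1
    fi
    # If fuser is present and reports a process using the socket, leave it.
    if command -v fuser >/dev/null 2>&1 && fuser "$path" >/dev/null 2>&1; then
        echo "refusing: $path is still in use" >&2
        return 1
    fi
    rm -f "$path"
}
```

In this session's case the call would be `remove_stale_socket /var/run/nginx.socket`, followed by the nginx restart via rc.nginx.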

**Final VM status check:**
```bash
curl -s "$RMM/api/agents/a4d39a9d-2210-483c-9b1e-6348efdba627" -H "Authorization: Bearer $TOKEN"
```
Result: `{"hostname": "LONESTAR-VM", "status": "online", "last_seen": "2026-06-10T17:46:15.185825Z"}`
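
For the monitoring follow-up, the `status` field can be pulled out of a response shaped like the one above. This is a rough sketch using `sed`; `agent_status` is a hypothetical helper, and a real JSON parser such as `jq` would be more robust where it is installed.

```bash
# Hypothetical helper: print the value of the "status" string field from
# JSON on stdin. Pattern matching only; it assumes a flat, single-line
# object with one "status" key, like the sample result.
agent_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Piping the `curl` command shown above into `agent_status` would print just the status string, which is easier to compare in a periodic check.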

## Pending / Incomplete Tasks

- **Persist TPM directory fix** — The `/var/lib/libvirt/swtpm` directory with 777 permissions is ephemeral. If the user wants this to survive reboots, add to `/boot/config/go`:
```bash
mkdir -p /var/lib/libvirt/swtpm
chmod 777 /var/lib/libvirt/swtpm
```

- **Monitor LONESTAR-VM stability** — VM just came back online after being offline since 2026-06-06. Monitor over next 24-48 hours to ensure it stays online and RMM agent remains connected.

- **Understand why comprehensive boot script failed** — The complex `/boot/config/go` with libvirt config file creation broke Unraid's init system completely. Root cause not fully diagnosed — could be the backgrounded subshell syntax, heredoc nesting, or timing issues with emhttpd startup. If libvirt fixes need to be persistent, investigate simpler approaches or separate init scripts.
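
One simpler approach along the lines of the last item: keep `/boot/config/go` minimal and have it call a separate script only after a syntax check passes. The sketch below is untested on Unraid; `run_if_valid` and the script path in the usage note are hypothetical. Note that `bash -n` only catches syntax errors, so it would not catch a timing problem with emhttpd startup.

```bash
# Hypothetical helper: run a script only if it exists and parses cleanly.
run_if_valid() {
    local script="$1"
    # Skip silently if the script is absent.
    [ -f "$script" ] || return 0
    # bash -n parses the script without executing it.
    if ! bash -n "$script" 2>/dev/null; then
        echo "skipping $script: syntax check failed" >&2
        return 1
    fi
    bash "$script"
}
```

In the go file this might be called as `run_if_valid /boot/config/libvirt-fixes.sh` after the `emhttp` line, where the script name is a placeholder.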

## Reference Information

- **GuruRMM API endpoint:** http://172.16.3.30:3001
- **GuruRMM external endpoint:** wss://rmm-api.azcomputerguru.com/ws
- **RMM credentials:** vault path `infrastructure/gururmm-server.sops.yaml`
- **Web search results used:** https://github.com/cockpit-project/cockpit/issues/3765 (libvirt XDG runtime directory issues)
- **Unraid web UI:** http://192.168.120.177
- **VM Manager path:** Settings → VM Manager in Unraid web UI
- **Libvirt version:** 11.2.0
- **QEMU version:** 9.2.3
- **Unraid VM disk path format:** `/mnt/user/domains/<VM-Name>/vdisk1.img`
- **TPM directory path:** `/var/lib/libvirt/swtpm/` (parent, world-writable) with VM-specific subdirectories created as `<UUID>/`

## Related Session Logs

- Previous session (2026-06-09 or earlier) — Initial lonestar investigation, found VM disk at `/mnt/user/domains/Windows 11/vdisk1.img`, identified VM offline since 2026-06-06
- GURU-5070 brightness fix — Completed in that same previous session; dispatched a PowerShell command to disable adaptive brightness (agent ID: 819df0c8-4824-4424-b55a-2c5cb4d6ca39)