sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-17 22:07:52

Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-17 22:07:52
Committed: 2026-05-17 22:07:59 -07:00
Parent: acb0af9d3a
Commit: 3baaf91183
Changes: 4 files changed, 438 additions, 0 deletions

# Session Log — 2026-05-17
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session span:** ~19:00–21:30 PT
---
## Session Summary
The session continued from a compacted prior context. The first task was executing a pre-written Python patch script (`tmp_syncro_patch.py`) that had been prepared in the previous session to insert the Syncro PSA Webhook Integration section into the GuruRMM feature roadmap. The script was SCP'd to the server, executed successfully, and the change was committed directly to the `azcomputerguru/gururmm` repo as `cef20dd` with 54 lines inserted under a new `## Integrations` section. A coord message was sent to Howard confirming the feature request was accepted into the roadmap with P1/P2 phase labels. The temp script was cleaned up afterward.
A `/sync` attempt immediately afterward was blocked: the claudetools git remote URL still pointed to `git.azcomputerguru.com` (Cloudflare-fronted, not reachable from this machine). The remote was updated to `http://172.16.3.20:3000` (the internal Gitea IP), but then all 172.16.3.x hosts became unreachable. Investigating Tailscale connectivity showed that the pfSense subnet router node (`pfsense-2`) had `rx=0` and was not forwarding traffic. The underlying cause was an office power failure.
Mike connected to the office LAN in person and described the situation: batteries had been disconnected from UPS units during a rack move; a power flicker tripped the units with no backup, shutting down the office infrastructure. Mike reconnected the batteries and restarted the UPS units. Jupiter (Unraid, 172.16.3.20) and Uranus (172.16.3.21) auto-started; the IX server required a manual button press at the rack. The remainder of the recovery was performed remotely via SSH.
Recovery proceeded in five stages. First, pfSense Tailscale routes were restored: `AdvertiseRoutes` had reset to `null` after the reboot, and `tailscale up --advertise-routes=172.16.0.0/22 --accept-routes` re-established subnet routing. Second, Unraid's libvirtd was failing because `libvirt.img` had not been loop-mounted at boot and a stale `/run/libvirt/libvirt-sock` directory (not a socket file) blocked the daemon. The directory was removed, the image was manually loop-mounted at `/etc/libvirt/`, and `libvirtd -d` was started; all four VMs (GuruRMM, Unifi, OwnCloud, Claude-Builder) then came up automatically. Third, Seafile's seahub/gunicorn process was not running even though the containers showed "Up"; running `seahub.sh start` inside the container restored service. Fourth, `sync.azcomputerguru.com` was still unreachable because the pfSense DNS override pointed to 172.16.3.21 (Uranus, OwnCloud storage, not a proxy) instead of 172.16.3.20 (Jupiter/NPM). Because NPM's HTTPS listener is published on host port 18443 rather than 443, an iptables PREROUTING DNAT rule sending port 443 to the NPM container (172.17.0.2:443) was also required. Both changes were applied and persisted. Fifth, a full end-to-end verification confirmed all services healthy.
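Stage two hinged on two conditions that are cheap to check before starting the daemon: a directory occupying the socket path, and the `/var/lib/libvirt/qemu/snapshot` symlink whose target only exists once `libvirt.img` is mounted. The following is a minimal POSIX shell sketch, assuming it runs as root on Jupiter after the image has been mounted; `libvirt_preflight` is a hypothetical helper, and its default paths are simply the ones observed this session. It uses `rmdir`, so a non-empty directory is reported rather than deleted.

```bash
# Hypothetical pre-flight helper for Unraid's libvirtd (not part of Unraid).
# $1: expected socket path, $2: snapshot symlink; defaults are this session's paths.
libvirt_preflight() {
  local sock="${1:-/run/libvirt/libvirt-sock}"
  local snap="${2:-/var/lib/libvirt/qemu/snapshot}"

  # A directory sitting where libvirtd wants to create its socket blocks startup.
  if [ -d "$sock" ] && [ ! -S "$sock" ]; then
    echo "removing stale directory at $sock"
    # rmdir only removes empty directories, so anything unexpected is left alone.
    rmdir -- "$sock" || { echo "could not remove $sock" >&2; return 1; }
  fi

  # -L: path is a symlink; ! -e: its target does not resolve (image not mounted).
  if [ -L "$snap" ] && [ ! -e "$snap" ]; then
    echo "dangling symlink: $snap -> $(readlink -- "$snap")" >&2
    return 2
  fi
  return 0
}

# Usage on Jupiter after mounting libvirt.img (not run here):
#   libvirt_preflight && libvirtd -d
```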
A power failure recovery runbook was written to `.claude/POWER_FAILURE_RUNBOOK.md`, covering all five stages with exact commands and a single-block PowerShell verification script. Infrastructure facts were saved to memory (`.claude/memory/infra_office_network.md`). Howard was sent the full post-mortem via coord.

---
## Key Decisions
- **Syncro roadmap patch executed directly on server:** The patch script had already been reviewed and approved (option 3 selected in prior session). No re-review needed; SCP + python3 was the fastest path to commit.
- **Tailscale fix applied before anything else:** Without pfSense advertising routes, no subsequent SSH commands to 172.16.3.x would work. Route restoration was the prerequisite for all other steps.
- **libvirt.img mounted manually rather than rebooting Unraid:** A reboot would have been slower and risked repeating the same failure. A direct loop-mount followed by a daemon start was faster and exposed each failure point as it occurred.
- **DNS fix targeted pfSense config.xml and host_entries.conf separately:** Editing config.xml alone was not enough: restarting Unbound regenerates host_entries.conf, and the first restart overwrote the direct edit without applying the config.xml change. The fix was to edit host_entries.conf directly and send SIGHUP (not a restart), so no regeneration overwrote it. config.xml was also corrected so future regenerations use the right IP.
- **iptables rule added to /boot/config/go:** Unraid's go script is the correct persistence mechanism. Without this, the port 443 → NPM routing would be lost on next reboot.
- **IX server BIOS power-restore setting noted as TODO:** It did not auto-restart after power loss. "Power On After Power Failure" BIOS/IPMI setting should be reviewed to match Jupiter's behavior.
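The go script runs once per boot, but lines appended with `echo ... >>` accumulate if the step is repeated, and each extra `iptables -I` stacks another copy of the rule. Below is a minimal sketch of an idempotent version, assuming a POSIX shell on Jupiter; `persist_line` is a hypothetical helper, and the `-C` (check) guard is a standard iptables option used here to skip the insert when the rule already exists. Note that 172.17.0.2 appears to be the address Docker assigned the `npm` container on the default bridge; Docker allocates those addresses at container start, so the rule is worth re-checking if the container is recreated.

```bash
# Hypothetical helper: append a line to a file only if that exact line is absent.
persist_line() {
  local file="$1" line="$2"
  # -x: whole-line match, -F: fixed string (no regex), -q: no output.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# The session's DNAT rule, guarded so re-running it does not add a duplicate:
# `iptables -C` exits non-zero when the rule is not present yet.
RULE='iptables -t nat -C PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443 2>/dev/null || iptables -t nat -I PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443'

# Usage on Jupiter (not run here):
#   persist_line /boot/config/go "$RULE"
```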
---
## Problems Encountered
- **git remote URL pointed to Cloudflare-fronted domain:** `git.azcomputerguru.com` blocks curl from this machine. Fixed by updating the remote to `http://172.16.3.20:3000`. The change should be permanent, but the remote had reverted at some point, likely from a prior `/sync` on a machine that still had the old URL.
- **Pre-bash-backslash hook blocked several curl one-liners:** Hook at `.claude/hooks/pre-bash-backslash.sh` blocks commands with backslash line continuations. Worked around by using PowerShell SSH calls and Python urllib one-liners throughout.
- **pfSense `sed -i` needs BSD syntax, not GNU:** FreeBSD `sed` requires an explicit (possibly empty) suffix argument after `-i`, e.g. `sed -i ''`. Initial attempts failed with "unterminated substitute pattern" and were fixed by adjusting the quoting.
- **Unbound ignored config.xml edit on `svc restart`:** pfSense's `pfSsh.php playback svc restart unbound` regenerates host_entries.conf from config.xml at restart time. The first restart overwrote the direct file edit. Solution: edit host_entries.conf, then send SIGHUP to the running Unbound PID instead of doing a full restart.
- **libvirtd failed with "snapshot dir not a directory":** `/var/lib/libvirt/qemu/snapshot` was a symlink to `/etc/libvirt/qemu/snapshot`, which didn't exist because libvirt.img wasn't mounted. Mounting the image first resolved it.
- **Seahub reported "started" but gunicorn process was absent:** The container start script ran seahub.sh twice (two start sequences visible in `docker logs seafile`) and the second run encountered the "creating admin" error. Despite reporting success, gunicorn wasn't running. Manually running `seahub.sh start` from inside the container fixed it.
- **SSH to Uranus (172.16.3.21) rejected:** Key not authorized. This became moot once it was established that .21 is storage-only and that the DNS fix was the correct path.
- **PowerShell subshell expansion broke `kill -HUP $(cat /var/run/unbound.pid)`:** PowerShell expanded `$(cat ...)` locally before passing to SSH. Worked around by querying the PID in a separate SSH call, then passing the integer directly.
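FreeBSD `sed` treats the word after `-i` as a backup suffix, while GNU `sed` only accepts a suffix attached to the flag, so `sed -i ''` breaks on Linux and a bare `sed -i` breaks on pfSense. A suffix attached directly (`-i.bak`) is accepted by both. The sketch below is written for POSIX `sh`, so it could also run in pfSense's `/bin/sh`. It uses that form, escapes the dots so they match literally, and requires a non-digit (or end of line) after the old address so that `172.16.3.21` does not also rewrite something like `172.16.3.210`. `swap_ip` is a hypothetical helper.

```bash
# Hypothetical helper: replace one IPv4 address with another, in place.
# Works with both FreeBSD sed (pfSense) and GNU sed.
swap_ip() {
  local file="$1" old="$2" new="$3" old_re
  # Escape dots so "172.16.3.21" only matches literal dots.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # A suffix attached to -i is portable. The first expression handles an address
  # followed by a non-digit, the second an address at the end of a line.
  sed -i.bak -e "s/${old_re}\([^0-9]\)/${new}\1/g" -e "s/${old_re}\$/${new}/" "$file" && rm -f -- "$file.bak"
}

# Usage on pfSense, followed by a SIGHUP to Unbound (not run here):
#   swap_ip /var/unbound/host_entries.conf 172.16.3.21 172.16.3.20
```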
---
## Configuration Changes
- **`/home/guru/gururmm/docs/FEATURE_ROADMAP.md`** — 54 lines inserted: new `## Integrations > ### Syncro PSA` section with P1/P2 phase breakdown. Committed as `cef20dd` on `azcomputerguru/gururmm` main.
- **`D:\claudetools\.claude\POWER_FAILURE_RUNBOOK.md`** — Created. Full office power failure recovery runbook with 5 steps + 2026-05-17 post-mortem.
- **`D:\claudetools\.claude\memory\infra_office_network.md`** — Created. ACG office infrastructure reference (IPs, hosts, roles, Docker containers, NPM proxy map).
- **`D:\claudetools\.claude\memory\MEMORY.md`** — Updated. Added entries for infra reference and power failure runbook.
- **`/boot/config/go` on Jupiter (172.16.3.20)** — Appended: `iptables -t nat -I PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443`
- **pfSense `/cf/conf/config.xml`** — `sync` DNS override changed from 172.16.3.21 → 172.16.3.20.
- **pfSense `/var/unbound/host_entries.conf`** — Same change applied to running Unbound config.
- **`D:\claudetools\tmp_syncro_patch.py`** — Deleted (cleanup after use).
---
## Credentials & Secrets
None created or discovered this session.

---
## Infrastructure & Servers
| Host | IP | Role | Notes |
|------|----|------|-------|
| pfSense | 172.16.0.1 | Router, DNS, Tailscale subnet router | SSH port 2248, user admin |
| Jupiter | 172.16.3.20 | Unraid NAS | All VMs + Docker; root SSH |
| Uranus | 172.16.3.21 | OwnCloud additional storage | NOT a reverse proxy; no working SSH key from this machine |
| GuruRMM VM | 172.16.3.30 | Linux VM on Jupiter | GuruRMM server, Coord API (8001), MariaDB, SSH as guru |
| Pluto | 172.16.3.36 | Windows Server 2019 VM on Jupiter | MSI build server |

**Docker containers on Jupiter:**

| Container | Image | Ports |
|-----------|-------|-------|
| npm | jc21/nginx-proxy-manager | 1880→80, 7818→81, 18443→443 |
| seafile | seafileltd/seafile-pro-mc:12.0-latest | 8082→80 |
| seafile-mysql | mariadb:10.6 | internal |
| seafile-elasticsearch | elasticsearch:7.17.26 | internal |
| seafile-memcached | memcached:1.6.18 | internal |

- **NPM SSL cert for sync.azcomputerguru.com:** `/etc/letsencrypt/live/npm-8/fullchain.pem`, valid until 2026-07-25.
- **Tailscale subnet:** 172.16.0.0/22 advertised via pfSense node `pfsense-2` (100.119.153.74).
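After the reboot, the `pfsense-2` node's `AdvertiseRoutes` preference was `null`, and a small check can catch that before anything else is debugged. This sketch assumes the pfSense Tailscale package provides `tailscale debug prefs`, which prints the node's preferences as JSON; it is a debug subcommand, so its output is not a stable interface. `routes_advertised` is a hypothetical helper that reads that JSON on stdin and uses only `sed`, `tr` and `grep` (no `jq`).

```bash
# Hypothetical helper: succeed only if the prefs JSON on stdin lists the subnet
# inside the "AdvertiseRoutes" array (fails for null or a missing entry).
routes_advertised() {
  local subnet="${1:-172.16.0.0/22}" subnet_re
  subnet_re=$(printf '%s' "$subnet" | sed 's/\./\\./g')
  # Strip whitespace so pretty-printed and compact JSON look the same,
  # then look for the quoted subnet before the array's closing bracket.
  tr -d ' \t\n' | grep -q "\"AdvertiseRoutes\":\[[^]]*\"${subnet_re}\""
}

# Usage from a machine that can reach pfSense (not run here):
#   ssh -p 2248 admin@172.16.0.1 'tailscale debug prefs' | routes_advertised || echo 'routes missing: re-run tailscale up --advertise-routes=172.16.0.0/22 --accept-routes'
```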
---
## Commands & Outputs
```bash
# Syncro roadmap patch
scp D:/claudetools/tmp_syncro_patch.py guru@172.16.3.30:/tmp/syncro_patch.py
ssh guru@172.16.3.30 "python3 /tmp/syncro_patch.py" # output: inserted
ssh guru@172.16.3.30 "cd /home/guru/gururmm && git add docs/FEATURE_ROADMAP.md && git commit -m '...' && git push origin main"
# Result: cef20dd pushed to azcomputerguru/gururmm main
# pfSense Tailscale fix
ssh -p 2248 admin@172.16.0.1 "tailscale up --advertise-routes=172.16.0.0/22 --accept-routes"
# libvirt recovery on Jupiter
ssh root@172.16.3.20 "losetup -f --show /mnt/user/system/libvirt/libvirt.img" # returned /dev/loop4
ssh root@172.16.3.20 "mount /dev/loop4 /etc/libvirt && libvirtd -d"
ssh root@172.16.3.20 "virsh -c qemu:///system list --all" # all 4 VMs running
# Seafile seahub fix
ssh root@172.16.3.20 "docker exec seafile bash -c 'cd /opt/seafile/seafile-pro-server-12.0.19 && ./seahub.sh start'"
# Verified: docker exec seafile curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/ → 302
# iptables port 443 → NPM
ssh root@172.16.3.20 "iptables -t nat -I PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443"
# Persisted: echo 'iptables ...' >> /boot/config/go
# pfSense DNS fix
ssh -p 2248 admin@172.16.0.1 "sed -i '' 's/172.16.3.21/172.16.3.20/g' /var/unbound/host_entries.conf"
ssh -p 2248 admin@172.16.0.1 "kill -HUP 62718" # Unbound SIGHUP (PID from /var/run/unbound.pid)
# Final verification
# sync.azcomputerguru.com:443 → True; HTTPS 200
```
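Stage three showed that "Up" in the container status did not mean seahub was serving. Below is a minimal check-then-start sketch, assuming it runs as root on Jupiter with the container named `seafile` and the install path used this session (the versioned directory will change with upgrades). It reuses the in-container `curl` probe recorded above, where 302 was the healthy response; also counting 200 as healthy is an assumption. `ensure_seahub` is a hypothetical helper.

```bash
# Hypothetical helper: start seahub only if it is not answering inside the container.
SEAFILE_DIR=/opt/seafile/seafile-pro-server-12.0.19

ensure_seahub() {
  local code
  # Same probe as the session: HTTP status from seahub on port 8000.
  code=$(docker exec seafile curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/)
  case "$code" in
    200|302)
      echo "seahub ok ($code)"
      return 0
      ;;
  esac
  echo "seahub not answering (got '${code:-none}'); starting it"
  docker exec seafile bash -c "cd $SEAFILE_DIR && ./seahub.sh start"
}

# Usage on Jupiter (not run here):
#   ensure_seahub
```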
---
## Pending / Incomplete Tasks
- **IX server BIOS power-restore setting:** Did not auto-restart after power loss. Check "Power On After Power Failure" in BIOS or IPMI. Should match Jupiter's behavior (auto-start).
- **Unraid libvirt auto-mount on boot:** The VM plugin should mount `libvirt.img` automatically but failed. A User Scripts plugin script triggered at array start would make the manual step 2c in the runbook unnecessary. Not implemented yet.
- **claudetools git remote URL:** Verified it was corrected to `http://172.16.3.20:3000` this session but it had reverted previously. Check other machines (MacBook, Howard) that they also have the internal URL.
- **`/sync` was never completed this session** due to network being down when first attempted. Sync will run at end of this `/save`.
- **Pluto vault entry + SSH key** — carried over from prior session, not addressed today.
- **BB-SERVER enrollment loop** — carried over, not addressed.
- **PowerShell `command_type` bug on Windows PS 5.1** — carried over, not addressed.
- **Policy wiring plan** (`ticklish-questing-stallman.md`) — still pending, not addressed.
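For the pending git remote check, something like the following could be run in the claudetools checkout on each machine. It is a sketch assuming a POSIX shell (for example Git Bash on Windows) and git 2.7 or later for `git remote get-url`; `remote_is_internal` is a hypothetical helper. It only checks the host prefix, because this log does not record the repository path; a stale remote would then be corrected with `git remote set-url origin` and the full internal URL.

```bash
# Hypothetical helper: does origin point at the internal Gitea host?
INTERNAL_PREFIX='http://172.16.3.20:3000/'

remote_is_internal() {
  local repo="${1:-.}" url
  url=$(git -C "$repo" remote get-url origin 2>/dev/null) || {
    echo "no origin remote in $repo" >&2
    return 2
  }
  # The quoted prefix is matched literally; the trailing * matches the repo path.
  case "$url" in
    "$INTERNAL_PREFIX"*) echo "ok: $url" ;;
    *) echo "stale remote: $url" >&2; return 1 ;;
  esac
}

# Usage from the claudetools checkout (not run here):
#   remote_is_internal .
```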
---
## Reference Information
- **gururmm Syncro roadmap commit:** `cef20dd`: `docs: add Syncro PSA Webhook Integration to roadmap (Howard feature request 2026-05-17)`
- **Power failure runbook:** `D:\claudetools\.claude\POWER_FAILURE_RUNBOOK.md`
- **NPM admin panel:** `http://172.16.3.20:7818`
- **Seafile direct:** `http://172.16.3.20:8082`
- **Coord message to Howard (post-mortem):** `321679a2-d890-45e9-a19c-ca3701ad293b`
- **Coord message to Howard (Syncro roadmap):** `90d20ecb-eb40-4c40-86cc-6b52a8084304`