Mike Swanson 3baaf91183 sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-17 22:07:52

Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-17 22:07:52

2026-05-17 22:07:59 -07:00


Session Log — 2026-05-17

User

User: Mike Swanson (mike)
Machine: DESKTOP-0O8A1RL
Role: admin
Session span: ~19:00–21:30 PT

Session Summary

The session continued from a compacted prior context. The first task was executing a pre-written Python patch script (tmp_syncro_patch.py) that had been prepared in the previous session to insert the Syncro PSA Webhook Integration section into the GuruRMM feature roadmap. The script was SCP'd to the server, executed successfully, and the change was committed directly to the azcomputerguru/gururmm repo as cef20dd with 54 lines inserted under a new ## Integrations section. A coord message was sent to Howard confirming the feature request was accepted into the roadmap with P1/P2 phase labels. The temp script was cleaned up afterward.

A /sync attempt immediately following was blocked: the claudetools git remote URL still pointed to git.azcomputerguru.com (Cloudflare-fronted, not accessible from this machine). The remote was updated to http://172.16.3.20:3000 (internal Gitea IP), but then all 172.16.3.x hosts became unreachable. Tailscale connectivity investigation revealed the pfSense subnet router node (pfsense-2) had rx=0 and was not forwarding traffic — the office had experienced a power failure.

Mike connected to the office LAN in person and described the situation: batteries had been disconnected from UPS units during a rack move; a power flicker tripped the units with no backup, shutting down the office infrastructure. Mike reconnected the batteries and restarted the UPS units. Jupiter (Unraid, 172.16.3.20) and Uranus (172.16.3.21) auto-started; the IX server required a manual button press at the rack. The remainder of the recovery was performed remotely via SSH.

Recovery proceeded in five stages:

First, pfSense Tailscale routes were restored: AdvertiseRoutes had reset to null after reboot; tailscale up --advertise-routes=172.16.0.0/22 --accept-routes re-established subnet routing.
Second, Unraid's libvirtd was failing because libvirt.img was not loop-mounted at boot and a stale /run/libvirt/libvirt-sock directory (not a socket file) blocked the daemon. The directory was removed, the image was manually loop-mounted to /etc/libvirt/, and libvirtd -d was started. All four VMs (GuruRMM, Unifi, OwnCloud, Claude-Builder) came up automatically.
Third, Seafile's seahub/gunicorn process was not running despite the containers showing "Up"; seahub.sh start inside the container restored service.
Fourth, sync.azcomputerguru.com was still not reachable because the pfSense DNS override pointed to 172.16.3.21 (Uranus: OwnCloud storage, not a proxy) rather than 172.16.3.20 (Jupiter/NPM). NPM's HTTPS listener is on host port 18443, not 443, so an iptables PREROUTING rule was also required. Both were applied and persisted.
Fifth, a full end-to-end verification confirmed all services healthy.
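The stale-socket condition from the second stage can be detected before libvirtd is started. The following is a minimal Python sketch, not the procedure used in the session; the helper name classify_socket_path is hypothetical, and only the path /run/libvirt/libvirt-sock comes from the log.

```python
import os
import stat


def classify_socket_path(path: str) -> str:
    """Return 'missing', 'socket', 'directory', or 'other' for the given path."""
    try:
        # lstat so that a symlink is reported as 'other' rather than followed.
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"
```

On Jupiter, classify_socket_path("/run/libvirt/libvirt-sock") returning "directory" would correspond to the stale state described above; "missing" or "socket" would be the expected states before and after the daemon starts.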

A power failure recovery runbook was written to .claude/POWER_FAILURE_RUNBOOK.md covering all five stages with exact commands and a single-block PowerShell verification script. Infrastructure facts were committed to memory. Howard was notified via coord with the full post-mortem.

Key Decisions

Syncro roadmap patch executed directly on server: The patch script had already been reviewed and approved (option 3 selected in prior session). No re-review needed; SCP + python3 was the fastest path to commit.
Tailscale fix applied before anything else: Without pfSense advertising routes, no subsequent SSH commands to 172.16.3.x would work. Route restoration was the prerequisite for all other steps.
libvirt.img mounted manually rather than rebooting Unraid: A reboot would have been slower and risked the same failure mode repeating. Direct loop-mount + daemon start was faster and doubled as a diagnostic step.
DNS fix targeted pfSense config.xml AND host_entries.conf separately: Editing config.xml alone wasn't sufficient — pfSense regenerates host_entries.conf on Unbound restart. Needed to edit host_entries.conf directly and send SIGHUP (not restart) to avoid regen overwriting the change. config.xml was also fixed so future regenerations use the correct IP.
iptables rule added to /boot/config/go: Unraid's go script is the correct persistence mechanism. Without this, the port 443 → NPM routing would be lost on next reboot.
IX server BIOS power-restore setting noted as TODO: It did not auto-restart after power loss. "Power On After Power Failure" BIOS/IPMI setting should be reviewed to match Jupiter's behavior.
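The go-script persistence decision above was carried out with a plain append (echo ... >> /boot/config/go). If that step is ever repeated, the rule would be appended a second time. A minimal sketch of an idempotent append follows, assuming it is acceptable to compare whole lines; the helper name ensure_line is hypothetical, and the rule text is the one recorded in this log.

```python
from pathlib import Path

# Rule text as recorded in the session log.
RULE = (
    "iptables -t nat -I PREROUTING -p tcp --dport 443 "
    "-j DNAT --to-destination 172.17.0.2:443"
)


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless an identical line already exists.

    Returns True if the file was modified, False if it was left untouched.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + line + "\n")
    return True
```

Calling ensure_line(Path("/boot/config/go"), RULE) twice would modify the file only on the first call.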

Problems Encountered

git remote URL pointed to Cloudflare-fronted domain: git.azcomputerguru.com blocks curl. Fixed by updating remote to http://172.16.3.20:3000. The internal URL should be permanent, but the remote had reverted at some point, likely via a prior /sync from a machine that still had the old URL.
Pre-bash-backslash hook blocked several curl one-liners: Hook at .claude/hooks/pre-bash-backslash.sh blocks commands with backslash line continuations. Worked around by using PowerShell SSH calls and Python urllib one-liners throughout.
pfSense sed -i requires BSD syntax (not GNU): FreeBSD sed takes a mandatory backup-suffix argument after -i, so an in-place edit without a backup needs -i ''. Initial attempts failed with "unterminated substitute pattern." Fixed by adjusting quoting.
Unbound ignored config.xml edit on svc restart: pfSense's pfSsh.php playback svc restart unbound regenerates host_entries.conf from config.xml at restart time. The first restart overwrote the direct file edit. Solution: edit host_entries.conf, then send SIGHUP to the running Unbound PID instead of doing a full restart.
libvirtd failed with "snapshot dir not a directory": /var/lib/libvirt/qemu/snapshot was a symlink to /etc/libvirt/qemu/snapshot, which didn't exist because libvirt.img wasn't mounted. Mounting the image first resolved it.
Seahub reported "started" but gunicorn process was absent: The container start script ran seahub.sh twice (two start sequences visible in docker logs seafile) and the second run encountered the "creating admin" error. Despite reporting success, gunicorn wasn't running. Manually running seahub.sh start from inside the container fixed it.
SSH to Uranus (172.16.3.21) rejected: Key not authorized. Irrelevant once it was established .21 is storage-only and the DNS fix was the correct path.
PowerShell subshell expansion broke kill -HUP $(cat /var/run/unbound.pid): PowerShell expanded $(cat ...) locally before passing to SSH. Worked around by querying the PID in a separate SSH call, then passing the integer directly.
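The local-expansion problem in the last item can also be avoided by launching ssh from an argv list, so that no local shell ever parses the remote command. The sketch below is an illustration rather than what was done in the session; the helper names are hypothetical, and it assumes the remote login shell supports $(...) command substitution.

```python
import subprocess


def build_ssh_command(host: str, remote_command: str,
                      port: int = 22, user: str = "root") -> list[str]:
    """Build an ssh argv list; the remote command stays one untouched argument."""
    return ["ssh", "-p", str(port), f"{user}@{host}", remote_command]


def run_remote(host: str, remote_command: str,
               port: int = 22, user: str = "root") -> subprocess.CompletedProcess:
    """Run the command over ssh without involving a local shell.

    Because shell=True is not used, characters such as $(...) are not
    interpreted locally; they reach the remote side as written.
    """
    argv = build_ssh_command(host, remote_command, port=port, user=user)
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For the Unbound case, build_ssh_command("172.16.0.1", "kill -HUP $(cat /var/run/unbound.pid)", port=2248, user="admin") keeps the substitution intact as the final argument, which removes the need for a separate PID lookup.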

Configuration Changes

/home/guru/gururmm/docs/FEATURE_ROADMAP.md — 54 lines inserted: new ## Integrations > ### Syncro PSA section with P1/P2 phase breakdown. Committed as cef20dd on azcomputerguru/gururmm main.
D:\claudetools\.claude\POWER_FAILURE_RUNBOOK.md — Created. Full office power failure recovery runbook with 5 steps + 2026-05-17 post-mortem.
D:\claudetools\.claude\memory\infra_office_network.md — Created. ACG office infrastructure reference (IPs, hosts, roles, Docker containers, NPM proxy map).
D:\claudetools\.claude\memory\MEMORY.md — Updated. Added entries for infra reference and power failure runbook.
/boot/config/go on Jupiter (172.16.3.20) — Appended: iptables -t nat -I PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443
pfSense /cf/conf/config.xml — sync DNS override changed from 172.16.3.21 → 172.16.3.20.
pfSense /var/unbound/host_entries.conf — Same change applied to running Unbound config.
D:\claudetools\tmp_syncro_patch.py — Deleted (cleanup after use).
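The host_entries.conf change was made with a global substitution of 172.16.3.21, which would also rewrite any other entry that uses that address, if one exists. A narrower edit can be sketched as below. The local-data / local-data-ptr line format in the example is an assumption about how the file is laid out (its contents are not shown in this log), the uranus.example.lan hostname is hypothetical, and retarget_host is an illustrative helper name.

```python
import re


def retarget_host(conf_text: str, hostname: str, old_ip: str, new_ip: str) -> str:
    """Replace old_ip with new_ip only on lines that mention hostname."""
    # Match the hostname as a whole name, with or without a trailing root dot.
    host_pat = re.compile(
        r"(?<![\w.-])" + re.escape(hostname) + r"(?:\.(?![\w-])|(?![\w.-]))"
    )
    # Match the address exactly, so 172.16.3.21 does not match 172.16.3.210.
    ip_pat = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?![\d.])")
    out = []
    for line in conf_text.splitlines(keepends=True):
        if host_pat.search(line):
            line = ip_pat.sub(new_ip, line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Assumed file layout; the second host is hypothetical.
    sample = (
        'local-data-ptr: "172.16.3.21 sync.azcomputerguru.com"\n'
        'local-data: "sync.azcomputerguru.com. A 172.16.3.21"\n'
        'local-data: "uranus.example.lan. A 172.16.3.21"\n'
    )
    print(retarget_host(sample, "sync.azcomputerguru.com",
                        "172.16.3.21", "172.16.3.20"), end="")
```

In the sample, only the two sync.azcomputerguru.com lines change; the hypothetical uranus.example.lan entry keeps its original address.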

Credentials & Secrets

None created or discovered this session.

Infrastructure & Servers

Host	IP	Role	Notes
pfSense	172.16.0.1	Router, DNS, Tailscale subnet router	SSH port 2248, user admin
Jupiter	172.16.3.20	Unraid NAS	All VMs + Docker; root SSH
Uranus	172.16.3.21	OwnCloud additional storage	NOT a reverse proxy; no working SSH key from this machine
GuruRMM VM	172.16.3.30	Linux VM on Jupiter	GuruRMM server, Coord API (8001), MariaDB, SSH as guru
Pluto	172.16.3.36	Windows Server 2019 VM on Jupiter	MSI build server

Docker containers on Jupiter:

Container	Image	Ports
npm	jc21/nginx-proxy-manager	1880→80, 7818→81, 18443→443
seafile	seafileltd/seafile-pro-mc:12.0-latest	8082→80
seafile-mysql	mariadb:10.6	internal
seafile-elasticsearch	elasticsearch:7.17.26	internal
seafile-memcached	memcached:1.6.18	internal

NPM SSL cert for sync.azcomputerguru.com: /etc/letsencrypt/live/npm-8/fullchain.pem — valid until 2026-07-25.

Tailscale subnet: 172.16.0.0/22 advertised via pfSense node pfsense-2 (100.119.153.74).
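As a quick sanity check that the advertised /22 covers every host in the table above, the range can be computed with Python's ipaddress module. This is a small illustrative sketch; the covered helper is hypothetical, and the addresses are the ones listed in this log.

```python
import ipaddress

ADVERTISED = ipaddress.ip_network("172.16.0.0/22")

# Hosts and addresses as listed in the infrastructure table.
HOSTS = {
    "pfSense": "172.16.0.1",
    "Jupiter": "172.16.3.20",
    "Uranus": "172.16.3.21",
    "GuruRMM VM": "172.16.3.30",
    "Pluto": "172.16.3.36",
}


def covered(hosts: dict[str, str], network: ipaddress.IPv4Network) -> dict[str, bool]:
    """Map each host name to whether its address falls inside the network."""
    return {name: ipaddress.ip_address(ip) in network for name, ip in hosts.items()}


if __name__ == "__main__":
    # A /22 spans 1024 addresses: 172.16.0.0 through 172.16.3.255.
    print(ADVERTISED.network_address, "-", ADVERTISED.broadcast_address)
    for name, ok in covered(HOSTS, ADVERTISED).items():
        print(f"{name}: {'covered' if ok else 'NOT covered'}")
```

All five hosts fall inside 172.16.0.0/22, which is why losing the advertised route made every 172.16.3.x host unreachable at once.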

Commands & Outputs

# Syncro roadmap patch
scp D:/claudetools/tmp_syncro_patch.py guru@172.16.3.30:/tmp/syncro_patch.py
ssh guru@172.16.3.30 "python3 /tmp/syncro_patch.py"   # output: inserted
ssh guru@172.16.3.30 "cd /home/guru/gururmm && git add docs/FEATURE_ROADMAP.md && git commit -m '...' && git push origin main"
# Result: cef20dd pushed to azcomputerguru/gururmm main

# pfSense Tailscale fix
ssh -p 2248 admin@172.16.0.1 "tailscale up --advertise-routes=172.16.0.0/22 --accept-routes"

# libvirt recovery on Jupiter
ssh root@172.16.3.20 "losetup -f --show /mnt/user/system/libvirt/libvirt.img"  # returned /dev/loop4
ssh root@172.16.3.20 "mount /dev/loop4 /etc/libvirt && libvirtd -d"
ssh root@172.16.3.20 "virsh -c qemu:///system list --all"   # all 4 VMs running

# Seafile seahub fix
ssh root@172.16.3.20 "docker exec seafile bash -c 'cd /opt/seafile/seafile-pro-server-12.0.19 && ./seahub.sh start'"
# Verified: docker exec seafile curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/  → 302

# iptables port 443 → NPM
ssh root@172.16.3.20 "iptables -t nat -I PREROUTING -p tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443"
# Persisted: echo 'iptables ...' >> /boot/config/go

# pfSense DNS fix
ssh -p 2248 admin@172.16.0.1 "sed -i '' 's/172.16.3.21/172.16.3.20/g' /var/unbound/host_entries.conf"
ssh -p 2248 admin@172.16.0.1 "kill -HUP 62718"   # Unbound SIGHUP (PID from /var/run/unbound.pid)

# Final verification
# sync.azcomputerguru.com:443 → True; HTTPS 200
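
The final verification above (port reachable, then an HTTP status) can be approximated with the Python standard library, in the same spirit as the urllib one-liners used during the session. This is a hedged sketch and not the runbook's PowerShell script; the function names and timeouts are assumptions.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the status code for a GET of url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still prove the service answered.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None
```

For this session's check, tcp_port_open("sync.azcomputerguru.com", 443) and http_status("https://sync.azcomputerguru.com/") would correspond to the two results recorded above. urlopen follows redirects, so the status reported is that of the final response.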

Pending / Incomplete Tasks

IX server BIOS power-restore setting: Did not auto-restart after power loss. Check "Power On After Power Failure" in BIOS or IPMI. Should match Jupiter's behavior (auto-start).
Unraid libvirt auto-mount on boot: The VM plugin should mount libvirt.img automatically but failed. A User Scripts plugin script triggered at array start would make the manual step 2c in the runbook unnecessary. Not implemented yet.
claudetools git remote URL: Verified it was corrected to http://172.16.3.20:3000 this session, but it had reverted previously. Confirm that the other machines (MacBook, Howard) also have the internal URL.
/sync was never completed this session because the network was down when it was first attempted. Sync will run at end of this /save.
Pluto vault entry + SSH key — carried over from prior session, not addressed today.
BB-SERVER enrollment loop — carried over, not addressed.
PowerShell command_type bug on Windows PS 5.1 — carried over, not addressed.
Policy wiring plan (ticklish-questing-stallman.md) — still pending, not addressed.

Reference Information

gururmm Syncro roadmap commit: cef20dd — docs: add Syncro PSA Webhook Integration to roadmap (Howard feature request 2026-05-17)
Power failure runbook: D:\claudetools\.claude\POWER_FAILURE_RUNBOOK.md
NPM admin panel: http://172.16.3.20:7818
Seafile direct: http://172.16.3.20:8082
Coord message to Howard (post-mortem): 321679a2-d890-45e9-a19c-ca3701ad293b
Coord message to Howard (Syncro roadmap): 90d20ecb-eb40-4c40-86cc-6b52a8084304
