Session Log — 2026-06-01
User
- User: Mike Swanson (mike)
- Machine: GURU-5070
- Role: admin
Session Summary
Three distinct work streams ran this session: Jupiter Unraid container templating, a GURU-KALI ghost-agent investigation that turned into a GuruRMM bug-tracking cleanup, and an EZ Fast Auto Glass email-deliverability investigation.
Jupiter Docker → Unraid templating. Inspected all 21 containers on Jupiter (172.16.3.20). 14 were created via raw docker run/docker-compose and lack the net.unraid.docker.* labels that drive the Unraid UI (WebUI button, icon, update-check, Edit form). Because those labels are immutable, the fix requires recreating each container. Executed target #1 (lowest risk): wrote an Unraid template for gururmm-agent, recreated it with net.unraid.docker.managed=dockerman and the faithful config; verified it re-authenticated and resumed reporting. Then added a device-id persistence mapping (/var/lib/gururmm → appdata) after noticing the agent's identity was ephemeral. Produced a full plan doc and held the higher-risk recreates (seafile, gitea, npm — compose stacks/public proxy) for a maintenance window.
GURU-KALI ghost agents. A ghost check after the Jupiter work surfaced ~11 duplicate GuruRMM agent rows for hostname GURU-KALI. Traced it to a daily device_id churn: the agent could not persist /var/lib/gururmm/.device-id. My initial diagnosis ("host root filesystem is read-only") was wrong — commands dispatched through the RMM agent execute inside the agent's ProtectSystem=strict systemd sandbox, so they showed the agent's namespace view, not the host. Sent the diagnosis to GURU-KALI's own Claude session via the coord API as a self-heal experiment; it correctly identified the real cause (the systemd unit's ReadWritePaths omitted /var/lib/gururmm), applied a drop-in override, and reported back. Purged the ghosts (keeper 9bca5090, stable device-id ec975630), discovering the DELETE /api/agents/:id endpoint is buggy (resets the connection / commits asynchronously). Filed three bugs and reconciled the bug-tracking system: GuruRMM bugs live as BUG-NNN sections in docs/FEATURE_ROADMAP.md (not Gitea issues), so BUG-018 was added there and the duplicate coord todos were closed.
EZ Fast Auto Glass Amazon email. Jon Shailer reported Amazon emails possibly blocked/filtered. ezfastautoglass.com email is hosted on the IX cPanel server (172.16.3.10 / 72.194.62.5). Exim logs over ~3 weeks showed every Amazon SES message accepted, scored NOT spam, delivered to the local mailbox AND forwarded to jshailer1@gmail.com (Gmail returns 250 OK). Nothing is blocked server-side. The likely cause is Gmail filing the forwarded mail into Spam/Promotions, aggravated by SRS being disabled (srs=0) so forwards fail SPF at Gmail. Drafted client instructions (Gmail filter + POP3 pull) and opened Syncro tracking ticket #32359.
Key Decisions
- Templated gururmm-agent via CLI docker run with the net.unraid.docker.managed=dockerman label rather than the Unraid web UI: this replicates exactly what Unraid's docker manager does, and works over SSH. Verified the label + faithful config post-recreate.
- Mapped /var/lib/gururmm to appdata for gururmm-agent so the agent's device-id survives recreates; copied the existing device-id out first so identity stayed stable (no new enrollment).
- Held seafile/gitea/npm recreates for a maintenance window: compose stacks should be adopted into the Compose Manager plugin (already installed), and npm/gitea are high-blast-radius (public proxy / repos + build pipeline). Discourse (app) left alone, since it is self-managed by its own ./launcher.
- Used the coord API to hand the GURU-KALI diagnosis to its own Claude session as a self-heal test instead of remediating it remotely. It succeeded and corrected my misdiagnosis.
- Finished the ghost purge via the Database Agent (direct SQL) rather than the flaky DELETE /api/agents/:id API. The API deletes turned out to commit asynchronously, so the ghosts were already gone by the time the DB Agent checked.
- Kept GuruRMM bug tracking in FEATURE_ROADMAP.md (no Gitea issues) per Mike, and added BUG-018 there for consistency; closed the interim coord todos as duplicates.
- For ezfastautoglass, concluded the server is not at fault and pointed remediation at Gmail (filter) + an optional POP3-pull / SRS change, rather than touching the mail server reactively.
- Did not include the mailbox password in the client email draft — instructed that ACG provides it securely.
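The label check behind the templating decision can be sketched as below. This is a minimal sketch, not the exact commands run this session: list_unmanaged is a hypothetical helper name, and it assumes the docker CLI is on PATH on the Unraid host.

```shell
# list_unmanaged: print the name of every container that has no
# net.unraid.docker.managed label (hypothetical helper, assumes docker on PATH).
list_unmanaged() {
  docker ps -a --format '{{.Names}}' | while read -r name; do
    # index on the label map yields an empty string when the key is absent
    managed=$(docker inspect -f '{{ index .Config.Labels "net.unraid.docker.managed" }}' "$name" </dev/null)
    [ -n "$managed" ] || printf '%s\n' "$name"
  done
}
```

After a recreate, the container should no longer appear in the output of list_unmanaged.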
Problems Encountered
- Misdiagnosed GURU-KALI as "host fs read-only." RMM-agent-dispatched shell commands run inside the agent's ProtectSystem=strict mount namespace (/ is ro there), so findmnt/touch reflected the sandbox, not the host. Resolved by GURU-KALI's own session (the real cause was the unit's ReadWritePaths missing /var/lib/gururmm). Saved as memory reference_rmm_agent_runs_in_systemd_sandbox.md.
- DELETE /api/agents/:id returned HTTP 000 for all-but-one delete per burst. Consistent across 3 runs. Determined the deletes actually commit asynchronously (ghosts cleared ~5 min later with no further calls); likely a missing index on child-table agent_id FK columns making the ON DELETE CASCADE slow enough to reset the connection. Filed as BUG-018.
- post-bot-alert.sh returned Discord 400 (invalid JSON) on a message containing a Unicode em-dash/arrow: the helper does not JSON-escape non-ASCII. Resolved by resending ASCII-only. Helper bug noted.
- Plink first-connect to IX server failed in batch mode (host key not cached). Resolved by pinning the presented fingerprint with -hostkey.
- Parent claudetools push rejected (non-fast-forward) during the BUG-018 commit. Resolved by the Gitea Agent with git pull --rebase then push (no force).
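One way to catch the sandbox-versus-host confusion earlier is to compare the shell's mount namespace with PID 1's before trusting findmnt output. This is a sketch under assumptions, not something run this session: in_private_mount_ns is a hypothetical helper, it needs permission to read /proc/1/ns/mnt (normally root), and it assumes the sandbox does not hide /proc/1.

```shell
# in_private_mount_ns [PROC_ROOT]: succeed (0) when this shell's mount namespace
# differs from PID 1's, fail (1) when they match, return 2 if a link is unreadable.
# PROC_ROOT defaults to /proc; the parameter exists only so the check can be dry-run.
in_private_mount_ns() {
  proc_root=${1:-/proc}
  self_ns=$(readlink "$proc_root/self/ns/mnt") || return 2
  init_ns=$(readlink "$proc_root/1/ns/mnt") || return 2
  [ "$self_ns" != "$init_ns" ]
}
```

If this succeeds when dispatched through the RMM agent, filesystem observations such as "/ is ro" describe the agent's namespace rather than the host.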
Configuration Changes
Created:
- wiki/systems/jupiter-docker-templating.md: Jupiter container inventory + recreate plan + execution log (gururmm-agent done, device-id fix, ghost-check notes).
- .claude/memory/reference_rmm_agent_runs_in_systemd_sandbox.md (+ MEMORY.md index pointer).
- clients/ezfastautoglass/jon-amazon-email-instructions.txt: client email draft (still untracked at save time; will be committed by sync).
- On Jupiter: /boot/config/plugins/dockerMan/templates-user/my-gururmm-agent.xml.
- On GURU-KALI: /etc/systemd/system/gururmm-agent.service.d/override.conf (ReadWritePaths=/var/lib/gururmm), applied by GURU-KALI's session.
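For reference, a drop-in like the GURU-KALI override can be written as sketched below. The exact file GURU-KALI's session wrote was not captured here, so treat this as an assumed reconstruction: write_dropin is a hypothetical helper, and only the ReadWritePaths line is confirmed by this log.

```shell
# write_dropin DIR: create DIR/override.conf granting the service write access
# to /var/lib/gururmm. DIR is a parameter so the sketch can be tried in a scratch dir.
write_dropin() {
  dropin_dir=$1
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ReadWritePaths=/var/lib/gururmm
EOF
}
```

On the host this would be invoked as write_dropin /etc/systemd/system/gururmm-agent.service.d, followed by systemctl daemon-reload and systemctl restart gururmm-agent.service so the new mount namespace takes effect.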
Modified:
- projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md: added BUG-018 (gururmm commit c0c0119).
- Jupiter: gururmm-agent container recreated twice (managed label; then + /var/lib/gururmm mapping). Backup /root/gururmm-agent.inspect.bak.json.
- GuruRMM DB: GURU-KALI ghost agent rows deleted (11 → 1 keeper 9bca5090).
Closed (coord todos, status=done): 7eb7e60a, 5ada8ff4, 04497b97 (all dupes of BUG-016/017/018).
Credentials & Secrets
- IX cPanel server, vault infrastructure/ix-server.sops.yaml: host 172.16.3.10 (ext 72.194.62.5), port 22, root. SSH host key fingerprint SHA256:GZYP/o5XUoRtFRCv1iGjxmqGfQoEsMuiNQBJucoJUh8. WHM 2087 / cPanel 2083.
- Jupiter, vault infrastructure/jupiter-unraid-primary.sops.yaml: 172.16.3.20:22 root.
- GuruRMM API, vault infrastructure/gururmm-server.sops.yaml (admin email/password). GuruRMM Postgres: gururmm @ 172.16.3.30:5432 (vault projects/gururmm/database.sops.yaml).
- Syncro, vault msp-tools/syncro.sops.yaml (Mike key, user_id 1735).
- No new secrets created. jon@ezfastautoglass.com mailbox password NOT retrieved/created (would be needed if Jon opts for POP3 pull).
Infrastructure & Servers
- Jupiter 172.16.3.20: Unraid primary container host. Compose Manager plugin installed. Compose files: gitea /mnt/cache/appdata/gitea/docker-compose.yml, seafile /mnt/user0/SeaFile/DockerCompose/docker-compose.yml. gururmm-agent: net=host, image localhost:3000/azcomputerguru/gururmm-agent:latest.
- GURU-KALI: Kali Linux box, GuruRMM Linux agent via systemd gururmm-agent.service (PID stable, no docker). /etc/gururmm/agent.toml → wss://rmm-api.azcomputerguru.com/ws. Stable device-id ec975630-d297-4df9-bcb5-a445c65b648d, agent_id 9bca5090-....
- IX cPanel server 172.16.3.10 (ext 72.194.62.5), Rocky Linux WHM/cPanel. ezfastautoglass.com: cPanel acct ezfastautoglass, MX self (SPF v=spf1 +a +mx +ip4:72.194.62.5 ~all). Mailbox jon@ forwards to jshailer1@gmail.com (+ local copy, 2,822 msgs). SRS disabled (srs=0). AutoSSL cert covers mail.ezfastautoglass.com (valid to 2026-07-29); POP3S 995 / IMAPS 993 listening.
- GuruRMM server 172.16.3.30:3001; coord API 172.16.3.30:8001.
Commands & Outputs
- gururmm-agent recreate (Jupiter): docker run -d --name=gururmm-agent --network=host --restart=unless-stopped -e GURURMM_CONFIG=/config/config.toml -l net.unraid.docker.managed=dockerman -v /dev/kvm:/dev/kvm:ro -v /proc:/proc:ro -v /sys:/sys:ro -v /var/run/docker.sock:/var/run/docker.sock -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro -v /mnt/user/appdata/gururmm:/config -v /mnt/user/appdata/gururmm/lib:/var/lib/gururmm --entrypoint /usr/local/bin/gururmm-agent localhost:3000/azcomputerguru/gururmm-agent:latest run
- GURU-KALI root cause (its journal): WARN Failed to persist device ID: Read-only file system (os error 30); mount /var/lib/gururmm ext4 rw (bind) while sandbox / ro. Cause: systemd ProtectSystem=strict + ReadWritePaths missing /var/lib/gururmm.
- Ghost purge: DELETE /api/agents/:id returned HTTP 000 (reset) for most; ghosts committed async. Final DB state: 1 GURU-KALI row (keeper), 84 agents total.
- Exim (IX): zgrep -ih ezfastautoglass /var/log/exim_mainlog* | grep -i amazon → all <= from smtp-out.amazonses.com, SpamAssassin "NOT spam", => jon ... Saved + => jshailer1@gmail.com ... 250 OK.
- Syncro ticket create: POST /tickets (customer_id 35547225, problem_type Email) → #32359 id 111854503; initial-issue comment 415836801.
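The Exim review above was done by reading the grep output. A tally like the following could make the "nothing blocked" conclusion quicker to confirm next time. It is a rough sketch: tally_exim is a hypothetical helper, and it assumes the default mainlog layout where the fourth whitespace-separated field is the Exim flag (<= arrival, => delivery, ** failure, == deferral).

```shell
# tally_exim: read Exim mainlog lines on stdin and count them by flag.
# Assumes date, time, message id, then the flag as field 4 (no pid or timezone fields).
tally_exim() {
  awk '
    $4 == "<=" { arrived++ }
    $4 == "=>" { delivered++ }
    $4 == "**" { failed++ }
    $4 == "==" { deferred++ }
    END { printf "arrived=%d delivered=%d failed=%d deferred=%d\n", arrived, delivered, failed, deferred }
  '
}
```

Usage on the IX server would be zgrep -ih ezfastautoglass /var/log/exim_mainlog* | grep -i amazon | tally_exim. With the local copy plus the Gmail forward, two deliveries per arrival and zero failures would match what the logs showed.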
Pending / Incomplete Tasks
- Jupiter templating (held): youtube-sync-test, radio-archive (low); binhex-radarr/sonarr, MariaDB-Official (templates exist, reconcile); seafile stack + gitea (adopt into Compose Manager); npm (recreate from existing template). All backup-first, in a maintenance window. Discourse left alone.
- ezfastautoglass: send Jon the drafted instructions; optionally enable SRS in WHM Exim (server-wide); if Jon wants POP3 pull, provide/reset jon@ mailbox password securely + drop the forward to avoid duplicates. Syncro #32359 open.
- GuruRMM bugs open in roadmap: BUG-016 (agent ReadWritePaths; workaround only on GURU-KALI, installer fix pending, fleet-wide), BUG-017 (device_id churn, defense-in-depth), BUG-018 (DELETE endpoint / FK index).
- C2 (#4): beast→5070 H.264 cross-GPU test still deferred.
- Optional: harden post-bot-alert.sh to JSON-escape non-ASCII.
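A possible shape for the post-bot-alert.sh hardening is to let jq build the payload instead of interpolating the message into a JSON string. This is a sketch, not the helper's current code: build_payload is a hypothetical name, jq is assumed to be installed, and the "content" field is an assumption about what the helper posts to Discord.

```shell
# build_payload MESSAGE: emit a compact JSON object with MESSAGE as "content".
# -n: no input, -c: compact output, -a: escape non-ASCII as \uXXXX sequences.
build_payload() {
  jq -c -n -a --arg content "$1" '{content: $content}'
}
```

The output can be piped to curl with --data-binary @- and a Content-Type: application/json header. Quotes and backslashes in the message are escaped by jq as well, so the ASCII-only resend workaround should no longer be needed.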
Reference Information
- Commits: gururmm c0c0119 (BUG-018), claudetools 0925582 (submodule bump). Submodule was at e3d6a46 (KALI's BUG-016/017).
- Syncro: ticket #32359 = https://computerguru.syncromsp.com/tickets/111854503 ; customer EZ Fast Auto Glass id 35547225 (Jon Shailer, jon@ezfastautoglass.com / jshailer1@gmail.com).
- Coord messages: GURU-KALI handoff 23b095d0, KALI reply d91406ce, purge confirm f2ee93b6.
- Plan doc: wiki/systems/jupiter-docker-templating.md. Bug tracker: projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md (BUG-NNN sections).
- POP3 settings for Jon: server mail.ezfastautoglass.com, port 995, SSL, user jon@ezfastautoglass.com.