claudetools/session-logs/2026-06-01-session.md
Mike Swanson c5546f646c sync: auto-sync from GURU-5070 at 2026-06-01 08:06:52
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-01 08:06:52
2026-06-01 08:07:15 -07:00


Session Log — 2026-06-01

User

  • User: Mike Swanson (mike)
  • Machine: GURU-5070
  • Role: admin

Session Summary

Three distinct work streams ran this session: Jupiter Unraid container templating, a GURU-KALI ghost-agent investigation that turned into a GuruRMM bug-tracking cleanup, and an EZ Fast Auto Glass email-deliverability investigation.

Jupiter Docker → Unraid templating. Inspected all 21 containers on Jupiter (172.16.3.20). 14 were created via raw docker run/docker-compose and lack the net.unraid.docker.* labels that drive the Unraid UI (WebUI button, icon, update-check, Edit form). Because labels cannot be changed on an existing container, the fix requires recreating each one. Executed target #1 (lowest risk): wrote an Unraid template for gururmm-agent and recreated it with net.unraid.docker.managed=dockerman and its original configuration reproduced exactly; verified that it re-authenticated and resumed reporting. Then added a device-id persistence mapping (/var/lib/gururmm → appdata) after noticing the agent's identity was ephemeral. Produced a full plan doc and held the higher-risk recreates (seafile, gitea, npm: compose stacks and the public proxy) for a maintenance window.
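The inventory step (which containers lack Unraid labels) can be reproduced from docker inspect output. This is a minimal sketch, assuming the input is the JSON array printed by `docker inspect $(docker ps -aq)`; the function name unmanaged_containers is hypothetical and not part of any Unraid tooling.

```python
import json

# Unraid-managed containers carry labels under this prefix
# (e.g. net.unraid.docker.managed, as used in this session).
UNRAID_PREFIX = "net.unraid.docker."


def unmanaged_containers(inspect_json: str) -> list[str]:
    """Return names of containers that have no net.unraid.docker.* labels."""
    missing = []
    for container in json.loads(inspect_json):
        # Config.Labels can be null in docker inspect output.
        labels = (container.get("Config") or {}).get("Labels") or {}
        if not any(key.startswith(UNRAID_PREFIX) for key in labels):
            # docker inspect reports names with a leading slash, e.g. "/gitea".
            missing.append(container.get("Name", "").lstrip("/"))
    return missing
```

Checking by prefix rather than by the single managed label also catches containers that carry only a partial set of Unraid labels as "managed", which matches how the audit above grouped them.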

GURU-KALI ghost agents. A ghost check after the Jupiter work surfaced ~11 duplicate GuruRMM agent rows for hostname GURU-KALI. Traced it to a daily device_id churn: the agent could not persist /var/lib/gururmm/.device-id. My initial diagnosis ("host root filesystem is read-only") was wrong — commands dispatched through the RMM agent execute inside the agent's ProtectSystem=strict systemd sandbox, so they showed the agent's namespace view, not the host. Sent the diagnosis to GURU-KALI's own Claude session via the coord API as a self-heal experiment; it correctly identified the real cause (the systemd unit's ReadWritePaths omitted /var/lib/gururmm), applied a drop-in override, and reported back. Purged the ghosts (keeper 9bca5090, stable device-id ec975630), discovering the DELETE /api/agents/:id endpoint is buggy (resets the connection / commits asynchronously). Filed three bugs and reconciled the bug-tracking system: GuruRMM bugs live as BUG-NNN sections in docs/FEATURE_ROADMAP.md (not Gitea issues), so BUG-018 was added there and the duplicate coord todos were closed.
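The keeper-selection rule used for the purge (keep the row carrying the stable device-id, drop the rest) can be sketched as below. The row keys (agent_id, hostname, device_id, last_seen) are hypothetical stand-ins for the GuruRMM agents table, and the fallback to the most recently seen row is an assumption for hosts where no stable id is known.

```python
def ghost_ids(rows, stable_device_ids):
    """Return agent_ids to purge: every row for a hostname except one keeper.

    rows: iterable of dicts with hypothetical keys agent_id, hostname,
    device_id and last_seen (any comparable timestamp, e.g. datetime).
    stable_device_ids: hostname -> device-id actually persisted on that host.
    """
    by_host = {}
    for row in rows:
        by_host.setdefault(row["hostname"], []).append(row)

    doomed = []
    for host, group in by_host.items():
        if len(group) < 2:
            continue  # no duplicates for this hostname
        stable = stable_device_ids.get(host)
        matches = [r for r in group if r["device_id"] == stable]
        # Prefer the row with the stable device-id; otherwise keep the freshest row.
        keeper = max(matches or group, key=lambda r: r["last_seen"])
        doomed.extend(r["agent_id"] for r in group if r is not keeper)
    return doomed
```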

EZ Fast Auto Glass Amazon email. Jon Shailer reported Amazon emails possibly blocked/filtered. ezfastautoglass.com email is hosted on the IX cPanel server (172.16.3.10 / 72.194.62.5). Exim logs over ~3 weeks showed every Amazon SES message accepted, scored NOT spam, delivered to the local mailbox AND forwarded to jshailer1@gmail.com (Gmail returns 250 OK). Nothing is blocked server-side. The likely cause is Gmail filing the forwarded mail into Spam/Promotions, aggravated by SRS being disabled (srs=0): forwarded messages keep the original Amazon SES envelope sender but arrive from the IX server's IP, so they fail SPF at Gmail. Drafted client instructions (Gmail filter + POP3 pull) and opened Syncro tracking ticket #32359.
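Why an unrewritten forward fails SPF can be shown with a deliberately simplified evaluator that handles only ip4: and all mechanisms (a, mx, include and redirect need DNS and are skipped). This is an illustration, not a real SPF checker; the sender record "v=spf1 ip4:198.51.100.0/24 -all" used below is hypothetical, standing in for whatever the original envelope domain publishes.

```python
import ipaddress

# SPF qualifiers (RFC 7208); a mechanism with no qualifier means "+".
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def eval_spf_ip4_only(record: str, client_ip: str) -> str:
    """Evaluate only the ip4: and all mechanisms of an SPF record."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    for term in terms[1:]:
        has_qualifier = term[0] in RESULTS
        qualifier = term[0] if has_qualifier else "+"
        mech = term[1:] if has_qualifier else term
        if mech.lower() == "all":
            return RESULTS[qualifier]
        if mech.lower().startswith("ip4:"):
            net = ipaddress.ip_network(mech[4:], strict=False)
            if ip.version == 4 and ip in net:
                return RESULTS[qualifier]
        # Other mechanisms (a, mx, include, ...) are skipped in this sketch.
    return "neutral"  # RFC 7208: nothing matched and no "all"
```

Without SRS, Gmail evaluates the original sender domain's record against the forwarder's address (72.194.62.5), which that record does not list; with SRS the envelope sender is rewritten to ezfastautoglass.com, whose record does.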

Key Decisions

  • Templated gururmm-agent via CLI docker run with the net.unraid.docker.managed=dockerman label rather than the Unraid web UI. This replicates exactly what Unraid's docker manager does and works over SSH. Verified the label and the original configuration after the recreate.
  • Mapped /var/lib/gururmm to appdata for gururmm-agent so the agent's device-id survives recreates; copied the existing device-id out first so identity stayed stable (no new enrollment).
  • Held seafile/gitea/npm recreates for a maintenance window — compose stacks should be adopted into the Compose Manager plugin (already installed), and npm/gitea are high-blast-radius (public proxy / repos + build pipeline). Discourse (app) left alone — self-managed by its own ./launcher.
  • Used the coord API to hand the GURU-KALI diagnosis to its own Claude session as a self-heal test instead of remediating it remotely. It succeeded and corrected my misdiagnosis.
  • Finished the ghost purge via the Database Agent (direct SQL) rather than the flaky DELETE /api/agents/:id API. The API deletes turned out to commit asynchronously, so the ghosts were already gone by the time the DB Agent checked.
  • Kept GuruRMM bug tracking in FEATURE_ROADMAP.md (no Gitea issues) per Mike, and added BUG-018 there for consistency; closed the interim coord todos as duplicates.
  • For ezfastautoglass, concluded the server is not at fault and pointed remediation at Gmail (filter) + an optional POP3-pull / SRS change, rather than touching the mail server reactively.
  • Did not include the mailbox password in the client email draft — instructed that ACG provides it securely.
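The "copy the device-id out first" step can be sketched as a small idempotent helper. It assumes the old id has already been extracted from the container (for example with docker cp) and that the target is the host side of the new mapping (/mnt/user/appdata/gururmm/lib in the docker run command below); the function name preserve_device_id is hypothetical.

```python
import shutil
from pathlib import Path


def preserve_device_id(old_file: Path, new_dir: Path) -> Path:
    """Copy an existing .device-id into the new persistent directory.

    Never overwrites an id already present in new_dir, so re-running the
    step cannot clobber an identity that is already persisted.
    """
    new_dir.mkdir(parents=True, exist_ok=True)
    target = new_dir / old_file.name
    if target.exists():
        return target  # already persisted; leave it alone
    if not old_file.is_file():
        raise FileNotFoundError(f"no device-id to preserve at {old_file}")
    shutil.copy2(old_file, target)  # copy2 also keeps mtime/permissions
    return target
```

Refusing to overwrite is the important design choice: a fresh, empty container that has already generated a new id must not replace the enrolled one.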

Problems Encountered

  • Misdiagnosed GURU-KALI as "host fs read-only." RMM-agent-dispatched shell commands run inside the agent's ProtectSystem=strict mount namespace (/ is ro there), so findmnt/touch reflected the sandbox, not the host. Resolved by GURU-KALI's own session (the real cause was the unit's ReadWritePaths missing /var/lib/gururmm). Saved as memory reference_rmm_agent_runs_in_systemd_sandbox.md.
  • DELETE /api/agents/:id returned HTTP 000 for all but one delete in each burst, consistently across 3 runs. Determined that the deletes actually commit asynchronously (the ghosts cleared ~5 min later with no further calls); the likely cause is a missing index on the child tables' agent_id FK columns, which makes the ON DELETE CASCADE slow enough that the connection is reset. Filed as BUG-018.
  • post-bot-alert.sh returned Discord 400 (invalid JSON) on a message containing a Unicode em-dash/arrow — the helper does not JSON-escape non-ASCII. Resolved by resending ASCII-only. Helper bug noted.
  • Plink first-connect to IX server failed in batch mode (host key not cached). Resolved by pinning the presented fingerprint with -hostkey.
  • Parent claudetools push rejected (non-fast-forward) during the BUG-018 commit — resolved by the Gitea Agent with git pull --rebase then push (no force).
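The post-bot-alert.sh failure is a JSON-escaping problem. The helper's internals are not shown here, so this is only a sketch of the escaping it needs, in Python: json.dumps handles quotes, backslashes and control characters, and with ensure_ascii=True turns characters such as → and the em-dash into \uXXXX escapes. "content" is Discord's webhook message field; the function name discord_payload is hypothetical.

```python
import json


def discord_payload(message: str) -> str:
    r"""Build a Discord webhook body that is valid, pure-ASCII JSON.

    ensure_ascii=True (the json.dumps default, stated explicitly here)
    emits non-ASCII characters as \uXXXX escapes.
    """
    return json.dumps({"content": message}, ensure_ascii=True)
```

A shell helper could get the same result by piping the message through a JSON-aware tool instead of interpolating it into a hand-built string.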

Configuration Changes

Created:

  • wiki/systems/jupiter-docker-templating.md — Jupiter container inventory + recreate plan + execution log (gururmm-agent done, device-id fix, ghost-check notes).
  • .claude/memory/reference_rmm_agent_runs_in_systemd_sandbox.md (+ MEMORY.md index pointer).
  • clients/ezfastautoglass/jon-amazon-email-instructions.txt — client email draft (still untracked at save time; will be committed by sync).
  • On Jupiter: /boot/config/plugins/dockerMan/templates-user/my-gururmm-agent.xml.
  • On GURU-KALI: /etc/systemd/system/gururmm-agent.service.d/override.conf (ReadWritePaths=/var/lib/gururmm) — applied by GURU-KALI's session.
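The BUG-016 condition (the agent's state directory is not covered by ReadWritePaths under ProtectSystem=strict) can be checked from the unit's effective properties. This sketch assumes input shaped like the line printed by `systemctl show -p ReadWritePaths gururmm-agent`; whether entries appear there with a "-" (ignore-if-missing) or "+" prefix is an assumption, so both are stripped. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def writable_under_sandbox(show_output: str, path: str) -> bool:
    """Return True if `path` is at or below one of the ReadWritePaths entries.

    show_output: e.g. "ReadWritePaths=/var/log/gururmm /var/lib/gururmm"
    (example values are hypothetical).
    """
    _, _, value = show_output.strip().partition("=")
    target = PurePosixPath(path)
    for entry in value.split():
        base = PurePosixPath(entry.lstrip("-+"))
        # Compare path components, so /var/lib/gururmm-old does not match.
        if target == base or base in target.parents:
            return True
    return False
```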

Modified:

  • projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md — added BUG-018 (gururmm commit c0c0119).
  • Jupiter: gururmm-agent container recreated twice (first to add the managed label, then to add the /var/lib/gururmm mapping). Backup: /root/gururmm-agent.inspect.bak.json.
  • GuruRMM DB: GURU-KALI ghost agent rows deleted (11 → 1 keeper 9bca5090).
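Given BUG-018, a purge script should treat a reset as "unknown" rather than "failed" and verify by polling instead of retrying the delete. A minimal sketch, assuming delete() raises ConnectionError (or a subclass such as ConnectionResetError) when the server drops the connection and exists() looks the agent up by id; both callables and the function name are hypothetical, and the 600 s default only loosely reflects the ~5 min observed.

```python
import time
from typing import Callable, Iterable


def purge_and_confirm(
    agent_ids: Iterable[str],
    delete: Callable[[str], None],
    exists: Callable[[str], bool],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Issue deletes, tolerate resets, then poll until the rows are gone.

    Returns the agent_ids still present when the timeout expires.
    """
    pending = list(agent_ids)
    for agent_id in pending:
        try:
            delete(agent_id)
        except ConnectionError:
            # curl reports this as "HTTP 000"; the delete may still commit later.
            pass
    waited = 0.0
    while True:
        pending = [a for a in pending if exists(a)]
        if not pending or waited >= timeout_s:
            return pending
        sleep(poll_s)
        waited += poll_s
```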

Closed (coord todos, status=done): 7eb7e60a, 5ada8ff4, 04497b97 (all dupes of BUG-016/017/018).

Credentials & Secrets

  • IX cPanel server — vault infrastructure/ix-server.sops.yaml: host 172.16.3.10 (ext 72.194.62.5), port 22, root. SSH host key fingerprint SHA256:GZYP/o5XUoRtFRCv1iGjxmqGfQoEsMuiNQBJucoJUh8. WHM 2087 / cPanel 2083.
  • Jupiter — vault infrastructure/jupiter-unraid-primary.sops.yaml: 172.16.3.20:22 root.
  • GuruRMM API — vault infrastructure/gururmm-server.sops.yaml (admin email/password). GuruRMM Postgres: gururmm @ 172.16.3.30:5432 (vault projects/gururmm/database.sops.yaml).
  • Syncro — vault msp-tools/syncro.sops.yaml (Mike key, user_id 1735).
  • No new secrets created. jon@ezfastautoglass.com mailbox password NOT retrieved/created (would be needed if Jon opts for POP3 pull).
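Before pinning a host key with plink -hostkey, the vaulted SHA256 fingerprint can be checked against the server's public key file. OpenSSH-style SHA256 fingerprints are the unpadded base64 of SHA-256 over the decoded key blob; this sketch assumes an OpenSSH-format .pub line (which host key type the IX server presents is not recorded here), and the function name is hypothetical.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(pubkey_line: str) -> str:
    """Return the "SHA256:..." fingerprint for an OpenSSH public key line.

    pubkey_line: "<type> <base64-blob> [comment]", e.g. the contents of a
    /etc/ssh/ssh_host_*_key.pub file.
    """
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the base64 digest without "=" padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The result should match what ssh-keygen -lf prints for the same file, and can be compared directly with the fingerprint stored in the vault entry above.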

Infrastructure & Servers

  • Jupiter 172.16.3.20 — Unraid primary container host. Compose Manager plugin installed. Compose files: gitea /mnt/cache/appdata/gitea/docker-compose.yml, seafile /mnt/user0/SeaFile/DockerCompose/docker-compose.yml. gururmm-agent: net=host, image localhost:3000/azcomputerguru/gururmm-agent:latest.
  • GURU-KALI: Kali Linux box, GuruRMM Linux agent via systemd gururmm-agent.service (PID stable, no docker). /etc/gururmm/agent.toml points to wss://rmm-api.azcomputerguru.com/ws. Stable device-id ec975630-d297-4df9-bcb5-a445c65b648d, agent_id 9bca5090-...
  • IX cPanel server 172.16.3.10 (ext 72.194.62.5), Rocky Linux WHM/cPanel. ezfastautoglass.com: cPanel acct ezfastautoglass, MX self (SPF v=spf1 +a +mx +ip4:72.194.62.5 ~all). Mailbox jon@ forwards to jshailer1@gmail.com (+ local copy, 2,822 msgs). SRS disabled (srs=0). AutoSSL cert covers mail.ezfastautoglass.com (valid to 2026-07-29); POP3S 995 / IMAPS 993 listening.
  • GuruRMM server 172.16.3.30:3001; coord API 172.16.3.30:8001.

Commands & Outputs

  • gururmm-agent recreate (Jupiter): docker run -d --name=gururmm-agent --network=host --restart=unless-stopped -e GURURMM_CONFIG=/config/config.toml -l net.unraid.docker.managed=dockerman -v /dev/kvm:/dev/kvm:ro -v /proc:/proc:ro -v /sys:/sys:ro -v /var/run/docker.sock:/var/run/docker.sock -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro -v /mnt/user/appdata/gururmm:/config -v /mnt/user/appdata/gururmm/lib:/var/lib/gururmm --entrypoint /usr/local/bin/gururmm-agent localhost:3000/azcomputerguru/gururmm-agent:latest run
  • GURU-KALI root cause (its journal): WARN Failed to persist device ID: Read-only file system (os error 30); mount /var/lib/gururmm ext4 rw (bind) while sandbox / ro — systemd ProtectSystem=strict + ReadWritePaths missing /var/lib/gururmm.
  • Ghost purge: DELETE /api/agents/:id returned HTTP 000 (reset) for most; ghosts committed async. Final DB state: 1 GURU-KALI row (keeper), 84 agents total.
  • Exim (IX): zgrep -ih ezfastautoglass /var/log/exim_mainlog* | grep -i amazon → all <= from smtp-out.amazonses.com, SpamAssassin "NOT spam", => jon ... Saved + => jshailer1@gmail.com ... 250 OK.
  • Syncro ticket create: POST /tickets (customer_id 35547225, problem_type Email) → #32359 id 111854503; initial-issue comment 415836801.
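The Exim check above relies on the main-log flags: <= arrival, => delivery, -> additional address in the same delivery, ** failed, == deferred. A minimal sketch that groups log lines by message id and flags anything never delivered; it assumes the flag is the token immediately after the message id, and both function names are hypothetical.

```python
FLAGS = {"<=": "arrival", "=>": "delivery", "->": "extra-delivery",
         "**": "failed", "==": "deferred"}


def summarize(lines):
    """Map message-id -> list of (event, address) from Exim main-log lines."""
    events = {}
    for line in lines:
        tokens = line.split()
        for i in range(1, len(tokens) - 1):
            if tokens[i] in FLAGS:
                # Token before the flag is the message id, token after is the address.
                events.setdefault(tokens[i - 1], []).append((FLAGS[tokens[i]], tokens[i + 1]))
                break
    return events


def not_delivered(events):
    """Message-ids with a ** failure or no => delivery at all."""
    problems = []
    for mid, evs in events.items():
        kinds = {kind for kind, _ in evs}
        if "failed" in kinds or "delivery" not in kinds:
            problems.append(mid)
    return problems
```

For the ezfastautoglass logs, an empty not_delivered() result corresponds to the "nothing is blocked server-side" conclusion.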

Pending / Incomplete Tasks

  • Jupiter templating (held): youtube-sync-test, radio-archive (low); binhex-radarr/sonarr, MariaDB-Official (templates exist, reconcile); seafile stack + gitea (adopt into Compose Manager); npm (recreate from existing template). All backup-first, in a maintenance window. Discourse left alone.
  • ezfastautoglass: send Jon the drafted instructions; optionally enable SRS in WHM Exim (server-wide); if Jon wants POP3 pull, provide/reset jon@ mailbox password securely + drop the forward to avoid duplicates. Syncro #32359 open.
  • GuruRMM bugs open in roadmap: BUG-016 (agent ReadWritePaths — workaround only on GURU-KALI; installer fix pending, fleet-wide), BUG-017 (device_id churn — defense-in-depth), BUG-018 (DELETE endpoint / FK index).
  • C2 (#4): beast→5070 H.264 cross-GPU test still deferred.
  • Optional: harden post-bot-alert.sh to JSON-escape non-ASCII.
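For the BUG-018 follow-up: PostgreSQL indexes the referenced (primary key) side of a foreign key automatically but not the referencing column, so each ON DELETE CASCADE from agents may scan every child table. A minimal sketch of the coverage check over catalog data already pulled out of the database; single-column FKs are assumed, and the child table names in the test are hypothetical, since the real schema is not listed in this log.

```python
def uncovered_fks(fk_columns, indexes):
    """Return (table, column) foreign keys with no index led by that column.

    fk_columns: iterable of (table, column) pairs, e.g. from pg_constraint.
    indexes: dict table -> list of index column tuples, in index order.
    Only an index whose leading column is the FK column helps the cascade
    lookup, so only position 0 is checked.
    """
    missing = []
    for table, column in fk_columns:
        leading = {cols[0] for cols in indexes.get(table, []) if cols}
        if column not in leading:
            missing.append((table, column))
    return missing
```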

Reference Information

  • Commits: gururmm c0c0119 (BUG-018), claudetools 0925582 (submodule bump). Submodule was at e3d6a46 (KALI's BUG-016/017).
  • Syncro: ticket #32359 = https://computerguru.syncromsp.com/tickets/111854503 ; customer EZ Fast Auto Glass id 35547225 (Jon Shailer, jon@ezfastautoglass.com / jshailer1@gmail.com).
  • Coord messages: GURU-KALI handoff 23b095d0, KALI reply d91406ce, purge confirm f2ee93b6.
  • Plan doc: wiki/systems/jupiter-docker-templating.md. Bug tracker: projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md (BUG-NNN sections).
  • POP3 settings for Jon: server mail.ezfastautoglass.com, port 995, SSL, user jon@ezfastautoglass.com.