# Session Log — 2026-06-01

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Three distinct work streams ran this session: Jupiter Unraid container templating, a GURU-KALI ghost-agent investigation that turned into a GuruRMM bug-tracking cleanup, and an EZ Fast Auto Glass email-deliverability investigation.

**Jupiter Docker → Unraid templating.** Inspected all 21 containers on Jupiter (172.16.3.20). Of those, 14 were created via raw `docker run`/`docker-compose` and lack the `net.unraid.docker.*` labels that drive the Unraid UI (WebUI button, icon, update-check, Edit form). Because labels cannot be changed on an existing container, the fix requires recreating each one. Executed target #1 (lowest risk): wrote an Unraid template for `gururmm-agent`, recreated it with `net.unraid.docker.managed=dockerman` and its original configuration, and verified that it re-authenticated and resumed reporting. Then added a device-id persistence mapping (`/var/lib/gururmm` → appdata) after noticing the agent's identity was ephemeral. Produced a full plan doc and held the higher-risk recreates (seafile, gitea, npm: compose stacks and the public proxy) for a maintenance window.

**GURU-KALI ghost agents.** A ghost check after the Jupiter work surfaced ~11 duplicate GuruRMM agent rows for hostname GURU-KALI. Traced them to daily device_id churn: the agent could not persist `/var/lib/gururmm/.device-id`. My initial diagnosis ("host root filesystem is read-only") was wrong: commands dispatched through the RMM agent execute inside the agent's `ProtectSystem=strict` systemd sandbox, so they showed the agent's namespace view, not the host's. Sent the diagnosis to GURU-KALI's own Claude session via the coord API as a self-heal experiment; it correctly identified the real cause (the systemd unit's `ReadWritePaths` omitted `/var/lib/gururmm`), applied a drop-in override, and reported back. Purged the ghosts (keeper `9bca5090`, stable device-id `ec975630`) and discovered along the way that the `DELETE /api/agents/:id` endpoint is buggy (it resets the connection and commits asynchronously). Filed three bugs and reconciled the bug-tracking system: GuruRMM bugs live as `BUG-NNN` sections in `docs/FEATURE_ROADMAP.md` (not Gitea issues), so BUG-018 was added there and the duplicate coord todos were closed.

**EZ Fast Auto Glass Amazon email.** Jon Shailer reported that Amazon emails were possibly being blocked or filtered. Email for ezfastautoglass.com is hosted on the IX cPanel server (172.16.3.10 / 72.194.62.5). Exim logs over ~3 weeks showed every Amazon SES message accepted, scored NOT spam, delivered to the local mailbox, and forwarded to jshailer1@gmail.com (Gmail returned 250 OK). Nothing is blocked server-side. The likely cause is Gmail filing the forwarded mail into Spam/Promotions, aggravated by SRS being disabled (srs=0): the forwarded mail keeps Amazon's envelope sender while arriving from the IX server's IP, so it fails SPF at Gmail. Drafted client instructions (Gmail filter + POP3 pull) and opened Syncro tracking ticket #32359.

## Key Decisions

- **Templated gururmm-agent via CLI `docker run` with the `net.unraid.docker.managed=dockerman` label** rather than the Unraid web UI. This replicates what Unraid's docker manager does and works over SSH. Verified the label and the original config after the recreate.
- **Mapped `/var/lib/gururmm` to appdata** for gururmm-agent so the agent's device-id survives recreates; copied the existing device-id out first so identity stayed stable (no new enrollment).
- **Held seafile/gitea/npm recreates** for a maintenance window — compose stacks should be adopted into the Compose Manager plugin (already installed), and npm/gitea are high-blast-radius (public proxy / repos + build pipeline). Discourse (`app`) left alone — self-managed by its own `./launcher`.
- **Used the coord API to hand the GURU-KALI diagnosis to its own Claude session** as a self-heal test instead of remediating it remotely. It succeeded and corrected my misdiagnosis.
- **Finished the ghost purge via the Database Agent (direct SQL)** rather than the flaky `DELETE /api/agents/:id` API. The API deletes turned out to commit asynchronously, so the ghosts were already gone by the time the DB Agent checked.
- **Kept GuruRMM bug tracking in `FEATURE_ROADMAP.md` (no Gitea issues)** per Mike, and added BUG-018 there for consistency; closed the interim coord todos as duplicates.
- **For ezfastautoglass, concluded the server is not at fault** and pointed remediation at Gmail (filter) + an optional POP3-pull / SRS change, rather than touching the mail server reactively.
- **Did not include the mailbox password in the client email draft** — instructed that ACG provides it securely.
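The templating decision at the top of this list depends on knowing which containers lack the `net.unraid.docker.managed` label. A minimal sketch of that audit follows, assuming it runs on the Docker host with the standard `docker` CLI. The function name `list_unmanaged` is hypothetical and is not part of any tooling used in this session.

```shell
#!/usr/bin/env bash
# Sketch only: print the name of every container that has no
# net.unraid.docker.managed label. list_unmanaged is a hypothetical helper.
list_unmanaged() {
  local name managed
  # One container name per line, including stopped containers.
  docker ps -a --format '{{.Names}}' | while IFS= read -r name; do
    # "index" yields an empty string when the label is absent.
    managed=$(docker inspect --format '{{ index .Config.Labels "net.unraid.docker.managed" }}' "$name")
    if [ -z "$managed" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Calling `list_unmanaged` over SSH would print one recreate candidate per line. It checks only the `managed` label, so containers missing other `net.unraid.docker.*` labels would need a separate check.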
## Problems Encountered

- **Misdiagnosed GURU-KALI as "host fs read-only."** RMM-agent-dispatched shell commands run inside the agent's `ProtectSystem=strict` mount namespace (`/` is ro there), so `findmnt`/`touch` reflected the sandbox, not the host. Resolved by GURU-KALI's own session (the real cause was the unit's `ReadWritePaths` missing `/var/lib/gururmm`). Saved as memory `reference_rmm_agent_runs_in_systemd_sandbox.md`.
- **`DELETE /api/agents/:id` returned HTTP 000 (connection reset, no HTTP response) for all but one delete per burst.** Consistent across 3 runs. The deletes actually commit asynchronously (ghosts cleared ~5 min later with no further calls). The likely cause is a missing index on the child-table `agent_id` FK columns, which would make the `ON DELETE CASCADE` slow enough for the connection to reset. Filed as BUG-018.
- **`post-bot-alert.sh` returned Discord 400 (invalid JSON)** on a message containing a Unicode em-dash/arrow — the helper does not JSON-escape non-ASCII. Resolved by resending ASCII-only. Helper bug noted.
- **Plink first-connect to IX server failed in batch mode** (host key not cached). Resolved by pinning the presented fingerprint with `-hostkey`.
- **Parent claudetools push rejected (non-fast-forward)** during the BUG-018 commit — resolved by the Gitea Agent with `git pull --rebase` then push (no force).
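Given the `DELETE /api/agents/:id` behaviour described above, a purge script probably should not treat HTTP 000 as a failure. It can issue the delete and then poll until the row is gone. The sketch below rests on several assumptions that this log does not confirm: bearer-token auth through a `GURURMM_TOKEN` variable, a `GET /api/agents/:id` route that answers 404 once the agent is gone, and the helper names `http_status` and `delete_and_confirm`.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not confirmed by this log): bearer-token auth via
# GURURMM_TOKEN, and GET /api/agents/:id answering 404 once the row is gone.
API_BASE="${API_BASE:-http://172.16.3.30:3001}"

# Print the HTTP status of a request. curl reports 000 when no HTTP response
# arrived (for example, after a connection reset).
http_status() {
  local method=$1 url=$2
  curl -s -o /dev/null -w '%{http_code}' -X "$method" \
    -H "Authorization: Bearer ${GURURMM_TOKEN:-}" "$url" || true
}

# Delete one agent, then poll until it is gone instead of trusting the
# DELETE status alone. Returns 0 once the agent is confirmed gone.
delete_and_confirm() {
  local id=$1 tries=${2:-10} delay=${3:-30} code i
  code=$(http_status DELETE "$API_BASE/api/agents/$id")
  echo "DELETE $id -> $code" >&2
  for ((i = 0; i < tries; i++)); do
    code=$(http_status GET "$API_BASE/api/agents/$id")
    if [ "$code" = "404" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

The defaults (10 polls, 30 seconds apart) cover roughly the five minutes it took the ghosts to clear in this session. Deleting one agent at a time, rather than in bursts, may also avoid most of the resets.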
## Configuration Changes

Created:
- `wiki/systems/jupiter-docker-templating.md` — Jupiter container inventory + recreate plan + execution log (gururmm-agent done, device-id fix, ghost-check notes).
- `.claude/memory/reference_rmm_agent_runs_in_systemd_sandbox.md` (+ MEMORY.md index pointer).
- `clients/ezfastautoglass/jon-amazon-email-instructions.txt` — client email draft (still untracked at save time; will be committed by sync).
- On Jupiter: `/boot/config/plugins/dockerMan/templates-user/my-gururmm-agent.xml`.
- On GURU-KALI: `/etc/systemd/system/gururmm-agent.service.d/override.conf` (`ReadWritePaths=/var/lib/gururmm`) — applied by GURU-KALI's session.

Modified:
- `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` — added BUG-018 (gururmm commit `c0c0119`).
- Jupiter: `gururmm-agent` container recreated twice (managed label; then + `/var/lib/gururmm` mapping). Backup `/root/gururmm-agent.inspect.bak.json`.
- GuruRMM DB: GURU-KALI ghost agent rows deleted (11 → 1 keeper `9bca5090`).

Closed (coord todos, status=done): `7eb7e60a`, `5ada8ff4`, `04497b97` (all dupes of BUG-016/017/018).
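The GURU-KALI drop-in listed under Created can be reproduced on other hosts until the installer fix lands. The sketch below shows what such an override plausibly contains. The log records only the `ReadWritePaths=/var/lib/gururmm` setting, so the `[Service]` header is inferred from systemd's drop-in format, and `write_dropin` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: write a drop-in that adds /var/lib/gururmm to the unit's
# writable paths. write_dropin is a hypothetical helper; its first argument
# overrides the target directory so the output can be inspected in a dry run.
write_dropin() {
  local dir=${1:-/etc/systemd/system/gururmm-agent.service.d}
  mkdir -p "$dir"
  cat > "$dir/override.conf" <<'EOF'
[Service]
ReadWritePaths=/var/lib/gururmm
EOF
}

# On a real host (as root), the change would then be applied with:
#   write_dropin
#   systemctl daemon-reload
#   systemctl restart gururmm-agent.service
#   systemctl show gururmm-agent.service -p ReadWritePaths
```

`ReadWritePaths=` is a list setting, so the drop-in adds to whatever the packaged unit already allows. The directory has to exist when the unit starts, unless the path is prefixed with `-`.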
## Credentials & Secrets

- **IX cPanel server** — vault `infrastructure/ix-server.sops.yaml`: host 172.16.3.10 (ext 72.194.62.5), port 22, root. SSH host key fingerprint `SHA256:GZYP/o5XUoRtFRCv1iGjxmqGfQoEsMuiNQBJucoJUh8`. WHM 2087 / cPanel 2083.
- **Jupiter** — vault `infrastructure/jupiter-unraid-primary.sops.yaml`: 172.16.3.20:22 root.
- **GuruRMM API** — vault `infrastructure/gururmm-server.sops.yaml` (admin email/password). GuruRMM Postgres: `gururmm` @ 172.16.3.30:5432 (vault `projects/gururmm/database.sops.yaml`).
- **Syncro** — vault `msp-tools/syncro.sops.yaml` (Mike key, user_id 1735).
- No new secrets created. `jon@ezfastautoglass.com` mailbox password NOT retrieved/created (would be needed if Jon opts for POP3 pull).

## Infrastructure & Servers

- **Jupiter** 172.16.3.20 — Unraid primary container host. Compose Manager plugin installed. Compose files: gitea `/mnt/cache/appdata/gitea/docker-compose.yml`, seafile `/mnt/user0/SeaFile/DockerCompose/docker-compose.yml`. gururmm-agent: net=host, image `localhost:3000/azcomputerguru/gururmm-agent:latest`.
- **GURU-KALI** — Kali Linux box, GuruRMM Linux agent via systemd `gururmm-agent.service` (PID stable, no docker). `/etc/gururmm/agent.toml` → `wss://rmm-api.azcomputerguru.com/ws`. Stable device-id `ec975630-d297-4df9-bcb5-a445c65b648d`, agent_id `9bca5090-...`.
- **IX cPanel server** 172.16.3.10 (ext 72.194.62.5), Rocky Linux WHM/cPanel. ezfastautoglass.com: cPanel acct `ezfastautoglass`, MX self (SPF `v=spf1 +a +mx +ip4:72.194.62.5 ~all`). Mailbox `jon@` forwards to `jshailer1@gmail.com` (+ local copy, 2,822 msgs). SRS disabled (srs=0). AutoSSL cert covers `mail.ezfastautoglass.com` (valid to 2026-07-29); POP3S 995 / IMAPS 993 listening.
- **GuruRMM** server 172.16.3.30:3001; coord API 172.16.3.30:8001.

## Commands & Outputs

- gururmm-agent recreate (Jupiter): `docker run -d --name=gururmm-agent --network=host --restart=unless-stopped -e GURURMM_CONFIG=/config/config.toml -l net.unraid.docker.managed=dockerman -v /dev/kvm:/dev/kvm:ro -v /proc:/proc:ro -v /sys:/sys:ro -v /var/run/docker.sock:/var/run/docker.sock -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro -v /mnt/user/appdata/gururmm:/config -v /mnt/user/appdata/gururmm/lib:/var/lib/gururmm --entrypoint /usr/local/bin/gururmm-agent localhost:3000/azcomputerguru/gururmm-agent:latest run`
- GURU-KALI root cause (its journal): `WARN Failed to persist device ID: Read-only file system (os error 30)`; mount `/var/lib/gururmm ext4 rw` (bind) while sandbox `/` ro — systemd `ProtectSystem=strict` + `ReadWritePaths` missing `/var/lib/gururmm`.
- Ghost purge: `DELETE /api/agents/:id` returned HTTP 000 (reset) for most calls; the deletes committed asynchronously. Final DB state: 1 GURU-KALI row (keeper), 84 agents total.
- Exim (IX): `zgrep -ih ezfastautoglass /var/log/exim_mainlog* | grep -i amazon` → all `<=` from `smtp-out.amazonses.com`, SpamAssassin "NOT spam", `=> jon ... Saved` + `=> jshailer1@gmail.com ... 250 OK`.
- Syncro ticket create: POST /tickets (customer_id 35547225, problem_type Email) → #32359 id 111854503; initial-issue comment 415836801.
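The Exim check above was read by eye. For a longer log window, a tally by log flag gives the same answer faster. The sketch below assumes the usual Exim mainlog layout, where the flag (`<=` arrival, `=>` or `->` delivery, `**` failure, `==` deferral) appears as the fourth or fifth whitespace-separated field. `tally_exim_flags` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: tally Exim mainlog lines read from stdin by their flag.
# Assumes the flag is field 4, or field 5 when a [pid] field is logged.
tally_exim_flags() {
  awk '
    {
      for (i = 3; i <= 5 && i <= NF; i++) {
        if ($i == "<=" || $i == "=>" || $i == "->" || $i == "**" || $i == "==") {
          count[$i]++
          break
        }
      }
    }
    END {
      printf "received=%d delivered=%d failed=%d deferred=%d\n",
        count["<="], count["=>"] + count["->"], count["**"], count["=="]
    }
  '
}

# Usage on the IX server, reusing the pipeline from this session:
#   zgrep -ih ezfastautoglass /var/log/exim_mainlog* | grep -i amazon | tally_exim_flags
```

With both a local save and a Gmail forward per message, about two deliveries per arrival would be expected. Non-zero `failed` or `deferred` counts would point back at the server.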
## Pending / Incomplete Tasks

- **Jupiter templating (held):** youtube-sync-test, radio-archive (low); binhex-radarr/sonarr, MariaDB-Official (templates exist, reconcile); seafile stack + gitea (adopt into Compose Manager); npm (recreate from existing template). All backup-first, in a maintenance window. Discourse left alone.
- **ezfastautoglass:** send Jon the drafted instructions; optionally enable SRS in WHM Exim (server-wide); if Jon wants POP3 pull, provide/reset `jon@` mailbox password securely + drop the forward to avoid duplicates. Syncro #32359 open.
- **GuruRMM bugs open in roadmap:** BUG-016 (agent ReadWritePaths — workaround only on GURU-KALI; installer fix pending, fleet-wide), BUG-017 (device_id churn — defense-in-depth), BUG-018 (DELETE endpoint / FK index).
- **C2 (#4):** beast→5070 H.264 cross-GPU test still deferred.
- Optional: harden `post-bot-alert.sh` to JSON-escape non-ASCII.
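One possible way to harden `post-bot-alert.sh` is to let `jq` build the JSON body instead of interpolating the message into a string. This is a sketch, not the helper's current code. It assumes `jq` is installed on the machine that runs the helper, and both `build_payload` and `DISCORD_WEBHOOK_URL` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: build a Discord webhook payload with jq so that quotes,
# newlines, and non-ASCII characters are escaped by the tool.
# build_payload is a hypothetical helper, not the current post-bot-alert.sh code.
build_payload() {
  # --arg passes the message in as a JSON string, -a (--ascii-output) emits
  # non-ASCII characters as \uXXXX escapes, -c prints compact one-line JSON.
  jq -n -c -a --arg content "$1" '{content: $content}'
}

# Posting would then look like this (DISCORD_WEBHOOK_URL is an assumed variable):
#   build_payload "$message" | curl -sS -H 'Content-Type: application/json' \
#     --data-binary @- "$DISCORD_WEBHOOK_URL"
```

Because the output is pure ASCII, a message with an arrow or a long dash no longer depends on how the shell or `curl` handles encoding.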
## Reference Information

- Commits: gururmm `c0c0119` (BUG-018), claudetools `0925582` (submodule bump). Submodule was at `e3d6a46` (KALI's BUG-016/017).
- Syncro: ticket #32359 = https://computerguru.syncromsp.com/tickets/111854503 ; customer EZ Fast Auto Glass id 35547225 (Jon Shailer, jon@ezfastautoglass.com / jshailer1@gmail.com).
- Coord messages: GURU-KALI handoff `23b095d0`, KALI reply `d91406ce`, purge confirm `f2ee93b6`.
- Plan doc: `wiki/systems/jupiter-docker-templating.md`. Bug tracker: `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` (BUG-NNN sections).
- POP3 settings for Jon: server `mail.ezfastautoglass.com`, port 995, SSL, user `jon@ezfastautoglass.com`.