sync: auto-sync from GURU-5070 at 2026-06-01 06:57:20

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-01 06:57:20
2026-06-01 06:57:27 -07:00
parent ba7aeebf9e
commit 501f3eb130
4 changed files with 212 additions and 0 deletions


@@ -1,6 +1,7 @@
# Memory Index
## Reference
- [RMM agent runs in systemd sandbox](reference_rmm_agent_runs_in_systemd_sandbox.md) — Commands dispatched via the GuruRMM agent run inside its ProtectSystem=strict namespace (/ is ro there); fs/mount probes show the agent's view NOT the host. SSH or read /proc/<pid>/mountinfo for host truth. (lesson 2026-06-01, GURU-KALI ghost churn)
- [GURU-5070 Rust toolchain](reference_guru5070_rust_toolchain.md) — GURU-5070 now has cargo + MSVC + protoc; build/clippy/test guru-connect LOCALLY (set PROTOC to the winget path) instead of the build host. CI only clippy-checks the Linux server, not the Windows agent.
- [ACG Office Network Infrastructure](infra_office_network.md) — IPs/hosts/roles for pfSense/Jupiter/VMs/Docker. Check before assuming; .21 (Uranus) is storage.
- [Power Failure Runbook](../POWER_FAILURE_RUNBOOK.md) — Recovery order after a power event: Tailscale routes, libvirt/VMs, Seafile, NPM/DNS.
@@ -21,6 +22,7 @@
- [GuruRMM user_session command context](reference_gururmm_user_session_context.md) — command API `context=user_session` runs as the logged-on user (WTS); does interactive-only cmds that fail as SYSTEM. Needs an active (admin) user.
- [Pluto Build Server](reference_pluto_build_server.md) — Windows build VM: hostname PLUTO = Unraid VM "Claude-Builder" = 172.16.3.36 (all the same box). MSVC + WiX. No `pluto` vault entry. Drive via /rmm (agent enrolls as PLUTO) when SSH key isn't authorized.
- [Coord /messages API shape](reference_coord_messages_api_shape.md) — GET /api/coord/messages returns {total,skip,limit,messages[]} NOT a bare array; parse .messages[], strip control chars, read flag may be null.
- [GuruRMM pipeline vendored](reference_gururmm_pipeline_vendored.md) — RMM build scripts version-controlled at gururmm `deploy/build-pipeline/` (2026-06-01); build-shared.sh auto-syncs them to /opt/gururmm each build. Edit-in-repo + push = live, EXCEPT build-shared.sh + webhook-handler.py (manual cp).
- [Gitea API credential](reference_gitea_api_credential.md) — Gitea API (PRs/merges) as howard uses services/gitea-howard.sops.yaml password on internal http://172.16.3.20:3000; NOT the gururmm-server SSH password.
## Users


@@ -0,0 +1,29 @@
---
name: reference_gururmm_pipeline_vendored
description: GuruRMM build-pipeline scripts are now version-controlled at deploy/build-pipeline/ in the gururmm repo (2026-06-01); build-shared.sh auto-syncs them to /opt/gururmm each build, so edit-in-repo + push = live — EXCEPT build-shared.sh + webhook-handler.py, which need a manual cp.
metadata:
type: reference
---
The GuruRMM build/CI pipeline runs at **`/opt/gururmm/`** on the gururmm server (172.16.3.30,
root-owned, hand-maintained). Those scripts had silently diverged from the repo's older `scripts/`
generation (that drift caused the BUG-015 Windows build-gate gap). Reconciled 2026-06-01:
- **Source of truth:** the live scripts are vendored into the gururmm repo at
**`deploy/build-pipeline/`** (build-{windows,linux,mac,agents,server,shared}.sh, sign-windows.sh,
webhook-handler.py + README). Commit `2bf539e`.
- **Drift-stop (commit `24b5daf`):** `build-shared.sh` (runs first every build, after
`git reset --hard origin/main`) now `install -m 0755`-syncs the 6 build scripts from
`deploy/build-pipeline/` to `/opt/gururmm/` each build. So to change a GuruRMM build script:
**edit it in `deploy/build-pipeline/`, push to gururmm main — the next build runs it.** No manual
copy, no restart.
- **Two exceptions — need a manual `sudo cp` on change** (they can't self-overwrite mid-run):
`build-shared.sh` (the running puller) and `webhook-handler.py` (the persistent HTTP server;
also needs `sudo systemctl restart gururmm-webhook` to reload). They change rarely. See
`deploy/build-pipeline/README.md`.
The webhook still invokes the `/opt/gururmm` copies (not the repo copies directly); the sync keeps
them current. The repo's older `scripts/webhook-handler.py` + `scripts/build-agents.sh` are a prior
generation, now superseded. The change gate in `build-windows.sh` watches `agent/ installer/` (the
BUG-015 fix: installer-only `.wxs`/`.ico` changes now rebuild the MSI). This supersedes the "repo copy
is stale, don't redeploy" caveat in [[project_rmm_webhook_docs_guard]] for the build scripts (not for
webhook-handler.py).
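The drift-stop sync described above can be sketched as a small shell function. This is a minimal sketch, not the actual contents of `build-shared.sh`: the function name `sync_build_scripts`, its arguments, and the skip-list mechanism are assumptions for illustration. Only `install -m 0755` and the two directories come from the note.

```bash
# Hypothetical sketch of the per-build sync; NOT the real build-shared.sh.
# Copies every *.sh from the repo checkout ($1) into the live directory ($2)
# with mode 0755, skipping the names listed in $3 (space-separated).
# webhook-handler.py is not matched by *.sh, so it is never copied here.
sync_build_scripts() (
    src=$1
    dst=$2
    skip=${3:-build-shared.sh}
    for f in "$src"/*.sh; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        case " $skip " in
            *" $name "*) continue ;;     # leave the running puller alone
        esac
        install -m 0755 "$f" "$dst/$name" || exit 1
    done
)
```

Under these assumptions, `sync_build_scripts deploy/build-pipeline /opt/gururmm` would refresh the synced scripts, while the two exceptions still need the manual `sudo cp` (plus `sudo systemctl restart gururmm-webhook` for the webhook handler).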


@@ -0,0 +1,36 @@
---
name: reference_rmm_agent_runs_in_systemd_sandbox
description: Commands dispatched via the GuruRMM agent execute INSIDE the agent's systemd sandbox (ProtectSystem=strict) — fs/mount observations reflect the agent's private namespace, NOT the host. For host truth, SSH directly or read /proc/<host-pid>/mountinfo.
metadata:
type: reference
---
The GuruRMM Linux agent runs as a systemd service (`gururmm-agent.service`) hardened with
**`ProtectSystem=strict`**, which gives the agent process a **private mount namespace where `/`
is mounted read-only**, with only `ReadWritePaths=` entries writable. **Any command you dispatch
through the RMM agent (`/rmm shell`, probes) runs inside that namespace** — so `findmnt /`,
`touch`, `/proc/mounts` etc. report the **agent's sandboxed view, not the host's actual state**.
**Trap (hit 2026-06-01, GURU-KALI):** I diagnosed "host root filesystem is read-only" because
RMM-dispatched `touch /var/lib/gururmm` returned EROFS (os error 30) and `findmnt /` showed `ro`.
The host root was **rw the entire time** (SMART PASSED, ext4 clean, no kernel remount-ro — all
consistent with the host being fine). The real cause: the unit's
`ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` **omitted `/var/lib/gururmm`**, so the agent
couldn't persist `/var/lib/gururmm/.device-id` → it re-minted a device_id on each daily
identity refresh → the server (no machine_uid dedup) filed a new agent row each time (~11 ghosts).
**How to get host truth instead of the sandbox view:**
- SSH to the host directly (commands there run in the host namespace), OR
- Read the agent PID's namespace explicitly: `cat /proc/<agent_pid>/mountinfo` — the process-scoped
`ro` on `/` is the tell that it's sandbox, not host. Compare against the host's `findmnt`.
- `errors=remount-ro` in a mount line is just the stock default mount option — NOT evidence an
error fired. Confirm an actual remount-ro with kernel `EXT4-fs error` logs + `dumpe2fs -h` error
count, not the mount option alone.
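To make the sandbox-versus-host comparison concrete, here is a minimal sketch that pulls the per-mount options for `/` out of a mountinfo file and checks for the `ro` flag. The helper names `root_mount_opts` and `is_read_only` are hypothetical. The field positions (mount point in field 5, per-mount options in field 6) follow the documented `/proc/<pid>/mountinfo` layout.

```bash
# Print the per-mount options (field 6) of the last mountinfo entry whose
# mount point (field 5) is "/". $1 is a mountinfo-format file.
root_mount_opts() {
    awk '$5 == "/" { opts = $6 } END { if (opts != "") print opts }' "$1"
}

# Exit 0 when the comma-separated option list in $1 contains the "ro" flag.
# "errors=remount-ro" does not match: it is a mount option, not the ro flag.
is_read_only() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `root_mount_opts /proc/<agent_pid>/mountinfo` and the same call against a host process's mountinfo should show `ro` only for the agent if the read-only state is a sandbox artifact.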
**The fix pattern** (durable, additive): drop-in
`/etc/systemd/system/gururmm-agent.service.d/override.conf` with `[Service]\nReadWritePaths=/var/lib/gururmm`
(systemd merges ReadWritePaths additively across drop-ins), then `daemon-reload` + `restart`.
Better upstream fix: `StateDirectory=gururmm` (handles dir creation + perms + RW bind in one
directive). **Fleet implication:** every systemd-installed GuruRMM Linux agent with this unit shape
has the same latent bug until the installer is fixed. See filed todos (agent ReadWritePaths/
StateDirectory + server machine_uid dedup).
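The drop-in fix can be sketched as a short function that writes the override file. The function name `write_rw_dropin` and its optional directory argument are assumptions for illustration. The file path and contents are the ones given in the note.

```bash
# Write a drop-in that adds /var/lib/gururmm to the unit's writable paths.
# $1 is the drop-in directory; the default is the path named in the note.
write_rw_dropin() (
    dir=${1:-/etc/systemd/system/gururmm-agent.service.d}
    mkdir -p "$dir" || exit 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
ReadWritePaths=/var/lib/gururmm
EOF
)
```

After writing the file on a real host, `systemctl daemon-reload` followed by `systemctl restart gururmm-agent` applies it, as described above.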


@@ -0,0 +1,145 @@
# Jupiter — Docker → Unraid Template/Compose Adoption Plan
**System:** Jupiter (Unraid primary container host) — `172.16.3.20`, SSH `root@:22`
(creds: `infrastructure/jupiter-unraid-primary.sops.yaml`).
**Goal:** make every container show Unraid's UI features (WebUI button, icon, update-check,
rich Edit form) by giving it the `net.unraid.docker.*` labels it currently lacks.
**Status:** INSPECTION + PLANNING complete (2026-06-01). **Target #1 (gururmm-agent) DONE
2026-06-01** — workflow validated. Remaining recreates HELD for a maintenance window.
## Execution log
- **2026-06-01 — gururmm-agent [DONE].** Wrote `templates-user/my-gururmm-agent.xml`; backed up
full inspect to `/root/gururmm-agent.inspect.bak.json`; stopped+rm'd, recreated via `docker run`
with `-l net.unraid.docker.managed=dockerman` + the captured spec. Verified: label=dockerman,
config faithful, agent re-authenticated and resumed metrics/inventory/check polling. One clean
exit-0 restart at startup = the agent's self-update finalize (cleaned rollback artifacts), then
stable. Now shows as a managed container in the Unraid Docker tab with the rich Edit form.
- **device-id persistence [FIXED 2026-06-01]:** the agent's device-id lived at
`/var/lib/gururmm/.device-id` *inside* the container (ephemeral). Fix: `docker cp`'d the live
`/var/lib/gururmm/.` out to `/mnt/user/appdata/gururmm/lib/` (preserving device-id
`88abeef0-cb3a-4c3f-9353-61fedcdf587d`), added a `-v /mnt/user/appdata/gururmm/lib:/var/lib/gururmm`
mapping + matching template Config, and recreated. Verified the agent reused the same device-id
(no "Persisting new device ID") and the **same agent_id `443bfabb`** — identity now durable
across recreates/updates. Enrollment identity also persists via `config.toml` in `/config`.
- **Ghost check [CLEAN]:** GuruRMM `/api/agents` shows exactly one Jupiter row (`443bfabb`,
last-seen current). The three recreates created no duplicates. **Incidental:** `GURU-KALI` has
~11 duplicate agent rows (v0.6.46/0.6.50, stale) — same ephemeral-identity pattern on a
frequently-reinstalled box; cleanup candidate, out of scope for this task.
---
## Why the features are missing
Unraid's per-container UI (WebUI/icon/update-check/Edit) is driven by **container labels**
(`net.unraid.docker.managed`, `.webui`, `.icon`) + a template XML in
`/boot/config/plugins/dockerMan/templates-user/`. Those labels are **immutable on a running
container** — they're baked in at `docker create` time. Containers started by raw
`docker run` / `docker-compose` CLI (instead of Unraid's "Add Container" form or the Compose
Manager plugin) never get them. **The only fix is to RECREATE each container** through the
proper mechanism. Data in mapped volumes is untouched by a recreate; the risk is downtime +
getting the recreate config exactly right.
Two correct mechanisms (both yield the full UI feature set):
- **dockerman template** — for single CLI containers. Unraid "Add Container" → template.
- **composeman (Compose Manager plugin)** — for multi-container compose stacks. Adopt the
existing `docker-compose.yml` into the plugin so the whole stack gets the labels while
keeping its compose orchestration + private network. Plugin IS installed (only "RustDesk"
registered today).
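One way to see which containers lack the managed label is to print each container's name next to its `net.unraid.docker.managed` value and keep the rows where the value is empty. The sketch below is an illustration, not how the inventory in this plan was actually gathered. The helper name `list_raw_containers` and the `|` delimiter are assumptions.

```bash
# Read "name|managed-label" lines on stdin and print the names whose
# net.unraid.docker.managed label is empty (the "raw" containers).
list_raw_containers() {
    awk -F'|' '$1 != "" && $2 == "" { print $1 }'
}

# Intended use on the Docker host (not run here):
#   docker ps -a --format '{{.Names}}|{{.Label "net.unraid.docker.managed"}}' \
#     | list_raw_containers
```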
---
## Inventory (21 containers; 14 raw / `managed=NONE`)
### Already templated (managed=dockerman) — no action
DockerUISP, Seerr, qbittorrent, binhex-emby, binhex-sabnzbd, binhex-plexpass, rsync-server
### Raw (managed=NONE) — the targets, grouped by disposition
| Container | Image | Created via | Existing template | Disposition | Risk |
|---|---|---|---|---|---|
| **gururmm-agent** | localhost:3000/azcomputerguru/gururmm-agent:latest | CLI, net=host | none | NEW dockerman template | LOW |
| youtube-sync-test | azcomputerguru/youtube-sync:latest | CLI | none | NEW template (or retire — "test") | LOW |
| binhex-radarr | binhex/arch-radarr | CLI | my-binhex-radarr.xml | reconcile + recreate from template | MED |
| binhex-sonarr | binhex/arch-sonarr | CLI | my-binhex-sonarr.xml | reconcile + recreate from template | MED |
| MariaDB-Official | mariadb:latest | CLI | my-MariaDB-Official.xml | reconcile + recreate (snapshot appdata first) | MED (DB) |
| **seafile** + seafile-mysql + seafile-memcached + seafile-elasticsearch | seafileltd/seafile-pro-mc:12.0 / mariadb:10.6 / memcached:1.6.18 / elasticsearch:7.17.26 | compose `dockercompose` @ /mnt/user0/SeaFile/DockerCompose/docker-compose.yml | partial (my-SeaFile*.xml, my-memcached.xml) | **adopt stack into Compose Manager** | MED-HIGH |
| **gitea** + **gitea-db** | gitea/gitea:latest / mysql:8 | compose `gitea` @ /mnt/cache/appdata/gitea/docker-compose.yml | none | **adopt stack into Compose Manager** | **HIGH** (repos + GuruRMM build pipeline) |
| **npm** | jc21/nginx-proxy-manager:latest | CLI | my-NginxProxyManager.xml | reconcile + recreate from template | **HIGH** (public reverse proxy) |
| app (Discourse) | local_discourse/app | Discourse `./launcher` (no compose file) | none | **LEAVE AS-IS** — self-managed; templating breaks `./launcher rebuild` | n/a |
| radio-archive | radio-archive:latest | compose `app` | none | tied to Discourse project — leave with app | LOW |
Note: several CLI containers (npm, radarr, sonarr, MariaDB-Official) already have a matching
template XML, but the running container isn't linked to it (it was later recreated via the CLI,
which dropped the managed label). For these, recreating from the existing template is the easy
path, but **verify the template's ports/paths/env still match the live container** before applying.
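The template-versus-live check can be reduced to comparing two lists of mappings, one line per mapping. The sketch below assumes both lists have already been written to files, for example `host:container` path pairs. The function name `compare_mappings` and the one-mapping-per-line format are hypothetical, and extracting the template side from the XML is left out.

```bash
# Compare two files of one-mapping-per-line entries (e.g. "host:container").
# Prints "live-only: X" / "template-only: X" for each mismatch.
# Exit status: 0 if identical, 1 if they differ, 2 on setup failure.
compare_mappings() (
    live=$(mktemp) && tmpl=$(mktemp) || exit 2
    sort -u "$1" | sed '/^$/d' > "$live"     # drop blank lines
    sort -u "$2" | sed '/^$/d' > "$tmpl"
    out=$( { comm -23 "$live" "$tmpl" | sed 's/^/live-only: /'
             comm -13 "$live" "$tmpl" | sed 's/^/template-only: /'; } )
    rm -f "$live" "$tmpl"
    [ -z "$out" ] && exit 0
    printf '%s\n' "$out"
    exit 1
)

# The live side could come from something like (not run here):
#   docker inspect --format \
#     '{{range .Mounts}}{{.Source}}:{{.Destination}}{{println}}{{end}}' npm
```

The same comparison works for port or env lists, as long as both files use the same line format.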
---
## Recreate sequencing (least → most critical)
Do them one at a time, and verify each comes back healthy before starting the next.
1. **gururmm-agent** — LOW. Local image, net=host, no public dependents. Proves the workflow.
Spec captured below.
2. youtube-sync-test, radio-archive — LOW. (Confirm youtube-sync-test isn't disposable first.)
3. binhex-radarr, binhex-sonarr — MED. Media, non-critical, templates already exist.
4. MariaDB-Official — MED. **Snapshot `/mnt/.../appdata` (or mysqldump) first.**
5. seafile stack — MED-HIGH. Adopt into Compose Manager. **Backup first.** `down` → register → `up`.
6. **gitea + gitea-db** — HIGH, dedicated window. **Backup gitea appdata + `mysqldump` gitea-db
first.** Pausing Gitea stops repo access AND the GuruRMM webhook build pipeline. Adopt the
existing compose into Compose Manager.
7. **npm** — HIGH, schedule with comms. Recreating drops the public reverse proxy → all proxied
public services (connect., rmm., git., community., seafile.) briefly down. **Backup `/data` +
`/etc/letsencrypt` first.** Recreate from my-NginxProxyManager.xml (verify port maps:
80→1880, 81→7818, 443→18443).
8. Discourse (app) — LEAVE.
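The "verify before the next" gate could start with a simple poll of the container state. This is a minimal sketch under the assumption that "running" is an acceptable first check. The function name `wait_until_running` and its defaults are hypothetical, and a running state alone does not prove the service inside is healthy.

```bash
# Poll `docker inspect` until the container reports status "running",
# or give up after $2 attempts (default 30), sleeping $3 seconds between tries.
wait_until_running() (
    name=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        status=$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null)
        [ "$status" = "running" ] && exit 0
        i=$((i + 1))
        sleep "$delay"
    done
    exit 1
)
```

A service-level check (for the agent, confirming it re-authenticated and resumed polling) would still follow this gate before moving to the next container.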
---
## Captured recreate spec — gururmm-agent (target #1)
```
Image: localhost:3000/azcomputerguru/gururmm-agent:latest
Network: host
Restart: unless-stopped
Privileged: false CapAdd: none Devices: none (kvm passed as a bind mount)
Entrypoint: /usr/local/bin/gururmm-agent
Cmd: run
Env: GURURMM_CONFIG=/config/config.toml
Volumes:
/dev/kvm -> /dev/kvm (ro)
/proc -> /proc (ro)
/sys -> /sys (ro)
/var/run/docker.sock -> /var/run/docker.sock (rw)
/var/run/libvirt/libvirt-sock -> /var/run/libvirt/libvirt-sock (ro)
/mnt/user/appdata/gururmm -> /config (rw)
```
Equivalent `docker run` (what the dockerman template encodes):
```bash
# Recreate gururmm-agent from the captured spec: host network, read-only
# host mounts, and the config directory mapped to /config.
docker run -d --name gururmm-agent \
  --network host --restart unless-stopped \
  -e GURURMM_CONFIG=/config/config.toml \
  -v /dev/kvm:/dev/kvm:ro \
  -v /proc:/proc:ro \
  -v /sys:/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock:ro \
  -v /mnt/user/appdata/gururmm:/config \
  --entrypoint /usr/local/bin/gururmm-agent \
  localhost:3000/azcomputerguru/gururmm-agent:latest run
```
Recreate path: build the template in Unraid "Add Container" (Repository, Network=host, the 6
path mappings, the env var, Extra Params `--entrypoint /usr/local/bin/gururmm-agent`, Post
Arguments `run`), run `docker stop gururmm-agent && docker rm gururmm-agent`, then apply the
template. Note: it's a localhost-registry image, so Unraid's update-check won't be meaningful, but
the icon and Edit form come back (WebUI is n/a for this container).
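Because the recreate depends on getting the config exactly right, it can help to refuse the stop/remove step unless a full inspect backup was written first. The sketch below is one way to express that guard. The function name `backup_then_remove` and its arguments are assumptions, not the commands used in the execution log.

```bash
# Save the full `docker inspect` output to $2 before removing container $1,
# and refuse to stop/remove anything if the backup failed or came out empty.
backup_then_remove() (
    name=$1
    backup=$2
    docker inspect "$name" > "$backup" || exit 1
    [ -s "$backup" ] || exit 1
    docker stop "$name" >/dev/null && docker rm "$name" >/dev/null
)
```

For example, `backup_then_remove gururmm-agent /root/gururmm-agent.inspect.bak.json` would mirror the backup location used for target #1.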
---
## Open items before execution
- Confirm `youtube-sync-test` is keep-or-retire (the "-test" name suggests disposable).
- For each "template exists" container (npm/radarr/sonarr/MariaDB-Official): diff the template
XML against the live `docker inspect` (ports/paths/env) so the recreate doesn't lose config.
- Pick the maintenance window(s). Suggest: a low-risk batch (1-4) any time; seafile its own slot;
gitea + npm each in a dedicated announced window, backup-first.