claudetools/.claude/memory/reference_gururmm.md at e8144a862e2db06a41904456dc33a4f9ab91b473


Mike Swanson 0c000109dc chore(memory): consolidate scattered feedback/project/reference files

Compressed memory store 104 -> 71 files via four passes:

- Syncro: 19 scattered feedback_syncro_* files merged into 3 rule files
  (api/billing/workflow) + an on-demand feedback_syncro_history.md for
  incident detail, quotes, and tech/product ID tables.
- Four near-duplicate merges: Howard paste-safety, Pluto build server,
  Howard backend deferral, IX server access (ssh+tailscale).
- Per-cluster rule/state/history split applied to GuruConnect (2->1),
  Dataforth (3->2), Cascades (7->3), GuruRMM (13->3).
- New reference_resource_map.md: single auto-loaded cheatsheet for
  "do I have access to X and how do I connect from this machine?"
- MEMORY.md rewritten to match the new layout.

Health: broken backlinks 8->7, overlap clusters 12->5, orphans 17->0.

2026-06-01 16:25:45 -07:00

9.0 KiB



name	description	type
GuruRMM technical reference — server, API, user_session, pipeline, agent sandbox	Operational reference for GuruRMM — server layout (SSH user, paths on 172.16.3.30), API auth + command execution + polling, user_session context (WTS impersonation, when SYSTEM fails), build-pipeline vendoring at deploy/build-pipeline/ (auto-sync to /opt/gururmm), Linux agent systemd sandbox trap (ProtectSystem=strict makes fs/mount observations sandbox-local).	reference

Rules: feedback_gururmm. Project state + principles + pending setup: project_gururmm.

Server layout (172.16.3.30)

SSH user is guru, not mike. Home is /home/guru/. Other users with home dirs: gitea-runner only.

Repo: /home/guru/gururmm
Dashboard build: cd /home/guru/gururmm/dashboard && npm run build
Deploy: sudo cp -r dist/* /var/www/gururmm/dashboard/
Other dirs under /home/guru/: guru-connect, guruconnect-server, backups

API — execute a script on any agent

Base: http://172.16.3.30:3001 (reachable from HOWARD-HOME and similar dev machines via Tailscale).

Auth: infrastructure/gururmm-server.sops.yaml → credentials.gururmm-api.admin-email + admin-password. Login returns a JWT valid for ~24h (86400s from iat).
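Since the token lasts roughly a day, a script that caches it can check how long is left before reusing it. A minimal sketch, assuming the JWT payload carries an `iat` claim (implied by the note above) and possibly an `exp` claim (an assumption); the helper names are hypothetical, and the payload is decoded without verifying the signature, so this is only good for a local freshness check:

```python
import base64
import json
import time
from typing import Optional

TOKEN_LIFETIME_S = 86400  # ~24h counted from iat, per the note above


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments drop their padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_left(token: str, now: Optional[float] = None) -> float:
    """Seconds until expiry: uses exp if present, else iat + 86400."""
    claims = jwt_claims(token)
    current = time.time() if now is None else now
    if "exp" in claims:
        expires_at = claims["exp"]
    else:
        expires_at = claims["iat"] + TOKEN_LIFETIME_S
    return expires_at - current
```

A caller might log in again whenever `seconds_left(jwt)` drops below a few minutes rather than waiting for a 401.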

Flow

VAULT="$PWD/.claude/scripts/vault.sh"
EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email)
PASS=$(bash  "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password)

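# Build the login body with json.dumps so quotes or backslashes in the password survive
LOGIN=$(EMAIL="$EMAIL" PASS="$PASS" python -c "import json,os; print(json.dumps({'email':os.environ['EMAIL'],'password':os.environ['PASS']}))")
JWT=$(curl -s -X POST http://172.16.3.30:3001/api/auth/login \
  -H "Content-Type: application/json" \
  -d "$LOGIN" \
  | python -c "import json,sys; print(json.load(sys.stdin)['token'])")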

# Find agent
curl -s http://172.16.3.30:3001/api/agents -H "Authorization: Bearer $JWT"

# Submit (json-encode to preserve quotes/newlines)
AGENT="<agent-uuid>"
PAYLOAD=$(python -c "
import json
with open('path/to/script.ps1','r',encoding='utf-8') as f: s=f.read()
print(json.dumps({'command_type':'powershell','command':s}))
")
RESP=$(curl -s -X POST http://172.16.3.30:3001/api/agents/$AGENT/command \
  -H "Authorization: Bearer $JWT" -H "Content-Type: application/json" -d "$PAYLOAD")
CMD_ID=$(echo "$RESP" | python -c "import json,sys; print(json.load(sys.stdin)['command_id'])")

# Poll
while true; do
  STATUS=$(curl -s http://172.16.3.30:3001/api/commands/$CMD_ID -H "Authorization: Bearer $JWT" \
    | python -c "import json,sys; print(json.load(sys.stdin)['status'])")
  [ "$STATUS" != "running" ] && break
  sleep 5
done

# Fetch result
curl -s http://172.16.3.30:3001/api/commands/$CMD_ID -H "Authorization: Bearer $JWT"

Required fields & response

POST /api/agents/:id/command requires two fields: command_type (use powershell for Windows agents; the API accepts any string, but the Windows agent only runs PowerShell-compatible commands) and command (the script text, JSON-encoded).

Response from /api/commands/:cmd_id:

{
  "id": "uuid", "agent_id": "uuid", "command_type": "powershell",
  "command_text": "...", "status": "completed",   // running | completed | failed | timeout
  "exit_code": 0, "stdout": "...", "stderr": "...",
  "created_at": "ISO-8601", "started_at": "ISO-8601", "completed_at": "ISO-8601"
}
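The polling step can also be written with a local deadline, so a command stuck in `running` does not hang the caller forever. A minimal sketch, assuming the four status values listed in the response above are the complete set; `fetch` is a hypothetical callable that wraps GET /api/commands/:cmd_id and returns the parsed JSON, and the function names are illustrative only:

```python
import time
from typing import Callable

# Status values taken from the response example above.
TERMINAL_STATUSES = {"completed", "failed", "timeout"}


def wait_for_command(
    fetch: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch() until the status is terminal; raise TimeoutError at the local deadline."""
    deadline = clock() + timeout_s
    while True:
        record = fetch()
        if record.get("status") in TERMINAL_STATUSES:
            return record
        if clock() >= deadline:
            raise TimeoutError(f"status still {record.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


def succeeded(record: dict) -> bool:
    """True only when the command completed and the script exited with code 0."""
    return record.get("status") == "completed" and record.get("exit_code") == 0
```

Checking `exit_code` separately from `status` matters if a script can finish (`completed`) while still returning a non-zero exit code; whether the server reports that case as `completed` or `failed` is not stated here, so `succeeded` checks both.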

When to use / not to use

Use for diagnostic checks on any enrolled agent, one-off remediation without ScreenConnect, anywhere you'd ask a user to paste a script.

Don't use it when the agent isn't enrolled (check GET /api/agents first), for interactive sessions (no stdin), or for scripts >1 MB (untested; keep scripts modular).

Notes: command_type: "powershell" runs in SYSTEM context on Windows (agent runs as LocalSystem). Idempotent commands only — no rollback. If output is large, have the script write to a file on the agent and fetch via a separate command. Tunnel API (/api/v1/tunnel/...) is a planned interactive feature per .claude/gururmm-tunnel-plan.md, not deployed.

`context=user_session` — run as the active logged-on user

POST /api/agents/:id/command accepts an optional context field (migration 041):

"system" (default) — Session 0 / SYSTEM. Original behavior.
"user_session" — runs in the active logged-on user's desktop session via WTS token impersonation (WTSQueryUserToken + DuplicateTokenEx + CreateProcessAsUserW, in agent/src/watchdog/wts.rs). Requires an active logged-on user on the endpoint.

Why it matters: some Windows cmdlets fail as SYSTEM with "NonInteractive mode" / interactive-session errors and historically had to be done on-site. user_session runs them remotely instead. Verified 2026-05-27 on the Peaceful Spirit BridgetteHome L2TP VPN deploy: Set-VpnConnection -L2tpPsk -AllUserConnection — previously documented as "cannot be done remotely" — was set successfully via user_session, completing a VPN rollout entirely through RMM with no on-site visit.

Elevation: the WTS-impersonated token of a logged-on admin user comes back effectively elevated (WindowsPrincipal.IsInRole(Administrator)=True) — enough to write the all-user phonebook / HKLM. A standard logged-on user is NOT elevated, so admin-requiring commands still fail. Agent launches powershell.exe -NonInteractive; don't rely on real interactive prompts.

Invoke: {"command_type":"powershell","command":"...","context":"user_session"}. To avoid shell-quoting problems with multi-line scripts, base64-encode the script as UTF-16LE and send powershell -NoProfile -NonInteractive -EncodedCommand <b64> (iconv is absent in Git Bash, so encode with Python).
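The UTF-16LE encoding step is short in Python. A minimal sketch that builds the request body described above; the function names are hypothetical, and the resulting JSON string is what would be passed to curl with `-d`:

```python
import base64
import json


def encode_powershell(script: str) -> str:
    """Base64 of the script's UTF-16LE bytes (no BOM), for -EncodedCommand."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def user_session_payload(script: str) -> str:
    """JSON body for POST /api/agents/:id/command with context=user_session."""
    command = (
        "powershell -NoProfile -NonInteractive -EncodedCommand "
        + encode_powershell(script)
    )
    return json.dumps(
        {"command_type": "powershell", "command": command, "context": "user_session"}
    )


if __name__ == "__main__":
    # Example: a two-line script with quotes that would be awkward to shell-escape.
    print(user_session_payload('Write-Output "hello"\nGet-Date'))
```

Because the base64 alphabet contains no quotes, backslashes, or newlines, the command string passes through the shell and JSON layers unchanged.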

Build-pipeline vendoring (`/opt/gururmm/` ⇄ repo `deploy/build-pipeline/`)

Pipeline runs at /opt/gururmm/ on the gururmm server (root-owned, hand-maintained). The scripts had silently diverged from the repo (caused BUG-015 Windows build-gate gap). Reconciled 2026-06-01:

Source of truth: scripts vendored in the gururmm repo at deploy/build-pipeline/ — build-{windows,linux,mac,agents,server,shared}.sh, sign-windows.sh, webhook-handler.py, README (commit 2bf539e).
Drift-stop (commit 24b5daf): build-shared.sh runs first in every build, after git reset --hard origin/main, and uses install -m 0755 to sync the 6 build scripts from deploy/build-pipeline/ → /opt/gururmm/. Edit in the repo + push to main → the next build runs the updated script. No manual copy, no restart.
Two exceptions need a manual sudo cp because they can't overwrite themselves mid-run:
- build-shared.sh (the running puller).
- webhook-handler.py (persistent HTTP server; also run sudo systemctl restart gururmm-webhook to reload).
Both change rarely. See deploy/build-pipeline/README.md.
Webhook still INVOKES the /opt/gururmm copies (not repo copies directly) — the sync keeps them current.
Repo's older scripts/webhook-handler.py + scripts/build-agents.sh are a prior generation, superseded.
build-windows.sh's change-gate watches agent/ installer/ (BUG-015 fix — installer-only .wxs/.ico changes now rebuild the MSI).
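Because the two manually copied files can still drift silently, a quick hash comparison between the repo copy and the live copy is one way to catch it. This is a sketch of a hypothetical helper, not part of the pipeline; the directory arguments are whatever checkout and live paths apply on the server:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256(path: Path) -> str:
    """Hex SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(repo_dir: Path, live_dir: Path, names: List[str]) -> Dict[str, str]:
    """Return {name: reason} for every file whose live copy does not match the repo copy."""
    drift: Dict[str, str] = {}
    for name in names:
        repo_file, live_file = repo_dir / name, live_dir / name
        if not repo_file.is_file():
            drift[name] = "missing in repo"
        elif not live_file.is_file():
            drift[name] = "missing in live dir"
        elif sha256(repo_file) != sha256(live_file):
            drift[name] = "differs"
    return drift
```

For example, `find_drift(Path("deploy/build-pipeline"), Path("/opt/gururmm"), ["build-shared.sh", "webhook-handler.py"])` returns an empty dict when both manual exceptions are in sync.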

Linux agent runs in a systemd sandbox — `findmnt` lies

The Linux agent (gururmm-agent.service) is hardened with ProtectSystem=strict → private mount namespace where / is read-only, only ReadWritePaths= entries are writable. Every command dispatched through the agent runs inside that namespace — so findmnt /, touch, /proc/mounts etc. report the agent's sandboxed view, not the host's actual state.

Trap (hit 2026-06-01 on GURU-KALI): I diagnosed "host root filesystem is read-only" because an RMM-dispatched touch /var/lib/gururmm returned EROFS (os error 30) and findmnt / showed ro. The host root was rw the entire time (SMART PASSED, ext4 clean, no kernel remount-ro). Real cause: the unit's ReadWritePaths= omitted /var/lib/gururmm → agent couldn't persist /var/lib/gururmm/.device-id → re-minted a device_id on each daily identity refresh → server (no machine_uid dedup) filed a new agent row each time (~11 ghosts).

How to get host truth instead of sandbox view:

SSH to the host directly (commands run in the host namespace), OR
Read the agent PID's namespace explicitly: cat /proc/<agent_pid>/mountinfo — the process-scoped ro on / is the tell that it's sandbox, not host. Compare against the host's findmnt.
errors=remount-ro in a mount line is the stock default mount option — NOT evidence an error fired. Confirm an actual remount-ro with kernel EXT4-fs error logs + dumpe2fs -h error count.
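The mountinfo comparison can be automated. A minimal sketch that parses mountinfo text (format per proc(5): the per-mount options are field 6, and the superblock options follow the `-` separator) and flags the sandbox pattern; the function names are hypothetical, and the "per-mount ro with superblock rw" heuristic is an assumption about how a read-only bind remount usually appears, not a guarantee:

```python
from typing import Optional, Tuple


def root_mount_flags(mountinfo_text: str) -> Optional[Tuple[str, str]]:
    """Return (per-mount options, superblock options) for the topmost mount at /."""
    found = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            sep = fields.index("-", 6)  # separator follows the optional fields
        except ValueError:
            continue
        if len(fields) <= sep + 3:
            continue
        if fields[4] == "/":
            # Later entries stack on top of earlier ones, so keep the last match.
            found = (fields[5], fields[sep + 3])
    return found


def is_read_only(options: str) -> bool:
    """True only for a literal ro flag; errors=remount-ro does not count."""
    return "ro" in options.split(",")


def looks_like_sandbox_view(mountinfo_text: str) -> bool:
    """Heuristic: / is ro for this process while the filesystem itself is still rw."""
    flags = root_mount_flags(mountinfo_text)
    if flags is None:
        return False
    mount_opts, super_opts = flags
    return is_read_only(mount_opts) and not is_read_only(super_opts)
```

Running it on both `/proc/<agent_pid>/mountinfo` and the host's `/proc/1/mountinfo` (read over SSH) gives a direct comparison: a True result for the agent and False for the host points at the unit's sandbox rather than a real read-only root.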

Fix pattern (additive): drop-in /etc/systemd/system/gururmm-agent.service.d/override.conf with

[Service]
ReadWritePaths=/var/lib/gururmm

(systemd merges ReadWritePaths additively across drop-ins), then daemon-reload + restart.

Better upstream fix: StateDirectory=gururmm (handles dir creation + perms + RW bind in one directive).

Fleet implication: every systemd-installed GuruRMM Linux agent with this unit shape has the same latent bug until the installer is fixed. See filed todos (agent ReadWritePaths / StateDirectory + server machine_uid dedup).
