# Session Log — 2026-05-24

## User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-KALI
- **Role:** admin
- **Session span:** ~06:30–09:31 MST

---

## Session Summary

Provisioned GURU-KALI (Lenovo Legion Pro 5, Kali rolling) for full ClaudeTools/GuruRMM
work and then implemented Linux support for the GuruRMM agent tray, testing it end to
end on this machine.

First half was machine onboarding. The SOPS vault was not present locally, so the vault
repo was cloned to `/home/guru/vault`; `sops` 3.13.1 was installed to `~/.local/bin`
(checksum-verified), the age key directory was created, and after the user supplied the
age private key, vault decryption was verified working. Tailscale was then installed —
this machine was off the company LAN (wifi 10.2.x) with no path to internal services, so
coord API, the internal DB, and the remote Ollama were all unreachable. After
`tailscale up --accept-routes`, pfSense-2's advertised `172.16.0.0/22` subnet route made
`172.16.3.30` reachable; coord API and remote Ollama were both confirmed (HTTP 200). A
per-machine spec was written to `.claude/machines/guru-kali.md` following the existing
fleet convention (the first attempt created a wrong-location `.claude/MACHINES.md`, which
was removed after the user pointed to the existing `.claude/machines/` + `LINUX_PC_ONBOARDING.md`).

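Why the advertised route covers the coord API host can be checked with integer arithmetic: `172.16.0.0/22` spans `172.16.0.0` through `172.16.3.255`, so `172.16.3.30` is inside it while the wifi address range is not. The sketch below is illustrative only. `ip_to_int` and `ip_in_cidr` are hypothetical helper names, not part of any tool used in this session, and the code assumes a shell with 64-bit arithmetic.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
)

# Succeed (exit 0) when address $1 lies inside network $2 with prefix length $3.
ip_in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Demo with the addresses from this log.
ip_in_cidr 172.16.3.30 172.16.0.0 22 && echo "172.16.3.30 is covered by 172.16.0.0/22"
ip_in_cidr 10.2.209.225 172.16.0.0 22 || echo "10.2.209.225 is outside 172.16.0.0/22"
```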
Second half was the GuruRMM Linux tray. The active repo was cloned to `/home/guru/gururmm`.
The parity matrix in `.claude/CODING_GUIDELINES.md` confirmed the gap: IPC/tray was
`[OK]` on Windows, `[GAP]` on Linux/macOS (a `cfg(not(windows))` no-op). After installing
the Rust toolchain (rustup, missing) and GTK/appindicator/openssl dev libs, a Coding Agent
implemented: a real Unix-domain-socket IPC server in the agent (transport-agnostic handler
shared with the Windows named pipe), the tray's Unix-socket client, and a Linux GTK
main-loop run path (winit does not pump libappindicator on Linux). Code Review returned
APPROVE WITH NITS; H1 (socket-dir hardening) was fixed in-diff, H2 (policy gating + Denied)
partly closed, and M2/M3 applied.

The tray was verified live in the XFCE panel. Running the agent under the systemd service
surfaced a real deployment bug: `ProtectSystem=strict` with only `/var/log` writable made
`/run` read-only in the sandbox, so the agent could not create its socket. Fixed by adding
`RuntimeDirectory=gururmm` to the unit (both on this machine and in the agent's unit
template in `main.rs`). With the fix, the enrolled agent (this machine was already enrolled,
id `a73ba38e`) authenticated, served the socket, and the tray showed the green "Connected"
icon. XDG autostart + best-effort installer wiring were added. Work landed on branch
`feat/linux-tray-ipc` as PR #13 (not merged — branch+PR was chosen to avoid triggering the
fleet build pipeline).

---

## Key Decisions

- **Tailscale-only (not local Ollama) for onboarding now.** Tailscale restored coord API +
  DB + remote Ollama in one step; local Ollama deferred (GPU is on nouveau, needs proprietary
  driver + reboot for accel).
- **Passwordless sudo enabled for `guru`** (`/etc/sudoers.d/guru-nopasswd`) per user choice,
  so privileged steps (apt, systemd, /run) run without per-command prompts.
- **Branch + PR, not push to main.** Pushing to `main` triggers the webhook build pipeline
  and a fleet-wide stable-channel auto-update of the agent; a PR keeps it reviewable.
- **`cfg(unix)` for the socket IPC, `cfg(target_os="linux")` for GTK** (per platform-parity
  standard) — the Unix-socket IPC advances macOS for free; macOS tray launch left as
  `TODO(platform)`.
- **`RuntimeDirectory=gururmm` over loosening ProtectSystem** — the systemd-native, minimal
  way to give the agent a writable `/run/gururmm` for its socket.
- **Tray policy left as-is** — the server already pushes this agent `enabled=true`
  (with `allow_view_logs=false`), so "show the tray for this machine" was already satisfied;
  no explicit override added.
- **Ran the agent as root / under systemd, tray as `guru`** — the 0666 socket bridges the
  root-owned agent and the non-root user-session tray (Linux equivalent of the Windows
  NULL-DACL pipe).

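The `RuntimeDirectory=gururmm` decision can be sketched as a systemd drop-in. This is a hedged illustration rather than a record of what was done: the session patched the unit file directly, and both the drop-in path in the comments and the `runtime_dir_dropin` helper name are assumptions. `RuntimeDirectoryMode=0755` is the value recorded for the unit template under Configuration Changes. With this setting systemd creates `/run/gururmm` when the service starts and removes it when the service stops, so `ProtectSystem=strict` can stay in place.

```bash
# Print a drop-in that gives the service a writable /run/gururmm.
runtime_dir_dropin() {
  cat <<'EOF'
[Service]
RuntimeDirectory=gururmm
RuntimeDirectoryMode=0755
EOF
}

runtime_dir_dropin

# To apply as root (hypothetical drop-in path, adjust to taste):
#   mkdir -p /etc/systemd/system/gururmm-agent.service.d
#   runtime_dir_dropin > /etc/systemd/system/gururmm-agent.service.d/runtime-dir.conf
#   systemctl daemon-reload && systemctl restart gururmm-agent
```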
---

## Problems Encountered

- **Vault sync skipped** — `/home/guru/vault` was not a git repo. Resolved by cloning the
  vault repo there.
- **No sops / no age key** — vault clone alone could not decrypt. Installed sops 3.13.1,
  created `~/.config/sops/age/`, user supplied the private key; decryption verified.
- **Session not elevated** — assumed elevated but `sudo -n` required a password. Resolved by
  the user enabling passwordless sudo.
- **Tailscale not in Kali apt** — used the official `install.sh` (it explicitly maps `kali`).
- **Wrong machine-doc artifact** — created `.claude/MACHINES.md`; the convention is
  `.claude/machines/<host>.md`. Removed the stray file, wrote `guru-kali.md`, repointed refs.
- **Rust missing** — installed via rustup (`~/.cargo`). GTK/appindicator/openssl dev libs
  installed via apt.
- **Agent panicked on `--help` as `guru`** — it initializes a rolling file logger to
  `/var/log/gururmm` (root-only). Runs fine as root.
- **`--config` rejected after `run`** — it is a global flag; correct form is
  `gururmm-agent --config <path> run`.
- **IPC socket failed under systemd** (`removing stale agent socket`) — `ProtectSystem=strict`
  made `/run` read-only in the sandbox (EROFS). Fixed with `RuntimeDirectory=gururmm`.
- **Screenshot showed a screensaver** (xfce4-screensaver mice on black). Deactivated with
  `xfce4-screensaver-command --deactivate` before re-capturing.
- **5.8 GB cgroup "memory" alarm retracted** — actual agent RSS was 32 MB; the 5.8 GB figure
  was the systemd cgroup peak, not resident memory.

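The last item turns on the difference between a process's resident set size and the peak usage charged to its cgroup. A minimal sketch of reading both follows, assuming cgroup v2 with a `memory.peak` file and the unit name `gururmm-agent`; the helper names `rss_kib` and `bytes_to_mib` and the sample values are illustrative, not output captured in this session.

```bash
# Extract the resident set size in KiB from /proc/<pid>/status content on stdin.
rss_kib() {
  awk '$1 == "VmRSS:" { print $2 }'
}

# Convert a byte count (as found in cgroup memory.* files) to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Demo with sample input shaped like the real files.
printf 'Name:\tgururmm-agent\nVmRSS:\t   32768 kB\n' | rss_kib
bytes_to_mib 33554432

# On the host (paths and unit name are assumptions):
#   pid=$(systemctl show -p MainPID --value gururmm-agent)
#   rss_kib < "/proc/$pid/status"
#   bytes_to_mib "$(cat /sys/fs/cgroup/system.slice/gururmm-agent.service/memory.peak)"
```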
---

## Configuration Changes

**ClaudeTools repo (`/home/guru/claudetools`):**

- Created `.claude/machines/guru-kali.md` — full machine spec (updated this session with Rust,
  GTK build libs, passwordless sudo, gururmm clone, enrolled-agent note).
- `.claude/OLLAMA.md` — added GURU-KALI to the machine table + status note.
- `.claude/CLAUDE.md` — Reference pointer to `.claude/machines/`.
- Removed the mistakenly-created `.claude/MACHINES.md`.
- (Earlier commit `4383f9e` carried the first three; this session's `guru-kali.md` edits sync now.)

**GuruRMM repo (`/home/guru/gururmm`) — PR #13, branch `feat/linux-tray-ipc`, commit `01fa6c4`:**

- `agent/src/ipc.rs` — Unix-socket IPC server; transport-agnostic shared handler; hardened
  socket-dir creation; policy-gated StopAgent/ForceCheckin + `Denied` variant.
- `agent/src/main.rs` — added `RuntimeDirectory=gururmm` + `RuntimeDirectoryMode=0755` to the
  generated systemd unit template.
- `agent/scripts/install.sh` — best-effort tray binary download + XDG autostart install.
- `agent/deploy/linux/gururmm-tray.desktop` — new XDG autostart entry.
- `tray/Cargo.toml` — gtk/glib 0.18 under linux cfg; tokio `net` for unix; winit gated to non-linux.
- `tray/src/ipc.rs` — Unix-socket client + capped exponential backoff; dropped redundant GetStatus.
- `tray/src/tray.rs` — Linux GTK main-loop run path; Linux ViewLogs branch.

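The tray client in `tray/src/ipc.rs` reconnects with capped exponential backoff. The arithmetic is sketched below in shell purely for illustration: the real client is Rust, and the 1 s base, the 30 s cap, and the `backoff_delay` name are assumptions, not values read from the code. The delay doubles per failed attempt until it reaches the cap, so a tray started before the agent retries quickly at first and then settles to a steady, low-cost poll.

```bash
# Delay in seconds before reconnect attempt N (0-based): base * 2^N, capped.
#   $1 attempt number, $2 base delay (default 1), $3 cap (default 30)
backoff_delay() (
  attempt=$1 base=${2:-1} cap=${3:-30}
  delay=$base
  while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$cap" ]; do
    delay=$(( delay * 2 ))
    attempt=$(( attempt - 1 ))
  done
  [ "$delay" -gt "$cap" ] && delay=$cap
  echo "$delay"
)

# Show the schedule for the first few attempts.
for n in 0 1 2 3 4 5 6; do
  printf 'attempt %s -> wait %ss\n' "$n" "$(backoff_delay "$n")"
done
```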
**Machine-level (GURU-KALI, not in any repo):**

- `/etc/sudoers.d/guru-nopasswd` — passwordless sudo for guru.
- `~/.local/bin/sops` (3.13.1), `~/.config/sops/age/keys.txt` (age private key, mode 600).
- `/home/guru/vault` (vault repo clone), `/home/guru/gururmm` (gururmm repo clone).
- Rust via rustup (`~/.cargo`); apt: libgtk-3-dev, libayatana-appindicator3-dev, libxdo-dev,
  libssl-dev, pkg-config, build-essential.
- Tailscale installed; `tailscale up --accept-routes`.
- `/etc/systemd/system/gururmm-agent.service` — patched with `RuntimeDirectory=gururmm`.
- Deployed local dev builds to `/usr/local/bin/gururmm-agent` and `/usr/local/bin/gururmm-tray`;
  `/etc/xdg/autostart/gururmm-tray.desktop` installed.

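The `sops` binary was checksum-verified before it was installed to `~/.local/bin`. The log does not record the exact commands, so the following is only a sketch of that kind of check using `sha256sum` from coreutils; `verify_sha256` is a hypothetical helper, and the demo uses a throwaway file containing `abc` (a standard SHA-256 test vector) rather than the real release digest.

```bash
# Succeed when the SHA-256 of file $1 equals the expected hex digest $2.
verify_sha256() {
  [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}

# Demo on a temporary file instead of a downloaded binary.
tmp=$(mktemp)
printf 'abc' > "$tmp"
if verify_sha256 "$tmp" ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
rm -f "$tmp"
```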
---

## Credentials & Secrets

- **age private key** at `~/.config/sops/age/keys.txt` (mode 600) — public key
  `age1qz7ct84m50u06h97artqddkj3c8se2yu4nxu59clq8rhj945jc0s5excpr` (vault recipient #1).
  Supplied by the user this session; matches the vault's first `.sops.yaml` recipient.
- **GuruRMM agent api_key** — in `/etc/gururmm/agent.toml` (root, mode 600), real enrolled key
  for agent id `a73ba38e-cd02-4331-b8bf-474cd899ec22`. Not transcribed here (already on-machine).
- **Gitea API token** used for PR #13 — from vault `services/gitea.sops.yaml` field
  `credentials.api.api-token` (whoami = azcomputerguru). No new secrets created.
- `/etc/gururmm/config.toml` — a generated test config with a placeholder api_key
  (`your-api-key-here`); not a real credential.

---

## Infrastructure & Servers

- **GURU-KALI** — Tailscale `100.75.148.91` (mike@); wifi `10.2.209.225/16`. XFCE/X11, `DISPLAY=:0.0`.
- **Coord API / ClaudeTools DB** — `172.16.3.30:8001` (reachable via Tailscale subnet route
  `172.16.0.0/22` advertised by pfSense-2 `100.119.153.74`).
- **Remote Ollama** — `100.92.127.64:11434` (DESKTOP-0O8A1RL), 5 models, reachable.
- **GuruRMM server** — `wss://rmm-api.azcomputerguru.com/ws` (agent WS endpoint); dashboard
  `https://rmm.azcomputerguru.com`.
- **Gitea** — internal API `http://172.16.3.20:3000` (external `git.azcomputerguru.com` blocks curl/Cloudflare).
- **GuruRMM agent socket** — `/run/gururmm/agent.sock` (srw-rw-rw-, root); created via systemd
  `RuntimeDirectory`. Agent logs to `/var/log/gururmm/agent.log`.

---

## Commands & Outputs

```bash
# Vault + sops
git clone <vault-url> /home/guru/vault
install -m 0755 sops ~/.local/bin/sops  # 3.13.1, sha256 verified
bash .claude/scripts/vault.sh list  # decryption OK after key placed

# Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --accept-routes  # node 100.75.148.91
# pfSense-2 advertises 172.16.0.0/22 -> 172.16.3.30 reachable

# Build env
curl --proto '=https' https://sh.rustup.rs | sh -s -- -y --profile minimal  # rust 1.95.0
sudo apt-get install -y libgtk-3-dev libayatana-appindicator3-dev libxdo-dev libssl-dev pkg-config build-essential

# Build + run (local cargo, NOT build-agents.sh)
cd /home/guru/gururmm/agent && cargo build  # clean (51 pre-existing warnings)
cd /home/guru/gururmm/tray && cargo build  # clean
sudo /usr/local/bin/gururmm-agent --config /etc/gururmm/agent.toml run  # via systemd after fix
DISPLAY=:0.0 /usr/local/bin/gururmm-tray  # tray; green when agent connected

# Verify tray registration
gdbus call --session --dest org.kde.StatusNotifierWatcher \
  --object-path /StatusNotifierWatcher \
  --method org.freedesktop.DBus.Properties.Get \
  org.kde.StatusNotifierWatcher RegisteredStatusNotifierItems
# -> org/ayatana/NotificationItem/tray_icon_tray_app
```

Key log lines:

- `Authentication successful, agent_id: Some(a73ba38e-cd02-4331-b8bf-474cd899ec22)`
- `[INFO] IPC server listening on /var/run/gururmm/agent.sock`
- tray: `Connected to agent` / `Updated status: connected=true` / `Updated policy: enabled=true`
- pre-fix error: `IPC server error: removing stale agent socket` (EROFS under ProtectSystem=strict)

---

## Pending / Incomplete Tasks

- **PR #13 review/merge** — https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/13.
  Not merged; merging triggers the build pipeline + fleet auto-update.
- **Build pipeline must build + publish `gururmm-tray-linux-<arch>`** to the downloads dir, and
  confirm `install.sh` `TRAY_DOWNLOAD_URL` matches the published name (installer is best-effort until then).
- **Phase-4 IPC hardening (task #10):** SO_PEERCRED on the 0666 socket, real StopAgent/ForceCheckin
  enforcement + confirmation dialog (policy gating + Denied are in place; peer-cred + real action deferred).
- **macOS tray launch** (launchd user agent) — untested, `TODO(platform)`.
- **GURU-KALI service** runs an unsigned local dev build with a hand-patched unit; it realigns
  when PR #13 merges and the pipeline ships a signed agent.
- **Optional onboarding leftovers:** local Ollama, GrepAI, 1Password CLI not installed.

---

## Reference Information

- GuruRMM PR: https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/13 (branch `feat/linux-tray-ipc`, commit `01fa6c4`)
- Agent id (GURU-KALI): `a73ba38e-cd02-4331-b8bf-474cd899ec22`
- Tailscale: GURU-KALI `100.75.148.91`, DESKTOP-0O8A1RL `100.92.127.64`, pfSense-2 `100.119.153.74`
- Repos: claudetools `/home/guru/claudetools`, vault `/home/guru/vault`, gururmm `/home/guru/gururmm`
- Coord lock used: `425f588c-b41d-4d5f-a926-60d3e342c416` (released)
- Machine doc: `.claude/machines/guru-kali.md`; onboarding: `.claude/machines/LINUX_PC_ONBOARDING.md`
- Standards referenced: `.claude/CODING_GUIDELINES.md`, `.claude/standards/gururmm/{platform-parity,build-pipeline,sqlx-migrations}.md`

---

## Update: 10:15 MST — Phase 4 IPC hardening, PRs merged, follow-up issues, update watch

### Session Summary

Merged the Linux-tray PR (#13) to `main`, then implemented and merged Phase 4 of the
agent IPC (the H2 hardening follow-up from #13's review), opened tracking issues for
the remaining gaps, and set up a watcher to confirm GURU-KALI auto-updates once the
build pipeline publishes the new agent.

PR #13 was merged via the internal Gitea API (merge commit `2857559`, then a CI
`auto-bump versions` commit `9e7977c`). The local `gururmm` clone was fast-forwarded to
the merged main, which also brought in unrelated landed work: `server/migrations/042_agent_events.sql`,
`server/src/db/events.rs`, and an `AppState.log_sender_watch` field.

Phase 4 was implemented by a Coding Agent (opus): peer-credential authorization on the
0666 Unix socket (deny-by-default), real `ForceCheckin`/`StopAgent` wiring, and a tray
GTK confirmation dialog. Code Review (opus) returned APPROVE WITH NITS, no blockers;
the deny-by-default authz was verified sound across all paths. A follow-up fix pass
addressed the two MEDIUMs (StopAgent on non-systemd installs; stale force_checkin Notify
permit) and LOW-2 (macOS `admin` group). The change shipped as PR #14 and was merged
(merge `b0e8ad9`, CI bump `bb3e8c0`).

Five tracking issues were opened for the non-blocking follow-ups. Then, because the
agent updater is server-push (not poll-based) and SSH to the build server is unavailable
from GURU-KALI, a background watcher was started that polls both the published-downloads
endpoint (for a version > 0.6.29) and GURU-KALI's running agent version, to confirm the
pipeline publish and the subsequent auto-update. As of this save the pipeline had not yet
published the post-merge version (still 0.6.29); the watcher continues, and the user
asked to be pinged (push notification) on publish.

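The watcher's core check can be sketched as two small functions: pull the highest published version out of the downloads listing, then compare it with the known baseline. This is an approximation under stated assumptions, not the contents of the actual watcher script: `latest_published` and `version_gt` are hypothetical names, the file-name pattern is taken from the `gururmm-agent-linux-amd64-0.6.29` entry seen in the listing, and `sort -V` requires GNU or BusyBox `sort`.

```bash
# Read a downloads listing on stdin and print the highest published
# gururmm-agent-linux-amd64 version (prints nothing if none is found).
latest_published() {
  grep -oE 'gururmm-agent-linux-amd64-[0-9]+(\.[0-9]+)+' \
    | sed 's/.*-//' | sort -V | tail -n 1
}

# Succeed when version $1 is strictly newer than version $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Demo against a canned listing instead of the live endpoint.
listing='<a href="gururmm-agent-linux-amd64-0.6.29">agent</a>
<a href="gururmm-agent-linux-amd64-0.6.29.sha256">sha256</a>
<a href="gururmm-agent-linux-amd64-latest">latest</a>'
published=$(printf '%s\n' "$listing" | latest_published)
if version_gt "$published" 0.6.29; then
  echo "published $published is newer than 0.6.29"
else
  echo "still at $published"
fi
```

Against the live endpoint the input would come from `curl -s https://rmm.azcomputerguru.com/downloads/ | latest_published`, wrapped in a sleep loop. Comparing with `sort -V` rather than string comparison matters once a component gains a digit, for example 0.6.100 versus 0.6.29.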
### Key Decisions

- **Merged both PRs to main** (user-authorized) despite the earlier branch+PR caution —
  each merge triggers the webhook build + stable-channel fleet auto-update.
- **Differentiated IPC authz model** (user choice): ForceCheckin = active session-user
  uid or root; StopAgent = root or `sudo`/`wheel`/`admin` group AND policy
  `allow_stop_agent`; read-only requests ungated. Order: policy gate first, then peer-cred.
- **force_checkin Notify wired into the WS task** (transport/websocket.rs), not the
  collect-only loop in main.rs — `notify_one()` wakes one waiter, so two waiters would
  race/steal the wakeup. Drained at WS task start to avoid a stale permit firing a
  spurious send on reconnect.
- **StopAgent self-exits on non-systemd installs** (Unraid/Synology cron/nohup path)
  where `systemctl stop` is a no-op — detected via existing `has_systemd()`.
- **Opened issues rather than expanding the PRs** for Windows peer authz, logind
  console-user resolution, macOS completion, pipeline tray build, and subscriber broadcast.

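The StopAgent rule above reads as a small decision function: the policy gate is checked first, then the peer's credentials, and anything not explicitly allowed is denied. The shell sketch below only mirrors that ordering for illustration. The real check lives in Rust in `agent/src/ipc.rs`; the argument layout and the `allowed`/`denied` strings here are assumptions.

```bash
# Decide whether a StopAgent request may proceed.
#   $1  policy allow_stop_agent ("true" or "false")
#   $2  uid of the connecting peer
#   $3  space-separated group names of the peer
# Prints "allowed" or "denied"; the fall-through case is deny.
authorize_stop_agent() (
  [ "$1" = "true" ] || { echo denied; exit 0; }   # policy gate comes first
  [ "$2" = "0" ] && { echo allowed; exit 0; }     # root peer
  for g in $3; do
    case $g in
      sudo|wheel|admin) echo allowed; exit 0 ;;   # admin-group member
    esac
  done
  echo denied
)

authorize_stop_agent true 1000 "guru sudo"    # prints: allowed
authorize_stop_agent true 1000 "guru audio"   # prints: denied
authorize_stop_agent false 0 ""               # prints: denied (policy off, even for root)
```

Checking the policy before the credentials means a disabled policy yields the same denial for every caller, root included.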
### Problems Encountered

- **`AppState` drift on merged main** — main gained `log_sender_watch`; the Coding Agent
  added it (and `force_checkin`) to BOTH main.rs and service.rs, also fixing a pre-existing
  Windows-only build break where service.rs was missing `log_sender_watch`.
- **`systemctl stop` no-op on non-systemd installs** (review MEDIUM-2) — fixed with a
  `has_systemd()` branch that self-exits otherwise.
- **Stale force_checkin permit** (review MEDIUM-1) — drained once at WS task start via a
  `biased` select against `std::future::ready(())`.
- **No SSH to build server** (`guru@172.16.3.30` Permission denied, publickey) — can't
  read `/var/log/gururmm-build.log`; watching the published-downloads endpoint instead.
- **Vault field path** — token is at `credentials.api.api-token` (not `api.api-token`);
  the Gitea Agent corrected the lookup.
- **`pkill` aborting compound bash commands** (exit 144) — re-ran the affected steps
  individually; wrote the watcher script via the Write tool after a heredoc was truncated.

### Configuration Changes (this update)

GuruRMM (`/home/guru/gururmm`), Phase 4 — merged via PR #14:

- `agent/src/ipc.rs` — `PeerIdentity`, peer_cred() at accept, authz helpers
  (`authorize_force_checkin`, `authorize_stop_agent`, `active_session_uid`,
  `uid_in_admin_group`), real `spawn_service_stop`, Denied responses.
- `agent/src/main.rs` — `AppState.force_checkin: Arc<Notify>`; `has_systemd()` made `pub(crate)`.
- `agent/src/metrics/mod.rs` — `logged_in_username()` associated fn.
- `agent/src/service.rs` — mirrored `force_checkin` + `log_sender_watch` in the Windows AppState.
- `agent/src/transport/websocket.rs` — metrics task selects on force_checkin Notify; drains stale permit at start.
- `tray/src/tray.rs` — GTK Yes/No confirm before StopAgent (Linux).

Local (not in repo): background watcher script `/tmp/gururmm-watch-publish.sh`,
log `/tmp/gururmm-watch.log`.

### Commands & Outputs (this update)

```bash
# current vs published agent version
sudo /usr/local/bin/gururmm-agent --version  # gururmm-agent 0.6.29
curl -s https://rmm.azcomputerguru.com/downloads/ | grep -oiE 'gururmm-agent-linux[^"<> ]*'
# -> gururmm-agent-linux-amd64-0.6.29 (+ .sha256, -latest) [no newer version yet]

# live running version via IPC socket (no need to spawn the binary)
echo '{"type":"get_status"}' | socat - UNIX-CONNECT:/var/run/gururmm/agent.sock | grep agent_version

ssh guru@172.16.3.30  # -> Permission denied (publickey,password) — no build-log access
```

### Pending / Incomplete Tasks (this update)

- **Watching for pipeline publish + GURU-KALI auto-update** — watcher running; ping user
  (push notification) on publish. If the published version advances but the agent does not
  update, that indicates auto-update is disabled/manual for this agent and the update must
  be triggered from the dashboard or `POST /agents/{id}/update`.
- Follow-up issues open: #15 (pipeline tray build), #16 (Windows peer authz), #17 (logind
  console user), #18 (macOS tray), #19 (subscriber broadcast).
- GURU-KALI still on local dev binaries until the pipeline build deploys.

### Reference Information (this update)

- PR #13 merged: merge `2857559`, CI bump `9e7977c`.
- PR #14 merged: https://git.azcomputerguru.com/azcomputerguru/gururmm/pulls/14 — merge `b0e8ad9`, CI bump `bb3e8c0`.
- Issues #15-#19: https://git.azcomputerguru.com/azcomputerguru/gururmm/issues/15 .. /19
- Phase 4 commit: `7a4e745`. Coord lock used + released: `3116d737`.
- Published downloads: https://rmm.azcomputerguru.com/downloads/ (poll target). Build server `172.16.3.30` (no SSH from GURU-KALI).