
# Session Log — 2026-04-21
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
---
## Session Summary
Continuation from previous conversation (context compacted). This session covered three areas:
1. **BirthBiologic vault save** — fixed a broken vault stub and saved GuruRMM site credentials for the new BirthBiologic client
2. **MSI build fix** — diagnosed and fixed "MSI build on Pluto failed" error caused by a missing WiX extension flag in `install.rs`
3. **DESIGN.md created** — comprehensive per-component design guide for GuruRMM covering architectural decisions, rules, and constraints that were previously only in session logs and verbal decisions
---
## Key Work
### 1. BirthBiologic Vault Entry — Fixed and Saved
**Problem:** A broken, unencrypted stub existed at `D:/vault/clients/birthbiologic/gururmm-site-main.sops.yaml`. `vault.sh add` failed ("file already exists"), there is no `vault.sh create` subcommand, and `sops --encrypt` failed with "no matching creation rules found" whenever the input file's name did not end in `.sops.yaml`.

**Root cause:** The creation rule in the vault's `.sops.yaml` config uses `path_regex: '.*\.sops\.yaml$'`, so it only matches paths that end in `.sops.yaml`. A `.plain.yaml` input never matches, so SOPS finds no rule to apply.

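
You can confirm the mismatch by testing candidate paths against the rule's regex. This minimal sketch assumes that SOPS applies `path_regex` as an unanchored search against the file path. Python's `re.search` behaves the same way for this pattern.

```python
import re

# path_regex from the vault's .sops.yaml creation rule (copied from the log)
CREATION_RULE = re.compile(r".*\.sops\.yaml$")

def rule_matches(path: str) -> bool:
    """Return True if this creation rule would be selected for `path`."""
    return CREATION_RULE.search(path) is not None

for p in ("clients/birthbiologic/gururmm-site-main.sops.yaml",
          "clients/birthbiologic/gururmm-site-main.plain.yaml"):
    # The .sops.yaml name matches the rule; the .plain.yaml name does not.
    print(p, rule_matches(p))
```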
**Fix:**
1. Deleted the broken stub
2. Wrote plaintext to `gururmm-site-main.plain.yaml`
3. Encrypted with explicit AGE key + `--encrypted-regex` flags: `sops --encrypt --age age1qz7ct84m50u06h97artqddkj3c8se2yu4nxu59clq8rhj945jc0s5excpr --encrypted-regex '^(credentials|...)$' input.plain.yaml > output.sops.yaml`
4. Deleted plaintext
5. Verified: `vault.sh get-field clients/birthbiologic/gururmm-site-main.sops.yaml credentials.api_key` returned correct value
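
Steps 2 to 4 can be wrapped so that the plaintext never outlives the encryption attempt. This is a minimal sketch, assuming `sops` is on `PATH`; the function names are hypothetical. The `--encrypted-regex` value is passed in as a parameter because the log elides it. The encrypted output is written only after `sops` succeeds, so a failure cannot leave another broken stub behind.

```python
import subprocess
from pathlib import Path

SUFFIX = ".sops.yaml"

def build_sops_command(age_recipient: str, encrypted_regex: str, plain_path: Path) -> list:
    """Assemble the sops invocation from step 3 (same flags as recorded above)."""
    return ["sops", "--encrypt",
            "--age", age_recipient,
            "--encrypted-regex", encrypted_regex,
            str(plain_path)]

def encrypt_vault_entry(plain_text: str, out_path: Path,
                        age_recipient: str, encrypted_regex: str) -> None:
    """Write plaintext, encrypt it into out_path, and always delete the plaintext."""
    if not out_path.name.endswith(SUFFIX):
        raise ValueError(f"vault entries must end in {SUFFIX}")
    if out_path.exists():
        # Mirrors `vault.sh add`: never clobber an existing entry.
        raise FileExistsError(out_path)
    plain_path = out_path.with_name(out_path.name[: -len(SUFFIX)] + ".plain.yaml")
    plain_path.write_text(plain_text, encoding="utf-8")
    try:
        result = subprocess.run(
            build_sops_command(age_recipient, encrypted_regex, plain_path),
            capture_output=True, check=True)
        # Only reached if sops exited 0.
        out_path.write_bytes(result.stdout)
    finally:
        plain_path.unlink(missing_ok=True)
```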

**BirthBiologic GuruRMM credentials (also in vault):**
```
client_id: da526b38-e832-4159-ab13-a3d94e9897a2
site_id: 3b20ef97-c764-4ef8-9154-79c3d5b486f8
site_code: BRIGHT-PEAK-5980
api_key: grmm_1ZB1qV9Q61b9Noq8BIaZGwLNjZMfF49i
installer_url (landing): https://rmm.azcomputerguru.com/install/BRIGHT-PEAK-5980
msi_url (direct): https://rmm.azcomputerguru.com/sites/3b20ef97-c764-4ef8-9154-79c3d5b486f8/installer
```
Vault file: `D:/vault/clients/birthbiologic/gururmm-site-main.sops.yaml`

---
### 2. MSI Build Fix — "MSI build on Pluto failed"
**Symptom:** Clicking "Download MSI" in the GuruRMM dashboard for any site returned "MSI build on Pluto failed" in red.

**Diagnosis:** Server log showed:
```
stdout=C:\gururmm\installer\gururmm-agent.wxs(226) : error WIX0094:
The identifier 'Binary:Wix4UtilCA_X64' could not be found.
```
**Root cause:** The `build_site_msi_on_pluto` function in `server/src/api/install.rs` was calling `wix build` without `-ext WixToolset.Util.wixext`. The `InstallReportCA` custom action uses `BinaryRef="Wix4UtilCA_X64"`, which lives in the Util extension. The base-MSI build in `build-agents.sh` had the flag; the on-demand per-site build did not.

**Fix:** Added `-ext WixToolset.Util.wixext` to the WiX command in `build_site_msi_on_pluto`:
```
"cd C:\\gururmm\\installer && wix.exe build gururmm-agent.wxs \
-arch x64 -d Version={version} -d SITEKEY={site_id} \
-o {remote_out} -ext WixToolset.Util.wixext"
```
Applied the change directly on Jupiter with `sed -i`, rebuilt the server (`cargo build --release` in `server/`), and restarted `gururmm-server`. Then committed and pushed the fix to Gitea.

**Fix commit:** `6106087`, "fix: add WixToolset.Util.wixext to site MSI build command"

**Note:** This was a discrepancy between `build-agents.sh` (had the flag) and `install.rs` (didn't). It is now documented as a rule in DESIGN.md.

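A small pre-build check could catch this kind of drift between the two build paths. The helper below is hypothetical and not part of the repo. It assumes both files spell the command as `wix build` or `wix.exe build`, and it flags any file that invokes WiX without also containing the Util extension flag. For example, `files_missing_util_ext(["build-agents.sh", "server/src/api/install.rs"])` should return an empty list once both carry the flag. The check is deliberately coarse and works per file: a file with two invocations where only one has the flag would still pass.

```python
import re
from pathlib import Path

# Matches both spellings seen in this log: `wix build` and `wix.exe build`.
WIX_BUILD = re.compile(r"wix(?:\.exe)?\s+build")
REQUIRED_EXT = "-ext WixToolset.Util.wixext"

def files_missing_util_ext(paths):
    """Return the paths that invoke `wix build` but never mention REQUIRED_EXT."""
    missing = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        if WIX_BUILD.search(text) and REQUIRED_EXT not in text:
            missing.append(str(path))
    return missing
```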
---
### 3. DESIGN.md — GuruRMM Design Guide Created
Created `docs/DESIGN.md` in the GuruRMM repo. This new document captures per-component design decisions and hard constraints that were previously scattered across session logs and verbal decisions.

**Committed:** `6b76dd7`, "docs: add DESIGN.md — per-component architectural decisions and rules"

**Sections:**
- Project-Wide Rules (no TOML/config for endpoints, registry as source of truth)
- Agent (auto-install, per-agent enrollment keys, legacy OS support, .old cleanup, downgrade guard)
- Installer/MSI (WiX v4 only, Pluto-only, required extension, Wait="no" rationale, install-report CA as debug logger, no UI extension)
- Build Pipeline (webhook-only builds, parallelism, signing, toolchain self-bootstrapping, build lock)
- Server (PostgreSQL not MariaDB, INET sqlx pattern, ConnectInfo extractor, stop-before-replace, migration recording)
- Dashboard (useMemo pitfall, sidebar colors, modal key reset, theme support)
- Tray Application (separate crate, user session, policy-controlled, named pipe IPC)
- Protocol / Wire Format (WebSocket message types, heartbeat)
---
## Files Created / Modified
| File | Change |
|------|--------|
| `D:/vault/clients/birthbiologic/gururmm-site-main.sops.yaml` | Created (encrypted vault entry for BirthBiologic RMM site) |
| `/home/guru/gururmm/server/src/api/install.rs` | Added `-ext WixToolset.Util.wixext` to Pluto WiX build command |
| `docs/DESIGN.md` (in gururmm repo) | Created — comprehensive design guide |
---
## Commits (gururmm repo)
| SHA | Message |
|-----|---------|
| `6106087` | fix: add WixToolset.Util.wixext to site MSI build command |
| `6b76dd7` | docs: add DESIGN.md — per-component architectural decisions and rules |
---
## Update: 19:25 UTC — MSI Still Failing, Root Cause Found and Fixed
### Problem
After the earlier `install.rs` fix and server rebuild, MSI generation was still failing with the same `WIX0094` error.
### Root Cause
Two compounding issues:

**1. Wrong binary deployed.** The `gururmm-server` service runs from `/opt/gururmm/gururmm-server`, not `/usr/local/bin/gururmm-server`. The rebuild at 17:53 placed the new binary in `/home/guru/gururmm/server/target/release/gururmm-server` but it was never copied to `/opt/gururmm/`. The old binary (from 2026-04-20 18:32) kept running.
```
ExecStart=/opt/gururmm/gururmm-server ← service path
/usr/local/bin/gururmm-server ← wrong path (stale, Apr 20)
/home/guru/gururmm/server/target/release/gururmm-server ← new binary (never deployed)
```
**2. Migration 013 not registered.** Once the correct binary was deployed and the service restarted, it crashed immediately on startup:
```
Error: while executing migration 13: error returned from database:
relation "install_reports" already exists
```
Migration 013 (`install_reports` table) had been applied to the DB in a prior session but never recorded in `_sqlx_migrations`. sqlx tried to re-run it, hit the conflict, and crashed.
### Fix
1. Deployed the correct binary:
   ```bash
   sudo systemctl stop gururmm-server
   sudo cp /home/guru/gururmm/server/target/release/gururmm-server /opt/gururmm/gururmm-server
   ```
2. Registered migration 013 in `_sqlx_migrations`:
   ```sql
   INSERT INTO _sqlx_migrations (version, description, installed_on, success, checksum, execution_time)
   VALUES (
       13,
       'install reports',
       NOW(),
       true,
       decode('76d53ea1c51f9ce70c01f5b8b545d17f63eab5b2c447e880cdb1f25807ed30c626df818aadea6db9d024cdf2e72d3062', 'hex'),
       0
   );
   ```
   The checksum is the SHA-384 of the migration file's contents, computed with Python's `hashlib.sha384`.
3. Restarted the service. It came up clean and agents reconnected.
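
The checksum and INSERT from step 2 can be reproduced with a few lines of Python. This sketch assumes, as the log states, that sqlx stores the SHA-384 of the migration file's contents. The migration path in the example comment is hypothetical. Hash the same copy of the file that the server binary was built from, because converting line endings changes the digest.

```python
import hashlib
from pathlib import Path

def sqlx_checksum_hex(migration_file: str) -> str:
    """Hex SHA-384 of the migration file's raw bytes (the value stored in `checksum`)."""
    return hashlib.sha384(Path(migration_file).read_bytes()).hexdigest()

def register_migration_sql(version: int, description: str, checksum_hex: str) -> str:
    """Build the INSERT that marks an already-applied migration as recorded."""
    desc = description.replace("'", "''")  # escape quotes for a SQL string literal
    return (
        "INSERT INTO _sqlx_migrations "
        "(version, description, installed_on, success, checksum, execution_time) "
        f"VALUES ({version}, '{desc}', NOW(), true, decode('{checksum_hex}', 'hex'), 0);"
    )

# Example (path is hypothetical):
# print(register_migration_sql(13, "install reports",
#                              sqlx_checksum_hex("server/migrations/013_install_reports.sql")))
```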
### Lesson
**Always deploy to `/opt/gururmm/gururmm-server`**: that is the path in the systemd `ExecStart`. `/usr/local/bin/gururmm-server` is a stale copy from early setup and is not used. This should be added to the anti-patterns in CONTEXT.md / DESIGN.md.

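To catch a wrong-path deployment before it causes trouble again, you can compare the running unit's `ExecStart` binary with the fresh build. The following is a hypothetical sketch. It reads the unit via `systemctl cat gururmm-server` and takes the last non-empty `ExecStart=` line, so that later drop-ins win. It then strips systemd's special prefix characters (`-`, `@`, `:`, `+`, `!`) and compares SHA-256 digests. `service_runs_latest_build()` returning `False` right after a rebuild means the new binary was never copied into place.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Optional

BUILT = Path("/home/guru/gururmm/server/target/release/gururmm-server")

def exec_start_binary(unit_text: str) -> Optional[str]:
    """Executable from the last non-empty ExecStart= line of `systemctl cat` output."""
    path = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("ExecStart=") and line != "ExecStart=":
            command = line[len("ExecStart="):].lstrip("-@:+!")  # systemd prefix chars
            path = command.split()[0]
    return path

def same_file_contents(a: Path, b: Path) -> bool:
    """True if both files have identical SHA-256 digests."""
    return hashlib.sha256(a.read_bytes()).digest() == hashlib.sha256(b.read_bytes()).digest()

def service_runs_latest_build(service: str = "gururmm-server", built: Path = BUILT) -> bool:
    """Compare the binary systemd actually launches with the freshly built one."""
    unit = subprocess.run(["systemctl", "cat", service],
                          capture_output=True, text=True, check=True).stdout
    running = exec_start_binary(unit)
    if running is None:
        raise RuntimeError(f"no ExecStart= found for {service}")
    return same_file_contents(Path(running), built)
```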
---
## Pending / Next Tasks
From previous session (still pending):
- [ ] Test MSI installer on BirthBiologic server — install via `https://rmm.azcomputerguru.com/install/BRIGHT-PEAK-5980` or MSI from dashboard
- [ ] Consent `tenant-admin` and `user-manager` apps in BirthBiologic tenant (only `investigator` consented so far)
- [ ] BirthBiologic Datto → SharePoint migration script (PowerShell, tenant-admin Graph API, app-only auth, reads Datto Workplace local file server, uploads to SharePoint via Sites.ReadWrite.All)
- [ ] mvaninc CA policy — create policy requiring MFA for all sign-ins (Mike to do in portal, not scriptable)
- [ ] Legacy build deployment — still needs first trigger via webhook push to produce legacy binaries
---
## Infrastructure
| Component | Location | Notes |
|-----------|----------|-------|
| GuruRMM server | guru@172.16.3.30 | `gururmm-server` service |
| Pluto build VM | Administrator@172.16.3.36 | Windows MSVC + WiX |
| Downloads dir | /var/www/gururmm/downloads/ | binaries, MSIs |
| Build log | /var/log/gururmm-build.log | |
| Vault | D:/vault/ | SOPS AGE-encrypted |
---
## Credentials
- **PostgreSQL (gururmm):** `gururmm` / `43617ebf7eb242e814ca9988cc4df5ad` @ 172.16.3.30:5432/gururmm
- **Build server SSH:** guru@172.16.3.30
- **Pluto SSH:** Administrator@172.16.3.36
- **Webhook secret:** `gururmm-build-secret`
- **Gitea internal API:** http://172.16.3.20:3000
- **BirthBiologic RMM site:** api_key `grmm_1ZB1qV9Q61b9Noq8BIaZGwLNjZMfF49i` (also in vault)
---
## Update: 21:30 UTC — Cleanup EXE, Debug Agent, BB-SERVER MSI Troubleshooting
### Context
Continuing from the previous compacted conversation. All work in this update is in the GuruRMM project (gururmm repo on Jupiter, local copy at `D:\claudetools\projects\msp-tools\guru-rmm`).

---
### 1. Cleanup EXE Deployment
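Resumed the EXE transfer to Jupiter. Method: base64-encode the EXE on Pluto via an RMM agent command, capture the output, decode it locally, then SCP it to Jupiter. (`gururmm-cleanup.exe` itself has not been built yet; see the note at the end of this section. The file transferred here was therefore the debug agent build.)

**Pluto agent ID:** `5316f56f-a1b3-4ac5-97ac-71ddf6a74d2e`

**JWT generation (Pluto admin user):**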
```python
import json, base64, hmac, hashlib, time
secret_bytes = 'ZNzGxghru2XUdBVlaf2G2L1YUBVcl5xH0lr/Gpf/QmE='.encode('utf-8')
# User sub: 490e2d0f-067d-4130-98fd-83f06ed0b932 (admin@azcomputerguru.com)
```
**SCP from Pluto to Jupiter failed:** the agent runs as SYSTEM, and the SYSTEM account has no SSH private key in `C:\Windows\System32\config\systemprofile\.ssh\`. Fell back to the base64-through-agent approach.

**Base64 command sent to Pluto:**
```powershell
[Convert]::ToBase64String([IO.File]::ReadAllBytes('C:/gururmm/agent/target/debug-agent/release/gururmm-agent.exe'))
```
File size: 3.8 MB (3,948,544 bytes). Base64 length: 5,264,728 characters.

**Decode locally and SCP to Jupiter:**
```bash
py -c "import base64; ..." # decode to D:/tmp/gururmm-agent-debug.exe
scp D:/tmp/gururmm-agent-debug.exe guru@172.16.3.30:/tmp/gururmm-agent-debug.exe
ssh guru@172.16.3.30 'sudo cp /tmp/gururmm-agent-debug.exe /var/www/gururmm/downloads/gururmm-agent-debug.exe'
```
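The size figures above are consistent with each other. Standard base64 emits 4 characters per 3 input bytes, rounded up, so 3,948,544 bytes give 4 × ⌈3,948,544 / 3⌉ = 4 × 1,316,182 = 5,264,728 characters. Below is a sketch of the local decode step that uses this relationship as a sanity check. The function names are hypothetical. The sketch assumes that the captured agent output is the bare base64 string, possibly with trailing whitespace. A typical call would be `decode_agent_output(captured, Path("D:/tmp/gururmm-agent-debug.exe"), 3_948_544)`.

```python
import base64
from pathlib import Path

def expected_b64_len(n_bytes: int) -> int:
    """Length of padded base64 (no line breaks) for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)

def decode_agent_output(b64_text: str, out_path: Path, expected_size: int) -> None:
    """Decode base64 captured from the agent's stdout and verify the byte count."""
    compact = "".join(b64_text.split())  # drop trailing newlines / stray whitespace
    if len(compact) != expected_b64_len(expected_size):
        # A short capture usually means the agent output was truncated.
        raise ValueError(f"unexpected capture length: {len(compact)} base64 chars")
    data = base64.b64decode(compact, validate=True)
    if len(data) != expected_size:
        raise ValueError(f"decoded {len(data)} bytes, expected {expected_size}")
    out_path.write_bytes(data)
```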
**Result:** `/var/www/gururmm/downloads/gururmm-agent-debug.exe` deployed (3.8 MB).

`http://172.16.3.30:3001/install/debug/download` → HTTP 200 (3,948,544 bytes). ✓

**Note:** Cloudflare challenges non-browser requests to `https://rmm.azcomputerguru.com/install/debug/download`. This is expected; browser downloads work fine.

**Note on cleanup.exe:** Not yet built. `gururmm-cleanup.exe` will be produced automatically by `build-agents.sh` on the next triggered build. Until that build completes, the server route `/install/cleanup/download/exe` returns 503.

---
### 2. Pluto's SSH Public Key (for future reference)
Pluto's SYSTEM account does NOT have `id_ed25519`. The pubkey retrieved earlier (`system@PLUTO`) was either wrong or came from a different context.

**Pluto's SYSTEM .ssh dir** contains only `known_hosts` (94 bytes).

**Jupiter's authorized_keys** was updated to add Pluto's pubkey:
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFWaMV0U3WZG3kuts7mqVaF9SN0TsKqPAC37GdVGbq0Y system@PLUTO
```
(Added to `/home/guru/.ssh/authorized_keys`. This may be moot, since SYSTEM has no private key.)

---
### 3. Debug Agent Feature — build-agents.sh and Server Routes
**Already committed in prior session:**
- `build-agents.sh`: added `--features debug-agent --target-dir target\debug-agent` to Pluto SSH build command + SCP + deploy block
- `agent/Cargo.toml`: added `debug-agent = []` feature
- `agent/src/service.rs`: cfg-gated `SERVICE_NAME`, `SERVICE_DISPLAY_NAME`, `INSTALL_DIR`, `CONFIG_DIR` constants
- `agent/src/registry.rs`: `REGISTRY_KEY` = `SOFTWARE\GuruRMM-Debug` when feature enabled
- `agent/src/device_id.rs`: stores device ID in `C:\ProgramData\GuruRMM-Debug\.device-id`
- `agent/src/updater/mod.rs`: `detect_binary_path()` and `detect_config_dir()` use debug paths
- `server/src/main.rs` on Jupiter: routes for `/install/debug/download` and cleanup endpoints
- `server/src/api/install.rs` on Jupiter: `download_debug_exe()` handler
---
### 4. GuruRMM Debug Site Created
Created a new site for the debug agent to enroll into:

| Field | Value |
|-------|-------|
| Site ID | `d6b8233a-6cc1-4a44-888d-01ee49123fba` |
| Site name | GuruRMM Debug |
| Site code | `BOLD-HARBOR-1855` |
| API key | `grmm_mm2DnrF6kt9Ml8AyJCuHJJHnBTyXHX_4` |
| Client | AZ Computer Guru (`417420f4-c3f4-482a-acd4-d6f63c8cddde`) |

**Issue identified:** The debug agent currently prompts for a site code on first run because:
1. No config file exists
2. No site code embedded in the binary
**Fix needed (not yet done):** Either hardcode the debug site API key as a `cfg`-gated constant behind the `debug-agent` feature, or embed it at build time. Either way, the debug EXE could then auto-install silently without prompting.

**Current workaround:** User entered `BRIGHT-PEAK-5980` (BirthBiologic) when prompted.

---
### 5. BB-SERVER Connected
Debug agent installed on BB-SERVER (BirthBiologic's server); it is now online in the RMM.

| Field | Value |
|-------|-------|
| Agent ID | `6c02baa7-0f1c-4990-b466-c9ab9eaefd3b` |
| Hostname | BB-SERVER |
| OS | Windows Server 2016 (build 14393) |
| Agent version | 0.6.2 |
| Site | BirthBiologic Main Office (`3b20ef97-c764-4ef8-9154-79c3d5b486f8`) |
| Status | online |
---
### 6. MSI Installer Troubleshooting via BB-SERVER
Using BB-SERVER's debug agent to test the MSI installer and capture verbose logs.

**Problem 1: Cloudflare blocks non-browser downloads.**
- `Invoke-WebRequest` without a browser UA gets Cloudflare's JS challenge page instead of the MSI
- Fix: pass `-UserAgent 'Mozilla/5.0 ...'` to Invoke-WebRequest

**Problem 2: msiexec doesn't accept forward slashes.**
- Error 2203 "Cannot open database file" with C:/grmm.msi
- Fix: use `C:\\grmm.msi` (JSON-escaped backslash)
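
**Working command format** (backslashes are shown JSON-escaped). PowerShell does not wait for GUI-subsystem programs such as `msiexec` to exit, so piping it to `Out-Null` is needed; otherwise the log is read before the install finishes.
```powershell
$ua = 'Mozilla/5.0 ...';
Invoke-WebRequest -Uri '...' -OutFile C:\\grmm.msi -UserAgent $ua -UseBasicParsing;
msiexec /i C:\\grmm.msi /quiet /l*v C:\\grmm.log | Out-Null;
Get-Content C:\\grmm.log -Tail 100
```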
**Command in flight** (cmd ID `fa68659e-3395-48a2-adee-9624dfd40cd7`) — still running as of session save. Check with:
```bash
curl -s "http://172.16.3.30:3001/api/commands/fa68659e-3395-48a2-adee-9624dfd40cd7" \
-H "Authorization: Bearer <JWT>"
```
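Rather than re-running the `curl` by hand, you can poll the status check. The sketch below uses only the standard library. The response schema is not recorded in this log, so the completion test is passed in as a predicate. The `status` field and its values in the example comment are guesses, not the documented schema.

```python
import json
import time
import urllib.request

API = "http://172.16.3.30:3001/api"

def get_command(cmd_id: str, token: str) -> dict:
    """GET /api/commands/:id with a bearer token, returning the decoded JSON."""
    req = urllib.request.Request(f"{API}/commands/{cmd_id}",
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def wait_for_command(fetch, is_done, interval: float = 10.0, timeout: float = 1800.0):
    """Call fetch() until is_done(result) is true; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("command still not finished")
        time.sleep(interval)

# Example; "status" and its values are guesses, not the documented schema:
# wait_for_command(lambda: get_command("fa68659e-3395-48a2-adee-9624dfd40cd7", token),
#                  lambda r: r.get("status") in {"completed", "failed"})
```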
---
### 7. RMM API — Correct Endpoints
| Operation | Endpoint |
|-----------|----------|
| Send command | `POST http://172.16.3.30:3001/api/agents/:id/command` |
| Get command status | `GET http://172.16.3.30:3001/api/commands/:id` |
| List agents | `GET http://172.16.3.30:3001/api/agents` |
| Get site install info | `GET http://172.16.3.30:3001/api/sites/:id/install-info` |
| Download site MSI (auth) | `GET http://172.16.3.30:3001/api/sites/:id/installer` |
| Download site MSI (public) | `GET https://rmm.azcomputerguru.com/install/BRIGHT-PEAK-5980/download/msi` |

**JWT generation for API calls:**
- Secret (raw bytes): `ZNzGxghru2XUdBVlaf2G2L1YUBVcl5xH0lr/Gpf/QmE=`
- Admin user sub: `490e2d0f-067d-4130-98fd-83f06ed0b932` (admin@azcomputerguru.com)
- Claims: `sub`, `role: "admin"`, `orgs: []`, `exp: now+3600`, `iat: now`
- Algorithm: HS256, key = secret string encoded as UTF-8 bytes (NOT base64-decoded)
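
The recipe above translates directly into a standard-library HS256 signer. This is a sketch, not the server's own code. It assumes the server accepts a standard `{"alg":"HS256","typ":"JWT"}` header, and it reads the secret from a hypothetical `GURURMM_JWT_SECRET` environment variable instead of hardcoding it. As noted above, the HMAC key is the secret string's UTF-8 bytes, not its base64-decoded value.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import Optional

def b64url(data: bytes) -> str:
    """Unpadded base64url, as the JWT compact format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_admin_jwt(secret: str, sub: str, ttl: int = 3600, now: Optional[int] = None) -> str:
    """Sign an HS256 JWT carrying the claims listed above."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": sub, "role": "admin", "orgs": [], "iat": now, "exp": now + ttl}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims))
    # Key is the secret string's UTF-8 bytes, NOT its base64-decoded value.
    sig = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

def token_from_env(sub: str) -> str:
    """Use a hypothetical GURURMM_JWT_SECRET env var rather than a hardcoded secret."""
    return make_admin_jwt(os.environ["GURURMM_JWT_SECRET"], sub)
```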

**Known user IDs:**
```
490e2d0f-067d-4130-98fd-83f06ed0b932 admin@azcomputerguru.com (admin)
4d754f36-0763-4f35-9aa2-0b98bbcdb309 claude-api@azcomputerguru.com (admin)
294c1242-68ac-42e7-85b0-564c8b155dba howard@azcomputerguru.com (admin)
```
---
### 8. JSON Escaping Issue with Agent Commands
The RMM server's serde_json parser is strict about JSON escape sequences. Commands with embedded double quotes (`\"`) sometimes failed with "invalid escape" errors when sent via `curl --data-binary @file`. Since `\"` itself is valid JSON, the failures most likely came from some other backslash sequence that ended up in the file (see the rules below).

**Working approach:** Wrap the whole JSON body in single quotes for `curl -d`, and write each single quote inside it (for PowerShell string literals) as `'"'"'`. This avoids the intermediate file and its escaping problems entirely.

**Key rules:**
- Never use `\g`, `\L`, `\D`, etc. — only valid JSON escapes: `\\`, `\"`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, `\uXXXX`
- Forward slashes are fine in JSON strings
- Backslashes in PowerShell paths need `\\` in JSON (gives `\` in the actual string)
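
Both rules can be delegated to the Python standard library. `json.dumps` only ever emits valid JSON escapes, and `shlex.quote` (used by `shlex.join`) produces exactly the `'"'"'` single-quote wrapping described above. This is a sketch. The `command` key in the body is a placeholder, since the exact request schema for `POST /api/agents/:id/command` is not recorded here.

```python
import json
import shlex

API = "http://172.16.3.30:3001/api"

def command_body(script: str) -> str:
    """JSON body for one agent command; the "command" key is a placeholder."""
    return json.dumps({"command": script})

def curl_line(agent_id: str, token: str, script: str) -> str:
    """POSIX-shell-safe curl invocation; shlex.join single-quotes each argument as needed."""
    return shlex.join([
        "curl", "-s", f"{API}/agents/{agent_id}/command",
        "-H", "Content-Type: application/json",
        "-H", f"Authorization: Bearer {token}",
        "-d", command_body(script),
    ])
```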
---
### Pending Tasks
| Task | Status | Notes |
|------|--------|-------|
| Cleanup EXE on Pluto | Pending | Needs first build trigger. Route ready, will 503 until built. |
| Debug agent auto-install | Not done | Needs hardcoded debug site key in `debug-agent` feature |
| MSI 2762 test on BB-SERVER | In progress | Command running, awaiting result |
| BirthBiologic — MSI verified working | Pending | Testing now |
| BirthBiologic — consent tenant-admin/user-manager | Pending | |
| BirthBiologic — Datto→SharePoint migration script | Pending | |
| mvaninc CA policy (MFA) | Pending | Mike to do manually in portal |
| Remote uninstall feature | Pending | New WS message + server DELETE endpoint + dashboard button |
---
### Infrastructure Additions This Update
| Item | Value |
|------|-------|
| Debug site | BOLD-HARBOR-1855, api_key `grmm_mm2DnrF6kt9Ml8AyJCuHJJHnBTyXHX_4` |
| BB-SERVER agent | ID `6c02baa7-...`, online, BirthBiologic Main Office |
| Debug EXE | `/var/www/gururmm/downloads/gururmm-agent-debug.exe` (3.8 MB) |