claudetools/session-logs/2026-05-15-session.md
Mike Swanson 31088cb8de sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-15 15:23:02
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-15 15:23:02
2026-05-15 15:23:05 -07:00

# Session Log — 2026-05-15
---
## Update: 06:21 UTC — Session log housekeeping, submodule sync fix
### Session Summary
After completing the main RMM work (fleet update, dead write-half fix), the session turned to housekeeping: establishing correct session log placement for GuruRMM work and fixing the submodule to stay current on sync.
Session log placement was corrected end-to-end. The convention had been ambiguous — session logs were being committed to the gururmm submodule repo, then the claudetools parent repo updated the submodule pointer, creating unnecessary double commits and coupling session notes to a code repo. The rule was established: GuruRMM session logs belong in claudetools `session-logs/` root, not in the gururmm repo. CLAUDE.md and FILE_PLACEMENT_GUIDE.md were updated with explicit rules. Today's session log (written earlier in the session) was moved from the gururmm repo to the correct location in claudetools.
All historical session logs in the gururmm repo were then audited and migrated. Nine files were found: four were unique to gururmm and copied to claudetools, four had duplicates in claudetools where the gururmm version was more complete (replaced), and one had a longer claudetools version (kept). All nine were then deleted from gururmm (commit `3042975` on the server's local clone, pushed to Gitea as `02d10b7`). The gururmm repo is now session-log-free.
The sync.sh script was updated in two passes to properly maintain the submodule. First pass added a Phase 1a that ran `git submodule update --remote` — this fetched the latest gururmm commits but left the submodule in detached HEAD state. Second pass replaced this with a `set +e`-guarded block that runs `git fetch origin`, `git checkout main`, and `git merge --ff-only origin/main` inside each submodule, ensuring the working tree is on the `main` branch and fast-forwarded. `.gitmodules` was also updated to declare `branch = main` so git knows which remote branch to track with `--remote`.
### Key Decisions
- **Session logs in claudetools, not gururmm**: gururmm is a code repo; mixing session notes into it creates noise in git history and couples operational logs to a repo that developers and tools may clone independently.
- **Replace claudetools with longer gururmm version**: where the same date existed in both repos, line count was used as a proxy for completeness (more lines = session was appended to over time). The one case where claudetools was longer (04-20), claudetools was kept.
- **`set +e` / `set -e` wrapper for submodule ops**: under `set -e`, a git command in the submodule block exited with code 128 and killed the script, even though the only visible output was an informational status message ("Your branch is behind"). The exact cause was not pinned down; temporarily disabling errexit for the submodule section keeps the rest of the sync running.
- **`git merge --ff-only` rather than `git pull --rebase`**: submodule should never have local commits that need rebasing; if it does, fast-forward failing is the right signal to investigate rather than silently rebase.
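
The line-count rule used for the duplicate logs can be written down as a small function. This is only an illustrative sketch: the names `Keep` and `pick_more_complete` are hypothetical, and keeping the claudetools copy on a tie is an assumption, since the log does not say what happened (or would happen) when both copies had the same length.

```rust
/// Which copy of a duplicated session log survives the migration.
#[derive(Debug, PartialEq)]
enum Keep {
    Claudetools,
    Gururmm,
}

/// Use line count as a proxy for completeness: the longer copy is assumed
/// to be the one that was appended to over time.
/// Assumption: on a tie the copy already in claudetools is kept.
fn pick_more_complete(claudetools_copy: &str, gururmm_copy: &str) -> Keep {
    let claudetools_lines = claudetools_copy.lines().count();
    let gururmm_lines = gururmm_copy.lines().count();
    if gururmm_lines > claudetools_lines {
        Keep::Gururmm
    } else {
        Keep::Claudetools
    }
}
```

Line count is a coarse signal; it says nothing about whether the shorter file contains text the longer one lacks, so a diff of the two copies remains the safer check when the counts are close.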
### Problems Encountered
- **`set -e` + `git checkout main` = exit 128**: "Your branch is behind 'origin/main'" is stdout output from a successful checkout, but something in the submodule context caused exit code 128. Resolution: wrap the entire submodule block in `set +e` / `set -e`.
- **`git submodule update --remote` leaves detached HEAD**: `--remote` checks out the target commit directly rather than staying on a branch. Resolution: follow with explicit `git checkout main` and `git merge --ff-only` inside the submodule.
- **Binary deployed to wrong path on first try**: copied new server binary to `/usr/local/bin/` but systemd unit points to `/opt/gururmm/`. Resolution: stop service, copy to correct path, start.
- **`cp: Text file busy`**: attempted to copy new binary while service was running. Resolution: stop first, then copy.
### Configuration Changes
| File | Change |
|------|--------|
| `.claude/CLAUDE.md` | Added explicit GuruRMM session log placement rule (root session-logs/, not submodule) |
| `.claude/FILE_PLACEMENT_GUIDE.md` | Added GuruRMM row to quick reference table |
| `.claude/scripts/sync.sh` | Added Phase 1a: submodule fetch + checkout main + ff-merge |
| `.gitmodules` | Added `branch = main` to gururmm submodule entry |
| `session-logs/2025-12-15-session.md` | Migrated from gururmm (created) |
| `session-logs/2025-12-20-session.md` | Migrated from gururmm (created) |
| `session-logs/2026-04-19-session.md` | Replaced with longer gururmm version |
| `session-logs/2026-04-21-session.md` | Replaced with longer gururmm version |
| `session-logs/2026-05-12-session.md` | Replaced with longer gururmm version |
| `session-logs/2026-05-12-guru-rmm-macos-agent-phase1.md` | Migrated from gururmm (created) |
| `session-logs/2026-05-13-session.md` | Replaced with longer gururmm version |
| `session-logs/2026-05-14-session.md` | Migrated from gururmm (created) |
### Reference Information
- gururmm session log removal commit: `3042975` (server local), pushed as `02d10b7` (Gitea)
- sync.sh submodule fix commits: `415476e` (first pass, --remote), `b6c981d` (second pass, branch-aware)
- claudetools migration commit: `39bc5f1` (session log migration)
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session Span:** ~03:30 UTC to 06:03 UTC (continued from prior context window)
---
## Session Summary
This session was a continuation of a prior context window that had implemented 0.6.19 agent features (extended temperature sensors, wts.rs Windows fixes, watchdog always-on policy changes). The immediate work on entry was completing the 0.6.19 fleet rollout: three agents — IMC1 (fa99e913), GND-SERVER (cd086074), and CS-SERVER (6766e973) — were stuck on 0.6.18 with dead WebSocket write halves. The server's ConnectedAgents in-memory map held stale entries: read side (heartbeats) still worked, but write side (commands) was dead, so update dispatch failed with "Agent is offline" even though DB showed them online.
The first approach was setting those agents offline in the DB to force a reconnect. This failed because the agents were still heartbeating (the server's in-memory read task was alive), so the DB immediately got updated back to online on the next heartbeat. A server restart was needed to clear the in-memory map. After restart, all three agents reconnected with fresh connections within seconds and immediately accepted the 0.6.19 update. All completed successfully within 3 seconds of reconnection.
During log inspection, two server bugs were identified and fixed. First: the `TemperatureSensor` struct in `server/src/ws/mod.rs` used field names `temp_celsius` and `critical_celsius`, but the agent's `SensorReading` struct serializes to `value`, `sensor_type`, `unit`, and `critical_value`. Every metrics message that included temperature readings therefore failed deserialization (`missing field 'temp_celsius'`); the error was logged, but the temperature data was silently dropped. Second: the WebSocket receive loop did not monitor the send task. When a WebSocket write failed (killing the send task), the receive loop continued running indefinitely, keeping the agent in ConnectedAgents with a dead write half, and every subsequent command dispatch attempt failed. The fix uses `tokio::select!` to watch both incoming messages and the send task: when the send task exits, the receive loop breaks, cleanup removes the agent from ConnectedAgents, and the agent reconnects fresh.
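
The idea behind the dead write-half fix (stop receiving as soon as the sender has died) can be illustrated without tokio. The sketch below is a std-only thread analogue, not the server code: `receive_loop`, `LoopExit`, and the 20 ms poll interval are hypothetical, and a `JoinHandle` plus an `mpsc` channel stand in for the send task and the WebSocket read half.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::thread::JoinHandle;
use std::time::Duration;

/// Why the receive loop stopped.
#[derive(Debug, PartialEq)]
enum LoopExit {
    SendTaskExited,
    PeerClosed,
}

/// Handle incoming messages until the sender thread has finished or the
/// peer hangs up. Returns how many messages were handled and the reason
/// for stopping.
fn receive_loop(incoming: &Receiver<String>, send_task: &JoinHandle<()>) -> (usize, LoopExit) {
    let mut handled = 0;
    loop {
        // Checked first on every pass, loosely mirroring `biased;` with the
        // send task listed as the first branch.
        if send_task.is_finished() {
            return (handled, LoopExit::SendTaskExited);
        }
        match incoming.recv_timeout(Duration::from_millis(20)) {
            Ok(_message) => handled += 1,
            // Nothing arrived in this slice; go back and re-check the sender.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return (handled, LoopExit::PeerClosed),
        }
    }
}
```

In this model the caller would remove the agent from its connection map whenever the loop returns, whichever reason is reported. The polling interval is an artifact of using blocking threads; an async `select!` wakes on whichever branch completes and needs no such interval.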
Both fixes were implemented via Python patch script on the server source, compiled with cargo (4m 6s build), and deployed by stopping the service, replacing `/opt/gururmm/gururmm-server`, and restarting. The fixes were committed and pushed to Gitea as commit 56283dd. The patched server ran cleanly with no `temp_celsius` errors and no failed command dispatches in the new process's logs.
At session end: 15 online agents on 0.6.19, AD2 on 0.6.1 (offline since April 20, requires physical/VPN access), and ~30 offline agents on older versions that will auto-update on next reconnect.
---
## Key Decisions
- **Server restart over DB-offline trick**: Setting agents offline in the DB does not disconnect them because the server's in-memory receive loop is still running and updates `last_seen` on every heartbeat, racing with any DB status change. Only a server restart clears the in-memory ConnectedAgents map. Accepted the brief (~10s) outage of all agents.
- **`biased` ordering in select! (send_task first)**: Could have put incoming messages first, but `biased;` makes `select!` poll its branches in declaration order, so listing send_task first means a finished send task always takes priority over a ready incoming message and the dead write half is handled before any further message is processed. Incoming messages still get processed every iteration as long as the send task is alive.
- **TemperatureSensor renamed to match agent**: Rather than aliasing with `#[serde(rename)]`, fully renamed the struct fields to match the agent's canonical names (`value`, `sensor_type`, `unit`, `critical_value`). Any previously stored JSON in the `temperatures` column used wrong field names and was silently unreadable, so there's no backward-compat cost to renaming.
- **Edit directly on server vs. local + push**: Local repo is a stale copy of gururmm. Edited the live source on `/home/guru/gururmm/`, built there, deployed, then committed and pushed. Faster than any local→Gitea→pull flow, and the single file edit was low-risk.
- **Deployed first, then pushed to Gitea**: Committed after confirming the fix worked in production. Appropriate for a targeted bugfix with no DB migrations.
---
## Problems Encountered
- **`cp: cannot create regular file '/opt/gururmm/gururmm-server': Text file busy`**: Tried to copy the new binary while the service was running. Resolution: stop service first (`systemctl stop`), then copy, then start. Standard Linux "can't replace a running executable" behavior.
- **Binary deployed to wrong path first**: Copied to `/usr/local/bin/gururmm-server` but systemd unit's `ExecStart` points to `/opt/gururmm/gururmm-server`. The service restarted but ran the old binary. Identified by checking `systemctl show gururmm-server --property=ExecStart`. Resolution: stop/copy to correct path/start.
- **git push rejected (non-fast-forward)**: Remote had commits not in local. Resolution: `git pull --rebase` then `git push`.
- **psql peer auth failed**: `psql -U gururmm gururmm` uses peer auth (Unix socket), requires matching OS user. Used `sudo -u postgres psql -d gururmm` to execute queries as postgres superuser.
- **`temp_celsius` errors in patched server logs**: After deploying the patch (PID 946066), still saw `temp_celsius` errors in `journalctl`. Turned out those error lines had PID 943615 or 945573 (old server instances) — the patched server produced none. Confirmed by filtering with `_PID=946066`.
---
## Configuration Changes
### Server Source (on `/home/guru/gururmm/`)
**`server/src/ws/mod.rs`** — Two changes:
1. `TemperatureSensor` struct renamed to match agent:
   - `temp_celsius: f32` → `value: f32`
   - `critical_celsius: Option<f32>` → `critical_value: Option<f32>`
   - Added: `sensor_type: String`, `unit: String`
2. Dead write-half detection:
   - `let send_task` → `let mut send_task`
   - Receive loop changed from `while let Some(msg_result) = receiver.next().await` to `loop { tokio::select! { biased; _ = &mut send_task => { warn!(...); break; } msg_result = receiver.next() => { ... } } }`
### Binary Deployed
- `/opt/gururmm/gururmm-server` — replaced with build from 2026-05-15 03:47 UTC
### Commits
- Gururmm repo: `56283dd` — "fix: TemperatureSensor schema mismatch and dead write-half detection"
---
## Credentials & Secrets
None new. API credentials used:
- GuruRMM API login: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` (from vault, used to get JWT for manual update trigger attempts)
---
## Infrastructure & Servers
- **GuruRMM Server**: `172.16.3.30:3001` — Rust/Axum, systemd unit `gururmm-server`
- **Binary path**: `/opt/gururmm/gururmm-server`
- **Source path**: `/home/guru/gururmm/` (git repo, remote at `172.16.3.20:azcomputerguru/gururmm.git`)
- **Gitea**: `http://172.16.3.20:3000` (internal, not git.azcomputerguru.com which is behind Cloudflare)
- **DB**: PostgreSQL on `172.16.3.30`, database `gururmm`, accessed via `sudo -u postgres psql -d gururmm`
---
## Commands & Outputs
```bash
# Set agents offline to force reconnect (didn't work alone, needed restart too)
sudo -u postgres psql -d gururmm -c \
"UPDATE agents SET status='offline' WHERE hostname IN ('IMC1','GND-SERVER','CS-SERVER') RETURNING hostname, status, agent_version;"
# Server restart (clears in-memory ConnectedAgents map)
sudo systemctl restart gururmm-server
# Build patched server (4m 6s)
cd /home/guru/gururmm/server && /home/guru/.cargo/bin/cargo build --release
# Deploy (stop-first pattern to avoid "Text file busy")
sudo systemctl stop gururmm-server
sudo cp /home/guru/gururmm/server/target/release/gururmm-server /opt/gururmm/gururmm-server
sudo systemctl start gururmm-server
# Commit and push fixes
cd /home/guru/gururmm
git add server/src/ws/mod.rs
git commit -m 'fix: TemperatureSensor schema mismatch and dead write-half detection'
git pull --rebase && git push
# Result: 56283dd pushed to 172.16.3.20:azcomputerguru/gururmm.git
```
Key log evidence of dead write half (before fix):
```
INFO gururmm_server::ws: Dispatching update to connected agent fa99e913... on heartbeat: 0.6.18 -> 0.6.19
ERROR gururmm_server::ws: Failed to send heartbeat update command to agent fa99e913... — rolling back pending record
```
After restart + update:
```
INFO gururmm_server::ws: Received update result from agent fa99e913...: update_id=..., status=starting
INFO gururmm_server::ws: Agent fa99e913... reconnected after update: 0.6.18 -> 0.6.19
```
---
## Pending / Incomplete Tasks
- **AD2 (0.6.1, offline since 2026-04-20)**: Requires physical or VPN access. Cannot be updated remotely. Low priority but should be investigated when accessible.
- **BB-SERVER enrollment loop**: Repeatedly hitting `duplicate key value violates unique constraint "idx_agents_site_device"` on every WS connect attempt. Not investigated. The agent is already enrolled (row exists) but its auth flow is re-attempting first-time enrollment. Likely needs a code fix in the site-based auth logic to handle "already enrolled, just reconnecting" more gracefully.
- **Offline agents on older versions** (will auto-update on reconnect):
  - 0.6.18: LAPTOP-8P7HDSEI, MSI, Maras-HP-Laptop
  - 0.6.3: ~14 machines (ACCT2-PC, ANN-PC, ASSISTMAN-PC, etc.; Stamback/Safesite fleet)
  - 0.6.2: NurseAssist, PST-SURFACE, StambackLaptopNew
  - 0.6.1: Mikes-MacBook-Air.local (offline)
  - 0.5.1: SL-SERVER x2 (offline, possibly abandoned)
- **`unsupported Unicode escape sequence` on hardware inventory for IMC1**: Logged at `WARN` level after 0.6.19 update. The agent's hardware inventory JSON contains a Unicode escape sequence that PostgreSQL rejects. Likely a field value (serial number, software name, etc.) with a problematic character. Not investigated.
- **Dead write half root cause not fully diagnosed**: We know the pattern (send_task dies, receive loop keeps running), and the fix prevents it from being persistent. But what originally causes the send_task to die (network issue? buffer full? specific message type?) is not determined. The `select!` fix means it self-heals now (agent reconnects), so this is lower priority.
- **Policy wiring plan** (`ticklish-questing-stallman.md`): Full end-to-end policy propagation still pending. Server sends ConfigUpdate on connect (wired), but agent-side handling is not complete. Deferred.
- **Safesite Glendale MSI machine**: Waiting for user to be away to push DisplayLink driver update.
- **LHM bundling in MSI**: LibreHardwareMonitor files not in build pipeline; self-healing download not implemented.
- **Build lock**: No flock on `build-agents.sh` to prevent concurrent invocations.
---
## Reference Information
- **Gururmm Gitea repo**: `http://172.16.3.20:3000/azcomputerguru/gururmm`
- **Fix commit**: `56283dd` — fix: TemperatureSensor schema mismatch and dead write-half detection
- **Server source**: `/home/guru/gururmm/server/src/ws/mod.rs`
- **Agent metrics struct**: `agent/src/metrics/mod.rs:17` (`SensorReading { label, value, sensor_type, unit, critical_value }`)
- **Server TemperatureSensor struct**: `server/src/ws/mod.rs:316` — now matches agent
- **Dead write half fix**: `server/src/ws/mod.rs:679` (`let mut send_task`); receive loop at ~691 uses `tokio::select!`
- **Plan file**: `C:\Users\guru\.claude\plans\ticklish-questing-stallman.md` (policy wiring, deferred)
- **Fleet status as of session end**:
  - Online on 0.6.19: CS-SERVER, DESKTOP-0O8A1RL, DESKTOP-BTR2AM3, DESKTOP-DLTAGOI, DESKTOP-H6QHRR7, DESKTOP-KQSL232, DF-GAGETRAK, GND-SERVER, IMC1, LAPTOP-DRQ5L558, LAPTOP-E0STJJE8, MAINTENANCE-PC, MDIRECTOR-PC, NURSESTATION-PC, gururmm (15 agents)
  - On 0.6.1: AD2 (offline since 2026-04-20, unreachable)
---
## Update: 07:50 PT — Network discovery: hostname lookup, subnet auto-detection, fleet update to 0.6.20
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session Span:** ~07:00 PT to 07:50 PT (continued from prior context window)
---
## Session Summary
This session picked up from a prior context window that had implemented the network discovery hostname lookup and subnet auto-detection features. All code changes across 8 files had been applied but a compile error was blocking the build: `format!({}/{}, network, prefix)` on line 775 of `agent/src/metrics/mod.rs` was missing quotes around the format string. Fixed with a single `sed` line-number substitution.
Agent and server release builds were launched in parallel. Agent (0.6.19) compiled clean. Server failed with a second missing-quotes error in the new `get_suggested_subnets` handler: `iface.get(ipv4_subnets)` instead of `iface.get("ipv4_subnets")` at line 301 of `server/src/api/discovery.rs`. Fixed and server rebuilt successfully. Dashboard TypeScript build then failed with multiple missing string literals: `.join(, )` instead of `.join(", ")` in two places, bare `manual` instead of `"manual"` in two places (one the earlier Python fix missed), `api.get<string[]>()` with no URL argument, and `setIpRanges()`/`setExclusions()` with no empty-string argument. Each required a targeted fix. The `_getSensorUnit` function in `AgentDetail.tsx` was declared but unused (pre-existing dead code that TS6133 finally flagged); it was deleted.
All three artifacts built clean after the fixes. Server binary was deployed (stop/copy/start pattern), dashboard dist was copied to `/var/www/gururmm/dashboard/`, and all changes were committed to the gururmm repo as `0c60d36`. The `latest` symlink and `gururmm-agent-linux-amd64-latest` were both pointing at 0.6.19, which meant the scanner would not dispatch updates. Version bumped to 0.6.20, rebuilt, and the binary + sha256 placed at `/var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.20`. The version bump was committed as `c97b0f3`.
At the 14:47 UTC scan (5-minute interval), the server found 50 binaries (up from 49), immediately identified agents on 0.6.19 as needing an update, dispatched to the first connected agent, and that agent reconnected on 0.6.20 within 11 seconds. Fleet rollout is proceeding automatically on heartbeat.
---
## Key Decisions
- **Single-quoted SSH heredocs do not protect backtick template literals**: Despite using `<< 'ENDSCRIPT'`, bash inside an SSH double-quoted command still executed backtick template literals in the heredoc content as command substitution. Workaround: build the TypeScript template literal string using Python's `chr(96)` to represent the backtick character, passing everything via `python3 -c '...'` with single-quoted outer shell quoting.
- **Version bump to 0.6.20 required to trigger fleet update**: The scanner only dispatches updates when the available version is strictly greater than the agent's reported version. Since the discovery feature changes (PTR lookup, subnet reporting) were built at 0.6.19, a bump to 0.6.20 was needed to push the update to the fleet. Alternative (editing the binary in-place without a version bump) would have left agents unaware of the new capabilities.
- **Correct downloads directory was `/var/www/gururmm/downloads/`, not `/opt/gururmm/updates/`**: The server's `DOWNLOADS_DIR` env var (from `/opt/gururmm/.env`) points to the web-accessible path. The `/opt/gururmm/updates/` directory is not scanned. This was discovered when the scanner continued reporting 49 binaries after placing the file in the wrong location.
- **`latest` symlink updated alongside versioned binary**: The `gururmm-agent-linux-amd64-latest` symlink is used by agent self-updaters that don't know the target version ahead of time. Updated atomically with `ln -sf` to point at 0.6.20.
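
The "strictly greater" rule explains why an identical version number never triggers a dispatch. A minimal sketch of such a check follows; `parse_version` and `needs_update` are hypothetical names, and the scanner's real comparison code is not shown in this log, so the three-part numeric format is an assumption based on the version strings that appear here.

```rust
/// Parse "0.6.20" (optionally prefixed with 'v') into a comparable tuple.
/// Returns None for anything that is not exactly three numeric parts.
fn parse_version(text: &str) -> Option<(u32, u32, u32)> {
    let mut parts = text.trim().trim_start_matches('v').split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// An update is dispatched only when the available version is strictly
/// greater than what the agent reports. Unparseable versions never update.
fn needs_update(agent_version: &str, available_version: &str) -> bool {
    match (parse_version(agent_version), parse_version(available_version)) {
        (Some(current), Some(candidate)) => candidate > current,
        _ => false,
    }
}
```

Comparing numeric tuples rather than strings matters once a component reaches two digits: as text, `"0.6.9"` sorts after `"0.6.10"`, while the tuple comparison orders them correctly.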
---
## Problems Encountered
- **`format!({}/{}, network, prefix)` compile error**: Missing double quotes around the format string in the subnet CIDR formatting line. Fixed with `sed -i '775s/...'` line-number substitution.
- **`iface.get(ipv4_subnets)` compile error in server**: Same pattern — missing quotes made Rust look for a variable named `ipv4_subnets`. Fixed with `sed -i` on the specific line.
- **Dashboard TS errors — multiple missing string literals**: Python patch scripts applied earlier in the session used heredocs that silently dropped or corrupted string content (backticks executed as commands, quotes stripped). Result: `.join(, )`, `setIpRanges()`, `setSchedule(manual)`, `api.get<string[]>()` (no URL) in the compiled TypeScript. Fixed with targeted `sed -i` and `python3 -c` with `chr(96)` for backtick characters.
- **`_getSensorUnit` TS6133 error**: Prefixing with `_` does not suppress TS6133 for function declarations (only works for parameters/variables). Resolved by deleting the unused function entirely.
- **Binary placed in wrong updates directory**: Placed initial 0.6.20 binary at `/opt/gururmm/updates/` (wrong) instead of `/var/www/gururmm/downloads/` (correct, from `.env`). Scanner continued to report 49 binaries. Found the correct path by reading `.env` and confirmed by comparing `ls` counts vs the scanner's "49 binaries" log output.
---
## Configuration Changes
### Server Source (`/home/guru/gururmm/`)
| File | Change |
|------|--------|
| `agent/Cargo.toml` | Bumped version 0.6.19 → 0.6.20 |
| `agent/src/metrics/mod.rs` | Fixed `format!({}/{}, ...)` → `format!("{}/{}", ...)` on line 775; added `use if_addrs::IfAddr`, `ipv4_subnets` field, subnet collection block |
| `agent/src/discovery/mod.rs` | Replaced stub `reverse_dns()` with working PTR implementation using `dns_lookup::lookup_addr` in `spawn_blocking` |
| `agent/Cargo.toml` | Added `if-addrs = "0.10"` and `dns-lookup = "2"` |
| `server/src/api/discovery.rs` | Added `get_suggested_subnets` handler; fixed `iface.get("ipv4_subnets")` quote |
| `server/src/api/mod.rs` | Added `.route("/agents/:id/discovery/subnets", get(discovery::get_suggested_subnets))` |
| `server/src/ws/mod.rs` | Added `#[serde(default)] pub ipv4_subnets: Vec<String>` to `NetworkInterface` struct |
| `dashboard/src/api/client.ts` | Added `getSuggestedSubnets` to `discoveryApi`; fixed missing URL in `api.get<string[]>()` |
| `dashboard/src/components/DiscoveryTab.tsx` | Two-effect pattern for subnet auto-population; fixed all missing string literals |
| `dashboard/src/pages/AgentDetail.tsx` | Deleted unused `getSensorUnit` / `_getSensorUnit` function |
### Deployed Artifacts
| Path | Change |
|------|--------|
| `/opt/gururmm/gururmm-server` | Replaced with build from 2026-05-15 14:32 UTC |
| `/var/www/gururmm/dashboard/` | Replaced with dashboard dist from 2026-05-15 14:38 UTC |
| `/var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.20` | New — 3.9 MB, sha256 `ed5ce77cd5d9e30ee9f5a73a6904e7f6667041ab9fff798e7d255a905efbf1a2` |
| `/var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.20.sha256` | New — companion checksum |
| `/var/www/gururmm/downloads/gururmm-agent-linux-amd64-latest` | Symlink updated: 0.6.19 → 0.6.20 |
---
## Credentials & Secrets
None new.
---
## Infrastructure & Servers
- **GuruRMM Server**: `172.16.3.30:3001` — Rust/Axum, systemd unit `gururmm-server`
- **Downloads dir**: `/var/www/gururmm/downloads/` (configured via `DOWNLOADS_DIR` in `/opt/gururmm/.env`)
- **Dashboard nginx root**: `/var/www/gururmm/dashboard/`
- **Downloads base URL**: `https://rmm-api.azcomputerguru.com/downloads`
- **Scanner interval**: 300s (5 min), configured via `SCAN_INTERVAL_SECS` env var (default 300)
---
## Commands & Outputs
```bash
# Fix format! quote (line 775 of agent/src/metrics/mod.rs)
sed -i '775s/.*/ let cidr = format!("{}\/{}", network, prefix);/' \
/home/guru/gururmm/agent/src/metrics/mod.rs
# Fix server quote (line 301 of server/src/api/discovery.rs)
sed -i '301s/iface.get(ipv4_subnets)/iface.get("ipv4_subnets")/' \
/home/guru/gururmm/server/src/api/discovery.rs
# Fix client.ts backtick URL using chr(96) trick
python3 -c "
path = '/home/guru/gururmm/dashboard/src/api/client.ts'
# chr(96) is the backtick, spelled this way so the shell never sees one
bt = chr(96)
new_line = ' api.get<string[]>(' + bt + '/api/agents/\${agentId}/discovery/subnets' + bt + '),\n'
lines = open(path).readlines()
for i, line in enumerate(lines):
    if 'api.get<string[]>()' in line and 'getSuggestedSubnets' not in line:
        lines[i] = new_line
open(path, 'w').writelines(lines)
"
# Deploy server
sudo systemctl stop gururmm-server
sudo cp /home/guru/gururmm/server/target/release/gururmm-server /opt/gururmm/gururmm-server
sudo systemctl start gururmm-server
# Deploy dashboard
sudo cp -r /home/guru/gururmm/dashboard/dist/. /var/www/gururmm/dashboard/
# Place 0.6.20 agent binary
DEST=/var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.20
sudo cp /home/guru/gururmm/agent/target/release/gururmm-agent "$DEST"
sudo chmod 755 "$DEST"
sha256sum "$DEST" | awk '{print $1}' | sudo tee "$DEST.sha256" > /dev/null
sudo ln -sf gururmm-agent-linux-amd64-0.6.20 \
/var/www/gururmm/downloads/gururmm-agent-linux-amd64-latest
```
14:47 UTC scan confirmation:
```
INFO gururmm_server::updates::scanner: Scanned 50 agent binaries across 5 platform/arch combinations
INFO gururmm_server::updates::scanner: Agent needs update: 0.6.19 -> 0.6.20 (linux-amd64, channel=stable)
INFO gururmm_server::ws: Dispatching update to connected agent 8cd0440f-... on heartbeat: 0.6.19 -> 0.6.20
INFO gururmm_server::ws: Agent 8cd0440f-... reconnected after update: 0.6.19 -> 0.6.20
```
---
## Pending / Incomplete Tasks
- **Fleet update to 0.6.20**: Rollout underway automatically on heartbeat. Agents update one at a time as they heartbeat. Offline agents will update on next reconnect.
- **AD2 (0.6.1, offline since 2026-04-20)**: Requires physical or VPN access. Unchanged.
- **BB-SERVER enrollment loop**: `duplicate key value violates unique constraint "idx_agents_site_device"` on every WS connect. Agent already enrolled, auth flow re-attempting first-time enrollment. Needs code fix.
- **`unsupported Unicode escape sequence` on hardware inventory for IMC1**: Logged at WARN after 0.6.19 update. Unresolved — likely a problematic character in a serial number or software name field.
- **Policy wiring plan** (`ticklish-questing-stallman.md`): Full end-to-end policy propagation deferred. Server sends ConfigUpdate on connect (wired), agent-side handling not complete.
- **Windows/macOS agents**: Only Linux 0.6.20 built this session. Windows and macOS builds require the `build-agents.sh` script (which handles cross-compilation / signing). Not run this session.
- **LHM bundling in MSI**: LibreHardwareMonitor files not in build pipeline; self-healing download not implemented.
- **Build lock**: No flock on `build-agents.sh` to prevent concurrent invocations.
- **Safesite Glendale MSI machine**: Waiting for user to be away to push DisplayLink driver update.
---
## Reference Information
- **Feature commit**: `0c60d36` — feat: network discovery hostname lookup, subnet auto-detection, fix IP display and new_devices count
- **Version bump commit**: `c97b0f3` — chore: bump agent version to 0.6.20 (hostname lookup + subnet reporting)
- **Gururmm Gitea repo**: `http://172.16.3.20:3000/azcomputerguru/gururmm`
- **Downloads dir**: `/var/www/gururmm/downloads/` (from `DOWNLOADS_DIR` in `/opt/gururmm/.env`)
- **Agent 0.6.20 sha256**: `ed5ce77cd5d9e30ee9f5a73a6904e7f6667041ab9fff798e7d255a905efbf1a2`
- **New API endpoint**: `GET /api/agents/:id/discovery/subnets` → returns `Vec<String>` of CIDR subnets from agent's reported network interfaces
- **Discovery DB fixes**: `server/src/db/discovery.rs` now uses `host(ip_address)` instead of `ip_address::text`; `complete_scan()` computes `new_devices` via CTE
- **Subnet field**: agents now report `ipv4_subnets: Vec<String>` alongside `ipv4_addresses` in `NetworkInterface` struct (both agent and server side)
- **PTR lookup**: `agent/src/discovery/mod.rs` calls `dns_lookup::lookup_addr(&ip)` wrapped in `spawn_blocking`
---
## Update: 09:13 PT — Zombie connection fix (0.6.21) + automated changelog system
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session span:** ~08:30 to 09:13 PT (continued from prior context window)
## Session Summary
Investigation began after a screenshot showed a failed network discovery scan at 8:26 AM (19ms, no devices) on the gururmm site. The discovery node (agent 8cd0440f on host `gururmm`) had been unavailable since 14:48:36 UTC — over an hour without reconnecting, despite the process (PID 1026153) still running.
Diagnostic work confirmed the agent had zero TCP connections but was logging metrics every 60 seconds (in two interleaved streams, ~3 seconds apart). The dual metrics stream is normal: the `connect_and_run` metrics task and the `main.rs` metrics loop both log independently. The absence of any reconnect attempts or timeout messages pointed to the agent being stuck inside `connect_and_run` with what appeared to be a live WebSocket but was actually a zombie: Cloudflare held the client-side WebSocket open after the backend server closed it at 14:48:36 (TCP RST), so the agent receive-side was blocking indefinitely with no error.
Root cause in `agent/src/transport/websocket.rs`: the 90-second connection timeout used `tokio::time::sleep(Duration::from_secs(90))` inside the select loop. Because this sleep restarts from zero on every loop iteration — and the heartbeat task fires every 30 seconds, resetting the sleep constantly — the timeout never expired. Fix: track `last_incoming = Instant::now()` initialized before the loop, update it only in the incoming message branch, replace the sleep with `sleep_until(last_incoming + Duration::from_secs(90))`. Timeout now fires if no server message is received for 90 seconds regardless of outgoing heartbeat frequency.
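
The difference between the two timeout styles can be checked with a small virtual-time model. This is a simplified sketch, not the agent's tokio code: `Event`, `Policy`, and `timeout_fires_at` are hypothetical names, time is plain seconds since the loop started, and each event stands for one wake-up of the select loop.

```rust
/// One wake-up of the select loop in the model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Incoming,
    OutgoingHeartbeat,
}

/// How the timeout deadline is maintained.
#[derive(Clone, Copy)]
enum Policy {
    /// Old behaviour: a fresh 90 s sleep is created on every loop iteration.
    RestartSleepEachIteration,
    /// Fixed behaviour: deadline is always last incoming message + 90 s.
    DeadlineFromLastIncoming,
}

const TIMEOUT_SECS: u64 = 90;

/// `events` must be sorted by time (seconds since the loop started).
/// Returns the time at which the timeout branch fires, or None if it has
/// not fired by `horizon`.
fn timeout_fires_at(events: &[(u64, Event)], policy: Policy, horizon: u64) -> Option<u64> {
    // Both policies start with a deadline 90 s after the loop begins.
    let mut deadline = TIMEOUT_SECS;
    for &(t, event) in events {
        if t > horizon {
            break;
        }
        if deadline <= t {
            // The timer expired before (or exactly when) this event arrived.
            return Some(deadline);
        }
        match (policy, event) {
            (Policy::RestartSleepEachIteration, _) => deadline = t + TIMEOUT_SECS,
            (Policy::DeadlineFromLastIncoming, Event::Incoming) => deadline = t + TIMEOUT_SECS,
            (Policy::DeadlineFromLastIncoming, Event::OutgoingHeartbeat) => {}
        }
    }
    if deadline <= horizon {
        Some(deadline)
    } else {
        None
    }
}
```

Feeding the model an hour of 30-second outgoing heartbeats and no incoming messages reproduces the zombie scenario: the restart-per-iteration policy never fires, while the anchored deadline fires at 90 seconds.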
After restarting the service to restore the discovery node immediately, the fix was implemented, agent bumped to 0.6.21, built, and deployed. The scanner picked up the new binary and dispatched auto-update at 16:12:02 UTC. PID changed from 1033371 to 1038912 with "Backup file cleaned up" confirming the full update flow end-to-end.
Second half of the session implemented automated changelog generation. `scripts/generate-changelog.sh` generates two sections per build: a user-facing release notes section (parsed from conventional commits — feat/fix/perf prefixes) and a full developer section (complete git log with commit bodies for the component path since the previous version). Wired into `agent/build-all-platforms.sh` and new `build-server.sh`. Files stored in `changelogs/agent/vX.Y.Z.md` and `changelogs/server/vX.Y.Z.md` in the repo (GrepAI indexes them) and copied to `/var/www/gururmm/changelogs/` for serving. Two server API endpoints added: `GET /api/changelog/:component/latest` and `GET /api/changelog/:component/:version`. All committed and pushed to Gitea.
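
The release-notes half of the changelog depends on recognizing conventional-commit prefixes. The actual implementation is the shell script `scripts/generate-changelog.sh`, which is not reproduced in this log; the Rust sketch below only illustrates the classification rule. The function name `release_note`, the section headings, and the handling of scopes such as `feat(agent):` are assumptions.

```rust
/// Map a commit subject line to a release-notes section and entry text.
/// Only feat/fix/perf commits are user-facing; everything else (chore,
/// docs, sync, ...) returns None and would appear only in the developer
/// section.
fn release_note(subject: &str) -> Option<(&'static str, String)> {
    let (prefix, rest) = subject.split_once(':')?;
    // Assumption: tolerate an optional scope and breaking-change marker,
    // e.g. "feat(agent)!: ...".
    let kind = prefix.trim().trim_end_matches('!');
    let kind = kind.split('(').next().unwrap_or(kind).trim();
    let heading = match kind {
        "feat" => "Features",
        "fix" => "Fixes",
        "perf" => "Performance",
        _ => return None,
    };
    let text = rest.trim();
    if text.is_empty() {
        return None;
    }
    Some((heading, text.to_string()))
}
```

Applied to commits from this session, `fix: TemperatureSensor schema mismatch and dead write-half detection` would land under "Fixes", while `chore: bump agent version to 0.6.20 (hostname lookup + subnet reporting)` would be left out of the user-facing section.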
## Key Decisions
- **`sleep_until` anchored to incoming messages only** — fix must not reset the deadline on outgoing writes. Cloudflare accepts writes from the agent while sending nothing back; any reset on outgoing events would continue masking zombie connections.
- **90-second deadline retained** — matches existing intent. Healthy connections see server messages (ConfigUpdate, AuthAck) on reconnect well within 90 seconds.
- **Service restart before code fix** — restored the discovery node immediately rather than waiting for the full build cycle.
- **Changelog in-repo + served directory** — git repo location ensures GrepAI indexes content for context searches; `/var/www/gururmm/changelogs/` copy serves the API endpoint.
- **No Ollama for changelog generation** — server (172.16.3.30) cannot reach Ollama at 100.92.127.64:11434. Shell-based conventional commit parsing used instead; clean release notes without AI dependency.
- **Version path sanitization in changelog endpoint** — only digits, dots, and leading `v` allowed to prevent path traversal. Component validated against allowlist.
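The sanitization rule in the last decision can be sketched as follows. This is a Python illustration only: the real handlers live in `server/src/api/changelog.rs`, and `changelog_path`, the exact regular expression, and the normalization to a `v` prefix are assumptions (the pattern here is somewhat stricter than "digits, dots, and a leading `v`").

```python
import re
from pathlib import Path
from typing import Optional

ALLOWED_COMPONENTS = {"agent", "server"}  # component allowlist
# Assumed pattern: optional leading "v", then dot-separated digit groups.
VERSION_RE = re.compile(r"v?[0-9]+(?:\.[0-9]+)*")


def changelog_path(base: Path, component: str, version: str) -> Optional[Path]:
    """Map a request to a changelog file, or None if the input is rejected."""
    if component not in ALLOWED_COMPONENTS:
        return None
    if VERSION_RE.fullmatch(version) is None:
        return None  # rejects slashes, "..", and anything else unexpected
    number = version[1:] if version.startswith("v") else version
    return base / component / f"v{number}.md"


base = Path("/var/www/gururmm/changelogs")
print(changelog_path(base, "agent", "0.6.21"))
print(changelog_path(base, "agent", "../../etc/passwd"))
```

Validating both path segments before joining them means no request-supplied string with a separator ever reaches the filesystem call.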
## Problems Encountered
- **Zombie connection not self-detecting**: Agent stuck ~56 minutes without triggering its own 90s timeout. `sleep(90s)` inside select loop resets on every iteration; 30s heartbeats prevented it from ever firing. Fixed with `sleep_until`.
- **Dual metrics stream misread**: Initially suspected as evidence of two concurrent reconnects or task leak. Actually normal — two independent timers started at slightly different times. Not a bug.
- **Changelog directory write permissions**: `generate-changelog.sh` runs as `guru`; `/var/www/gururmm/changelogs/` owned by root. Added `sudo mkdir -p` and `sudo cp` with `|| true` fallback.
- **Heredoc quoting failures**: Multiple SSH heredoc and Python one-liner attempts failed due to quote escaping. Resolved by writing scripts to `/tmp/` locally and using `scp`.
## Configuration Changes
**Modified (gururmm repo):**
- `agent/src/transport/websocket.rs`: `last_incoming` deadline replacing `sleep(90s)`; imports updated
- `agent/Cargo.toml` — version 0.6.20 -> 0.6.21
- `server/src/api/mod.rs` — added `pub mod changelog;` and two changelog routes
- `agent/build-all-platforms.sh` — appended changelog generation call
**Created (gururmm repo):**
- `server/src/api/changelog.rs`: `latest` and `by_version` handlers
- `scripts/generate-changelog.sh` — dev + user changelog generator
- `build-server.sh` — build, deploy, changelog in one script
- `changelogs/agent/v0.6.21.md`, `changelogs/server/v0.3.1.md`
- `changelogs/LATEST_AGENT.md`, `changelogs/LATEST_SERVER.md`
**Modified (server filesystem):**
- `/opt/gururmm/.env` — added `CHANGELOG_DIR=/var/www/gururmm/changelogs`
- `/usr/local/bin/gururmm-agent` — auto-updated to 0.6.21
- `/opt/gururmm/gururmm-server` — redeployed with changelog endpoint
**Created (server filesystem):**
- `/var/www/gururmm/changelogs/` — served changelog directory
- `/var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.21` + `.sha256`
## Credentials & Secrets
None new.
## Infrastructure & Servers
- **GuruRMM server**: 172.16.3.30:3001, service `gururmm-server` (PID 1022326)
- **GuruRMM agent** (gururmm host): PID 1038912, version 0.6.21
- **Agent WebSocket**: `wss://rmm-api.azcomputerguru.com/ws` (through Cloudflare)
- **Changelog API**: `https://rmm-api.azcomputerguru.com/api/changelog/:component/latest`
- **Changelogs served**: `/var/www/gururmm/changelogs/`
- **Changelogs in repo**: `/home/guru/gururmm/changelogs/`
## Commands & Outputs
```bash
# Restore discovery node
sudo systemctl restart gururmm-agent
# Build agent 0.6.21 (server-side)
source ~/.cargo/env && cd /home/guru/gururmm/agent && cargo build --release
# Finished release in 1m 24s
# Deploy binary + sha256
sudo cp agent/target/release/gururmm-agent /var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.21
sha256sum /var/www/gururmm/downloads/gururmm-agent-linux-amd64-0.6.21 | awk '{print $1}' | sudo tee ...sha256
# SHA256: 54637a82d113471fe11983800bf0ef207ec250dcaf1b2fe2cfd15e2e03cd8b76
# Build server with changelog endpoint
source ~/.cargo/env && cd /home/guru/gururmm/server && cargo build --release
# Finished in 4m 28s
# Test endpoints
curl http://localhost:3001/api/changelog/agent/latest # 200 text/markdown
curl http://localhost:3001/api/changelog/agent/0.6.21 # 200
curl http://localhost:3001/api/changelog/server/latest # 200
# Auto-update log (agent, 16:12:02 UTC)
# INFO Received update command: 0.6.20 -> 0.6.21 (id: 3721cb41-e87c-487e-899e-079186ff8dd5)
# INFO Downloading from https://rmm-api.azcomputerguru.com/downloads/gururmm-agent-linux-amd64-0.6.21
# INFO Exiting for service restart by systemd
# INFO Server confirmed update success — cleaning up rollback artifacts
```
## Pending / Incomplete Tasks
- **BB-SERVER enrollment loop**: duplicate key `idx_agents_site_device` every ~10s — pre-existing, unresolved
- **Windows/macOS agent builds**: 0.6.21 not built for Windows or macOS
- **LHM bundling in MSI**: LibreHardwareMonitor not in build pipeline
- **Build lock**: `build-all-platforms.sh` has no `flock` mutex
- **Portal changelog page**: API endpoints exist; no dashboard UI to display them yet
- **Tray changelog link**: no `changelog_url` in TrayPolicy yet
- **Policy wiring plan** (`ticklish-questing-stallman.md`): Still deferred
- **IMC1 Unicode escape sequence** in hardware inventory JSON: unresolved
## Reference Information
- **Commits (gururmm repo)**:
- `1849733` — fix(agent): replace resetting sleep with sleep_until for zombie connection detection
- `b8809c5` — feat: add automated changelog generation for agent and server builds
- `52b5695` — feat(server): add changelog API endpoints + deploy-to-serve in generate script
- **Changelog API**:
- `GET https://rmm-api.azcomputerguru.com/api/changelog/agent/latest`
- `GET https://rmm-api.azcomputerguru.com/api/changelog/server/latest`
- `GET https://rmm-api.azcomputerguru.com/api/changelog/agent/0.6.21`
- **Agent 0.6.21 SHA256**: `54637a82d113471fe11983800bf0ef207ec250dcaf1b2fe2cfd15e2e03cd8b76`
- **Auto-update dispatch**: 2026-05-15T16:12:02Z, update_id `3721cb41-e87c-487e-899e-079186ff8dd5`
- **Key file**: `agent/src/transport/websocket.rs`: `last_incoming` at line ~279, `sleep_until` at line ~361
- **Key file**: `server/src/api/changelog.rs`
- **Key file**: `scripts/generate-changelog.sh`
---
## Update: 15:20 PT — Pluto SSH recovery, Defender removal, build pipeline repair, perf test
## User
- **User:** Mike Swanson (mike)
- **Machine:** DESKTOP-0O8A1RL
- **Role:** admin
- **Session span:** ~18:00 UTC to 22:20 UTC, 2026-05-15 (continued from prior context window)
## Session Summary
The session opened with Pluto (172.16.3.36, Windows Server 2019, the Windows build server) offline and unreachable via SSH. Pluto had been unreachable since at least the prior session. SSH key access had been lost — the cause was investigated via Windows event logs pulled through the RMM. The OpenSSH operational log revealed that the last successful connections used key fingerprint `SHA256:FirWvKG7jOqtG2nzX+D0a79/YLFjGAwuWcjP3yz5hCs`, which is root's key on the build server (`/root/.ssh/id_ed25519`), not the guru user's key. This was the root cause of subsequent SSH failures: prior repair attempts added guru's key (`Q+ivqd/...`) instead of root's key. SSH access was restored by adding root's key to `C:\ProgramData\ssh\administrators_authorized_keys` via RMM cmd script. A secondary issue caused the initial repair attempts to fail even with the correct key content: PowerShell's `>` operator writes UTF-16 LE, which Windows OpenSSH silently rejects. The file must be written with explicit ASCII encoding via `[System.IO.File]::WriteAllText(..., [System.Text.Encoding]::ASCII)`. Once both the correct key and correct encoding were in place, SSH worked.
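The encoding problem is easy to see at the byte level. The sketch below is Python rather than PowerShell and uses a deliberately fake key line. `as_powershell_redirect` and `looks_like_utf16` are hypothetical helpers, and the byte-order mark is an assumption about how Windows PowerShell writes redirected output.

```python
import codecs

# Placeholder key line; the base64 body and comment are deliberately fake.
KEY_LINE = "ssh-ed25519 AAAAEXAMPLEKEYDATA root@buildserver\n"


def as_powershell_redirect(text: str) -> bytes:
    """Bytes as a UTF-16 LE redirect would write them (BOM assumed)."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def looks_like_utf16(data: bytes) -> bool:
    """Heuristic: a UTF-16 BOM or NUL bytes never occur in an ASCII key file."""
    has_bom = data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
    return has_bom or b"\x00" in data


utf16_bytes = as_powershell_redirect(KEY_LINE)
ascii_bytes = KEY_LINE.encode("ascii")

print(looks_like_utf16(utf16_bytes), len(utf16_bytes))
print(looks_like_utf16(ascii_bytes), len(ascii_bytes))
```

A check like `looks_like_utf16` run against a copy of `administrators_authorized_keys` would distinguish an encoding problem from a permissions problem, which the log notes look identical from the client side.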
With Pluto accessible, Windows Defender was removed to improve build performance. `Set-MpPreference` and registry policy approaches were blocked by Tamper Protection. DISM failed due to wrong flag syntax for Server 2019. `Uninstall-WindowsFeature` failed over SSH due to a Windows console I/O buffer issue. The only working approach was running `Uninstall-WindowsFeature -Name Windows-Defender -Restart` interactively via ScreenConnect. Pluto rebooted and Defender was fully removed.
With Defender gone, the build pipeline was repaired end-to-end. Three separate issues prevented automatic builds from firing. First: Gitea 1.25.2 blocks webhook delivery to private/internal IP addresses by default — no `[webhook]` section existed in `app.ini`, so all push events were silently dropped. Fix: added `ALLOWED_HOST_LIST = *` to `app.ini` and restarted the Gitea container. Second: the webhook handler (`/opt/gururmm/webhook-handler.py`) used `subprocess.Popen` without ever calling `proc.wait()`, causing every completed build to leave a zombie sudo process. `os.kill(pid, 0)` returns success for zombies, so `is_build_running()` permanently returned True after the first build, silently dropping all subsequent webhooks. Fix: moved build execution to a daemon thread that calls `proc.wait()` and removes the lock file on completion. Third: `administrators_authorized_keys` had guru's key instead of root's key; the build script runs as root via sudo, so only root's key matters. Fix: added root's key via RMM alongside guru's key.
With all three fixes in place, a clean build completed in 42 seconds total (1s Linux, 25s Pluto, rest deploy/sign). The previous baseline with Defender enabled was 367 seconds, an 8.7x speedup. Defender had consumed approximately 325 seconds per build on Pluto alone (scanning cargo output, the sccache directory, and the compiled binaries during linking and signing). The Pluto Administrator password (`Paper123!@#`) was also set during the session, when Mike reset the Administrator account after the Defender removal complications.
## Key Decisions
- **ASCII encoding for authorized_keys**: PowerShell's `>` and `Out-File` default to UTF-16 LE. Windows OpenSSH requires ASCII or UTF-8 without BOM for authorized_keys files. Silently fails with no error message — looks like a permissions issue. Use `[System.IO.File]::WriteAllText` with `[System.Text.Encoding]::ASCII` exclusively.
- **Root's key, not guru's key**: The build script runs as root via `sudo bash /opt/gururmm/build-agents.sh`. SSH connections to Pluto use `/root/.ssh/id_ed25519`, not `/home/guru/.ssh/id_ed25519`. Both keys should be in `administrators_authorized_keys` — root's for builds, guru's for manual access.
- **Defender removal via ScreenConnect only**: All automated approaches (registry, DISM, scheduled task, `Uninstall-WindowsFeature` over SSH) fail on Server 2019 with Tamper Protection enabled. Interactive console is required. Not worth automating further.
- **Thread-based build dispatch in webhook handler**: Alternative was fixing `is_build_running()` to detect zombies via `/proc/<pid>/status`. Thread approach is cleaner: `proc.wait()` in the thread reaps the child and removes the lock atomically. Lock file is only present while the build is actively running.
- **No manual build runs**: Rule established (and saved to memory) — `build-agents.sh` must only be triggered via the Gitea webhook pipeline. Manual runs execute as `guru` instead of root, breaking log writes, artifact cleanup, and service restart.
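The thread-based dispatch decision can be sketched roughly as below. This is not the contents of `/opt/gururmm/webhook-handler.py`: `start_build`, the temporary lock path, and the stand-in build command are assumptions for illustration. The point is that the thread that waits on the child is also the one that removes the lock.

```python
import subprocess
import sys
import tempfile
import threading
from pathlib import Path
from typing import List


def is_build_running(lock_path: Path) -> bool:
    """The lock file exists only while a build thread is waiting on its child."""
    return lock_path.exists()


def start_build(cmd: List[str], lock_path: Path) -> threading.Thread:
    """Launch the build and hand the child to a daemon thread that reaps it."""
    proc = subprocess.Popen(cmd)
    lock_path.write_text(str(proc.pid))

    def reap() -> None:
        try:
            proc.wait()  # collects the exit status, so no zombie is left behind
        finally:
            lock_path.unlink(missing_ok=True)  # release the lock when the build ends

    worker = threading.Thread(target=reap, daemon=True)
    worker.start()
    return worker


# Stand-in for the real build command: a child that exits immediately.
lock = Path(tempfile.mkdtemp()) / "build.lock"
worker = start_build([sys.executable, "-c", "pass"], lock)
worker.join(timeout=30)
print(is_build_running(lock))
```

Because the lock's lifetime is tied to `proc.wait()` returning, a finished build cannot leave the lock behind, and no PID probing is needed.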
## Problems Encountered
- **SSH key wrong user**: Added guru's key to Pluto instead of root's key. Build pipeline uses root. SSH from build server (as guru via manual testing) worked; build pipeline (as root) failed. Fixed by adding root's key via RMM.
- **UTF-16 encoding silently broke SSH auth**: CMD `echo` and PowerShell `>` both produce encodings that Windows OpenSSH rejects. No error in sshd logs — just falls through to password auth. Resolution: `[System.IO.File]::WriteAllText` with explicit ASCII encoding.
- **Gitea silently blocked webhook delivery**: `ALLOWED_HOST_LIST` unset in `app.ini` caused Gitea 1.25.2 to drop all push webhook deliveries to 172.16.3.30 with no log entry, no retry, and a 200 response from the test delivery endpoint. Discovered by checking nginx access logs (zero POST entries from Gitea despite successful pushes).
- **Zombie lock permanently blocking builds**: Every build after the first was silently skipped. `is_build_running()` returned True indefinitely because zombie PIDs respond to `os.kill(pid, 0)`. Discovered by checking lock file PID against `ps` — process showed `<defunct>`. Fixed by reaping child in a thread.
- **Gitea app.ini edit left duplicate `[webhook]` sections**: Echo without `-e` wrote literal `\n` characters. Fixed by pulling the file out of the container with `docker cp`, cleaning with `grep -v`, and pushing back.
- **`Uninstall-WindowsFeature` over SSH returns "Win32 internal error 0x5"**: Not an access denial — the console output buffer isn't available in a non-interactive SSH session. This specific cmdlet requires a real console. Cannot be automated over SSH.
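The zombie-lock problem above can be reproduced in a few lines on Linux. The sketch below is an illustration under assumptions: `pid_exists` is a hypothetical stand-in for the probe inside `is_build_running()`, and the child is a trivial Python process rather than a build. It shows that a signal-0 probe succeeds for an exited but unreaped child and fails only after the parent waits on it.

```python
import os
import subprocess
import sys


def pid_exists(pid: int) -> bool:
    """Liveness probe of the kind the log describes: signal 0 to the PID."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


child = subprocess.Popen([sys.executable, "-c", "pass"])

# Block until the child has exited but leave it unreaped (WNOWAIT),
# which is the zombie state described for completed builds.
os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
alive_as_zombie = pid_exists(child.pid)

child.wait()  # reap the child; its process table entry is released
alive_after_reap = pid_exists(child.pid)

print(alive_as_zombie, alive_after_reap)
```

The second probe assumes the PID has not been reused by an unrelated process in the meantime, which is very unlikely over such a short interval.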
## Configuration Changes
| Location | File/Resource | Change |
|---|---|---|
| Gitea container | `/data/gitea/conf/app.ini` | Added `[webhook]` section with `ALLOWED_HOST_LIST = *` |
| Build server | `/opt/gururmm/webhook-handler.py` | Replaced Popen-without-wait with daemon thread; zombie-aware `is_build_running()` |
| Pluto | `C:\ProgramData\ssh\administrators_authorized_keys` | Added root's key + guru's key; ASCII-encoded, icacls restricted |
| Pluto | Windows Defender | Fully removed via `Uninstall-WindowsFeature` |
| Memory | `project_pluto_build_server.md` | Added Administrator password, SSH encoding requirement, root key vs guru key distinction |
| Memory | `MEMORY.md` | Added GuruRMM build rule entry |
| Memory | `feedback_gururmm_builds.md` | New: no manual builds, always use webhook pipeline |
## Credentials & Secrets
- **Pluto Administrator password**: `Paper123!@#` (set 2026-05-15 by Mike via ScreenConnect after Defender removal complications)
- **Jupiter root**: `172.16.3.20` / `root` / `Th1nk3r^99##` — from vault `infrastructure/jupiter-unraid-primary.sops.yaml`
- **Jupiter iDRAC**: `172.16.1.73` / `root` / `Window123!@#-idrac`
- **Gitea API token**: `9b1da4b79a38ef782268341d25a4b6880572063f` (azcomputerguru account) — from vault `services/gitea.sops.yaml`
- **RMM API**: `claude-api@azcomputerguru.com` / `ClaudeAPI2026!@#` at `http://localhost:3001/api`
## Infrastructure & Servers
- **Pluto**: `172.16.3.36`, Windows Server 2019, VM on Jupiter. SSH: `Administrator@172.16.3.36`. Build pipeline SSHes as root (uses `/root/.ssh/id_ed25519`). Manual access uses guru's key.
- **Jupiter**: `172.16.3.20`, Unraid primary. SSH: `root@172.16.3.20`. 125 GB RAM total, 92 GB used (80 GB VMs, ~8 GB Docker). 33 GB available.
- **Jupiter VMs**: Windows Server 2016 (32 GB), GuruRMM (16 GB), OwnCloud (16 GB), Claude-Builder (8 GB), Unifi (8 GB)
- **Jupiter notable Docker containers**: seafile-elasticsearch (1.86 GB / 2 GB limit — at capacity), app (1.39 GB), seafile (1.13 GB), gitea (852 MB)
- **Gitea**: Docker container on Jupiter, port 3000 (internal). External: `https://git.azcomputerguru.com` (via Cloudflare). Always use `http://172.16.3.20:3000` for API calls.
- **Build webhook**: `POST http://172.16.3.30/webhook/build` → nginx → `http://127.0.0.1:9000` → `gururmm-webhook.service` → `/opt/gururmm/webhook-handler.py`
## Commands & Outputs
```bash
# SSH to build server
ssh guru@172.16.3.30
# SSH hop to Pluto (from build server)
ssh -o StrictHostKeyChecking=no Administrator@172.16.3.36 hostname
# Jupiter RAM check
ssh root@172.16.3.20 "free -h"
# Mem: 125Gi total, 92Gi used, 808Mi free, 34Gi buff/cache, 33Gi available
# Gitea webhook test delivery
curl -s -X POST 'http://172.16.3.20:3000/api/v1/repos/azcomputerguru/gururmm/hooks/1/tests' \
-H 'Authorization: token 9b1da4b79a38ef782268341d25a4b6880572063f'
# Trigger build via empty commit (correct method)
ssh guru@172.16.3.30 "cd /home/guru/gururmm && git commit --allow-empty -m 'chore: trigger build' && git push"
# Restart Gitea after app.ini change
ssh root@172.16.3.20 "docker restart gitea"
# Check webhook handler zombie issue
cat /var/run/gururmm-build.lock # showed PID
ps -p <PID> # showed <defunct>
rm /var/run/gururmm-build.lock # cleared stale lock
```
Build performance results:
```
Baseline (Defender on, warm sccache): 367s total
Post-Defender (warm sccache): 42s total
Linux agent: 1s (fully cached)
Pluto: 25s (cargo + WiX + 4 binaries)
Deploy/sign: 16s
Speedup: 8.7x
```
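The derived figures in the results can be re-checked from the measured totals. This is a small Python check using only the numbers reported above; the variable names are illustrative.

```python
baseline_s = 367          # Defender on, warm sccache
total_s = 42              # Defender removed, warm sccache
linux_s, pluto_s = 1, 25  # per-stage timings from the post-Defender run

deploy_sign_s = total_s - linux_s - pluto_s  # remainder attributed to deploy/sign
saved_s = baseline_s - total_s               # time no longer spent per build
speedup = baseline_s / total_s

print(f"deploy/sign: {deploy_sign_s}s, saved: {saved_s}s, speedup: {speedup:.1f}x")
```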
## Pending / Incomplete Tasks
- **Pluto password not in vault**: `infrastructure/pluto-build-server.sops.yaml` doesn't exist yet. Password `Paper123!@#` is in memory only. Mike to add to vault.
- **BB-SERVER enrollment loop**: duplicate key `idx_agents_site_device` — pre-existing, unresolved.
- **Windows 0.6.21 not yet distributed**: Pluto builds produce 0.6.21 Windows artifacts on each run. After today's fixes, they should now deploy correctly on future pushes. Verify next build publishes Windows artifacts.
- **IMC1 Unicode escape sequence** in hardware inventory: unresolved.
- **Policy wiring plan** (`ticklish-questing-stallman.md`): Deferred.
- **Portal changelog page**: API exists, no dashboard UI.
- **seafile-elasticsearch at container memory limit** (1.86 GB / 2 GB): Monitor — may need limit raised.
- **macOS agent builds**: Not yet implemented.
- **pre-commit hook not executable** on build server: `hint: The '/home/guru/gururmm/scripts/hooks/pre-commit' hook was ignored because it's not set as executable` — emitted on every commit. Low priority but noisy.
## Reference Information
- **Build pipeline commits (gururmm)**: `7773f49`, `44fef95`, `6eed227`, `106fce9`, `3e9ef32`, `509f901` (all empty trigger commits from this session)
- **Pluto agent ID (RMM)**: `5316f56f-a1b3-4ac5-97ac-71ddf6a74d2e`
- **Root SSH key fingerprint** (build server, used by pipeline): `SHA256:FirWvKG7jOqtG2nzX+D0a79/YLFjGAwuWcjP3yz5hCs`, from `/root/.ssh/id_ed25519.pub`
- **Guru SSH key fingerprint** (build server, manual access): `SHA256:Q+ivqd/K3eKMqvLdwlkvNWKxvp3NyLt17PcxDwtykFs`, from `/home/guru/.ssh/id_ed25519.pub`
- **Webhook handler**: `/opt/gururmm/webhook-handler.py` (`gururmm-webhook.service`, port 9000)
- **Build script**: `/opt/gururmm/build-agents.sh` (production, runs as root via webhook)
- **Gitea webhook ID**: 1, repo `azcomputerguru/gururmm`, event `push`, URL `http://172.16.3.30/webhook/build`
- **Gitea app.ini**: `/data/gitea/conf/app.ini` inside `gitea` Docker container on Jupiter