sync: auto-sync from GURU-5070 at 2026-06-21 18:11:16

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-21 18:11:16
2026-06-21 18:12:01 -07:00
parent 2a7dd2d4a0
commit 0c3ae5d33d


@@ -0,0 +1,161 @@
# Session — RMM BUG-019 merge, gururmm-build skill, errorlog lint, coord purge
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Started as routine `/sync` runs that surfaced a backlog of coord questions from Howard. Answered
all three open threads via coord: pfSense SSH backend cred-path (approved Option A — slug arg
containing `/` is used as a literal vault path), BUG-018 approach (fast 202 + background purge), and
the Tier-1 RMM items (Mail.Send already lives in the suite; packetdial still has no NetSapiens key).
A verification step found that the 2a/2b decisions had been made in commits earlier in the day but
never communicated to Howard via coord, so a corrected reply was sent pointing at the actual
decisions rather than at a non-existent prior message.
Merged GuruRMM BUG-019 to `main`. The guru-rmm submodule pushes to `git.azcomputerguru.com` (a
different host than the claudetools parent's HTTP remote), and the gitea skill can't inject creds
there, nor is the Gitea SSH key authorized on GURU-5070 — so the working auth path was the vaulted
`services/gitea` API token via `GIT_ASKPASS`. Fast-forwarded `ed8cad3 -> 66a7f4e`; CI's version-bump
bot committed `8b5e0dc` on top, confirming the pipeline fired. Verified via the gururmm server's own
root RMM agent that the Linux agent build completed clean (v0.6.67, 0 errors, marker `8b5e0dc`),
proving that the container-gated `is_docker_container()` code (which Howard could not cross-build
from Windows) cleared the compile gate on Linux. The agent landed on the beta channel.
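The askpass-based push described above can be sketched roughly as follows. This is an illustration, not the script actually used: the helper names, the `GITEA_TOKEN` environment variable, and the askpass script body are assumptions. Only `GIT_ASKPASS`, `GIT_TERMINAL_PROMPT=0`, `-c credential.helper=`, and the `<sha>:refs/heads/main` refspec come from this session.

```python
import os
import stat
import tempfile


def write_askpass_script(directory, token_env_var="GITEA_TOKEN"):
    """Write a tiny POSIX sh script that prints the token from an env var.

    git runs this script when it needs a password; the token itself is never
    stored in the file. The env var name is a hypothetical choice.
    """
    path = os.path.join(directory, "askpass.sh")
    script = "#!/bin/sh\n" + f"printf '%s\\n' \"${token_env_var}\"\n"
    with open(path, "w", newline="\n") as handle:
        handle.write(script)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def build_push(remote_url, sha, askpass_path, token, token_env_var="GITEA_TOKEN"):
    """Return (argv, env) for an explicit-SHA push with the token kept out of argv."""
    env = dict(os.environ)
    env["GIT_ASKPASS"] = askpass_path
    env["GIT_TERMINAL_PROMPT"] = "0"   # fail instead of prompting interactively
    env[token_env_var] = token         # read by the askpass script only
    argv = [
        "git", "-c", "credential.helper=",  # ignore any configured helper
        "push", remote_url, f"{sha}:refs/heads/main",
    ]
    return argv, env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        askpass = write_askpass_script(workdir)
        argv, _env = build_push(
            "https://user@example.invalid/org/repo.git", "66a7f4e", askpass, "dummy-token"
        )
        print(argv)  # pass argv and env to subprocess.run(...) to actually push
```

The username stays in the URL and the token is supplied only through the askpass callback, so nothing sensitive lands in argv, shell history, or git config.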
Mike asked that all Claude instances internalize the build process so the fleet builds the same way.
Built a `gururmm-build` skill (`verify.sh server|agent|dashboard|migrations`) plus a developer-facing
`docs/BUILD.md` in guru-rmm and refreshed the stale CONTEXT.md build-pipeline section, after reading
the actual `deploy/build-pipeline/*.sh` scripts to keep the docs accurate. Captured the core rules in
a `feedback_gururmm_build_verification` memory (merge-to-main = build+deploy; Windows cargo can't gate
Linux agent code; server needs SQLX_OFFLINE + fresh `.sqlx`; check migration-number collisions).
Audited the memory store for stale `fabb3421` references and fixed the one stale entry
(`reference_resource_map`), then broadcast both learnings (build verification and
"`fabb3421` is deleted, use the suite") to the fleet.
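The "check migration-number collisions" rule can be illustrated with a small sketch. It assumes migration files carry a leading numeric prefix (for example `060_...sql`); that naming and the sample file names are hypothetical, and this is not what `verify.sh migrations` actually runs.

```python
import re
from collections import defaultdict


def find_migration_collisions(filenames):
    """Group files by leading migration number; return numbers claimed more than once."""
    by_number = defaultdict(list)
    for name in filenames:
        match = re.match(r"(\d+)", name)
        if match:  # files without a numeric prefix are ignored
            by_number[int(match.group(1))].append(name)
    return {num: sorted(names) for num, names in by_number.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical file names mirroring a 060 collision between two branches.
    files = ["060_bug_019.sql", "060_spec_021.sql", "061_bug_018.sql"]
    print(find_migration_collisions(files))
```

An empty result means every number is claimed once. Any entry names the files that need renumbering before merge.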
Ran an errorlog lint at Mike's request. The dominant signal (~70% of entries) was bitdefender
`gz.py` logging expected GravityZone validation/probe errors; Howard had already built suppression,
but one marker gap remained (`"Missing name '...' in 'options' object"`), which was added and verified
against all 10 distinct spam phrasings. Secondary findings: the recurring auto-synced-submodule
detached-HEAD friction (captured as a new `feedback_submodule_autosync_discipline` memory) and the
Windows `/tmp`/quote-stripping traps that keep recurring despite living in memory (promoted to the
always-loaded CLAUDE.md CORE Windows bullet). Mike corrected the assumption that GuruRMM merges are
Mike-only — Howard can handle merges/deploys himself — which was logged as a correction and written
into the approval-workflow memory; Howard was then cleared to merge BUG-018 and told to merge it now.
Finally, purged 208 dealt-with coord messages (kept the 29 from 2026-06-19 onward), hitting the
known jq-on-Windows CRLF trap mid-loop, and added a reusable `msg purge --before` command (dry-run by
default) to the coord skill so future cleanup doesn't need an ad-hoc curl loop.
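A minimal sketch of the dry-run-by-default shape, assuming an `argparse` CLI. The message fields `id` and `created_at`, and the `delete` callback, are hypothetical stand-ins for whatever `coord.py` really uses.

```python
import argparse
from datetime import date


def build_parser():
    parser = argparse.ArgumentParser(prog="purge-sketch")
    # --before is required, so the store cannot be wiped by a bare invocation.
    parser.add_argument("--before", required=True, type=date.fromisoformat,
                        help="delete messages created before this YYYY-MM-DD date")
    parser.add_argument("--yes", action="store_true",
                        help="actually delete; without it only a dry run is reported")
    return parser


def select_for_purge(messages, before):
    """Pick messages whose creation date is strictly earlier than the cutoff."""
    return [m for m in messages if date.fromisoformat(m["created_at"][:10]) < before]


def run(argv, messages, delete):
    args = build_parser().parse_args(argv)
    doomed = [m["id"] for m in select_for_purge(messages, args.before)]
    if not args.yes:
        return {"dry_run": True, "would_delete": doomed}
    for message_id in doomed:
        delete(message_id)
    return {"dry_run": False, "deleted": doomed}


if __name__ == "__main__":
    sample = [
        {"id": "old-1", "created_at": "2026-06-18T10:00:00"},
        {"id": "new-1", "created_at": "2026-06-19T09:00:00"},
    ]
    print(run(["--before", "2026-06-19"], sample, delete=print))
```

Running without `--yes` reports what would go and calls nothing destructive. Omitting `--before` is rejected by the parser.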
## Key Decisions
- **pfSense backend cred-path:** Option A — if the slug arg contains `/`, use it verbatim as the
vault path; else treat as a client slug. No cred duplication; gw-audit/gw-control inherit it.
- **BUG-018 fix:** fast 202 Accepted + background/scheduled child-row purge (volume problem, not
indexes); 204->202 contract change accepted. Assigned to Howard.
- **Mailbox architecture stays split (settled):** `/mailbox` (ACG own-mail) on single-tenant
`1873b1b0`; client mail send on the suite's Exchange Operator `b43e7342` (already holds Graph
Mail.Send). `fabb3421` is deleted — do not reference.
- **Cross-host submodule auth on GURU-5070:** use the vaulted `services/gitea` API token via
`GIT_ASKPASS` (gitea skill can't inject — parent is HTTP, submodule is a different host; SSH key
not authorized). Push merges by explicit SHA.
- **Build knowledge as skill + doc + memory:** `gururmm-build` skill is local pre-merge verification
only (does NOT trigger the prod build — that's the webhook on merge). `docs/BUILD.md` is the human
guide; `deploy/build-pipeline/README.md` remains the server-side source of truth.
- **Windows /tmp + quote traps promoted to CLAUDE.md CORE** rather than left in memory — they recur
because a memory may not be recalled; CORE is always loaded.
- **Howard cleared for GuruRMM merges/deploys** (Mike). Mike still owns RMM architecture/direction;
prepared+verified branches no longer bottleneck on Mike to land.
- **Coord purge cutoff:** delete everything before 2026-06-19 (208 msgs); keep the last few days +
today's live threads (29). Coordination messages are ephemeral; durable record is session logs.
- **`msg purge` safety:** `--before` required (can't wipe the store by accident), dry-run by default,
`--yes` to delete — mirrors `promote-dashboard.sh`.
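The Option A cred-path rule from the first decision above amounts to one branch. In this sketch the function name and the client-slug path template are hypothetical; only the "contains `/` means literal vault path" rule comes from the decision.

```python
def resolve_vault_path(slug_arg: str, client_template: str = "clients/{slug}/pfsense") -> str:
    """Resolve a CLI slug argument to a vault path.

    A value containing '/' is used verbatim as the vault path; anything else is
    treated as a client slug and expanded through a (hypothetical) template.
    """
    if "/" in slug_arg:
        return slug_arg
    return client_template.format(slug=slug_arg)


if __name__ == "__main__":
    print(resolve_vault_path("acme"))                # expanded as a client slug
    print(resolve_vault_path("infra/pfsense/edge"))  # used as-is
```

Because the literal-path case returns its input unchanged, tools that pass the argument through pick up the behavior without duplicating credentials.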
## Problems Encountered
- **gitea skill couldn't push the guru-rmm submodule:** it bails inside the submodule ("locate git
repo") and its cred-injection only works for HTTPS-with-embedded-auth parents (claudetools is HTTP).
Gitea SSH key not authorized on GURU-5070 (`Permission denied (publickey)`). Resolved with the
vaulted API token via `GIT_ASKPASS`.
- **Docs push rejected non-fast-forward:** remote main moved (CI/other session). Resolved by fetch +
rebase of the docs commit onto the new tip, then push (`572435f`).
- **RMM agent `sudo -u guru` blocked** when checking build state on `.30`: the agent runs under
systemd `NoNewPrivileges`, so sudo failed. The non-sudo reads (build log, marker, downloads dir,
Cargo.toml version) still answered the question, so no workaround needed.
- **Coord purge loop returned HTTP 000 on all 208 deletes:** jq-on-Windows emits CRLF, so each ID
carried a trailing `\r` that broke the curl URL. A single manual DELETE worked (200), proving the
endpoint is fine. Resolved by regenerating the ID list through `tr -d '\r'` + trimming in the read
loop; the re-run was clean (207/207, plus the 1 manual delete = 208). Logged as `--friction`
(documented-gotcha repeat).
- **`?limit=2000` on coord messages returned a validation error** (cap is 1000). 237 msgs fit in one
page; the new `msg purge` paginates over the cap with `skip`.
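The two purge gotchas above (CRLF-tainted IDs and the 1000-row `limit` cap) can be sketched in Python as follows. The function names are illustrative and this is not the `coord.py` implementation; `skip` and `limit` are the query parameters named in this log.

```python
from typing import Iterator, List, Tuple


def clean_ids(raw_output: str) -> List[str]:
    """Split tool output into IDs, dropping CR/LF and blank lines.

    A naive raw_output.split("\n") would leave a trailing "\r" on every ID when
    the producer emits CRLF, which is what broke the delete URLs.
    """
    return [line.strip() for line in raw_output.splitlines() if line.strip()]


def page_windows(total: int, cap: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield (skip, limit) pairs that cover `total` rows without exceeding `cap`."""
    skip = 0
    while skip < total:
        limit = min(cap, total - skip)
        yield skip, limit
        skip += limit


if __name__ == "__main__":
    print(clean_ids("id-1\r\nid-2\r\n"))
    print(list(page_windows(2500)))
```

With 237 messages a single window suffices; the pagination only matters once the store grows past the cap.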
## Configuration Changes
Created:
- `.claude/skills/gururmm-build/SKILL.md` — new skill (build + pre-merge verification).
- `.claude/skills/gururmm-build/scripts/verify.sh` — local verify (server/agent/dashboard/migrations).
- `.claude/memory/feedback_gururmm_build_verification.md`
- `.claude/memory/feedback_submodule_autosync_discipline.md`
- `projects/msp-tools/guru-rmm/docs/BUILD.md` (guru-rmm repo)
- `session-logs/2026-06/2026-06-21-mike-rmm-build-skill-errorlog-lint-coord-purge.md` (this file)
Modified:
- `.claude/skills/bitdefender/scripts/gz.py` — added `"missing name"` to `_EXPECTED_ERROR_MARKERS`.
- `.claude/skills/coord/scripts/coord.py` + `SKILL.md` — added `msg purge --before` command.
- `.claude/CLAUDE.md` — Windows CORE bullet: /tmp path + curl.exe/plink quote-stripping hard rules.
- `.claude/memory/MEMORY.md` — index lines for the two new memories + updated approval-workflow line.
- `.claude/memory/approval-workflow-tools-vs-projects.md` — Howard cleared for GuruRMM merges/deploys.
- `.claude/memory/reference_resource_map.md` — replaced stale `fabb3421` cred ref with the suite.
- `projects/msp-tools/guru-rmm/CONTEXT.md` — refreshed the stale Build Pipeline section.
## Credentials & Secrets
- No new credentials created or discovered. Used the existing vaulted Gitea API token at
`services/gitea` field `credentials.api.api-token` (via `vault.sh get-field`) for cross-host
guru-rmm pushes through `GIT_ASKPASS`. Username embedded in URL (`azcomputerguru@...`), token as
password; never written to argv/history/git config.
- packetdial/NetSapiens: still NO key vaulted (`msp-tools/oitvoip.sops.yaml` empty/absent). Skill
remains non-functional until the key is obtained — parked.
## Infrastructure & Servers
- GuruRMM server: `172.16.3.30` (API `:3001`, coord API `:8001/api/coord`). Build host = same box;
repo at `/home/guru/gururmm`; downloads `/var/www/gururmm/downloads`; build logs
`/var/log/gururmm-build-{linux,windows,server,dashboard}.log`.
- gururmm git remote: `https://git.azcomputerguru.com/azcomputerguru/gururmm.git` (Gitea at
`172.16.3.20`, HTTP `:3000`, SSH `:2222`). Gitea SSH `.20:2222` was UP from GURU-5070 (port open,
sshd answers); Howard reported a transient "refused 3x" from the build host ~00:23, after the
BUG-019 build completed at 00:12.
- Windows agent build hosts: Beast (`100.101.122.4`, Tailscale, primary) / Pluto (`172.16.3.36`,
fallback). Agent beta channel; dashboard beta at `rmm-beta.azcomputerguru.com`.
- gururmm own RMM agent: hostname `gururmm`, id `9b92b187-98c7-41b0-9e97-1698d263c42d`, os linux
(runs under systemd NoNewPrivileges — no sudo via the agent).
## Commands & Outputs
- Cross-host submodule push (token via askpass, no token in argv):
`GIT_ASKPASS=<script> GIT_TERMINAL_PROMPT=0 git -c credential.helper= push "https://azcomputerguru@git.azcomputerguru.com/azcomputerguru/gururmm.git" <sha>:refs/heads/main`
- BUG-019 Linux build confirmation (build-linux.log tail): `=== Linux build complete: v0.6.67 in 54s ===`,
`Finished release profile [optimized] in 53.64s`, 0 errors; `last-built-commit-linux = 8b5e0dc`.
- Coord purge (after CRLF fix): `deleted ok: 207 failed: 0`; remaining total 29, oldest kept 2026-06-19.
- New command: `coord.py msg purge --before YYYY-MM-DD [--to <session>] [--yes]` (dry-run default).
- bitdefender suppression verify: all 10 real spam phrasings -> `_is_expected_error` True; genuine
"Connection refused" -> still logs.
## Pending / Incomplete Tasks
- **BUG-018** (`fix/bug-018-fast-delete`, commit `cea87d4`): Howard cleared + told to merge now
(=deploy). Watch `/var/log/gururmm-build-server.log`; server-only change, no migration. Optional
follow-ups: dashboard optimistic-remove for the 202; bulk-delete endpoint (#3).
- **packetdial:** needs the NetSapiens API key vaulted before the skill works. Blocked on the key/portal.
- **Migration order on gururmm main:** 060 BUG-019 (merged), 061 BUG-018 (#41), 062 MSP360 (#42),
063 SPEC-021 (#40, renumbered from 060 to resolve the collision).
- **Optional:** `/wiki-compile` not run this session (decoupled). No single-article slug implied
(root session log).
## Reference Information
- gururmm commits: BUG-019 merge `66a7f4e` (CI bump `8b5e0dc`, agent v0.6.67); docs `572435f`
(BUILD.md + CONTEXT.md). Earlier fabb purge: `f55b8d25`, `1f65facb`.
- claudetools commits (this session): `f8c33c90` (gururmm-build skill + build-verification memory +
fabb fix), `ef55121d` (errorlog lint follow-ups: bitdefender marker + submodule memory + Windows
CORE), `dd033289`/`615b6f2a` (Howard merge authority), `c281d253`/`6e96ec42` (coord msg purge cmd).
- Coord messages sent: `60b119de` (3 decisions), `c8c28d80` (2a/2b correction), `f6fcd546` (BUG-019
merged), `15207219` (fleet broadcast), `8b2989f1` (BUG-019 verified), `3a5e0ca1` (bitdefender fix),
`3b2892df` (Howard cleared to merge), `1ed75523` (merge BUG-018 now).
- App IDs: Exchange Operator `b43e7342` (Graph Mail.Send), Security Investigator `bfbc12a4`, User
Manager `64fac46b`, Tenant Admin `709e6eed`, Defender `dbf8ad1a`, ACG Mailbox `1873b1b0`. Deleted:
`fabb3421` (Claude-MSP-Access, AADSTS700016).
- New endpoint used: `DELETE /api/coord/messages/{id}` (coord API).