sync: auto-sync from GURU-5070 at 2026-06-03 20:07:24
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-03 20:07:24
session-logs/2026-06-03-session.md
## User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Three workstreams ran this session. First, a beta dashboard channel was stood up for GuruRMM: `rmm-beta.azcomputerguru.com` now auto-builds on every push to `main` that touches `dashboard/`, while production stays explicit promote-only, mirroring the agent beta/stable model. This added `build-dashboard.sh` (webhook-dispatched, change-gated, deploys only to `/var/www/gururmm/dashboard-beta`), `promote-dashboard.sh` (dry-run, `--confirm`, and `--rollback` modes; the sole sanctioned path to prod), a second nginx vhost that injects a BETA banner at the nginx layer via `sub_filter` (so the build stays byte-identical and promotion is a plain rsync), a Cloudflare `rmm-beta` record, and Jupiter NPM proxy host id=11. Code review flagged three HIGH issues (unknown-arg rejection, backup-before-delete, marker-on-failed-deploy), all fixed and verified. Committed: gururmm `23f43ef`, ClaudeTools `11d2b17`. Detail: `session-logs/2026-06-03-recovered-create-beta-version-of-dashboard.md`.
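
The change gate in the webhook dispatch can be sketched as follows. This is a hypothetical illustration, not the code in `webhook-handler.py`: it assumes a GitHub/Gitea-style push payload carrying `ref` plus per-commit `added`/`modified`/`removed` path lists, and the names `changed_paths` and `should_build_dashboard_beta` are made up for the example.

```python
from typing import Any, Mapping

# Hypothetical constants; the real handler's names and values may differ.
WATCHED_REF = "refs/heads/main"
DASHBOARD_PREFIX = "dashboard/"


def changed_paths(payload: Mapping[str, Any]) -> set[str]:
    """Collect every path added, modified, or removed across the pushed commits."""
    paths: set[str] = set()
    for commit in payload.get("commits", []):
        for key in ("added", "modified", "removed"):
            paths.update(commit.get(key, []))
    return paths


def should_build_dashboard_beta(payload: Mapping[str, Any]) -> bool:
    """Auto-build beta only for pushes to main that touch dashboard/.

    Production is never built here; it changes only through an explicit promote.
    """
    if payload.get("ref") != WATCHED_REF:
        return False
    return any(path.startswith(DASHBOARD_PREFIX) for path in changed_paths(payload))
```

Gating on the `dashboard/` prefix means agent-only pushes to `main` leave the beta site untouched, and nothing on this path writes to production, which only changes through `promote-dashboard.sh`.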

Second, Grok's harness-parity attempt at "remove the SBS machine from mspbackups for Glaztech" was diagnosed. Grok reported success on the strength of two HTTP 200 responses but never verified the result; live checks showed the SBS computer had only been disabled, not removed. The correct delete path (`DELETE /api/Users/{id}?deleteUserData=true`, per the Saguaro precedent) is blocked on this account by a "Not Acceptable personal user" guard (an expired-trial/personal classification), and there is no spare Server license to lift it, so the removal genuinely requires the MSP360 web portal. A post-mortem for Grok training was written (`docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md`), capturing the core lesson: on this API no single HTTP response is authoritative; only re-reading state over time is.
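
That lesson translates into a small verification loop. The mutating call's status code is treated as a hint only; state is re-read repeatedly, and success is reported only once the desired condition holds on several consecutive fresh reads. This is a minimal sketch: `read_state` and `is_done` are hypothetical callables standing in for, say, an MSP360 GET of the user's computers and a check that SBS is absent, and the default counts and interval are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def verify_by_rereading(
    read_state: Callable[[], T],
    is_done: Callable[[T], bool],
    required_consecutive: int = 3,
    attempts: int = 10,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if is_done(state) holds on N consecutive fresh reads.

    A 200 from the mutating request is never taken as proof; a read that
    flips back to "not done" resets the streak.
    """
    streak = 0
    for _ in range(attempts):
        if is_done(read_state()):
            streak += 1
            if streak >= required_consecutive:
                return True
        else:
            streak = 0
        sleep(interval_s)
    return False
```

The injectable `sleep` exists only so the loop can be exercised without waiting. A "disabled but still listed" computer never satisfies an "absent" predicate, so the loop returns False instead of a false completion.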

Third, and the bulk of the session, was an MSP360/Backblaze B2 storage investigation that started from "admin@azcomputerguru.com shows 13TB with no computers." This established that MSP360 dashboard storage is a plan-derived internal tally refreshed daily, not a live read of B2, and is unreliable in both directions. A full per-prefix physical scan of the generic bucket `MSPBackups20200311` measured 51.015 TB across 28 prefixes (11.33M versions); ~48 TB is active client backups and only ~3-4 TB is genuinely orphaned. The phantom accounts (admin@ 15.21 TB, vland 12.57 TB) were explained: their data was already deleted by prior B2 lifecycle purge rules, and MSP360's counters froze.
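
A per-prefix physical scan amounts to summing every stored version's size by folder prefix. The sketch below assumes the listing has already been fetched (for example from B2's `b2_list_file_versions`) and reduced to `(fileName, contentLength)` pairs. `size_by_prefix` is a hypothetical helper, not the `b2.py` skill's implementation.

```python
from collections import defaultdict
from collections.abc import Iterable


def size_by_prefix(versions: Iterable[tuple[str, int]], depth: int = 1) -> dict[str, tuple[int, int]]:
    """Sum stored bytes and version counts per folder prefix.

    depth=1 groups by MSP360 user folder ("MBS-<id>/"); depth=2 by user plus
    machine ("MBS-<id>/CBB_<HOST>/"). Every version is counted, because B2
    stores and bills all versions, not just the latest one.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for name, length in versions:
        parts = name.split("/")
        # Names shallower than `depth` folders are kept whole as their own key.
        prefix = "/".join(parts[:depth]) + "/" if len(parts) > depth else name
        totals[prefix][0] += length
        totals[prefix][1] += 1
    return {prefix: (size, count) for prefix, (size, count) in totals.items()}
```

Depth 1 gives the per-account table. Depth 2 separates sibling machines under one account, which is the granularity the purge decisions needed.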

After verifying real B2 sizes and freshness per prefix, an orphan purge of ~3.2 TB was scheduled via B2 lifecycle rules (completes in 24-48h): Tucson Safety, bestmassage-generic (live copy confirmed in ACG-PST), Saguaro x3, seastman `CBB_GTI-EXCHANGE`, mike@ webhost, and one rogue prefix. Critically, verifying the lifecycle rules uncovered pre-existing purge rules (from a prior cleanup) sitting on ACTIVE accounts, including rohrbach's `CBB_MIKE-THINK` (13.23 TB, backing up that day) and diegobuilder. Those rules were removed before B2's daily lifecycle pass could act on them. A `CBB_MAS90SVR` rule and the over-broad seastman account-root rule were removed at the same time, to protect MAS90SVR pending Mike's check. The work was then parked with coord todos and a detailed resume note.
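
A B2 lifecycle rule is a JSON object keyed by `fileNamePrefix`, with two day counters. A prefix purge can be expressed as a rule that hides every version one day after upload and deletes hidden versions one day later. Combined with B2's once-a-day lifecycle pass, that fits the 24-48h window above. The sketch below assumes that rule shape (the exact values the `b2.py` skill writes are not recorded here). `purge_rule` and `update_bucket_body` are hypothetical helpers, and no request is sent.

```python
def purge_rule(prefix: str) -> dict:
    """B2 lifecycle rule that hides each version 1 day after upload and deletes it 1 day after hiding."""
    # fileNamePrefix is a plain string prefix, so "MBS-ab" would also match
    # "MBS-abc/"; requiring a trailing slash confines the rule to one folder.
    if not prefix.endswith("/"):
        raise ValueError(f"prefix {prefix!r} must end with '/'")
    return {
        "fileNamePrefix": prefix,
        "daysFromUploadingToHiding": 1,
        "daysFromHidingToDeleting": 1,
    }


def update_bucket_body(
    account_id: str,
    bucket_id: str,
    current_rules: list[dict],
    new_prefixes: list[str],
    revision: int,
) -> dict:
    """Request body for b2_update_bucket.

    lifecycleRules replaces the bucket's whole rule list, so new rules are
    merged into the current ones (skipping prefixes that already have a rule).
    ifRevisionMatch makes the update fail rather than overwrite a concurrent
    change if the bucket revision has moved on.
    """
    existing = {rule["fileNamePrefix"] for rule in current_rules}
    added = [purge_rule(p) for p in new_prefixes if p not in existing]
    return {
        "accountId": account_id,
        "bucketId": bucket_id,
        "lifecycleRules": list(current_rules) + added,
        "ifRevisionMatch": revision,
    }
```

If the rules have this shape, that also explains why the pre-existing rules were dangerous. `daysFromUploadingToHiding` applies to every upload under the prefix, so on an active machine folder each night's fresh backup would be hidden a day later and then deleted.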

## Key Decisions

- Beta dashboard branded via nginx `sub_filter` rather than a code change, keeping the build byte-identical to prod so promotion is a pure rsync (one artifact, two channels).
- Beta reuses the auto-renewing `rmm`/`rmm-api` Let's Encrypt cert (id=10). This works because Cloudflare SSL mode is "Full", which encrypts to the origin without validating that the certificate matches the hostname, so a cert that does not name `rmm-beta` is accepted. Documented the Full (strict) dependency as a follow-up.
- Treated MSP360's reported per-user/destination volumes as unreliable after they failed to reconcile with B2 physical bytes; drove all delete decisions from B2 ground-truth scans instead.
- Used the freshness check (newest object timestamp) to resolve dual-destination accounts: rohrbach 13.23 TB is still live on generic (not reclaimable); bestmassage 2.47 TB has been frozen since 2024-08 (migrated to ACG-PST, reclaimable).
- Removed purge rules on active accounts immediately as a protective, non-destructive action: removing a lifecycle rule deletes no data, and 13 TB of live backup was at risk.
- Narrowed the seastman purge from the account root to `CBB_GTI-EXCHANGE` alone after Mike confirmed GTI-EXCHANGE but wanted to check MAS90SVR first; account-root purges are risky precisely because they sweep in sibling machines.
- Held beckykahn (475 GB) from purge: it stopped backing up on 2026-04-14 (recently), so it is possibly a broken backup rather than a long-abandoned orphan.
- Did not delete the SBS MSP360 user via the data-deleting routes (opaque blast radius); the metadata-only `DELETE /api/Users/{id}/Account` is the path for phantom records that does not touch the bucket.
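
The freshness rule behind the rohrbach, bestmassage, and beckykahn calls can be sketched as a three-way classification on a prefix's newest-object timestamp. The session records no fixed thresholds; the 7-day and 180-day cutoffs and the function name are assumptions. A "reclaimable" result still requires locating the live copy elsewhere (as ACG-PST was for bestmassage) before anything is purged.

```python
from datetime import date


def classify_prefix(
    newest_upload: date,
    today: date,
    live_window_days: int = 7,
    orphan_after_days: int = 180,
) -> str:
    """Classify a backup prefix by the date of its newest object.

    Thresholds are illustrative assumptions, not values from the session.
    """
    age_days = (today - newest_upload).days
    if age_days <= live_window_days:
        return "live"  # still receiving backups: never purge
    if age_days < orphan_after_days:
        return "review"  # stopped recently: possibly a broken backup, so hold
    return "reclaimable"  # long frozen: purge candidate once a live copy is confirmed elsewhere
```

With `today=date(2026, 6, 3)`, a prefix written that day classifies as "live", one last written on 2026-04-14 as "review", and one frozen since August 2024 as "reclaimable", matching the three decisions above.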

## Problems Encountered

- Grok's "removal done" was a false completion, caught by re-reading MSP360 state, which showed the computer disabled rather than removed; root cause: no verification loop, plus trusting HTTP status.
- The bare MSP360 user delete returned 200 but changed nothing, and the same call with `deleteUserData` returned 400 "personal user". Reading `/Help` turned up `DELETE /api/Users/{userId}/Computers`, which also returned 400, confirming that the account's "personal" guard blocks every data-deleting route; the portal is required.
- DNS resolution for the MSP360 API fails locally; worked around by pinning IP `52.6.7.137` while keeping the correct SNI (a Python forced-IP HTTPS connection, or curl `--resolve`).
- MSP360 storage numbers didn't reconcile with B2; resolved by running a full per-prefix physical scan (B2 = ground truth), which showed that prior lifecycle purges explained the phantoms.
- My account-root purge of seastman swept in MAS90SVR (which Mike wanted to keep); fixed by `lifecycle-remove` of the account-root and `CBB_MAS90SVR` rules, leaving only `CBB_GTI-EXCHANGE`.
- Pre-existing purge rules endangered active backups (rohrbach 13.23 TB); they were removed before B2's daily lifecycle pass, and the data was confirmed intact.
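
The forced-IP workaround can be sketched as an `http.client.HTTPSConnection` subclass. It opens the TCP socket to the pinned address but still sends `api.mspbackups.com` as the TLS SNI and validates the certificate against that name, the same effect as `curl --resolve api.mspbackups.com:443:52.6.7.137`. The class name is hypothetical. `52.6.7.137` is the address recorded this session and may change, so DNS should be preferred again once it resolves.

```python
import http.client
import socket
import ssl


class PinnedHTTPSConnection(http.client.HTTPSConnection):
    """HTTPS connection that dials a fixed IP but keeps the real hostname
    for SNI, certificate verification, and the Host header."""

    def __init__(self, host: str, pinned_ip: str, port: int = 443, timeout: float = 30.0) -> None:
        # Default context: verifies the certificate chain and checks the hostname.
        self._tls_context = ssl.create_default_context()
        super().__init__(host, port=port, timeout=timeout, context=self._tls_context)
        self.pinned_ip = pinned_ip

    def connect(self) -> None:
        # TCP goes to the pinned address instead of resolving self.host via DNS.
        raw = socket.create_connection((self.pinned_ip, self.port), self.timeout)
        # SNI and the hostname check use the real name, so a wrong IP fails loudly.
        self.sock = self._tls_context.wrap_socket(raw, server_hostname=self.host)
```

Usage would be `conn = PinnedHTTPSConnection("api.mspbackups.com", "52.6.7.137")`, then `conn.request("GET", "/Help")` and `conn.getresponse()`. Authentication headers are omitted because the login flow is not documented here.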

## Configuration Changes

GuruRMM repo (`projects/msp-tools/guru-rmm`, commit `23f43ef`):

- NEW `deploy/build-pipeline/build-dashboard.sh`, `deploy/build-pipeline/promote-dashboard.sh`, `deploy/nginx/rmm-beta.conf`
- EDIT `deploy/build-pipeline/webhook-handler.py` (DASHBOARD dispatch), `build-shared.sh` (self-sync list), `README.md`, `CONTEXT.md`

ClaudeTools repo (commit `11d2b17`): `wiki/projects/gururmm.md`, `.claude/memory/feedback_dashboard_beta_first.md`, `.claude/memory/MEMORY.md`, guru-rmm submodule pointer.

Server `172.16.3.30`: created `/var/www/gururmm/dashboard-beta/`, installed `/etc/nginx/sites-enabled/gururmm-beta`, deployed the 4 pipeline scripts to `/opt/gururmm/`, restarted `gururmm-webhook`.

Cloudflare: added `rmm-beta.azcomputerguru.com` A -> `72.194.62.10` (proxied). Jupiter NPM: added proxy host id=11 (`rmm-beta` -> `.30:80`, cert id=10).

B2 bucket `MSPBackups20200311`: added 5 lifecycle purge rules for the orphan purge (3 more orphan purge rules already existed); removed 4 rules protectively (rohrbach MIKE-THINK, diegobuilder, seastman MAS90SVR + account-root). End state: 21 rules, revision 63.

Detail logs for this session: `session-logs/2026-06-03-mspbackups-b2-storage-cleanup.md`, `docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md`, `session-logs/2026-06-03-recovered-create-beta-version-of-dashboard.md`.

## Credentials & Secrets

- MSP360 API: vault `msp-tools/msp360-api.sops.yaml` (login/password). **SECURITY: the plaintext login/password have leaked into several already-committed session logs (`2026-05-18`, `2026-06-01`, `2026-06-02-mike-saguaro-mspbackups-deletion`, plus Grok's two 2026-06-03 files). Rotate the MSP360 API key and scrub those logs.** Not re-pasted here by design.
- Backblaze B2: vault `projects/claudetools/backblaze-b2.sops.yaml` (key_id / credentials.application_key); the skill uses the `ClaudeTools` key.
- Cloudflare: vault `services/cloudflare.sops.yaml` (`api_token_full_dns`, zone_id `1beb9917c22b54be32e5215df2c227ce`). NPM admin: vault `services/npm.sops.yaml` (mike@ / admin) plus a Cloudflare DNS token.
- gururmm-server sudo/SSH: vault `infrastructure/gururmm-server.sops.yaml`.

## Infrastructure & Servers

- GuruRMM server `172.16.3.30`: nginx (prod `dashboard/` + beta `dashboard-beta/`), API :3001, webhook :9000, builds in `/opt/gururmm/`.
- Jupiter `172.16.3.20`: NPM (admin :7818); proxy host id=11 = rmm-beta.
- MSP360 API `https://api.mspbackups.com` (pin `52.6.7.137`); `/Help` = full endpoint docs.
- B2 account `46f69bc61163` (us-west-001); generic bucket `MSPBackups20200311` bucketId `b4268f56790bccc671010613` (51.015 TB); storage account `MSPBackups` AccountID `ad1b19fd-9350-4fa2-9a07-6200fca14797`. `ACG-IX` bucket has no MSP360 destination (empty). No bucket has immutability.

## Commands & Outputs

- B2 per-prefix scan: `py .claude/skills/b2/scripts/b2.py bucket-size MSPBackups20200311` -> 51,011,665,961,467 bytes, 11,326,538 files.
- Orphan purge: `b2.py delete-prefix MSPBackups20200311 <prefixes...> --allow-account-root --confirm` -> rules written (rev 62).
- Protective removal: `b2.py lifecycle-remove MSPBackups20200311 "MBS-a39b0d4c-.../CBB_MIKE-THINK/" "MBS-651e4c79-.../" "MBS-c064d061-.../CBB_MAS90SVR/" "MBS-c064d061-.../" --confirm` -> rev 63, 21 rules.
- MSP360 correct delete (blocked here): `DELETE /api/Users/{id}?deleteUserData=true` -> 400 "Not Acceptable personal user"; `DELETE /api/Users/{userId}/Computers` (body `[{DestinationId,ComputerName}]`) -> 400.

## Pending / Incomplete Tasks

- SAFETY: audit ALL B2 buckets' lifecycle rules for purge rules on ACTIVE backups (coord todo `2e50f388`); do this first.
- After 2026-06-05: verify the ~3.2 TB drop in `MSPBackups20200311`, then `lifecycle-remove` the spent purge rules (todo `dc3a6233`).
- Decisions (todo `0fed5eb2`): MAS90SVR keep/purge; confirm the beckykahn/LAB-BECKY decommission; confirm the rohrbach and diegobuilder rule removals were correct.
- Glaztech SBS removal needs the MSP360 portal (todo `db03f8fe`).
- Phantom MSP360 user cleanup (admin@/vland/etc.) via the metadata-only `/Account` delete (cosmetic).
- Active-user migration off the generic bucket (~48 TB): console work, not a cost play.
- Rotate the leaked MSP360 API key and scrub the plaintext from committed logs.
- Beta dashboard: register an `rmm-beta` Entra redirect URI only if SSO on beta is wanted (JWT login works now); issue a dedicated rmm-beta cert if Cloudflare SSL is ever set to Full (strict) (todo filed earlier).
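
The lifecycle audit (todo `2e50f388`) reduces to a prefix test. A rule endangers an active machine folder if its `fileNamePrefix` is a prefix of that folder, which is exactly how an account-root rule sweeps in sibling machines. This is a minimal sketch, assuming rules in B2's JSON shape and a set of active folders taken from a freshness scan; the function name and inputs are hypothetical.

```python
from collections.abc import Iterable, Mapping


def dangerous_rules(
    rules: Iterable[Mapping],
    active_folders: Iterable[str],
) -> list[tuple[str, list[str]]]:
    """Return (rule prefix, [active folders it covers]) for each rule that
    would hide current files under a folder still receiving backups."""
    active = sorted(active_folders)
    hits: list[tuple[str, list[str]]] = []
    for rule in rules:
        # Only daysFromUploadingToHiding hides *current* files; a rule with just
        # daysFromHidingToDeleting only cleans up versions that are already hidden.
        if rule.get("daysFromUploadingToHiding") is None:
            continue
        prefix = rule.get("fileNamePrefix", "")
        # Plain string prefix match: an account-root rule ("MBS-<id>/") covers
        # every machine folder beneath it.
        covered = [folder for folder in active if folder.startswith(prefix)]
        if covered:
            hits.append((prefix, covered))
    return hits
```

Running this per bucket before and after each rule change would have flagged the pre-existing rohrbach and diegobuilder rules, and the seastman account-root rule once MAS90SVR was treated as active.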

## Reference Information

- beta dashboard: https://rmm-beta.azcomputerguru.com ; prod https://rmm.azcomputerguru.com ; promote `sudo /opt/gururmm/promote-dashboard.sh --confirm`.
- Commits: gururmm `23f43ef`, ClaudeTools `11d2b17`.
- Detail logs: `session-logs/2026-06-03-mspbackups-b2-storage-cleanup.md`, `docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md`.
- Coord todos: `2e50f388`, `dc3a6233`, `0fed5eb2`, `db03f8fe` (+ earlier beta/cert + Howard beta-first todos `ccf8980e`).