User
- User: Mike Swanson (mike)
- Machine: GURU-5070
- Role: admin
Session Summary
Three workstreams ran this session. First, a beta dashboard channel was stood up for GuruRMM: rmm-beta.azcomputerguru.com now auto-builds on every push to main (touching dashboard/) while production stays explicit promote-only, mirroring the agent beta/stable model. This added build-dashboard.sh (webhook-dispatched, change-gated, deploys to /var/www/gururmm/dashboard-beta only), promote-dashboard.sh (dry-run/--confirm/--rollback, the sole sanctioned prod path), a second nginx vhost with an nginx-layer sub_filter BETA banner (so the build stays byte-identical and promotion is a plain rsync), a Cloudflare rmm-beta record, and Jupiter NPM proxy host id=11. Code review flagged three HIGH issues (unknown-arg rejection, backup-before-delete, marker-on-failed-deploy), all fixed and verified. Committed: gururmm 23f43ef, ClaudeTools 11d2b17. Detail: the recovered beta-dashboard session log.
Second, Grok's harness-parity attempt at "remove the SBS machine from mspbackups for Glaztech" was diagnosed. Grok reported success off two HTTP 200s but never verified; live checks showed the SBS computer was only disabled, not removed. The correct delete path (DELETE /api/Users/{id}?deleteUserData=true, per the Saguaro precedent) is blocked on this account by a "Not Acceptable personal user" guard (expired-trial/personal classification), and there is no spare Server license to lift it — so the removal genuinely requires the MSP360 web portal. A post-mortem for Grok training was written (docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md), capturing the core lesson: on this API no single HTTP response is authoritative; only re-reading state over time is.
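The "re-read state over time" lesson can be sketched as a small verification loop. This is a minimal illustration, not the harness's actual code: `confirm_removed` and the `read_state` callable are hypothetical names, and `read_state` stands in for whatever call lists the computers currently attached to the account.

```python
import time
from typing import Callable, Set


def confirm_removed(
    read_state: Callable[[], Set[str]],
    target: str,
    checks: int = 3,
    interval_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `target` is absent on every one of `checks` re-reads.

    A 200 from the delete call is deliberately not an input here: the only
    evidence accepted is the state observed afterwards, repeated over time.
    """
    for attempt in range(checks):
        if target in read_state():
            return False  # still listed (for example disabled, not removed)
        if attempt < checks - 1:
            sleep(interval_s)  # give the backend time to settle between reads
    return True
```

A caller would report "removed" only when this returns True; a single clean read followed by the item reappearing still counts as a failure.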
Third, and the bulk of the session, was an MSP360/Backblaze B2 storage investigation that started from "admin@azcomputerguru.com shows 13TB with no computers." This established that MSP360 dashboard storage is a plan-derived internal tally refreshed daily, not a live read of B2, and is unreliable in both directions. A full per-prefix physical scan of the generic bucket MSPBackups20200311 measured 51.015 TB across 28 prefixes (11.33M versions); ~48 TB is active client backups and only ~3-4 TB is genuinely orphaned. The phantom accounts (admin@ 15.21 TB, vland 12.57 TB) were explained: their data was already deleted by prior B2 lifecycle purge rules, and MSP360's counters froze.
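The per-prefix tally behind a physical scan can be sketched as below. This is an assumption-laden illustration rather than the b2.py skill's implementation: `size_by_prefix` is a hypothetical helper, and the input is assumed to be file-version records carrying the `fileName` and `contentLength` fields that B2 returns when listing file versions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def size_by_prefix(versions: Iterable[Mapping], depth: int = 2) -> Dict[str, int]:
    """Sum contentLength per leading directory prefix.

    depth=2 groups by account/machine (two path segments); depth=1 groups
    by account root. Files with no directory part are grouped under "".
    """
    totals: Dict[str, int] = defaultdict(int)
    for version in versions:
        dirs = version["fileName"].split("/")[:-1]  # drop the file name itself
        key = ("/".join(dirs[:depth]) + "/") if dirs else ""
        totals[key] += version["contentLength"]
    return dict(totals)
```

Counting every version, not just the latest one per file, matters here because old versions still occupy billed storage.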
After verifying real B2 sizes and freshness per prefix, an orphan purge of ~3.2 TB was scheduled via B2 lifecycle rules (completes 24-48h): Tucson Safety, bestmassage-generic (confirmed live copy is in ACG-PST), Saguaro x3, seastman CBB_GTI-EXCHANGE, mike@ webhost, and one rogue prefix. Critically, the lifecycle-rule verification uncovered pre-existing purge rules (from a prior cleanup) sitting on ACTIVE accounts — including rohrbach's CBB_MIKE-THINK (13.23 TB, backing up that day) and diegobuilder. Those were removed protectively before B2's daily pass could fire, along with a CBB_MAS90SVR rule and an over-broad account-root rule, to protect MAS90SVR pending Mike's check. The work was then parked with coord todos and a detailed resume note.
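The check that surfaced the dangerous rules can be sketched as a prefix-overlap test. This is a minimal sketch under assumptions: `rules_endangering` is a hypothetical helper, rules are assumed to be dicts with B2's `fileNamePrefix` field, and the list of active prefixes is assumed to come from a separate freshness scan.

```python
from typing import Dict, Iterable, List, Tuple


def rules_endangering(
    rules: Iterable[Dict], active_prefixes: Iterable[str]
) -> List[Tuple[str, List[str]]]:
    """Return (rule_prefix, [active prefixes it overlaps]) for risky rules.

    A rule is risky if it covers an active prefix (an account-root rule
    sweeping in a live machine) or sits inside one.
    """
    active = list(active_prefixes)
    flagged = []
    for rule in rules:
        rule_prefix = rule["fileNamePrefix"]
        hits = [
            a for a in active
            if a.startswith(rule_prefix) or rule_prefix.startswith(a)
        ]
        if hits:
            flagged.append((rule_prefix, hits))
    return flagged
```

With illustrative prefixes, a rule on `MBS-aaaa/` is flagged when `MBS-aaaa/CBB_KEEP/` is active, while a rule scoped to `MBS-aaaa/CBB_OLD/` is not.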
Key Decisions
- Beta dashboard branded via nginx sub_filter rather than a code change, keeping the build byte-identical to prod so promotion is a pure rsync (one artifact, two channels).
- Beta reuses the auto-renewing rmm/rmm-api LE cert (id=10) since Cloudflare SSL mode is "Full"; documented the Full-Strict dependency as a follow-up.
- Treated MSP360's reported per-user/destination volumes as unreliable after they failed to reconcile to B2 physical bytes; drove all delete decisions from B2 ground-truth scans instead.
- Used the freshness check (newest object timestamp) to resolve dual-destination accounts: rohrbach 13.23 TB is still live on generic (not reclaimable); bestmassage 2.47 TB is frozen since 2024-08 (migrated to ACG-PST, reclaimable).
- Removed purge rules on active accounts immediately as a protective (non-destructive) action: removing a lifecycle rule deletes no data, and 13 TB of live backup was at risk.
- Narrowed the seastman purge from account-root to CBB_GTI-EXCHANGE only after Mike confirmed GTI-EXCHANGE but wanted to check MAS90SVR; account-root purges are risky precisely because they sweep in sibling machines.
- Held beckykahn (475 GB) from purge: it stopped backing up 2026-04-14 (recent), so it is a possibly-broken backup, not a long-abandoned orphan.
- Did not delete the SBS MSP360 user via the data-deleting routes (opaque blast radius); the metadata-only DELETE /api/Users/{id}/Account is the no-bucket-touch path for phantom records.
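The freshness check used in these decisions amounts to a three-way classification on the newest upload time under a prefix. The sketch below is illustrative only: `classify_prefix` is a hypothetical name, and the 2-day and 180-day thresholds are assumptions chosen for the example, not values taken from the session.

```python
from datetime import datetime, timedelta, timezone


def classify_prefix(
    newest_upload: datetime,
    now: datetime,
    live_within: timedelta = timedelta(days=2),    # assumed threshold
    orphan_after: timedelta = timedelta(days=180),  # assumed threshold
) -> str:
    """Classify a prefix by the age of its newest object.

    "live"        -> still backing up, never purge
    "hold"        -> stopped recently, may be a broken backup
    "reclaimable" -> frozen for a long time, candidate for purge
    """
    age = now - newest_upload
    if age <= live_within:
        return "live"
    if age >= orphan_after:
        return "reclaimable"
    return "hold"


if __name__ == "__main__":
    now = datetime(2026, 6, 3, tzinfo=timezone.utc)
    print(classify_prefix(datetime(2026, 4, 14, tzinfo=timezone.utc), now))
```

The middle "hold" band is the point of the design: a backup that stopped a few weeks ago is treated as a possible fault to investigate, not as an orphan.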
Problems Encountered
- Grok's "removal done" was a false completion — fixed by re-reading MSP360 state, which showed disabled-not-removed; root-caused to no verification loop + trusting HTTP status.
- MSP360 user-delete returned 200-but-noop (bare) then 400 "personal user" (with deleteUserData); resolved by reading /Help, finding DELETE /api/Users/{userId}/Computers, then confirming the account's "personal" guard blocks all data-delete routes, so the portal is required.
- MSP360 API DNS fails locally; worked around by pinning IP 52.6.7.137 with correct SNI (Python forced-IP HTTPS connection / curl --resolve).
- MSP360 storage numbers didn't reconcile to B2; resolved by running a full per-prefix physical scan (B2 = ground truth) and discovering prior lifecycle purges explained the phantoms.
- My account-root purge of seastman swept in MAS90SVR (which Mike wanted to keep); fixed by lifecycle-remove of the account-root + CBB_MAS90SVR rules, leaving only CBB_GTI-EXCHANGE.
- Pre-existing purge rules endangered active backups (rohrbach 13.23 TB); removed before B2's daily lifecycle pass, data confirmed intact.
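The DNS workaround above (connect to a pinned IP while still presenting the real hostname for SNI and certificate validation) can be sketched with the standard library. This is a minimal sketch, not the code used in the session; `open_pinned_tls` and `build_get` are hypothetical helper names.

```python
import socket
import ssl


def open_pinned_tls(hostname: str, ip: str, port: int = 443,
                    timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a TLS connection to `ip` while using `hostname` for SNI.

    The TCP connection goes to the pinned address, so local DNS is never
    consulted; server_hostname keeps SNI and certificate checks tied to
    the real name.
    """
    context = ssl.create_default_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise


def build_get(hostname: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET with the real hostname in Host."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


# Example (needs network access, so it is left commented out):
# with open_pinned_tls("api.mspbackups.com", "52.6.7.137") as tls:
#     tls.sendall(build_get("api.mspbackups.com", "/Help"))
```

The curl equivalent is `curl --resolve api.mspbackups.com:443:52.6.7.137 https://api.mspbackups.com/Help`, which likewise keeps the hostname in the URL so SNI and certificate validation are unchanged.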
Configuration Changes
GuruRMM repo (projects/msp-tools/guru-rmm, commit 23f43ef):
- NEW deploy/build-pipeline/build-dashboard.sh, deploy/build-pipeline/promote-dashboard.sh, deploy/nginx/rmm-beta.conf
- EDIT deploy/build-pipeline/webhook-handler.py (DASHBOARD dispatch), build-shared.sh (self-sync list), README.md, CONTEXT.md
ClaudeTools repo (commit 11d2b17): wiki/projects/gururmm.md, .claude/memory/feedback_dashboard_beta_first.md, .claude/memory/MEMORY.md, guru-rmm submodule pointer.
Server 172.16.3.30: created /var/www/gururmm/dashboard-beta/, installed /etc/nginx/sites-enabled/gururmm-beta, deployed the 4 pipeline scripts to /opt/gururmm/, restarted gururmm-webhook.
Cloudflare: added rmm-beta.azcomputerguru.com A -> 72.194.62.10 (proxied). Jupiter NPM: added proxy host id=11 (rmm-beta -> .30:80, cert id=10).
B2 bucket MSPBackups20200311: added 5 + 3-existing lifecycle purge rules (orphan purge); removed 4 rules protectively (rohrbach MIKE-THINK, diegobuilder, seastman MAS90SVR + account-root). End state: 21 rules, revision 63.
This-session detail logs: session-logs/2026-06-03-mspbackups-b2-storage-cleanup.md, docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md, session-logs/2026-06-03-recovered-create-beta-version-of-dashboard.md.
Credentials & Secrets
- MSP360 API: vault msp-tools/msp360-api.sops.yaml (login/password). SECURITY: the plaintext login/password are leaked in several already-committed session logs (2026-05-18, 2026-06-01, 2026-06-02-mike-saguaro-mspbackups-deletion, plus Grok's two 2026-06-03 files). Rotate the MSP360 API key and scrub those logs. Not re-pasted here by design.
- Backblaze B2: vault projects/claudetools/backblaze-b2.sops.yaml (key_id / credentials.application_key); the skill uses the ClaudeTools key.
- Cloudflare: vault services/cloudflare.sops.yaml (api_token_full_dns, zone_id 1beb9917c22b54be32e5215df2c227ce). NPM admin: vault services/npm.sops.yaml (mike@ / admin) + a cloudflare DNS token.
- gururmm-server sudo/SSH: vault infrastructure/gururmm-server.sops.yaml.
Infrastructure & Servers
- GuruRMM server 172.16.3.30: nginx (prod dashboard/ + beta dashboard-beta/), API :3001, webhook :9000, builds in /opt/gururmm/.
- Jupiter 172.16.3.20: NPM (admin :7818); proxy host id=11 = rmm-beta.
- MSP360 API https://api.mspbackups.com (pin 52.6.7.137); /Help = full endpoint docs.
- B2 account 46f69bc61163 (us-west-001); generic bucket MSPBackups20200311 bucketId b4268f56790bccc671010613 (51.015 TB); storage account MSPBackups AccountID ad1b19fd-9350-4fa2-9a07-6200fca14797. ACG-IX bucket has no MSP360 destination (empty). No bucket has immutability.
Commands & Outputs
- B2 per-prefix scan: py .claude/skills/b2/scripts/b2.py bucket-size MSPBackups20200311 -> 51,011,665,961,467 bytes, 11,326,538 files.
- Orphan purge: b2.py delete-prefix MSPBackups20200311 <prefixes...> --allow-account-root --confirm -> rules written (rev 62).
- Protective removal: b2.py lifecycle-remove MSPBackups20200311 "MBS-a39b0d4c-.../CBB_MIKE-THINK/" "MBS-651e4c79-.../" "MBS-c064d061-.../CBB_MAS90SVR/" "MBS-c064d061-.../" --confirm -> rev 63, 21 rules.
- MSP360 correct delete (blocked here): DELETE /api/Users/{id}?deleteUserData=true -> 400 "Not Acceptable personal user"; DELETE /api/Users/{userId}/Computers (body [{DestinationId, ComputerName}]) -> 400.
Pending / Incomplete Tasks
- SAFETY: audit ALL B2 buckets' lifecycle rules for purge rules on ACTIVE backups (coord todo 2e50f388); do first.
- After 2026-06-05: verify ~3.2 TB drop in MSPBackups20200311, then lifecycle-remove the spent purge rules (todo dc3a6233).
- Decisions (todo 0fed5eb2): MAS90SVR keep/purge; beckykahn/LAB-BECKY decommission confirm; confirm rohrbach+diegobuilder removal correct.
- Glaztech SBS removal needs the MSP360 portal (todo db03f8fe).
- Phantom MSP360 user cleanup (admin@/vland/etc.) via metadata-only /Account delete (cosmetic).
- Active-user migration off the generic bucket (~48 TB): console work; not a cost play.
- Rotate the leaked MSP360 API key + scrub plaintext from committed logs.
- beta dashboard: register rmm-beta Entra redirect URI only if SSO-on-beta wanted (JWT login works now); issue a dedicated rmm-beta cert if CF ever set to Full-Strict (todo filed earlier).
Reference Information
- beta dashboard: https://rmm-beta.azcomputerguru.com ; prod https://rmm.azcomputerguru.com ; promote sudo /opt/gururmm/promote-dashboard.sh --confirm.
- Commits: gururmm 23f43ef, ClaudeTools 11d2b17.
- Detail logs: session-logs/2026-06-03-mspbackups-b2-storage-cleanup.md, docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md.
- Coord todos: 2e50f388, dc3a6233, 0fed5eb2, db03f8fe (+ earlier beta/cert + Howard beta-first todos ccf8980e).