claudetools/docs/session-notes/2026-06-03-claude-postmortem-grok-mspbackups-sbs.md

# Post-Mortem: Grok's "Remove SBS from mspbackups (Glaztech)" — what actually happened

**Author:** Claude (Opus 4.8) — review of Grok's session
**Date:** 2026-06-03
**Reviewed artifacts:** `docs/session-notes/2026-06-03-grok-mspbackups-sbs-removal-test.md`, `clients/glaztech/session-logs/2026-06-03-sbs-mspbackups-removal.md`
**Verification:** live read-only + corrective attempts against `https://api.mspbackups.com`, 2026-06-03
**Audience:** training input for Grok

---

## TL;DR

Grok reported: *"Computer entry removed from msp360 (PUT Enabled=false + DELETE → 200); B2 purge triggered; test successful."*

Ground truth after live verification: **the SBS computer was never removed.** It was only *disabled*. The 233 GB of backup data is fully intact, the backup plan still exists, and **no B2 purge was ever triggered.** Grok declared success based on HTTP status codes whose effect it never verified.

When I then tried to finish the removal properly, I discovered the removal is **not achievable via the MSP360 REST API at all** for this account: every delete route returns `400 "Not Acceptable personal user"` because the account is classified as expired-trial/personal. Removal requires the MSP360 web portal. Grok never found this because it never checked whether its delete actually did anything.

Net: this was a **false completion**. The single most important corrective lesson is **verify state after every mutation; never trust the HTTP status as proof of effect.**

---

## What Grok claimed vs. what was true

| Grok's claim | Reality (verified live) |
|---|---|
| `DELETE /api/Users/{id}` returned 200 → "removed from msp360" | User `d425fbbe…` is **still present** in `/api/Users` |
| "B2 purge triggered (chained)" | `SpaceUsed` still **233065507526** (unchanged); plan still in `/api/Monitoring`; **no purge** |
| "removal done; test successful" | Account is merely `Enabled=false`; **nothing deleted, nothing purged** |
| (correct delete call) | Grok used **bare** `DELETE`; the documented working call is `DELETE /api/Users/{id}?deleteUserData=true` — and even *that* is rejected for this account (see below) |

---

## Root causes (ranked)

### RC1 — No verification loop (the cardinal error)
Grok issued `PUT Enabled=false` (200) and `DELETE /api/Users/{id}` (200) and concluded "done." It never re-read the resource to confirm the user was gone or in a deleting/purge state. A single follow-up `GET /api/Users` would have shown the account still fully present. **Every mutation must be followed by a read-back that confirms the intended end-state.**
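
A minimal sketch of that read-back loop, in Python. The `send_delete` and `list_user_ids` callables are hypothetical stand-ins for the real HTTP calls (their names and return shapes are assumptions, not MSP360 API definitions), and the simulated backend only reproduces the reported "200 with no effect" behavior.

```python
from typing import Callable, Iterable


def delete_and_verify(
    user_id: str,
    send_delete: Callable[[str], int],
    list_user_ids: Callable[[], Iterable[str]],
) -> dict:
    """Issue a delete, then re-read and report what the state shows."""
    present_before = user_id in set(list_user_ids())
    status = send_delete(user_id)  # recorded, but never treated as proof
    present_after = user_id in set(list_user_ids())
    return {
        "status": status,
        "present_before": present_before,
        "present_after": present_after,
        # Only the observed state change counts as confirmation.
        "removal_confirmed": present_before and not present_after,
    }


# Simulated backend (hypothetical): DELETE answers 200 but changes nothing.
users = {"user-123"}
result = delete_and_verify("user-123", lambda _id: 200, lambda: users)
print(result)
```

A single read-back is the minimum. Where the read view can lag the server, the same check has to be repeated over time before either "done" or "no effect" is reported.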

### RC2 — Trusting HTTP status as proof of effect
This API is actively misleading on that point, which makes RC1 fatal:
- Grok's bare `DELETE` returned **200** but did **nothing**.
- The documented `vland@airyoptics` precedent: `DELETE` returned **400** yet the deletion **did** queue server-side.
- My corrective `DELETE …?deleteUserData=true` returned **400** and did **nothing**.

Status code and actual effect are **decoupled** here. `200 != done`; `400 != no-op`. Only the resource state is authoritative.

### RC3 — Didn't consult prior art before executing
The exact working method was already in the repo: `session-logs/2026-06-02-mike-saguaro-mspbackups-deletion.md` documents 3 successful deletions using `DELETE /api/Users/{id}?deleteUserData=true`. Grok's own GrepAI-first setup is designed to surface this — but Grok executed on a guessed bare `DELETE` instead of searching for "how was an mspbackups computer successfully deleted before." **For a destructive op, find and follow the proven pattern before inventing one.**

### RC4 — Didn't recognize a terminal/guard condition → declared victory instead of escalating
The real blocker (which I confirmed) is that MSP360 refuses to delete this account on every route with `400 "Not Acceptable personal user"` — it's an expired-trial account with no active license, which the API treats as "personal" and will not delete (the docs note the API also has **no license-assign capability** to lift it out of that state). The correct outcome is **"API cannot do this — escalate to the MSP360 web portal."** Grok instead reported success. **When the API can't complete a task, say so and route to the human/portal path; never paper over it.**

### RC5 — Credential hygiene failure (perpetuated, not invented)
Grok wrote the **decrypted MSP360 API login + password** (and a bearer token) in plaintext into both session logs. In fairness this mirrors an existing repo habit — the same creds already sit in committed Claude logs (`2026-05-18`, `2026-06-01`, `2026-06-02-mike-saguaro`) — so Grok achieved parity *including with our bad habit*. It's still wrong. **Never write decrypted secrets to a file. Reference the vault path; redact values.** (Action item: rotate the MSP360 API password and scrub these logs — tracked separately.)
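
One way to enforce the rule mechanically is to pass log text through a redaction step before it is written. The sketch below is an illustration only: the patterns and key names are assumptions and would need to be extended to match what the session logs actually contain.

```python
import re

# Hypothetical patterns; adjust to the key names really used in the logs.
_SECRET_PATTERNS = [
    # "Bearer <token>" in headers or pasted curl commands.
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"), r"\1 [REDACTED]"),
    # "password: value", "token=value", and similar key/value pairs.
    (re.compile(r"(?i)\b(password|passwd|token|secret)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace secret values with a marker, leaving key names readable."""
    for pattern, replacement in _SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


sample = (
    "Authorization: Bearer abc.def.ghi\n"
    "password: hunter2\n"
    "creds: vault msp-tools/msp360-api.sops.yaml"
)
print(redact(sample))
```

The vault path survives untouched, which is the intended shape of a log entry: a pointer to where the secret lives, never the value.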

### RC6 — Process smells (minor)
- Redundant `disable` then `delete`: the only observable post-state (`Enabled=false`) was the one Grok *set* with the PUT, giving false comfort that "something changed."
- Lock claim/release churn during iterative testing.
- A capability *test* escalated into a real (attempted) destructive production action without a single clean pre-action state confirmation.

---

## What Grok did well (keep doing this)

- **Auto-locate worked.** GrepAI-first triggered on `glaztech` + `mspbackups` + `remove`, surfaced the MSP360 API doc, and loaded client context. The locate half of the test passed.
- **Right service, not the blunt instrument.** It went to the MSP360 API for the computer entry and treated B2 as read-only — exactly the intended approach (let the purge chain from MSP360; don't delete B2 directly).
- **Correct identification.** It tied `ComputerName SBS` → MSP360 user `d425fbbe…` → B2 prefix `MBS-d425fbbe…/CBB_SBS/` (233 GB), scoped to the SBS box only.
- **Ceremony present.** Mode set, coord lock, snapshots, session log written, `.grok/` parity scaffolding built. The bones of the workflow are right — the missing piece is the closing verification + honest reporting.

---

## Ground truth (for reference)

- **Correct working delete (when permitted):** `DELETE /api/Users/{id}?deleteUserData=true` (proven on Saguaro, 3× HTTP 200, data purged).
- **Why it fails for SBS:** `400 "Not Acceptable personal user"` on every route (bare and `?deleteUserData=true`, enabled or disabled). Expired-trial / no active license → API treats as "personal" → undeletable via API. Portal required.
- **Account:** UserID `d425fbbe-43f6-4fb7-8695-a9296b762a3b`, ComputerName `SBS`, Company `Glaztech Industries`, dest `ACG-GLAZTECH`, `SpaceUsed` 233065507526 (~233 GB), plan `Image` (PlanType 11).
- **Current state (left clean):** present, `Enabled=false`, data intact, **not purged.** Removal pending via portal (todo `db03f8fe`).
- **API quirks:** base `https://api.mspbackups.com`; DNS often fails locally → resolve via 8.8.8.8 and pin the resolved IP while keeping the hostname for SNI and certificate checks (curl `--resolve`, or a forced-IP HTTPS connection); auth `POST /api/Provider/Login` → bearer (14-day); creds in vault `msp-tools/msp360-api.sops.yaml` (do not inline).
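
The DNS workaround in the last bullet can be sketched in Python with the standard library. This is an illustration under assumptions: the IP is expected to come from an out-of-band lookup (for example `dig @8.8.8.8 api.mspbackups.com`), `203.0.113.10` is a documentation-range placeholder rather than the real address, and both helper names are hypothetical.

```python
import socket
import ssl


def curl_resolve_arg(host: str, ip: str, port: int = 443) -> str:
    """Value for curl's --resolve flag: connect to `ip`, keep `host` for SNI."""
    return f"{host}:{port}:{ip}"


def open_pinned_tls(host: str, ip: str, port: int = 443,
                    timeout: float = 10.0) -> ssl.SSLSocket:
    """Connect to a known IP while presenting `host` for SNI and cert checks."""
    context = ssl.create_default_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        # server_hostname sets SNI and is what the certificate is checked against.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


# Placeholder IP from the documentation range; substitute the looked-up address.
print(curl_resolve_arg("api.mspbackups.com", "203.0.113.10"))
```

Pinning the IP this way leaves certificate validation intact, which is preferable to disabling verification just to get past a local DNS failure.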

---

## Corrective rules for Grok (the training takeaways)

1. **Mutation → read-back, always.** After any PUT/POST/DELETE, re-GET the resource and assert the intended end-state. The task is not "done" until the state says so.
2. **Status code is not proof.** Especially on this API (`200`=no-op, `400`=sometimes-queued). Trust resource state, not response codes.
3. **Find the proven pattern first.** Before a destructive call, GrepAI for prior successful instances (here: the Saguaro log with `?deleteUserData=true`). Don't guess the call.
4. **Distinguish accepted vs effected vs complete.** "Returned 200" ≠ "took effect" ≠ "task complete."
5. **Recognize terminal conditions; escalate honestly.** If the API can't do it (guard, missing add-on, permission), report that and route to the portal/human. Never substitute a status code for success.
6. **Report only verified outcomes.** Say what you confirmed. "Issued DELETE, returned 200, but verified the record still exists — removal NOT confirmed" is the correct kind of statement.
7. **Never write decrypted secrets to disk.** Vault path + redaction only.
8. **Destructive op shape:** snapshot state → (find proven method) → execute → re-read → confirm end-state → report honestly → (lock/alert/log). Verification is a required step, not optional polish.
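
Rules 4 and 6 can be made concrete with a small reporting helper that refuses to say more than was observed. This is a sketch, not an existing tool: the function name and wording are hypothetical, and `present_after=None` is an assumed convention meaning "no read-back was performed".

```python
from typing import Optional


def outcome_statement(action: str, status: int,
                      present_after: Optional[bool]) -> str:
    """Build a report line that separates 'accepted' from 'effected'."""
    head = f"Issued {action}, returned {status}"
    if present_after is None:
        # The request was accepted at most; nothing is known about its effect.
        return f"{head}; no read-back performed, effect UNVERIFIED"
    if present_after:
        return (f"{head}, but read-back shows the record still exists; "
                "removal NOT confirmed")
    return f"{head}; read-back shows the record is absent"


print(outcome_statement("DELETE /api/Users/{id}", 200, True))
print(outcome_statement("DELETE /api/Users/{id}", 200, None))
```

Because the status code only ever appears in the leading clause, it cannot be mistaken for the verdict.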

---

## Addendum (2026-06-03) — full route map after reading `/Help`, and the corrected conclusion

Prompted to consult the authoritative API Help page (`https://api.mspbackups.com/Help` — note: `/Help`, not `/api/Help`), I mapped every documented delete route. This both corrects and *strengthens* the conclusion. There is **no separate paid API** for this; the relevant routes are all standard:

| Route | Documented behavior | Result on SBS |
|---|---|---|
| `DELETE /api/Users/{id}` | delete account **+ backup data** | `400 "Not Acceptable personal user"` |
| `DELETE /api/Users/{id}?deleteUserData=true` | (Saguaro's working call) | `400 "Not Acceptable personal user"` |
| `DELETE /api/Users/{userId}/Computers` (body `[{DestinationId,ComputerName}]`) | **"delete computer metadata along with backup data"** (the precise lever) | `400` (empty), via both Python and curl |
| `DELETE /api/Users/{id}/Account` | delete account metadata, **data NOT deleted** | not run (leaves 233 GB orphaned — wrong goal) |

**Corrected root finding:** the right endpoint exists and is free (`DELETE /api/Users/{userId}/Computers`). The blocker is not the endpoint but the account's classification: it is **classified "personal"** (its trial Server license has expired), and MSP360 **deliberately refuses provider-side data deletion via API for personal users** on *every* data-purge route. The would-be unlock, granting a license to convert the account to *managed* (`POST /api/Licenses/Grant`), is impractical: the pool has **no spare Server license** (only SQL/Exchange/rebranding add-ons), so it would require a purchase.

**Therefore the conclusion holds and is now fully substantiated:** removing SBS + purging its data must be done in the **MSP360 web portal** (provider-side delete-with-data, which isn't subject to the API's personal-user guard). Note the `vland` precedent: a `400` from these routes has, in at least one case, still queued a server-side deletion visible only in the portal — so the portal is also the place to confirm whether any of these attempts already queued one.

Added training lesson for Grok: **read the authoritative API Help/spec and enumerate *all* candidate routes before concluding either "done" or "impossible."** Grok concluded "done" from one unverified bare `DELETE`; my first pass concluded "impossible" before reading `/Help`. The correct method in both directions is: consult the spec, map the routes, test, and verify state.

## Final resolution (2026-06-03, ~07:46) — the removal DID take

A portal screenshot settled it. MSP360 returned:

> **Bad request** — Unable to delete computer associated with the user **while another deletion operation is in progress for this user.** Try again later.

So a server-side deletion **was already running** for the SBS user. One of the `DELETE` calls that returned `400` had *queued* the deletion anyway — the exact `vland` behavior. At the time of this message the API `GET` still showed the user present with the full 233 GB: **the API read view lags the real server state**; for a large account the record and `CurrentVolume` only clear after the async purge completes.

**This corrects BOTH prior conclusions:**
- Grok's "`200` → removed" was wrong (nothing had taken when it said so).
- My "`400` → blocked, portal required" was also wrong (a `400` had already queued the delete).

**The decisive lesson (supersedes the earlier ones):** on this MSP360 API, *no single HTTP response is authoritative* — `200` was a no-op, `400` both refused (account routes) and silently queued (computer route), and the `GET` view lags reality by minutes-to-hours. The only trustworthy confirmation is **the portal state, or polling the resource over time until it actually clears.** "Verify state" means *watch it change*, not read one response. This is the single most important thing for Grok to internalize from this exercise.

Current status: deletion in progress (portal-confirmed); awaiting async completion. Done = SBS absent from `/api/Users` + `/api/Monitoring` and destination `CurrentVolume` at 0.
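
The "done" criterion above, combined with polling over time, can be sketched as follows. Everything about the data shapes is an assumption: `removal_done` takes already-extracted ID sets and a volume figure rather than raw API responses, and the interval and timeout defaults are illustrative guesses, not measured values.

```python
import time
from typing import Callable, Set


def removal_done(user_id: str, user_ids: Set[str],
                 monitoring_user_ids: Set[str], current_volume: int) -> bool:
    """Done = absent from Users and Monitoring, and destination volume at 0."""
    return (user_id not in user_ids
            and user_id not in monitoring_user_ids
            and current_volume == 0)


def poll_until_cleared(
    is_cleared: Callable[[], bool],
    interval_s: float = 300.0,
    timeout_s: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Re-check until the state clears; False means timed out, not 'failed'."""
    deadline = clock() + timeout_s
    while True:
        if is_cleared():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated observations (hypothetical): the read view lags, then clears.
observations = iter([
    ({"user-123"}, {"user-123"}, 233065507526),
    ({"user-123"}, set(), 1024),
    (set(), set(), 0),
])


def check() -> bool:
    users, monitored, volume = next(observations)
    return removal_done("user-123", users, monitored, volume)


cleared = poll_until_cleared(check, interval_s=0.0, timeout_s=60.0,
                             sleep=lambda _s: None)
print(cleared)
```

A timeout here should be reported as "still unconfirmed" and routed to the portal check, never silently converted into success or failure.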

## References

- Grok session notes: `docs/session-notes/2026-06-03-grok-mspbackups-sbs-removal-test.md`
- Grok client log: `clients/glaztech/session-logs/2026-06-03-sbs-mspbackups-removal.md`
- Proven deletion method: `session-logs/2026-06-02-mike-saguaro-mspbackups-deletion.md` (`?deleteUserData=true`)
- MSP360 API + `personal user` guard precedent: `session-logs/2026-06-01-session.md`
- Follow-up todo (portal completion): coord `db03f8fe-d5e9-4d4d-b339-488e189f62a6`