diff --git a/clients/birth-biologic/session-logs/2026-06/2026-06-30-mike-birthbio-quality-consolidation-corruption.md b/clients/birth-biologic/session-logs/2026-06/2026-06-30-mike-birthbio-quality-consolidation-corruption.md
new file mode 100644
index 00000000..a6555f5d
--- /dev/null
+++ b/clients/birth-biologic/session-logs/2026-06/2026-06-30-mike-birthbio-quality-consolidation-corruption.md
@@ -0,0 +1,153 @@
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-5070
+- **Role:** admin
+
+## Session Summary
+
+Consolidated BirthBiologic's Quality content in SharePoint and reconciled it against the Datto
+Workplace source of truth, then handled the migration-corruption fallout. The session began by
+ensuring sysadmin@birthbiologic.com had access to the BirthBio SharePoint content sites: granted
+owner+member on the canonical **Quality Systems Department** (QSD) site via its M365 group (he had
+no access); confirmed existing owner access on Donor Services, Supply Management, and the old
+Quality Department. The `[admin]` communication site and four empty `Quality Systems Department-*`
+spoke sites have no M365 group, so they require a site-collection-admin grant via PnP (commands
+DM'd to Mike; not runnable by the app — `Unsupported app only token` on SP REST).
+
+Investigated why two Quality sites existed (`Quality Department`, created 2026-04-20, the original
+migration landing site; `Quality Systems Department`, created 2026-06-02, canonical) and treated
+Datto Workplace on ACG-DWP-X-BB as the authoritative source. Enumerated the Datto "Quality
+Department" project (3,812 files / 28 GB). Reconciled QSD to Datto: removed 811 byte-identical
+duplicate files (kept the Datto-aligned copy of each, verified by quickXorHash), removed 195
+SP-only files older than the current week (kept 11 recent), and backfilled 31 files Datto had but
+QSD lacked (30 via server-side copy from the old site, 1 — `2025.requested items for bone bank
+audit.pdf` — uploaded from the Datto host via a pre-authenticated upload session). Final QSD ~3,766
+files with 0 Datto files missing; the only remaining diff is 13 files under a renamed folder
+(`Lab Log.2020 to Current` vs Datto's `2020 to 2024`).
+
+Archived the duplicate site: copied the divergent `Processor Contact Information...Surgenex
+8.21.24.xlsx` from the old site into QSD alongside its counterpart, ran a pre-delete safety delta
+(0 unaccounted recent edits), then deleted the old `Quality Department` M365 group/site. The Tenant
+Admin app 403s on group DELETE (has GroupMember write, not Group.ReadWrite.All); the **User Manager
+app** performed the delete (HTTP 204). Group soft-deleted (restorable ~30 days; site recycle ~93 days).
+
+Then handled migration corruption: the Surgenex xlsx Mike tried to open errored ("file format not
+valid"). Diagnosed the byte-array->decimal-text corruption (header `80 75 3 4...` instead of
+`PK\x03\x04`); the 64 KB QSD copy was corrupt, the 20 KB old-site copy clean and newer — deleted the
+corrupt copy and renamed the clean one to canonical (after a WOPI lock cleared). A QSD-wide sweep
+found **81 corrupt files** (60 docx/17 pdf/2 xlsx/2 doc), more than a parallel session's earlier 47
+(my reconciliation propagated corrupt orphans via byte-preserving copy). Coordinated with the
+parallel session (coord todo 28e3e7ab): appended reconciliation caveats + the live 81 count as
+child todos, and graduated their recovery tool `bb-recover.py` from session scratchpad into the repo
+at `clients/birth-biologic/scripts/`. Recovery itself remains deferred per Mike.
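The byte-array-to-decimal-text signature described above (`80 75 3 4...` where `PK\x03\x04` should be) can be illustrated with a small detector. This is only a sketch and is not the logic of `bb-recover.py`: the separator set (whitespace or commas), the magic-number table, and the function names `looks_decimal_text`, `decode_decimal_text`, and `classify` are all assumptions made for illustration.

```python
import re

# Well-known file signatures for the formats named in the log.
# The labels are illustrative only.
MAGIC = {
    b"PK\x03\x04": "zip container (docx/xlsx)",
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0": "OLE2 (legacy doc/xls)",
}


def looks_decimal_text(blob: bytes, probe: int = 4096) -> bool:
    """True if the first `probe` bytes contain only digits and separators."""
    head = blob[:probe]
    return re.fullmatch(rb"\s*\d[\d\s,]*", head) is not None


def decode_decimal_text(blob: bytes) -> bytes:
    """Turn b'80 75 3 4' back into b'PK\\x03\\x04'.

    Assumes the separator is whitespace and/or commas (not confirmed by the
    log). Raises ValueError if a token is not an integer in 0..255.
    """
    tokens = [tok for tok in re.split(rb"[\s,]+", blob.strip()) if tok]
    return bytes(int(tok) for tok in tokens)


def classify(blob: bytes) -> str:
    """Label a file body as clean, corrupt (decimal-text), or unknown."""
    for magic, name in MAGIC.items():
        if blob.startswith(magic):
            return "clean: " + name
    if looks_decimal_text(blob):
        try:
            recovered = decode_decimal_text(blob)
        except ValueError:
            return "unknown"
        for magic, name in MAGIC.items():
            if recovered.startswith(magic):
                return "corrupt: " + name
    return "unknown"
```

A decimal-text copy is roughly three to four characters per original byte, which is consistent with the 64 KB corrupt file versus the 20 KB clean one, though the two copies were different versions, so the ratio is only suggestive.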
+
+Finally, on a context switch, ran a read-only DC health verification on Peaceful Spirit's PST-SERVER
+(192.168.0.2) via RMM: PST-SERVER healthy (all 5 FSMO, dcdiag clean, SYSVOL/NETLOGON shared, SYSVOL
+Event 4602 from the 06-13 D4 restore), but PST-SERVER2 unreachable (ping fails, RPC 1722, ~16 days
+stale) — Gate-4 DFSR work blocked until SERVER2 is back online.
+
+## Key Decisions
+
+- **Datto = source of truth** for the Quality reconciliation; the Datto-aligned path is the keeper
+  in any duplicate pair. Confirmed dupes removed only when a byte-identical Datto-aligned twin
+  remained in QSD (no unique data lost).
+- **Did not bulk-delete by path.** Path comparison flagged 1,016 SP files "not in Datto," but 921
+  were the same files reorganized into different folders. Switched to hash-based dedup to avoid
+  destroying real data; surfaced this to Mike rather than executing the literal rule.
+- **Backfill via server-side copy from the old site** (not Datto host upload) for 30 of 31 files —
+  cheaper and byte-identical; only the 1 file absent from the old site needed a host upload.
+- **In-place reconstruct is the right recovery method** (the parallel session's bb-recover.py
+  PUTs recovered bytes to the same item id, preserving share links) — better than my re-copy
+  approach. Graduated their tool rather than writing a competing one.
+- **Deleted the old site via the User Manager app** after the Tenant Admin app 403'd on group
+  delete — different app tier for the privilege.
+- **Surgenex fix by swap, not reconstruct** — a clean newer copy existed on the old site, so
+  swapping it in was more reliable than reconstructing the corrupt bytes.
+- **All destructive ops gated behind dry-runs + recycle bin** — every deletion recoverable ~93 days.
+
+## Problems Encountered
+
+- **Path-based "not in Datto" was misleading** — 921 of 1,016 were reorg copies; resolved by
+  switching to content-hash dedup.
+- **Timestamp filter poisoned by migration copies** — 1,007 orphans carried today's date from my
+  earlier copy; used the old site's lastModified as the true-age signal instead.
+- **Tenant Admin app cannot delete M365 groups** (403, GroupMember-only) and cannot manage SP site
+  state (`Unsupported app only token`) — used the User Manager app for group delete; SP site
+  lock/spoke-site grants pushed to Mike via PnP.
+- **WOPI lock (HTTP 423)** blocked the Surgenex swap while the file was open in Excel; retried after
+  Mike closed it.
+- **RMM stdout capped** the base64 of the 6.8 MB file (got 786 KB of 6.8 MB) — switched to a
+  pre-authenticated Graph upload session so the host PUT the bytes directly.
+- **bash `${var/pat/repl}` mangled the upload URL** (& separators) — rewrote the PS script via
+  Python and dispatched with `jq --rawfile`.
+- **Corruption scan kwarg bug** (`requests.get got multiple values for 'headers'`) — fixed the
+  retry wrapper to merge headers.
+- **errorlog.md rebase conflict** on sync (concurrent top-prepends from another session) — resolved
+  keeping all entries.
+- **bb-recover.py was stranded in another session's scratchpad** (session-local path) — graduated
+  to the repo so any machine can run it.
+
+## Configuration Changes
+
+- Created `clients/birth-biologic/docs/migration/2026-06-29-quality-dept-archival-plan.md` (then
+  marked COMPLETED).
+- Created `clients/birth-biologic/scripts/bb-recover.py` (graduated recovery tool; pushed in 801ff788).
+- Modified `errorlog.md` (friction: group-delete app gap; + conflict resolution).
+- SharePoint (QSD, tenant 19a568e8-...): +sysadmin@ owner/member on QSD group; 811 dupes deleted;
+  195 stale SP-only deleted; 31 files backfilled; Surgenex corrupt copy deleted + clean copy renamed.
+- Deleted old `Quality Department` M365 group/site (soft-deleted).
+- Coord todos: appended children 5162f79f (caveats) + ac832238 (live 81 count) under 28e3e7ab.
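The kwarg bug listed under Problems Encountered (`requests.get got multiple values for 'headers'`) typically arises when a wrapper passes its own `headers=` and also forwards a caller's `headers` inside `**kwargs`. A minimal sketch of the merge-style fix follows. The wrapper name `get_with_retry`, its parameters, and the injected `send` callable are hypothetical; the actual scan script's wrapper is not shown in this log. In real use `send` would be `requests.get`, whose exceptions derive from `OSError`.

```python
import time
from typing import Any, Callable


def get_with_retry(
    send: Callable[..., Any],
    url: str,
    token: str,
    retries: int = 3,
    backoff: float = 0.0,
    **kwargs: Any,
) -> Any:
    """Call send(url, headers=..., **kwargs), retrying on OSError.

    The caller's `headers` are popped out of kwargs and merged with the
    Authorization header, so `headers` reaches `send` exactly once.
    """
    caller_headers = kwargs.pop("headers", None) or {}
    # The wrapper's Authorization header wins if the caller also set one.
    headers = {**caller_headers, "Authorization": f"Bearer {token}"}

    last_error: Exception | None = None
    for attempt in range(max(1, retries)):
        try:
            return send(url, headers=headers, **kwargs)
        except OSError as exc:  # requests.RequestException subclasses OSError
            last_error = exc
            time.sleep(backoff * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

Injecting `send` keeps the sketch runnable without network access; passing `requests.get` directly would give the same call shape.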
+
+## Credentials & Secrets
+
+- No new credentials created or discovered. Used (read-only) from vault:
+  - `msp-tools/computerguru-tenant-admin.sops.yaml` field `credentials.client_secret` (app
+    709e6eed-0711-4875-9c44-2d3518c47063) — Graph app-only against BirthBio tenant.
+  - `msp-tools/computerguru-user-manager.sops.yaml` field `credentials.client_secret` (app
+    64fac46b-8b44-41ad-93ee-7da03927576c) — used for the M365 group delete (has Group.ReadWrite.All).
+- bb-recover.py reads the Tenant Admin secret via env `BBSEC` (not embedded).
+
+## Infrastructure & Servers
+
+- **BirthBio M365 tenant:** 19a568e8-9e88-413b-9341-cbc224b39145
+- **QSD site:** birthbiologic.sharepoint.com/sites/QualitySystemsDepartment (site id
+  ...,3173c017-58bd-406a-8858-2c969667336f,...; single drive "Documents")
+- **Old (deleted) group id:** f24b2e10-2d73-49d7-ab06-fe63065301d1 (QualityDepartment@), deletedDateTime 2026-06-29T22:23:15Z
+- **Datto source host:** ACG-DWP-X-BB (172.16.3.45), RMM agent a4524e85-8a07-45d0-91b1-51ce7e2ca74a;
+  source tree `C:\Users\Public\Desktop\Datto Workplace Server Projects\Quality Department` (3,812 files)
+- **Peaceful Spirit:** PST-SERVER 192.168.0.2 (all 5 FSMO, Server 2016, RMM 87293069-33b6-45e8-a68f-6811216cdb96);
+  PST-SERVER2 192.168.1.127/192.168.1.5 (RMM 5d2d7ba0-3903-4aa3-9e97-6ca4424ffe65) — UNREACHABLE.
+
+## Commands & Outputs
+
+- Graph app-only token: POST login.microsoftonline.com/{tenant}/oauth2/v2.0/token, scope
+  graph.microsoft.com/.default, grant client_credentials.
+- Dedup keep-rule: same quickXorHash, keep copy whose path is in the Datto path set.
+- Group delete: `DELETE /groups/{id}` — 403 via Tenant Admin app, **204 via User Manager app**.
+- Pre-auth upload: `POST /drives/{id}/root:/{path}:/createUploadSession` -> host PUTs bytes to uploadUrl.
+- DC verify (PST-SERVER): dcdiag Advertising/FSMOCheck/Services PASSED; `repadmin /replsummary`
+  PST-SERVER2 fails 5/5 error 1722; ping PST-SERVER2 = False.
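The dedup keep-rule listed above (same quickXorHash, keep the copy whose path is in the Datto path set) can be sketched as a pure planning function. This is an illustration under assumptions, not the script that was run: the function name `plan_dedup`, the `(path, hash)` tuple shape, and exact-string path comparison are all hypothetical. It encodes the safety condition from Key Decisions: a copy is deleted only when a byte-identical Datto-aligned twin remains.

```python
from collections import defaultdict
from typing import Iterable


def plan_dedup(
    sp_files: Iterable[tuple[str, str]],
    datto_paths: set[str],
) -> tuple[list[str], list[str]]:
    """Return (keep, delete) path lists for a set of SharePoint files.

    sp_files: (relative_path, quick_xor_hash) pairs.
    datto_paths: relative paths that exist in the Datto source tree.
    """
    by_hash: dict[str, list[str]] = defaultdict(list)
    for path, digest in sp_files:
        by_hash[digest].append(path)

    keep: list[str] = []
    delete: list[str] = []
    for paths in by_hash.values():
        aligned = [p for p in paths if p in datto_paths]
        if len(paths) > 1 and aligned:
            # A Datto-aligned twin survives, so the other copies are safe to drop.
            keep.extend(aligned)
            delete.extend(p for p in paths if p not in datto_paths)
        else:
            # Unique content, or no Datto-aligned twin: never delete.
            keep.extend(paths)
    return sorted(keep), sorted(delete)
```

Producing a plan rather than deleting directly matches the dry-run gating described in this log: the `delete` list can be reviewed before any destructive call is made.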
+
+## Pending / Incomplete Tasks
+
+- **BirthBio QMS corruption recovery (DEFERRED, todo 28e3e7ab):** ~81 corrupt files in QSD. Run
+  `clients/birth-biologic/scripts/bb-recover.py birthbiologic.sharepoint.com:/sites/QualitySystemsDepartment`
+  (dry-run) then `--apply` (set BBSEC). Re-scan live; do NOT trust the saved 47-list. Then widen
+  the scan tenant-wide (Admin/Donor Services/Supply were in the same 6/26 corrupt batch).
+- **Spoke + [admin] site access:** Mike to run the PnP grant (DM'd) for sysadmin@ on the 5 groupless sites.
+- **Quality ticket write-up** (drafted in Mike's voice) NOT yet posted — needs ticket # (assumed
+  #32187), visibility (customer/internal), and billing decision.
+- **Surgenex two-version note** moot now (corrupt copy removed; one clean file remains).
+- **Peaceful Spirit:** PST-SERVER2 is dark — confirm whether it should be online; Gate-4 DFSR
+  (backlog drain, re-add folder/root targets) blocked until it is. If PST-SERVER was full-restored,
+  run a deeper post-restore DC pass (USN/invocationID, dcdiag /test:Replications, DNS SRV).
+- **6 files** returned fetch-errors during the corruption sweep — re-check on recovery run.
+
+## Reference Information
+
+- Coord todos: 28e3e7ab-f77d-4d4f-b2e0-15f0254155ea (parent recovery), 5162f79f-de05-49d3-9f16-546f8d11c241
+  (caveats), ac832238-69f0-421f-9cc3-e9a7537d8ede (live 81 count).
+- Pushed commit: 801ff788 (bb-recover.py graduation + errorlog).
+- Recovery tool: clients/birth-biologic/scripts/bb-recover.py (BBSEC env = tenant-admin client_secret).
+- Archival plan: clients/birth-biologic/docs/migration/2026-06-29-quality-dept-archival-plan.md
+- Peaceful Spirit runbook: clients/peaceful-spirit/AD-DC2-REBUILD-RUNBOOK.md