sync: auto-sync from GURU-5070 at 2026-06-30 15:16:30

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-30 15:16:30
2026-06-30 15:17:12 -07:00
parent f6f6aae618
commit a808200dc4


@@ -0,0 +1,153 @@
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Consolidated BirthBiologic's Quality content in SharePoint and reconciled it against the Datto
Workplace source of truth, then handled the migration-corruption fallout. The session began by
ensuring sysadmin@birthbiologic.com had access to the BirthBio SharePoint content sites: granted
owner+member on the canonical **Quality Systems Department** (QSD) site via its M365 group (the
account had no access before); confirmed existing owner access on Donor Services, Supply Management, and the old
Quality Department. The `[admin]` communication site and four empty `Quality Systems Department-*`
spoke sites have no M365 group, so they require a site-collection-admin grant via PnP (commands
DM'd to Mike; not runnable by the app — `Unsupported app only token` on SP REST).
Investigated why two Quality sites existed (`Quality Department`, created 2026-04-20, the original
migration landing site; `Quality Systems Department`, created 2026-06-02, canonical) and treated
Datto Workplace on ACG-DWP-X-BB as the authoritative source. Enumerated the Datto "Quality
Department" project (3,812 files / 28 GB). Reconciled QSD to Datto: removed 811 byte-identical
duplicate files (kept the Datto-aligned copy of each, verified by quickXorHash), removed 195
SP-only files older than the current week (kept 11 recent), and backfilled 31 files Datto had but
QSD lacked (30 via server-side copy from the old site, 1 — `2025.requested items for bone bank
audit.pdf` — uploaded from the Datto host via a pre-authenticated upload session). Final QSD ~3,766
files with 0 Datto files missing; the only remaining diff is 13 files under a renamed folder
(`Lab Log.2020 to Current` vs Datto's `2020 to 2024`).
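The keep-rule used for the duplicate cleanup can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run: the function name `plan_dedup`, the `(path, hash)` tuple shape, and the assumption that paths are already normalized to the same relative form on both sides are all hypothetical.

```python
from collections import defaultdict


def plan_dedup(sp_files, datto_paths):
    """Plan a dedup pass over SharePoint files.

    sp_files:    iterable of (relative_path, quick_xor_hash) tuples (assumed shape)
    datto_paths: set of relative paths present in the Datto source tree
    Returns (keep, delete) as sorted lists of paths.
    """
    by_hash = defaultdict(list)
    for path, digest in sp_files:
        by_hash[digest].append(path)

    keep, delete = [], []
    for paths in by_hash.values():
        aligned = [p for p in paths if p in datto_paths]
        if len(paths) == 1 or not aligned:
            # Unique content, or no Datto-aligned twin exists: touch nothing.
            keep.extend(paths)
            continue
        # A byte-identical Datto-aligned copy survives, so the rest can go.
        keep.extend(aligned)
        delete.extend(p for p in paths if p not in datto_paths)
    return sorted(keep), sorted(delete)
```

A copy is only ever planned for deletion when a twin with the same hash stays behind on a Datto path, which is what keeps the pass from losing unique data.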
Archived the duplicate site: copied the divergent `Processor Contact Information...Surgenex
8.21.24.xlsx` from the old site into QSD alongside its counterpart, ran a pre-delete safety delta
(0 unaccounted recent edits), then deleted the old `Quality Department` M365 group/site. The Tenant
Admin app 403s on group DELETE (has GroupMember write, not Group.ReadWrite.All); the **User Manager
app** performed the delete (HTTP 204). Group soft-deleted (restorable ~30 days; site recycle ~93 days).
Then handled migration corruption: the Surgenex xlsx Mike tried to open errored ("file format not
valid"). Diagnosed the byte-array->decimal-text corruption (header `80 75 3 4...` instead of
`PK\x03\x04`); the 64 KB QSD copy was corrupt, the 20 KB old-site copy clean and newer — deleted the
corrupt copy and renamed the clean one to canonical (after a WOPI lock cleared). A QSD-wide sweep
found **81 corrupt files** (60 docx/17 pdf/2 xlsx/2 doc), up from the 47 a parallel session had found
earlier, because my reconciliation propagated corrupt orphans via byte-preserving copy. Coordinated with the
parallel session (coord todo 28e3e7ab): appended reconciliation caveats + the live 81 count as
child todos, and graduated their recovery tool `bb-recover.py` from session scratchpad into the repo
at `clients/birth-biologic/scripts/`. Recovery itself remains deferred per Mike.
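For reference, the byte-array to decimal-text corruption described above can be detected and reversed along these lines. This is a hedged sketch and not the logic of `bb-recover.py`: the function names are hypothetical, and it assumes the decimal values are separated by whitespace or commas, which the header sample suggests but the log does not confirm.

```python
import re

# Assumption: a corrupt file contains only digits plus whitespace/comma separators.
_DECIMAL_DUMP = re.compile(rb"[\d\s,]+")

# Well-known magic numbers for the file types found in the sweep.
MAGIC = {
    "zip-based (docx/xlsx)": b"PK\x03\x04",
    "pdf": b"%PDF",
    "legacy OLE (doc)": b"\xd0\xcf\x11\xe0",
}


def looks_decimal_text(data: bytes, probe: int = 512) -> bool:
    """True if the start of the file looks like '80 75 3 4 ...' instead of binary."""
    head = data[:probe]
    return bool(head) and _DECIMAL_DUMP.fullmatch(head) is not None


def decode_decimal_text(data: bytes) -> bytes:
    """Turn the decimal-text dump back into the original bytes."""
    values = [int(token) for token in re.findall(rb"\d+", data)]
    if any(v > 255 for v in values):
        raise ValueError("token outside 0-255; not a byte-array dump")
    return bytes(values)


def detect_type(data: bytes):
    """Return the label of the first matching magic number, or None."""
    for label, magic in MAGIC.items():
        if data.startswith(magic):
            return label
    return None
```

Checking the decoded bytes against a known magic number before writing anything back is a cheap guard against decoding a file that only happens to look like a dump.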
Finally, on a context switch, ran a read-only DC health verification on Peaceful Spirit's PST-SERVER
(192.168.0.2) via RMM: PST-SERVER healthy (all 5 FSMO, dcdiag clean, SYSVOL/NETLOGON shared, SYSVOL
Event 4602 from the 06-13 D4 restore), but PST-SERVER2 unreachable (ping fails, RPC 1722, ~16 days
stale) — Gate-4 DFSR work blocked until SERVER2 is back online.
## Key Decisions
- **Datto = source of truth** for the Quality reconciliation; the Datto-aligned path is the keeper
in any duplicate pair. Confirmed dupes removed only when a byte-identical Datto-aligned twin
remained in QSD (no unique data lost).
- **Did not bulk-delete by path.** Path comparison flagged 1,016 SP files "not in Datto," but 921
were the same files reorganized into different folders. Switched to hash-based dedup to avoid
destroying real data; surfaced this to Mike rather than executing the literal rule.
- **Backfill via server-side copy from the old site** (not Datto host upload) for 30 of 31 files —
cheaper and byte-identical; only the 1 file absent from the old site needed a host upload.
- **In-place reconstruct is the right recovery method** (the parallel session's bb-recover.py
PUTs recovered bytes to the same item id, preserving share links) — better than my re-copy
approach. Graduated their tool rather than writing a competing one.
- **Deleted the old site via the User Manager app** after the Tenant Admin app 403'd on group
delete; the User Manager app holds Group.ReadWrite.All, which the Tenant Admin app lacks.
- **Surgenex fix by swap, not reconstruct** — a clean newer copy existed on the old site, so
swapping it in was more reliable than reconstructing the corrupt bytes.
- **All destructive ops gated behind dry-runs + recycle bin** — every deletion recoverable ~93 days.
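The difference between the path-based and hash-based views of "not in Datto" can be illustrated with a small classifier. This is a sketch under assumptions: it presumes a comparable content hash is available for both the SharePoint and Datto sides (the log only confirms quickXorHash on the SharePoint side), and the function and bucket names are hypothetical.

```python
def classify_sp_files(sp_files, datto_files):
    """Bucket SharePoint files relative to the Datto source.

    sp_files, datto_files: dicts of relative_path -> content hash (assumed shape).
    """
    datto_hashes = set(datto_files.values())
    buckets = {"same_path": [], "reorganized": [], "sp_only": []}
    for path, digest in sorted(sp_files.items()):
        if path in datto_files:
            buckets["same_path"].append(path)
        elif digest in datto_hashes:
            # Same content, different folder: a path-only diff would
            # wrongly flag this as "not in Datto".
            buckets["reorganized"].append(path)
        else:
            buckets["sp_only"].append(path)
    return buckets
```

Only the `sp_only` bucket is a candidate for the age-based cleanup; the `reorganized` bucket is real data that simply moved.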
## Problems Encountered
- **Path-based "not in Datto" was misleading** — 921 of 1,016 were reorg copies; resolved by
switching to content-hash dedup.
- **Timestamp filter poisoned by migration copies** — 1,007 orphans carried today's date from my
earlier copy; used the old site's lastModified as the true-age signal instead.
- **Tenant Admin app cannot delete M365 groups** (403, GroupMember-only) and cannot manage SP site
state (`Unsupported app only token`) — used the User Manager app for group delete; SP site
lock/spoke-site grants pushed to Mike via PnP.
- **WOPI lock (HTTP 423)** blocked the Surgenex swap while the file was open in Excel; retried after
Mike closed it.
- **RMM stdout capped** the base64 transfer of the 6.8 MB file at 786 KB, so switched to a
pre-authenticated Graph upload session that let the host PUT the bytes directly.
- **bash `${var/pat/repl}` mangled the upload URL** (the `&` query separators), so generated the
PowerShell script with Python instead and dispatched it with `jq --rawfile`.
- **Corruption scan kwarg bug** (`requests.get got multiple values for 'headers'`) — fixed the
retry wrapper to merge headers.
- **errorlog.md rebase conflict** on sync (concurrent top-prepends from another session) — resolved
keeping all entries.
- **bb-recover.py was stranded in another session's scratchpad** (session-local path) — graduated
to the repo so any machine can run it.
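The header-merge fix for the retry wrapper might look something like the sketch below. The real wrapper is not shown in this log, so the name `get_with_retry`, its parameters, and the injected `send` callable (standing in for `requests.get`) are assumptions made to keep the example self-contained.

```python
import time


def get_with_retry(send, url, auth_headers, retries=3, backoff=0.5, **kwargs):
    """Call send(url, headers=..., **kwargs) with retries (retries must be >= 1).

    The buggy shape was send(url, headers=auth_headers, **kwargs): if the
    caller also passed headers=..., Python raised "got multiple values for
    'headers'". Popping and merging avoids passing the keyword twice.
    """
    headers = {**auth_headers, **kwargs.pop("headers", {})}
    last_error = None
    for attempt in range(retries):
        try:
            return send(url, headers=headers, **kwargs)
        except OSError as exc:  # requests' exceptions derive from OSError
            last_error = exc
            time.sleep(backoff * (2 ** attempt))
    raise last_error
```

With `requests` installed, a call would look like `get_with_retry(requests.get, url, {"Authorization": "Bearer ..."}, headers={"Accept": "application/json"})`; caller-supplied headers win on key collisions.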
## Configuration Changes
- Created `clients/birth-biologic/docs/migration/2026-06-29-quality-dept-archival-plan.md` (then
marked COMPLETED).
- Created `clients/birth-biologic/scripts/bb-recover.py` (graduated recovery tool; pushed in 801ff788).
- Modified `errorlog.md` (friction: group-delete app gap; + conflict resolution).
- SharePoint (QSD, tenant 19a568e8-...): +sysadmin@ owner/member on QSD group; 811 dupes deleted;
195 stale SP-only deleted; 31 files backfilled; Surgenex corrupt copy deleted + clean copy renamed.
- Deleted old `Quality Department` M365 group/site (soft-deleted).
- Coord todos: appended children 5162f79f (caveats) + ac832238 (live 81 count) under 28e3e7ab.
## Credentials & Secrets
- No new credentials created or discovered. Used (read-only) from vault:
- `msp-tools/computerguru-tenant-admin.sops.yaml` field `credentials.client_secret` (app
709e6eed-0711-4875-9c44-2d3518c47063) — Graph app-only against BirthBio tenant.
- `msp-tools/computerguru-user-manager.sops.yaml` field `credentials.client_secret` (app
64fac46b-8b44-41ad-93ee-7da03927576c) — used for the M365 group delete (has Group.ReadWrite.All).
- bb-recover.py reads the Tenant Admin secret via env `BBSEC` (not embedded).
## Infrastructure & Servers
- **BirthBio M365 tenant:** 19a568e8-9e88-413b-9341-cbc224b39145
- **QSD site:** birthbiologic.sharepoint.com/sites/QualitySystemsDepartment (site id
...,3173c017-58bd-406a-8858-2c969667336f,...; single drive "Documents")
- **Old (deleted) group id:** f24b2e10-2d73-49d7-ab06-fe63065301d1 (QualityDepartment@), deletedDateTime 2026-06-29T22:23:15Z
- **Datto source host:** ACG-DWP-X-BB (172.16.3.45), RMM agent a4524e85-8a07-45d0-91b1-51ce7e2ca74a;
source tree `C:\Users\Public\Desktop\Datto Workplace Server Projects\Quality Department` (3,812 files)
- **Peaceful Spirit:** PST-SERVER 192.168.0.2 (all 5 FSMO, Server 2016, RMM 87293069-33b6-45e8-a68f-6811216cdb96);
PST-SERVER2 192.168.1.127/192.168.1.5 (RMM 5d2d7ba0-3903-4aa3-9e97-6ca4424ffe65) — UNREACHABLE.
## Commands & Outputs
- Graph app-only token: POST login.microsoftonline.com/{tenant}/oauth2/v2.0/token, scope
graph.microsoft.com/.default, grant client_credentials.
- Dedup keep-rule: same quickXorHash, keep copy whose path is in the Datto path set.
- Group delete: `DELETE /groups/{id}` — 403 via Tenant Admin app, **204 via User Manager app**.
- Pre-auth upload: `POST /drives/{id}/root:/{path}:/createUploadSession` -> host PUTs bytes to uploadUrl.
- DC verify (PST-SERVER): dcdiag Advertising/FSMOCheck/Services PASSED; `repadmin /replsummary`
PST-SERVER2 fails 5/5 error 1722; ping PST-SERVER2 = False.
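The token and upload-session calls listed above can be sketched with the standard library as follows. This only builds the requests and does not send them; the helper names are hypothetical, and the `conflictBehavior` value of `fail` is an illustrative choice, not necessarily what the session used.

```python
import json
import urllib.parse
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def token_request(tenant_id, client_id, client_secret):
    """Build the app-only (client_credentials) token request."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
        "grant_type": "client_credentials",
    }).encode()
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    return urllib.request.Request(url, data=body, method="POST")


def upload_session_request(token, drive_id, item_path):
    """Build the createUploadSession request for a path under the drive root."""
    quoted = urllib.parse.quote(item_path.strip("/"))
    url = f"{GRAPH}/drives/{drive_id}/root:/{quoted}:/createUploadSession"
    body = json.dumps({"item": {"@microsoft.graph.conflictBehavior": "fail"}}).encode()
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def content_ranges(total, chunk=10 * 327_680):
    """Yield Content-Range values for each PUT to the returned uploadUrl.

    Graph expects fragment sizes in multiples of 320 KiB (327,680 bytes),
    except for the final fragment.
    """
    for start in range(0, total, chunk):
        end = min(start + chunk, total) - 1
        yield f"bytes {start}-{end}/{total}"
```

The `uploadUrl` in the session response is pre-authenticated, so the host doing the PUTs needs no bearer token of its own; that is what made it usable from the Datto host via RMM.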
## Pending / Incomplete Tasks
- **BirthBio QMS corruption recovery (DEFERRED, todo 28e3e7ab):** ~81 corrupt files in QSD. Run
`clients/birth-biologic/scripts/bb-recover.py birthbiologic.sharepoint.com:/sites/QualitySystemsDepartment`
(dry-run) then `--apply` (set BBSEC). Re-scan live; do NOT trust the saved 47-list. Then widen
the scan tenant-wide (Admin/Donor Services/Supply were in the same 6/26 corrupt batch).
- **Spoke + [admin] site access:** Mike to run the PnP grant (DM'd) for sysadmin@ on the 5 groupless sites.
- **Quality ticket write-up** (drafted in Mike's voice) NOT yet posted — needs ticket # (assumed
#32187), visibility (customer/internal), and billing decision.
- **Surgenex two-version note** moot now (corrupt copy removed; one clean file remains).
- **Peaceful Spirit:** PST-SERVER2 is dark — confirm whether it should be online; Gate-4 DFSR
(backlog drain, re-add folder/root targets) blocked until it is. If PST-SERVER was full-restored,
run a deeper post-restore DC pass (USN/invocationID, dcdiag /test:Replications, DNS SRV).
- **6 files** returned fetch-errors during the corruption sweep — re-check on recovery run.
## Reference Information
- Coord todos: 28e3e7ab-f77d-4d4f-b2e0-15f0254155ea (parent recovery), 5162f79f-de05-49d3-9f16-546f8d11c241
(caveats), ac832238-69f0-421f-9cc3-e9a7537d8ede (live 81 count).
- Pushed commit: 801ff788 (bb-recover.py graduation + errorlog).
- Recovery tool: clients/birth-biologic/scripts/bb-recover.py (BBSEC env = tenant-admin client_secret).
- Archival plan: clients/birth-biologic/docs/migration/2026-06-29-quality-dept-archival-plan.md
- Peaceful Spirit runbook: clients/peaceful-spirit/AD-DC2-REBUILD-RUNBOOK.md