sync: auto-sync from GURU-5070 at 2026-06-29 15:30:34

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-29 15:30:34
2026-06-29 15:31:29 -07:00
parent a88a360450
commit 9a6e1157a7
9 changed files with 275 additions and 1014 deletions


@@ -0,0 +1,141 @@
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Multi-client continuation session covering Dataforth, Birth Biologic, and Peaceful Spirit, plus
closing out a prior /save. First, the Birth Biologic Calm Ops media upload (background task from the
prior session) was found to have failed: 33 docs uploaded but all 10 large media files died at chunk
0 (connection-closed-on-send + 503) using 60 MB chunks. The uploader was fixed (10 MiB chunks,
Expect:100-continue disabled, `-MediaOnly` switch to skip the already-uploaded docs); re-upload was
prepared but the user indicated the media was handled, so it was not re-run.
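The chunking fix can be illustrated with a small planning sketch. It assumes the uploader targets a Microsoft Graph upload session, which this log does not state outright; Graph asks for chunk sizes that are multiples of 320 KiB, and 10 MiB (32 × 320 KiB) satisfies that. `plan_chunks` and its parameters are hypothetical names, not taken from `upload-calmops.ps1`.

```python
CHUNK_ALIGN = 320 * 1024        # Graph upload sessions expect multiples of 320 KiB
CHUNK_SIZE = 10 * 1024 * 1024   # 10 MiB, the size the fixed uploader uses


def plan_chunks(total_size, chunk_size=CHUNK_SIZE):
    """Return (start, end, Content-Range header value) for each chunk of an upload."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if chunk_size % CHUNK_ALIGN != 0:
        raise ValueError("chunk_size must be a multiple of 320 KiB")
    chunks = []
    start = 0
    while start < total_size:
        # The end offset is inclusive, as in an HTTP Content-Range header
        end = min(start + chunk_size, total_size) - 1
        chunks.append((start, end, f"bytes {start}-{end}/{total_size}"))
        start = end + 1
    return chunks
```

Planning the ranges up front keeps the per-chunk retry simple: a failed chunk can be re-sent from its own `start` offset without touching the chunks that already succeeded.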
Second, a Dataforth issue from John Lehman: test database file 5BMAIN.DAT updated 6/26 shows in
`K:\TS-*\ProdSW\5BDATA` but production DOS test stations don't pick it up on restart. Loaded the
DOS-test-machine context and traced the whole pipeline via AD2's RMM agent (read-only). Confirmed
the updated 5BMAIN.DAT (6/26, 83,200 B) is staged correctly on AD2 AND on the D2TESTNAS NAS the
stations pull from — so sync is healthy. Root cause: `NWTOC.BAT` v5.0 (deployed 2026-03-16), the
on-boot updater, copies only `*.BAT` (from `T:\COMMON\ProdSW`) and `*.EXE` (from `T:\Ate\ProdSW`);
its own note reads "removed DATA folder copies (avoid cyclic overwrites)," so stations stopped
refreshing master spec DATs (5BMAIN/8BMAIN/DSCMAIN4/SCTMAIN/7BMAIN). CHECKUPD detects updates but
NWTOC never applies them. Per Mike's instruction the work was parked: opened Syncro ticket #32489
(Scheduled), booked a Wednesday 2026-07-01 8:00 AM appointment, and posted a customer-visible note
telling John the fix lands Wednesday with a validation method.
Third, Birth Biologic flagged a "broken" Excel file in the QualitySystemsDepartment SharePoint site.
Resolved the share link via Graph (read-only): the 64,466-byte ".xlsx" is actually ASCII text —
space-separated decimal byte values starting `80 75 3 4` (= the `PK` zip signature). A broken upload
wrote the byte array as a stringified decimal list instead of raw bytes. Reconstructed the 19,124
real bytes into a valid xlsx (verified OOXML structure + real contact data: "Processor Contact
Information"). The file was created by our Tenant Admin app on 6/26, implying a systemic bug in that
migration batch. Built a recovery tool (`bb-recover.py`) that enumerates the site's libraries,
detects the decimal-text signature, reconstructs + validates each, and re-uploads in place; launched
a dry-run scan of the QualitySystemsDepartment site (still running at session end).
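The detection and reconstruction step can be sketched roughly as follows. This is a minimal illustration, not the contents of `bb-recover.py`: the function name `reconstruct` and the `MAGICS` table are hypothetical, and the deeper OOXML structure check described above is left out.

```python
# Known binary signatures; a reconstructed file must start with one of these
MAGICS = {
    b"PK\x03\x04": "zip/OOXML",
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0": "OLE",
    b"\x89PNG": "png",
}


def reconstruct(raw):
    """Return (data, kind) if raw is decimal-byte text hiding a known binary, else None."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None  # genuine binary content, not the stringified form
    tokens = text.split()
    if not tokens or not all(t.isdigit() for t in tokens):
        return None  # not a pure list of decimal numbers
    values = [int(t) for t in tokens]
    if any(v > 255 for v in values):
        return None  # numbers that cannot be byte values
    data = bytes(values)
    for magic, kind in MAGICS.items():
        if data.startswith(magic):
            return data, kind
    return None  # decoded, but unknown type: leave for manual review
```

A file whose text begins `80 75 3 4` decodes to bytes starting with `PK\x03\x04`, so it passes the magic check; anything that decodes to an unrecognized header is returned as `None` rather than replaced automatically.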
Fourth and largest: Peaceful Spirit deletion-scope investigation. Mara worried other files
disappeared with Glennda's. A live-filesystem mtime scan flagged 13 client folders changed in the
6/24 10:05-12:05 window across the alphabet — initially read as "widespread," but a per-folder
restore-point diff (cbb) showed those were `DELETED=0, added=1` (normal new scans, not deletions),
confirming the mtime heuristic was noise. Pivoted to Mike's approach: restore the pre-deletion state
to staging and diff locally. Verified space (C: 803 GB free; @Clients = 72.5 GB / 142,288 files),
stopped the MSP360 backup, and launched a staging restore of the 6/24 10:05 AM restore point to
`C:\PST-Recovery\PreDelete-0624`. A second restore of the oldest point (6/29/2025) is queued to
check whether a mass deletion has happened before. Both restores feed fast local diffs.
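The local diff between a staged restore and the live tree could look something like the sketch below. It is an assumption-laden illustration rather than the `scope-diff.ps1` job on the server: `relative_files` and `diff_trees` are hypothetical names, paths are compared case-insensitively on the assumption that both trees sit on Windows volumes, and only file presence is compared, not content.

```python
import os


def relative_files(root):
    """Map lower-cased relative path -> original relative path for every file under root."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found[rel.lower()] = rel
    return found


def diff_trees(staging_root, live_root):
    """Return (deleted, added): files only in the staged restore, files only in live."""
    staged = relative_files(staging_root)
    live = relative_files(live_root)
    deleted = sorted(staged[k] for k in staged.keys() - live.keys())
    added = sorted(live[k] for k in live.keys() - staged.keys())
    return deleted, added
```

Run with the pre-deletion staging folder as `staging_root` and the live share as `live_root`, the `deleted` list doubles as the copy-back list, since every entry still exists under staging.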
## Key Decisions
- Parked the Dataforth NWTOC fix to a scheduled Wednesday appointment rather than fixing live —
it touches every test station's boot and needs John to confirm the authoritative master-spec
file list first. Fix will be NWTOC v5.1: copy ONLY engineering-owned master spec files one-way
from `T:\Ate\ProdSW\*DATA`, avoiding the cyclic-overwrite the v5.0 change guarded against.
- Recovered the corrupt BirthBio xlsx by parsing decimal-byte text back to binary rather than
treating it as data-loss — the original file was intact inside the text. Only auto-replace files
whose reconstruction yields a known binary magic (PK/%PDF/OLE/PNG/etc.); others flagged for review.
- Abandoned the live-mtime heuristic for PST deletion scope after the cbb diff proved the flagged
folders were additions, not deletions. Adopted restore-to-staging + local diff as the trustworthy,
complete method (catches victim folders the mtime scan hides because they were touched later).
- Ran the two PST restores sequentially (not concurrently) to avoid a same-bunch usage lock; staged
to C: (most free space) rather than D: (VM files) or G: (the live data drive).
- Wrote a root-level multi-client session log (spans 3 clients); no single wiki article applies.
## Problems Encountered
- BirthBio media upload failed (10/10 large files, 60 MB chunks) — fixed uploader to 10 MiB chunks +
Expect:100-continue off + per-chunk retry; re-run deferred (user handled media).
- SSH-as-SYSTEM on AD2 could not use sysadmin's key (`Permission denied (publickey,password)`) — the
RMM agent runs as SYSTEM; used the rsync daemon (password auth) for read-only NAS listing instead.
- cwRsync (cygwin) on AD2 misread a Windows `C:\path` DESTINATION as a remote host; single-file pulls
silently failed. Fix: use `/cygdrive/c/...` for the local destination. (Logged as friction.)
- Read the wrong (inactive) sync script first (`Sync-FromNAS.ps1`, SCP); the scheduled task actually
runs `Sync-FromNAS-rsync.ps1`. Confirmed via the task action.
- PST scope: 60s/folder cbb listing × ~2,500 folders (roughly 42 hours) made a full tree diff infeasible interactively;
resolved by the restore-and-local-diff approach.
- Bash 120s tool timeout repeatedly cut polling of long RMM commands; mitigated with detached
server-side jobs writing to files, polled across calls.
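The cwRsync path problem above comes from rsync's `host:path` syntax: a destination such as `C:\path` contains a colon, so it is parsed as a remote host named `C`. A small helper along these lines can convert local Windows paths before they reach rsync. `to_cygdrive` is a hypothetical name used for illustration, and the sketch only handles drive-letter paths, not UNC shares.

```python
import re


def to_cygdrive(path):
    """Convert a Windows drive path (C:\\dir\\file) to Cygwin form (/cygdrive/c/dir/file)."""
    match = re.match(r"^([A-Za-z]):[\\/]*(.*)$", path)
    if not match:
        # No drive letter: only normalise the slashes
        return path.replace("\\", "/")
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    prefix = f"/cygdrive/{drive.lower()}"
    return f"{prefix}/{rest}" if rest else prefix
```

Paths that are already in `/cygdrive/...` form pass through unchanged, so the helper can be applied to every local argument without checking first.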
## Configuration Changes
- Syncro: created ticket #32489 (Dataforth, Scheduled) + 2 comments + appointment id 5626864474
(Wed 2026-07-01 8:00-9:00 AM MST, Remote).
- PST-SERVER: MSP360 backup "Files Backup 2025" STOPPED (must restart after restores).
- PST-SERVER: created restore plan `ZPreDelete0624` (run-once) → restoring to
`C:\PST-Recovery\PreDelete-0624` (in progress); also `C:\PST-Recovery\scope-diff.ps1` (detached
diff job) and `deletion-scope-report.txt`.
- Local repo: removed stray `.pst_sweep` / `.pst_when` (prior session); no other repo edits this session.
- Scratch (not committed): `scratchpad/bb-recover.py`, `upload-calmops.ps1` (fixed), `.bbfile.xlsx`
(broken), `.bbfile.recovered.xlsx` (recovered).
## Credentials & Secrets
- BirthBio Tenant Admin app (Graph): vault `msp-tools/computerguru-tenant-admin.sops.yaml`, field
`credentials.client_secret`. Tenant `19a568e8-9e88-413b-9341-cbc224b39145`, client
`709e6eed-0711-4875-9c44-2d3518c47063`.
- Dataforth D2TESTNAS rsync daemon: host 192.168.0.9 port 873, module `test` (=/data/test),
user `rsync` / `IQ203s32119` (from `Sync-FromNAS-rsync.ps1`). NAS root SSH key on AD2:
`C:\Users\sysadmin\.ssh\id_ed25519` (usable only as sysadmin, not SYSTEM).
- Dataforth AD sysadmin: `INTRANET\sysadmin` / `Paper123!@#`. No new secrets created.
## Infrastructure & Servers
- Dataforth AD2: 192.168.0.6, RMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`. Sync task runs
`Sync-FromNAS-rsync.ps1` every 15 min. `test` share = `C:\Shares\test`. Stray `TS-21\ProdSW` is a
file (not a directory), so the sync reports `Errors=1`/`LastResult=1` on each run (cleanup pending; not the cause of the stale 5BMAIN.DAT issue).
- DOS update chain: AUTOEXEC -> STARTNET (maps T:=\\D2TESTNAS\test, X:=datasheets) -> NWTOC v5.0
(COMMON\ProdSW *.BAT to C:\BAT, Ate\ProdSW *.EXE to C:\ATE) -> CTONW. Master spec DATs live in
`Ate\ProdSW\*DATA` and per-station `TS-*\ProdSW\*DATA`; COMMON\ProdSW\5BDATA is empty.
- BirthBio QualitySystemsDepartment site id
`birthbiologic.sharepoint.com,3173c017-58bd-406a-8858-2c969667336f,ab1e4b4f-0f71-4c15-a4b4-fa900c189ac3`,
one library "Documents", broken-file parent drive
`b!F8BzMb1YakCIWCyWlmczb09LHqtxDxVMpLT6kAwYmsM7NUY4oPLSRq7ng3tJq-E9`.
- PST-SERVER: 192.168.0.2, RMM agent `87293069-33b6-45e8-a68f-6811216cdb96`. MSP360 bunch
`6a121575-84a0-4e98-9c0f-4a656d1a5132`, account ACG-PST `084b5069-d634-434b-84a2-971b1dcb4b43`,
prefix PST-SERVER, cbb `C:\Program Files\Arizona Computer Guru\Online Backup\cbb.exe`.
- PST restore points: oldest `20250629170034` (6/29/2025 10:00:34 AM, Full); pre-incident
`20260624170506` (6/24 10:05:06 AM); post-deletion `20260624190522` (6/24 12:05:22 PM).
- PST volumes (free space): C: 803.5 GB / D: (VM Files) 109.9 GB / G: (data) 193.3 GB. @Clients = 72.5 GB / 142,288 files.
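The restore point ids listed above line up with their local times if the id is read as a UTC timestamp in `yyyymmddHHMMSS` form and shifted to MST (UTC-7). That reading is an inference from the three pairs in this log, not documented cbb behavior, and `restore_point_local` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Arizona stays on MST (UTC-7) all year, so a fixed offset is enough here
MST = timezone(timedelta(hours=-7), "MST")


def restore_point_local(rp_id):
    """Parse a yyyymmddHHMMSS restore point id as UTC (assumed) and convert it to MST."""
    utc = datetime.strptime(rp_id, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(MST)
```

For example, `restore_point_local("20260624170506")` yields 2026-06-24 10:05:06 MST, matching the pre-incident point recorded above.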
## Commands & Outputs
- Graph share resolve: `GET /shares/u!<base64url>/driveItem`. Reconstruct: `bytes(int(t) for t in text.split())`.
- cbb restore to alt location: `cbb addRestorePlan -n <name> -aid 084b5069-... -bp PST-SERVER -bunch 6a121575-... -restorePoint <id> -rt "<date>" -d "G:\Shares\Scanned\@Clients" -rl "<destpath>" -ro yes -deleted yes` then `cbb plan -r "<name>"`. Stop backup: `cbb plan -s "Files Backup 2025"`.
- cbb `list` is NON-recursive, ~60s/folder; `-rlocation` = original|path; `-d` = source dir.
- PST scope-diff sample: every mtime-flagged folder = `DELETED=0 added=1` (adds, not deletions).
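The `u!<base64url>` token in the Graph share-resolve call above is built from the sharing URL: base64-encode it, strip the trailing `=` padding, swap `/` for `_` and `+` for `-`, and prepend `u!`. A minimal sketch follows; the function names are hypothetical, and the request itself (with its bearer token) is left out.

```python
import base64


def share_token(sharing_url):
    """Encode a sharing URL as the u!<base64url> token used by /shares/{token}/driveItem."""
    # urlsafe_b64encode already uses '-' and '_' in place of '+' and '/'
    encoded = base64.urlsafe_b64encode(sharing_url.encode("utf-8")).decode("ascii")
    return "u!" + encoded.rstrip("=")


def share_request_url(sharing_url, base="https://graph.microsoft.com/v1.0"):
    """Build the GET URL that resolves a sharing link to its driveItem."""
    return f"{base}/shares/{share_token(sharing_url)}/driveItem"
```

The returned driveItem carries the size and parent drive reference, which is enough to download the raw content and inspect it read-only.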
## Pending / Incomplete Tasks
- **CRITICAL: PST-SERVER backup "Files Backup 2025" is STOPPED** — must restart after restores complete.
- **PST restore #1** (`ZPreDelete0624` → C:\PST-Recovery\PreDelete-0624) IN PROGRESS (72 GB from B2).
- **PST restore #2** (oldest `20250629170034` → C:\PST-Recovery\Oldest-20250629) — launch after #1 finishes.
- **PST local diffs:** (a) PreDelete-0624 vs LIVE = complete deleted-file list + repair source (copy-back no-overwrite); (b) oldest vs pre-incident = "has this happened before."
- **BirthBio QualitySystemsDepartment scan** (bb-recover.py dry-run) running — review list, then run `--apply` to recover+replace in place.
- **BirthBio Calm Ops media** (10 files) re-upload still available if needed (uploader fixed).
- **Dataforth ticket #32489** (Wed 2026-07-01 8:00 AM): confirm master-spec file list with John, build NWTOC v5.1, test on TS-3R, roll out; clean up AD2 TS-21 stray + NAS COMMON\ProdSW junk; build John a validation BAT.
## Reference Information
- Syncro ticket #32489 id 113201089; appointment 5626864474; Dataforth Corp customer 578095; John Lehman contact 2851723.
- RMM API `http://172.16.3.30:3001`. Graph `https://graph.microsoft.com/v1.0`.
- Recovery tool: `scratchpad/bb-recover.py`. Recovered sample: `.bbfile.recovered.xlsx`.
- Server artifacts: `C:\PST-Recovery\` (rps.txt, deletion-scope-report.txt, scope-diff.ps1, PreDelete-0624\, Glennda_0605\, Gtest\).