sync: auto-sync from GURU-5070 at 2026-06-17 17:34:25
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-17 17:34:25

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Drove the Dataforth test-datasheet pipeline remediation end-to-end on host AD2, opened from a `/sync` that surfaced an AD2 cloud session's diagnosis. AD2 (a fork that pushes to the `ad2` branch, NOT main) had already root-caused the datasheet defects John Lehman reported and pushed a diagnosis + proposals. Reading John's two newest emails first required repointing the broken `/mailbox` skill off a deleted Graph app; the emails added a live Phytec Germany deliverable and a concrete DSCA Defect-B example. Created Syncro ticket #32441 (Dataforth Corp, contact John Lehman, warranty labor) and emailed John a findings/plan summary.

Ran a multi-AI (Grok adversarial + Gemini) review of the fix approaches, which materially hardened them, and committed the consolidated spec to main (`projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md`). Then, with backups in place (VSS shadow + 612 MB pg_dump) and per-file save-states, deployed three fixes to the live pipeline on AD2 via RMM (running node as SYSTEM): Fix 1 (RTD label), Fix 4 (encoded-serial importer recovery), and Fix 3 (retest "latest wins"). Each followed diagnose/validate-before-write with explicit verification.

Fix 1: RTD modules rendered the input column as `Rin (ohms)` instead of `Temp. (C)`. Validated read-only (byte-compare vs staged originals; data correct, label fixed; the remaining cosmetic diffs, a leading `===` line plus a ~1-space column shift, are pre-existing across all families and were deferred per Mike). Deployed, then re-pushed the 102 Phytec SCM5B35 certs (SO 120006-A) + the Wellbore audit unit 179553-13 to Hoffman (103 rows, 0 errors).

Fix 4: the importer dropped letter-prefixed encoded serials (DOS 8.3 encoding, e.g. `A243-1` = `10243-1`). A read-only pre-flight scan revealed that the naive "decode any leading letter" rule produced false positives (lowercase/short tokens), so the decode rule was tightened to `^[A-Z]\d{3,}-`. Added a `raw_serial_number` column + `test_records_quarantine` table; patched `multiline.js` + `import.js`; recovered 603 of the 606 genuine encoded units (the other 3 were reused-serial collisions and were quarantined), and published 518 to Hoffman.

Fix 3: same-day retests could keep a non-final run. Added an `ingest_seq` tiebreaker (source mtime x 1e6 + line index); rewrote the conflict rule (date primary, same-day ties broken by `ingest_seq`, churn-guarded by `raw_data IS DISTINCT FROM`, cross-model collisions quarantined). SQL validated via a rolled-back execution test. The settle pass (re-import of ~9k multiline logs) was running in the background at save time.
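The `ingest_seq` tiebreaker described for Fix 3 can be sketched in a few lines. This is a minimal illustration, not the deployed `multiline.js` code: the function name `computeIngestSeq`, the use of whole seconds for the mtime, and the bounds check on the line index are all assumptions made for the example.

```javascript
// Hypothetical sketch of "source mtime x 1e6 + line index".
// Assumptions: mtime is given in whole seconds, and no log file has
// 1,000,000 or more lines (so sequences from different files never overlap).
function computeIngestSeq(mtimeSeconds, lineIndex) {
  if (!Number.isInteger(mtimeSeconds) || mtimeSeconds < 0) {
    throw new RangeError('mtimeSeconds must be a non-negative integer');
  }
  if (!Number.isInteger(lineIndex) || lineIndex < 0 || lineIndex >= 1000000) {
    throw new RangeError('lineIndex must be an integer in [0, 1000000)');
  }
  // BigInt keeps the arithmetic exact regardless of the mtime's magnitude.
  return BigInt(mtimeSeconds) * 1000000n + BigInt(lineIndex);
}

// Later line in the same file sorts after an earlier line.
const firstRun = computeIngestSeq(1781654400, 12);
const retest = computeIngestSeq(1781654400, 57);
console.log(retest > firstRun);
```

If the mtime were taken in milliseconds instead, the product would exceed `Number.MAX_SAFE_INTEGER`, which is one reason to keep the arithmetic in `BigInt` before writing to a `BIGINT` column.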

## Key Decisions
- Filed the fix spec on `main` (not the `ad2` fork branch) and left AD2 a coord message; later switched to driving the fixes directly from GURU-5070 via RMM when Mike said VPN was up.
- Multi-AI = the grok/gemini CLI skills (the established "MultiAI" pattern), not the Workflow tool.
- Fix 2 (DSCA): per both AIs, DERIVE per-subtype templates from the staged `.TXT` originals (ground truth) rather than reverse-engineering the DOS `DSCFIN.DAT` binary. (Deferred — not yet built.)
- Fix 4 decode rule restricted to `^[A-Z]\d{3,}-` after the pre-flight showed lowercase/short letter-prefixed serials are NOT 8.3 encodings (false positives). Kept the literal in `raw_serial_number`.
- Fix 3 recency uses test_date primary with `ingest_seq` (mtime+line) only as a same-day tiebreaker — avoids the fragile pure-scan-order recency both AIs refuted, while keeping date (the reliable signal) authoritative.
- Added a cross-model overwrite guard to the live `import.js` conflict rule (during Fix 4) so a different-model serial collision can never clobber an existing unit; full live-path quarantine routing remains Fix 3 territory.
- Byte-for-byte DOS fidelity (the cosmetic render gap) deferred per Mike; recorded as a disclosure note to send John when the work is complete.
- All production writes gated behind: VSS shadow + pg_dump backup, per-file timestamped `.bak` save-states, read-only validation first, atomic multi-patch (prepare-all-then-write), JS load-check, and (for Fix 3) a rolled-back SQL execution test.
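The Fix 3 conflict rule (date primary, `ingest_seq` as same-day tiebreaker, churn guard, cross-model quarantine) can be read as a small decision function. The sketch below is one possible reading of that rule in plain JavaScript; the deployed rule lives in SQL inside `import.js`, and the field names (`model`, `testDate`, `ingestSeq`, `rawData`) and the returned action strings are hypothetical.

```javascript
// Hypothetical decision sketch, not the deployed ON CONFLICT clause.
// testDate is assumed to be an ISO 'YYYY-MM-DD' string, which compares
// correctly as plain text; ingestSeq is assumed to be a BigInt.
function resolveConflict(existing, incoming) {
  // Cross-model guard: a different model must never clobber an existing unit.
  if (existing.model !== incoming.model) return 'quarantine';
  // Churn guard: identical payloads are a no-op.
  if (existing.rawData === incoming.rawData) return 'keep';
  // Date is the authoritative signal.
  if (incoming.testDate > existing.testDate) return 'replace';
  if (incoming.testDate < existing.testDate) return 'keep';
  // Same day: the later ingest sequence wins.
  return incoming.ingestSeq > existing.ingestSeq ? 'replace' : 'keep';
}

const stored = { model: 'SCM5B35-02D', testDate: '2026-06-17', ingestSeq: 10n, rawData: 'run-1' };
const sameDayRetest = { ...stored, ingestSeq: 11n, rawData: 'run-2' };
console.log(resolveConflict(stored, sameDayRetest));
```

Keeping the date check ahead of the sequence check mirrors the decision above: scan order only matters when two runs share a test date.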

## Problems Encountered
- `/mailbox` (Graph) was dead: `AADSTS700016`, because the `Claude-MSP-Access` app `fabb3421` had been deleted. A prior session had claimed the skill was fixed, but it was never repointed. Repointed `mailbox.md` + memory to the dedicated mailbox app `1873b1b0` via `get-token.sh ... mailbox` (cert auth); logged a `--correction`.
- Ticket `contact_id` would not set via POST or PUT (returned null) — but the Syncro GUI showed John as contact (API serialization quirk). Confirmed with Mike before sending the customer email.
- First Phytec re-push matched only 58/102 — the DB stores a mix of padded (`179754-01`) and unpadded (`179754-1`) serials. Re-ran with both forms (union) → all 102.
- WizTree zip (205 MB) was committed on the `ad2` branch. Moved it on AD2 into a new gitignored `clients/dataforth/local-artifacts/` dir + added the ignore rule, rather than racing AD2's active session with a competing commit.
- RMM runs as SYSTEM; git balks at the repo on AD2 ("dubious ownership") — used `git -c safe.directory=...` for reads; did all DB/node work as SYSTEM (PG auth via db.js defaults works regardless of user).
- `/tmp` path mismatch between Git Bash and Python recurred — passed tokens via env var instead of a `/tmp` file.
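The padded/unpadded serial mismatch from the Phytec re-push can be handled by expanding each serial into both forms before the lookup. The helper below is a hypothetical sketch, not the script that was actually run; the name `serialVariants` and the two-digit pad width (taken from the `179754-01` example) are assumptions.

```javascript
// Hypothetical helper: return the unpadded and zero-padded forms of a serial
// so a lookup can match either storage style.
function serialVariants(serial) {
  const match = /^(.+)-(\d+)$/.exec(serial);
  if (!match) return [serial]; // no numeric unit suffix: leave untouched
  const [, lot, unit] = match;
  const unpadded = String(Number(unit));     // '01' -> '1'
  const padded = unpadded.padStart(2, '0');  // '1'  -> '01'
  // A Set drops the duplicate when both forms are identical (e.g. '13').
  return [...new Set([`${lot}-${unpadded}`, `${lot}-${padded}`])];
}

// Union of both forms for a batch of serials.
const lookup = [...new Set(['179754-01', '179754-2', '179553-13'].flatMap(serialVariants))];
console.log(lookup);
```

Querying with the union keeps the match independent of which form a given row happened to be stored in.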

## Configuration Changes
Repo (main):
- NEW `projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md` (commit f39516eb) — hardened multi-AI spec.
- MODIFIED `.claude/commands/mailbox.md` — repointed off dead app `fabb3421` to mailbox app `1873b1b0` (get-token.sh mailbox tier); token() now delegates to get-token.sh.
- MODIFIED `.claude/memory/feedback_365_remediation_tool.md` — fabb3421 marked DELETED; documented the mailbox app.
- errorlog.md: correction (mailbox), plus earlier friction entries.

AD2 production (`C:\Shares\testdatadb`, NOT in repo; deployed via RMM):
- `templates/datasheet-exact.js` — RTD (sensorNum 7) folded into temperature path. Save-state `.bak-2026-06-17-1646`.
- `parsers/multiline.js` — widened serial regex + decode + `raw_serial_number` (Fix 4); `fileMtime` + `ingest_seq` (Fix 3). Save-states `.bak-2026-06-17-1713`, `.bak-2026-06-17-1726`.
- `database/import.js` — `raw_serial_number` in INSERT + cross-model guard (Fix 4); `ingest_seq` + new same-day conflict rule (Fix 3). Same save-states.
- DB schema: `test_records.raw_serial_number` (TEXT), `test_records.ingest_seq` (BIGINT), NEW `test_records_quarantine` table.
- NOTE: the repo copies under `projects/dataforth-dos/datasheet-pipeline/implementation/` are STALE vs deployed (e.g. repo `import.js` shows a 5-tuple `ON CONFLICT`; deployed uses `serial_number`). Always edit deployed; reconcile repo later.

## Credentials & Secrets
- Dataforth testdatadb PostgreSQL 18: host localhost (on AD2), port 5432, db `testdatadb`, user `testdatadb_app`, password `DfTestDB2026!` — found in plaintext in `database/db.js` defaults (no .env). VAULTED at `clients/dataforth/testdatadb-postgres.sops.yaml`.
- Hoffman uploader OAuth creds at `C:\ProgramData\dataforth-uploader\credentials.json` on AD2 (ACL'd SYSTEM/Administrators/svc_testdatadb) — NOT vaulted (on-machine only); used by upload-to-api.js (client_credentials to CF_TOKEN_URL, bulk POST to CF_API_BASE/api/v1/TestReportDataFiles/bulk).
- ACG mailbox Graph app (own tenant): `1873b1b0-3377-485c-a848-bae9b2f8f1f5`, vault `msp-tools/computerguru-mailbox.sops.yaml`, cert auth, SP disabled-when-idle.

## Infrastructure & Servers
- AD2 = 192.168.0.6, Windows Server 2019 Standard, RMM agent id `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047` ("ACG Internal"). Test-datasheet pipeline host.
- testdatadb: Node + PostgreSQL 18; long-running Windows service `testdatadb` (restart via `Restart-Service` or the `TestDataDB-ServiceRestart` scheduled task). Pipeline cron = scheduled task `DataforthTestDatasheetUploader` (run-pipeline.ps1, fresh node process each run). Other tasks: HoffmanInventoryPull, TestDataDB-Backup, TestDataDB-VacuumAnalyze.
- Pipeline: DOS stations (.DAT logs, QuickBASIC) -> HISTLOGS (`C:\Shares\test\Ate\HISTLOGS\<logtype>\<model>.DAT`) + station logs (`C:\Shares\test\TS-xx\LOGS`) + Recovery (`C:\Shares\Recovery-TEST`) -> `import.js` -> `test_records` (474,383 rows after Fix 4) -> `render-datasheet.renderContent` -> `upload-to-api` -> Hoffman bulk API -> public TestDataReport.aspx.
- Staged original datasheets (ground truth, pre-regeneration): `C:\Shares\test\STAGE\<TS-station>\<encoded-SN>.TXT` (8.3 hex-prefix encoding: 17->H, 18->I, etc.). 11,922 staged files.
- pg_dump: `C:\Program Files\PostgreSQL\18\bin\pg_dump.exe`. Backups in `C:\Shares\testdatadb\_backups\`.
- VSS active (153 GB shadow storage). C: ~362 GB free.

## Commands & Outputs
- Backup: VSS shadow `{fe3f53f6-e6b1-43f2-8da6-5be3b8cd2240}` (4:30 PM) + `testdatadb-2026-06-17-1630.dump` (611.9 MB).
- Fix 1 validate: RTD scope = 23,937 rows (SCM5B34 9726, 8B35 5477, SCM5B35 5161, DSCA34 3573). Render byte-compare: data + final-test match; only cosmetic (leading `===` line + ~1-space column) differ (pre-existing, all families).
- Phytec re-push: 103 rows (102 + audit unit), 45 updated + 58 unchanged, 0 errors.
- Fix 4 pre-flight: 840 distinct dropped serials / 9,510 records / 100 models; 831 decoded-absent; 8 "collisions" = 5 false positives (lowercase/short) + 3 genuine (A819-1/A821-1/A821-2, reused across 8B34/8B36).
- Fix 4 recovery: 606 genuine encoded units; 603 inserted, 3 quarantined, 0 errors; test_records 473,780 -> 474,383. Published: created=518, skipped=85 (no spec / unregistered model), 0 errors.
- Fix 3 deploy: 7 patches OK, SQL-TEST valid (rolled-back upsert). Settle pass (re-import ~9k files) running at save time.
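Several of the counts above can be cross-checked against each other with simple arithmetic. The snippet below is only a reconciliation aid written for this log, using the figures as recorded here; the labels and the `checks` structure are illustrative, and the Phytec range sizes come from the serial ranges listed under Reference Information.

```javascript
// Each entry: [label, computed value, value recorded in this log].
const checks = [
  ['RTD scope by model', 9726 + 5477 + 5161 + 3573, 23937],
  ['Phytec serial ranges', 32 + (7 + 13) + 11 + 5 + 13 + 21, 102],
  ['Phytec certs + audit unit', 102 + 1, 103],
  ['Re-push updated + unchanged', 45 + 58, 103],
  ['Fix 4 inserted + quarantined', 603 + 3, 606],
  ['test_records after Fix 4', 473780 + 603, 474383],
  ['Fix 4 created + skipped', 518 + 85, 603],
];

// Collect any entry whose computed value disagrees with the recorded one.
const failures = checks.filter(([, computed, recorded]) => computed !== recorded);
console.log(failures.length === 0 ? 'all counts reconcile' : failures);
```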

## Pending / Incomplete Tasks
- Fix 3 settle pass running (background) — when done: re-run same-day sweep (violations -> ~0), publish settled rows via uploadBySerialNumbers (controlled), document on #32441.
- Fix 2 (DSCA Final-Test rebuild) — the big one; derive per-subtype templates from staged `.TXT`, byte-validate. NOT started.
- Fix 5 (backfill 379 cryptolocker-era units) — publish staged `.TXT` directly via a `legacy_cert_text` column (decision pending). NOT started.
- Bulk RTD re-push (~23,800 remaining RTD certs) — awaiting Mike's go.
- 85 recovered units skipped at publish (no spec entry / unregistered Hoffman model) — investigate.
- Byte-for-byte DOS fidelity — deferred; disclosure note to John recorded on ticket (comment 419546930) to send when done.
- Reconcile the stale repo copies under `projects/dataforth-dos/datasheet-pipeline/implementation/` with the deployed files.
- coord API not reachable from AD2 (no VPN there); Mike noted "make coord VPN-optional" as a future idea.

## Reference Information
- Syncro ticket #32441 (id 112783550), Dataforth Corp (cust 578095, prepay 31.5h), contact John Lehman (2851723, jlehman@dataforth.com). Comments: Initial Issue 419541902, internal accounting 419541903, findings email 419542053, byte-fidelity follow-up 419546930, deploy record 419547387, progress update (emailed) 419548057, Fix-4 record 419549891.
- Spec: projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md (commit f39516eb).
- AD2 ad2-branch docs: DATASHEET-RTD-BUG-DIAGNOSIS / PARSING-FIDELITY-VERDICT / MISSING-UNITS-REPORT-FOR-JOHN / CONFLICT-RULE-FIX-PROPOSAL / EMAIL-TO-JOHN-datasheet-findings (all 2026-06-17).
- Phytec SO 120006-A missing serials (now published): SCM5B35-02D 179754-01..32, 179756-01..07/09..21; SCM5B35-01D 178413-06..16, 179736-01..05; SCM5B35-1761 179740-01..13, 179753-01..21.
- AD2 save-states: datasheet-exact.js.bak-2026-06-17-1646; multiline.js/import.js .bak-2026-06-17-1713 (Fix4) and .bak-2026-06-17-1726 (Fix3).
- Encoded-serial decode: for a leading uppercase letter `L`, real prefix = `String(L.charCodeAt(0)-55)` (A=10 .. H=17, I=18); only applied to serials matching `^[A-Z]\d{3,}-`.
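The decode rule above can be sketched as a small function. This is a hedged illustration rather than the code patched into `multiline.js`: the name `decodeSerial` and the returned object shape are assumptions, while the regex and the `charCodeAt(0) - 55` mapping come from the notes above.

```javascript
// Only uppercase letter + at least three digits + '-' counts as an
// 8.3-encoded serial; lowercase or short tokens are left untouched.
const ENCODED_SERIAL = /^[A-Z]\d{3,}-/;

// Hypothetical helper: returns the decoded serial plus the literal as read,
// mirroring the idea of keeping the original in raw_serial_number.
function decodeSerial(raw) {
  if (!ENCODED_SERIAL.test(raw)) {
    return { serial: raw, rawSerial: raw, decoded: false };
  }
  // 'A' is char code 65, so 65 - 55 = 10; 'H' -> 17; 'I' -> 18.
  const prefix = String(raw.charCodeAt(0) - 55);
  return { serial: prefix + raw.slice(1), rawSerial: raw, decoded: true };
}

console.log(decodeSerial('A243-1').serial);  // 10243-1
console.log(decodeSerial('a243-1').decoded); // false
```

Returning the raw literal alongside the decoded value makes it possible to audit or reverse a decode later without re-reading the source log.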