12 KiB
User
- User: Mike Swanson (mike)
- Machine: GURU-5070
- Role: admin
Session Summary
Drove the Dataforth test-datasheet pipeline remediation end-to-end on host AD2, opened from a /sync that surfaced an AD2 cloud session's diagnosis. AD2 (a fork that pushes to the ad2 branch, NOT main) had already root-caused the datasheet defects John Lehman reported and pushed a diagnosis + proposals. Read John's two newest emails (which required repointing the broken /mailbox skill off a deleted Graph app), which added a live Phytec Germany deliverable and a concrete DSCA Defect-B example. Created Syncro ticket #32441 (Dataforth Corp, contact John Lehman, warranty labor) and emailed John a findings/plan summary.
Ran a multi-AI (Grok adversarial + Gemini) review of the fix approaches, which materially hardened them, and committed the consolidated spec to main (projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md). Then, with backups in place (VSS shadow + 612 MB pg_dump) and per-file save-states, deployed three fixes to the live pipeline on AD2 via RMM (running node as SYSTEM): Fix 1 (RTD label), Fix 4 (encoded-serial importer recovery), and Fix 3 (retest "latest wins"). Each followed diagnose/validate-before-write with explicit verification.
Fix 1: RTD modules rendered the input column as Rin (ohms) instead of Temp. (C). Validated read-only (byte-compare vs staged originals; data correct, label fixed; remaining cosmetic diffs are pre-existing across all families — leading === line + ~1-space column shift — and were deferred per Mike). Deployed, re-pushed the 102 Phytec SCM5B35 certs (SO 120006-A) + the Wellbore audit unit 179553-13 to Hoffman (103 rows, 0 errors).
Fix 4: importer dropped letter-prefixed encoded serials (DOS 8.3 encoding, e.g. A243-1=10243-1). A read-only pre-flight scan revealed the naive "decode any leading letter" produced false positives (lowercase/short tokens), so the decode rule was tightened to ^[A-Z]\d{3,}-. Added raw_serial_number column + test_records_quarantine table; patched multiline.js+import.js; recovered 603 units (3 reused-serial collisions quarantined), published 518 to Hoffman.
Fix 3: same-day retests could keep a non-final run. Added an ingest_seq tiebreaker (source mtime x 1e6 + line index); rewrote the conflict rule (date primary, same-day broken by ingest_seq, churn-guarded by raw_data IS DISTINCT, cross-model quarantined). SQL validated via a rolled-back execution test. The settle pass (re-import of ~9k multiline logs) was running in the background at save time.
Key Decisions
- Filed the fix spec on
main(not thead2fork branch) and left AD2 a coord message; later switched to driving the fixes directly from GURU-5070 via RMM when Mike said VPN was up. - Multi-AI = the grok/gemini CLI skills (the established "MultiAI" pattern), not the Workflow tool.
- Fix 2 (DSCA): per both AIs, DERIVE per-subtype templates from the staged
.TXToriginals (ground truth) rather than reverse-engineering the DOSDSCFIN.DATbinary. (Deferred — not yet built.) - Fix 4 decode rule restricted to
^[A-Z]\d{3,}-after the pre-flight showed lowercase/short letter-prefixed serials are NOT 8.3 encodings (false positives). Kept the literal inraw_serial_number. - Fix 3 recency uses test_date primary with
ingest_seq(mtime+line) only as a same-day tiebreaker — avoids the fragile pure-scan-order recency both AIs refuted, while keeping date (the reliable signal) authoritative. - Added a cross-model overwrite guard to the live
import.jsconflict rule (during Fix 4) so a different-model serial collision can never clobber an existing unit; full live-path quarantine routing remains Fix 3 territory. - Byte-for-byte DOS fidelity (the cosmetic render gap) deferred per Mike; recorded as a disclosure note to send John when the work is complete.
- All production writes gated behind: VSS shadow + pg_dump backup, per-file timestamped
.baksave-states, read-only validation first, atomic multi-patch (prepare-all-then-write), JS load-check, and (for Fix 3) a rolled-back SQL execution test.
Problems Encountered
/mailbox(Graph) was dead —AADSTS700016, theClaude-MSP-Accessappfabb3421was deleted. The skill had been claimed-fixed in a prior session but never repointed. Repointedmailbox.md+ memory to the dedicated mailbox app1873b1b0viaget-token.sh ... mailbox(cert auth); logged a--correction.- Ticket
contact_idwould not set via POST or PUT (returned null) — but the Syncro GUI showed John as contact (API serialization quirk). Confirmed with Mike before sending the customer email. - First Phytec re-push matched only 58/102 — the DB stores a mix of padded (
179754-01) and unpadded (179754-1) serials. Re-ran with both forms (union) → all 102. - WizTree zip (205 MB) was committed on the
ad2branch. Moved it on AD2 into a new gitignoredclients/dataforth/local-artifacts/dir + added the ignore rule, rather than racing AD2's active session with a competing commit. - RMM runs as SYSTEM; git balks at the repo on AD2 ("dubious ownership") — used
git -c safe.directory=...for reads; did all DB/node work as SYSTEM (PG auth via db.js defaults works regardless of user). /tmppath mismatch between Git Bash and Python recurred — passed tokens via env var instead of a/tmpfile.
Configuration Changes
Repo (main):
- NEW
projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md(commit f39516eb) — hardened multi-AI spec. - MODIFIED
.claude/commands/mailbox.md— repointed off dead appfabb3421to mailbox app1873b1b0(get-token.sh mailbox tier); token() now delegates to get-token.sh. - MODIFIED
.claude/memory/feedback_365_remediation_tool.md— fabb3421 marked DELETED; documented the mailbox app. - errorlog.md — correction (mailbox), plus earlier friction entries.
AD2 production (C:\Shares\testdatadb, NOT in repo — deployed via RMM):
templates/datasheet-exact.js— RTD (sensorNum 7) folded into temperature path. Save-state.bak-2026-06-17-1646.parsers/multiline.js— widened serial regex + decode +raw_serial_number(Fix 4);fileMtime+ingest_seq(Fix 3). Save-states.bak-2026-06-17-1713,.bak-2026-06-17-1726.database/import.js—raw_serial_numberin INSERT + cross-model guard (Fix 4);ingest_seq+ new same-day conflict rule (Fix 3). Same save-states.- DB schema:
test_records.raw_serial_number(TEXT),test_records.ingest_seq(BIGINT), NEWtest_records_quarantinetable. - NOTE: the repo copies under
projects/dataforth-dos/datasheet-pipeline/implementation/are STALE vs deployed (e.g. repo import.js shows a 5-tuple ON CONFLICT; deployed uses serial_number). Always edit deployed; reconcile repo later.
Credentials & Secrets
- Dataforth testdatadb PostgreSQL 18: host localhost (on AD2), port 5432, db
testdatadb, usertestdatadb_app, passwordDfTestDB2026!— found in plaintext indatabase/db.jsdefaults (no .env). VAULTED atclients/dataforth/testdatadb-postgres.sops.yaml. - Hoffman uploader OAuth creds at
C:\ProgramData\dataforth-uploader\credentials.jsonon AD2 (ACL'd SYSTEM/Administrators/svc_testdatadb) — NOT vaulted (on-machine only); used by upload-to-api.js (client_credentials to CF_TOKEN_URL, bulk POST to CF_API_BASE/api/v1/TestReportDataFiles/bulk). - ACG mailbox Graph app (own tenant):
1873b1b0-3377-485c-a848-bae9b2f8f1f5, vaultmsp-tools/computerguru-mailbox.sops.yaml, cert auth, SP disabled-when-idle.
Infrastructure & Servers
- AD2 = 192.168.0.6, Windows Server 2019 Standard, RMM agent id
cfa93bb6-0cdc-4d4e-a29e-1609cda6f047("ACG Internal"). Test-datasheet pipeline host. - testdatadb: Node + PostgreSQL 18; long-running Windows service
testdatadb(restart viaRestart-Serviceor theTestDataDB-ServiceRestartscheduled task). Pipeline cron = scheduled taskDataforthTestDatasheetUploader(run-pipeline.ps1, fresh node process each run). Other tasks: HoffmanInventoryPull, TestDataDB-Backup, TestDataDB-VacuumAnalyze. - Pipeline: DOS stations (.DAT logs, QuickBASIC) -> HISTLOGS (C:\Shares\test\Ate\HISTLOGS<logtype><model>.DAT) + station logs (C:\Shares\test\TS-xx\LOGS) + Recovery (C:\Shares\Recovery-TEST) -> import.js -> test_records (474,383 rows after Fix 4) -> render-datasheet.renderContent -> upload-to-api -> Hoffman bulk API -> public TestDataReport.aspx.
- Staged original datasheets (ground truth, pre-regeneration): C:\Shares\test\STAGE<TS-station><encoded-SN>.TXT (8.3 hex-prefix encoding: 17->H, 18->I, etc.). 11,922 staged files.
- pg_dump: C:\Program Files\PostgreSQL\18\bin\pg_dump.exe. Backups in C:\Shares\testdatadb_backups.
- VSS active (153 GB shadow storage). C: ~362 GB free.
Commands & Outputs
- Backup: VSS shadow
{fe3f53f6-e6b1-43f2-8da6-5be3b8cd2240}(4:30 PM) +testdatadb-2026-06-17-1630.dump(611.9 MB). - Fix 1 validate: RTD scope = 23,937 rows (SCM5B34 9726, 8B35 5477, SCM5B35 5161, DSCA34 3573). Render byte-compare: data + final-test match; only cosmetic (leading
===line + ~1-space column) differ (pre-existing, all families). - Phytec re-push: 103 rows (102 + audit unit), 45 updated + 58 unchanged, 0 errors.
- Fix 4 pre-flight: 840 distinct dropped serials / 9,510 records / 100 models; 831 decoded-absent; 8 "collisions" = 5 false positives (lowercase/short) + 3 genuine (A819-1/A821-1/A821-2, reused across 8B34/8B36).
- Fix 4 recovery: 606 genuine encoded units; 603 inserted, 3 quarantined, 0 errors; test_records 473,780 -> 474,383. Published: created=518, skipped=85 (no spec / unregistered model), 0 errors.
- Fix 3 deploy: 7 patches OK, SQL-TEST valid (rolled-back upsert). Settle pass (re-import ~9k files) running at save time.
Pending / Incomplete Tasks
- Fix 3 settle pass running (background) — when done: re-run same-day sweep (violations -> ~0), publish settled rows via uploadBySerialNumbers (controlled), document on #32441.
- Fix 2 (DSCA Final-Test rebuild) — the big one; derive per-subtype templates from staged
.TXT, byte-validate. NOT started. - Fix 5 (backfill 379 cryptolocker-era units) — publish staged
.TXTdirectly via alegacy_cert_textcolumn (decision pending). NOT started. - Bulk RTD re-push (~23,800 remaining RTD certs) — awaiting Mike's go.
- 85 recovered units skipped at publish (no spec entry / unregistered Hoffman model) — investigate.
- Byte-for-byte DOS fidelity — deferred; disclosure note to John recorded on ticket (comment 419546930) to send when done.
- Reconcile the stale repo copies under
projects/dataforth-dos/datasheet-pipeline/implementation/with the deployed files. - coord API not reachable from AD2 (no VPN there) — Mike noted "make coord VPN-optional" as a future idea.
Reference Information
- Syncro ticket #32441 (id 112783550), Dataforth Corp (cust 578095, prepay 31.5h), contact John Lehman (2851723, jlehman@dataforth.com). Comments: Initial Issue 419541902, internal accounting 419541903, findings email 419542053, byte-fidelity follow-up 419546930, deploy record 419547387, progress update (emailed) 419548057, Fix-4 record 419549891.
- Spec: projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md (commit f39516eb).
- AD2 ad2-branch docs: DATASHEET-RTD-BUG-DIAGNOSIS / PARSING-FIDELITY-VERDICT / MISSING-UNITS-REPORT-FOR-JOHN / CONFLICT-RULE-FIX-PROPOSAL / EMAIL-TO-JOHN-datasheet-findings (all 2026-06-17).
- Phytec SO 120006-A missing serials (now published): SCM5B35-02D 179754-01..32, 179756-01..07/09..21; SCM5B35-01D 178413-06..16, 179736-01..05; SCM5B35-1761 179740-01..13, 179753-01..21.
- AD2 save-states: datasheet-exact.js.bak-2026-06-17-1646; multiline.js/import.js .bak-2026-06-17-1713 (Fix4) and .bak-2026-06-17-1726 (Fix3).
- Encoded-serial decode: leading uppercase letter L, real prefix = String(L.charCodeAt(0)-55) (A=10..H=17..I=18); only applied to
^[A-Z]\d{3,}-.