Files
claudetools/projects/dataforth-dos/session-logs/2026-06/2026-06-17-mike-dataforth-datasheet-fixes.md
Mike Swanson 7ad6202e6e sync: auto-sync from GURU-5070 at 2026-06-17 17:34:25
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-17 17:34:25
2026-06-17 17:34:41 -07:00

12 KiB

User

  • User: Mike Swanson (mike)
  • Machine: GURU-5070
  • Role: admin

Session Summary

Drove the Dataforth test-datasheet pipeline remediation end-to-end on host AD2, opened from a /sync that surfaced an AD2 cloud session's diagnosis. AD2 (a fork that pushes to the ad2 branch, NOT main) had already root-caused the datasheet defects John Lehman reported and pushed a diagnosis + proposals. Read John's two newest emails (which required repointing the broken /mailbox skill off a deleted Graph app), which added a live Phytec Germany deliverable and a concrete DSCA Defect-B example. Created Syncro ticket #32441 (Dataforth Corp, contact John Lehman, warranty labor) and emailed John a findings/plan summary.

Ran a multi-AI (Grok adversarial + Gemini) review of the fix approaches, which materially hardened them, and committed the consolidated spec to main (projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md). Then, with backups in place (VSS shadow + 612 MB pg_dump) and per-file save-states, deployed three fixes to the live pipeline on AD2 via RMM (running node as SYSTEM): Fix 1 (RTD label), Fix 4 (encoded-serial importer recovery), and Fix 3 (retest "latest wins"). Each followed diagnose/validate-before-write with explicit verification.

Fix 1: RTD modules rendered the input column as Rin (ohms) instead of Temp. (C). Validated read-only (byte-compare vs staged originals; data correct, label fixed; remaining cosmetic diffs are pre-existing across all families — leading === line + ~1-space column shift — and were deferred per Mike). Deployed, re-pushed the 102 Phytec SCM5B35 certs (SO 120006-A) + the Wellbore audit unit 179553-13 to Hoffman (103 rows, 0 errors).

Fix 4: importer dropped letter-prefixed encoded serials (DOS 8.3 encoding, e.g. A243-1=10243-1). A read-only pre-flight scan revealed the naive "decode any leading letter" produced false positives (lowercase/short tokens), so the decode rule was tightened to ^[A-Z]\d{3,}-. Added raw_serial_number column + test_records_quarantine table; patched multiline.js+import.js; recovered 603 units (3 reused-serial collisions quarantined), published 518 to Hoffman.

Fix 3: same-day retests could keep a non-final run. Added an ingest_seq tiebreaker (source mtime x 1e6 + line index); rewrote the conflict rule (date primary, same-day broken by ingest_seq, churn-guarded by raw_data IS DISTINCT, cross-model quarantined). SQL validated via a rolled-back execution test. The settle pass (re-import of ~9k multiline logs) was running in the background at save time.

Key Decisions

  • Filed the fix spec on main (not the ad2 fork branch) and left AD2 a coord message; later switched to driving the fixes directly from GURU-5070 via RMM when Mike said VPN was up.
  • Multi-AI = the grok/gemini CLI skills (the established "MultiAI" pattern), not the Workflow tool.
  • Fix 2 (DSCA): per both AIs, DERIVE per-subtype templates from the staged .TXT originals (ground truth) rather than reverse-engineering the DOS DSCFIN.DAT binary. (Deferred — not yet built.)
  • Fix 4 decode rule restricted to ^[A-Z]\d{3,}- after the pre-flight showed lowercase/short letter-prefixed serials are NOT 8.3 encodings (false positives). Kept the literal in raw_serial_number.
  • Fix 3 recency uses test_date primary with ingest_seq (mtime+line) only as a same-day tiebreaker — avoids the fragile pure-scan-order recency both AIs refuted, while keeping date (the reliable signal) authoritative.
  • Added a cross-model overwrite guard to the live import.js conflict rule (during Fix 4) so a different-model serial collision can never clobber an existing unit; full live-path quarantine routing remains Fix 3 territory.
  • Byte-for-byte DOS fidelity (the cosmetic render gap) deferred per Mike; recorded as a disclosure note to send John when the work is complete.
  • All production writes gated behind: VSS shadow + pg_dump backup, per-file timestamped .bak save-states, read-only validation first, atomic multi-patch (prepare-all-then-write), JS load-check, and (for Fix 3) a rolled-back SQL execution test.

Problems Encountered

  • /mailbox (Graph) was dead — AADSTS700016, the Claude-MSP-Access app fabb3421 was deleted. The skill had been claimed-fixed in a prior session but never repointed. Repointed mailbox.md + memory to the dedicated mailbox app 1873b1b0 via get-token.sh ... mailbox (cert auth); logged a --correction.
  • Ticket contact_id would not set via POST or PUT (returned null) — but the Syncro GUI showed John as contact (API serialization quirk). Confirmed with Mike before sending the customer email.
  • First Phytec re-push matched only 58/102 — the DB stores a mix of padded (179754-01) and unpadded (179754-1) serials. Re-ran with both forms (union) → all 102.
  • WizTree zip (205 MB) was committed on the ad2 branch. Moved it on AD2 into a new gitignored clients/dataforth/local-artifacts/ dir + added the ignore rule, rather than racing AD2's active session with a competing commit.
  • RMM runs as SYSTEM; git balks at the repo on AD2 ("dubious ownership") — used git -c safe.directory=... for reads; did all DB/node work as SYSTEM (PG auth via db.js defaults works regardless of user).
  • /tmp path mismatch between Git Bash and Python recurred — passed tokens via env var instead of a /tmp file.

Configuration Changes

Repo (main):

  • NEW projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md (commit f39516eb) — hardened multi-AI spec.
  • MODIFIED .claude/commands/mailbox.md — repointed off dead app fabb3421 to mailbox app 1873b1b0 (get-token.sh mailbox tier); token() now delegates to get-token.sh.
  • MODIFIED .claude/memory/feedback_365_remediation_tool.md — fabb3421 marked DELETED; documented the mailbox app.
  • errorlog.md — correction (mailbox), plus earlier friction entries.

AD2 production (C:\Shares\testdatadb, NOT in repo — deployed via RMM):

  • templates/datasheet-exact.js — RTD (sensorNum 7) folded into temperature path. Save-state .bak-2026-06-17-1646.
  • parsers/multiline.js — widened serial regex + decode + raw_serial_number (Fix 4); fileMtime + ingest_seq (Fix 3). Save-states .bak-2026-06-17-1713, .bak-2026-06-17-1726.
  • database/import.jsraw_serial_number in INSERT + cross-model guard (Fix 4); ingest_seq + new same-day conflict rule (Fix 3). Same save-states.
  • DB schema: test_records.raw_serial_number (TEXT), test_records.ingest_seq (BIGINT), NEW test_records_quarantine table.
  • NOTE: the repo copies under projects/dataforth-dos/datasheet-pipeline/implementation/ are STALE vs deployed (e.g. repo import.js shows a 5-tuple ON CONFLICT; deployed uses serial_number). Always edit deployed; reconcile repo later.

Credentials & Secrets

  • Dataforth testdatadb PostgreSQL 18: host localhost (on AD2), port 5432, db testdatadb, user testdatadb_app, password DfTestDB2026! — found in plaintext in database/db.js defaults (no .env). VAULTED at clients/dataforth/testdatadb-postgres.sops.yaml.
  • Hoffman uploader OAuth creds at C:\ProgramData\dataforth-uploader\credentials.json on AD2 (ACL'd SYSTEM/Administrators/svc_testdatadb) — NOT vaulted (on-machine only); used by upload-to-api.js (client_credentials to CF_TOKEN_URL, bulk POST to CF_API_BASE/api/v1/TestReportDataFiles/bulk).
  • ACG mailbox Graph app (own tenant): 1873b1b0-3377-485c-a848-bae9b2f8f1f5, vault msp-tools/computerguru-mailbox.sops.yaml, cert auth, SP disabled-when-idle.

Infrastructure & Servers

  • AD2 = 192.168.0.6, Windows Server 2019 Standard, RMM agent id cfa93bb6-0cdc-4d4e-a29e-1609cda6f047 ("ACG Internal"). Test-datasheet pipeline host.
  • testdatadb: Node + PostgreSQL 18; long-running Windows service testdatadb (restart via Restart-Service or the TestDataDB-ServiceRestart scheduled task). Pipeline cron = scheduled task DataforthTestDatasheetUploader (run-pipeline.ps1, fresh node process each run). Other tasks: HoffmanInventoryPull, TestDataDB-Backup, TestDataDB-VacuumAnalyze.
  • Pipeline: DOS stations (.DAT logs, QuickBASIC) -> HISTLOGS (C:\Shares\test\Ate\HISTLOGS<logtype><model>.DAT) + station logs (C:\Shares\test\TS-xx\LOGS) + Recovery (C:\Shares\Recovery-TEST) -> import.js -> test_records (474,383 rows after Fix 4) -> render-datasheet.renderContent -> upload-to-api -> Hoffman bulk API -> public TestDataReport.aspx.
  • Staged original datasheets (ground truth, pre-regeneration): C:\Shares\test\STAGE<TS-station><encoded-SN>.TXT (8.3 hex-prefix encoding: 17->H, 18->I, etc.). 11,922 staged files.
  • pg_dump: C:\Program Files\PostgreSQL\18\bin\pg_dump.exe. Backups in C:\Shares\testdatadb_backups.
  • VSS active (153 GB shadow storage). C: ~362 GB free.

Commands & Outputs

  • Backup: VSS shadow {fe3f53f6-e6b1-43f2-8da6-5be3b8cd2240} (4:30 PM) + testdatadb-2026-06-17-1630.dump (611.9 MB).
  • Fix 1 validate: RTD scope = 23,937 rows (SCM5B34 9726, 8B35 5477, SCM5B35 5161, DSCA34 3573). Render byte-compare: data + final-test match; only cosmetic (leading === line + ~1-space column) differ (pre-existing, all families).
  • Phytec re-push: 103 rows (102 + audit unit), 45 updated + 58 unchanged, 0 errors.
  • Fix 4 pre-flight: 840 distinct dropped serials / 9,510 records / 100 models; 831 decoded-absent; 8 "collisions" = 5 false positives (lowercase/short) + 3 genuine (A819-1/A821-1/A821-2, reused across 8B34/8B36).
  • Fix 4 recovery: 606 genuine encoded units; 603 inserted, 3 quarantined, 0 errors; test_records 473,780 -> 474,383. Published: created=518, skipped=85 (no spec / unregistered model), 0 errors.
  • Fix 3 deploy: 7 patches OK, SQL-TEST valid (rolled-back upsert). Settle pass (re-import ~9k files) running at save time.

Pending / Incomplete Tasks

  • Fix 3 settle pass running (background) — when done: re-run same-day sweep (violations -> ~0), publish settled rows via uploadBySerialNumbers (controlled), document on #32441.
  • Fix 2 (DSCA Final-Test rebuild) — the big one; derive per-subtype templates from staged .TXT, byte-validate. NOT started.
  • Fix 5 (backfill 379 cryptolocker-era units) — publish staged .TXT directly via a legacy_cert_text column (decision pending). NOT started.
  • Bulk RTD re-push (~23,800 remaining RTD certs) — awaiting Mike's go.
  • 85 recovered units skipped at publish (no spec entry / unregistered Hoffman model) — investigate.
  • Byte-for-byte DOS fidelity — deferred; disclosure note to John recorded on ticket (comment 419546930) to send when done.
  • Reconcile the stale repo copies under projects/dataforth-dos/datasheet-pipeline/implementation/ with the deployed files.
  • coord API not reachable from AD2 (no VPN there) — Mike noted "make coord VPN-optional" as a future idea.

Reference Information

  • Syncro ticket #32441 (id 112783550), Dataforth Corp (cust 578095, prepay 31.5h), contact John Lehman (2851723, jlehman@dataforth.com). Comments: Initial Issue 419541902, internal accounting 419541903, findings email 419542053, byte-fidelity follow-up 419546930, deploy record 419547387, progress update (emailed) 419548057, Fix-4 record 419549891.
  • Spec: projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md (commit f39516eb).
  • AD2 ad2-branch docs: DATASHEET-RTD-BUG-DIAGNOSIS / PARSING-FIDELITY-VERDICT / MISSING-UNITS-REPORT-FOR-JOHN / CONFLICT-RULE-FIX-PROPOSAL / EMAIL-TO-JOHN-datasheet-findings (all 2026-06-17).
  • Phytec SO 120006-A missing serials (now published): SCM5B35-02D 179754-01..32, 179756-01..07/09..21; SCM5B35-01D 178413-06..16, 179736-01..05; SCM5B35-1761 179740-01..13, 179753-01..21.
  • AD2 save-states: datasheet-exact.js.bak-2026-06-17-1646; multiline.js/import.js .bak-2026-06-17-1713 (Fix4) and .bak-2026-06-17-1726 (Fix3).
  • Encoded-serial decode: leading uppercase letter L, real prefix = String(L.charCodeAt(0)-55) (A=10..H=17..I=18); only applied to ^[A-Z]\d{3,}-.