diff --git a/projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md b/projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md
new file mode 100644
index 00000000..ade31301
--- /dev/null
+++ b/projects/dataforth-dos/DATASHEET-FIX-SPEC-2026-06-17.md
@@ -0,0 +1,134 @@
+# Dataforth Test-Datasheet Pipeline — Fix Spec (hardened)
+
+**Date:** 2026-06-17 · **Host:** AD2 (`C:\Shares\testdatadb`, Node + PostgreSQL 18) · **Status:** SPEC for review — implementation driven on AD2
+**Inputs:** AD2 diagnosis (`DATASHEET-RTD-BUG-DIAGNOSIS`, `PARSING-FIDELITY-VERDICT`, `MISSING-UNITS-REPORT`, `CONFLICT-RULE-FIX-PROPOSAL`) + independent multi-AI review (Grok adversarial + Gemini). Owner direction on retest handling (Mike, 2026-06-17).
+
+All defects are in the **regeneration/ingestion** pipeline that replaced the cryptolocker-destroyed original parser/publisher. Source test data is intact (DB matches staged originals across 11,239 records, 0 parse faults).
+
+---
+
+## 0. Ground truth (verified live, 2026-06-17)
+- `test_records`: 473,780 rows = 473,780 distinct `serial_number` → **exactly one row per serial**.
+- Unique constraints: `uq_test_records_sn` UNIQUE(serial_number) [operative]; redundant UNIQUE(log_type,model_number,serial_number,test_date,test_station).
+- Columns incl. `raw_data` (verbatim .DAT), `overall_result`, `api_uploaded_at`, `forweb_exported_at`, `datasheet_exported_at`, `work_order`.
+- **Deployed `database/import.js` uses `ON CONFLICT (serial_number)`** with `WHERE overall_result='FAIL' OR (EXCLUDED PASS AND EXCLUDED.test_date > test_records.test_date)`.
+- **WARNING — repo drift:** the repo copy `…/implementation/database/import.js` is STALE (shows a 5-tuple `ON CONFLICT`). **Edit the DEPLOYED file; reconcile the repo copy after.** Verify every file's deployed content before changing it.
+
+## 0a. CROSS-CUTTING — re-publication discipline (MANDATORY for any fix that changes cert text)
+Every fix below that alters rendered output must be published deliberately, not by blanket cache-clear:
+1. **Diff before re-push.** For each candidate serial, render OLD vs NEW and only act where output actually changes. Do not clear `api_uploaded_at`/`forweb_exported_at` for unchanged renders.
+2. **Re-POST semantics.** Hoffman bulk API is idempotent — returns `Unchanged` when content matches, overwrites when it differs (per diagnosis §6). Confirm this holds before bulk re-push; watch for dedup/version behavior.
+3. **Targeted, staged rollout.** Re-publish in bounded batches (start with the Phytec 102), confirm counts, then widen. Log every batch.
+4. **Rollback.** Keep the prior rendered text (or the prior template commit) for every re-published serial so a bad batch can be reverted.
+5. **Audit framing.** A cert's text changing after initial publication is a bug correction — record it (ticket #32441 + an internal change log of affected serial ranges) so it's defensible in an audit.
+
+---
+
+## Fix 1 — Defect A: RTD input labeled resistance, not temperature (the audit finding)
+**File:** `templates/datasheet-exact.js` · **Scope:** ~24,000 certs (8B35, DSCA34, SCM5B34/35) · **Status:** fix written, needs scope/ground-truth proof.
+
+**Root cause:** `getSensorNum()` returns 7 for RTD sentypes (`s.includes('RTD')`); two branches on `sensorNum===7` emit `' Rin (ohms)'` header + unsigned value. Dataforth RTD certs report the input as Temperature (deg C); raw_data stimulus is already deg C.
+
+**Approach (AD2 diff):** fold `sensorNum===7` into the temperature branch (3–6) for header (`' Temp. (C)'`) and value (`formatSigned`). Leave the `i===13` ohm/ohm Lead-R override intact.
+
+**Hardening (multi-AI):**
+- "15 renders changed / 0 non-RTD" proves a branch moved, **not** correctness. Before deploy: (a) **byte-compare** the fixed render against the staged original `.TXT` for a real RTD sample (8B35 incl. SN 179553-13, DSCA34, SCM5B34/35) — require exact match; (b) **confirm RTD-detection coverage**: count RTD-family rows in the DB (`model_number LIKE '%34%'/'%35%'` etc.) and confirm the regenerator's `s.includes('RTD')` actually classifies all of them as 7 (only 15 changing across 184 renders may mean the sample was RTD-thin, or some RTD sentypes aren't matched).
+
+**Risk:** LOW-MODERATE (localized; main risk is under-detecting which modules are RTD). **Deploy:** after byte-match + coverage check; then targeted re-push (Phytec 102 first).
+
+---
+
+## Fix 2 — Defect B: DSCA Final-Test table wrong / dropped lines
+**File:** `templates/datasheet-exact.js` (`DATA_LINES['DSCA']`, `buildTSpecs()` DSCA branch, accuracy-block titles) · **Scope:** up to ~78,000 DSCA certs · **Status:** NEEDS DESIGN — highest structural risk.
+
+**Root cause:** a single hardcoded `DATA_LINES['DSCA']` + single DSCA `buildTSpecs` branch; real DSCA modules have **per-subtype** Final-Test layouts → wrong names, garbage specs (`< 0 mA`, `+/- 0 %`), rows misaligned, lines dropped (e.g. Output Noise on DSCA38-05). Accuracy block also uses 5B/8B titles (`Vout (V)` / `====`) instead of DSCA's (`Output (V|mA)` / `----`).
+
+**Approach (multi-AI consensus — REVISED from AD2's DSCFIN.DAT idea):**
+- **Do NOT reverse-engineer `DSCFIN.DAT`** (legacy DOS config; QB writer has hardcoded overrides outside the config → guaranteed edge-case drift).
+- **Derive per-subtype templates from the staged original `.TXT`** (the actual correct customer certs are ground truth): group staged DSCA `.TXT` by subtype, extract each subtype's Final-Test parameter name/unit/spec list and accuracy titles directly.
+- **Key subtype selection** on `model_number` (prefix) + `SENTYPE` / output-signal type. Build an explicit subtype→layout map.
+- Fix the DSCA accuracy block titles/separators (`Output (V|mA)`, `----`).
+
+**Validation (hard gate):** generate ALL DSCA certs and **byte-for-byte diff vs the staged originals** across every subtype; zero-delta required. Any subtype with no staged original → flag, do not guess.
+
+**Risk:** HIGH (largest population + wrong numeric labels/limits). The longer pole; do after 1/3/4. **Deploy:** only after zero-delta validation per subtype; re-publish in batches with diff-gating.
+
+---
+
+## Fix 3 — Retest handling: latest test supersedes (OWNER DIRECTION + hardening)
+**File:** deployed `database/import.js` (`ON CONFLICT (serial_number)` WHERE clause) · **Scope:** ~311 stuck units now + all future retests · **Status:** design owner-set, implementation hardened.
+
+**Owner rule (Mike):** one row per serial; a new test on the SAME unit (same model / "everything else checks out") **supersedes anything prior — latest test wins**. A reused serial on a DIFFERENT product is NOT the same unit — recognize the collision, don't blindly overwrite.
+
+**Why the current rule fails:** strictly-greater date + date-only granularity → same-day reruns can't replace (~311 stuck on a non-final run).
+
+**Approach (hardened — both AIs refute pure scan-order as the recency signal):**
+- Conflict on `serial_number`. Update when the incoming row is the SAME unit and is genuinely newer:
+  - `EXCLUDED.model_number = test_records.model_number` (same unit) **AND** `EXCLUDED.raw_data IS DISTINCT FROM test_records.raw_data` (real change; avoids re-push churn) **AND** incoming is at least as new.
+  - **Recency must not rely on import scan order alone.** Use `EXCLUDED.test_date >= test_records.test_date`, and break same-date ties with a **monotonic signal captured at parse time** — source `.DAT` mtime or an ingest sequence number (add a column, e.g. `ingest_seq` / `source_mtime`). Last-by-(date, tiebreaker) wins.
+- **Collision handling (different `model_number`, same serial):** do NOT overwrite. Route to a `test_records_quarantine` table (or a flagged status) + alert. These are the reused generic serials (`1-1`, `1-2`) — genuinely different units.
+- Keep the `FAIL → PASS` override.
+
+**Validation:** ingest the owner's **4-retest sample IN REVERSE chronological order**; final DB state MUST be the mathematically newest run. Re-run `tools/validate-parsing.js`; same-day violations → ~0. Confirm the 311 settle on the latest run.
+
+**Risk:** HIGH if scan-order is trusted (a future bulk re-import could overwrite newer with older across the whole DB). MITIGATED by the date+tiebreaker rule. **Deploy:** after the reverse-order sample passes; then re-import + diff-gated re-push of the 311.
+
+---
+
+## Fix 4 — Importer drops letter-prefixed encoded serials
+**File:** `parsers/multiline.js` (serial/date regex) · **Scope:** ~9,510 records / 840 serials / 141 models · **Status:** needs design (PK-boundary mutation).
+
+**Root cause:** `line.match(/^"(\d+-\d+[A-Za-z]?)","(\d{2}-\d{2}-\d{4})"$/)` — `\d+-\d+` requires leading digits, so DOS 8.3-encoded serials (`10243-1` → `A243-1`; first two digits → letter, `prefix = charCodeAt(0)-55`) never match → whole record silently dropped.
+
+**Approach (hardened — keep the literal, both AIs):**
+- Widen regex to allow an optional leading letter: `/^"([A-Za-z]?\d+-\d+[A-Za-z]?)","(\d{2}-\d{2}-\d{4})"$/`.
+- **Store BOTH:** add `raw_serial_number` (the literal file bytes, e.g. `A243-1`) and keep `serial_number` = decoded numeric (`10243-1`). UNIQUE stays on `serial_number`. Preserves a perfect audit trail.
+- Decode only when the captured serial matches `^[A-Za-z]\d`.
+
+**Pre-flight (MANDATORY before any import):** run the parser over the ~9,510 dropped records read-only → emit CSV `raw_serial, decoded_serial, model_number`. **Search decoded serials for collisions against the existing 473,780 rows.** For each collision decide policy (a decoded `A243-1` colliding with a genuine `10243-1` of a different model = a real conflict — quarantine, don't merge two physical units). Only proceed once collisions are enumerated and a policy set.
+
+**Risk:** MODERATE (transforms the uniqueness key + customer lookup id). **Deploy:** after the collision CSV is clean/resolved; the re-import re-exercises the upsert path → run under the Fix-3 rule, diff-gated re-push.
+
+---
+
+## Fix 5 — Backfill 379 cryptolocker-era units from staged originals
+**Scope:** 379 units (Oct 2025–Jan 2026, 3 stations), no surviving `.DAT`; staged `.TXT` exist · **Status:** operational.
+
+**Key fact:** the staged `.TXT` were produced by the ORIGINAL (pre-crypto) renderer → already correct (no Defect A/B).
+
+**Approach (both AIs — publish directly, don't round-trip):**
+- Add `legacy_cert_text` column. Insert the 379 with `raw_data = NULL`, `legacy_cert_text` = the staged `.TXT` content.
+- Publisher serves `legacy_cert_text` when `raw_data IS NULL` (bypasses regeneration).
+- Do NOT reverse `.TXT`→raw_data→re-render (two translation-loss points; guarantees drift).
+
+**Validation:** cross-reference the 379 serials against the ERP / work-order system to confirm they are valid shipped units before exposing via the API. Spot-check rendered vs staged text.
+
+**Consistency note:** these rows are permanently "original-renderer" output, divergent from what the fixed template would emit. Acceptable (and safer) given no raw source; document the class.
+
+**Risk:** LOW (bounded 379, immutable text) — but zero tolerance for wrong text (no raw fallback). **Deploy:** after ERP cross-check.
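
The Fix 5 publisher branch could look roughly like the sketch below. It is a minimal illustration under assumptions, not the deployed publisher: `certTextFor` is a hypothetical name, and `renderFromRaw` is a stand-in for whatever the existing template renderer is actually called (the spec does not give its name or signature). The `legacy_cert_text` column is the one proposed above and does not exist yet.

```javascript
// Hypothetical publisher branch for Fix 5 (names are illustrative only).
// renderFromRaw stands in for the existing regeneration path; its real
// name and signature are not stated in the spec.
function certTextFor(record, renderFromRaw) {
  if (record.raw_data == null) {
    // Backfilled row: no .DAT survives, so serve the staged text verbatim.
    const text = record.legacy_cert_text;
    if (typeof text !== 'string' || text.length === 0) {
      // No raw fallback exists, so refuse to publish rather than guess.
      throw new Error(
        `serial ${record.serial_number}: no raw_data and no legacy_cert_text`
      );
    }
    return text;
  }
  // Normal row: regenerate from raw_data through the template.
  return renderFromRaw(record);
}
```

Throwing on a row that has neither field reflects the "zero tolerance for wrong text" note: a missing cert is recoverable, a guessed one is not.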
+
+---
+
+## Risk ranking (combined) & recommended order
+| Rank | Fix | Why |
+|---|---|---|
+| 1 (tie) | **Fix 3** retest | scan-order recency could corrupt DB-wide on any future re-import (Gemini #1) |
+| 1 (tie) | **Fix 2** DSCA | largest scope + wrong numeric labels/limits; design-heavy (Grok #1) |
+| 3 | **Fix 4** serials | mutates the uniqueness/lookup key; merge risk |
+| 4 | **Fix 1** RTD | localized; risk is under-scoping RTD detection |
+| 5 | **Fix 5** backfill | small, immutable, but no raw fallback |
+
+**Suggested execution order (lowest-risk customer win first, hardest last):**
+1. **Fix 1 (RTD label)** — after byte-match + scope check → clears the audit + corrects the Phytec 102 (deploy tonight candidate).
+2. Re-publish Phytec 102 (diff-gated).
+3. **Fix 4 (serial decode)** — after clean collision CSV.
+4. **Fix 3 (retest rule)** — after reverse-order sample passes.
+5. **Fix 2 (DSCA rebuild)** — after per-subtype zero-delta validation.
+6. **Fix 5 (backfill 379)** — after ERP cross-check.
+
+## Open items needing an owner decision
+- **Fix 3:** add a real tie-breaker column (`ingest_seq`/`source_mtime`) vs accept `test_date >=` + scan-order? (recommend the column.)
+- **Fix 3/4 collisions:** quarantine table + alert vs reject-and-log? (recommend quarantine table.)
+- **Fix 4:** add `raw_serial_number` column (recommended) — schema change.
+- **Fix 5:** add `legacy_cert_text` column + publisher branch (recommended) — schema + publisher change.
+- **Tonight:** scope = Fix 1 + Phytec re-push only (smallest safe win); the rest staged after.
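
To make the first two open items concrete, here is a small in-memory model of the recommended options: a `source_mtime` tie-breaker plus quarantine on model mismatch. It is a sketch under assumptions, not the importer: `decideUpsert` and `ingest` are hypothetical names, dates are assumed to be ISO `YYYY-MM-DD` strings, `sourceMtime` is assumed to be a numeric timestamp, and the `FAIL → PASS` override is deliberately left out to keep the example short.

```javascript
// Hypothetical decision helper for a serial_number conflict.
// Assumes ISO dates (YYYY-MM-DD), which compare correctly as strings,
// and a numeric sourceMtime (the proposed source_mtime column).
function decideUpsert(existing, incoming) {
  if (existing.modelNumber !== incoming.modelNumber) return 'quarantine';
  if (existing.rawData === incoming.rawData) return 'skip'; // no real change
  if (incoming.testDate > existing.testDate) return 'replace';
  if (incoming.testDate < existing.testDate) return 'skip';
  // Same date: the tie-breaker decides, not the order rows arrived in.
  return incoming.sourceMtime > existing.sourceMtime ? 'replace' : 'skip';
}

// Folds rows into one row per serial, collecting collisions separately.
function ingest(rows) {
  const bySerial = new Map();
  const quarantine = [];
  for (const row of rows) {
    const current = bySerial.get(row.serialNumber);
    if (!current) {
      bySerial.set(row.serialNumber, row);
      continue;
    }
    const action = decideUpsert(current, row);
    if (action === 'replace') bySerial.set(row.serialNumber, row);
    else if (action === 'quarantine') quarantine.push(row);
  }
  return { bySerial, quarantine };
}
```

With this rule, feeding the same four same-day runs in forward or reverse order leaves the run with the highest `sourceMtime` in place, which is the property the reverse-order validation in Fix 3 checks. An ingest-time sequence number would not have that property under a reversed scan, which is one argument for preferring the file mtime as the tie-breaker.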