claudetools/projects/dataforth-dos/CONFLICT-RULE-FIX-PROPOSAL-2026-06-17.md
Mike Swanson d58d1dd76c dataforth(datasheet): same-day retest faithfulness — exposure sweep + fix proposal
Whole-source sweep (981,716 records / 406,549 serials): 6,515 same-day multi-run
events; DB holds a NON-latest run for 311 (the strictly-greater-date conflict rule
freezes on an arbitrary same-day run). Corrects the verdict doc to flag same-day
retests as a latest-wins faithfulness violation (not benign). Adds the proposed
>=-with-data-differs conflict-rule fix (diagnose-only) and the sweep tool.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 13:02:32 -07:00


Proposal — make the DB hold the LATEST test run (same-day retest fix)

Date: 2026-06-17 · Host: AD2 · Status: PROPOSAL (diagnose-only; review before deploying)
File to change: C:\Shares\testdatadb\database\import.js (repo: projects/dataforth-dos/database/import.js)
Evidence: PARSING-FIDELITY-VERDICT-2026-06-17.md, SAMEDAY-RETEST-EXPOSURE-2026-06-17.txt

Problem

test_records holds one row per serial number. On re-import, the INSERT ... ON CONFLICT (serial_number) statement updates the existing row only when:

WHERE test_records.overall_result = 'FAIL'
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)

The date comparison is strictly greater-than, and the .DAT serial/date line carries a date only (no time). So when a unit is tested two or more times on the same date, the first same-day PASS to be imported wins: once it is stored, no later same-day run can replace it. The DB, and therefore the website datasheet, can show a non-final run.

This is the documented audit failure mode: same-day runs are usually trim / re-test iterations, and the last run is the accepted certificate result.
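
To make the freeze concrete, here is a minimal JavaScript model of the current WHERE predicate, replayed over three same-day runs. It is a sketch, not the code in import.js: the function names, the sample rows, and the ISO date strings (YYYY-MM-DD, chosen so plain string comparison orders dates) are all assumptions for illustration.

```javascript
// Model of the current ON CONFLICT ... WHERE predicate (illustrative only).
// Assumes ISO date strings, so ">" on strings orders dates correctly.
function currentRuleUpdates(existing, incoming) {
  return (
    existing.overall_result === 'FAIL' ||
    (incoming.overall_result === 'PASS' && incoming.test_date > existing.test_date)
  );
}

// Replay a sequence of runs for one serial, the way repeated upserts would.
function replay(runs, shouldUpdate) {
  let row = null;
  for (const run of runs) {
    if (row === null || shouldUpdate(row, run)) row = run;
  }
  return row;
}

// Hypothetical same-day trim / re-test iterations for one serial.
const runs = [
  { overall_result: 'PASS', test_date: '2026-06-17', raw_data: 'run-1' },
  { overall_result: 'PASS', test_date: '2026-06-17', raw_data: 'run-2' },
  { overall_result: 'PASS', test_date: '2026-06-17', raw_data: 'run-3' },
];

const stored = replay(runs, currentRuleUpdates);
console.log(stored.raw_data); // run-1: the first same-day PASS stays in place
```

Under this model, run-2 and run-3 are rejected because their date is equal to, not greater than, the stored date.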

Exposure (whole-source sweep, 2026-06-17)

981,716 records parsed across 26,815 .DAT files (406,549 serials):

  • Same-day multi-run events (distinct values): 6,515 across 5,977 serials
  • DB already on the latest same-day run: 3,803
  • Superseded by a later-date retest (fine): 984
  • DB on a non-latest run (the defect): 311
  • Serial absent from DB (collisions/completeness): 1,417
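
The four categories account for every event: 3,803 + 984 + 311 + 1,417 = 6,515. As a rough illustration of what a "same-day multi-run event" means, the sketch below groups parsed records by serial and date and keeps the groups with more than one distinct value. This is not the actual sweep tool; the function name and field names are hypothetical, and it assumes records arrive in chronological append order so that the last record in a group is the latest run.

```javascript
// Group records by (serial_number, test_date); an "event" is a group whose
// runs contain more than one distinct raw_data value.
function findSameDayMultiRun(records) {
  const groups = new Map();
  for (const rec of records) {
    const key = JSON.stringify([rec.serial_number, rec.test_date]);
    if (!groups.has(key)) {
      groups.set(key, { serial_number: rec.serial_number, test_date: rec.test_date, runs: [] });
    }
    groups.get(key).runs.push(rec.raw_data);
  }
  return [...groups.values()]
    .filter((g) => new Set(g.runs).size > 1)
    // Assumption: input order is chronological, so the last run is the latest.
    .map((g) => ({ ...g, latest: g.runs[g.runs.length - 1] }));
}

// Hypothetical sample data.
const sample = [
  { serial_number: 'A100', test_date: '2026-06-17', raw_data: 'run-1' },
  { serial_number: 'A100', test_date: '2026-06-17', raw_data: 'run-2' },
  { serial_number: 'B200', test_date: '2026-06-17', raw_data: 'run-1' },
  { serial_number: 'B200', test_date: '2026-06-17', raw_data: 'run-1' }, // identical copy, not an event
  { serial_number: 'A100', test_date: '2026-06-18', raw_data: 'run-3' },
];

const events = findSameDayMultiRun(sample);
console.log(events.length, events[0].serial_number, events[0].latest); // 1 A100 run-2
```

Comparing each event's latest value against the row the DB currently holds would then sort it into one of the categories listed above.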

Root cause

  1. Strictly-greater date (>) in the conflict WHERE — rejects all same-date updates.
  2. Date-only granularity — no intra-day timestamp in the .DAT to order same-day runs.

Proposed fix (minimal, guarded)

Allow a same-date PASS to overwrite only when the data actually differs, so that the last differing same-day run processed wins. Imports run in chronological append order and the live station logs are scanned last, so the last-processed run is the latest (subject to the caveats below):

ON CONFLICT (serial_number) DO UPDATE SET
    log_type = EXCLUDED.log_type,
    model_number = EXCLUDED.model_number,
    test_date = EXCLUDED.test_date,
    test_station = EXCLUDED.test_station,
    overall_result = EXCLUDED.overall_result,
    raw_data = EXCLUDED.raw_data,
    source_file = EXCLUDED.source_file,
    api_uploaded_at = NULL,
    forweb_exported_at = NULL
WHERE test_records.overall_result = 'FAIL'
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date = test_records.test_date
       AND EXCLUDED.raw_data IS DISTINCT FROM test_records.raw_data)   -- NEW: latest same-day run wins

The added clause only fires on a genuine same-date data change, so identical re-imports do not needlessly clear api_uploaded_at (avoids re-push churn).
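
The effect of the added clause can be sketched in JavaScript as follows. This models the proposed predicate rather than reproducing import.js: the function names and sample rows are hypothetical, strict inequality on strings stands in for IS DISTINCT FROM, and a freshly inserted row is assumed to start with api_uploaded_at set to null.

```javascript
// Model of the proposed WHERE clause (illustrative only).
function proposedRuleUpdates(existing, incoming) {
  const incomingPass = incoming.overall_result === 'PASS';
  return (
    existing.overall_result === 'FAIL' ||
    (incomingPass && incoming.test_date > existing.test_date) ||
    // NEW: same date, but the payload differs.
    (incomingPass &&
      incoming.test_date === existing.test_date &&
      incoming.raw_data !== existing.raw_data)
  );
}

// Apply one upsert. An update clears api_uploaded_at, as the SET list does.
function upsert(row, incoming) {
  if (row === null) return { ...incoming, api_uploaded_at: null }; // assumed default
  if (!proposedRuleUpdates(row, incoming)) return row; // ignored, row untouched
  return { ...incoming, api_uploaded_at: null };
}

const day = '2026-06-17';
let row = upsert(null, { overall_result: 'PASS', test_date: day, raw_data: 'run-1' });
row = upsert(row, { overall_result: 'PASS', test_date: day, raw_data: 'run-2' });
row = { ...row, api_uploaded_at: '2026-06-18T00:00:00Z' }; // pretend the upload ran
row = upsert(row, { overall_result: 'PASS', test_date: day, raw_data: 'run-2' }); // identical re-import

console.log(row.raw_data, row.api_uploaded_at !== null); // run-2 true
```

The differing same-day run replaces the stored one, while the identical re-import leaves the row, including its upload timestamp, untouched.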

Behavior after fix

| Existing | Incoming | Before | After |
|---|---|---|---|
| PASS, date D | PASS, date D, different data | ignored (stale) | updated → latest run |
| PASS, date D | PASS, date D, identical | ignored | ignored (no churn) |
| PASS, date D | PASS, date D+1 | updated | updated (unchanged) |
| PASS, date D+1 | PASS, date D | ignored | ignored (unchanged) |
| FAIL | PASS (any date) | updated | updated (unchanged) |

Caveats / assumptions

  • Relies on chronological append order within a .DAT and on the live station logs being scanned last (they are: runImport does HISTLOGS → Recovery → station TEST_PATH). If a serial's latest run existed only in HISTLOGS (scanned first) and an older copy in a station log (scanned last), the older copy would win. Rare, but possible. For a hard guarantee, add a monotonic tiebreaker (ingest sequence, or a per-run timestamp if the test program can emit one) — a larger change.
  • Re-push impact: the 311 corrected rows (plus any future same-day retests) will clear api_uploaded_at and re-upload to Hoffman on the next run. Expected and desired (the website gets the final result), but it is outbound API traffic — run deliberately.
  • Does NOT fix generic reused serials (1-1, 1-2, …) that collide across different products, nor the 608 units absent from the DB. Those are separate items (serial-uniqueness model / ingestion completeness).

Stronger alternative (larger migration)

If full per-run archival is required (every test sheet reproducible), replace the UNIQUE (serial_number) model with a composite key (serial_number, test_date, run_sequence) (or store all runs and select the latest at render time). This preserves every run and removes the same-day ambiguity entirely, but is a schema migration + dedupe + render/upload changes — propose separately if desired.
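
As a rough sketch of the "store all runs and select the latest at render time" variant, the function below picks the run to render by comparing (test_date, run_sequence). The proposal does not say how run_sequence would be assigned; here it is assumed to increase with each later run on the same date. The function name and sample rows are hypothetical, and the sketch ignores PASS/FAIL precedence, which a real selection rule would need to decide.

```javascript
// Select the latest run for a serial by (test_date, run_sequence).
// Assumes ISO date strings and a run_sequence that grows with later runs.
function latestRun(runs) {
  return runs.reduce((best, run) => {
    if (best === null) return run;
    if (run.test_date !== best.test_date) {
      return run.test_date > best.test_date ? run : best;
    }
    return run.run_sequence > best.run_sequence ? run : best;
  }, null);
}

// Hypothetical stored runs for one serial, deliberately out of order.
const allRuns = [
  { serial_number: 'A100', test_date: '2026-06-17', run_sequence: 2, raw_data: 'run-2' },
  { serial_number: 'A100', test_date: '2026-06-16', run_sequence: 1, raw_data: 'run-0' },
  { serial_number: 'A100', test_date: '2026-06-17', run_sequence: 1, raw_data: 'run-1' },
];

console.log(latestRun(allRuns).raw_data); // run-2
```

Because the choice depends only on the stored key and not on processing order, the scan-order caveat above would no longer apply, provided run_sequence itself is assigned reliably.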

Rollout (after approval)

  1. Apply the WHERE-clause change to database/import.js (repo copy first, review, then deploy).
  2. Re-run the import so the 311 same-day cases settle on the latest run.
  3. Let the upload path re-push the cleared rows; confirm counts.
  4. Re-run tools/validate-parsing.js to confirm same-day violations drop to ~0.