# Proposal: make the DB hold the LATEST test run (same-day retest fix)

**Date:** 2026-06-17 · **Host:** AD2 · **Status:** PROPOSAL (diagnose-only); review before deploying

**File to change:** `C:\Shares\testdatadb\database\import.js` (repo: `projects/dataforth-dos/database/import.js`)

**Evidence:** `PARSING-FIDELITY-VERDICT-2026-06-17.md`, `SAMEDAY-RETEST-EXPOSURE-2026-06-17.txt`

## Problem

`test_records` holds one row per serial number. On re-import, the `INSERT ... ON CONFLICT (serial_number)` updates the existing row only when:

```sql
WHERE test_records.overall_result = 'FAIL'
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)
```

The date comparison is **strictly greater**, and the `.DAT` serial/date line carries the **date only** (no time of day). So when a unit is tested two or more times on the **same date**, the first same-day PASS to be imported wins, and no later same-day run can replace it. A stored FAIL is still overwritten, but only until a PASS lands. The DB, and therefore the website datasheet, can show a **non-final** run.

This is the documented audit failure mode: same-day runs are usually trim / re-test iterations, and the **last** run is the accepted certificate result.

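To make the freeze concrete, here is a minimal in-memory sketch. It replays one serial's runs through a function that mirrors the current `WHERE` guard. The helper names, the sample rows, and the use of ISO `YYYY-MM-DD` strings for `test_date` are illustrative assumptions, not code taken from `import.js`.

```javascript
// Mirror of the current ON CONFLICT guard: true means the stored row is overwritten.
// Assumes test_date is an ISO "YYYY-MM-DD" string, so ">" compares chronologically.
function currentGuard(existing, incoming) {
  return (
    existing.overall_result === "FAIL" ||
    (incoming.overall_result === "PASS" && incoming.test_date > existing.test_date)
  );
}

// Replay one serial's runs in import order, as a series of upserts would.
function importRuns(runs, guard) {
  let stored = null; // no row yet: the first run is a plain INSERT
  for (const run of runs) {
    if (stored === null || guard(stored, run)) stored = { ...run };
  }
  return stored;
}

// Hypothetical helper: every sample run is on the same date.
const makeRun = (result, rawData) => ({ overall_result: result, test_date: "2026-06-10", raw_data: rawData });

// Three same-day PASS runs: the first one imported is frozen.
const allPass = [makeRun("PASS", "run 1"), makeRun("PASS", "run 2"), makeRun("PASS", "run 3 (final)")];
console.log(importRuns(allPass, currentGuard).raw_data); // run 1

// A same-day FAIL is replaced, but only until the first PASS lands.
const failFirst = [makeRun("FAIL", "run 1"), makeRun("PASS", "run 2"), makeRun("PASS", "run 3 (final)")];
console.log(importRuns(failFirst, currentGuard).raw_data); // run 2
```

In both cases the final run never reaches the DB, because `>` is false for equal dates.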
## Exposure (whole-source sweep, 2026-06-17)

981,716 records parsed across 26,815 `.DAT` files (406,549 serials):

- Same-day multi-run events (distinct values): **6,515** across **5,977 serials**, broken down as follows (3,803 + 984 + 311 + 1,417 = 6,515):
  - DB already on the latest same-day run: 3,803
  - Superseded by a later-date retest (fine): 984
  - **DB on a non-latest run (the defect): 311**
  - Serial absent from DB (collisions/completeness): 1,417

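The four outcomes partition the 6,515 events. The sketch below shows one plausible way to bucket a single event and re-checks the reported totals. The `classifyEvent` helper, its inputs, and the sample rows are hypothetical; the sweep tool's actual logic is not reproduced here.

```javascript
// Hypothetical bucketing of one same-day multi-run event.
// runsInFileOrder: parsed runs for (serial, date), oldest first; dbRow: stored row or null.
function classifyEvent(date, runsInFileOrder, dbRow) {
  if (dbRow === null) return "absent"; // serial not in test_records
  if (dbRow.test_date > date) return "superseded"; // a later-date retest replaced it
  const latest = runsInFileOrder[runsInFileOrder.length - 1];
  if (dbRow.test_date === date && dbRow.raw_data === latest.raw_data) return "latest";
  return "non-latest"; // the defect
}

// Bucket counts reported by the 2026-06-17 sweep.
const counts = { latest: 3803, superseded: 984, "non-latest": 311, absent: 1417 };
const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
console.log(total); // 6515
console.log(((100 * counts["non-latest"]) / total).toFixed(1) + "% of events are on a non-latest run"); // 4.8% ...

// Example: the DB holds run 1, but run 2 is the last same-day run in the file.
const runs = [{ raw_data: "run 1" }, { raw_data: "run 2" }];
console.log(classifyEvent("2026-06-10", runs, { test_date: "2026-06-10", raw_data: "run 1" })); // non-latest
```

The defect therefore affects roughly 4.8% of same-day multi-run events.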
## Root cause

1. **Strictly-greater date** (`>`) in the conflict `WHERE`: once a PASS is stored for a date, every same-date update is rejected.
2. **Date-only granularity**: the `.DAT` has no intra-day timestamp with which to order same-day runs.

## Proposed fix (minimal, guarded)

Allow a same-date PASS to overwrite **only when the data actually differs**, so the last differing same-day run processed wins. That run is the latest because imports run in chronological append order and the live station logs are scanned last:

```sql
ON CONFLICT (serial_number) DO UPDATE SET
    log_type           = EXCLUDED.log_type,
    model_number       = EXCLUDED.model_number,
    test_date          = EXCLUDED.test_date,
    test_station       = EXCLUDED.test_station,
    overall_result     = EXCLUDED.overall_result,
    raw_data           = EXCLUDED.raw_data,
    source_file        = EXCLUDED.source_file,
    api_uploaded_at    = NULL,
    forweb_exported_at = NULL
WHERE test_records.overall_result = 'FAIL'
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date = test_records.test_date
       AND EXCLUDED.raw_data IS DISTINCT FROM test_records.raw_data) -- NEW: latest same-day run wins
```
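
The added clause fires only on a genuine same-date data change, so re-importing an identical run does **not** needlessly clear `api_uploaded_at` (avoiding re-push churn). This protection only works run by run. Suppose a re-import re-reads a whole `.DAT` that holds several same-day PASS runs. Each earlier run still differs from the stored latest one and overwrites it before the last run restores it, so `api_uploaded_at` is cleared for that serial anyway.
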
### Behavior after fix

| Existing row | Incoming run | Before fix | After fix |
|---|---|---|---|
| PASS, date D | PASS, date D, different data | ignored (stale) | **updated → latest run** |
| PASS, date D | PASS, date D, identical data | ignored | ignored (no churn) |
| PASS, date D | PASS, date D+1 | updated | updated (unchanged) |
| PASS, date D+1 | PASS, date D | ignored | ignored (unchanged) |
| FAIL | PASS, any date | updated | updated (unchanged) |

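The table can be checked against a small in-memory mirror of the proposed `WHERE` clause. This is a sketch only. It assumes non-null ISO date strings and treats JavaScript `null` as SQL `NULL` for `raw_data`, and the helper names are hypothetical.

```javascript
// Null-safe inequality mirroring SQL's IS DISTINCT FROM: NULL vs NULL is "not distinct".
function isDistinctFrom(a, b) {
  if (a === null && b === null) return false;
  if (a === null || b === null) return true;
  return a !== b;
}

// In-memory mirror of the proposed WHERE clause; true means the upsert overwrites.
// Assumes non-null ISO "YYYY-MM-DD" dates, so ">" and "===" behave like the SQL.
function proposedGuard(existing, incoming) {
  return (
    existing.overall_result === "FAIL" ||
    (incoming.overall_result === "PASS" && incoming.test_date > existing.test_date) ||
    (incoming.overall_result === "PASS" &&
      incoming.test_date === existing.test_date &&
      isDistinctFrom(incoming.raw_data, existing.raw_data))
  );
}

const row = (result, date, rawData) => ({ overall_result: result, test_date: date, raw_data: rawData });
const D = "2026-06-10";
const D_PLUS_1 = "2026-06-11";

// [table row, existing, incoming, expected "After fix" outcome (true = updated)]
const cases = [
  ["PASS D vs PASS D, different data", row("PASS", D, "a"), row("PASS", D, "b"), true],
  ["PASS D vs PASS D, identical data", row("PASS", D, "a"), row("PASS", D, "a"), false],
  ["PASS D vs PASS D+1", row("PASS", D, "a"), row("PASS", D_PLUS_1, "b"), true],
  ["PASS D+1 vs PASS D", row("PASS", D_PLUS_1, "a"), row("PASS", D, "b"), false],
  ["FAIL vs PASS, older date", row("FAIL", D_PLUS_1, "a"), row("PASS", D, "b"), true],
];

for (const [label, existing, incoming, expected] of cases) {
  const updated = proposedGuard(existing, incoming);
  console.log(`${label}: ${updated ? "updated" : "ignored"}${updated === expected ? "" : " (MISMATCH)"}`);
}
```

Using a null-safe comparison matters if `raw_data` can ever be `NULL`. A plain `<>` would yield `NULL` (treated as false) when either side is `NULL`.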
## Caveats / assumptions

- **Relies on chronological append order** within a `.DAT` and on the live station logs being scanned **last** (they are: `runImport` does HISTLOGS → Recovery → station `TEST_PATH`). Suppose a serial's latest same-day run existed only in HISTLOGS (scanned first) and an older same-day copy sat in a station log (scanned last). Then the older copy would win. This is rare, but possible. A hard guarantee needs a monotonic tiebreaker (an ingest sequence, or a per-run timestamp if the test program can emit one), which is a larger change.
- **Re-push impact:** the 311 corrected rows (plus any future same-day retests) will have `api_uploaded_at` cleared and will re-upload to Hoffman on the next run. This is expected and desired (the website gets the final result). It is still outbound API traffic, so run it deliberately.
- **Does NOT fix** generic reused serials (`1-1`, `1-2`, …) that collide across different products, nor the 608 units absent from the DB. Those are separate items (serial-uniqueness model / ingestion completeness).

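The ordering caveat can be reproduced in a few lines. The sketch replays a serial whose latest same-day run is only in HISTLOGS (scanned first) and whose older copy is in a station log (scanned last). `run_seq` is a hypothetical monotonic per-run ordinal that the current `.DAT` format does not carry. It stands in for the tiebreaker, and the guards are simplified mirrors of the SQL, not `import.js` code.

```javascript
// Proposed rule: a same-date PASS overwrites only when raw_data differs
// (simplified: raw_data assumed non-null).
function proposedGuard(existing, incoming) {
  if (existing.overall_result === "FAIL") return true;
  if (incoming.overall_result !== "PASS") return false;
  if (incoming.test_date > existing.test_date) return true;
  return incoming.test_date === existing.test_date && incoming.raw_data !== existing.raw_data;
}

// Hypothetical tiebreaker: order same-date runs by run_seq instead of scan order.
function tiebreakGuard(existing, incoming) {
  if (existing.overall_result === "FAIL") return true;
  if (incoming.overall_result !== "PASS") return false;
  if (incoming.test_date > existing.test_date) return true;
  return incoming.test_date === existing.test_date && incoming.run_seq > existing.run_seq;
}

// Replay runs in scan order as successive upserts for one serial.
function replay(runs, guard) {
  let stored = null;
  for (const r of runs) {
    if (stored === null || guard(stored, r)) stored = r;
  }
  return stored;
}

const D = "2026-06-10";
const scanOrder = [
  { source: "HISTLOGS", overall_result: "PASS", test_date: D, run_seq: 3, raw_data: "run 3 (final)" },
  { source: "station TEST_PATH", overall_result: "PASS", test_date: D, run_seq: 2, raw_data: "run 2" },
];

console.log(replay(scanOrder, proposedGuard).raw_data); // run 2: the older copy wins
console.log(replay(scanOrder, tiebreakGuard).raw_data); // run 3 (final)
```

An identical re-import carries the same `run_seq`, and an earlier run carries a lower one. So a tiebreaker of this shape would also never rewrite a row with its own or older data.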
## Stronger alternative (larger migration)

If full per-run archival is required (every test sheet reproducible), replace the `UNIQUE (serial_number)` model with a composite key **`(serial_number, test_date, run_sequence)`**, or store all runs and select the latest at render time. This preserves every run and removes the same-day ambiguity entirely. However, it needs a schema migration, a dedupe, and render/upload changes, so propose it separately if desired.

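Under the composite key, "select the latest at render time" reduces to picking, per serial, the run with the greatest `(test_date, run_sequence)`. This is a sketch under assumptions. `run_sequence` is the hypothetical per-day ordinal the migration would add, and the serials and rows are made up.

```javascript
// True if run a is later than run b under the (test_date, run_sequence) ordering.
// Assumes ISO "YYYY-MM-DD" dates and numeric run_sequence values.
function isLater(a, b) {
  if (a.test_date !== b.test_date) return a.test_date > b.test_date;
  return a.run_sequence > b.run_sequence;
}

// Pick the latest stored run for each serial, regardless of storage order.
function latestPerSerial(runs) {
  const latest = new Map();
  for (const run of runs) {
    const current = latest.get(run.serial_number);
    if (current === undefined || isLater(run, current)) latest.set(run.serial_number, run);
  }
  return latest;
}

// Runs can arrive in any order; the composite key alone decides the winner.
const stored = [
  { serial_number: "SN100", test_date: "2026-06-10", run_sequence: 2, overall_result: "PASS" },
  { serial_number: "SN100", test_date: "2026-06-10", run_sequence: 1, overall_result: "FAIL" },
  { serial_number: "SN100", test_date: "2026-06-09", run_sequence: 5, overall_result: "PASS" },
  { serial_number: "SN200", test_date: "2026-06-10", run_sequence: 1, overall_result: "PASS" },
];
const picked = latestPerSerial(stored);
console.log(picked.get("SN100").run_sequence); // 2
console.log(picked.size); // 2
```

If the database is PostgreSQL, the same selection could be done in SQL with `SELECT DISTINCT ON (serial_number) ... ORDER BY serial_number, test_date DESC, run_sequence DESC`. Unlike the current upsert, this picks the latest run even when it is a FAIL.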
## Rollout (after approval)

1. Apply the `WHERE`-clause change to `database/import.js` (repo copy first, review, then deploy).
2. Re-run the import so the 311 same-day cases settle on the latest run.
3. Let the upload path re-push the cleared rows; confirm counts.
4. Re-run `tools/validate-parsing.js` to confirm same-day violations drop to ~0.