claudetools/projects/dataforth-dos/MISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md
Mike Swanson 8f06426ba0 dataforth(datasheet): root-cause the 608 missing units (report for John)
608 staged datasheets absent from DB. Two causes: (1) 229 units with encoded/
non-standard serials the importer's leading-digit regex silently skips - data is
in the .DAT, recoverable; full blind spot is 840 serials / 9,510 records / 141
models dropped fleet-wide. (2) 379 units whose per-model .DAT was overwritten by a
later work order - recoverable only from the staged .TXT or a log backup. Adds
John-facing report, raw data, and the chase-missing-units.js tool.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 13:02:32 -07:00

Test Datasheets Missing From the Database/Website — Findings

To: John Lehman (Engineering)
From: Mike Swanson, AZ Computer Guru
Date: 2026-06-17
Scope: Why some tested units have a staged datasheet but no record in testdatadb / on the website.


Summary

I cross-checked all 11,921 staged datasheet files (the .TXT files the test stations produce) against the database. 608 had no matching database record. They fall into two distinct causes:

Cause                                                          Units   Recoverable?
1. Encoded / non-standard serial numbers the importer skips    229     Yes: the data exists, the importer just doesn't read it
2. Source log (.DAT) was overwritten before import             379     Only from a backup, or from the staged .TXT itself

The first cause is the important one: it is a software limitation we can fix, and its true reach is far larger than these 608 — see "Full scope" below.


Cause 1 — Encoded serial numbers are silently skipped (229 units, fixable)

What happens: When a serial number is too long for the DOS 8.3 filename, the test program encodes the first two digits as a letter (e.g. 10243-1 is written as A243-1; 10 -> A). For these units, the serial is stored with a leading letter inside the log file:

"A243-1","01-21-2025"      <- real serial 10243-1, model 5B45-25D

The database importer recognizes a record only when the serial starts with a digit. A serial that starts with a letter never matches, so the whole record is silently dropped — it is never imported, never rendered, never sent to the website.

Confirmed: the encoded serials are present in the .DAT logs (e.g. 5BLOG\45-25D.DAT contains "A243-1","01-21-2025"), and the decoded form (10243-1) appears in no log and in no database row. So the data exists; the importer simply can't read it.
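To illustrate the skip, here is a minimal sketch. The pattern is a stand-in, not the importer's actual regex, which is not reproduced in this report. It assumes the two-field header shape shown in the sample line above and a strictly numeric serial. `NUMERIC_SERIAL_HEADER` and `parseRecordHeader` are hypothetical names.

```javascript
// Hypothetical stand-in for the importer's rule: a record header is accepted
// only when the serial field is digits, a dash, then digits.
const NUMERIC_SERIAL_HEADER = /^"(\d+-\d+)","(\d{2}-\d{2}-\d{4})"/;

// Returns the parsed header, or null when the line does not match.
// A null here is the "silent skip": nothing is logged, nothing is imported.
function parseRecordHeader(line) {
  const m = NUMERIC_SERIAL_HEADER.exec(line);
  return m ? { serial: m[1], date: m[2] } : null;
}

// Encoded serial from 5BLOG\45-25D.DAT: rejected because it starts with "A".
console.log(parseRecordHeader('"A243-1","01-21-2025"')); // null

// Illustrative numeric serial (made-up line): accepted.
console.log(parseRecordHeader('"178644-1","02-26-2026"'));
```

Because the failure is just a non-match, the importer has no reason to report it, which is why these records disappeared without any error.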

Of the 608 missing, 229 are this case:

  • 212 are hex-encoded serials (all A-prefix, i.e. 10xxx serials)
  • 17 are other non-standard serial formats the same rule rejects (e.g. TEST-1, 178540-A1, A-1)
  • Stations: TS-11R (142), TS-11L (59), TS-8R (11), TS-8L (17)
  • Dates: late 2025 through 2026
  • Example models: SCM5B47K-05, SCM5B38-04, SCM5B34-01, 8B45-02, SCM5B36-02, 8B32-01, 5B45-25D, DSCA45-01

Examples (encoded -> real serial):

A243-1 -> 10243-1   5B45-25D    02-03-2026   TS-11L
A244-1 -> 10244-1   SCM5B30-02  03-26-2026   TS-11L
A276-1 -> 10276-1   DSCA39-05   05-07-2026   TS-11L
A328-1 -> 10328-1   DSCA45-08   02-17-2026   TS-11L

Full scope of this bug (beyond the 608)

The 229 above are only the units that also still have a staged .TXT on disk. A scan of every .DAT log across all stations and the central history logs shows the importer is dropping:

  • 840 distinct encoded serial numbers
  • 9,510 individual test records
  • across 141 models
  • 831 of those 840 serials are absent from the database

So this single serial-format limitation is keeping roughly 9,500 test results out of the database and off the website.

Fix: teach the importer to (a) accept a serial that starts with a letter and (b) decode it back to its real number (A243-1 -> 10243-1) before storing — matching how the longer serials (the H-prefix range) are already handled. This is a one-function change to the import parser. It would recover the 229 units here plus the ~9,500-record backlog. (I will write this up as a separate proposed change for review; no code has been changed.)
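A rough sketch of what that change could look like, under stated assumptions. This is not the importer's code, and every name in it is hypothetical. It assumes the header shape of the sample line ("serial","date") and that a leading letter stands for a two-digit number with A = 10. Only the A -> 10 mapping is confirmed in this report; applying the same offset to later letters is an assumption that should be checked against the existing H-prefix handling.

```javascript
// Assumed header shape: an optional single leading letter, then digits-dash-digits.
const RECORD_HEADER = /^"([A-Z]?\d+-\d+)","(\d{2}-\d{2}-\d{4})"/;
const ENCODED_SERIAL = /^([A-Z])(\d+-\d+)$/;

// Decode an encoded serial back to its real number, e.g. A243-1 -> 10243-1.
// Serials that are not in the encoded form are returned unchanged.
function decodeSerial(serial) {
  const m = ENCODED_SERIAL.exec(serial);
  if (!m) return serial;
  // Assumption: A -> 10, B -> 11, and so on.
  const prefix = m[1].charCodeAt(0) - "A".charCodeAt(0) + 10;
  return String(prefix) + m[2];
}

// Parse a record header, keeping the raw serial alongside the decoded one
// so the stored row can still be traced back to the log line.
function parseRecordHeader(line) {
  const m = RECORD_HEADER.exec(line);
  if (!m) return null;
  return { serial: decodeSerial(m[1]), rawSerial: m[1], date: m[2] };
}

console.log(decodeSerial("A243-1"));   // 10243-1
console.log(decodeSerial("178644-1")); // 178644-1 (already numeric, unchanged)
```

This sketch is deliberately conservative: the other non-standard formats (TEST-1, 178540-A1, A-1) still do not match `RECORD_HEADER`, so they would need their own rule or a manual decision.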


Cause 2 — Source log overwritten before import (379 units)

These have ordinary numeric serials (no encoding issue), but their test data is no longer in any log file we import from. The per-model .DAT log is reused for later production runs, and the older records get overwritten.

Confirmed example: units 177097-1 ... 177097-16 (model DSCA33-05, tested 10-17-2025, TS-1R) appear in no log file anywhere under the test share. Their model's log (DSCLOG\33-05.DAT) now contains a different work order (178644-*, tested 02-26-2026). The 10-17-2025 results were overwritten; only the staged .TXT datasheet survived.

Recovery options for these 379:

  1. From the staged .TXT itself — the rendered datasheet still exists on disk for these units; they can be imported directly from the .TXT (store the existing sheet as-is) rather than re-derived from the .DAT. This is the most practical recovery and would cover most of the 379.
  2. From a pre-overwrite backup of the .DAT logs, if one exists. (The Recovery-TEST backup area referenced by the importer does not exist on this server; if Engineering keeps log backups elsewhere, those could be imported.)

If neither is pursued, these units' results remain only as the staged .TXT files and will not appear in the database or on the website.


Recommendations

  1. Fix the importer's serial handling (Cause 1): recovers 229 staged units and roughly 9,500 total dropped records across 141 models. Highest value, single code change. Proposal to follow for review.
  2. Backfill the 379 overwritten units (Cause 2) from their staged .TXT files: recovers the datasheets that still exist on disk. Lower-risk than chasing log backups.
  3. Prevent recurrence: the per-model log overwrite is the underlying reason Cause 2 data is lost. If retaining every run matters, the logs should be archived before reuse (or the import scheduled frequently enough that records are captured before a log is overwritten).
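For the third item, archiving before reuse could be as small as the sketch below. This is an illustration only: `archiveName`, `archiveLog`, and the archive directory are hypothetical, nothing like this exists in the pipeline today, and where such a step would hook in (test station or server side) is an open question.

```javascript
const fs = require("fs");
const path = require("path");

// Build a timestamped archive name, e.g. "33-05.20260226T101500Z.DAT".
function archiveName(datPath, when) {
  const ext = path.extname(datPath);        // ".DAT"
  const base = path.basename(datPath, ext); // "33-05"
  // ISO timestamp with dashes, colons and milliseconds removed so the result
  // is a valid Windows file name.
  const stamp = when.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "Z");
  return `${base}.${stamp}${ext}`;
}

// Copy a per-model log into the archive directory before it is reused.
// Returns the path of the archived copy.
function archiveLog(datPath, archiveDir, when = new Date()) {
  fs.mkdirSync(archiveDir, { recursive: true });
  const dest = path.join(archiveDir, archiveName(datPath, when));
  // COPYFILE_EXCL makes the copy fail rather than overwrite an existing archive.
  fs.copyFileSync(datPath, dest, fs.constants.COPYFILE_EXCL);
  return dest;
}
```

Copying rather than moving leaves the live log untouched, so the test program's behavior would not change. The archive copies would then be an additional import source, like the log backups discussed under Cause 2.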

How this was determined (for the record)

  • Compared every C:\Shares\test\STAGE\**\*.TXT against test_records (serial, model, date, measured results).
  • For each missing unit, searched all .DAT sources (central HISTLOGS + every station's LOGS, 26,815 files) for the encoded and decoded serial.
  • Confirmed encoded serials are present in the logs but skipped by the import regex; confirmed overwritten units are absent from all logs and their model log now holds a newer work order.
  • Tools (read-only) committed under projects/dataforth-dos/datasheet-pipeline/implementation/tools/; raw data in MISSING-UNITS-ROOTCAUSE-2026-06-17.txt.
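The classification logic behind the two causes can be summarized in a few lines. This is a simplified sketch, not the committed chase-missing-units.js tool: it assumes the staged, database, and log serials have already been collected into memory, and `classifyMissing` and its bucket names are hypothetical.

```javascript
// stagedSerials: iterable of serials taken from the staged .TXT files
// dbSerials:     Set of serials present in test_records
// logSerials:    Set of serials found in any .DAT source
function classifyMissing(stagedSerials, dbSerials, logSerials) {
  const result = { encodedOrNonStandard: [], overwritten: [], other: [] };
  for (const serial of stagedSerials) {
    if (dbSerials.has(serial)) continue; // imported, so not missing
    if (!/^\d+-\d+$/.test(serial)) {
      // Cause 1: a serial format the importer does not recognize.
      result.encodedOrNonStandard.push(serial);
    } else if (!logSerials.has(serial)) {
      // Cause 2: ordinary serial, but no log holds its data any more.
      result.overwritten.push(serial);
    } else {
      // Ordinary serial that is still in a log: needs a manual look.
      result.other.push(serial);
    }
  }
  return result;
}
```

The `other` bucket is a safeguard in the sketch so that anything not explained by the two causes is surfaced rather than silently counted under one of them.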