Test Datasheets Missing From the Database/Website — Findings
To: John Lehman (Engineering)
From: Mike Swanson, AZ Computer Guru
Date: 2026-06-17
Scope: Why some tested units have a staged datasheet but no record in testdatadb / on the website.
Summary
I cross-checked all 11,921 staged datasheet files (the .TXT the test stations produce) against the database. 608 had no matching database record. They fall into two distinct causes:
| Cause | Units | Recoverable? |
|---|---|---|
| 1. Encoded / non-standard serial numbers the importer skips | 229 | Yes — the data exists, the importer just doesn't read it |
| 2. One-time log loss during the cryptolocker incident/recovery | 379 | Yes — from the staged .TXT, which still exists on disk |
The first cause is the important one: it is a software limitation we can fix, and its true reach is far larger than these 608 — see "Full scope" below.
Cause 1 — Encoded serial numbers are silently skipped (229 units, fixable)
What happens: When a serial number is too long for the DOS 8.3 filename, the test program encodes the first two digits as a letter (e.g. 10243-1 is written as A243-1; 10 -> A). For these units, the serial is stored with a leading letter inside the log file:
"A243-1","01-21-2025" <- real serial 10243-1, model 5B45-25D
The database importer recognizes a record only when the serial starts with a digit. A serial that starts with a letter never matches, so the whole record is silently dropped — it is never imported, never rendered, never sent to the website.
Confirmed: the encoded serials are present in the .DAT logs (e.g. 5BLOG\45-25D.DAT contains "A243-1","01-21-2025"), and the decoded form (10243-1) appears in no log and in no database row. So the data exists; the importer simply can't read it.
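The skip can be illustrated with a minimal sketch. It assumes the importer matches each log line against a pattern that requires the serial to begin with a digit; `DIGIT_ONLY` and `parse_record` are hypothetical names, and the actual import regex is not reproduced here.

```python
import re

# Hypothetical stand-in for the importer's pattern: a quoted serial that
# must start with a digit, followed by a quoted MM-DD-YYYY date.
DIGIT_ONLY = re.compile(r'^"(\d[^"]*)","(\d{2}-\d{2}-\d{4})"')

def parse_record(line):
    """Return (serial, date) if the line matches, else None (record skipped)."""
    m = DIGIT_ONLY.match(line)
    return (m.group(1), m.group(2)) if m else None

# A numeric serial parses; an encoded serial yields None and is dropped
# without any error being raised.
print(parse_record('"178644-1","02-26-2026"'))
print(parse_record('"A243-1","01-21-2025"'))
```

Because a non-match simply returns nothing, a pattern like this produces no error or log entry for the skipped record, which is consistent with the drop being silent.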
Of the 608 missing, 229 are this case:
- 212 are hex-encoded serials (all A-prefix, i.e. 10xxx serials)
- 17 are other non-standard serial formats the same rule rejects (e.g. TEST-1, 178540-A1, A-1)
- Stations: TS-11R (142), TS-11L (59), TS-8R (11), TS-8L (17)
- Dates: late 2025 through 2026
- Example models: SCM5B47K-05, SCM5B38-04, SCM5B34-01, 8B45-02, SCM5B36-02, 8B32-01, 5B45-25D, DSCA45-01
Examples (encoded -> real serial):
A243-1 -> 10243-1 5B45-25D 02-03-2026 TS-11L
A244-1 -> 10244-1 SCM5B30-02 03-26-2026 TS-11L
A276-1 -> 10276-1 DSCA39-05 05-07-2026 TS-11L
A328-1 -> 10328-1 DSCA45-08 02-17-2026 TS-11L
Full scope of this bug (beyond the 608)
The 229 above are only the units that also still have a staged .TXT on disk. Scanning every .DAT log across all stations and the central history logs, the importer is dropping:
- 840 distinct encoded serial numbers
- 9,510 individual test records
- across 141 models
- 831 of those 840 serials are absent from the database
So this single serial-format limitation is keeping roughly 9,500 test results out of the database and off the website.
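A count of this kind could be produced along the following lines. This is a sketch only: it assumes every record line begins with a quoted serial, as in the "A243-1","01-21-2025" example, and that an encoded serial is one letter followed by digits, a hyphen, and a unit number. The function name and sample lines are illustrative, not the tool that was actually run.

```python
import re
from collections import Counter

# Assumption: each record line starts with a quoted serial.
LEADING_SERIAL = re.compile(r'^"([^"]+)"')
# Assumption: an encoded serial is one letter, digits, a hyphen, a unit number.
ENCODED = re.compile(r'^[A-Z]\d+-\d+$')

def tally_encoded(lines):
    """Count records per encoded serial across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LEADING_SERIAL.match(line.strip())
        if m and ENCODED.match(m.group(1)):
            counts[m.group(1)] += 1
    return counts

# Illustrative lines only, not real log contents.
sample = [
    '"A243-1","01-21-2025"',
    '"A243-1","02-03-2026"',
    '"A244-1","03-26-2026"',
    '"178644-1","02-26-2026"',
]
counts = tally_encoded(sample)
print(len(counts), sum(counts.values()))  # distinct serials, total records
```

Feeding it the lines of every .DAT file gives both figures at once: the number of keys is the distinct-serial count, and the sum of values is the record count.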
Fix: teach the importer to (a) accept a serial that starts with a letter and (b) decode it back to its real number (A243-1 -> 10243-1) before storing — matching how the longer serials (the H-prefix range) are already handled. This is a one-function change to the import parser. It would recover the 229 units here plus the ~9,500-record backlog. (I will write this up as a separate proposed change for review; no code has been changed.)
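A minimal sketch of the decode step, not the proposed change itself. Only the A -> 10 mapping is confirmed above; the continuation to later letters (B = 11, and so on) is an assumption, and `decode_serial` is a hypothetical name.

```python
import re

# Assumption: a leading letter stands for the first two digits of the serial,
# with A = 10 (confirmed) and later letters continuing upward (B = 11, ...).
ENCODED = re.compile(r'^([A-Z])(\d+-\d+)$')

def decode_serial(serial):
    """Return the numeric form of an encoded serial; pass others through."""
    m = ENCODED.match(serial)
    if not m:
        return serial  # numeric or non-standard serials are left unchanged
    prefix = ord(m.group(1)) - ord("A") + 10
    return f"{prefix}{m.group(2)}"

print(decode_serial("A243-1"))    # encoded form
print(decode_serial("177097-1"))  # already numeric
```

The sketch deliberately leaves formats such as TEST-1 and A-1 untouched, since they do not fit the letter-plus-digits shape; how the importer should treat those is a separate decision.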
Cause 2 — One-time log loss during the cryptolocker incident/recovery (379 units)
These have ordinary numeric serials (no encoding issue), but their raw test data is no longer in any log file we import from. This is not an ongoing gap — the import runs every 15 minutes and everything since has come through cleanly.
What happened: the DOS stations append all of a given model's results into a single shared per-model .DAT file. During the crypto incident, stations were failing to sync, and in the recovery the appended-to-one-filename behavior wasn't yet understood — so for ~2 weeks, freshly-created logs coming off the DOS machines overwrote the accumulated server-side logs, wiping the earlier history those files held.
Evidence (date/station fingerprint of a one-time overwrite): the 379 affected units are confined to 2025-10 → 2026-01 and stop cold after January 2026, concentrated on three stations:
| Test month | Units | Station | Units |
|---|---|---|---|
| 2025-10 | 95 | TS-4L | 185 |
| 2025-11 | 66 | TS-4R | 171 |
| 2025-12 | 210 | TS-1R | 23 |
| 2026-01 | 8 | | |
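The two breakdowns in the table can be reproduced with a simple tally. The sketch below assumes each missing unit is available as a (test date, station) pair with dates in the MM-DD-YYYY form used in the logs; `fingerprint` is a hypothetical name and the sample rows are illustrative, not the real 379-unit list.

```python
from collections import Counter
from datetime import datetime

def fingerprint(units):
    """Tally units by test month (YYYY-MM) and by station.

    units: iterable of (test_date 'MM-DD-YYYY', station) pairs.
    """
    by_month, by_station = Counter(), Counter()
    for test_date, station in units:
        month = datetime.strptime(test_date, "%m-%d-%Y").strftime("%Y-%m")
        by_month[month] += 1
        by_station[station] += 1
    return by_month, by_station

# Illustrative rows only.
sample = [("10-17-2025", "TS-1R"), ("12-03-2025", "TS-4L"), ("12-09-2025", "TS-4R")]
by_month, by_station = fingerprint(sample)
print(sorted(by_month.items()))
print(max(by_month))  # latest affected month
```

A recurring fault would keep adding months to the tally; a hard cutoff at the latest month, as seen here after 2026-01, points to a one-time event.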
Confirmed example: units 177097-1 ... 177097-16 (model DSCA33-05, tested 10-17-2025, TS-1R) appear in no log file anywhere under the test share. Their model's log (DSCLOG\33-05.DAT) now holds a later work order (178644-*, tested 02-26-2026) — the older appended history was overwritten; only the staged .TXT datasheet survived.
Recovery: the rendered datasheet .TXT still exists on disk for these units, so they can be backfilled directly from the .TXT (store the existing sheet as-is). The raw .DAT history is gone (the Recovery-TEST backup area the importer references does not exist on this server). Backfilling from the staged .TXT is the practical path and recovers all 379.
Recommended actions
- Fix the importer's serial handling (Cause 1) — recovers 229 staged units and ~9,500 total dropped records across 141 models. Highest value, single code change. Proposal to follow for review.
- Backfill the 379 incident-era units (Cause 2) from their staged .TXT files, which recovers the datasheets that still exist on disk.
- Recurrence: Cause 2 was a one-time incident artifact (the 15-minute import has captured everything since January 2026), so no ongoing process change is required. Worth confirming the current sync no longer overwrites accumulated server-side logs with fresh DOS-side copies.
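One way to confirm the sync behaviour is an append-only check: a station's copy of a log should only ever extend the server's copy. The sketch below is an assumption about how such a check could look, not a description of the current sync; `is_safe_overwrite` is a hypothetical name and the byte strings are illustrative.

```python
def is_safe_overwrite(server_bytes, incoming_bytes):
    """True only if the incoming log extends the server copy (append-only)."""
    return incoming_bytes.startswith(server_bytes)

# Illustrative contents: an accumulated server log, a freshly created
# station log, and a station log that has grown by appending.
server = b'"177097-1","10-17-2025"\r\n'
fresh = b'"178644-1","02-26-2026"\r\n'
grown = server + fresh

print(is_safe_overwrite(server, grown))  # appended copy keeps the history
print(is_safe_overwrite(server, fresh))  # fresh copy would wipe the history
```

A sync that refuses, or at least flags, any copy failing this check would have caught the overwrite described under Cause 2.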
How this was determined (for the record)
- Compared every C:\Shares\test\STAGE\**\*.TXT against test_records (serial, model, date, measured results).
- For each missing unit, searched all .DAT sources (central HISTLOGS + every station's LOGS, 26,815 files) for the encoded and decoded serial.
- Confirmed encoded serials are present in the logs but skipped by the import regex; confirmed overwritten units are absent from all logs and their model log now holds a newer work order.
- Tools (read-only) committed under projects/dataforth-dos/datasheet-pipeline/implementation/tools/; raw data in MISSING-UNITS-ROOTCAUSE-2026-06-17.txt.
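The first step reduces to a set difference. The sketch below is a simplified illustration that compares on serial only, whereas the actual comparison also used model, date, and measured results. It assumes the staged filename stem is the serial as written by the station; the function name and the sub-folder layout in the sample paths are hypothetical.

```python
from pathlib import PureWindowsPath

def missing_from_db(staged_paths, db_serials):
    """Return staged serials that have no database record, sorted."""
    # Assumption: the filename stem is the (possibly encoded) serial.
    staged = {PureWindowsPath(p).stem.upper() for p in staged_paths}
    return sorted(staged - {s.upper() for s in db_serials})

# Illustrative paths; the folder layout under STAGE is an assumption.
staged_paths = [
    r"C:\Shares\test\STAGE\TS-11L\A243-1.TXT",
    r"C:\Shares\test\STAGE\TS-1R\178644-1.TXT",
]
print(missing_from_db(staged_paths, {"178644-1"}))
```

`PureWindowsPath` is used so the Windows-style paths parse the same way regardless of where the check is run.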