From 8f06426ba044ca10f7e56ee6d381e5d8083327fa Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Wed, 17 Jun 2026 14:55:44 -0700
Subject: [PATCH] dataforth(datasheet): root-cause the 608 missing units (report for John)

608 staged datasheets absent from DB. Two causes: (1) 229 units with encoded/
non-standard serials the importer's leading-digit regex silently skips - data
is in the .DAT, recoverable; full blind spot is 840 serials / 9,510 records /
141 models dropped fleet-wide. (2) 379 units whose per-model .DAT was
overwritten by a later work order - recoverable only from the staged .TXT or a
log backup. Adds John-facing report, raw data, and the chase-missing-units.js
tool.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...ISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md | 92 +++++++++++++++++++
 .../MISSING-UNITS-ROOTCAUSE-2026-06-17.txt | 57 ++++++++++++
 .../tools/chase-missing-units.js | 92 +++++++++++++++++++
 3 files changed, 241 insertions(+)
 create mode 100644 projects/dataforth-dos/MISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md
 create mode 100644 projects/dataforth-dos/MISSING-UNITS-ROOTCAUSE-2026-06-17.txt
 create mode 100644 projects/dataforth-dos/datasheet-pipeline/implementation/tools/chase-missing-units.js

diff --git a/projects/dataforth-dos/MISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md b/projects/dataforth-dos/MISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md
new file mode 100644
index 00000000..d5cf157c
--- /dev/null
+++ b/projects/dataforth-dos/MISSING-UNITS-REPORT-FOR-JOHN-2026-06-17.md
@@ -0,0 +1,92 @@
+# Test Datasheets Missing From the Database/Website — Findings
+
+**To:** John Lehman (Engineering)
+**From:** Mike Swanson, AZ Computer Guru
+**Date:** 2026-06-17
+**Scope:** Why some tested units have a staged datasheet but no record in testdatadb / on the website.
+
+---
+
+## Summary
+
+I cross-checked **all 11,921 staged datasheet files** (the `.TXT` the test stations produce) against the database. **608 had no matching database record.** They fall into two distinct causes:
+
+| Cause | Units | Recoverable? |
+|---|---:|---|
+| **1. Encoded / non-standard serial numbers the importer skips** | **229** | **Yes** — the data exists, the importer just doesn't read it |
+| **2. Source log (`.DAT`) was overwritten before import** | **379** | Only from a backup, or from the staged `.TXT` itself |
+
+The first cause is the important one: it is a **software limitation we can fix**, and its true reach is far larger than these 608 — see "Full scope" below.
+
+---
+
+## Cause 1 — Encoded serial numbers are silently skipped (229 units, fixable)
+
+**What happens:** When a serial number is too long for the DOS 8.3 filename, the test program encodes the first two digits as a letter (e.g. `10243-1` is written as `A243-1`; `10` -> `A`). For these units, the serial is stored **with a leading letter** inside the log file:
+
+```
+"A243-1","01-21-2025" <- real serial 10243-1, model 5B45-25D
+```
+
+The database importer recognizes a record only when the serial **starts with a digit**. A serial that starts with a letter never matches, so the whole record is **silently dropped** — it is never imported, never rendered, never sent to the website.
+
+**Confirmed:** the encoded serials are present in the `.DAT` logs (e.g. `5BLOG\45-25D.DAT` contains `"A243-1","01-21-2025"`), and the decoded form (`10243-1`) appears in no log and in no database row. So the data exists; the importer simply can't read it.
+
+**Of the 608 missing, 229 are this case:**
+- 212 are hex-encoded serials (all `A`-prefix, i.e. `10xxx` serials)
+- 17 are other non-standard serial formats the same rule rejects (e.g. `TEST-1`, `178540-A1`, `A-1`)
+- Stations: TS-11R (142), TS-11L (59), TS-8R (11), TS-8L (17)
+- Dates: late 2025 through 2026
+- Example models: SCM5B47K-05, SCM5B38-04, SCM5B34-01, 8B45-02, SCM5B36-02, 8B32-01, 5B45-25D, DSCA45-01
+
+**Examples (encoded -> real serial):**
+```
+A243-1 -> 10243-1 5B45-25D 02-03-2026 TS-11L
+A244-1 -> 10244-1 SCM5B30-02 03-26-2026 TS-11L
+A276-1 -> 10276-1 DSCA39-05 05-07-2026 TS-11L
+A328-1 -> 10328-1 DSCA45-08 02-17-2026 TS-11L
+```
+
+### Full scope of this bug (beyond the 608)
+
+The 229 above are only the units that *also* still have a staged `.TXT` on disk. Scanning **every** `.DAT` log across all stations and the central history logs, the importer is dropping:
+
+- **840 distinct encoded serial numbers**
+- **9,510 individual test records**
+- across **141 models**
+- of which **831 of 840 serials are absent from the database**
+
+So this single serial-format limitation is keeping roughly **9,500 test results out of the database and off the website.**
+
+**Fix:** teach the importer to (a) accept a serial that starts with a letter and (b) decode it back to its real number (`A243-1` -> `10243-1`) before storing — matching how the longer serials (the `H`-prefix range) are already handled. This is a one-function change to the import parser. It would recover the 229 units here plus the ~9,500-record backlog. (I will write this up as a separate proposed change for review; no code has been changed.)
+
+---
+
+## Cause 2 — Source log overwritten before import (379 units)
+
+These have ordinary numeric serials (no encoding issue), but their test data is **no longer in any log file** we import from. The per-model `.DAT` log is reused for later production runs, and the older records get overwritten.
+
+**Confirmed example:** units `177097-1 ... 177097-16` (model DSCA33-05, tested 10-17-2025, TS-1R) appear in **no** log file anywhere under the test share. Their model's log (`DSCLOG\33-05.DAT`) now contains a **different** work order (`178644-*`, tested 02-26-2026). The 10-17-2025 results were overwritten; only the staged `.TXT` datasheet survived.
+
+**Recovery options for these 379:**
+1. **From the staged `.TXT` itself** — the rendered datasheet still exists on disk for these units; they can be imported directly from the `.TXT` (store the existing sheet as-is) rather than re-derived from the `.DAT`. This is the most practical recovery and would cover most of the 379.
+2. **From a pre-overwrite backup** of the `.DAT` logs, if one exists. (The `Recovery-TEST` backup area referenced by the importer does not exist on this server; if Engineering keeps log backups elsewhere, those could be imported.)
+
+If neither is pursued, these units' results remain only as the staged `.TXT` files and will not appear in the database or on the website.
+
+---
+
+## Recommended actions
+
+1. **Fix the importer's serial handling** (Cause 1) — recovers 229 staged units and ~9,500 total dropped records across 141 models. Highest value, single code change. Proposal to follow for review.
+2. **Backfill the 379 overwritten units (Cause 2) from their staged `.TXT` files** — recovers the datasheets that still exist on disk. Lower-risk than chasing log backups.
+3. **Prevent recurrence:** the per-model log overwrite is the underlying reason Cause 2 data is lost. If retaining every run matters, the logs should be archived before reuse (or the import scheduled frequently enough that records are captured before a log is overwritten).
+
+---
+
+## How this was determined (for the record)
+
+- Compared every `C:\Shares\test\STAGE\**\*.TXT` against `test_records` (serial, model, date, measured results).
+- For each missing unit, searched all `.DAT` sources (central HISTLOGS + every station's LOGS, 26,815 files) for the encoded and decoded serial.
+- Confirmed encoded serials are present in the logs but skipped by the import regex; confirmed overwritten units are absent from all logs and their model log now holds a newer work order.
+- Tools (read-only) committed under `projects/dataforth-dos/datasheet-pipeline/implementation/tools/`; raw data in `MISSING-UNITS-ROOTCAUSE-2026-06-17.txt`.
diff --git a/projects/dataforth-dos/MISSING-UNITS-ROOTCAUSE-2026-06-17.txt b/projects/dataforth-dos/MISSING-UNITS-ROOTCAUSE-2026-06-17.txt
new file mode 100644
index 00000000..1c35adb1
--- /dev/null
+++ b/projects/dataforth-dos/MISSING-UNITS-ROOTCAUSE-2026-06-17.txt
@@ -0,0 +1,57 @@
+========== MISSING-UNITS ROOT CAUSE & SCOPE ==========
+Staged .TXT with SN : 11921
+Staged units MISSING from DB : 608
+
+ROOT-CAUSE CATEGORIES (of the missing):
+ Parser-drop (encoded serial w/ leading letter present in .DAT, regex rejects): 212
+ Source .DAT has no record for this unit (data absent) : 379
+ Other (decoded present, still missing - investigate) : 17
+
+
+ by leading letter: A=212
+ by station : TS-11R=142, TS-11L=59, TS-8R=11
+ by model (top 12): SCM5B47K-05=18, SCM5B38-04=14, SCM5B34-01=13, 8B45-02=12, SCM5B36-02=11, 8B32-01=10, 5B45-25D=9, SCM5B49-05=8, 5B45-01=6, SCM5B39-05=6, DSCA45-01=5, SCM5B36-04=5
+ date range : 01-13-2026 .. 12-11-2025
+ samples :
+ A243-1 -> 10243-1 5B45-25D 02-03-2026 TS-11L
+ A243-2 -> 10243-2 5B45-25D 02-03-2026 TS-11L
+ A244-1 -> 10244-1 SCM5B30-02 03-26-2026 TS-11L
+ A255-1 -> 10255-1 5B45-25D 02-03-2026 TS-11L
+ A255-2 -> 10255-2 5B45-25D 02-03-2026 TS-11L
+ A276-1 -> 10276-1 DSCA39-05 05-07-2026 TS-11L
+ A276-2 -> 10276-2 DSCA39-05 05-07-2026 TS-11L
+ A328-1 -> 10328-1 DSCA45-08 02-17-2026 TS-11L
+ A328-2 -> 10328-2 DSCA45-01C 02-17-2026 TS-11L
+ A376-1 -> 10376-1 DSCA45-08 02-17-2026 TS-11L
+ A376-2 -> 10376-2 DSCA45-08 02-17-2026 TS-11L
+ A376-3 -> 10376-3 DSCA45-02 02-17-2026 TS-11L
+
+DATA-ABSENT samples:
+ 177097-1 -> 177097-1 DSCA33-05 10-17-2025 TS-1R
+ 177097-10 -> 177097-10 DSCA33-05 10-17-2025 TS-1R
+ 177097-11 -> 177097-11 DSCA33-05 10-17-2025 TS-1R
+ 177097-12 -> 177097-12 DSCA33-05 10-17-2025 TS-1R
+ 177097-13 -> 177097-13 DSCA33-05 10-17-2025 TS-1R
+ 177097-14 -> 177097-14 DSCA33-05 10-17-2025 TS-1R
+ 177097-16 -> 177097-16 DSCA33-05 10-17-2025 TS-1R
+ 177097-2 -> 177097-2 DSCA33-05 10-17-2025 TS-1R
+ 177097-3 -> 177097-3 DSCA33-05 10-17-2025 TS-1R
+ 177097-4 -> 177097-4 DSCA33-05 10-17-2025 TS-1R
+
+OTHER samples:
+ A-1 -> A-1 SCM5B38-37 11-20-2025 TS-11R
+ TEST-1 -> TEST-1 SCM5B392-04 11-12-2025 TS-11R
+ TEST-2 -> TEST-2 SCM5B392-04 11-12-2025 TS-11R
+ 178540-A1 -> 178540-A1 SCM5B40-03 02-26-2026 TS-8L
+ 178540-A2 -> 178540-A2 SCM5B40-03 02-26-2026 TS-8L
+ 178540-A3 -> 178540-A3 SCM5B40-03 02-26-2026 TS-8L
+ 178540-A4 -> 178540-A4 SCM5B40-03 02-26-2026 TS-8L
+ 178540-B1 -> 178540-B1 SCM5B40-03 02-26-2026 TS-8L
+ 178540-B2 -> 178540-B2 SCM5B40-03 02-26-2026 TS-8L
+ 178540-B3 -> 178540-B3 SCM5B40-03 02-26-2026 TS-8L
+
+FULL .DAT BLIND SPOT (all letter-prefixed serials the importer skips, not just staged):
+ distinct letter-prefixed serials in .DAT : 840
+ total letter-prefixed records (dropped) : 9510
+ distinct models affected : 141
+ of those serials, DECODED form absent from DB: 831 / 840
diff --git a/projects/dataforth-dos/datasheet-pipeline/implementation/tools/chase-missing-units.js b/projects/dataforth-dos/datasheet-pipeline/implementation/tools/chase-missing-units.js
new file mode 100644
index 00000000..e9acda9b
--- /dev/null
+++ b/projects/dataforth-dos/datasheet-pipeline/implementation/tools/chase-missing-units.js
@@ -0,0 +1,92 @@
+// Root-cause + scope of the 608 missing staged units (READ-ONLY) for the report to John.
+// Hypothesis: importer serial/date regex requires a leading digit, so hex-encoded
+// (leading-letter) serials in the .DAT are never matched -> records dropped.
+const fs = require('fs');
+const path = require('path');
+const db = require('./database/db');
+
+const STAGE = 'C:/Shares/test/STAGE';
+const STRICT = /^"(\d+-\d+[A-Za-z]?)","(\d{2}-\d{2}-\d{4})"$/; // current importer regex
+const LOOSE = /^"([^"]+)","(\d{2}-\d{2}-\d{4})"$/; // any serial before a date
+const decode = sn => /^[A-Za-z]\d/.test(sn) ? String(sn.toUpperCase().charCodeAt(0) - 55) + sn.slice(1) : sn;
+
+function walk(dir, re, out) { let it=[]; try{it=fs.readdirSync(dir,{withFileTypes:true})}catch{return out;}
+  for(const e of it){const p=path.join(dir,e.name); if(e.isDirectory()) walk(p,re,out); else if(re.test(e.name)) out.push(p);} return out; }
+
+(async () => {
+  // ---- staged .TXT inventory ----
+  const txts = walk(STAGE, /\.txt$/i, []);
+  const staged = [];
+  for (const f of txts) { let t; try{t=fs.readFileSync(f,'utf8')}catch{continue;}
+    const sn=(t.match(/^\s*SN:\s*(\S+)/m)||[])[1]; if(!sn) continue;
+    const model=(t.match(/^\s*Model:\s*(\S+)/m)||[])[1]||'';
+    const date=(t.match(/^\s*Date:\s*(\d{2}-\d{2}-\d{4})/m)||[])[1]||'';
+    const station=(f.match(/STAGE[\\\/]([^\\\/]+)/)||[])[1]||'';
+    staged.push({ sn, dec: decode(sn), model, date, station, file: f });
+  }
+
+  // ---- which staged decoded serials are in DB ----
+  const decs=[...new Set(staged.map(s=>s.dec))]; const inDb=new Set();
+  for(let i=0;i!inDb.has(s.dec));
+
+  // ---- scan ALL .DAT sources: which serial tokens appear, strict vs letter-prefixed ----
+  let dats=[]; walk('C:/Shares/test/Ate/HISTLOGS', /\.dat$/i, dats);
+  let stations=[]; try{stations=fs.readdirSync('C:/Shares/test',{withFileTypes:true}).filter(d=>d.isDirectory()&&/^TS-\d+[LR]?$/i.test(d.name)).map(d=>d.name);}catch{}
+  for(const s of stations) walk(path.join('C:/Shares/test',s,'LOGS'), /\.dat$/i, dats);
+
+  const looseSet=new Set(); const letterSet=new Set(); let letterRecs=0; const letterModels=new Set();
+  let fi=0;
+  for(const f of dats){ fi++; if(fi%5000===0) console.log(' scan '+fi+'/'+dats.length);
+    let lines; try{lines=fs.readFileSync(f,'utf8').split('\n')}catch{continue;}
+    let lastModel='';
+    for(const l of lines){ const t=l.trim();
+      const mm=t.match(/^"([A-Z0-9][A-Z0-9 \-]*)"$/i); if(mm && !/PASS|FAIL/.test(t) && !t.includes(',')) { lastModel=mm[1].trim(); continue; }
+      const m=t.match(LOOSE); if(m){ const sn=m[1]; looseSet.add(sn);
+        if(/^[A-Za-z]\d/.test(sn) && !STRICT.test(t)){ letterSet.add(sn); letterRecs++; if(lastModel) letterModels.add(lastModel); } }
+    }
+  }
+
+  // ---- categorize the missing ----
+  const cat = { parserDrop: [], absent: [], decInDbButMiss: [] };
+  for(const s of missing){
+    if(letterSet.has(s.sn) || (/^[A-Za-z]\d/.test(s.sn) && looseSet.has(s.sn))) cat.parserDrop.push(s);
+    else if(!looseSet.has(s.sn) && !looseSet.has(s.dec)) cat.absent.push(s);
+    else cat.decInDbButMiss.push(s);
+  }
+  const by=(arr,k)=>{const m={};for(const x of arr){const v=(x[k]||'?');m[v]=(m[v]||0)+1;}return Object.entries(m).sort((a,b)=>b[1]-a[1]);};
+
+  // ---- full letter-prefixed population in .DAT and how much is absent from DB ----
+  const letterDecs=[...letterSet].map(decode); const letterInDb=new Set();
+  for(let i=0;i!letterInDb.has(d)).length;
+
+  const out=[]; const L=s=>{out.push(s);console.log(s);};
+  L('========== MISSING-UNITS ROOT CAUSE & SCOPE ==========');
+  L('Staged .TXT with SN : '+staged.length);
+  L('Staged units MISSING from DB : '+missing.length);
+  L('');
+  L('ROOT-CAUSE CATEGORIES (of the missing):');
+  L(' Parser-drop (encoded serial w/ leading letter present in .DAT, regex rejects): '+cat.parserDrop.length);
+  L(' Source .DAT has no record for this unit (data absent) : '+cat.absent.length);
+  L(' Other (decoded present, still missing - investigate) : '+cat.decInDbButMiss.length);
+  L('');
+  L('PARSER-DROP breakdown:'); // was a '+'/'?:' precedence slip that always printed an empty line
+  const lead=()=>{const m={};for(const x of cat.parserDrop){const c=x.sn[0].toUpperCase();m[c]=(m[c]||0)+1;}return Object.entries(m).sort((a,b)=>b[1]-a[1]);};
+  L(' by leading letter: '+lead().map(([c,n])=>c+'='+n).join(', '));
+  L(' by station : '+by(cat.parserDrop,'station').map(([c,n])=>c+'='+n).join(', '));
+  L(' by model (top 12): '+by(cat.parserDrop,'model').slice(0,12).map(([c,n])=>c+'='+n).join(', '));
+  L(' date range : '+(()=>{const key=d=>d.slice(6)+d.slice(0,2)+d.slice(3,5); /* MM-DD-YYYY -> YYYYMMDD so the sort is chronological */ const ds=cat.parserDrop.map(s=>s.date).filter(Boolean).sort((a,b)=>key(a).localeCompare(key(b)));return ds.length?ds[0]+' .. '+ds[ds.length-1]:'n/a';})());
+  L(' samples :'); cat.parserDrop.slice(0,12).forEach(s=>L(' '+s.sn+' -> '+s.dec+' '+s.model+' '+s.date+' '+s.station));
+  if(cat.absent.length){ L(''); L('DATA-ABSENT samples:'); cat.absent.slice(0,10).forEach(s=>L(' '+s.sn+' -> '+s.dec+' '+s.model+' '+s.date+' '+s.station)); }
+  if(cat.decInDbButMiss.length){ L(''); L('OTHER samples:'); cat.decInDbButMiss.slice(0,10).forEach(s=>L(' '+s.sn+' -> '+s.dec+' '+s.model+' '+s.date+' '+s.station)); }
+  L('');
+  L('FULL .DAT BLIND SPOT (all letter-prefixed serials the importer skips, not just staged):');
+  L(' distinct letter-prefixed serials in .DAT : '+letterSet.size);
+  L(' total letter-prefixed records (dropped) : '+letterRecs);
+  L(' distinct models affected : '+letterModels.size);
+  L(' of those serials, DECODED form absent from DB: '+letterMissingDistinct+' / '+letterSet.size);
+
+  if(process.argv[2]) fs.writeFileSync(process.argv[2], out.join('\n')+'\n');
+  await db.close();
+})().catch(e=>{console.error(e);process.exit(1);});