dataforth/dsca33-45: recover lost specs from Hoffman API (56/58 models)

The DSCA33/DSCA45 main spec files lost in the cryptolocker wipe are recoverable:
the original software published correct certs to the Hoffman product API before
the wipe and our null-skipping renderer never overwrote them. Mine per-model
Final-Test templates (names + specs + verbatim accuracy headers) straight from
those originals instead of requesting spec files from Dataforth/John.

- dsca33-45-templates.json: 56 models (DSCA33 34/35, DSCA45 22/23); only
  DSCA33-1948 + DSCA45-1746 (24 units) lack an original.
- mine-hoffman-dsca.py: the re-runnable miner.
- DSCA33-45-HOFFMAN-RECOVERY handoff for the AD2 session (incl. the gate:
  validate each render vs its Hoffman original before enabling live rendering).
- memories: Hoffman recovery (supersedes the spec-gap "need John" note) and the
  AD2 SSH MTU-blackhole root cause/fix; errorlog entries (syncro jq, ssh correction).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 09:08:06 -07:00
parent dcd3eda634
commit c5643ee419
7 changed files with 3956 additions and 0 deletions


@@ -0,0 +1,106 @@
# DSCA33 / DSCA45 — Recover the "lost" specs from Hoffman (Handoff to AD2)
**For:** the Claude session on AD2 (`C:\Shares\testdatadb`). **Ref:** Syncro #32441.

**Supersedes** the FIX2-5 handoff's *TODO 2 (request DSCA33/45 main spec files from John)*:
**we do not need John.** The original specs are recoverable from the Hoffman API.

---
## The finding (why this changes the plan)
The DSCA33/DSCA45 "main spec" records (DSCMAIN/DSCOUT with SENTYPE/MAXIN/input-type) were
lost in the cryptolocker wipe, so `render-datasheet.js` bails (null render) and the pipeline
**skips** those models. BUT the **original software published correct DSCA33/45 certs to the
Hoffman API before the wipe**, and our broken renderer never overwrote them (it skips null
renders). They are still there, pristine.
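
The skip behaviour above can be made concrete with a minimal Python sketch of one sync cycle. `sync_cycle`, `render` and `push` are hypothetical stand-ins for the real pipeline and `render-datasheet.js`; the only behaviour assumed is what this section states: a null render is skipped, and any non-null render is pushed and replaces what Hoffman already holds. That second branch is why the per-model validation gate in the Plan matters.

```python
from typing import Callable, Iterable, List, Optional


def sync_cycle(serials: Iterable[str],
               render: Callable[[str], Optional[str]],
               push: Callable[[str, str], None]) -> List[str]:
    """One hypothetical pipeline pass; returns the serials that were skipped."""
    skipped = []
    for serial in serials:
        cert = render(serial)
        if cert is None:
            # Renderer bailed (main spec lost): nothing is sent, so the
            # original cert already on Hoffman stays untouched.
            skipped.append(serial)
            continue
        # Any non-null render is pushed and replaces whatever Hoffman holds.
        push(serial, cert)
    return skipped
```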
Verified from GURU-5070 against the public API
(`GET https://www.dataforth.com/api/v1/TestReportDataFiles/{serial}`, OAuth client-creds,
vaulted `clients/dataforth/hoffman-product-api`):

| Family | Models | Mineable from Hoffman | Units already correct and live on Hoffman | No original anywhere |
|---|---|---|---|---|
| DSCA33 | 35 | **34** | 2,633 / 3,397 | **DSCA33-1948 (16 units)** |
| DSCA45 | 23 | **22** | 4,524 / 5,413 | **DSCA45-1746 (8 units)** |

So of the ~8,763 "blocked" certs: **~7,157 are already correct and live on the public site**
(no action needed), **~1,580 not-yet-uploaded units** just need rendering, and only
**24 units across 2 niche models** have no original to recover.

---
## The artifact (already built — use it, don't re-derive)
`projects/dataforth-dos/dsca33-45-templates.json`: **56 models**, mined from the Hoffman
originals with the same `===`-rule column-span extractor as STAGE 1. Schema is a **superset of
`dsca-templates.json`**:
```json
"DSCA45-05E": {
"accOut": "Output (mA)",
"accHeader": [
" Frequency Calculated Measured",
" (Hz) Output (mA) Output (mA)* Error (%) Status"
],
"rows": [ {"name":"Supply Current","spec":"< 105 mA"}, ... ],
"_srcSerial": "176326-2"
}
```
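
For readers who have not seen STAGE 1, here is how the `===`-rule column-span extraction produces `rows`: a hedged, condensed restatement of `extract()` in `mine-hoffman-dsca.py` (it omits the `Check List` stop condition and blank-line skipping). The certificate fragment is synthetic: the column widths and the measured value are made up for illustration. Only the column order (Parameter, Measured, Specification, Status) mirrors what the miner assumes.

```python
import re
from typing import Dict, List, Tuple


def col_spans(rule: str) -> List[Tuple[int, int]]:
    # Each run of '=' under the header marks one column's [start, end) span.
    return [(m.start(), m.end()) for m in re.finditer(r"=+", rule)]


def slice_rows(rule: str, body: List[str]) -> List[Dict[str, str]]:
    pc, mc, sc, stc = col_spans(rule)[:4]  # Parameter, Measured, Specification, Status
    rows = []
    for line in body:
        name = line[pc[0]:mc[0]].strip()   # Parameter start up to Measured start
        spec = line[sc[0]:stc[0]].strip()  # Specification start up to Status start
        if name or spec:                   # spec-less rows (e.g. Hi-Pot) are kept
            rows.append({"name": name, "spec": spec})
    return rows


# Synthetic fragment: widths and the measured value are illustrative only.
WIDTHS = (20, 12, 17, 6)
RULE = " ".join("=" * w for w in WIDTHS)


def fmt_row(*cells: str) -> str:
    return " ".join(f"{c:<{w}}" for c, w in zip(cells, WIDTHS))


BODY = [fmt_row("Supply Current", "98.2 mA", "< 105 mA", "PASS"),
        fmt_row("Hi-Pot", "", "", "PASS")]
print(slice_rows(RULE, BODY))
# -> [{'name': 'Supply Current', 'spec': '< 105 mA'}, {'name': 'Hi-Pot', 'spec': ''}]
```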
- `rows` = the Final-Test `Parameter | Specification` list, including spec-less rows such as
`240 VAC Withstand` and `Hi-Pot` and the section sub-heads `Zero-Crossing Input` / `TTL Input`.
These are **kept**, following the same skip-rule reconciliation as STAGE 2.
- `accHeader` = the **verbatim 2-line accuracy header** from the original. Use it: DSCA33/45
introduce header tokens the 92-model set never had (see Flags) as well as the frequency-input layout.
- `_srcSerial` = a known already-uploaded serial for that model → your validation oracle.
Spot-checked DSCA33-07C and DSCA45-05E row-for-row against their live Hoffman originals: exact.
Regenerate if needed: `python projects/dataforth-dos/tools/mine-hoffman-dsca.py <map.json> <out.json>`.
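
A quick sanity pass over the file before wiring it into STAGE 2 may be worthwhile. The following is a minimal sketch: `validate_templates` is a hypothetical helper (call it as `validate_templates(json.load(f))`), and its checks are inferred from what the miner can emit (`accOut` falls back to `"?"` and `accHeader` to `[]` when no accuracy header is found), not taken from any existing AD2 code.

```python
from typing import Any, Dict, List


def validate_templates(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every entry looks usable."""
    problems = []
    for model, tpl in sorted(data.items()):
        family = model.split("-")[0]
        if family not in ("DSCA33", "DSCA45"):
            problems.append(f"{model}: unexpected family {family!r}")
        if tpl.get("accOut") in (None, "", "?"):
            # the miner writes "?" when no Output (...) / Vout (...) token was found
            problems.append(f"{model}: accOut unresolved")
        hdr = tpl.get("accHeader")
        if not (isinstance(hdr, list) and len(hdr) == 2):
            problems.append(f"{model}: accHeader is not the 2-line verbatim header")
        rows = tpl.get("rows") or []
        if not rows or not all(isinstance(r, dict) and {"name", "spec"} <= set(r)
                               for r in rows):
            problems.append(f"{model}: rows missing or malformed")
        if not tpl.get("_srcSerial"):
            problems.append(f"{model}: no _srcSerial validation oracle")
    return problems
```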
---
## Flags (DSCA33/45 differ from the 92 you already did)
1. **New accOut tokens:** `Output (VDC)` and `Output (mADC)` (DSCA33 voltage/current DC outputs),
not just `Output (V)`/`Output (mA)`. Your accuracy block must emit the verbatim token.
2. **Model-specific accuracy input label:** DSCA33 uses `Vin (mVAC)` / `Vin (VAC)` / `Iin (AAC)` /
`Iin (mAAC)`. **Use the `accHeader` lines** rather than synthesizing from a (missing) spec field.
3. **DSCA45 is frequency-input:** two-line super-header `Frequency` / `(Hz)` and a frequency
sweep in the accuracy block — structurally unlike the voltage/current-input models. Confirm the
accuracy renderer reproduces it (the `accHeader` gives you the exact text).
4. Your **DSCA33/45 slotMaps are already derived** and come from the same original layout as these
rows, so order should align — verify during validation.
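
The three flags reduce to checks that can be run mechanically: take the header verbatim, confirm that the mined `accOut` token appears in its column line, and treat `Frequency` / `(Hz)` as the frequency-input marker. This is a hedged Python sketch. `accuracy_header_lines` and `is_frequency_input` are hypothetical names, and the real renderer is JavaScript, so read it as a specification of the checks rather than drop-in code.

```python
from typing import Dict, List


def accuracy_header_lines(tpl: Dict) -> List[str]:
    """Return the verbatim 2-line accuracy header, refusing to synthesize one (flag 2)."""
    hdr = tpl.get("accHeader") or []
    if len(hdr) != 2:
        raise ValueError("no verbatim accHeader mined for this model")
    # Flag 1: the accOut token (Output (V), (mA), (VDC), (mADC), ...) must appear verbatim.
    acc_out = tpl.get("accOut", "?")
    if acc_out not in hdr[1]:
        raise ValueError(f"accOut {acc_out!r} not found in {hdr[1]!r}")
    return hdr


def is_frequency_input(tpl: Dict) -> bool:
    """Flag 3: DSCA45-style models carry the Frequency / (Hz) super-header."""
    hdr = tpl.get("accHeader") or []
    if len(hdr) != 2:
        return False
    return "Frequency" in hdr[0] and "(Hz)" in hdr[1]
```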
---
## Plan
1. **Backup first** (fresh `pg_dump` + VSS) per FIX2-5 discipline. Save-state `datasheet-exact.js`.
2. **Load** `dsca33-45-templates.json` for `family in (DSCA33, DSCA45)`, same wiring as STAGE 2
(template `rows` drive names+specs; map raw_data STATUS groups positionally via the slotMaps;
QB `Math.fround` rounding; data-driven loadNote). For the accuracy block, drive the header from
`accHeader` / `accOut`.
3. **VALIDATE against Hoffman (the gate — stronger than STAGE 3):** for each model, render its
`_srcSerial` (an already-uploaded unit) and **content-normalized byte-compare against
`GET /api/v1/TestReportDataFiles/{_srcSerial}`**. Require a clean match **per model** before that
model is allowed to render/publish.
> **CRITICAL — do not enable live DSCA33/45 rendering until each model passes.** The moment the
> renderer returns non-null for DSCA33/45, the pipeline stops skipping them and will **re-push and
> UPDATE the ~7,157 already-correct originals** on the next cycle. That is only safe if the render
> byte-matches the original — which the per-model gate proves. A mismatched model would overwrite
> good customer certs. Gate hard.
4. **Publish the gap:** for validated models, render + `uploadBySerialNumbers` the **not-yet-uploaded**
units (~1,580; `api_uploaded_at IS NULL`). Already-uploaded units return `Unchanged` (idempotent).
5. **Leave blocked (24 units):** `DSCA33-1948` (16), `DSCA45-1746` (8) — no Hoffman original. Low
priority; only these would ever need John, and they look like one-off custom part numbers.
Commit to `ad2`; update #32441 (hidden notes). The remote operator sees it on sync.
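
Steps 3 and 4 can be sketched as follows. This is a minimal Python illustration, not the AD2 implementation. `render`, `fetch_original` and the unit dicts are hypothetical stand-ins; the original text would presumably be the `Content` field that the miner reads from `GET /api/v1/TestReportDataFiles/{serial}`. `normalize` is an assumption about what "content-normalized" means here (line endings unified, trailing whitespace and trailing blank lines dropped). Confirm the real normalization before relying on it.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def normalize(text: str) -> str:
    # ASSUMED normalization: CRLF -> LF, strip trailing whitespace per line,
    # drop trailing blank lines. Everything else must match byte for byte.
    lines = text.replace("\r\n", "\n").split("\n")
    return "\n".join(ln.rstrip() for ln in lines).rstrip("\n")


def gate(templates: Dict[str, Dict],
         render: Callable[[str, str], Optional[str]],
         fetch_original: Callable[[str], Optional[str]]) -> Tuple[List[str], List[str]]:
    """Step 3: a model passes only if its _srcSerial render matches the Hoffman original."""
    passed, failed = [], []
    for model, tpl in sorted(templates.items()):
        serial = tpl["_srcSerial"]
        mine, theirs = render(model, serial), fetch_original(serial)
        ok = (mine is not None and theirs is not None
              and normalize(mine).encode() == normalize(theirs).encode())
        (passed if ok else failed).append(model)
    return passed, failed


def units_to_publish(units: Iterable[Dict], passed: List[str]) -> List[str]:
    """Step 4: only validated models, only units not yet uploaded (api_uploaded_at IS NULL)."""
    allowed = set(passed)
    return [u["serial"] for u in units
            if u["model"] in allowed and u["api_uploaded_at"] is None]
```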
---
## Reference
- Templates: `projects/dataforth-dos/dsca33-45-templates.json` (56 models)
- Miner: `projects/dataforth-dos/tools/mine-hoffman-dsca.py`
- Hoffman API creds: vault `clients/dataforth/hoffman-product-api` (read = `GET .../TestReportDataFiles/{serial}`)
- Memory: `project_dsca33_45_spec_gap` (updated — resolved via Hoffman, not John)

File diff suppressed because it is too large


@@ -0,0 +1,123 @@
#!/usr/bin/env python3
"""
Mine per-model DSCA33/DSCA45 Final-Test templates from the ORIGINAL certs stored
on Dataforth's Hoffman API (the spec files lost in the cryptolocker event are
recoverable here because the original software published these before the wipe).
Input : a JSON list [{"m": model, "s": serial}, ...] of UPLOADED serials.
Output: dsca33-45-templates.json, a superset of dsca-templates.json:
          { model: { "accOut": "...", "accHeader": [line1, line2],
                     "rows": [ {"name","spec"}, ... ], "_srcSerial": "..." } }
        + a human report on stdout.
Same extraction as the STAGE-1 extractor: the '===' rule under the Final-Test
"Parameter ... Measured" header gives exact column spans; name = Parameter col,
spec = Specification col. Keeps the richest sheet (most rows) per model.
"""
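import json, os, re, sys, urllib.error, urllib.parse, urllib.request
TOKEN_URL = "https://login.dataforth.com/connect/token"
API_BASE = "https://www.dataforth.com"
CID, SCOPE = "dataforth.onprem.sync", "dataforth.web"
# The client secret is vaulted (clients/dataforth/hoffman-product-api); pass it in via
# the environment instead of committing it. HOFFMAN_CLIENT_SECRET is an assumed name.
CSEC = os.environ["HOFFMAN_CLIENT_SECRET"]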
def get_token():
    # OAuth2 client-credentials grant against the Dataforth identity server
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials", "client_id": CID,
        "client_secret": CSEC, "scope": SCOPE}).encode()
    req = urllib.request.Request(TOKEN_URL, body,
                                 {"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read())["access_token"]
def get_cert(serial, tok):
    """Fetch one published cert (JSON with a "Content" text field); None on 404."""
    url = f"{API_BASE}/api/v1/TestReportDataFiles/{urllib.parse.quote(serial)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {tok}"})
    try:
        with urllib.request.urlopen(req, timeout=30) as r:
            return json.loads(r.read())
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None
        raise
def col_spans(sep):
    # each run of '=' in the rule line is one column: (start, end) character offsets
    return [(m.start(), m.end()) for m in re.finditer(r"=+", sep)]
def extract(t):
    """Parse one cert's text into {accOut, rows, accHdr, accHeader}; None if no Final-Test table."""
    lines = t.replace("\r\n", "\n").split("\n")
    # accuracy-table column line: the one carrying both "Error (%)" and "Status"
    ahi = next((i for i, l in enumerate(lines)
                if "Error (%)" in l and "Status" in l), -1)
    acc_hdr = lines[ahi] if ahi >= 0 else ""
    # capture the verbatim 2-line accuracy header (super-header + column line) so
    # AD2 can reproduce the model-specific input label + VDC/mADC/Hz headers exactly
    acc_header = [lines[ahi - 1].rstrip(), lines[ahi].rstrip()] if ahi > 0 else []
    m = re.search(r"Output \([^)]*\)|Vout \([^)]*\)", acc_hdr)
    acc_out = m.group(0) if m else "?"
    # locate the Final-Test block and its "Parameter ... Measured" header row
    fi = next((i for i, l in enumerate(lines) if "FINAL TEST RESULTS" in l), -1)
    if fi < 0:
        return None
    hi = next((i for i in range(fi + 1, len(lines))
               if re.search(r"Parameter\s+Measured", lines[i])), -1)
    if hi < 0:
        return None
    # the '===' rule directly under the header gives the exact column spans
    sep = lines[hi + 1] if hi + 1 < len(lines) else ""
    if "=" not in sep:
        return None
    cols = col_spans(sep)
    if len(cols) < 4:
        return None
    pc, mc, sc, stc = cols[0], cols[1], cols[2], cols[3]  # Parameter, Measured, Specification, Status
    rows = []
    for i in range(hi + 2, len(lines)):
        l = lines[i]
        if re.search(r"Check List|^\s*_{5,}", l):  # end of the Final-Test table
            break
        if not l.strip():
            continue
        name = l[pc[0]:mc[0]].strip()
        spec = l[sc[0]:stc[0]].strip()
        if not name and not spec:
            continue
        rows.append({"name": name, "spec": spec})
    return {"accOut": acc_out, "rows": rows, "accHdr": acc_hdr.strip(),
            "accHeader": acc_header}
def main():
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} <map.json> <out.json>")
    with open(sys.argv[1]) as f:
        mp = json.load(f)
    outpath = sys.argv[2]
    tok = get_token()
    by_model = {}  # model -> best {accOut, rows, accHdr, accHeader, serial}
    meta = {}      # model -> diagnostics
    missing = []
    for row in mp:
        model, serial = row["m"], row["s"]
        cert = get_cert(serial, tok)
        if not cert or not cert.get("Content"):
            missing.append((model, serial))
            continue
        tpl = extract(cert["Content"])
        if not tpl:
            meta.setdefault(model, {}).setdefault("noextract", []).append(serial)
            continue
        # keep the richest sheet (most Final-Test rows) seen for each model
        cur = by_model.get(model)
        if not cur or len(tpl["rows"]) > len(cur["rows"]):
            tpl["serial"] = serial
            by_model[model] = tpl
    # build schema-compatible output
    out = {}
    for model in sorted(by_model):
        t = by_model[model]
        out[model] = {"accOut": t["accOut"], "accHeader": t["accHeader"],
                      "rows": t["rows"], "_srcSerial": t["serial"]}
    with open(outpath, "w") as f:
        json.dump(out, f, indent=0)
    # report
    fams = {}
    print(f"=== Mined {len(out)} models from Hoffman -> {outpath} ===\n")
    print(f"{'MODEL':<14} {'rows':>4} {'accOut':<16} src-serial accuracy-header")
    for model in sorted(out):
        t = by_model[model]
        fam = model.split("-")[0]
        fams[fam] = fams.get(fam, 0) + 1
        flag = " <-- LOW" if len(t["rows"]) < 3 else ""
        print(f"{model:<14} {len(t['rows']):>4} {t['accOut']:<16} {t['serial']:<11} {t['accHdr'][:60]}{flag}")
    print("\nper-family models mined:", dict(fams))
    distinct_accout = sorted(set(o["accOut"] for o in out.values()))
    print("distinct accOut tokens:", distinct_accout)
    if missing:
        print(f"\n[WARN] {len(missing)} serials returned 404 or no Content (not on Hoffman):",
              missing[:10], "..." if len(missing) > 10 else "")
    noextract = {m: d["noextract"] for m, d in meta.items() if "noextract" in d}
    if noextract:
        print("\n[WARN] certs with no extractable Final-Test table:", noextract)
    no_tpl = sorted({r["m"] for r in mp} - set(out))
    if no_tpl:
        print(f"\n[WARN] models with NO usable template ({len(no_tpl)}):", no_tpl)
if __name__ == "__main__":
    main()