dataforth/dsca33-45: recover lost specs from Hoffman API (56/58 models)

The DSCA33/DSCA45 main spec files lost in the cryptolocker wipe are recoverable: the original software published correct certs to the Hoffman product API before the wipe, and our null-skipping renderer never overwrote them. Mine per-model Final-Test templates (names + specs + verbatim accuracy headers) straight from those originals instead of requesting spec files from Dataforth/John.

- dsca33-45-templates.json: 56 models (DSCA33 34/35, DSCA45 22/23); only DSCA33-1948 + DSCA45-1746 (24 units) lack an original.
- mine-hoffman-dsca.py: the re-runnable miner.
- DSCA33-45-HOFFMAN-RECOVERY handoff for the AD2 session (incl. the gate: validate each render vs its Hoffman original before enabling live rendering).
- memories: Hoffman recovery (supersedes the spec-gap "need John" note) and the AD2 SSH MTU-blackhole root cause/fix; errorlog entries (syncro jq, ssh correction).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 09:08:06 -07:00
parent dcd3eda634
commit c5643ee419
7 changed files with 3956 additions and 0 deletions
--- a/projects/dataforth-dos/DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md
+++ b/projects/dataforth-dos/DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md
@@ -0,0 +1,106 @@
+# DSCA33 / DSCA45 — Recover the "lost" specs from Hoffman (Handoff to AD2)
+
+**For:** the Claude session on AD2 (`C:\Shares\testdatadb`). **Ref:** Syncro #32441.
+**Supersedes** the FIX2-5 handoff's *TODO 2 (request DSCA33/45 main spec files from John)* —
+**we do not need John.** The original specs are recoverable from the Hoffman API.
+
+---
+
+## The finding (why this changes the plan)
+
+The DSCA33/DSCA45 "main spec" records (DSCMAIN/DSCOUT with SENTYPE/MAXIN/input-type) were
+lost in the cryptolocker wipe, so `render-datasheet.js` bails out with a null render and the
+pipeline **skips** those models. BUT the **original software published correct DSCA33/45 certs
+to the Hoffman API before the wipe**, and our broken renderer never overwrote them (the pipeline
+skips null renders). They are still there, pristine.
+
+Verified from GURU-5070 against the public API
+(`GET https://www.dataforth.com/api/v1/TestReportDataFiles/{serial}`, OAuth client-creds,
+vaulted `clients/dataforth/hoffman-product-api`):
+
+| Family | Models | Models mineable from Hoffman | Units already correct and live on Hoffman (of total) | No original anywhere |
+|---|---|---|---|---|
+| DSCA33 | 35 | **34** | 2,633 / 3,397 | **DSCA33-1948 (16 units)** |
+| DSCA45 | 23 | **22** | 4,524 / 5,413 | **DSCA45-1746 (8 units)** |
+
+So of the ~8,763 "blocked" certs: **~7,157 are already correct and live on the public site**
+(no action needed), **~1,580 not-yet-uploaded units** just need rendering, and only
+**24 units across 2 niche models** have no original to recover.
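The three buckets reduce to a per-unit rule. A minimal sketch, assuming a hypothetical `Unit` record whose `api_uploaded_at` field mirrors the `api_uploaded_at IS NULL` test used in the plan (the real AD2 schema may differ; the serials below other than `176326-2` are invented):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Models with no Hoffman original (from the table above).
NO_ORIGINAL = {"DSCA33-1948", "DSCA45-1746"}


@dataclass
class Unit:
    """Hypothetical unit record; field names are assumptions."""
    model: str
    serial: str
    api_uploaded_at: Optional[str]  # None = never uploaded to Hoffman


def bucket(unit: Unit) -> str:
    """Classify one blocked DSCA33/45 unit into the three buckets above."""
    if unit.model in NO_ORIGINAL:
        return "no_original"    # nothing on Hoffman to recover
    if unit.api_uploaded_at is not None:
        return "already_live"   # original cert is live; no action needed
    return "needs_render"       # render from the mined template, then upload


units = [
    Unit("DSCA45-05E", "176326-2", "2025-01-10"),  # upload date invented
    Unit("DSCA45-05E", "000000-1", None),          # hypothetical serial
    Unit("DSCA33-1948", "000000-2", None),         # hypothetical serial
]
print(Counter(bucket(u) for u in units))
```

Only the `needs_render` bucket feeds step 4 of the plan; `already_live` units are the ones the gate protects.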
+
+---
+
+## The artifact (already built — use it, don't re-derive)
+
+`projects/dataforth-dos/dsca33-45-templates.json` — **56 models**, mined from the Hoffman
+originals with the same `===`-rule column-span extractor as STAGE 1. Schema is a **superset of
+`dsca-templates.json`**:
+
+```json
+"DSCA45-05E": {
+  "accOut": "Output (mA)",
+  "accHeader": [
+    "        Frequency    Calculated      Measured",
+    "           (Hz)      Output (mA)    Output (mA)*    Error (%)     Status"
+  ],
+  "rows": [ {"name":"Supply Current","spec":"<  105 mA"}, ... ],
+  "_srcSerial": "176326-2"
+}
+```
+
+- `rows` = the Final-Test `Parameter | Specification` list (incl. the spec-less rows like
+  `240 VAC Withstand`, `Hi-Pot`, and section sub-heads `Zero-Crossing Input` / `TTL Input` —
+  **kept**, same skip-rule reconciliation as STAGE 2).
+- `accHeader` = the **verbatim 2-line accuracy header** from the original. Use it — DSCA33/45
+  introduce header tokens the 92-model set never had (see flags) and the frequency-input layout.
+- `_srcSerial` = a known already-uploaded serial for that model → your validation oracle.
+
+Spot-checked DSCA33-07C and DSCA45-05E row-for-row against their live Hoffman originals: exact.
+
+Regenerate if needed: `python projects/dataforth-dos/tools/mine-hoffman-dsca.py <map.json> <out.json>`.
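The miner keeps the richest sheet (most rows) per model, so feeding it a few uploaded serials per model rather than one hedges against a sparse cert. A sketch of building `<map.json>`, assuming a hypothetical `(model, serial, api_uploaded_at)` row source and an arbitrary cap of 3 serials per model:

```python
import json
from collections import defaultdict

FAMILIES = ("DSCA33-", "DSCA45-")


def build_map(units, per_model=3):
    """Pick up to `per_model` already-uploaded serials per DSCA33/45 model.

    `units` is an iterable of (model, serial, api_uploaded_at) tuples, e.g. from a
    hypothetical DB query; only uploaded serials have an original on Hoffman.
    """
    picked = defaultdict(list)
    for model, serial, uploaded_at in units:
        if uploaded_at is None or not model.startswith(FAMILIES):
            continue
        if len(picked[model]) < per_model:
            picked[model].append(serial)
    # The miner's documented input format: [{"m": model, "s": serial}, ...]
    return [{"m": m, "s": s} for m in sorted(picked) for s in picked[m]]


def write_map(units, path, per_model=3):
    with open(path, "w") as f:
        json.dump(build_map(units, per_model), f, indent=1)
```

More serials per model means more API calls; the cap trades mining time against the chance of landing on the fullest sheet.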
+
+---
+
+## Flags (DSCA33/45 differ from the 92 you already did)
+
+1. **New accOut tokens:** `Output (VDC)` and `Output (mADC)` (DSCA33 DC voltage/current outputs),
+   not just `Output (V)`/`Output (mA)`. Your accuracy block must emit the verbatim token.
+2. **Model-specific accuracy input label:** DSCA33 uses `Vin (mVAC)` / `Vin (VAC)` / `Iin (AAC)` /
+   `Iin (mAAC)`. **Use the `accHeader` lines** rather than synthesizing from a (missing) spec field.
+3. **DSCA45 is frequency-input:** two-line super-header `Frequency` / `(Hz)` and a frequency
+   sweep in the accuracy block — structurally unlike the voltage/current-input models. Confirm the
+   accuracy renderer reproduces it (the `accHeader` gives you the exact text).
+4. Your **DSCA33/45 slotMaps are already derived** and come from the same original layout as these
+   rows, so order should align — verify during validation.
+
+---
+
+## Plan
+
+1. **Backup first** (fresh `pg_dump` + VSS) per FIX2-5 discipline. Save-state `datasheet-exact.js`.
+2. **Load** `dsca33-45-templates.json` for `family in (DSCA33, DSCA45)`, same wiring as STAGE 2
+   (template `rows` drive names+specs; map raw_data STATUS groups positionally via the slotMaps;
+   QB `Math.fround` rounding; data-driven loadNote). For the accuracy block, drive the header from
+   `accHeader` / `accOut`.
+3. **VALIDATE against Hoffman (the gate — stronger than STAGE 3):** for each model, render its
+   `_srcSerial` (an already-uploaded unit) and **content-normalized byte-compare against
+   `GET /api/v1/TestReportDataFiles/{_srcSerial}`**. Require a clean match **per model** before that
+   model is allowed to render/publish.
+   > **CRITICAL — do not enable live DSCA33/45 rendering until each model passes.** The moment the
+   > renderer returns non-null for DSCA33/45, the pipeline stops skipping them and will **re-push and
+   > UPDATE the ~7,157 already-correct originals** on the next cycle. That is only safe if the render
+   > byte-matches the original — which the per-model gate proves. A mismatched model would overwrite
+   > good customer certs. Gate hard.
+4. **Publish the gap:** for validated models, render + `uploadBySerialNumbers` the **not-yet-uploaded**
+   units (~1,580; `api_uploaded_at IS NULL`). Already-uploaded units return `Unchanged` (idempotent).
+5. **Leave blocked (24 units):** `DSCA33-1948` (16), `DSCA45-1746` (8) — no Hoffman original. Low
+   priority; only these would ever need John, and they look like one-off custom part numbers.
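The per-model gate in step 3 can be sketched as a normalize-then-compare over each model's `_srcSerial`. The normalization below (line endings, trailing whitespace, trailing blank lines) is an assumption; AD2's existing "content-normalized" comparison may define it differently, and the function names are hypothetical:

```python
import difflib


def normalize(text: str) -> list:
    """Assumed content normalization: unify line endings, drop trailing
    whitespace per line, and ignore trailing blank lines."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    while lines and not lines[-1]:
        lines.pop()
    return lines


def gate_model(rendered: str, original: str):
    """Return (passed, diff). `original` is the Hoffman cert's Content for the
    model's _srcSerial; `rendered` is our render of that same serial."""
    a, b = normalize(original), normalize(rendered)
    if a == b:
        return True, ""
    diff = difflib.unified_diff(a, b, "hoffman", "render", lineterm="", n=1)
    return False, "\n".join(diff)


def enabled_models(results: dict) -> set:
    """results: {model: (rendered_text, hoffman_text)}.
    Only models whose render matches the original may go live."""
    return {model for model, (rendered, original) in results.items()
            if gate_model(rendered, original)[0]}
```

Any model outside the enabled set should keep returning a null render, so the pipeline keeps skipping it and its already-live originals are never re-pushed.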
+
+Commit to `ad2`; update #32441 (hidden notes). The remote operator sees it on sync.
+
+---
+
+## Reference
+- Templates: `projects/dataforth-dos/dsca33-45-templates.json` (56 models)
+- Miner: `projects/dataforth-dos/tools/mine-hoffman-dsca.py`
+- Hoffman API creds: vault `clients/dataforth/hoffman-product-api` (read = `GET .../TestReportDataFiles/{serial}`)
+- Memory: `project_dsca33_45_spec_gap` (updated — resolved via Hoffman, not John)
--- a/projects/dataforth-dos/dsca33-45-templates.json
+++ b/projects/dataforth-dos/dsca33-45-templates.json
--- a/projects/dataforth-dos/tools/mine-hoffman-dsca.py
+++ b/projects/dataforth-dos/tools/mine-hoffman-dsca.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""
+Mine per-model DSCA33/DSCA45 Final-Test templates from the ORIGINAL certs stored
+on Dataforth's Hoffman API (the spec files lost in the cryptolocker event are
+recoverable here because the original software published these before the wipe).
+
+Input : a JSON map [{"m": model, "s": serial}, ...] of UPLOADED serials.
+Output: dsca33-45-templates.json, a superset of dsca-templates.json:
+        { model: { "accOut": "...", "accHeader": [line1, line2],
+                   "rows": [ {"name","spec"}, ... ], "_srcSerial": "..." } }
+        plus a human-readable report on stdout.
+
+Same extraction as the STAGE-1 extractor: the '===' rule under the Final-Test
+"Parameter ... Measured" header gives exact column spans; name = Parameter col,
+spec = Specification col. Keeps the richest sheet (most rows) per model.
+"""
+import json, os, re, sys, urllib.error, urllib.parse, urllib.request
+
+TOKEN_URL = "https://login.dataforth.com/connect/token"
+API_BASE  = "https://www.dataforth.com"
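+# Client credentials come from vault clients/dataforth/hoffman-product-api via the
+# environment (the variable names are illustrative); never hard-code the secret.
+CID   = os.environ.get("HOFFMAN_CLIENT_ID", "dataforth.onprem.sync")
+SCOPE = os.environ.get("HOFFMAN_SCOPE", "dataforth.web")
+
+def get_token():
+    csec = os.environ.get("HOFFMAN_CLIENT_SECRET")
+    if not csec:
+        sys.exit("set HOFFMAN_CLIENT_SECRET (vault: clients/dataforth/hoffman-product-api)")
+    body = urllib.parse.urlencode({
+        "grant_type": "client_credentials", "client_id": CID,
+        "client_secret": csec, "scope": SCOPE}).encode()
+    req = urllib.request.Request(TOKEN_URL, body,
+        {"Content-Type": "application/x-www-form-urlencoded"})
+    with urllib.request.urlopen(req, timeout=30) as r:
+        return json.loads(r.read())["access_token"]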
+
+def get_cert(serial, tok):
+    url = f"{API_BASE}/api/v1/TestReportDataFiles/{urllib.parse.quote(serial)}"
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {tok}"})
+    try:
+        with urllib.request.urlopen(req, timeout=30) as r:
+            return json.loads(r.read())
+    except urllib.error.HTTPError as e:
+        if e.code == 404: return None
+        raise
+
+def col_spans(sep):
+    return [(m.start(), m.end()) for m in re.finditer(r"=+", sep)]
+
+def extract(t):
+    lines = t.replace("\r\n", "\n").split("\n")
+    ahi = next((i for i, l in enumerate(lines)
+                if "Error (%)" in l and "Status" in l), -1)
+    acc_hdr = lines[ahi] if ahi >= 0 else ""
+    # capture the verbatim 2-line accuracy header (super-header + column line) so
+    # AD2 can reproduce the model-specific input label + VDC/mADC/Hz headers exactly
+    acc_header = [lines[ahi - 1].rstrip(), lines[ahi].rstrip()] if ahi > 0 else []
+    m = re.search(r"Output \([^)]*\)|Vout \([^)]*\)", acc_hdr)
+    acc_out = m.group(0) if m else "?"
+    fi = next((i for i, l in enumerate(lines) if "FINAL TEST RESULTS" in l), -1)
+    if fi < 0: return None
+    hi = next((i for i in range(fi + 1, len(lines))
+               if re.search(r"Parameter\s+Measured", lines[i])), -1)
+    if hi < 0: return None
+    sep = lines[hi + 1] if hi + 1 < len(lines) else ""
+    if "=" not in sep: return None
+    cols = col_spans(sep)
+    if len(cols) < 4: return None
+    pc, mc, sc, stc = cols[0], cols[1], cols[2], cols[3]
+    rows = []
+    for i in range(hi + 2, len(lines)):
+        l = lines[i]
+        if re.search(r"Check List|^\s*_{5,}", l): break
+        if not l.strip(): continue
+        name = l[pc[0]:mc[0]].strip()
+        spec = l[sc[0]:stc[0]].strip()
+        if not name and not spec: continue
+        rows.append({"name": name, "spec": spec})
+    return {"accOut": acc_out, "rows": rows, "accHdr": acc_hdr.strip(),
+            "accHeader": acc_header}
+
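+def main():
+    if len(sys.argv) != 3:
+        sys.exit("usage: mine-hoffman-dsca.py <map.json> <out.json>")
+    with open(sys.argv[1]) as f:
+        mp = json.load(f)
+    outpath = sys.argv[2]
+    tok = get_token()
+    by_model = {}     # model -> best {accOut, accHeader, rows, accHdr, serial}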
+    meta = {}         # model -> diagnostics
+    missing = []
+    for row in mp:
+        model, serial = row["m"], row["s"]
+        cert = get_cert(serial, tok)
+        if not cert or not cert.get("Content"):
+            missing.append((model, serial)); continue
+        tpl = extract(cert["Content"])
+        if not tpl:
+            meta.setdefault(model, {}).setdefault("noextract", []).append(serial); continue
+        cur = by_model.get(model)
+        if not cur or len(tpl["rows"]) > len(cur["rows"]):
+            tpl["serial"] = serial
+            by_model[model] = tpl
+    # build schema-compatible output
+    out = {}
+    for model in sorted(by_model):
+        t = by_model[model]
+        out[model] = {"accOut": t["accOut"], "accHeader": t["accHeader"],
+                      "rows": t["rows"], "_srcSerial": t["serial"]}
+    with open(outpath, "w") as f:
+        json.dump(out, f, indent=0)
+    # report
+    fams = {}
+    print(f"=== Mined {len(out)} models from Hoffman -> {outpath} ===\n")
+    print(f"{'MODEL':<14} {'rows':>4}  {'accOut':<16} src-serial   accuracy-header")
+    for model in sorted(out):
+        t = by_model[model]
+        fam = model.split("-")[0]
+        fams[fam] = fams.get(fam, 0) + 1
+        flag = "  <-- LOW" if len(t["rows"]) < 3 else ""
+        print(f"{model:<14} {len(t['rows']):>4}  {t['accOut']:<16} {t['serial']:<11}  {t['accHdr'][:60]}{flag}")
+    print("\nper-family models mined:", dict(fams))
+    distinct_accout = sorted(set(o["accOut"] for o in out.values()))
+    print("distinct accOut tokens:", distinct_accout)
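+    if missing:
+        print(f"\n[WARN] {len(missing)} serials had no cert on Hoffman (404 or empty Content):",
+              missing[:10], "..." if len(missing) > 10 else "")
+    noextract = {m: d["noextract"] for m, d in meta.items() if "noextract" in d}
+    if noextract:
+        print(f"\n[WARN] {len(noextract)} models had certs the extractor could not parse:", noextract)
+    no_tpl = sorted({r["m"] for r in mp} - set(out))
+    if no_tpl:
+        print(f"\n[WARN] models with NO usable template ({len(no_tpl)}):", no_tpl)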
+
+if __name__ == "__main__":
+    main()
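The `===`-rule in `extract()` is easiest to see on a toy sheet. A self-contained sketch that re-implements only its row-extraction core against a synthetic Final-Test block; the layout, column widths, and the measured value are invented, not taken from a real Hoffman cert:

```python
import re

WIDTHS = (19, 12, 17, 6)  # invented column widths for the synthetic sheet


def fmt(*cells):
    """Lay cells out at fixed column widths, like the fixed-width cert text."""
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()


SHEET = "\n".join([
    "FINAL TEST RESULTS",
    "",
    fmt("Parameter", "Measured", "Specification", "Status"),
    "".join("=" * (w - 2) + "  " for w in WIDTHS).rstrip(),  # the '===' rule
    fmt("Supply Current", "98 mA", "<  105 mA", "PASS"),
    fmt("Hi-Pot", "", "", "PASS"),                            # spec-less row, kept
    "",
    "Check List",
])


def col_spans(sep):
    """Each run of '=' marks one column's (start, end) character span."""
    return [(m.start(), m.end()) for m in re.finditer(r"=+", sep)]


def extract_rows(text):
    """Row-extraction core of extract(): name = Parameter..Measured span,
    spec = Specification..Status span."""
    lines = text.replace("\r\n", "\n").split("\n")
    fi = next((i for i, l in enumerate(lines) if "FINAL TEST RESULTS" in l), -1)
    if fi < 0:
        return None
    hi = next((i for i in range(fi + 1, len(lines))
               if re.search(r"Parameter\s+Measured", lines[i])), -1)
    if hi < 0 or hi + 1 >= len(lines):
        return None
    cols = col_spans(lines[hi + 1])
    if len(cols) < 4:
        return None
    (p, _), (m, _), (s, _), (st, _) = cols[:4]
    rows = []
    for l in lines[hi + 2:]:
        if re.search(r"Check List|^\s*_{5,}", l):
            break
        if not l.strip():
            continue
        name, spec = l[p:m].strip(), l[s:st].strip()
        if name or spec:
            rows.append({"name": name, "spec": spec})
    return rows


print(extract_rows(SHEET))
```

Slicing from one column's start to the next column's start (rather than to the rule's end) is what lets names and specs that overhang their `===` run still be captured whole.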