dataforth/dsca33-45: recover lost specs from Hoffman API (56/58 models)
The DSCA33/DSCA45 main spec files lost in the cryptolocker wipe are recoverable: the original software published correct certs to the Hoffman product API before the wipe and our null-skipping renderer never overwrote them. Mine per-model Final-Test templates (names + specs + verbatim accuracy headers) straight from those originals instead of requesting spec files from Dataforth/John.

- dsca33-45-templates.json: 56 models (DSCA33 34/35, DSCA45 22/23); only DSCA33-1948 + DSCA45-1746 (24 units) lack an original.
- mine-hoffman-dsca.py: the re-runnable miner.
- DSCA33-45-HOFFMAN-RECOVERY handoff for the AD2 session (incl. the gate: validate each render vs its Hoffman original before enabling live rendering).
- memories: Hoffman recovery (supersedes the spec-gap "need John" note) and the AD2 SSH MTU-blackhole root cause/fix; errorlog entries (syncro jq, ssh correction).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -32,6 +32,8 @@
- [AAD Connect msDS-KeyCredentialLink writeback](reference_aadconnect_keycredlink_writeback.md) — "completed-export-errors" + 8344 INSUFF_ACCESS_RIGHTS on a protected admin account = WHfB key writeback blocked by AdminSDHolder. Diagnose with csexport /f:x; fix with dsacls WP;msDS-KeyCredentialLink on AdminSDHolder + SDProp.
- [UniFi Site Manager cloud API](reference_unifi_site_manager_api.md) — `api.ui.com` + `X-API-KEY` (vault `services/unifi-site-manager`) = remote access to the WHOLE ACG UniFi fleet (~36 consoles) outside UOS. Tier1 `/v1/hosts|sites|devices|isp-metrics` = inventory+health+WAN. Tier2 CONNECTOR `/v1/connector/consoles/{id}/proxy/network/api/s/default/stat/{device,sta}` = **full UOS parity** (per-radio cu_total airtime + per-client RSSI) for ANY console, remote. Backend `unifi-wifi/scripts/gw-sitemanager.sh` (`fleet|devices|sites|isp|net`). Standalone UDM WAN SSH usually firewalled; per-console SSH pw at `clients/<slug>/udm-ssh`.
- [reference_sqlx_migrations_immutable](reference_sqlx_migrations_immutable.md) -- NEVER edit an already-applied sqlx migration file — even a comment. sqlx::migrate! checksums each file at compile time and validates against _sqlx_migrations at startup; a changed checksum crash-loops the server with "migration N was previously applied but has been modified". Code review MUST flag any edit to an applied migration.
- [AD2 SSH MTU blackhole](ad2-ssh-mtu-blackhole.md) — AD2 SSH "lockouts"/mid-session read-errors over the Dataforth OpenVPN were a PMTU blackhole (tunnel PMTU ~1424 vs adapter MTU 1500), NOT a ban/account-lockout/flaky tunnel. Fix: pin the OpenVPN adapter MTU to 1400 (done on GURU-5070 via its SYSTEM RMM agent); permanent = `mssfix 1360` on the OpenVPN server. Diagnose over RMM, not SSH.
- [DSCA33/45 resolved via Hoffman](project_dsca33_45_resolved_via_hoffman.md) — The "lost" DSCA33/45 spec files are recoverable from the Hoffman API (original certs survived the wipe); do NOT ask John. 56/58 models mined into projects/dataforth-dos/dsca33-45-templates.json; only DSCA33-1948 + DSCA45-1746 (24 units) lack an original. AD2 handoff: DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md.

## Users
- [Howard Enos](user_howard.md) — Mike's brother, technician, full access. Machines: ACG-TECH03L, Howard-Home (authoritative in users.json).
40
.claude/memory/ad2-ssh-mtu-blackhole.md
Normal file
@@ -0,0 +1,40 @@
---
name: ad2-ssh-mtu-blackhole
description: AD2 SSH "lockouts"/mid-session timeouts over the Dataforth OpenVPN were an MTU/PMTU blackhole, not a ban/account-lockout/flaky tunnel; fix = pin the tunnel adapter MTU to 1400
metadata:
  type: project
---

AD2 (Dataforth, `192.168.0.6`) SSH from the fleet over OpenVPN (client subnet `192.168.6.x`)
intermittently looked "locked out": sessions **authenticated fine**, then died mid-session with
`Read error from remote host 192.168.6.2 ... Unknown error [postauth]` and
`ssh_dispatch_run_fatal: Connection from authenticating user sysadmin ... Connection timed out [preauth]`.
Small/interactive commands often worked; bulk reads + `scp` stalled.

**Root cause (diagnosed 2026-06-18 via RMM — SSH itself was the failing channel, so don't diagnose it over SSH):**
- NOT account lockout — Windows lockout threshold is 5/30min but **zero 4740 events**; `sysadmin` never locked.
- NOT an IP ban — **no IPBan/wail2ban/RdpGuard**, **0 inbound firewall block rules**.
- NOT auth — **every** `Accepted publickey for sysadmin` succeeded.
- NOT load — AD2 was CPU ~11%, 11.7 GB RAM free.
- It was a **PMTU blackhole.** OpenVPN tunnel path MTU is **~1424** (DF ping: wire 1424 passes,
  1428 drops). But GURU-5070's OpenVPN adapter (`Local Area Connection`, ifIndex 12, IP
  `192.168.6.2`) was set to **MTU 1500** → TCP negotiated MSS 1460 → full-size bulk/scp segments
  exceeded the tunnel and were **silently dropped (DF set)**, while sub-MTU interactive packets
  passed. That is why it presented as random "lockouts" that got worse with bulk transfer.

**Fix applied (2026-06-18):** `Set-NetIPInterface -InterfaceIndex 12 -AddressFamily IPv4 -NlMtuBytes 1400`
run via **GURU-5070's own RMM agent** (`819df0c8...`, runs as `nt authority\system` = elevated; the
elevated lever on the local box when you can't self-elevate from the Claude shell). Validated: a
**1.41 MB single-session SSH transfer to AD2 completed in 9s, no read error** (previously blackholed).
`~/.ssh/config` `ad2` block annotated + tightened keepalives (`ServerAliveInterval 15`,
`ServerAliveCountMax 4`, `ConnectTimeout 20`).

**Durability / permanent fix:** `Set-NetIPInterface` is registry-persistent, but **OpenVPN Connect may
reset the adapter MTU to 1500 on reconnect** — re-apply if SSH bulk transfers start stalling again
(check `Get-NetIPInterface -InterfaceIndex 12`). The real permanent fix is **server-side on the
Dataforth OpenVPN server: `mssfix 1360` (or `push "tun-mtu 1400"`)** so every fleet client clamps
automatically — `192.168.6.4` showed the identical symptom, so this is fleet-wide, not 5070-only.

Corrects the earlier wrong attribution ("flaky VPN tunnel" / "my rapid scp+ssh bursts triggering a
ban") — the tunnel is up and stable for small packets; only over-MSS segments were dropped. See
[[prefer-ssh-over-rmm]] (RMM-as-fallback guidance still holds; the *reason* was MTU, not a flaky VPN).
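
The MSS arithmetic behind this diagnosis can be checked with a small sketch. It assumes plain IPv4 and TCP headers of 20 bytes each with no options; the helper names `tcp_mss` and `fits_path` are illustrative and not part of any tool mentioned in this note.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment that an interface with this MTU advertises."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_path(adapter_mtu: int, path_mtu: int) -> bool:
    """True if a full-size segment sent from this adapter fits the path unfragmented."""
    full_packet = tcp_mss(adapter_mtu) + IPV4_HEADER + TCP_HEADER
    return full_packet <= path_mtu


PATH_MTU = 1424  # tunnel path MTU as measured in the note

for mtu in (1500, 1400):
    print(f"adapter MTU {mtu}: MSS {tcp_mss(mtu)}, fits tunnel: {fits_path(mtu, PATH_MTU)}")
```

With the adapter at 1500 the advertised MSS is 1460 and full-size packets exceed the 1424-byte path; at 1400 the MSS is 1360, which is the value the suggested server-side clamp uses.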
27
.claude/memory/project_dsca33_45_resolved_via_hoffman.md
Normal file
@@ -0,0 +1,27 @@
---
name: project_dsca33_45_resolved_via_hoffman
description: DSCA33/45 "lost spec files" are recoverable from the Hoffman API (original certs survived the wipe) — do NOT request spec files from John; mine templates from Hoffman instead
metadata:
  type: project
---

The DSCA33/DSCA45 main spec files lost in Dataforth's cryptolocker wipe (which blocked rendering
~8,763 certs and prompted "ask John for the spec files") are **recoverable** — the original software
published correct DSCA33/45 certs to the **Hoffman API** before the wipe, and our null-skipping
pipeline never overwrote them. **Do not ask John for spec files.** Supersedes the FIX2-5 handoff's
TODO 2 and the `ad2`-branch memory `project_dsca33_45_spec_gap` (which says "blocked, need John").

Mined **56 of 58 models** straight from Hoffman into `projects/dataforth-dos/dsca33-45-templates.json`
(per model: `accOut`, verbatim 2-line `accHeader`, Final-Test `rows` of name+spec, and a known-good
`_srcSerial`). Only 2 niche models have no original anywhere: **DSCA33-1948 (16u)**, **DSCA45-1746 (8u)**.
Coverage: ~7,157 units already correct + live on Hoffman (no action); ~1,580 not-yet-uploaded units
need rendering from the mined templates + AD2's already-derived slotMaps.

AD2 handoff + the critical gate: `projects/dataforth-dos/DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md`.
**Critical:** validate each model's render byte-for-byte against its Hoffman original BEFORE enabling
live DSCA33/45 rendering — once the renderer returns non-null, the pipeline stops skipping these and
will re-push/UPDATE the 7,157 good originals on the next cycle (safe only if the render matches).

Hoffman read API: `GET https://www.dataforth.com/api/v1/TestReportDataFiles/{serial}` (returns
`{SerialNumber,Content,CreatedAtUtc,UpdatedAtUtc}`); creds vault `clients/dataforth/hoffman-product-api`.
Miner: `projects/dataforth-dos/tools/mine-hoffman-dsca.py`. AD2 access notes: [[ad2-ssh-mtu-blackhole]].
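
A minimal sketch of a sanity check for one template entry, assuming only the per-model fields listed in this note (`accOut`, a 2-line `accHeader`, `rows` of name+spec, `_srcSerial`). The function name `template_problems` is hypothetical, and the sample entry is abbreviated to a single row for illustration.

```python
def template_problems(model: str, tpl: dict) -> list[str]:
    """Return human-readable problems for one template entry (empty list = looks sane)."""
    problems = []
    if not isinstance(tpl.get("accOut"), str) or not tpl["accOut"]:
        problems.append(f"{model}: accOut missing or empty")
    hdr = tpl.get("accHeader")
    if not (isinstance(hdr, list) and len(hdr) == 2
            and all(isinstance(h, str) for h in hdr)):
        problems.append(f"{model}: accHeader is not a 2-line list of strings")
    rows = tpl.get("rows")
    if not isinstance(rows, list) or not rows:
        problems.append(f"{model}: rows missing or empty")
    else:
        for i, row in enumerate(rows):
            if not isinstance(row, dict) or "name" not in row or "spec" not in row:
                problems.append(f"{model}: row {i} lacks name/spec")
    if not tpl.get("_srcSerial"):
        problems.append(f"{model}: _srcSerial missing (no validation oracle)")
    return problems


# Abbreviated sample entry; a real entry carries the full Final-Test row list.
sample = {
    "accOut": "Output (mA)",
    "accHeader": [" Frequency Calculated Measured",
                  " (Hz) Output (mA) Output (mA)* Error (%) Status"],
    "rows": [{"name": "Supply Current", "spec": "< 105 mA"}],
    "_srcSerial": "176326-2",
}
print(template_problems("DSCA45-05E", sample))
```

Running a check like this over every model before wiring the file into a renderer would catch a truncated or hand-edited entry early; it does not replace the byte-for-byte validation against the Hoffman original.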
@@ -21,6 +21,14 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·

2026-06-18 | Howard-Home | rmm | [friction] agent returns exit -1 'Failed to execute command' on a ~7KB multi-line powershell body sent as one command; split into <2KB section scripts and each ran fine [ctx: host=DESKTOP-TRCIEJA agent=0.6.66]

2026-06-18 | GURU-5070 | syncro | comment POST piped straight to jq failed with 'jq: parse error: Invalid numeric literal at line 1 col 10' and left it AMBIGUOUS whether the note posted (GET-verify showed it had NOT); per no-retry rule had to GET first, then re-post. Robust pattern that worked: jq -n payload to a file, POST with --data-binary @file, capture response to a file, then GET-verify by subject. Skill's curl|jq comment pattern should adopt this. [ctx: ticket=32441 skill=syncro pattern=curl-pipe-jq]

2026-06-18 | GURU-5070 | post-bot-alert | Discord POST failed (non-200/unreachable) [ctx: channel=#bot-alerts http=400 resp={"message": "The request body contains invalid JSON.", "code": 50109}]

2026-06-18 | GURU-5070 | ssh/ad2 | [correction] attributed AD2 SSH timeouts to a flaky VPN tunnel + my rapid scp/ssh bursts; real cause = OpenVPN adapter MTU 1500 vs tunnel PMTU ~1424 -> TCP MSS blackhole that drops bulk/scp segments (DF set) while small cmds pass. Fix: tunnel adapter MTU 1400 [ctx: ref=feedback_prefer_ssh_over_rmm]

2026-06-18 | GURU-5070 | bash/env | [friction] /tmp curl-write then Windows-python read mismatch; wrote .claude/tmp + absolute path fixed it [ctx: ref=feedback_tmp_path_windows]

2026-06-18 | Howard-Home | pfsense-ssh/logs | [friction] used clog on pfSense 25.07 logs (now plain-text ASCII) -> empty output -> wrongly concluded DHCP log was empty / dhcpd not serving; cost a hypothesis. Read pfSense 25.07 logs with tail/grep/cat directly, NOT clog [ctx: ref=reference_pfsense_25_07_ops client=cascades-tucson]

2026-06-17 | GURU-5070 | mailbox/365-mail | [correction] claimed in a prior session that /mailbox skill + memories were repointed off the deleted fabb3421 to the 365-mail suite, but mailbox.md still hardwired fabb3421 (token 401 AADSTS700016). Correct app is the dedicated ComputerGuru Mailbox app 1873b1b0 via get-token.sh 'mailbox' tier (cert auth); repointed mailbox.md + feedback_365_remediation_tool.md 2026-06-17. Lesson: verify the edit actually landed before reporting it done.
106
projects/dataforth-dos/DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md
Normal file
@@ -0,0 +1,106 @@
# DSCA33 / DSCA45 — Recover the "lost" specs from Hoffman (Handoff to AD2)

**For:** the Claude session on AD2 (`C:\Shares\testdatadb`). **Ref:** Syncro #32441.
**Supersedes** the FIX2-5 handoff's *TODO 2 (request DSCA33/45 main spec files from John)* —
**we do not need John.** The original specs are recoverable from the Hoffman API.

---

## The finding (why this changes the plan)

The DSCA33/DSCA45 "main spec" records (DSCMAIN/DSCOUT with SENTYPE/MAXIN/input-type) were
lost in the cryptolocker wipe, so `render-datasheet.js` bails (null render) and the pipeline
**skips** those models. BUT the **original software published correct DSCA33/45 certs to the
Hoffman API before the wipe**, and our broken renderer never overwrote them (it skips null
renders). They are still there, pristine.

Verified from GURU-5070 against the public API
(`GET https://www.dataforth.com/api/v1/TestReportDataFiles/{serial}`, OAuth client-creds,
vaulted `clients/dataforth/hoffman-product-api`):

| Family | Models | Mineable from Hoffman | Units already correct+live on Hoffman | No original anywhere |
|---|---|---|---|---|
| DSCA33 | 35 | **34** | 2,633 / 3,397 | **DSCA33-1948 (16 units)** |
| DSCA45 | 23 | **22** | 4,524 / 5,413 | **DSCA45-1746 (8 units)** |

So of the ~8,763 "blocked" certs: **~7,157 are already correct and live on the public site**
(no action needed), **~1,580 not-yet-uploaded units** just need rendering, and only
**24 units across 2 niche models** have no original to recover.

---

## The artifact (already built — use it, don't re-derive)

`projects/dataforth-dos/dsca33-45-templates.json` — **56 models**, mined from the Hoffman
originals with the same `===`-rule column-span extractor as STAGE 1. Schema is a **superset of
`dsca-templates.json`**:

```json
"DSCA45-05E": {
  "accOut": "Output (mA)",
  "accHeader": [
    " Frequency Calculated Measured",
    " (Hz) Output (mA) Output (mA)* Error (%) Status"
  ],
  "rows": [ {"name":"Supply Current","spec":"< 105 mA"}, ... ],
  "_srcSerial": "176326-2"
}
```

- `rows` = the Final-Test `Parameter | Specification` list (incl. the spec-less rows like
  `240 VAC Withstand`, `Hi-Pot`, and section sub-heads `Zero-Crossing Input` / `TTL Input` —
  **kept**, same skip-rule reconciliation as STAGE 2).
- `accHeader` = the **verbatim 2-line accuracy header** from the original. Use it — DSCA33/45
  introduce header tokens the 92-model set never had (see flags) and the frequency-input layout.
- `_srcSerial` = a known already-uploaded serial for that model → your validation oracle.

Spot-checked DSCA33-07C and DSCA45-05E row-for-row against their live Hoffman originals: exact.

Regenerate if needed: `python projects/dataforth-dos/tools/mine-hoffman-dsca.py <map.json> <out.json>`.

---

## Flags (DSCA33/45 differ from the 92 you already did)

1. **New accOut tokens:** `Output (VDC)` and `Output (mADC)` (DSCA33 current/voltage DC outputs),
   not just `Output (V)`/`Output (mA)`. Your accuracy-block must emit the verbatim token.
2. **Model-specific accuracy input label:** DSCA33 uses `Vin (mVAC)` / `Vin (VAC)` / `Iin (AAC)` /
   `Iin (mAAC)`. **Use the `accHeader` lines** rather than synthesizing from a (missing) spec field.
3. **DSCA45 is frequency-input:** two-line super-header `Frequency` / `(Hz)` and a frequency
   sweep in the accuracy block — structurally unlike the voltage/current-input models. Confirm the
   accuracy renderer reproduces it (the `accHeader` gives you the exact text).
4. Your **DSCA33/45 slotMaps are already derived** and come from the same original layout as these
   rows, so order should align — verify during validation.

---

## Plan

1. **Backup first** (fresh `pg_dump` + VSS) per FIX2-5 discipline. Save-state `datasheet-exact.js`.
2. **Load** `dsca33-45-templates.json` for `family in (DSCA33, DSCA45)`, same wiring as STAGE 2
   (template `rows` drive names+specs; map raw_data STATUS groups positionally via the slotMaps;
   QB `Math.fround` rounding; data-driven loadNote). For the accuracy block, drive the header from
   `accHeader` / `accOut`.
3. **VALIDATE against Hoffman (the gate — stronger than STAGE 3):** for each model, render its
   `_srcSerial` (an already-uploaded unit) and **content-normalized byte-compare against
   `GET /api/v1/TestReportDataFiles/{_srcSerial}`**. Require a clean match **per model** before that
   model is allowed to render/publish.
   > **CRITICAL — do not enable live DSCA33/45 rendering until each model passes.** The moment the
   > renderer returns non-null for DSCA33/45, the pipeline stops skipping them and will **re-push and
   > UPDATE the ~7,157 already-correct originals** on the next cycle. That is only safe if the render
   > byte-matches the original — which the per-model gate proves. A mismatched model would overwrite
   > good customer certs. Gate hard.
4. **Publish the gap:** for validated models, render + `uploadBySerialNumbers` the **not-yet-uploaded**
   units (~1,580; `api_uploaded_at IS NULL`). Already-uploaded units return `Unchanged` (idempotent).
5. **Leave blocked (24 units):** `DSCA33-1948` (16), `DSCA45-1746` (8) — no Hoffman original. Low
   priority; only these would ever need John, and they look like one-off custom part numbers.

Commit to `ad2`; update #32441 (hidden notes). The remote operator sees it on sync.

---

## Reference
- Templates: `projects/dataforth-dos/dsca33-45-templates.json` (56 models)
- Miner: `projects/dataforth-dos/tools/mine-hoffman-dsca.py`
- Hoffman API creds: vault `clients/dataforth/hoffman-product-api` (read = `GET .../TestReportDataFiles/{serial}`)
- Memory: `project_dsca33_45_spec_gap` (updated — resolved via Hoffman, not John)
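
The per-model gate in Plan step 3 could be sketched roughly as follows. This handoff does not define "content-normalized", so the normalization here (line endings unified, trailing whitespace and trailing blank lines dropped) is an assumption, and `normalize` and `gate` are hypothetical names rather than functions in the AD2 codebase.

```python
def normalize(text: str) -> bytes:
    """Canonicalize a cert body so only meaningful differences remain (assumed rules)."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    lines = [ln.rstrip() for ln in lines]   # ignore trailing whitespace per line
    while lines and not lines[-1]:          # ignore trailing blank lines
        lines.pop()
    return "\n".join(lines).encode("utf-8")


def gate(renders: dict[str, str], originals: dict[str, str]) -> dict[str, bool]:
    """Map each model to True only if its render byte-matches the Hoffman original."""
    return {
        model: model in originals and normalize(body) == normalize(originals[model])
        for model, body in renders.items()
    }


# Synthetic example bodies, not real cert content.
renders = {"MODEL-A": "Line 1\nLine 2\n", "MODEL-B": "Line 1\nLine X\n"}
originals = {"MODEL-A": "Line 1  \r\nLine 2\r\n\r\n", "MODEL-B": "Line 1\r\nLine 2\r\n"}
print(gate(renders, originals))
```

Only models mapped to `True` would be allowed to render or publish; a model with no original at all is treated as failing, which matches the "leave blocked" handling of the two models without a Hoffman cert.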
3650
projects/dataforth-dos/dsca33-45-templates.json
Normal file
File diff suppressed because it is too large
123
projects/dataforth-dos/tools/mine-hoffman-dsca.py
Normal file
@@ -0,0 +1,123 @@
#!/usr/bin/env python3
"""
Mine per-model DSCA33/DSCA45 Final-Test templates from the ORIGINAL certs stored
on Dataforth's Hoffman API (the spec files lost in the cryptolocker event are
recoverable here because the original software published these before the wipe).

Input : a JSON map [{"m": model, "s": serial}, ...] of UPLOADED serials.
Output: dsca33-45-templates.json (schema-compatible with dsca-templates.json:
        { model: { "accOut": "...", "accHeader": [...], "rows": [ {"name","spec"}, ... ], "_srcSerial": "..." } })
        + a human report on stdout.

Same extraction as the STAGE-1 extractor: the '===' rule under the Final-Test
"Parameter ... Measured" header gives exact column spans; name = Parameter col,
spec = Specification col. Keeps the richest sheet (most rows) per model.
"""
import json, os, re, sys, urllib.error, urllib.parse, urllib.request

TOKEN_URL = "https://login.dataforth.com/connect/token"
API_BASE = "https://www.dataforth.com"
CID, CSEC, SCOPE = "dataforth.onprem.sync", os.environ["HOFFMAN_CLIENT_SECRET"], "dataforth.web"  # secret from env (vault clients/dataforth/hoffman-product-api); the variable name is this script's own convention

def get_token():
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials", "client_id": CID,
        "client_secret": CSEC, "scope": SCOPE}).encode()
    req = urllib.request.Request(TOKEN_URL, body,
                                 {"Content-Type": "application/x-www-form-urlencoded"})
    return json.loads(urllib.request.urlopen(req, timeout=30).read())["access_token"]

def get_cert(serial, tok):
    url = f"{API_BASE}/api/v1/TestReportDataFiles/{urllib.parse.quote(serial)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {tok}"})
    try:
        with urllib.request.urlopen(req, timeout=30) as r:
            return json.loads(r.read())
    except urllib.error.HTTPError as e:
        if e.code == 404: return None
        raise

def col_spans(sep):
    return [(m.start(), m.end()) for m in re.finditer(r"=+", sep)]

def extract(t):
    lines = t.replace("\r\n", "\n").split("\n")
    ahi = next((i for i, l in enumerate(lines)
                if "Error (%)" in l and "Status" in l), -1)
    acc_hdr = lines[ahi] if ahi >= 0 else ""
    # capture the verbatim 2-line accuracy header (super-header + column line) so
    # AD2 can reproduce the model-specific input label + VDC/mADC/Hz headers exactly
    acc_header = [lines[ahi - 1].rstrip(), lines[ahi].rstrip()] if ahi > 0 else []
    m = re.search(r"Output \([^)]*\)|Vout \([^)]*\)", acc_hdr)
    acc_out = m.group(0) if m else "?"
    fi = next((i for i, l in enumerate(lines) if "FINAL TEST RESULTS" in l), -1)
    if fi < 0: return None
    hi = next((i for i in range(fi + 1, len(lines))
               if re.search(r"Parameter\s+Measured", lines[i])), -1)
    if hi < 0: return None
    sep = lines[hi + 1] if hi + 1 < len(lines) else ""
    if "=" not in sep: return None
    cols = col_spans(sep)
    if len(cols) < 4: return None
    pc, mc, sc, stc = cols[0], cols[1], cols[2], cols[3]
    rows = []
    for i in range(hi + 2, len(lines)):
        l = lines[i]
        if re.search(r"Check List|^\s*_{5,}", l): break
        if not l.strip(): continue
        name = l[pc[0]:mc[0]].strip()
        spec = l[sc[0]:stc[0]].strip()
        if not name and not spec: continue
        rows.append({"name": name, "spec": spec})
    return {"accOut": acc_out, "rows": rows, "accHdr": acc_hdr.strip(),
            "accHeader": acc_header}

def main():
    mp = json.load(open(sys.argv[1]))
    outpath = sys.argv[2]
    tok = get_token()
    by_model = {}  # model -> best {accOut, rows, accHdr, accHeader, serial}
    meta = {}  # model -> diagnostics
    missing = []
    for row in mp:
        model, serial = row["m"], row["s"]
        cert = get_cert(serial, tok)
        if not cert or not cert.get("Content"):
            missing.append((model, serial)); continue
        tpl = extract(cert["Content"])
        if not tpl:
            meta.setdefault(model, {}).setdefault("noextract", []).append(serial); continue
        cur = by_model.get(model)
        if not cur or len(tpl["rows"]) > len(cur["rows"]):
            tpl["serial"] = serial
            by_model[model] = tpl
    # build schema-compatible output
    out = {}
    for model in sorted(by_model):
        t = by_model[model]
        out[model] = {"accOut": t["accOut"], "accHeader": t["accHeader"],
                      "rows": t["rows"], "_srcSerial": t["serial"]}
    with open(outpath, "w") as f:
        json.dump(out, f, indent=0)
    # report
    fams = {}
    print(f"=== Mined {len(out)} models from Hoffman -> {outpath} ===\n")
    print(f"{'MODEL':<14} {'rows':>4} {'accOut':<16} src-serial accuracy-header")
    for model in sorted(out):
        t = by_model[model]
        fam = model.split("-")[0]
        fams[fam] = fams.get(fam, 0) + 1
        flag = " <-- LOW" if len(t["rows"]) < 3 else ""
        print(f"{model:<14} {len(t['rows']):>4} {t['accOut']:<16} {t['serial']:<11} {t['accHdr'][:60]}{flag}")
    print("\nper-family models mined:", dict(fams))
    distinct_accout = sorted(set(o["accOut"] for o in out.values()))
    print("distinct accOut tokens:", distinct_accout)
    if missing:
        print(f"\n[WARN] {len(missing)} serials returned 404 or empty Content (not on Hoffman):",
              missing[:10], "..." if len(missing) > 10 else "")
    no_tpl = [m for m in {r['m'] for r in mp} if m not in out]
    if no_tpl:
        print(f"\n[WARN] models with NO usable template ({len(no_tpl)}):", no_tpl)

if __name__ == "__main__":
    main()
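
To illustrate the `===`-rule column-span technique the miner relies on, here is a small standalone sketch on a synthetic sheet. The column widths and the measured value are made up for the example and do not come from a real Hoffman cert; only `col_spans` mirrors the miner's own helper.

```python
import re

WIDTHS = (18, 10, 15, 6)  # synthetic column widths, chosen only for this example


def col_spans(sep):
    """Return (start, end) offsets for each run of '=' in the separator rule."""
    return [(m.start(), m.end()) for m in re.finditer(r"=+", sep)]


def fmt(*cells):
    """Lay out cells in fixed-width columns separated by two spaces."""
    return "  ".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()


rule = fmt(*("=" * w for w in WIDTHS))
row = fmt("Supply Current", "98.2 mA", "< 105 mA", "PASS")  # synthetic data row

pc, mc, sc, stc = col_spans(rule)   # Parameter, Measured, Specification, Status
name = row[pc[0]:mc[0]].strip()     # Parameter column runs up to the Measured column
spec = row[sc[0]:stc[0]].strip()    # Specification column runs up to the Status column
print(name, "|", spec)
```

Slicing from one column's start to the next column's start, rather than to its own end, tolerates cell text that runs slightly past the rule, which is presumably why the miner slices the same way.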