Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2, Logging/Audit/Observability, Multi-tenancy, Modular Architecture, Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred, no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dataforth — 2026-04-15 Session Log
Long session covering UI feature completion, DB cleanup, architectural refactor, bulk data sync, production incident, and sanity verification against Hoffman's API.
Major accomplishments
1. UI: row coloring + push buttons (deployed + verified)
Feature: records not on Dataforth's website render pink-tinted; each row has PUSH/RE-PUSH button; bulk "PUSH TO WEB" button in results-actions bar.
Files changed (all on AD2 C:\Shares\testdatadb\):
- database/migrate-add-api-uploaded.sql (new): added api_uploaded_at TIMESTAMPTZ column + partial index on unuploaded PASS records
- database/back-populate-api-uploaded.js (new): one-time back-population from server_inventory.txt
- database/upload-to-api.js (rewritten; see refactor below)
- routes/api.js: added POST /api/upload endpoint accepting {ids?, serialNumbers?, all_unuploaded?} body
- public/index.html: CSS tr.not-on-web pink tint, .action-link.push styling, pushOneToWebsite() and pushSelectedToWebsite() JS functions, conditional PUSH/RE-PUSH rendering, Website Status filter dropdown (Any/On Website/Not on Website)
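The row tint, button label, and Website Status filter all hinge on whether api_uploaded_at is set. A minimal sketch of that decision follows. The helper names rowPresentation and matchesWebsiteFilter are hypothetical and not taken from public/index.html, and the mapping of PUSH to unstamped rows and RE-PUSH to stamped rows is an assumption based on the description above.

```javascript
// Hypothetical helpers illustrating the UI rule; not the code in public/index.html.

// Decide the CSS class and button label for one result row.
// Assumption: a record counts as "on the website" when api_uploaded_at is set.
function rowPresentation(record) {
  const onWebsite = record.api_uploaded_at != null;
  return {
    rowClass: onWebsite ? '' : 'not-on-web', // pink tint for rows not on the website
    buttonLabel: onWebsite ? 'RE-PUSH' : 'PUSH',
  };
}

// Apply the Website Status dropdown (Any / On Website / Not on Website).
function matchesWebsiteFilter(record, filter) {
  const onWebsite = record.api_uploaded_at != null;
  if (filter === 'On Website') return onWebsite;
  if (filter === 'Not on Website') return !onWebsite;
  return true; // 'Any' or an unrecognized value shows everything
}
```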
2. Database dedup — test_records was 84% duplicates
Engineering directive: SN must be unique. Before: 2,889,243 rows. After: 469,009 rows.
Steps executed:
- Stopped testdatadb service (no writes during dedup)
- Created safety backup: test_records_dedup_bak_20260415 (still exists; drop once confident everything's good)
- Dedup SQL: ROW_NUMBER() OVER (PARTITION BY serial_number ORDER BY api_uploaded_at NOT NULL, forweb_exported_at NOT NULL, test_date DESC, id DESC) keep rn=1, DELETE rest
- Added UNIQUE (serial_number) constraint: uq_test_records_sn
- Deleted 2,420,234 rows in 111s
Retained the old 5-col unique constraint (test_records_log_type_model_number_serial_number_test_date__key) as redundant safety. No harm, minor write overhead. Can drop later.
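The survivor selection can be illustrated outside SQL. The sketch below mirrors the ROW_NUMBER ordering in plain JavaScript. It assumes that rows with an api_uploaded_at stamp are preferred, then rows with a forweb_exported_at stamp, then the newest test_date, then the highest id; the log's shorthand does not spell out the sort direction for the two stamp columns, so that part is an assumption. pickSurvivors and compareRank are hypothetical names.

```javascript
// Illustrative only: an in-memory version of "keep rn=1 per serial_number".

// Negative result means row a outranks row b.
function compareRank(a, b) {
  // Assumption: stamped rows win over unstamped rows.
  const uploaded = Number(b.api_uploaded_at != null) - Number(a.api_uploaded_at != null);
  if (uploaded !== 0) return uploaded;
  const exported = Number(b.forweb_exported_at != null) - Number(a.forweb_exported_at != null);
  if (exported !== 0) return exported;
  const byDate = Date.parse(b.test_date) - Date.parse(a.test_date); // newest first
  if (byDate !== 0) return byDate;
  return b.id - a.id; // highest id first
}

// Return one surviving row per serial number.
function pickSurvivors(rows) {
  const best = new Map();
  for (const row of rows) {
    const current = best.get(row.serial_number);
    if (current === undefined || compareRank(row, current) < 0) {
      best.set(row.serial_number, row);
    }
  }
  return [...best.values()];
}
```

Every row not returned by pickSurvivors corresponds to a row the DELETE would remove.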
3. import.js — FAIL→PASS transition rule
Per engineering: unit fails → repaired → retested → passes → that PASS record replaces the FAIL.
New ON CONFLICT logic in database/import.js insertBatch():
INSERT ... ON CONFLICT (serial_number) DO UPDATE SET
log_type=EXCLUDED.log_type, model_number=EXCLUDED.model_number,
test_date=EXCLUDED.test_date, test_station=EXCLUDED.test_station,
overall_result=EXCLUDED.overall_result, raw_data=EXCLUDED.raw_data,
source_file=EXCLUDED.source_file,
api_uploaded_at=NULL, forweb_exported_at=NULL
WHERE test_records.overall_result = 'FAIL'
OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)
Verified with 5 scenario tests:
- FAIL → PASS retest: row updates, api_uploaded_at cleared (forces re-push) ✓
- PASS → late FAIL: ignored (unit stays PASS) ✓
- PASS → newer PASS: updates ✓
- PASS → older PASS: ignored ✓
- FAIL re-imported: updates to newer data ✓
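The WHERE clause on the upsert can be restated as a pure predicate, which makes the five scenarios easy to check by hand. This is a sketch that mirrors the SQL condition, not the code in database/import.js. shouldReplace is a hypothetical name, and test_date is assumed to be an ISO date string.

```javascript
// Mirrors: WHERE test_records.overall_result = 'FAIL'
//             OR (EXCLUDED.overall_result = 'PASS'
//                 AND EXCLUDED.test_date > test_records.test_date)
function shouldReplace(existing, incoming) {
  // An existing FAIL is always overwritten by whatever arrives.
  if (existing.overall_result === 'FAIL') return true;
  // An existing PASS is only overwritten by a strictly newer PASS.
  return (
    incoming.overall_result === 'PASS' &&
    Date.parse(incoming.test_date) > Date.parse(existing.test_date)
  );
}

// The five scenarios from the log, as [existing, incoming, expectedUpdate].
const scenarios = [
  [{ overall_result: 'FAIL', test_date: '2026-01-01' }, { overall_result: 'PASS', test_date: '2026-01-10' }, true],
  [{ overall_result: 'PASS', test_date: '2026-01-01' }, { overall_result: 'FAIL', test_date: '2026-01-10' }, false],
  [{ overall_result: 'PASS', test_date: '2026-01-01' }, { overall_result: 'PASS', test_date: '2026-01-10' }, true],
  [{ overall_result: 'PASS', test_date: '2026-01-10' }, { overall_result: 'PASS', test_date: '2026-01-01' }, false],
  [{ overall_result: 'FAIL', test_date: '2026-01-01' }, { overall_result: 'FAIL', test_date: '2026-01-10' }, true],
];
```

Note that the first branch ignores dates entirely, so an existing FAIL is replaced even by an older incoming record.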
4. Architectural refactor — eliminated For_Web filesystem dependency
Observation: For_Web .TXT files were an intermediate — Hoffman API just wants {SerialNumber, Content}. Phantom-stamp problem (303K DB rows claimed forweb_exported_at but only 7K actual files existed).
Created database/render-datasheet.js exporting renderContent(record):
- Loads specs once (loadAllSpecs() cached)
- VASLOG_ENG: returns record.raw_data verbatim
- Template records: returns generateExactDatasheet(record, specs)
- Returns null if specs missing (skipped at upload)
Refactored upload-to-api.js:
- Queries full record columns (not just SN)
- Calls renderContent() inline, with no fs.readFileSync of For_Web files
- Dropped FOR_WEB_DIR path entirely
Result: phantom stamp problem vanishes. PUSH button works for any PASS record where specs exist.
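A rough sketch of the in-memory rendering flow described above. The real database/render-datasheet.js is not reproduced here. makeRenderer and toPayload are hypothetical names, the dependencies are injected so the example runs standalone, and two details are assumptions: that VASLOG_ENG is identified by record.log_type, and that "specs missing" means no entry for record.model_number in the loaded specs.

```javascript
// Sketch only; loadAllSpecs and generateExactDatasheet are passed in as stand-ins.
function makeRenderer({ loadAllSpecs, generateExactDatasheet }) {
  let specsCache = null; // specs are loaded once, on first use

  return function renderContent(record) {
    // Assumption: VASLOG_ENG records are recognized via log_type.
    if (record.log_type === 'VASLOG_ENG') return record.raw_data;

    if (specsCache === null) specsCache = loadAllSpecs();

    // Assumption: specs are keyed by model number; a missing entry means "skip".
    if (!specsCache[record.model_number]) return null;

    return generateExactDatasheet(record, specsCache);
  };
}

// Build the {SerialNumber, Content} body the log says the Hoffman API wants,
// or null when the record cannot be rendered.
function toPayload(record, renderContent) {
  const content = renderContent(record);
  if (content === null) return null;
  return { SerialNumber: record.serial_number, Content: content };
}
```

Because nothing is read from disk, whether a record can be pushed depends only on the row and the loaded specs, not on whether an export file happens to exist.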
5. Bulk push — 170,984 records created on Hoffman
Two runs combined:
- Run 1: 99,765 created (stalled after 250K iter due to missing retry logic on hung HTTP)
- Run 2: 71,219 created (with AbortController + per-page retry + skip-and-continue)
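The difference between the two runs is the handling of hung requests. A minimal sketch of the pattern named for Run 2 (AbortController timeout, per-page retry, skip-and-continue) is shown below. It is not the actual _bulk_push_all.js. withTimeout, pushPages, and the pushPage callback are hypothetical, and the timeout and attempt counts are placeholder values.

```javascript
// Run a task with a deadline. The task receives an AbortSignal and must honor it
// (for example by passing it to fetch(url, { signal })).
async function withTimeout(task, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Push pages one at a time; retry each page a few times, then skip it and move on
// so that a single hung request cannot stall the whole run.
async function pushPages(pages, pushPage, { timeoutMs = 30000, maxAttempts = 3 } = {}) {
  const skipped = [];
  let pushed = 0;
  for (const page of pages) {
    let ok = false;
    for (let attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
      try {
        await withTimeout((signal) => pushPage(page, signal), timeoutMs);
        ok = true;
      } catch (err) {
        // Timeout or HTTP failure: fall through to the next attempt.
      }
    }
    if (ok) pushed += 1;
    else skipped.push(page);
  }
  return { pushed, skipped };
}
```

Skipped pages are returned rather than thrown, so they can be logged and retried in a later pass.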
Final state:
- Local DB total: 469,009 unique SNs
- api_uploaded_at NOT NULL: 458,501
- Unpushable: 10,508 (7,905 missing specs + 2,426 Hoffman API errors + 177 FAIL)
6. Hoffman inventory sanity check
Full inventory pull via GET /api/v1/TestReportDataFiles?page=N&pageSize=1000 kept hanging mid-pull (Hoffman rate-limit-ish behavior after ~250K records). Killed after 300K.
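Sanity via statistical sampling instead (every sample matched expectations; with 100 hits out of 100, the 95% upper bound on the miss rate among stamped SNs is about 3%):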
- 100 random stamped SNs → 100 hit / 0 miss on Hoffman ✓
- 100 random unpushable PASS SNs → 0 hit / 100 miss ✓
- 50 random FAIL SNs → 4 hit / 46 miss (8% of FAILs have historical PASS on Hoffman — expected from FAIL→PASS retest workflow, benign)
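How much a clean 100-for-100 sample actually proves can be quantified. When a random sample of size n shows zero misses, the largest miss rate p still compatible with that outcome at a given confidence solves (1 - p)^n = 1 - confidence. The sketch below computes that bound; zeroMissUpperBound is a hypothetical helper, and the calculation assumes the SNs were drawn independently and uniformly at random.

```javascript
// Upper confidence bound on the true miss rate after observing zero misses
// in a random sample. Solves (1 - p)^n = 1 - confidence for p.
function zeroMissUpperBound(sampleSize, confidence = 0.95) {
  return 1 - Math.pow(1 - confidence, 1 / sampleSize);
}

// For the 100 stamped SNs sampled in this session the bound is just under 3%,
// in line with the "rule of three" approximation 3 / n.
const stampedBound = zeroMissUpperBound(100);
```

A tighter bound would need a larger sample; 1,000 clean hits would bring it down to roughly 0.3%.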
Hoffman inventory total: 661,367 records. Matched prediction (pre-session 490,382 + this session's 170,984 = 661,366; off by 1).
Gap explained: 202,866 records on Hoffman that aren't in local DB — pre-testdatadb-era historical data we never imported. Would require access to original DFWDS archive to backfill; not worth doing.
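The counts quoted in this log can be cross-checked with a few lines of arithmetic. The figures below are copied from the sections above. reconcile is a hypothetical helper, and the "Hoffman-only" figure assumes every locally stamped record is present on Hoffman.

```javascript
// Figures as reported in this session log.
const counts = {
  localTotal: 469009,
  stamped: 458501, // api_uploaded_at NOT NULL
  missingSpecs: 7905,
  apiErrors: 2426,
  failRecords: 177,
  hoffmanBefore: 490382,
  pushedThisSession: 170984,
  hoffmanTotal: 661367,
};

function reconcile(c) {
  const unpushable = c.missingSpecs + c.apiErrors + c.failRecords;
  return {
    unpushable,
    // Stamped plus unpushable should account for every local row.
    localBalanced: c.stamped + unpushable === c.localTotal,
    // Difference between the observed Hoffman total and the prediction.
    predictionDelta: c.hoffmanTotal - (c.hoffmanBefore + c.pushedThisSession),
    // Records on Hoffman with no stamped counterpart in the local DB.
    hoffmanOnly: c.hoffmanTotal - c.stamped,
  };
}

const summary = reconcile(counts);
```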
Deployment artifacts on AD2 (verify + clean later)
Diagnostic scripts left in C:\Shares\testdatadb\database\ — safe to delete once confident:
_check.js, _constr.js, _dedup.js, _dup.js, _find.js, _recent.js, _run_migration.js, _scope.js, _analyze_unpushed.js, _analyze2.js, _analyze3.js, _conflict_test.js, _sanity_check.js, _spec_probe.js, _probe_pages.js, _bulk_push_all.js, _pull_inventory.js, _api_probe.js, _render_test.js, _state.js, _stamp_check.js, _probe_record.js, _pull_stdout.txt, _pull_stderr.txt
Production files to keep:
- database/import.js (modified)
- database/upload-to-api.js (refactored)
- database/render-datasheet.js (new)
- database/migrate-add-api-uploaded.sql (applied)
- database/back-populate-api-uploaded.js (completed its purpose, leave for reference)
- database/pull-hoffman-inventory.js (left for future full-inventory pulls if needed)
- routes/api.js (modified)
- public/index.html (modified)
Plus .bak-YYYYMMDD-HHMMSS copies for every modified file per deploy.
Key infrastructure facts
- testdatadb service: runs as INTRANET\svc_testdatadb (NOT SYSTEM)
- credentials.json at C:\ProgramData\dataforth-uploader\credentials.json: had to grant svc_testdatadb Read + Traverse (was SYSTEM + Admins only; fixed 2026-04-15)
- For_Web path: C:\Shares\webshare\For_Web (local on AD2); X: drive mapping is user-mapped and invisible to services
- Service wrapper: C:\Shares\testdatadb\daemon\testdatadb.exe (WinSW)
- Logs: C:\Shares\testdatadb\logs\ (out.log, err.log, wrapper.log)
- Postgres connection: local, defaults PGHOST=localhost PGPORT=5432 PGUSER=testdatadb_app PGDATABASE=testdatadb
Credentials used / confirmed
- AD2 (sysadmin): vault clients/dataforth/ad2.sops.yaml → Paper123!@# (fixed earlier session; no more \!@# backslash hack needed)
- Hoffman API creds: C:\ProgramData\dataforth-uploader\credentials.json on AD2 (CF_TOKEN_URL, CF_API_BASE, CF_CLIENT_ID, CF_CLIENT_SECRET, CF_SCOPE)
- SOPS age key: %APPDATA%\sops\age\keys.txt as usual
Open items / next session candidates
- Drop test_records_dedup_bak_20260415 after another day or two of no regressions
- Drop redundant 5-col unique constraint test_records_log_type_model_number_serial_number_test_date__key if user wants
- Auto-retry/re-render for unpushable records: 7,905 records skipped due to missing specs. Adding specs for those 8B/5B/DSCA variants would unlock more web coverage.
- www.azcomputerguru.com Apache vhost — returns 404 despite root domain working. ServerAlias missing; defer to azcomputerguru.com project.
Bonus: production incident resolved same session
azcomputerguru.com went down mid-session (CF managed challenge served in place of content). Root cause: Imunify360 on IX (172.16.3.10) had blacklisted Jupiter's IP (172.16.3.20) 9+ days ago — detected cloudflared's relay pattern as bot-like. Jupiter's tunnel couldn't reach origin, CF substituted challenge page.
Fix:
- ipset del i360.ipv4.blacklist 172.16.3.20 (immediate unban)
- imunify360-agent ip-list local add --purpose white --full-access --comment "Jupiter cloudflared tunnel origin" 172.16.3.20 (permanent whitelist)
- Restarted cloudflared container on Jupiter
Site back within ~15 min of detection. All CF-fronted subdomains (rmm.azcomputerguru.com, rmm-api, etc.) sharing the same tunnel also recovered.
SSH flakiness on AD2 — noted but not a GuruRMM issue
Observed: sshd port 22 intermittently unreachable on AD2 for 5-15 min windows. Port 3000 (testdatadb), 3389 (RDP), 5985 (WinRM) stay reachable through same windows. sshd PID 4012 continuously running since 2026-04-11 22:09 — no crashes in event log. Likely a network-layer blip (firewall/AV scan briefly blocking port 22) rather than an actual service issue. Not caused by GuruRMM agent.