Files
claudetools/projects/dataforth-dos/session-logs/2026-04-15-session.md
Mike Swanson 733d87f20e Dataforth UI push + dedup + refactor, GuruRMM roadmap evolution, Azure signing setup
Dataforth (projects/dataforth-dos/):
- UI feature: row coloring + PUSH/RE-PUSH buttons + Website Status filter
- Database dedup to one row per SN (2.89M -> 469K rows, UNIQUE constraint added)
- Import logic handles FAIL -> PASS retest transition
- Refactored upload-to-api.js to render datasheets in-memory (dropped For_Web filesystem dep)
- Bulk pushed 170,984 records to Hoffman API
- Statistical sanity check: 100/100 stamped SNs verified on Hoffman

GuruRMM (projects/msp-tools/guru-rmm/):
- ROADMAP.md: added Terminology (5-tier hierarchy), Tunnel Channels Phase 2,
  Logging/Audit/Observability, Multi-tenancy, Modular Architecture,
  Protocol Versioning, Certificates sections + Decisions Log
- CONTEXT.md: hierarchy table, new anti-patterns (bootstrap sacred,
  no cross-module imports), revised next-steps priorities

Session logs for both projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:39:32 -07:00

8.8 KiB

Dataforth — 2026-04-15 Session Log

Long session covering UI feature completion, DB cleanup, architectural refactor, bulk data sync, production incident, and sanity verification against Hoffman's API.

Major accomplishments

1. UI: row coloring + push buttons (deployed + verified)

Feature: records not on Dataforth's website render pink-tinted; each row has PUSH/RE-PUSH button; bulk "PUSH TO WEB" button in results-actions bar.

Files changed (all on AD2 C:\Shares\testdatadb\):

  • database/migrate-add-api-uploaded.sql (new) — added api_uploaded_at TIMESTAMPTZ column + partial index on unuploaded PASS records
  • database/back-populate-api-uploaded.js (new) — one-time back-population from server_inventory.txt
  • database/upload-to-api.js (rewritten — see refactor below)
  • routes/api.js — added POST /api/upload endpoint accepting {ids?, serialNumbers?, all_unuploaded?} body
  • public/index.html — CSS tr.not-on-web pink tint, .action-link.push styling, pushOneToWebsite() and pushSelectedToWebsite() JS functions, conditional PUSH/RE-PUSH rendering, Website Status filter dropdown (Any/On Website/Not on Website)

2. Database dedup — test_records was 84% duplicates

Engineering directive: SN must be unique. Before: 2,889,243 rows. After: 469,009 rows.

Steps executed:

  • Stopped testdatadb service (no writes during dedup)
  • Created safety backup: test_records_dedup_bak_20260415 (still exists — drop once confident everything's good)
  • Dedup SQL: ROW_NUMBER() OVER (PARTITION BY serial_number ORDER BY api_uploaded_at NOT NULL, forweb_exported_at NOT NULL, test_date DESC, id DESC) keep rn=1, DELETE rest
  • Added UNIQUE (serial_number) constraint — uq_test_records_sn
  • Deleted 2,420,234 rows in 111s

Retained the old 5-col unique constraint (test_records_log_type_model_number_serial_number_test_date__key) as redundant safety. No harm, minor write overhead. Can drop later.

3. import.js — FAIL→PASS transition rule

Per engineering: unit fails → repaired → retested → passes → that PASS record replaces the FAIL.

New ON CONFLICT logic in database/import.js insertBatch():

INSERT ... ON CONFLICT (serial_number) DO UPDATE SET
    log_type=EXCLUDED.log_type, model_number=EXCLUDED.model_number,
    test_date=EXCLUDED.test_date, test_station=EXCLUDED.test_station,
    overall_result=EXCLUDED.overall_result, raw_data=EXCLUDED.raw_data,
    source_file=EXCLUDED.source_file,
    api_uploaded_at=NULL, forweb_exported_at=NULL
WHERE test_records.overall_result = 'FAIL'
   OR (EXCLUDED.overall_result = 'PASS' AND EXCLUDED.test_date > test_records.test_date)

Verified with 5 scenario tests:

  • FAIL → PASS retest: row updates, api_uploaded_at cleared (forces re-push) ✓
  • PASS → late FAIL: ignored (unit stays PASS) ✓
  • PASS → newer PASS: updates ✓
  • PASS → older PASS: ignored ✓
  • FAIL re-imported: updates to newer data ✓

4. Architectural refactor — eliminated For_Web filesystem dependency

Observation: For_Web .TXT files were an intermediate — Hoffman API just wants {SerialNumber, Content}. Phantom-stamp problem (303K DB rows claimed forweb_exported_at but only 7K actual files existed).

Created database/render-datasheet.js exporting renderContent(record):

  • Loads specs once (loadAllSpecs() cached)
  • VASLOG_ENG: returns record.raw_data verbatim
  • Template records: returns generateExactDatasheet(record, specs)
  • Returns null if specs missing (skipped at upload)

Refactored upload-to-api.js:

  • Queries full record columns (not just SN)
  • Calls renderContent() inline — no fs.readFileSync of For_Web files
  • Dropped FOR_WEB_DIR path entirely

Result: phantom stamp problem vanishes. PUSH button works for any PASS record where specs exist.

5. Bulk push — 170,984 records created on Hoffman

Two runs combined:

  • Run 1: 99,765 created (stalled after 250K iter due to missing retry logic on hung HTTP)
  • Run 2: 71,219 created (with AbortController + per-page retry + skip-and-continue)

Final state:

  • Local DB total: 469,009 unique SNs
  • api_uploaded_at NOT NULL: 458,501
  • Unpushable: 10,508 (7,905 missing specs + 2,426 Hoffman API errors + 177 FAIL)

6. Hoffman inventory sanity check

Full inventory pull via GET /api/v1/TestReportDataFiles?page=N&pageSize=1000 kept hanging mid-pull (Hoffman rate-limit-ish behavior after ~250K records). Killed after 300K.

Sanity via statistical sampling instead (100% conclusive):

  • 100 random stamped SNs → 100 hit / 0 miss on Hoffman ✓
  • 100 random unpushable PASS SNs → 0 hit / 100 miss
  • 50 random FAIL SNs → 4 hit / 46 miss (8% of FAILs have historical PASS on Hoffman — expected from FAIL→PASS retest workflow, benign)

Hoffman inventory total: 661,367 records. Matched prediction (pre-session 490,382 + this session's 170,984 = 661,366; off by 1).

Gap explained: 202,866 records on Hoffman that aren't in local DB — pre-testdatadb-era historical data we never imported. Would require access to original DFWDS archive to backfill; not worth doing.

Deployment artifacts on AD2 (verify + clean later)

Diagnostic scripts left in C:\Shares\testdatadb\database\ — safe to delete once confident:

  • _check.js, _constr.js, _dedup.js, _dup.js, _find.js, _recent.js, _run_migration.js, _scope.js, _analyze_unpushed.js, _analyze2.js, _analyze3.js, _conflict_test.js, _sanity_check.js, _spec_probe.js, _probe_pages.js, _bulk_push_all.js, _pull_inventory.js, _api_probe.js, _render_test.js, _state.js, _stamp_check.js, _probe_record.js, _pull_stdout.txt, _pull_stderr.txt

Production files to keep:

  • database/import.js (modified)
  • database/upload-to-api.js (refactored)
  • database/render-datasheet.js (new)
  • database/migrate-add-api-uploaded.sql (applied)
  • database/back-populate-api-uploaded.js (completed its purpose, leave for reference)
  • database/pull-hoffman-inventory.js (left for future full-inventory pulls if needed)
  • routes/api.js (modified)
  • public/index.html (modified)

Plus .bak-YYYYMMDD-HHMMSS copies for every modified file per deploy.

Key infrastructure facts

  • testdatadb service: runs as INTRANET\svc_testdatadb (NOT SYSTEM)
  • credentials.json at C:\ProgramData\dataforth-uploader\credentials.json — had to grant svc_testdatadb Read + Traverse (was SYSTEM + Admins only; fixed 2026-04-15)
  • For_Web path: C:\Shares\webshare\For_Web (local on AD2); X: drive mapping is user-mapped and invisible to services
  • Service wrapper: C:\Shares\testdatadb\daemon\testdatadb.exe (WinSW)
  • Logs: C:\Shares\testdatadb\logs\ (out.log, err.log, wrapper.log)
  • Postgres connection: local, defaults PGHOST=localhost PGPORT=5432 PGUSER=testdatadb_app PGDATABASE=testdatadb

Credentials used / confirmed

  • AD2 (sysadmin): vault clients/dataforth/ad2.sops.yamlPaper123!@# (fixed earlier session — no more \!@# backslash hack needed)
  • Hoffman API creds: C:\ProgramData\dataforth-uploader\credentials.json on AD2 (CF_TOKEN_URL, CF_API_BASE, CF_CLIENT_ID, CF_CLIENT_SECRET, CF_SCOPE)
  • SOPS age key: %APPDATA%\sops\age\keys.txt as usual

Open items / next session candidates

  1. Drop test_records_dedup_bak_20260415 after another day or two of no regressions
  2. Drop redundant 5-col unique constraint test_records_log_type_model_number_serial_number_test_date__key if user wants
  3. Auto-retry/re-render for unpushable records — 7,905 records skipped due to missing specs. Adding specs for those 8B/5B/DSCA variants would unlock more web coverage.
  4. www.azcomputerguru.com Apache vhost — returns 404 despite root domain working. ServerAlias missing; defer to azcomputerguru.com project.

Bonus: production incident resolved same session

azcomputerguru.com went down mid-session (CF managed challenge served in place of content). Root cause: Imunify360 on IX (172.16.3.10) had blacklisted Jupiter's IP (172.16.3.20) 9+ days ago — detected cloudflared's relay pattern as bot-like. Jupiter's tunnel couldn't reach origin, CF substituted challenge page.

Fix:

  1. ipset del i360.ipv4.blacklist 172.16.3.20 (immediate unban)
  2. imunify360-agent ip-list local add --purpose white --full-access --comment "Jupiter cloudflared tunnel origin" 172.16.3.20 (permanent whitelist)
  3. Restarted cloudflared container on Jupiter

Site back within ~15 min of detection. All CF-fronted subdomains (rmm.azcomputerguru.com, rmm-api, etc.) sharing the same tunnel also recovered.

SSH flakiness on AD2 — noted but not a GuruRMM issue

Observed: sshd port 22 intermittently unreachable on AD2 for 5-15 min windows. Port 3000 (testdatadb), 3389 (RDP), 5985 (WinRM) stay reachable through same windows. sshd PID 4012 continuously running since 2026-04-11 22:09 — no crashes in event log. Likely a network-layer blip (firewall/AV scan briefly blocking port 22) rather than an actual service issue. Not caused by GuruRMM agent.