Files
claudetools/clients/dataforth/session-logs/2026-07/2026-07-01-mike-dataforth-test-data-chain-audit.md
Mike Swanson 7f897ce93f sync: auto-sync from GURU-5070 at 2026-07-01 13:09:08
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-07-01 13:09:08
2026-07-01 13:10:01 -07:00

137 lines
8.4 KiB
Markdown

# Dataforth test-data-chain audit via RMM-spawned AD2 Claude + multi-AI verification
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Ran a full audit of the Dataforth test-data chain in support of Syncro #32489 (DOS test
stations not pulling updated spec files). The pivotal enabler was proving a new capability:
**spawning a headless `claude -p` on the AD2 domain controller via its GuruRMM agent** rather
than the async git-sync handoff. AD2 (192.168.0.6) is isolated from the ACG coord API but its
RMM agent phones home, and it is online (agent `cfa93bb6-...`). A staged probe confirmed:
sysadmin is logged into the console (so `context: user_session` works and returns an elevated
token), Claude Code v2.1.181 is at `C:\Users\sysadmin\.local\bin\claude.exe`, node v20 is
system-wide, and the repo is at `C:\ClaudeTools`. Headless `claude -p` initially failed with
`Invalid API key` because a stale machine-level `ANTHROPIC_API_KEY` (108 chars) shadowed
sysadmin's good OAuth creds; unsetting it (`Remove-Item Env:\ANTHROPIC_API_KEY`) let it fall
back to `.credentials.json` and return `PROBE_OK`.
With the pattern proven, launched a strictly read-only autonomous Claude auditor on AD2,
detached (runner pid 4748) so it survived the RMM command-timeout window, writing to
`C:\Users\sysadmin\ad2-audit\FINDINGS.md` with a `DONE.txt` marker. A background monitor polled
the marker; the run completed in ~17 min (exit 0, 25 KB report). The report is excellent,
evidence-backed, and corrected several stale assumptions in `CLAUDE.dataforth.md` (dated
2026-03-29).
Key findings: **F1 (HIGH)** — deployed `NWTOC.BAT` v5.0 (`COMMON\ProdSW`, verified on AD2 and
NAS) copies only `*.BAT` and `*.EXE`, **zero `.DAT`** — the confirmed root cause of #32489; and
no NWTOC version ever distributed the shared `COMMON` masters, so v5.1 must ADD a DATA copy, not
restore one. **F2 (HIGH, new)**`NWTOC.BAT` v5.0 and `CTONWTXT.BAT` v2.3 use `COPY /Y`, which
is not a valid MS-DOS 6.22 switch (introduced in MS-DOS 7.0/Win95); on true 6.22 this errors and
copies nothing, meaning NWTOC may be failing entirely. Stale-assumption corrections: datastore is
now **PostgreSQL 18** (SQLite is a 4.4 GB archive), the scheduled task runs
**`Sync-FromNAS-rsync.ps1`**, web delivery is a live HTTP API uploader (472,290 records flagged,
not the dead `For_Web`), and `CTONWTXT` IS invoked. Also F3 (stray `TS-21\ProdSW` file breaks
rsync push every run), F4 (server generates from a frozen 2026-03-27 specdata snapshot), F5
(plaintext creds in scripts).
Then ran the requested multi-AI cross-verification of the load-bearing DOS-6.22 claim. **Grok**
(verify mode) independently confirmed F2 in full: 6.22 `COPY` supports only `/A /B /V`; `/Y`
arrived in MS-DOS 7.0/Win95; `COPY /Y src dst``Invalid switch - /Y`, 0 files copied; plain
`COPY` overwrites silently on 6.22 (the correct form). **Gemini** was unavailable this session
(quota exhausted on `gemini-3.1-pro-preview` + an OAuth fallback error; logged). The pivotal
unresolved question is empirical and station-only: do the stations run genuine 6.22 (→ NWTOC has
been copying nothing since 2026-03-16) or MS-DOS 7.x (→ `/Y` is fine, F1 is the sole cause)?
Stations have no RMM agent (they are DOS), so `VER` + a `COPY /Y` test must be done on a station.
## Key Decisions
- Used RMM-spawned headless Claude on AD2 (not the sync handoff) for live ground truth, since
the committed docs were 3 months stale and AD2's RMM agent is reachable despite coord isolation.
- Ran the AD2 agent strictly READ-ONLY with an ironclad brief (no writes/git/state changes,
deliverable file only) and detached + polled, given it runs on a production domain controller.
- Cross-verified only the highest-value, easy-to-get-wrong claim (DOS-6.22 `COPY /Y`) with a
second vendor, rather than re-running a redundant Claude fan-out over an already-thorough report.
- Did NOT apply any fix or touch #32489 yet — the v5.1 design and severity both hinge on the
unresolved station DOS-version question.
## Problems Encountered
- Headless `claude -p` on AD2 returned `Invalid API key` — a stale machine `ANTHROPIC_API_KEY`
shadowed the OAuth creds. Fixed by unsetting the env var before invoking (OAuth fallback).
- RMM dispatch failed once with `/mingw64/bin/curl: Permission denied` (transient AV lock on the
Git-Bash curl); nothing dispatched. Retried using `/c/Windows/System32/curl.exe`.
- Gemini (`agy`) verify failed — `gemini-3.1-pro-preview` quota exhausted and the default-model
fallback errored on OAuth `_doSetupUser`. Logged; used Grok as the cross-vendor check.
## Configuration Changes
Created:
- `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md` — the full AD2 audit
report (pulled back via RMM, gzip+base64).
- `.claude/memory/reference_rmm_spawn_headless_claude.md` — the validated RMM-spawn-Claude
pattern + the ANTHROPIC_API_KEY-shadow gotcha.
- MEMORY.md index line for the above.
- On AD2 (NOT in repo): `C:\Users\sysadmin\ad2-audit\{brief.txt,run.ps1,run.log,FINDINGS.md,DONE.txt}`.
Modified:
- `errorlog.md` — logged the Gemini unavailability.
## Credentials & Secrets
- No new credentials created. The AD2 audit surfaced pre-existing plaintext creds in Dataforth
scripts (F5) — flagged for rotation + vaulting, NOT copied anywhere: rsync daemon
(`rsync`/`IQ2...19`) in `Sync-FromNAS-rsync.ps1`; NAS root SSH password in the dormant
`Sync-FromNAS.ps1`; Postgres `testdatadb_app` password in `testdatadb\database\db.js`.
- AD2 has a stale machine-level `ANTHROPIC_API_KEY` (invalid) that must be unset before running
headless Claude there; sysadmin's OAuth creds at `C:\Users\sysadmin\.claude\.credentials.json`
are the working auth.
## Infrastructure & Servers
- AD2: `192.168.0.6`, Dataforth DC, Windows Server 2019 (10.0.17763), RMM agent
`cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`, client "Dataforth Corp". claude.exe v2.1.181,
node v20.10.0, repo `C:\ClaudeTools` (branch ad2).
- NAS D2TESTNAS `192.168.0.9` (Linux/Samba), rsync daemon port 873 module `test``/data/test`.
- TestDataDB: PostgreSQL 18.3 service `postgresql-18` on AD2 (`::1:5432`), db `testdatadb`
(test_records 475,553; work_orders 34,149). App at `C:\Shares\testdatadb\`.
- Shares on AD2: `C:\Shares\test\` (COMMON\ProdSW deployed batch set; Ate\ProdSW\<type>DATA master
specs), `C:\Shares\webshare\` (Test_Datasheets active; For_Web dead since 2026-05-11).
- Engineering master `5BMAIN.DAT` 83,200 B mtime 2026-06-26 present on AD2 + NAS; server-side
`testdatadb\specdata\5BMAIN.DAT` is a frozen 2026-03-27 copy (F4).
## Commands & Outputs
- RMM dispatch to AD2 with `context:user_session`, `timeout_seconds` (not `timeout`), via
`/c/Windows/System32/curl.exe`. Headless launch: detached `Start-Process powershell -File
run.ps1 -WindowStyle Hidden`; runner unsets `ANTHROPIC_API_KEY`, `cd C:\ClaudeTools`, runs
`claude -p <brief> --permission-mode bypassPermissions --output-format text`.
- Probe result: `PROBE_OK` after unsetting the env key (exit 0, 13s).
- Audit result: `DONE exit=0`, FINDINGS.md 25,337 B, ~17 min.
- Grok verify verdict: claim correct; nitpicks = "Invalid switch" not "Invalid parameter", and the
error is visible (not literally "silent").
## Pending / Incomplete Tasks
- **PIVOTAL:** confirm station DOS version (`VER` + `COPY /Y NUL C:\TEST.TXT` on a station).
Determines whether F2 is catastrophic (6.22 → NWTOC copies nothing since 2026-03-16) or moot
(7.x). Stations have no RMM agent — reach one via NAS/console.
- Draft DOS-6.22-safe `NWTOC v5.1`: plain `COPY` (no `/Y`), one-way pull of master `.DAT`s into a
distinct local dir (avoids the cyclic-overwrite v5.0 guarded against), 6.22-valid `IF EXIST`/
`GOTO`. Grok-review before it touches a station. Correct on both 6.22 and 7.x.
- Update Syncro #32489 with the confirmed root cause (F1 + F2) and plan.
- Address F3 (remove stray `TS-21\ProdSW` file), F4 (feed specdata from masters), F5 (rotate/vault
creds) — recommendations only, nothing applied.
- Retry Gemini verification later for true two-vendor triangulation.
## Reference Information
- Audit report: `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md`.
- Syncro #32489 (id 113201089, Scheduled, due 2026-07-02); John Lehman contact 2851723; Dataforth
Corp customer 578095; appointment 5626864474.
- AD2 RMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`.
- Memory: `reference_rmm_spawn_headless_claude`, `gururmm-command-timeout-seconds`.