137 lines
8.4 KiB
Markdown
137 lines
8.4 KiB
Markdown
# Dataforth test-data-chain audit via RMM-spawned AD2 Claude + multi-AI verification
|
|
|
|
## User
|
|
- **User:** Mike Swanson (mike)
|
|
- **Machine:** GURU-5070
|
|
- **Role:** admin
|
|
|
|
## Session Summary
|
|
|
|
Ran a full audit of the Dataforth test-data chain in support of Syncro #32489 (DOS test
|
|
stations not pulling updated spec files). The pivotal enabler was proving a new capability:
|
|
**spawning a headless `claude -p` on the AD2 domain controller via its GuruRMM agent** rather
|
|
than the async git-sync handoff. AD2 (192.168.0.6) is isolated from the ACG coord API but its
|
|
RMM agent phones home, and it is online (agent `cfa93bb6-...`). A staged probe confirmed:
|
|
sysadmin is logged into the console (so `context: user_session` works and returns an elevated
|
|
token), Claude Code v2.1.181 is at `C:\Users\sysadmin\.local\bin\claude.exe`, node v20 is
|
|
system-wide, and the repo is at `C:\ClaudeTools`. Headless `claude -p` initially failed with
|
|
`Invalid API key` because a stale machine-level `ANTHROPIC_API_KEY` (108 chars) shadowed
|
|
sysadmin's good OAuth creds; unsetting it (`Remove-Item Env:\ANTHROPIC_API_KEY`) let it fall
|
|
back to `.credentials.json` and return `PROBE_OK`.
|
|
|
|
With the pattern proven, launched a strictly read-only autonomous Claude auditor on AD2,
|
|
detached (runner pid 4748) so it survived the RMM command-timeout window, writing to
|
|
`C:\Users\sysadmin\ad2-audit\FINDINGS.md` with a `DONE.txt` marker. A background monitor polled
|
|
the marker; the run completed in ~17 min (exit 0, 25 KB report). The report is excellent,
|
|
evidence-backed, and corrected several stale assumptions in `CLAUDE.dataforth.md` (dated
|
|
2026-03-29).
|
|
|
|
Key findings: **F1 (HIGH)** — deployed `NWTOC.BAT` v5.0 (`COMMON\ProdSW`, verified on AD2 and
|
|
NAS) copies only `*.BAT` and `*.EXE`, **zero `.DAT`** — the confirmed root cause of #32489; and
|
|
no NWTOC version ever distributed the shared `COMMON` masters, so v5.1 must ADD a DATA copy, not
|
|
restore one. **F2 (HIGH, new)** — `NWTOC.BAT` v5.0 and `CTONWTXT.BAT` v2.3 use `COPY /Y`, which
|
|
is not a valid MS-DOS 6.22 switch (introduced in MS-DOS 7.0/Win95); on true 6.22 this errors and
|
|
copies nothing, meaning NWTOC may be failing entirely. Stale-assumption corrections: datastore is
|
|
now **PostgreSQL 18** (SQLite is a 4.4 GB archive), the scheduled task runs
|
|
**`Sync-FromNAS-rsync.ps1`**, web delivery is a live HTTP API uploader (472,290 records flagged,
|
|
not the dead `For_Web`), and `CTONWTXT` IS invoked. Also F3 (stray `TS-21\ProdSW` file breaks
|
|
rsync push every run), F4 (server generates from a frozen 2026-03-27 specdata snapshot), F5
|
|
(plaintext creds in scripts).
|
|
|
|
Then ran the requested multi-AI cross-verification of the load-bearing DOS-6.22 claim. **Grok**
|
|
(verify mode) independently confirmed F2 in full: 6.22 `COPY` supports only `/A /B /V`; `/Y`
|
|
arrived in MS-DOS 7.0/Win95; `COPY /Y src dst` → `Invalid switch - /Y`, 0 files copied; plain
|
|
`COPY` overwrites silently on 6.22 (the correct form). **Gemini** was unavailable this session
|
|
(quota exhausted on `gemini-3.1-pro-preview` + an OAuth fallback error; logged). The pivotal
|
|
unresolved question is empirical and station-only: do the stations run genuine 6.22 (→ NWTOC has
|
|
been copying nothing since 2026-03-16) or MS-DOS 7.x (→ `/Y` is fine, F1 is the sole cause)?
|
|
Stations have no RMM agent (they are DOS), so `VER` + a `COPY /Y` test must be done on a station.
|
|
|
|
## Key Decisions
|
|
|
|
- Used RMM-spawned headless Claude on AD2 (not the sync handoff) for live ground truth, since
|
|
the committed docs were 3 months stale and AD2's RMM agent is reachable despite coord isolation.
|
|
- Ran the AD2 agent strictly READ-ONLY with an ironclad brief (no writes/git/state changes,
|
|
deliverable file only) and detached + polled, given it runs on a production domain controller.
|
|
- Cross-verified only the highest-value, easy-to-get-wrong claim (DOS-6.22 `COPY /Y`) with a
|
|
second vendor, rather than re-running a redundant Claude fan-out over an already-thorough report.
|
|
- Did NOT apply any fix or touch #32489 yet — the v5.1 design and severity both hinge on the
|
|
unresolved station DOS-version question.
|
|
|
|
## Problems Encountered
|
|
|
|
- Headless `claude -p` on AD2 returned `Invalid API key` — a stale machine `ANTHROPIC_API_KEY`
|
|
shadowed the OAuth creds. Fixed by unsetting the env var before invoking (OAuth fallback).
|
|
- RMM dispatch failed once with `/mingw64/bin/curl: Permission denied` (transient AV lock on the
|
|
Git-Bash curl); nothing dispatched. Retried using `/c/Windows/System32/curl.exe`.
|
|
- Gemini (`agy`) verify failed — `gemini-3.1-pro-preview` quota exhausted and the default-model
|
|
fallback errored on OAuth `_doSetupUser`. Logged; used Grok as the cross-vendor check.
|
|
|
|
## Configuration Changes
|
|
|
|
Created:
|
|
- `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md` — the full AD2 audit
|
|
report (pulled back via RMM, gzip+base64).
|
|
- `.claude/memory/reference_rmm_spawn_headless_claude.md` — the validated RMM-spawn-Claude
|
|
pattern + the ANTHROPIC_API_KEY-shadow gotcha.
|
|
- MEMORY.md index line for the above.
|
|
- On AD2 (NOT in repo): `C:\Users\sysadmin\ad2-audit\{brief.txt,run.ps1,run.log,FINDINGS.md,DONE.txt}`.
|
|
|
|
Modified:
|
|
- `errorlog.md` — logged the Gemini unavailability.
|
|
|
|
## Credentials & Secrets
|
|
|
|
- No new credentials created. The AD2 audit surfaced pre-existing plaintext creds in Dataforth
|
|
scripts (F5) — flagged for rotation + vaulting, NOT copied anywhere: rsync daemon
|
|
(`rsync`/`IQ2...19`) in `Sync-FromNAS-rsync.ps1`; NAS root SSH password in the dormant
|
|
`Sync-FromNAS.ps1`; Postgres `testdatadb_app` password in `testdatadb\database\db.js`.
|
|
- AD2 has a stale machine-level `ANTHROPIC_API_KEY` (invalid) that must be unset before running
|
|
headless Claude there; sysadmin's OAuth creds at `C:\Users\sysadmin\.claude\.credentials.json`
|
|
are the working auth.
|
|
|
|
## Infrastructure & Servers
|
|
|
|
- AD2: `192.168.0.6`, Dataforth DC, Windows Server 2019 (10.0.17763), RMM agent
|
|
`cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`, client "Dataforth Corp". claude.exe v2.1.181,
|
|
node v20.10.0, repo `C:\ClaudeTools` (branch ad2).
|
|
- NAS D2TESTNAS `192.168.0.9` (Linux/Samba), rsync daemon port 873 module `test` → `/data/test`.
|
|
- TestDataDB: PostgreSQL 18.3 service `postgresql-18` on AD2 (`::1:5432`), db `testdatadb`
|
|
(test_records 475,553; work_orders 34,149). App at `C:\Shares\testdatadb\`.
|
|
- Shares on AD2: `C:\Shares\test\` (COMMON\ProdSW deployed batch set; Ate\ProdSW\<type>DATA master
|
|
specs), `C:\Shares\webshare\` (Test_Datasheets active; For_Web dead since 2026-05-11).
|
|
- Engineering master `5BMAIN.DAT` 83,200 B mtime 2026-06-26 present on AD2 + NAS; server-side
|
|
`testdatadb\specdata\5BMAIN.DAT` is a frozen 2026-03-27 copy (F4).
|
|
|
|
## Commands & Outputs
|
|
|
|
- RMM dispatch to AD2 with `context:user_session`, `timeout_seconds` (not `timeout`), via
|
|
`/c/Windows/System32/curl.exe`. Headless launch: detached `Start-Process powershell -File
|
|
run.ps1 -WindowStyle Hidden`; runner unsets `ANTHROPIC_API_KEY`, `cd C:\ClaudeTools`, runs
|
|
`claude -p <brief> --permission-mode bypassPermissions --output-format text`.
|
|
- Probe result: `PROBE_OK` after unsetting the env key (exit 0, 13s).
|
|
- Audit result: `DONE exit=0`, FINDINGS.md 25,337 B, ~17 min.
|
|
- Grok verify verdict: claim correct; nitpicks = "Invalid switch" not "Invalid parameter", and the
|
|
error is visible (not literally "silent").
|
|
|
|
## Pending / Incomplete Tasks
|
|
|
|
- **PIVOTAL:** confirm station DOS version (`VER` + `COPY /Y NUL C:\TEST.TXT` on a station).
|
|
Determines whether F2 is catastrophic (6.22 → NWTOC copies nothing since 2026-03-16) or moot
|
|
(7.x). Stations have no RMM agent — reach one via NAS/console.
|
|
- Draft DOS-6.22-safe `NWTOC v5.1`: plain `COPY` (no `/Y`), one-way pull of master `.DAT`s into a
|
|
distinct local dir (avoids the cyclic-overwrite v5.0 guarded against), 6.22-valid `IF EXIST`/
|
|
`GOTO`. Grok-review before it touches a station. Correct on both 6.22 and 7.x.
|
|
- Update Syncro #32489 with the confirmed root cause (F1 + F2) and plan.
|
|
- Address F3 (remove stray `TS-21\ProdSW` file), F4 (feed specdata from masters), F5 (rotate/vault
|
|
creds) — recommendations only, nothing applied.
|
|
- Retry Gemini verification later for true two-vendor triangulation.
|
|
|
|
## Reference Information
|
|
|
|
- Audit report: `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md`.
|
|
- Syncro #32489 (id 113201089, Scheduled, due 2026-07-02); John Lehman contact 2851723; Dataforth
|
|
Corp customer 578095; appointment 5626864474.
|
|
- AD2 RMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`.
|
|
- Memory: `reference_rmm_spawn_headless_claude`, `gururmm-command-timeout-seconds`.
|