From af8a3de00e97d55fb31e125559db224d0657fa5d Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Wed, 1 Jul 2026 13:07:04 -0700
Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-07-01 13:06:10

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-07-01 13:06:10
---
 .claude/memory/MEMORY.md | 4 +-
 .../feedback_defender_claude_exclusions.md | 18 ++
 .../reference_rmm_spawn_headless_claude.md | 33 +++
 .../2026-07-01-test-data-chain-audit-AD2.md | 270 ++++++++++++++++++
 errorlog.md | 6 +
 5 files changed, 330 insertions(+), 1 deletion(-)
 create mode 100644 .claude/memory/feedback_defender_claude_exclusions.md
 create mode 100644 .claude/memory/reference_rmm_spawn_headless_claude.md
 create mode 100644 clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md

diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index 92d06e4e..14ba00af 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -34,6 +34,7 @@
 - [GuruRMM technical reference](reference_gururmm.md) — Server (172.16.3.30) layout + downloads dir `/var/www/gururmm/downloads` + `.channel` sidecar rollout control (stable/beta) + privileged server access via the server's OWN root RMM agent (hostname `gururmm`, no SSH needed; plink fallback) + API + `context=user_session` (WTS impersonation) + build-pipeline vendoring at `deploy/build-pipeline/` + Linux agent systemd sandbox trap.
 - [GuruRMM command timeout_seconds](reference_gururmm_command_timeout_seconds.md) — agent command dispatch honors `timeout_seconds`, NOT `timeout`; long jobs die ~300s / go zombie (`running`, empty stdout) otherwise. Cost Birth Biologic a full day.
 - [SharePoint Graph large-file upload](reference_sharepoint_graph_large_file_upload.md) — <4MB simple PUT, >=4MB MUST use chunked upload session (Content-Range); `\\?\` long paths; idempotent size-check; verify counts via /root/delta; single stream ~40Mbps (SPO throttle).
+- [RMM-spawn headless Claude](reference_rmm_spawn_headless_claude.md) — run `claude -p` on any RMM-managed Windows box with Claude Code (reaches coord-isolated sites like AD2); use `context:user_session`, UNSET the stale machine `ANTHROPIC_API_KEY` (shadows OAuth → "Invalid API key"), detach + poll a DONE marker. Validated on AD2 2026-07-01.
 - [RMM agent update model](rmm-agent-update-model.md) — Agent updates are server-PUSH on heartbeat (no self-poll); available versions = filesystem scan needing a `.sha256`; promote flips `.channel` sidecars beta→stable globally. Two stranders: beta-first freezes stable until an explicit promote; agents older than ~0.6.50 re-enroll with a NEW device_id/agent row when updated.
 - [GuruRMM physical server storage](gururmm-physical-server-storage.md) — New box 172.16.1.231 (temp IP→will be .30), Ubuntu 26.04, ssh key `gururmm-physical`/alias `gururmm-new`. SSD (915G root) = HOT (PG default tablespace + WAL + builds); HDD ext4 at `/data` = COLD (`gururmm_cold` PG tablespace for aged `agent_logs` partitions + downloads + backups + archive). The #3 retention answer.
 - [Trebesch DESKTOP-QNP3ON5 shell replacement](reference_trebesch_qnp3on5.md) — AT Trebesch box runs an Explorer shell replacement; explorer.exe owner check returns blank — use Win32_ComputerSystem.UserName. GuruRMM SWIFT-LION-2892.
@@ -168,6 +169,7 @@
 - [Dataforth history (2026-03-27 incident)](project_dataforth_history.md) — DF-JOEL2 compromise via ScreenConnect social-engineering, attacker C2 IPs + IC3 case + remediation log + MFA rollout origin story + Joel Lohr retirement. RESOLVED 2026-04-04.
 - [Radio show co-host — Tara, not Tom](radio_show_no_cohost_named_tom.md) — Co-host in 2014-s6e19 and 2016-s8e43 is Tara. "Tom" was hallucinated; rename complete.
 - [Proposal: centralize config in identity.json](proposal_identity_centralization.md) — Rationale for the identity.json machine-config centralization (claudetools_root, ollama/python); now implemented.
+- [Defender: exempt all Claude activity](feedback_defender_claude_exclusions.md) — Mike wants NOTHING Claude/ClaudeTools issues flagged by Defender; ClickFix threat IDs 2147939088/2147945138 (RMM curl dispatch AMSI FPs) set to Allow + broad process/path exclusions. ThreatID-Allow is the lever; exclusions alone don't stop AMSI CmdLine detections.
 - [ACG MSP tool stack](reference_acg_msp_stack.md) — ScreenConnect/CW Control, Splashtop, Syncro, Datto RMM, Datto EDR/AV, GuruRMM are ACG's OWN tools; do not flag as foreign/threat on managed machines (Defender-off is expected when Datto AV is active).
 - [VoIP vendor stack: PacketDial / OIT / NetSapiens / YMCS](reference_packetdial_oit_netsapiens.md) — PacketDial = ACG's VoIP-dept brand (pbx.packetdial.com, the `packetdial` skill); NetSapiens = the PBX platform (API v2); OIT/OITVOIP = white-label wholesaler running NetSapiens (api.ucaasnetwork.com); reseller `91912.service`. YMCS (Yealink) = phone device-mgmt, pairs with the PBX.
 - [ACG Website Hosting](project_azcomputerguru_hosting.md) — azcomputerguru.com is hosted on IX Web Hosting via cPanel.
@@ -200,4 +202,4 @@
 - [GuruScan verification IN TEST / paused](project_guruscan_in_test_paused.md) — multi-engine scanner verify on DESKTOP-MS42HNC paused 2026-06-22 (VM rebooted mid-Emsisoft run); HitmanPro done (36 removed), Emsisoft full-scan unverified; resume `guruscan-agent-test.sh DESKTOP-MS42HNC scan-one Emsisoft`; Defender RTP/Tamper still off on VM
 - [GuruRMM fleet dispatch-hang fix](project_gururmm_dispatch_hang_fix.md) — blocking send_to on a full bounded channel to one black-holed agent wedged ALL command dispatch; fixed with try_send (9dae20c, deployed); proper black-hole eviction still missing (was reverted in 80df458) — finish it if it recurs
 - [Windows won't-boot / offline DISM repair playbook](windows-offline-dism-repair-gotchas.md) — Automatic Repair loop = boot-critical fault (disk/registry/wedged update), NOT shell/appx store corruption (that's a symptom); `FaultyPackageInProgress` + 100s of Install/Uninstall-Pending packages = wedged CU -> RevertPendingActions or clean install. Offline DISM rejects `wim:` source (0x800f082e) -> MOUNT the wim, source `\Windows`. Ventoy breaks WIM mount (0xc1420134) -> use Rufus. 25H2(26200)=24H2(26100)+enablement, so match 26100 media. First hit: Four Paws AvImark #32447.
-- [Remediation-tool has full M365 access (incl. SharePoint)](reference_remediation_tool_365_access.md) — the app suite covers Graph/EXO/Defender/SharePoint; don't declare "no access" on an accessDenied. SharePoint app-only needs a CERT (secret = "Unsupported app only token"); use get-token.sh `sharepoint`/`sharepoint-admin` tiers + CSOM admin API (Graph /admin/sharepoint/settings scope not held). Full map: skill references/app-permissions-and-sharepoint.md.
+- [Remediation-tool has full M365 access (incl. SharePoint)](reference_remediation_tool_365_access.md) — the app suite covers Graph/EXO/Defender/SharePoint; don't declare "no access" on an accessDenied.
SharePoint app-only needs a CERT (secret = "Unsupported app only token"); use get-token.sh `sharepoint`/`sharepoint-admin` tiers + CSOM admin API (Graph /admin/sharepoint/settings scope not held). Full map: skill references/app-permissions-and-sharepoint.md.
diff --git a/.claude/memory/feedback_defender_claude_exclusions.md b/.claude/memory/feedback_defender_claude_exclusions.md
new file mode 100644
index 00000000..3204dd64
--- /dev/null
+++ b/.claude/memory/feedback_defender_claude_exclusions.md
@@ -0,0 +1,18 @@
+---
+name: feedback_defender_claude_exclusions
+description: Mike wants NOTHING Claude/ClaudeTools issues to be flagged by Windows Defender; keep broad exclusions + allow the ClickFix threat IDs that fire on RMM curl dispatch.
+metadata:
+  type: feedback
+---
+
+On his workstation (GURU-*), Mike wants **nothing Claude issues to be affected by Defender AV** — it's a constant irritation. The recurring hits are `Trojan:Win32/ClickFix.DBD!MTB` (ThreatID 2147939088) and `Trojan:Win32/ClickFix.ZF` (ThreatID 2147945138), fired by Defender's AMSI **command-line** scan on the base64-PowerShell payloads that `curl.exe` POSTs to the GuruRMM coordination API (`172.16.3.30:3001/api/agents/.../command`).
+
+**Why:** These are false positives on legitimate ClaudeTools/GuruRMM command dispatch. He's the admin/owner and made an informed call to allow the family.
+
+**How to apply:** Process/path exclusions alone do NOT stop these — AMSI CmdLine/behavioral detections ignore `ExclusionProcess`/`ExclusionPath`. The lever that works is `Add-MpPreference -ThreatIDDefaultAction_Ids -ThreatIDDefaultAction_Actions Allow` (Allow = action 6) for both IDs. Also maintained (elevated PowerShell):
+- ExclusionProcess: bash.exe, curl.exe, git.exe, node.exe, claude.exe
+- ExclusionPath: `C:\Program Files\Git` (+ mingw64\bin, usr\bin), `C:\Program Files\nodejs`, `C:\Users\\.claude`, `C:\Users\\.local\bin`, `C:\Users\\AppData\Roaming\npm`, `C:\ClaudeTools`, `D:\ClaudeTools`.
+
+**ACTIVE (2026-07-01):** Mike opted for the fully-blanket lever — `Set-MpPreference -DisableScriptScanning $true` is SET on this box, disabling Defender AMSI script scanning machine-wide (his call: "I'm not likely to fall for bogus scripts"). This alone stops the CmdLine detections regardless of variant ID; the ThreatID-Allows + exclusions remain as belt-and-suspenders. If ever re-enabling, `Set-MpPreference -DisableScriptScanning $false`.
+
+**Fleet application (2026-07-01):** `DisableScriptScanning` is **Tamper-Protection-gated** — it silently stays `False` if TP is on, even from SYSTEM. This workstation's TP is OFF (toggle worked); **GURU-BEAST-ROG's TP is ON**, so on Beast only the exclusions + ClickFix ThreatID-Allows applied via RMM (those aren't tamper-gated and DO cover the recurring detections) — the blanket script-scanning kill there needs a manual Windows Security UI toggle (TP can't be disabled by script). Beast (GURU-BEAST-ROG, AZ Computer Guru/Mike's House, RMM id 5233d75b-...) is "treated like this machine." Howard was OFFERED the same via Discord DM — his choice on his own box; do NOT push to Howard's machine without his ok. Related: [[reference_acg_msp_stack]] (ACG's own tools shouldn't be flagged as threats), [[feedback_windows_quote_stripping]].
diff --git a/.claude/memory/reference_rmm_spawn_headless_claude.md b/.claude/memory/reference_rmm_spawn_headless_claude.md
new file mode 100644
index 00000000..ab930c2c
--- /dev/null
+++ b/.claude/memory/reference_rmm_spawn_headless_claude.md
@@ -0,0 +1,33 @@
+---
+name: rmm-spawn-headless-claude
+description: Spawn a headless `claude -p` on any RMM-managed Windows box that has Claude Code installed — reaches isolated sites (AD2) the coord API can't
+metadata:
+  type: reference
+---
+
+Any RMM-managed Windows endpoint with Claude Code installed can run an autonomous headless
+Claude, dispatched via a GuruRMM command — even a site that's isolated from the ACG coord API.
+The RMM agent phones home outbound, so this works where [[ad2-comms-via-sync-only]] says coord
+can't reach (coord `:8001` blocked ≠ RMM `:3001` blocked). Validated 2026-07-01 on AD2
+(Dataforth DC, agent `cfa93bb6-...`, claude v2.1.181 at `C:\Users\sysadmin\.local\bin\claude.exe`).
+
+Recipe:
+- Dispatch with **`"context":"user_session"`** — needs an interactive logged-on user (check
+  `quser`); an admin session comes back elevated. `claude` is a per-user install, not on the
+  SYSTEM PATH, so SYSTEM context won't find it.
+- **GOTCHA: unset `ANTHROPIC_API_KEY` first.** A stale machine-level `ANTHROPIC_API_KEY` (108-char)
+  shadows the good OAuth creds and makes `claude -p` fail with `Invalid API key · Fix external API
+  key`. `Remove-Item Env:\ANTHROPIC_API_KEY` (+ `$env:ANTHROPIC_API_KEY=$null`) before invoking →
+  falls back to `~\.claude\.credentials.json` OAuth and authenticates.
+- **Detach + poll.** A real audit run takes many minutes; RMM caps command lifetime (see
+  [[gururmm-command-timeout-seconds]] — use `timeout_seconds`). Launch detached
+  (`Start-Process powershell -File runner.ps1 -WindowStyle Hidden`), have the runner write the
+  deliverable to a file + a `DONE.txt` marker, and poll the marker via short RMM commands.
+- Run headless as: `claude -p --permission-mode bypassPermissions --output-format text`.
+  For an audit, give an ironclad READ-ONLY brief (no writes/git/state changes) since
+  bypassPermissions lets it run any tool. Pass the brief via a base64'd file to dodge quoting.
+- Windows/Git-Bash: the mingw `curl` intermittently hits `Permission denied` (AV lock) —
+  use `/c/Windows/System32/curl.exe` for the dispatch. See [[feedback_windows_quote_stripping]].
+
+Use for: live audits/data-gathering on isolated or hard-to-reach managed boxes without the async
+sync-handoff. Keep it read-only on production (AD2 is a domain controller).
diff --git a/clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md b/clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md
new file mode 100644
index 00000000..a28c526a
--- /dev/null
+++ b/clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md
@@ -0,0 +1,270 @@
+# Dataforth Test-Data Chain — Ground-Truth Audit (AD2)
+
+**Auditor:** Autonomous read-only agent (Claude), launched via RMM
+**Host:** AD2 / `192.168.0.6` — Windows Server 2019 Standard 10.0.17763 (booted 2026-06-18)
+**Audit window:** 2026-07-01 ~16:32–16:55 UTC
+**Mode:** STRICTLY READ-ONLY. No file was created, modified, moved, renamed, or deleted except this deliverable (`C:\Users\sysadmin\ad2-audit\FINDINGS.md`). No git/service/task/DB writes were performed. The Postgres DB was queried inside `BEGIN TRANSACTION READ ONLY`; the NAS was accessed via read-only `rsync --list-only` and `ssh` only.
+
+> **Reading guide.** The supplied briefing was explicitly ~3 months stale ("treat as hints only"). Several premises are now wrong on the ground (SQLite → PostgreSQL; drive letters; which sync script runs; which scripts are deployed vs abandoned). Each such correction is called out. Every material claim cites live evidence (file + size + mtime, log line, DB count, or NAS listing) so other models can re-verify.
+
+---
+
+## 1. Executive summary
+
+The **outbound** test-results pipeline (DOS stations → NAS → AD2 → database → web delivery) is **operational and current** — data landed in the database as recently as **2026-07-01 13:41 UTC** (same day as this audit). The **inbound** spec/software-distribution pipeline is **partially broken**: the master specification `.DAT` files are **not** distributed to the DOS stations, which is a precise, reproducible root cause for Syncro #32489. A single stray file (`TS-21\ProdSW`) makes the 15-minute sync report `ERRORS` on **every** run, masking real failures.
The primary datastore has been migrated from SQLite to **PostgreSQL 18** and the web-delivery mechanism from the broken ASP.NET/`For_Web` path to an **HTTP API uploader**; neither change is reflected in the briefing.
+
+### Severity-ranked findings
+
+| # | Sev | Finding | Status |
+|---|-----|---------|--------|
+| F1 | **HIGH** | NWTOC.BAT v5.0 distributes **no** master spec `.DAT` files to stations (by design: "removed DATA folder copies"). Root cause of Syncro #32489. | Confirmed |
+| F2 | **HIGH** | Deployed `NWTOC.BAT` v5.0 and `CTONWTXT.BAT` v2.3 use `COPY /Y`, **not a native MS-DOS 6.22 switch**. If stations run true 6.22, the station *downloader* silently fails entirely. | Confirmed in code; **needs station verification** |
+| F3 | **MED** | `C:\Shares\test\TS-21\ProdSW` is a **file, not a directory** → rsync push fails (`exit 3`) on every 15-min run; sync status is permanently `ERRORS`. | Confirmed |
+| F4 | **MED** | Server-side datasheet generation reads specs from `testdatadb\specdata\` — a **frozen 2026-03-27 snapshot**, not the live engineering masters (e.g. `5BMAIN.DAT` updated 2026-06-26). | Confirmed |
+| F5 | **MED** | Plaintext credentials (NAS root SSH password, rsync daemon password, Postgres app password) hard-coded in scripts under `C:\Shares\test\scripts\` and `...\testdatadb\database\db.js`. | Confirmed |
+| F6 | LOW | `For_Web` web-delivery output is **dead since 2026-05-11**; superseded by the API uploader. Legacy ASP.NET path effectively retired but not documented as such. | Confirmed |
+| F7 | LOW | Duplicate/abandoned batch drafts in `C:\Shares\test\` root use NT-only constructs (`FOR /F`, `SET /A`, `CALL :label`, `%~dpnF`) that would fail on 6.22. Not in the boot path, but a confusion/cleanup hazard. | Confirmed |
+| F8 | LOW | NAS share littered with junk/stray entries (`.DS_Store`, `NUL)`, files named `BAT`/`C`, case-dup `5b39data.dat`/`5B39DATA.DAT`, old v1.0 scripts at share root).
| Confirmed |
+| F9 | INFO | SSH key auth to NAS `root@192.168.0.9` is **not configured/broken** (publickey denied); the production sync does not use it (uses rsync daemon), so impact is nil today. | Confirmed |
+
+---
+
+## 2. As-verified chain topology (corrections in **bold**)
+
+**NAS = `D2TESTNAS` = `192.168.0.9`** (a Linux box; `uname` blocked by SSH auth but rsync daemon + DOS AUTOEXEC confirm identity). Serves SMB shares `\\D2TESTNAS\test` (→ Linux `/data/test`) and `\\D2TESTNAS\datasheets`, plus an **rsync daemon on port 873, module `test` → `/data/test`**.
+
+Stations (DOS 6.22, QuickBASIC ATE — `BRUN45.EXE`, `MENUX.EXE` present on NAS) map `T: = \\D2TESTNAS\test` and `X: = \\D2TESTNAS\datasheets` (per deployed `AUTOEXEC.BAT` v3.0, lines 45–46).
+
+> **Correction to briefing:** On **AD2 itself**, `T:` is **not** the NAS — `net use` shows `T: → \\ad2\e-drive` (AD2's own engineering share) and `X: → \\ad2\webshare`. The DOS-station `T:`/`X:` mappings above apply on the *stations*, not on AD2. AD2 reaches the NAS only over SSH/rsync to `192.168.0.9`, never via a drive letter.
+
+```
+OUTBOUND (test results):
+  DOS station C:\ATE\{5B,7B,8B,DSC,HV,PWR,SCT,VAS}LOG\*.DAT
+        │ (boot: AUTOEXEC → CTONW.BAT v5.0, plain COPY)
+        ▼
+  NAS /data/test/TS-XX/LOGS//*.DAT
+      /TS-XX/Reports/*.TXT
+      /STAGE/TS-XX/*.TXT
+        │ (AD2 scheduled task "Sync-FromNAS", every 15 min, rsync --remove-source-files)
+        ▼
+  AD2 C:\Shares\test\TS-XX\LOGS\...
→ node import.js / import-work-orders.js / import-all-stage.js
+        ▼
+  PostgreSQL 18 db "testdatadb" (test_records, work_orders, work_order_lines, *_quarantine)
+        │ (export-datasheets.js / render-datasheet.js → text datasheets; upload-to-api.js → website)
+        ▼
+  X:\webshare\Test_Datasheets (current) [ For_Web = legacy, dead since 2026-05-11 ]
+  + HTTP API upload (active: 472,290 records flagged api_uploaded)
+
+INBOUND (specs / software):
+  Engineering master specs C:\Shares\test\Ate\ProdSW\{5B,7B,8B,DSC,SCT}DATA\*MAIN*.DAT
+        │ (AD2→NAS push, rsync --update: reaches NAS OK)
+        ▼
+  NAS /data/test/Ate/ProdSW/DATA/*.DAT ← 5BMAIN.DAT present & current (83,200 b, 2026-06-26)
+        │ (station boot: NWTOC.BAT v5.0)
+        ▼
+  ✗ NWTOC v5.0 copies only T:\COMMON\ProdSW\*.BAT → C:\BAT and T:\Ate\ProdSW\*.EXE → C:\ATE.
+    It copies NO *.DAT. → Master specs never reach the stations. ← ROOT CAUSE (F1)
+```
+
+---
+
+## 3. INBOUND spec distribution — root cause of Syncro #32489 (F1)
+
+**The deployed downloader is `T:\COMMON\ProdSW\NWTOC.BAT`, version 5.0** (`C:\Shares\test\COMMON\ProdSW\NWTOC.BAT`, 972 bytes, mtime 2026-03-16; identical copy verified on NAS `COMMON/ProdSW/NWTOC.BAT` 972 b). Its own header states the cause verbatim:
+
+```
+Line 3: REM Version: 5.0 - Added EXE copy, removed DATA folder copies (avoid cyclic overwrites)
+Line 18: COPY /Y T:\COMMON\ProdSW\*.BAT C:\BAT
+Line 22: COPY /Y T:\Ate\ProdSW\*.EXE C:\ATE
+Line 26: COPY /Y T:\COMMON\NET\*.* C:\NET
+```
+
+There is **no `.DAT` copy anywhere in NWTOC v5.0.** The "removed DATA folder copies" change is exactly what broke master-spec refresh on the stations.
+
+**The specs exist and are current everywhere except the stations**, which makes the impact concrete:
+
+- Engineering master on AD2: `C:\Shares\test\Ate\ProdSW\5BDATA\5BMAIN.DAT` = **83,200 b, mtime 2026-06-26 11:43**.
+- Same file on the NAS (pushed successfully): `/data/test/Ate/ProdSW/5BDATA/5BMAIN.DAT` = **83,200 b, 2026-06-26 11:43** (rsync listing).
+- NWTOC v5.0 will never fetch it → stations keep whatever `5BMAIN.DAT` they last received.
+
+Other live masters (all under `Ate\ProdSW\DATA\`), showing engineering *is* editing them: `7BMAIN.DAT` 47,940 b 2025-06-10; `8BMAIN.DAT` 24,124 b 2024-10-14; `DSCMAIN4.DAT` 65,508 b 2026-01-16; `SCTMAIN.DAT` 20,933 b 2024-08-21. (There is also a non-8.3 `8BMAIN(2013-02-15).DAT` that the sync correctly skips — see F8.)
+
+**Even the older NWTOC drafts never distributed COMMON master specs.** The abandoned v1.0 draft (`C:\Shares\test\NWTOC.BAT`) copies machine-specific `T:\%MACHINE%\ProdSW\*.DAT` (its lines 165–167) but never `T:\COMMON\ProdSW\*.DAT`, and `COMMON\ProdSW` contains **zero** `.DAT` files anyway (verified: 22 files, all `.BAT`). So there is no version of the download tool that pushes the shared master specs to stations. The recommended fix (F1 in §11) must **add** a DATA copy, it cannot merely "restore" one.
+
+---
+
+## 4. MS-DOS 6.22 command-structure analysis (focus area)
+
+A crucial distinction the briefing blurs: **the scripts that actually run on stations live in `COMMON\ProdSW\` (and are mirrored to the NAS), NOT in the `C:\Shares\test\` root.** The root-level `NWTOC.BAT/CTONW.BAT/CHECKUPD.BAT/STAGE.BAT` are **abandoned earlier drafts** and are riddled with NT-cmd-only constructs; the deployed versions were deliberately cleaned. Judging DOS-compatibility off the root drafts would be a mistake.
+
+### 4a. Deployed scripts (station boot path) — DOS-6.22 status
+
+Boot flow (deployed `AUTOEXEC.BAT` v3.0): `STARTNET.BAT` → `NWTOC.BAT` (download) → `CTONW.BAT` (upload) → `CD \ATE` → `menux`.
+
+| Deployed script | Ver | DOS 6.22 verdict | Notes |
+|---|---|---|---|
+| `AUTOEXEC.BAT` | 3.0 | Clean | `SET`, `IF EXIST T:\*.*`, `CALL`, `GOTO` only |
+| `STARTNET.BAT` | — | (not read; maps T:/X:) | out of scope, recommend spot-check |
+| `NWTOC.BAT` | 5.0 | **`COPY /Y` risk (F2)** | see below |
+| `CTONW.BAT` | 5.0 | Clean | uses **plain** `COPY` (no `/Y`); silently overwrites, correct on 6.22 |
+| `CTONWTXT.BAT` | 2.3 | **`COPY /Y` risk (F2)** | `COPY /Y C:\STAGE\*.TXT T:\STAGE\%MACHINE%` |
+| `DEPLOY.BAT` | 2.4 | Clean | header: "Use COPY instead of XCOPY (DOS 6.22 compatibility)" |
+| `CHECKUPD.BAT` | 1.4 | Clean | header: "removed CALL :label subroutines"; uses `SET FLAG=` + `GOTO`, no `SET /A` |
+| `STAGE.BAT` | (rev) | Clean **but orphaned** | line 133 now `TYPE ... >>` (was `FOR /F "skip=1"`); however NWTOC v5.0 never calls it, so the `AUTOEXEC.NEW`/`CONFIG.NEW` auto-update mechanism is dead code |
+
+### 4b. F2 — `COPY /Y` is not a native MS-DOS 6.22 switch (HIGH, needs station confirmation)
+
+MS-DOS 6.22's internal `COPY` supports `/A /B /V` only. The `/Y` (and `/-Y`) overwrite-suppression switch was introduced in **MS-DOS 7.0 (Windows 95)**. On a genuine 6.22 `COMMAND.COM`, `COPY /Y src dst` returns **`Invalid switch - /Y`** and performs **no copy**.
+
+- Deployed **`NWTOC.BAT` v5.0** uses `COPY /Y` for **all three** of its copy steps (batch files, EXEs, NET files). If the stations are truly 6.22, NWTOC copies **nothing** — a broader failure than F1 (stations would receive no batch/EXE/NET updates at all).
+- Deployed **`CTONWTXT.BAT` v2.3** likewise uses `COPY /Y`.
+- The upload path survives regardless, because **`CTONW.BAT` v5.0 uses plain `COPY`** — which is why test data keeps flowing (see §5) even if NWTOC is silently dead.
+
+**This cannot be confirmed from AD2** (the stations were not in scope/reachable for this audit).
It is the single highest-value item for a cross-checker to verify **on a station**: run `VER` (confirm the exact DOS version) and test `COPY /Y NUL C:\x.txt`. If 6.22, replace `COPY /Y` → `COPY` in `NWTOC.BAT` and `CTONWTXT.BAT`. If the stations are actually MS-DOS 7.x (Win9x DOS mode), `/Y` is fine and F2 is a non-issue — the fact that F1's "removed DATA copies" note exists implies NWTOC has been running successfully, which is weak evidence the stations tolerate `/Y`; still worth a definitive check.
+
+### 4c. F7 — Abandoned root-level drafts (LOW, cleanup)
+
+`C:\Shares\test\NWTOC.BAT` (v1.0), `CTONW.BAT` (v1.2), `CHECKUPD.BAT` (v1.0), `STAGE.BAT` all use constructs invalid on 6.22 and should be removed to avoid someone deploying them by mistake:
+
+- `FOR /F "skip=1 delims=" %%L IN (C:\AUTOEXEC.BAT)` — `FOR /F` is NT-only (root `STAGE.BAT` line 133).
+- `SET /A SYSFILE=SYSFILE+1` — `SET /A` arithmetic is NT-only (root `CHECKUPD.BAT`).
+- `CALL :CHECK_COMMON_FILE` and `GOTO :EOF` — call-to-label / `:EOF` are NT-only (root `CHECKUPD.BAT`).
+- `%%~dpnF`, `%~nx1` — tilde path modifiers are NT-only (root `CTONW.BAT` line 189; root `CHECKUPD.BAT` line 187).
+- `T: 2>NUL` and `... >NUL 2>NUL` — handle-specific `2>` stderr redirection is NT-only; 6.22 `COMMAND.COM` supports only `>`, `>>`, `<`.
+
+(Note: the classic `IF EXIST T:\NUL` directory test and `CHOICE /C:YN` *are* valid 6.22 and appear in these drafts — but the items above are not.)
+
+### 4d. Other 6.22 considerations (INFO)
+
+- **Environment space.** Deployed `AUTOEXEC.BAT`/`DEPLOY.BAT` set `MACHINE`, `PATH`, `TEMP`, `TMP`. 6.22's default environment is 256 bytes and can overflow ("Out of environment space"). Ensure each station's `CONFIG.SYS` has `SHELL=C:\DOS\COMMAND.COM C:\DOS /E:512 /P`. `CONFIG.SYS` is station-local and was not inspectable from AD2 — recommend spot-check.
+- **8.3 filenames.** The whole chain assumes 8.3 names; the sync explicitly skips non-8.3 files (see §5).
Any master spec saved with a long name (e.g. `8BMAIN(2013-02-15).DAT`) is invisible to DOS anyway.
+
+---
+
+## 5. Sync engine (AD2 ↔ NAS)
+
+> **Correction to briefing:** the scheduled task **`Sync-FromNAS` runs `Sync-FromNAS-rsync.ps1`**, not `Sync-FromNAS.ps1`. Verified from the task action: `powershell.exe -ExecutionPolicy Bypass -NonInteractive -File C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1`. The non-rsync `Sync-FromNAS.ps1` (per-file SCP, NAS root creds) is dormant. The live status header `_SYNC_STATUS.txt` confirms it — it reads `... Status (rsync)` with `WO Reports Imported`/`STAGE TXT Files` fields that exist only in the rsync script's template.
+
+- **Task:** `Sync-FromNAS`, State **Running**, trigger every 15 min from 2026-01-20; `LastRunTime 2026-07-01 09:30`, **`LastTaskResult = 1`** (non-zero, i.e. errors), `NumberOfMissedRuns 0`. (Task-scheduler clock/report lags the log, which shows runs continuing to 13:41.)
+- **Mechanism:** rsync daemon `rsync://rsync@192.168.0.9/test`, PULL uses `--archive --include=*.DAT --exclude=* --remove-source-files` per station per log-type; imports pulled files via `node import.js`. PUSH uses `--update` with an 8.3 `--files-from` allowlist.
+
+### F3 — permanent `ERRORS` from one stray file (MED)
+
+Every run today (39 runs) ends `Sync complete: PULL=n, PUSH=0, Errors=1`. The single error is identical each run:
+
+```
+2026-07-01 09:35:28 : ERROR: rsync push failed (exit 3): rsync.exe : rsync:
+ [sender] change_dir "/cygdrive/c/Shares/test/TS-21/ProdSW/" failed: Not a directory
+```
+
+Ground truth: `C:\Shares\test\TS-21\ProdSW` is a **file** — `Mode -a----`, **79,116 bytes**, mtime 2026-06-26; its sibling `LOGS\` and `Reports\` are directories. The file's first bytes are an `MZ` header containing the string `7BMAIN4` — i.e. a **misplaced DOS executable** dropped where a `ProdSW` *directory* is expected.
The rsync PUSH loop (`Sync-FromNAS-rsync.ps1` line 723, `Push-DirectoryToNAS $prodSwPath`) cannot `change_dir` into a file → exit 3 → `errorCount=1` → status `ERRORS` forever. This is cosmetic to data flow but **masks any new, real error** and keeps `LastTaskResult=1`.
+
+### PULL side is healthy (evidence outbound works)
+
+Today's per-run `PULL` counts include `12,623` (06:44), `806` (09:21), `661` (08:50), most others `0` (because `--remove-source-files` drains the NAS, so steady-state is 0). The big pulls are backlog flushes. Import into Postgres is landing — see §6.
+
+The per-run PUSH log also shows the expected 8.3 skips (not errors), e.g. `Skipping (non-8.3): Copy of QB.EXE`, `_KDSCOUT1.EXE`, `8BDATA/8BMAIN(2013-02-15).DAT`, `PWRDATA/8BPWR - (2012-09-13).DAT`.
+
+### NAS read-only verification
+
+`rsync --list-only` confirmed: module root `/data/test` (3,730 entries), **62** `TS-*` station folders, `COMMON/ProdSW/` holding the v5.0 deployed set (NWTOC 972 b, CTONW 1,504 b, CTONWTXT 392 b, DEPLOY 4,142 b, CHECKUPD 3,239 b), `Ate/ProdSW/5BDATA/5BMAIN.DAT` current (83,200 b 2026-06-26), and `STAGE/` per-station folders (TS-11L/TS-11R updated 2026-07-01 06:35 — fresh STAGE TXT activity).
+
+### F9 — SSH root key auth to NAS is broken (INFO)
+
+`ssh -i C:\Users\sysadmin\.ssh\id_ed25519 root@192.168.0.9` → `Permission denied (publickey,password)`. The dormant SCP script relies on this key; the live rsync path does not, so there is no operational impact today, but the SCP fallback would fail if ever re-enabled.
+
+---
+
+## 6. Database & generation
+
+> **Correction to briefing:** the datastore is **PostgreSQL 18.3**, not SQLite. `database\db.js`: *"PostgreSQL Database Abstraction Layer … Replaces better-sqlite3 singleton with pg.Pool."* `package.json` v2.0.0 depends on `pg ^8.20.0` (no `better-sqlite3`). Service `postgresql-18` is **Running** (PID 6044, listening `::1:5432`).
The old SQLite file survives only as an archive: `database\archive\testdata.db` = **4,401,168,384 bytes (4.4 GB), mtime 2026-04-03** — the migration cutover date (`migrate-data.js`, `schema-pg.sql`). All queries below were run inside `BEGIN TRANSACTION READ ONLY`.
+
+Row counts (db `testdatadb`, schema `public`):
+
+| Table | Rows |
+|---|---|
+| `test_records` | **475,553** |
+| `work_orders` | 34,149 |
+| `work_order_lines` | 64,051 |
+| `test_records_quarantine` | **3** (clean; dedup healthy) |
+| `test_records_dedup_bak_20260415` | (backup table, stale artifact) |
+
+Recency / pipeline liveness (from `test_records`):
+
+- `max(import_date)` = **2026-07-01 13:41:53 UTC** (day of audit); `max(test_date)` = 2026-06-30.
+- Imported last 24h: **111**; last 7d: **623**. `work_orders` last import **2026-07-01 13:43 UTC**.
+- Per `log_type` last import: `5BLOG` & `7BLOG` → 07-01 (current); `DSCLOG` → 06-30; `8BLOG`,`PWRLOG` → 06-26; `VASLOG`,`SCTLOG` → 06-18. (8B/PWR/VAS/SCT trailing is plausibly just no production on those lines; not proven to be a fault — worth a glance.)
+
+Web-delivery stage flags (`test_records`): `api_uploaded_at` set on **472,290 / 475,553 (99.3%)**, with **33,908** uploaded in the last 7 days → the **HTTP API uploader (`upload-to-api.js`) is the active delivery path**. `forweb_exported_at` = 300,781 and `datasheet_exported_at` = 298,155 are legacy counters that have stopped growing.
+
+### F4 — server-side spec staleness (MED)
+
+`parsers/spec-reader.js` line 19: `const DEFAULT_SPEC_DIR = path.join(__dirname, '..', 'specdata')` and line 366 defaults to it. So the **server** generates datasheets from `C:\Shares\testdatadb\specdata\`, whose spec DATs are a **2026-03-27 snapshot** (e.g. `specdata\5BMAIN.DAT` 83,200 b 2026-03-27 vs engineering's `Ate\ProdSW\5BDATA\5BMAIN.DAT` 83,200 b **2026-06-26**; `specdata\5B49DATA.DAT` is **0 bytes**). `specdata\` is not fed from the engineering masters by any automation observed.
Net effect: **both** the station-side and the server-side spec refreshes are stale/manual. If a spec limit changed after 2026-03-27, generated datasheets may use outdated limits even though the outbound data pipeline is otherwise healthy.
+
+### F6 — `For_Web` output dead since 2026-05-11 (LOW)
+
+`X:\webshare\For_Web` = **7,517 files, newest 2026-05-11 09:53**, oldest 2023-10-04. `X:\webshare\Test_Datasheets` newest = **2026-06-29 11:09** (WO 180554). Generation now targets `Test_Datasheets` + API; the ASP.NET/`For_Web` path is effectively retired (consistent with the briefing's "largely broken/404"). Recommend documenting `For_Web` as deprecated so it is not mistaken for a live output.
+
+---
+
+## 7. F5 — Security: plaintext credentials (MED)
+
+Hard-coded secrets discovered in-place (already plaintext on this host; **not** copied into any vault by this read-only audit — flagged for rotation + proper vaulting):
+
+- `C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1` L22–24: rsync daemon user `rsync` / password `IQ2…19`.
+- `C:\Shares\test\scripts\Sync-FromNAS.ps1` (dormant) L19–22: NAS `root` SSH password `Pap…nas`, plus host key.
+- `C:\Shares\testdatadb\database\db.js` L11–21: Postgres `testdatadb_app` / password `DfT…26!` (also the process-env default).
+
+Recommend: rotate all three, move to environment/DPAPI/SOPS, and scrub the scripts. (These are also world-readable via the `test` SMB share.)
+
+---
+
+## 8. F8 — Share hygiene / clutter (LOW)
+
+- NAS `/data/test` root carries junk and stale duplicates: `.DS_Store`, `._.DS_Store`, a 0-byte file literally named `NUL)`, stray files named `BAT` and `C`, a case-duplicate pair `5b39data.dat` + `5B39DATA.DAT`, and **old v1.0** `NWTOC.BAT` (8,925 b) / `CTONW.BAT` (11,341 b) / `CHECKUPD.BAT` (6,495 b) at the share root (superseded by `COMMON\ProdSW`).
+- AD2 `C:\Shares\test\` root mirrors the mess: files named `-`, `C`, `X`, `TS-21.stray-dat-file`, `TS-27.old.file`, plus ~10 `Sync-FromNAS.ps1.backup-*` copies and abandoned v1.x batch drafts.
+- `C:\Shares\testdatadb\` has dozens of `_*.js`/`*.bak-*` scratch/debug files intermixed with production code — recommend relocating to a `scratch/` folder for clarity (no functional impact).
+
+---
+
+## 9. Discrepancies vs the supplied briefing (for cross-checkers)
+
+1. **Datastore:** briefing says SQLite `database\testdata.db`; reality is **PostgreSQL 18** (`testdatadb`). SQLite is a 4.4 GB archive from the 2026-04-03 cutover.
+2. **Which sync script runs:** briefing implies `Sync-FromNAS.ps1`; the task runs **`Sync-FromNAS-rsync.ps1`**.
+3. **`T:` on AD2:** briefing implies `T:` = NAS staging; on AD2 `T:` = `\\ad2\e-drive`. NAS is reached by SSH/rsync to `192.168.0.9`.
+4. **`import.js` path:** exists at `database\import.js` (referenced correctly), **not** repo root.
+5. **CTONWTXT "gap" (prior memory):** `CTONWTXT.BAT` **is** invoked — `CTONW.BAT` v5.0 line 30 `CALL C:\BAT\CTONWTXT.BAT`. Its output (`T:\STAGE\%MACHINE%`) is now largely redundant given server-side generation, but the call is live and STAGE TXT still flows (TS-11L/R today).
+6. **NWTOC "v5.0" identity:** the deployed downloader header literally reads `Version: 5.0`; the v1.0 file in the share root is an abandoned draft. Syncro #32489's "v5.0" reference is correct and points at `COMMON\ProdSW\NWTOC.BAT`.
+
+## 10. What I could NOT verify (scope/reachability limits)
+
+- **Station-side reality:** no station was inspected. F2 (`COPY /Y` on true 6.22), `CONFIG.SYS` environment sizing, `STARTNET.BAT`, and the actual `VER` are **station-confirmable only**. This is the top cross-check.
+- **NAS OS/uname/df:** SSH root auth denied (F9); only rsync-daemon listings were available. NAS free space not measured.
+- **Website endpoints:** the API uploader's *server* (target of `upload-to-api.js`) was not probed (out of read-only scope / avoid outward calls). Only the DB `api_uploaded_at` flags evidence success.
+- **8B/PWR/VAS/SCT import lull:** observed but not root-caused (could be normal no-production).
+
+---
+
+## 11. Recommendations (NONE applied — read-only audit)
+
+Priority order. All are proposals; no change was made.
+
+1. **(F1, HIGH) Restore master-spec distribution to stations.** Add a DATA copy to `NWTOC.BAT` — e.g. `COPY T:\Ate\ProdSW\5BDATA\*.DAT C:\ATE\5BDATA` (and 7B/8B/DSC/SCT), or a dedicated `NWSPEC.BAT`. Must be a *pull to a distinct local dir* to avoid the "cyclic overwrite" the v5.0 note was avoiding — i.e. don't push station DATA back up. Verify against how the ATE program locates `*MAIN*.DAT`.
+2. **(F2, HIGH) Confirm DOS version on a station; if 6.22, replace `COPY /Y` → `COPY`** in `NWTOC.BAT` v5.0 (3 lines) and `CTONWTXT.BAT` v2.3. If MS-DOS 7.x, document that and close F2.
+3. **(F3, MED) Remove/relocate the stray `C:\Shares\test\TS-21\ProdSW` file** (it is a misplaced `7BMAIN4`-ish EXE). Then verify sync returns `Errors: 0`. Consider hardening `Push-DirectoryToNAS` to skip a `ProdSW` that is not a container (the PULL side already guards this pattern; the PUSH side does not).
+4. **(F4, MED) Feed `testdatadb\specdata\` from the engineering masters** (scheduled copy from `Ate\ProdSW\DATA\*MAIN*.DAT`, with 8.3 + non-empty checks; note `5B49DATA.DAT` is currently 0 bytes), or point `spec-reader.js` `DEFAULT_SPEC_DIR` at the live master directory.
+5. **(F5, MED) Rotate & vault** the rsync, NAS-root, and Postgres passwords; remove plaintext from scripts; tighten `test` share ACLs.
+6. **(F6/F7/F8, LOW) Cleanup:** mark `For_Web` deprecated; delete the abandoned v1.x root batch drafts and NAS root duplicates/junk; move `_*.js`/`*.bak-*` scratch out of the app dir.
+7. **(F9, INFO)** Either fix `id_ed25519` auth for `root@192.168.0.9` or delete the dormant SCP script so the two sync scripts don't diverge.
+
+---
+
+## 12. Evidence appendix — commands run (all read-only)
+
+- Host/shares: `hostname`, `Get-NetIPAddress`, `Get-CimInstance Win32_OperatingSystem`, `net use`, `Get-SmbShare`, `Get-PSDrive`.
+- Scheduled task: `Get-ScheduledTask/Info Sync-FromNAS` (Action/Trigger/LastResult).
+- Batch/PS sources read in full: deployed `COMMON\ProdSW\{NWTOC,CTONW,CTONWTXT,AUTOEXEC,DEPLOY,CHECKUPD,STAGE}.BAT`; abandoned root drafts `test\{CTONW,NWTOC,CHECKUPD,STAGE,UPDATE}.BAT`; `scripts\Sync-FromNAS.ps1`, `scripts\Sync-FromNAS-rsync.ps1`; `testdatadb\database\db.js`, `package.json`, `parsers\spec-reader.js` (grep).
+- Logs: `scripts\sync-from-nas.log` (today's ERROR + summary lines), `_SYNC_STATUS.txt`.
+- DB (read-only txn): `information_schema` table/column introspection; `COUNT(*)`, `max(import_date/test_date)`, interval filters, per-`log_type` aggregation; `version()`, `current_database()`.
+- NAS (read-only): `ssh -o BatchMode=yes` (auth denied, recorded); `rsync --list-only` on module root, `COMMON/ProdSW/`, `Ate/ProdSW/5BDATA/`, `STAGE/`.
+- Filesystem: `Get-ChildItem`/`Get-Item` for sizes/mtimes across `C:\Shares\{test,TestDataDB,webshare}`; `Get-Content -TotalCount` to identify the `TS-21\ProdSW` stray by header.
+
+*End of report.*
diff --git a/errorlog.md b/errorlog.md
index 69ad48f9..7cb9717d 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,12 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-07-01 | GURU-5070 | agy/gemini-verify | gemini verify failed: gemini-3.1-pro-preview quota exhausted, default-model fallback errored on OAuth _doSetupUser. Gemini unavailable for cross-vendor verification this session. [ctx: skill=agy]
+
+2026-07-01 | GURU-5070 | agy | gemini returned no response (empty after 3 attempts) [ctx: mode=verify err= at async _doSetupUser (file:///C:/Users/guru/AppData/Roaming/npm/node_module]
+
+2026-07-01 | GURU-5070 | agy | gemini returned no response (empty after 3 attempts) [ctx: mode=verify err= at process.processTicksAndRejections (node:internal/process/task_queues:104:]
+
 2026-07-01 | Howard-Home | rmm/mac-mount-check | [friction] grep '/Volumes/Data' false-matched '/System/Volumes/Data' and reported MOUNTED when share was absent; use precise 'on /Volumes/Data ' (with trailing space) as the LaunchAgent does
 2026-07-01 | GURU-5070 | remediation-tool | [friction] declared 'no SharePoint access' on a Graph accessDenied; actually the Tenant Admin app holds SharePoint Sites.FullControl.All - the blocks were (a) SharePoint app-only needs CERT not client_secret ('Unsupported app only token') and (b) get-token.sh had no SharePoint resource tier. Fixed: added sharepoint/sharepoint-admin tiers + reference doc. [ctx: ref=.claude/skills/remediation-tool/references/app-permissions-and-sharepoint.md]