sync: auto-sync from GURU-5070 at 2026-07-01 13:09:08

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-07-01 13:09:08
2026-07-01 13:10:01 -07:00
parent af8a3de00e
commit 7f897ce93f
2 changed files with 292 additions and 0 deletions


@@ -0,0 +1,136 @@
# Dataforth test-data-chain audit via RMM-spawned AD2 Claude + multi-AI verification
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Ran a full audit of the Dataforth test-data chain in support of Syncro #32489 (DOS test
stations not pulling updated spec files). The pivotal enabler was proving a new capability:
**spawning a headless `claude -p` on the AD2 domain controller via its GuruRMM agent** rather
than the async git-sync handoff. AD2 (192.168.0.6) is isolated from the ACG coord API but its
RMM agent phones home, and it is online (agent `cfa93bb6-...`). A staged probe confirmed:
sysadmin is logged into the console (so `context: user_session` works and returns an elevated
token), Claude Code v2.1.181 is at `C:\Users\sysadmin\.local\bin\claude.exe`, node v20 is
system-wide, and the repo is at `C:\ClaudeTools`. Headless `claude -p` initially failed with
`Invalid API key` because a stale machine-level `ANTHROPIC_API_KEY` (108 chars) shadowed
sysadmin's good OAuth creds; unsetting it (`Remove-Item Env:\ANTHROPIC_API_KEY`) let it fall
back to `.credentials.json` and return `PROBE_OK`.
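
The environment fix described above can be sketched in Python. This is an illustration only, not the `run.ps1` used on AD2: the helper names `env_without_api_key` and `key_visible_to_child` are hypothetical, and the sketch only shows building a child-process environment with `ANTHROPIC_API_KEY` removed, which is the condition under which the CLI was observed to fall back to the stored OAuth credentials.

```python
import os
import subprocess
import sys


def env_without_api_key(base=None):
    """Return a copy of the environment with ANTHROPIC_API_KEY removed.

    Hypothetical helper; `base` defaults to the current process environment.
    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.pop("ANTHROPIC_API_KEY", None)  # no error if the variable is absent
    return env


def key_visible_to_child(env):
    """Spawn a child Python process and report whether it can see the key."""
    probe = "import os; print('ANTHROPIC_API_KEY' in os.environ)"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

Scrubbing the variable per launch leaves the machine-level value in place, so other processes on the host are unaffected until the stale key is properly removed.
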
With the pattern proven, launched a strictly read-only autonomous Claude auditor on AD2,
detached (runner pid 4748) so it survived the RMM command-timeout window, writing to
`C:\Users\sysadmin\ad2-audit\FINDINGS.md` with a `DONE.txt` marker. A background monitor polled
the marker; the run completed in ~17 min (exit 0, 25 KB report). The report is evidence-backed
and corrected several stale assumptions in `CLAUDE.dataforth.md` (dated 2026-03-29).
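
The detach-and-poll pattern above can be sketched as a marker-file wait loop. This is a minimal sketch under assumptions: it polls a local path, whereas in this session the check on `DONE.txt` had to go through RMM, and the function name, timeout, and interval are illustrative rather than the values actually used.

```python
import time
from pathlib import Path


def wait_for_marker(marker, timeout_s=1800.0, interval_s=15.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until `marker` exists; return its text, or None on timeout.

    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    marker = Path(marker)
    deadline = clock() + timeout_s
    while True:
        if marker.exists():
            # The runner writes the marker last, so its content is the verdict.
            return marker.read_text(encoding="utf-8", errors="replace").strip()
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Writing the marker only after the report is complete lets the monitor treat "marker exists" as "report is safe to pull", without inspecting the report itself.
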
Key findings: **F1 (HIGH)** — deployed `NWTOC.BAT` v5.0 (`COMMON\ProdSW`, verified on AD2 and
NAS) copies only `*.BAT` and `*.EXE`, **zero `.DAT`** — the confirmed root cause of #32489; and
no NWTOC version ever distributed the shared `COMMON` masters, so v5.1 must ADD a DATA copy, not
restore one. **F2 (HIGH, new)**: `NWTOC.BAT` v5.0 and `CTONWTXT.BAT` v2.3 use `COPY /Y`, which
is not a valid MS-DOS 6.22 switch (introduced in MS-DOS 7.0/Win95); on true 6.22 this errors and
copies nothing, meaning NWTOC may be failing entirely. Stale-assumption corrections: datastore is
now **PostgreSQL 18** (SQLite is a 4.4 GB archive), the scheduled task runs
**`Sync-FromNAS-rsync.ps1`**, web delivery is a live HTTP API uploader (472,290 records flagged,
not the dead `For_Web`), and `CTONWTXT` IS invoked. Also F3 (stray `TS-21\ProdSW` file breaks
rsync push every run), F4 (server generates from a frozen 2026-03-27 specdata snapshot), F5
(plaintext creds in scripts).
Then ran the requested multi-AI cross-verification of the load-bearing DOS-6.22 claim. **Grok**
(verify mode) independently confirmed F2 in full: 6.22 `COPY` supports only `/A /B /V`; `/Y`
arrived in MS-DOS 7.0/Win95; `COPY /Y src dst` → `Invalid switch - /Y`, 0 files copied; plain
`COPY` overwrites silently on 6.22 (the correct form). **Gemini** was unavailable this session
(quota exhausted on `gemini-3.1-pro-preview` + an OAuth fallback error; logged). The pivotal
unresolved question is empirical and station-only: do the stations run genuine 6.22 (→ NWTOC has
been copying nothing since 2026-03-16) or MS-DOS 7.x (→ `/Y` is fine, F1 is the sole cause)?
Stations have no RMM agent (they are DOS), so `VER` + a `COPY /Y` test must be done on a station.
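
Independent of the station test, the batch files can be linted offline for the switch in question. The sketch below is a rough line scanner, not a DOS parser: the allowed set `/A /B /V` is taken from the Grok verdict above, and the function name and regular expressions are assumptions made for illustration.

```python
import re

# COPY switches reported as valid on MS-DOS 6.22 in the verification above.
DOS622_COPY_SWITCHES = {"A", "B", "V"}

_REM_LINE = re.compile(r"^\s*@?\s*REM\b", re.IGNORECASE)
# COPY as its own word; the lookbehind keeps XCOPY from matching.
_COPY_CALL = re.compile(r"(?<![A-Za-z0-9_])COPY\b(.*)$", re.IGNORECASE)
_SWITCH = re.compile(r"/([A-Za-z])")


def unsupported_copy_switches(batch_text):
    """Return (line_number, switch) pairs for COPY switches outside /A /B /V."""
    hits = []
    for lineno, line in enumerate(batch_text.splitlines(), start=1):
        if _REM_LINE.match(line):
            continue  # comments cannot fail at run time
        call = _COPY_CALL.search(line)
        if not call:
            continue
        for switch in _SWITCH.findall(call.group(1)):
            if switch.upper() not in DOS622_COPY_SWITCHES:
                hits.append((lineno, "/" + switch.upper()))
    return hits
```

A clean result here only means the file avoids unknown `COPY` switches; it does not replace running `VER` and a real copy on a station.
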
## Key Decisions
- Used RMM-spawned headless Claude on AD2 (not the sync handoff) for live ground truth, since
the committed docs were 3 months stale and AD2's RMM agent is reachable despite coord isolation.
- Ran the AD2 agent strictly READ-ONLY with an ironclad brief (no writes/git/state changes,
deliverable file only) and detached + polled, given it runs on a production domain controller.
- Cross-verified only the highest-value, easy-to-get-wrong claim (DOS-6.22 `COPY /Y`) with a
second vendor, rather than re-running a redundant Claude fan-out over an already-thorough report.
- Did NOT apply any fix or touch #32489 yet — the v5.1 design and severity both hinge on the
unresolved station DOS-version question.
## Problems Encountered
- Headless `claude -p` on AD2 returned `Invalid API key` — a stale machine `ANTHROPIC_API_KEY`
shadowed the OAuth creds. Fixed by unsetting the env var before invoking (OAuth fallback).
- RMM dispatch failed once with `/mingw64/bin/curl: Permission denied` (transient AV lock on the
Git-Bash curl); nothing dispatched. Retried using `/c/Windows/System32/curl.exe`.
- Gemini (`agy`) verify failed — `gemini-3.1-pro-preview` quota exhausted and the default-model
fallback errored on OAuth `_doSetupUser`. Logged; used Grok as the cross-vendor check.
## Configuration Changes
Created:
- `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md` — the full AD2 audit
report (pulled back via RMM, gzip+base64).
- `.claude/memory/reference_rmm_spawn_headless_claude.md` — the validated RMM-spawn-Claude
pattern + the ANTHROPIC_API_KEY-shadow gotcha.
- MEMORY.md index line for the above.
- On AD2 (NOT in repo): `C:\Users\sysadmin\ad2-audit\{brief.txt,run.ps1,run.log,FINDINGS.md,DONE.txt}`.
Modified:
- `errorlog.md` — logged the Gemini unavailability.
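
The gzip+base64 pull-back noted for the audit report can be sketched as a pair of helpers. This assumes the remote side emits base64 text of a gzip stream on its command output; the function names are hypothetical and the actual RMM transport is not shown.

```python
import base64
import gzip


def pack_report(text):
    """Compress and base64-encode text so it survives a text-only channel."""
    compressed = gzip.compress(text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def unpack_report(payload):
    """Reverse pack_report; tolerates line-wrapped or padded-with-space base64."""
    raw = base64.b64decode("".join(payload.split()))
    return gzip.decompress(raw).decode("utf-8")
```

Stripping whitespace before decoding matters because command output is often wrapped or carries a trailing newline.
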
## Credentials & Secrets
- No new credentials created. The AD2 audit surfaced pre-existing plaintext creds in Dataforth
scripts (F5) — flagged for rotation + vaulting, NOT copied anywhere: rsync daemon
(`rsync`/`IQ2...19`) in `Sync-FromNAS-rsync.ps1`; NAS root SSH password in the dormant
`Sync-FromNAS.ps1`; Postgres `testdatadb_app` password in `testdatadb\database\db.js`.
- AD2 has a stale machine-level `ANTHROPIC_API_KEY` (invalid) that must be unset before running
headless Claude there; sysadmin's OAuth creds at `C:\Users\sysadmin\.claude\.credentials.json`
are the working auth.
## Infrastructure & Servers
- AD2: `192.168.0.6`, Dataforth DC, Windows Server 2019 (10.0.17763), RMM agent
`cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`, client "Dataforth Corp". claude.exe v2.1.181,
node v20.10.0, repo `C:\ClaudeTools` (branch ad2).
- NAS D2TESTNAS `192.168.0.9` (Linux/Samba), rsync daemon port 873 module `test` → `/data/test`.
- TestDataDB: PostgreSQL 18.3 service `postgresql-18` on AD2 (`::1:5432`), db `testdatadb`
(test_records 475,553; work_orders 34,149). App at `C:\Shares\testdatadb\`.
- Shares on AD2: `C:\Shares\test\` (COMMON\ProdSW deployed batch set; Ate\ProdSW\<type>DATA master
specs), `C:\Shares\webshare\` (Test_Datasheets active; For_Web dead since 2026-05-11).
- Engineering master `5BMAIN.DAT` 83,200 B mtime 2026-06-26 present on AD2 + NAS; server-side
`testdatadb\specdata\5BMAIN.DAT` is a frozen 2026-03-27 copy (F4).
## Commands & Outputs
- RMM dispatch to AD2 with `context:user_session`, `timeout_seconds` (not `timeout`), via
`/c/Windows/System32/curl.exe`. Headless launch: detached `Start-Process powershell -File
run.ps1 -WindowStyle Hidden`; runner unsets `ANTHROPIC_API_KEY`, `cd C:\ClaudeTools`, runs
`claude -p <brief> --permission-mode bypassPermissions --output-format text`.
- Probe result: `PROBE_OK` after unsetting the env key (exit 0, 13s).
- Audit result: `DONE exit=0`, FINDINGS.md 25,337 B, ~17 min.
- Grok verify verdict: claim correct; nitpicks = "Invalid switch" not "Invalid parameter", and the
error is visible (not literally "silent").
## Pending / Incomplete Tasks
- **PIVOTAL:** confirm station DOS version (`VER` + `COPY /Y NUL C:\TEST.TXT` on a station).
Determines whether F2 is catastrophic (6.22 → NWTOC copies nothing since 2026-03-16) or moot
(7.x). Stations have no RMM agent — reach one via NAS/console.
- Draft DOS-6.22-safe `NWTOC v5.1`: plain `COPY` (no `/Y`), one-way pull of master `.DAT`s into a
distinct local dir (avoids the cyclic overwrite that v5.0 guarded against), 6.22-valid `IF EXIST`/
`GOTO`. Grok-review before it touches a station. The script must be correct on both 6.22 and 7.x.
- Update Syncro #32489 with the confirmed root cause (F1 + F2) and plan.
- Address F3 (remove stray `TS-21\ProdSW` file), F4 (feed specdata from masters), F5 (rotate/vault
creds) — recommendations only, nothing applied.
- Retry Gemini verification later for true two-vendor triangulation.
## Reference Information
- Audit report: `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md`.
- Syncro #32489 (id 113201089, Scheduled, due 2026-07-02); John Lehman contact 2851723; Dataforth
Corp customer 578095; appointment 5626864474.
- AD2 RMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`.
- Memory: `reference_rmm_spawn_headless_claude`, `gururmm-command-timeout-seconds`.


@@ -0,0 +1,156 @@
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Continuation of the 2026-07-01 Peaceful Spirit session (deletion investigation + Admin1/Admin2 ACL
hardening already saved to `clients/peaceful-spirit/session-logs/2026-07/2026-07-01-mike-pst-deletion-scope-shelton-admin-acl.md`,
commit 6f676672). This log covers the work after that save: the Peaceful Spirit wiki full recompile,
a Birth Biologic Syncro ticket, and a remediation-tool tooling+documentation fix that unblocks
SharePoint access.
First, ran `/wiki-compile client:peaceful-spirit --full` — the article was stale (2026-06-04) and
missing the June two-DC DFS rebuild, the deletion investigation, and the ACL work. Delegated
synthesis to a Sonnet subagent (12 source logs + live Syncro data), reviewed the draft, fixed
HTML-entity artifacts, staged + applied, updated the index. Committed e583bf43.
Second, handled Syncro ticket #32492 (Birth Biologic, "Setting for SharePoint", assigned Winter,
converted from a lead). Annise Neidrich asked how to stop employees from creating SharePoint sites
(the new "Discover/Publish/Build" nav worried Brandy) without losing the ability to edit their
"Staff Portal" site. Diagnosed the underlying misconception: site creation and site editing are
separate permissions. Investigated the tenant's live settings via the remediation-tool (read-only)
and found both site-creation levers open.
Third — and the largest piece — resolved recurring remediation-tool friction. The Graph
`/admin/sharepoint/settings` call returned accessDenied, which initially read as "no SharePoint
access." Mike pushed back (certs are in the vault for a reason). Validated empirically by decoding
each Graph tier's token `roles` claim: the Tenant Admin app holds `Sites.FullControl.All` (Graph
AND SharePoint resource). The real blockers were (a) SharePoint app-only REJECTS client_secret
tokens ("Unsupported app only token") and requires a CERTIFICATE, and (b) `get-token.sh` had no
SharePoint resource tier. Proved cert-based SharePoint access end-to-end (REST + CSOM), then fixed
the tooling: added `sharepoint`/`sharepoint-admin` tiers to get-token.sh (cert-forced, tenant
resource auto-resolved), wrote a reference doc with the live per-tier permission map + gotchas +
CSOM examples, updated SKILL.md, logged the friction, and saved a reference memory. Committed 6f7f939a.
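
The token check described above (reading the `roles` claim rather than trusting an accessDenied) can be sketched as follows. It decodes the JWT payload without verifying the signature, which is enough for inspecting what was granted but is not a validation step; the function names are illustrative and not part of the remediation-tool.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload segment of a JWT. The signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def app_roles(token):
    """Return the sorted application permissions in the token's roles claim."""
    return sorted(jwt_claims(token).get("roles", []))
```

An empty list from `app_roles` means the token carries no application permissions for that resource, which is a different finding from a single endpoint returning accessDenied.
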
Finally, posted the customer reply on #32492 as Mike (comment 421624836, customer-visible, emailed),
then set the ticket to Waiting on Customer. Implementation of the actual lockdown is ready but gated
on the client's go-ahead.
## Key Decisions
- **Root-level (general) session log** — this session spans harness tooling (remediation-tool) +
Birth Biologic (a Syncro ticket) + a wiki rebuild; no single client/project owns it.
- **Validated app permissions by decoding the token `roles` claim** rather than trusting the
accessDenied — this is the authoritative source and disproved the "no access" read.
- **Added SharePoint tiers to get-token.sh instead of a one-off script** — makes SharePoint a
first-class, reusable capability of the remediation-tool, and cert-forces it so the secret gotcha
can't recur. Auto-resolves the tenant-specific resource host via Graph /sites/root.
- **Documented the full per-tier permission map + the two SharePoint gotchas** in a reference doc
and a memory, with a "don't declare no-access without checking" discipline — the recurring friction
was the point, not just this one ticket.
- **Answered #32492 in plain language** (no CSOM/group-setting mechanics) — Annise is an ops
consultant, not IT. Set Waiting on Customer since the fix needs her go-ahead.
- **Did NOT implement the site-creation lockdown** — it's a tenant write to live client config;
gated behind explicit client approval + a preview.
## Problems Encountered
- **Graph `/admin/sharepoint/settings` accessDenied misread as "no SharePoint access."** Root cause:
no app holds `SharePointTenantSettings.Read.All` (that Graph route needs it), but the apps DO hold
`Sites.FullControl.All`. Resolved by using the SharePoint CSOM/REST admin API instead of the Graph
settings endpoint.
- **SharePoint REST/CSOM returned "Unsupported app only token"** with a client_secret token. SharePoint
Online requires certificate-based app-only tokens. Resolved by using the vault cert (client_assertion);
proved with `/_api/web/title` -> "Tenant Administration".
- **get-token.sh had no SharePoint resource tier** (all tiers map to Graph/EXO/Defender fixed scopes).
Added `sharepoint`/`sharepoint-admin`; the SharePoint resource host is tenant-specific so it's
resolved at runtime from Graph /sites/root (recursive get-token call), overridable via SP_RESOURCE_ENV.
- **Sonnet wiki draft HTML-escaped some chars** (`&amp;`/`&lt;`/`&gt;`). Fixed in the staged file
before applying; verified no entities and no raw secrets leaked.
- **get-token.sh reads the GLOBAL `~/.claude/identity.json`** (which lacks vault_path on this box);
set `VAULT_ROOT_ENV=D:/vault` to point it at the vault. (Repo identity has vault_path; global does not.)
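
The runtime resolution of the tenant-specific SharePoint host can be sketched as below. This is not the logic in `get-token.sh`, which is bash and not reproduced here. It assumes the Graph `GET /sites/root` response exposes the host under `siteCollection.hostname`, and that the admin host follows the `<tenant>-admin.sharepoint.com` pattern seen for this tenant. Function names are hypothetical.

```python
SP_SUFFIX = ".sharepoint.com"


def sharepoint_hosts(sites_root):
    """Derive (tenant_host, admin_host) from a Graph GET /sites/root body."""
    host = sites_root["siteCollection"]["hostname"].lower()
    if not host.endswith(SP_SUFFIX):
        raise ValueError(f"unexpected SharePoint hostname: {host}")
    tenant = host[: -len(SP_SUFFIX)]
    return host, f"{tenant}-admin{SP_SUFFIX}"


def token_scope(host):
    """Client-credentials scope string for a SharePoint resource host."""
    return f"https://{host}/.default"
```

Resolving the host from Graph avoids hard-coding per-tenant names, at the cost of one extra Graph token request before the SharePoint token can be issued.
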
## Configuration Changes
- **`.claude/skills/remediation-tool/scripts/get-token.sh`** — added `sharepoint` and `sharepoint-admin`
tiers (Tenant Admin app, cert-forced, tenant SharePoint resource auto-resolved from Graph /sites/root;
`SP_RESOURCE_ENV` override). Added `SP_KIND` init + a resource-resolution block. Syntax-checked (`bash -n`).
- **`.claude/skills/remediation-tool/references/app-permissions-and-sharepoint.md`** (NEW) — live-decoded
per-tier application-permission map, the two SharePoint gotchas (cert requirement; Graph settings scope
not held), CSOM/REST usage examples, and the "verify before declaring no-access" discipline.
- **`.claude/skills/remediation-tool/SKILL.md`** — added the SharePoint tier rows + a prominent
full-365-coverage note pointing at the new reference.
- **`.claude/memory/reference_remediation_tool_365_access.md`** (NEW) + MEMORY.md index line.
- **`wiki/clients/peaceful-spirit.md`** — full recompile (two-DC DFS, deletion investigation, ACL model,
new Patterns). **`wiki/index.md`** — updated peaceful-spirit row + date (note: index later also
updated by another machine's linter; left intact).
- **`errorlog.md`** — one `--friction` entry (remediation-tool no-access-vs-cert).
- **Syncro #32492** — comment 421624836 posted (customer-visible, emailed); status -> Waiting on Customer.
- No local vault changes.
## Credentials & Secrets
- No new credentials. Used the ComputerGuru Tenant Admin app (`709e6eed-0711-4875-9c44-2d3518c47063`,
vault `msp-tools/computerguru-tenant-admin.sops.yaml`) — has BOTH client_secret and a certificate
(`cert_thumbprint_b64url` + `cert_private_key_pem_b64`). **SharePoint requires the cert**, not the
secret. Read-only Graph/SharePoint queries against Birth Biologic; no writes to the tenant.
- Syncro API key: mike per-user token (skill-managed). No secrets exposed in the ticket.
## Infrastructure & Servers
- **Birth Biologic M365 tenant:** `19a568e8-9e88-413b-9341-cbc224b39145`. SharePoint host
`birthbiologic.sharepoint.com` (admin `birthbiologic-admin.sharepoint.com`).
- `SelfServiceSiteCreationDisabled = false` (self-service site creation ENABLED).
- M365 Group creation: unrestricted — `GET /groupSettings` returned 0 entries (no Group.Unified
override) => default `EnableGroupCreation = true`.
- `AllowClassicPublishingSiteCreation = false`; `SharingCapability = 2` (external+guest).
- **remediation-tool per-tier app roles (decoded 2026-07-01):**
- investigator (`bfbc12a4-...`): Directory.Read.All, User.Read.All, Sites.Read.All, AuditLog.Read.All,
Application.Read.All, Organization.Read.All, Policy.Read.All, Mail.Read, MailboxSettings.Read,
BitlockerKey.Read.All, IdentityRisky* (some ReadWrite), UserAuthenticationMethod.Read.All.
- user-manager (`64fac46b-...`): Directory.ReadWrite.All, Group.ReadWrite.All, User.ReadWrite.All,
Device.ReadWrite.All, User.RevokeSessions.All, UserAuthenticationMethod.ReadWrite.All, Organization.Read.All.
- tenant-admin (`709e6eed-...`): Application.ReadWrite.All, AppRoleAssignment.ReadWrite.All,
RoleManagement.ReadWrite.Directory, Directory.ReadWrite.All, Policy.ReadWrite.ConditionalAccess,
**Sites.FullControl.All, Sites.ReadWrite.All** (Graph) + **Sites.FullControl.All** (SharePoint
resource `00000003-0000-0ff1-ce00-000000000000`), User.ReadWrite.All, SecurityEvents.Read.All.
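
The group-creation reading in the Birth Biologic entry above (0 `groupSettings` entries implies the default) can be expressed as a small interpreter over the response body. This is a sketch assuming the usual Graph shape, where each setting has a `displayName` and a `values` list of name/value pairs; the function name is hypothetical.

```python
def group_creation_enabled(group_settings):
    """Interpret a Graph GET /groupSettings response body (already parsed)."""
    for setting in group_settings.get("value", []):
        if setting.get("displayName") != "Group.Unified":
            continue
        for item in setting.get("values", []):
            if item.get("name") == "EnableGroupCreation":
                # Graph returns setting values as strings ("true"/"false").
                return str(item.get("value", "")).strip().lower() == "true"
    return True  # no Group.Unified override: the tenant default applies
```
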
## Commands & Outputs
- Decode a tier's granted app permissions: `bash get-token.sh <tenant> <tier>` then base64url-decode
the JWT payload middle segment and read `.roles`.
- SharePoint token (cert): `bash .claude/skills/remediation-tool/scripts/get-token.sh <tenant> sharepoint-admin`
-> aud `00000003-0000-0ff1-ce00-000000000000`, roles `["Sites.FullControl.All"]`.
`curl /_api/web/title` -> `{"value":"Tenant Administration"}`.
- Read tenant settings: CSOM `POST <admin>/_vti_bin/client.svc/ProcessQuery` with the Tenant object
constructor `TypeId {268004ae-ef6b-4e9b-8425-127220d84719}` + `SelectAllProperties`. (Secret token =>
"Unsupported app only token"; cert token => full property set.)
- Graph group-creation check: `GET /groupSettings` (empty value array = defaults = anyone can create).
- Syncro comment: `POST /tickets/113276544/comment` (jq-built body, `<br>` line breaks, hidden:false,
do_not_email:false) -> comment 421624836. Status PUT `{"status":"Waiting on Customer"}`.
- `VAULT_ROOT_ENV=D:/vault` required for get-token.sh on this box (global identity.json lacks vault_path).
## Pending / Incomplete Tasks
1. **#32492 (Birth Biologic) — Waiting on Customer.** On Annise's go-ahead, implement the lockdown
(preview + explicit YES): (a) `SelfServiceSiteCreationDisabled=true` via `sharepoint-admin` tier CSOM
SetProperty; (b) restrict M365 Group creation to an approved security group via `user-manager`
(Graph `Group.Unified` directory setting). Neither affects Staff Portal edit rights.
2. **Optional Graph parity:** add `SharePointTenantSettings.Read.All`/`.ReadWrite.All` to the Tenant
Admin app so the Graph `/admin/sharepoint/settings` route works too (currently CSOM-only). Uses
`patch-tenant-admin-manifest.sh` + re-consent.
3. **Carryover from the Peaceful Spirit thread** (see the peaceful-spirit log): ~3,342-file deletion
copy-back (awaiting Mara go), Shelton notes (year-ago restore point purged), PST-SERVER2 flapping
diagnosis, ~200 GB PST-Recovery staging cleanup.
## Reference Information
- Syncro #32492 id `113276544`; Birth Biologic customer `17983014`; assigned Winter (1737); comment `421624836`.
- Commits this save-window: `e583bf43` (wiki peaceful-spirit), `6f7f939a` (remediation-tool docs+fix).
Earlier this session: `6f676672` (peaceful-spirit deletion log).
- Reference doc: `.claude/skills/remediation-tool/references/app-permissions-and-sharepoint.md`.
- Memory: `.claude/memory/reference_remediation_tool_365_access.md`.
- SharePoint Tenant CSOM TypeId: `{268004ae-ef6b-4e9b-8425-127220d84719}`. SharePoint resource appId:
`00000003-0000-0ff1-ce00-000000000000`.