From 7f897ce93f9277dbb2742f1a6283679b25c88156 Mon Sep 17 00:00:00 2001 From: Mike Swanson Date: Wed, 1 Jul 2026 13:10:01 -0700 Subject: [PATCH] sync: auto-sync from GURU-5070 at 2026-07-01 13:09:08 Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-07-01 13:09:08 --- ...01-mike-dataforth-test-data-chain-audit.md | 136 +++++++++++++++ ...diation-tool-sharepoint-access-bb-32492.md | 156 ++++++++++++++++++ 2 files changed, 292 insertions(+) create mode 100644 clients/dataforth/session-logs/2026-07/2026-07-01-mike-dataforth-test-data-chain-audit.md create mode 100644 session-logs/2026-07/2026-07-01-mike-remediation-tool-sharepoint-access-bb-32492.md diff --git a/clients/dataforth/session-logs/2026-07/2026-07-01-mike-dataforth-test-data-chain-audit.md b/clients/dataforth/session-logs/2026-07/2026-07-01-mike-dataforth-test-data-chain-audit.md new file mode 100644 index 00000000..870e8ae9 --- /dev/null +++ b/clients/dataforth/session-logs/2026-07/2026-07-01-mike-dataforth-test-data-chain-audit.md @@ -0,0 +1,136 @@ +# Dataforth test-data-chain audit via RMM-spawned AD2 Claude + multi-AI verification + +## User +- **User:** Mike Swanson (mike) +- **Machine:** GURU-5070 +- **Role:** admin + +## Session Summary + +Ran a full audit of the Dataforth test-data chain in support of Syncro #32489 (DOS test +stations not pulling updated spec files). The pivotal enabler was proving a new capability: +**spawning a headless `claude -p` on the AD2 domain controller via its GuruRMM agent** rather +than the async git-sync handoff. AD2 (192.168.0.6) is isolated from the ACG coord API but its +RMM agent phones home, and it is online (agent `cfa93bb6-...`). A staged probe confirmed: +sysadmin is logged into the console (so `context: user_session` works and returns an elevated +token), Claude Code v2.1.181 is at `C:\Users\sysadmin\.local\bin\claude.exe`, node v20 is +system-wide, and the repo is at `C:\ClaudeTools`. 
Headless `claude -p` initially failed with +`Invalid API key` because a stale machine-level `ANTHROPIC_API_KEY` (108 chars) shadowed +sysadmin's good OAuth creds; unsetting it (`Remove-Item Env:\ANTHROPIC_API_KEY`) let it fall +back to `.credentials.json` and return `PROBE_OK`. + +With the pattern proven, launched a strictly read-only autonomous Claude auditor on AD2, +detached (runner pid 4748) so it survived the RMM command-timeout window, writing to +`C:\Users\sysadmin\ad2-audit\FINDINGS.md` with a `DONE.txt` marker. A background monitor polled +the marker; the run completed in ~17 min (exit 0, 25 KB report). The report is excellent, +evidence-backed, and corrected several stale assumptions in `CLAUDE.dataforth.md` (dated +2026-03-29). + +Key findings: **F1 (HIGH)** — deployed `NWTOC.BAT` v5.0 (`COMMON\ProdSW`, verified on AD2 and +NAS) copies only `*.BAT` and `*.EXE`, **zero `.DAT`** — the confirmed root cause of #32489; and +no NWTOC version ever distributed the shared `COMMON` masters, so v5.1 must ADD a DATA copy, not +restore one. **F2 (HIGH, new)** — `NWTOC.BAT` v5.0 and `CTONWTXT.BAT` v2.3 use `COPY /Y`, which +is not a valid MS-DOS 6.22 switch (introduced in MS-DOS 7.0/Win95); on true 6.22 this errors and +copies nothing, meaning NWTOC may be failing entirely. Stale-assumption corrections: datastore is +now **PostgreSQL 18** (SQLite is a 4.4 GB archive), the scheduled task runs +**`Sync-FromNAS-rsync.ps1`**, web delivery is a live HTTP API uploader (472,290 records flagged, +not the dead `For_Web`), and `CTONWTXT` IS invoked. Also F3 (stray `TS-21\ProdSW` file breaks +rsync push every run), F4 (server generates from a frozen 2026-03-27 specdata snapshot), F5 +(plaintext creds in scripts). + +Then ran the requested multi-AI cross-verification of the load-bearing DOS-6.22 claim. 
**Grok** +(verify mode) independently confirmed F2 in full: 6.22 `COPY` supports only `/A /B /V`; `/Y` +arrived in MS-DOS 7.0/Win95; `COPY /Y src dst` → `Invalid switch - /Y`, 0 files copied; plain +`COPY` overwrites silently on 6.22 (the correct form). **Gemini** was unavailable this session +(quota exhausted on `gemini-3.1-pro-preview` + an OAuth fallback error; logged). The pivotal +unresolved question is empirical and station-only: do the stations run genuine 6.22 (→ NWTOC has +been copying nothing since 2026-03-16) or MS-DOS 7.x (→ `/Y` is fine, F1 is the sole cause)? +Stations have no RMM agent (they are DOS), so `VER` + a `COPY /Y` test must be done on a station. + +## Key Decisions + +- Used RMM-spawned headless Claude on AD2 (not the sync handoff) for live ground truth, since + the committed docs were 3 months stale and AD2's RMM agent is reachable despite coord isolation. +- Ran the AD2 agent strictly READ-ONLY with an ironclad brief (no writes/git/state changes, + deliverable file only) and detached + polled, given it runs on a production domain controller. +- Cross-verified only the highest-value, easy-to-get-wrong claim (DOS-6.22 `COPY /Y`) with a + second vendor, rather than re-running a redundant Claude fan-out over an already-thorough report. +- Did NOT apply any fix or touch #32489 yet — the v5.1 design and severity both hinge on the + unresolved station DOS-version question. + +## Problems Encountered + +- Headless `claude -p` on AD2 returned `Invalid API key` — a stale machine `ANTHROPIC_API_KEY` + shadowed the OAuth creds. Fixed by unsetting the env var before invoking (OAuth fallback). +- RMM dispatch failed once with `/mingw64/bin/curl: Permission denied` (transient AV lock on the + Git-Bash curl); nothing dispatched. Retried using `/c/Windows/System32/curl.exe`. +- Gemini (`agy`) verify failed — `gemini-3.1-pro-preview` quota exhausted and the default-model + fallback errored on OAuth `_doSetupUser`. 
Logged; used Grok as the cross-vendor check. + +## Configuration Changes + +Created: +- `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md` — the full AD2 audit + report (pulled back via RMM, gzip+base64). +- `.claude/memory/reference_rmm_spawn_headless_claude.md` — the validated RMM-spawn-Claude + pattern + the ANTHROPIC_API_KEY-shadow gotcha. +- MEMORY.md index line for the above. +- On AD2 (NOT in repo): `C:\Users\sysadmin\ad2-audit\{brief.txt,run.ps1,run.log,FINDINGS.md,DONE.txt}`. + +Modified: +- `errorlog.md` — logged the Gemini unavailability. + +## Credentials & Secrets + +- No new credentials created. The AD2 audit surfaced pre-existing plaintext creds in Dataforth + scripts (F5) — flagged for rotation + vaulting, NOT copied anywhere: rsync daemon + (`rsync`/`IQ2...19`) in `Sync-FromNAS-rsync.ps1`; NAS root SSH password in the dormant + `Sync-FromNAS.ps1`; Postgres `testdatadb_app` password in `testdatadb\database\db.js`. +- AD2 has a stale machine-level `ANTHROPIC_API_KEY` (invalid) that must be unset before running + headless Claude there; sysadmin's OAuth creds at `C:\Users\sysadmin\.claude\.credentials.json` + are the working auth. + +## Infrastructure & Servers + +- AD2: `192.168.0.6`, Dataforth DC, Windows Server 2019 (10.0.17763), RMM agent + `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`, client "Dataforth Corp". claude.exe v2.1.181, + node v20.10.0, repo `C:\ClaudeTools` (branch ad2). +- NAS D2TESTNAS `192.168.0.9` (Linux/Samba), rsync daemon port 873 module `test` → `/data/test`. +- TestDataDB: PostgreSQL 18.3 service `postgresql-18` on AD2 (`::1:5432`), db `testdatadb` + (test_records 475,553; work_orders 34,149). App at `C:\Shares\testdatadb\`. +- Shares on AD2: `C:\Shares\test\` (COMMON\ProdSW deployed batch set; Ate\ProdSW\DATA master + specs), `C:\Shares\webshare\` (Test_Datasheets active; For_Web dead since 2026-05-11). 
+- Engineering master `5BMAIN.DAT` 83,200 B mtime 2026-06-26 present on AD2 + NAS; server-side + `testdatadb\specdata\5BMAIN.DAT` is a frozen 2026-03-27 copy (F4). + +## Commands & Outputs + +- RMM dispatch to AD2 with `context:user_session`, `timeout_seconds` (not `timeout`), via + `/c/Windows/System32/curl.exe`. Headless launch: detached `Start-Process powershell -File + run.ps1 -WindowStyle Hidden`; runner unsets `ANTHROPIC_API_KEY`, `cd C:\ClaudeTools`, runs + `claude -p --permission-mode bypassPermissions --output-format text`. +- Probe result: `PROBE_OK` after unsetting the env key (exit 0, 13s). +- Audit result: `DONE exit=0`, FINDINGS.md 25,337 B, ~17 min. +- Grok verify verdict: claim correct; nitpicks = "Invalid switch" not "Invalid parameter", and the + error is visible (not literally "silent"). + +## Pending / Incomplete Tasks + +- **PIVOTAL:** confirm station DOS version (`VER` + `COPY /Y NUL C:\TEST.TXT` on a station). + Determines whether F2 is catastrophic (6.22 → NWTOC copies nothing since 2026-03-16) or moot + (7.x). Stations have no RMM agent — reach one via NAS/console. +- Draft DOS-6.22-safe `NWTOC v5.1`: plain `COPY` (no `/Y`), one-way pull of master `.DAT`s into a + distinct local dir (avoids the cyclic-overwrite v5.0 guarded against), 6.22-valid `IF EXIST`/ + `GOTO`. Grok-review before it touches a station. Correct on both 6.22 and 7.x. +- Update Syncro #32489 with the confirmed root cause (F1 + F2) and plan. +- Address F3 (remove stray `TS-21\ProdSW` file), F4 (feed specdata from masters), F5 (rotate/vault + creds) — recommendations only, nothing applied. +- Retry Gemini verification later for true two-vendor triangulation. + +## Reference Information + +- Audit report: `clients/dataforth/docs/audits/2026-07-01-test-data-chain-audit-AD2.md`. +- Syncro #32489 (id 113201089, Scheduled, due 2026-07-02); John Lehman contact 2851723; Dataforth + Corp customer 578095; appointment 5626864474. 
+- AD2 RMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`. +- Memory: `reference_rmm_spawn_headless_claude`, `gururmm-command-timeout-seconds`. diff --git a/session-logs/2026-07/2026-07-01-mike-remediation-tool-sharepoint-access-bb-32492.md b/session-logs/2026-07/2026-07-01-mike-remediation-tool-sharepoint-access-bb-32492.md new file mode 100644 index 00000000..d6e5af14 --- /dev/null +++ b/session-logs/2026-07/2026-07-01-mike-remediation-tool-sharepoint-access-bb-32492.md @@ -0,0 +1,156 @@ +## User +- **User:** Mike Swanson (mike) +- **Machine:** GURU-5070 +- **Role:** admin + +## Session Summary + +Continuation of the 2026-07-01 Peaceful Spirit session (deletion investigation + Admin1/Admin2 ACL +hardening already saved to `clients/peaceful-spirit/session-logs/2026-07/2026-07-01-mike-pst-deletion-scope-shelton-admin-acl.md`, +commit 6f676672). This log covers the work after that save: the Peaceful Spirit wiki full recompile, +a Birth Biologic Syncro ticket, and a remediation-tool tooling+documentation fix that unblocks +SharePoint access. + +First, ran `/wiki-compile client:peaceful-spirit --full` — the article was stale (2026-06-04) and +missing the June two-DC DFS rebuild, the deletion investigation, and the ACL work. Delegated +synthesis to a Sonnet subagent (12 source logs + live Syncro data), reviewed the draft, fixed +HTML-entity artifacts, staged + applied, updated the index. Committed e583bf43. + +Second, handled Syncro ticket #32492 (Birth Biologic, "Setting for SharePoint", assigned Winter, +converted from a lead). Annise Neidrich asked how to stop employees from creating SharePoint sites +(the new "Discover/Publish/Build" nav worried Brandy) without losing the ability to edit their +"Staff Portal" site. Diagnosed the underlying misconception: site creation and site editing are +separate permissions. Investigated the tenant's live settings via the remediation-tool (read-only) +and found both site-creation levers open. 
+ +Third — and the largest piece — resolved recurring remediation-tool friction. The Graph +`/admin/sharepoint/settings` call returned accessDenied, which initially read as "no SharePoint +access." Mike pushed back (certs are in the vault for a reason). Validated empirically by decoding +each Graph tier's token `roles` claim: the Tenant Admin app holds `Sites.FullControl.All` (Graph +AND SharePoint resource). The real blockers were (a) SharePoint app-only REJECTS client_secret +tokens ("Unsupported app only token") and requires a CERTIFICATE, and (b) `get-token.sh` had no +SharePoint resource tier. Proved cert-based SharePoint access end-to-end (REST + CSOM), then fixed +the tooling: added `sharepoint`/`sharepoint-admin` tiers to get-token.sh (cert-forced, tenant +resource auto-resolved), wrote a reference doc with the live per-tier permission map + gotchas + +CSOM examples, updated SKILL.md, logged the friction, and saved a reference memory. Committed 6f7f939a. + +Finally, posted the customer reply on #32492 as Mike (comment 421624836, customer-visible, emailed), +then set the ticket to Waiting on Customer. Implementation of the actual lockdown is ready but gated +on the client's go-ahead. + +## Key Decisions + +- **Root-level (general) session log** — this session spans harness tooling (remediation-tool) + + Birth Biologic (a Syncro ticket) + a wiki rebuild; no single client/project owns it. +- **Validated app permissions by decoding the token `roles` claim** rather than trusting the + accessDenied — this is the authoritative source and disproved the "no access" read. +- **Added SharePoint tiers to get-token.sh instead of a one-off script** — makes SharePoint a + first-class, reusable capability of the remediation-tool, and cert-forces it so the secret gotcha + can't recur. Auto-resolves the tenant-specific resource host via Graph /sites/root. 
+- **Documented the full per-tier permission map + the two SharePoint gotchas** in a reference doc + and a memory, with a "don't declare no-access without checking" discipline — the recurring friction + was the point, not just this one ticket. +- **Answered #32492 in plain language** (no CSOM/group-setting mechanics) — Annise is an ops + consultant, not IT. Set Waiting on Customer since the fix needs her go-ahead. +- **Did NOT implement the site-creation lockdown** — it's a tenant write to live client config; + gated behind explicit client approval + a preview. + +## Problems Encountered + +- **Graph `/admin/sharepoint/settings` accessDenied misread as "no SharePoint access."** Root cause: + no app holds `SharePointTenantSettings.Read.All` (that Graph route needs it), but the apps DO hold + `Sites.FullControl.All`. Resolved by using the SharePoint CSOM/REST admin API instead of the Graph + settings endpoint. +- **SharePoint REST/CSOM returned "Unsupported app only token"** with a client_secret token. SharePoint + Online requires certificate-based app-only tokens. Resolved by using the vault cert (client_assertion); + proved with `/_api/web/title` -> "Tenant Administration". +- **get-token.sh had no SharePoint resource tier** (all tiers map to Graph/EXO/Defender fixed scopes). + Added `sharepoint`/`sharepoint-admin`; the SharePoint resource host is tenant-specific so it's + resolved at runtime from Graph /sites/root (recursive get-token call), overridable via SP_RESOURCE_ENV. +- **Sonnet wiki draft HTML-escaped some chars** (`&`/`<`/`>`). Fixed in the staged file + before applying; verified no entities and no raw secrets leaked. +- **get-token.sh reads the GLOBAL `~/.claude/identity.json`** (which lacks vault_path on this box); + set `VAULT_ROOT_ENV=D:/vault` to point it at the vault. (Repo identity has vault_path; global does not.) 
+ +## Configuration Changes + +- **`.claude/skills/remediation-tool/scripts/get-token.sh`** — added `sharepoint` and `sharepoint-admin` + tiers (Tenant Admin app, cert-forced, tenant SharePoint resource auto-resolved from Graph /sites/root; + `SP_RESOURCE_ENV` override). Added `SP_KIND` init + a resource-resolution block. Syntax-checked (`bash -n`). +- **`.claude/skills/remediation-tool/references/app-permissions-and-sharepoint.md`** (NEW) — live-decoded + per-tier application-permission map, the two SharePoint gotchas (cert requirement; Graph settings scope + not held), CSOM/REST usage examples, and the "verify before declaring no-access" discipline. +- **`.claude/skills/remediation-tool/SKILL.md`** — added the SharePoint tier rows + a prominent + full-365-coverage note pointing at the new reference. +- **`.claude/memory/reference_remediation_tool_365_access.md`** (NEW) + MEMORY.md index line. +- **`wiki/clients/peaceful-spirit.md`** — full recompile (two-DC DFS, deletion investigation, ACL model, + new Patterns). **`wiki/index.md`** — updated peaceful-spirit row + date (note: index later also + updated by another machine's linter; left intact). +- **`errorlog.md`** — one `--friction` entry (remediation-tool no-access-vs-cert). +- **Syncro #32492** — comment 421624836 posted (customer-visible, emailed); status -> Waiting on Customer. +- No local vault changes. + +## Credentials & Secrets + +- No new credentials. Used the ComputerGuru Tenant Admin app (`709e6eed-0711-4875-9c44-2d3518c47063`, + vault `msp-tools/computerguru-tenant-admin.sops.yaml`) — has BOTH client_secret and a certificate + (`cert_thumbprint_b64url` + `cert_private_key_pem_b64`). **SharePoint requires the cert**, not the + secret. Read-only Graph/SharePoint queries against Birth Biologic; no writes to the tenant. +- Syncro API key: mike per-user token (skill-managed). No secrets exposed in the ticket. 
+ +## Infrastructure & Servers + +- **Birth Biologic M365 tenant:** `19a568e8-9e88-413b-9341-cbc224b39145`. SharePoint host + `birthbiologic.sharepoint.com` (admin `birthbiologic-admin.sharepoint.com`). + - `SelfServiceSiteCreationDisabled = false` (self-service site creation ENABLED). + - M365 Group creation: unrestricted — `GET /groupSettings` returned 0 entries (no Group.Unified + override) => default `EnableGroupCreation = true`. + - `AllowClassicPublishingSiteCreation = false`; `SharingCapability = 2` (external+guest). +- **remediation-tool per-tier app roles (decoded 2026-07-01):** + - investigator (`bfbc12a4-...`): Directory.Read.All, User.Read.All, Sites.Read.All, AuditLog.Read.All, + Application.Read.All, Organization.Read.All, Policy.Read.All, Mail.Read, MailboxSettings.Read, + BitlockerKey.Read.All, IdentityRisky* (some ReadWrite), UserAuthenticationMethod.Read.All. + - user-manager (`64fac46b-...`): Directory.ReadWrite.All, Group.ReadWrite.All, User.ReadWrite.All, + Device.ReadWrite.All, User.RevokeSessions.All, UserAuthenticationMethod.ReadWrite.All, Organization.Read.All. + - tenant-admin (`709e6eed-...`): Application.ReadWrite.All, AppRoleAssignment.ReadWrite.All, + RoleManagement.ReadWrite.Directory, Directory.ReadWrite.All, Policy.ReadWrite.ConditionalAccess, + **Sites.FullControl.All, Sites.ReadWrite.All** (Graph) + **Sites.FullControl.All** (SharePoint + resource `00000003-0000-0ff1-ce00-000000000000`), User.ReadWrite.All, SecurityEvents.Read.All. + +## Commands & Outputs + +- Decode a tier's granted app permissions: `bash get-token.sh ` then base64url-decode + the JWT payload middle segment and read `.roles`. +- SharePoint token (cert): `bash .claude/skills/remediation-tool/scripts/get-token.sh sharepoint-admin` + -> aud `00000003-0000-0ff1-ce00-000000000000`, roles `["Sites.FullControl.All"]`. + `curl /_api/web/title` -> `{"value":"Tenant Administration"}`. 
+- Read tenant settings: CSOM `POST /_vti_bin/client.svc/ProcessQuery` with the Tenant object + constructor `TypeId {268004ae-ef6b-4e9b-8425-127220d84719}` + `SelectAllProperties`. (Secret token => + "Unsupported app only token"; cert token => full property set.) +- Graph group-creation check: `GET /groupSettings` (empty value array = defaults = anyone can create). +- Syncro comment: `POST /tickets/113276544/comment` (jq-built body, `<br>
` line breaks, hidden:false, + do_not_email:false) -> comment 421624836. Status PUT `{"status":"Waiting on Customer"}`. +- `VAULT_ROOT_ENV=D:/vault` required for get-token.sh on this box (global identity.json lacks vault_path). + +## Pending / Incomplete Tasks + +1. **#32492 (Birth Biologic) — Waiting on Customer.** On Annise's go-ahead, implement the lockdown + (preview + explicit YES): (a) `SelfServiceSiteCreationDisabled=true` via `sharepoint-admin` tier CSOM + SetProperty; (b) restrict M365 Group creation to an approved security group via `user-manager` + (Graph `Group.Unified` directory setting). Neither affects Staff Portal edit rights. +2. **Optional Graph parity:** add `SharePointTenantSettings.Read.All`/`.ReadWrite.All` to the Tenant + Admin app so the Graph `/admin/sharepoint/settings` route works too (currently CSOM-only). Uses + `patch-tenant-admin-manifest.sh` + re-consent. +3. **Carryover from the Peaceful Spirit thread** (see the peaceful-spirit log): ~3,342-file deletion + copy-back (awaiting Mara go), Shelton notes (year-ago restore point purged), PST-SERVER2 flapping + diagnosis, ~200 GB PST-Recovery staging cleanup. + +## Reference Information + +- Syncro #32492 id `113276544`; Birth Biologic customer `17983014`; assigned Winter (1737); comment `421624836`. +- Commits this save-window: `e583bf43` (wiki peaceful-spirit), `6f7f939a` (remediation-tool docs+fix). + Earlier this session: `6f676672` (peaceful-spirit deletion log). +- Reference doc: `.claude/skills/remediation-tool/references/app-permissions-and-sharepoint.md`. +- Memory: `.claude/memory/reference_remediation_tool_365_access.md`. +- SharePoint Tenant CSOM TypeId: `{268004ae-ef6b-4e9b-8425-127220d84719}`. SharePoint resource appId: + `00000003-0000-0ff1-ce00-000000000000`.