diff --git a/.claude/scripts/tmp-promotion-check.sh b/.claude/scripts/tmp-promotion-check.sh
index aecd472..9b61b0a 100755
--- a/.claude/scripts/tmp-promotion-check.sh
+++ b/.claude/scripts/tmp-promotion-check.sh
@@ -47,8 +47,13 @@ for f in "${FILES[@]}"; do
 esac
 fi
- # Referenced in a session log → clearly load-bearing.
- if grep -rqlF "$f" session-logs/ clients/ projects/ 2>/dev/null; then
+ # Referenced in a session log / doc → clearly load-bearing.
+ # Scope to markdown only and skip heavy build/dep/vcs trees — a bare `grep -r`
+ # over projects/ (Rust target/, node_modules/, .git) hangs for minutes per file.
+ if grep -rqlF "$f" --include='*.md' \
+ --exclude-dir=.git --exclude-dir=node_modules --exclude-dir=target \
+ --exclude-dir=dist --exclude-dir=build --exclude-dir=.next --exclude-dir=vendor \
+ session-logs/ clients/ projects/ 2>/dev/null; then
 reason="${reason:+$reason, }referenced"
 fi
diff --git a/errorlog.md b/errorlog.md
index b10bb09..2800f6f 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-06-15 | GURU-5070 | tmp-promotion-check (/save,/scc) | [friction] hung for minutes: line 51 ran 'grep -rqlF projects/' per scratch file, recursing Rust target/, node_modules/, .git in the guru-rmm/guru-connect submodules. Fixed: --include='*.md' + --exclude-dir for heavy trees. Stalled the /save sync behind it
+
 2026-06-15 | GURU-5070 | memory-dream (--apply-safe) | flagged feedback_broken_backlinks_are_writeme_markers.md as an orphan and appended a DUPLICATE index line though it already had one — orphan detector likely keys on the frontmatter name: slug, not the (file.md) link target. Fix the index-line matching to compare by filename [ctx: mode=apply-safe]
 2026-06-15 | GURU-5070 | powershell/var-case | [friction] PowerShell vars are case-INSENSITIVE: $gUid silently overwrote $guid (GPO id), Set-ADObject hit a bad DN and left GPT.ini/AD versionNumber inconsistent until fixed. Never rely on case to distinguish PS variables
diff --git a/session-logs/2026-06/2026-06-15-mike-harness-errorlog-cryoweave-vwp.md b/session-logs/2026-06/2026-06-15-mike-harness-errorlog-cryoweave-vwp.md
new file mode 100644
index 0000000..2bad34c
--- /dev/null
+++ b/session-logs/2026-06/2026-06-15-mike-harness-errorlog-cryoweave-vwp.md
@@ -0,0 +1,129 @@
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-5070
+- **Role:** admin
+
+## Session Summary
+
+Started as incident response for Valley Wide Plastering: after the weekend G: file-share
+migration (SERVER3 → VWP-FILES), the whole office lost G:. Diagnosed via GuruRMM across all
+27 VWP agents — the file server (VWP-FILES) was healthy with 0 active SMB sessions, and every
+workstation's live G: mapping was a stale persistent `\\192.168.0.20\G-drive` (legacy SERVER3
+IP that no longer answers SMB). The GPP drive-map action was "Update", which can't override an
+existing persistent drive letter, so the GPO's correct `\\VWP-FILES\G-drive` never applied.
+Fixed durably by flipping the `MappedDrives` GPO G: entry to **action=Replace** (+ GPT.ini/AD
+versionNumber bump, replicated to PDC + VWP_ADSRVR), then remapped 13 logged-in workstations
+live and verified G: accessible. Billed 1.0h warranty on ticket #32418 and emailed the client.
+
+Pivoted to CryoWeave: "mail isn't getting to recipients." Found cryoweave.com is on M365 (not
+the IX webserver). SPF already authorized M365 and aligned; the gaps were no DKIM and no DMARC.
+Published DMARC (`p=none` → hardened to `p=quarantine`), removed a stale `mail.cryoweave.com`
+CNAME pointing at the old Neptune Exchange, onboarded the cryoweave tenant to the ComputerGuru
+remediation suite (single-consent link → provision), created an ACG Global-Admin break-glass
+`sysadmin@cryoweave.com`, published the M365 DKIM selector CNAMEs and enabled signing (verified
+`Status=Valid` and a real Gmail-delivered test showing spf=pass / dkim=pass(d=cryoweave.com) /
+dmarc=pass). Stood up a reusable DMARC reporting pipeline: `rua@azcomputerguru.com` shared
+mailbox + the cross-domain `_report._dmarc` authorization on azcomputerguru.com's Cloudflare
+zone. A message trace showed Greg barely sends via M365 (2 sends/10d, both delivered) — the
+real failing mail is likely a non-M365 path (Gmail send-as), which now fails the hardened DMARC.
+
+Built two skills: **discord-dm** (DM org members/channels via the bot — for copy-paste-friendly
+delivery of wrapped command lines) and **rmm-search** (flexible, client-scoped machine finder so
+queries can't bleed across clients). Then built a fleet-wide **error-logging system**: a canonical
+`log-skill-error.sh` helper capturing three categories — execution failures, user **corrections**
+of bad assumptions, and preventable **friction** (token-wasting self-inflicted errors). Made it
+mandatory in CLAUDE.md + skill-creator, retrofitted ~30 skill scripts (3 parallel sub-agents),
+and seeded this session's real corrections + friction. Finished with a memory-dream run that
+cross-referenced the new errorlog: merged a true duplicate DM memory I'd just created, corrected
+the stale "suite has no Mail.Send" claim, fixed an index dup, and logged a memory-dream defect.
+
+## Key Decisions
+
+- **VWP G: fix = GPP action Replace, not per-machine remap only.** Replace authoritatively
+ deletes+recreates G: every refresh, beating leftover persistent `.20` maps for all profiles —
+ durable across users/machines. Live remap of logged-in users was the immediate-relief layer.
+- **VWP warranty (1.0h) on the migration ticket #32418, customer-visible note.** The outage
+ was fallout from our weekend work; logged as warranty, emailed via Syncro (no separate mailbox send).
+- **DMARC hardened only to `p=quarantine`, not `p=reject`.** No aggregate-report data yet and an
+ unverified non-M365 sender (IX website); quarantine is recoverable. Promote to reject after ~1
+ week of clean reports (coord todo `b4b9036e`).
+- **DMARC rua → reusable `rua@azcomputerguru.com`, NOT INKY.** INKY is only for INKY-onboarded
+ domains. Cross-domain auth needs a per-domain `<domain>._report._dmarc.azcomputerguru.com` record
+ (a single `*` wildcard can't match a 2-label reported domain).
+- **rmm-search: flexible over faithful.** Per Mike, optimize for first-try correctness (multi-field,
+ normalized, subsequence) rather than replicating the UI Omnibox exactly.
+- **Error log generalized beyond skills.** Captures corrections + friction too; repeats of documented
+ gotchas are the highest-value entries (a rule/memory isn't working). One file, type-tagged, for linting.
+- **Retrofit delegated to 3 parallel sub-agents** (disjoint file sets) for speed + consistency;
+ coverage verified by grep, not trusted (one agent's "prior pass" claim was misattribution).
+
+## Problems Encountered
+
+- **`is_connected` null fleet-wide** in the RMM API — used `last_seen` recency (<5min) for online state.
+- **Fresh-onboard EXO app-only 401** for ~20 min after consent (role/Exchange.ManageAsApp propagation);
+ Graph worked immediately. Waited and retried; DKIM/trace then succeeded.
+- **GA role assignment 404** right after creating sysadmin@ — new-user directory replication lag; retried, 201.
+- **"Suite has no Mail.Send" was wrong** — I read the EXO-audience token's roles; the Exchange Operator
+ app has Graph Mail.Send. Mint a Graph-audience token before concluding a permission is missing.
+- **memory-dream `--apply-safe` created a duplicate index line** from a false-orphan (keys on `name:`
+ slug, not the `(file.md)` link). Removed the dup; logged the defect to errorlog.
+- **Friction (logged):** Git-Bash `/tmp` path mismatch, shell env not persisting between Bash calls,
+ argv length limit passing large JSON, PowerShell variable case-insensitivity ($guid/$gUid collision).
+
+## Configuration Changes
+
+New:
+- `.claude/scripts/log-skill-error.sh`, `.claude/scripts/discord-dm.sh`, `.claude/scripts/rmm-search.sh`, `.claude/scripts/rmm-search.py`
+- `.claude/skills/discord-dm/SKILL.md`, `.claude/skills/rmm-search/SKILL.md`, `.claude/commands/discord-dm.md`, `.claude/commands/rmm-search.md`
+- Memories: `feedback_dmarc_rua_inky_onboarded_only.md`, `reference_cloudflare_access.md`, `feedback_rmm_search_skill.md`
+
+Modified:
+- `.claude/CLAUDE.md` (error-logging mandate: exec + correction + friction)
+- `.claude/skills/skill-creator/SKILL.md` (require log helper in new skills)
+- `errorlog.md` (broadened header + seeded corrections/friction/defect entries)
+- ~30 skill scripts retrofitted with `log-skill-error.sh` (b2/bitdefender/mailprotector/packetdial/coord py; remediation-tool + onboard365 bash; vault, rmm-auth, post-bot-alert, agy, grok, 1password, run-onboarding-diagnostic)
+- Memory cleanup: merged `feedback_dm_wrapped_command_lines.md` → `feedback_dm_wrapping_commands_to_mike.md` (deleted the former); `feedback_365_remediation_tool.md` (Mail.Send + token-audience correction); `MEMORY.md` index fixes
+- `.claude/skills/remediation-tool/references/tenants.md` (cryoweave onboarded)
+
+DNS (cryoweave.com zone on IX/whmapi1): added `_dmarc` TXT (p=quarantine), added `selector1/2._domainkey` CNAMEs, removed stale `mail` CNAME. azcomputerguru.com (Cloudflare): added `cryoweave.com._report._dmarc` TXT `v=DMARC1;`.
+
+## Credentials & Secrets
+
+- **sysadmin@cryoweave.com** — ACG-owned Global Admin (break-glass) created in the CryoWeave M365
+ tenant. Password VAULTED at `clients/cryoweave/m365-sysadmin.sops.yaml` (field `credentials.password`)
+ AND in 1Password "Clients" vault item "CryoWeave - M365 SysAdmin (ACG Global Admin)". No license/mailbox;
+ password set to not expire. MFA not yet registered. (Not transcribed here — vault is the source of truth;
+ repo never holds plaintext secrets.)
+- **rua@azcomputerguru.com** — shared mailbox (DMARC reports) in ACG's M365 tenant; no license; mike@ has FullAccess.
+- Cloudflare API: `services/cloudflare.sops.yaml` (`credentials.api_token_full_dns`; zone_id_azcomputerguru `1beb9917c22b54be32e5215df2c227ce`).
+
+## Infrastructure & Servers
+
+- **CryoWeave M365 tenant:** `44705a37-b5d8-4bb1-882d-e18775612ada` (cryoweave.onmicrosoft.com + cryoweave.com). MX → cryoweave-com.mail.protection.outlook.com. DKIM Enabled/Valid (selector1/2). DMARC p=quarantine.
+- **azcomputerguru.com tenant:** `ce61461e-81a0-4c84-bb4a-7b354a9a356d`. DNS on Cloudflare.
+- **VWP-FILES:** 172.16.9.132 + 192.168.0.20 (dual-homed; .20 = old SERVER3 IP). G: share Everyone:Full. GPO `MappedDrives` GUID `{7D1AAC5B-2E39-4D6C-9248-AEC511E2A86D}`, G: action now R, version 5046272.
+- VWP DCs: VWP-DC1 (PDC, 172.16.9.2), VWP_ADSRVR (192.168.0.25). IX webserver 172.16.3.10 / 72.194.62.5.
+
+## Commands & Outputs
+
+- Find machines: `bash .claude/scripts/rmm-search.sh hyperv valleywide` (client-scoped).
+- DM Mike: `bash .claude/scripts/discord-dm.sh mike ""`.
+- Log an error: `bash .claude/scripts/log-skill-error.sh "" "" [--correction|--friction] [--context "k=v"]`.
+- EXO via app: mint Graph token with `computerguru-exchange-operator` client creds (`scope=https://graph.microsoft.com/.default`) for `POST /users//sendMail`; `get-token.sh exchange-op` is EXO-audience only.
+- Gmail header on the cryoweave test: spf=pass, dkim=pass (header.d=cryoweave.com, s=selector1), dmarc=pass (p=QUARANTINE dis=NONE) — delivered to inbox.
+
+## Pending / Incomplete Tasks
+
+- **Promote cryoweave DMARC to p=reject** after ~1 week of clean reports in rua@ (coord todo `b4b9036e`).
+- **Confirm Greg's send path** — message trace shows almost no M365 outbound; likely Gmail send-as (would now fail DMARC). Move him onto the M365 mailbox or have him send a comparison test.
+- **Register MFA (or CA-exclude) on sysadmin@cryoweave.com** (break-glass decision).
+- **Fix memory-dream false-orphan/duplicate-index detector** (logged in errorlog).
+- **WIN-AD-SRVR-2 SYSVOL** convergence for the VWP GPO edit (AD replicated; DFSR was catching up).
+
+## Reference Information
+
+- Syncro ticket #32418 (VWP G-Drive Migration, id 112613439) — 1.0h warranty, Resolved.
+- Commits: `73a67870` (error-logging system) → `9960da5f`; `9b9513f6` (memory cleanup) → `55acc7f9`.
+- DKIM CNAME targets: `selector1-cryoweave-com._domainkey.cryoweave.w-v1.dkim.mail.microsoft` (+ selector2).
+- ComputerGuru apps: Sec Inv `bfbc12a4`, Exchange Operator `b43e7342`, User Manager `64fac46b`, Tenant Admin `709e6eed`.
+- GPO backup on VWP-DC1: `C:\Temp\gpo-backup\`.